In many areas, practitioners need to analyze large datasets that challenge
conventional single-machine computing. To scale up data analysis, distributed
and parallel computing approaches are increasingly needed. Datasets are spread
out over several computing units, which do most of the analysis locally, and
communicate short messages. Here we study a fundamental and highly important
problem in this area: How to do ridge regression in a distributed computing
environment? Ridge regression is an extremely popular method for supervised
learning, and has several optimality properties, thus it is important to study.
We study one-shot methods that construct weighted combinations of ridge
regression estimators computed on each machine. By analyzing the mean squared
error in a high dimensional random-effects model where each predictor has a
small effect, we discover several new phenomena.
1. Infinite-worker limit: The distributed estimator works well for very large
numbers of machines, a phenomenon we call "infinite-worker limit".
2....
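
As a rough illustration of the one-shot scheme described above, the following sketch fits ridge regression separately on each data shard and combines the local estimators with a weighted sum. The function names, the `n * lam` penalty scaling, and the default uniform weights are assumptions made for this example; the abstract does not state which weights the authors use.

```python
import numpy as np

def local_ridge(X, y, lam):
    """Ridge estimator on one machine: solve (X'X + n*lam*I) b = X'y.
    The n*lam scaling is one common convention, assumed here."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

def one_shot_ridge(X_parts, y_parts, lam, weights=None):
    """Weighted combination of local ridge estimators (one communication round)."""
    betas = [local_ridge(X, y, lam) for X, y in zip(X_parts, y_parts)]
    if weights is None:
        # Uniform averaging is a placeholder assumption, not the paper's choice.
        weights = [1.0 / len(betas)] * len(betas)
    return sum(w * b for w, b in zip(weights, betas))

# Synthetic data in the spirit of "each predictor has a small effect".
rng = np.random.default_rng(0)
n, p, k = 600, 20, 3
beta = rng.normal(scale=1.0 / np.sqrt(p), size=p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Split the rows across k hypothetical machines.
X_parts = np.array_split(X, k)
y_parts = np.array_split(y, k)

beta_dist = one_shot_ridge(X_parts, y_parts, lam=0.1)
beta_full = local_ridge(X, y, lam=0.1)

print("squared error, distributed:", np.sum((beta_dist - beta) ** 2))
print("squared error, full data:  ", np.sum((beta_full - beta) ** 2))
```

Each machine only has to send its `p`-dimensional estimate, which is what makes the method one-shot.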

arxivml:
"One-shot distributed ridge regression in high dimensions",
Edgar Dobriban, Yue Sheng
https://t.co/V5qfvzt3tT

arxiv_cs_LG:
One-shot distributed ridge regression in high dimensions. Edgar Dobriban and Yue Sheng https://t.co/RShwYMpu5G

StatsPapers:
One-shot distributed ridge regression in high dimensions. https://t.co/ccxksx3YKG

Distributed ridge

Stargazers: 0

Subscribers: 1

Forks: 0

Open Issues: 0

Sample Sizes: None.

Authors: 2

Total Words: 19246

Unique Words: 3441

We consider the problem of estimating the parameters of a multivariate
Bernoulli process with auto-regressive feedback in the high-dimensional setting
where the number of samples available is much less than the number of
parameters. This problem arises in learning interconnections of networks of
dynamical systems with spiking or binary-valued data. We allow the process to
depend on its past up to a lag $p$, for a general $p \ge 1$, allowing for more
realistic modeling in many applications. We propose and analyze an
$\ell_1$-regularized maximum likelihood estimator (MLE) under the assumption
that the parameter tensor is approximately sparse. Rigorous analysis of such
estimators is made challenging by the dependent and non-Gaussian nature of the
process as well as the presence of the nonlinearities and multi-level feedback.
We derive precise upper bounds on the mean-squared estimation error in terms of
the number of samples, dimensions of the process, the lag $p$ and other key
statistical properties of the model. The ideas presented...
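
To make the estimator concrete, here is a minimal sketch of an $\ell_1$-regularized maximum likelihood fit for a binary autoregressive process with lag $p$. It assumes a logistic link and uses plain proximal gradient descent (ISTA); the abstract specifies neither, and the names `lagged_design` and `l1_mle`, the step size, and the simulated network are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lagged_design(x, p):
    """Row t holds the p most recent states x[t-1], ..., x[t-p], side by side."""
    T = x.shape[0]
    feats = np.hstack([x[p - k : T - k] for k in range(1, p + 1)])
    return feats, x[p:]

def l1_mle(x, p, lam, step=0.2, n_iter=500):
    """Proximal gradient on the l1-penalized Bernoulli negative log-likelihood.
    The logistic link is an assumption of this sketch."""
    Z, Y = lagged_design(x, p)
    m, q = Z.shape
    d = Y.shape[1]
    W = np.zeros((q, d))  # column i: incoming weights of node i over all lags
    b = np.zeros(d)       # unpenalized intercepts
    for _ in range(n_iter):
        R = sigmoid(Z @ W + b) - Y  # gradient of the loss w.r.t. the logits
        W = W - step * (Z.T @ R) / m
        b = b - step * R.mean(axis=0)
        # Soft-thresholding is the proximal step for the l1 penalty.
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)
    return W, b

# Simulate a small network with two illustrative interactions.
rng = np.random.default_rng(0)
d, p, T = 5, 2, 400
A = np.zeros((p * d, d))
A[0, 1] = 2.0        # node 0 at lag 1 excites node 1
A[d + 2, 3] = -2.0   # node 2 at lag 2 inhibits node 3
x = np.zeros((T, d))
x[:p] = rng.integers(0, 2, size=(p, d))
for t in range(p, T):
    past = np.concatenate([x[t - k] for k in range(1, p + 1)])
    x[t] = rng.random(d) < sigmoid(past @ A)

W_hat, b_hat = l1_mle(x, p, lam=0.02)
print("nonzero weights:", np.count_nonzero(W_hat), "of", W_hat.size)
```

Larger values of `lam` shrink more entries of `W_hat` exactly to zero, which is how the penalty encodes the sparsity assumption.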

arxivml:
"High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence",
Parthe Pandit, Mojtaba Sahraee-Arda…
https://t.co/8FGzbfwxV4

Memoirs:
High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence. https://t.co/uKVpiPBdMv

Sample Sizes: None.

Authors: 5

Total Words: 14504

Unique Words: 3312

Leading methods for support recovery in high-dimensional regression, such as
Lasso, have been well-studied and their limitations in the context of
correlated design have been characterized with precise incoherence conditions.
In this work, we present a similar treatment of selection consistency for
marginal regression (MR), a computationally efficient family of methods with
connections to decision trees. Selection based on marginal regression is also
referred to as covariate screening or independence screening and is a popular
approach in applied work, especially in ultra high-dimensional settings. We
identify the underlying factors---which we denote as \emph{MR
incoherence}---affecting MR's support recovery performance. Our near complete
characterization provides a much more nuanced and optimistic view of MR in
comparison to previous works. To ground our results, we provide a broad
taxonomy of results for leading feature selection methods, relating the
behavior of Lasso, OMP, SIS, and MR. We also lay the foundation for...
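
For readers unfamiliar with marginal regression, the sketch below shows one common variant of the screening step: regress the response on each covariate separately and keep the covariates with the largest absolute coefficients. The function name, the centering, and the choice to keep a fixed number `k` of covariates are assumptions of this example, not details taken from the paper.

```python
import numpy as np

def marginal_regression_select(X, y, k):
    """Keep the k covariates with the largest absolute univariate slope."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Least-squares slope of y on each column, computed one column at a time.
    coefs = (Xc.T @ yc) / (Xc ** 2).sum(axis=0)
    top = np.argsort(-np.abs(coefs))[:k]
    return np.sort(top), coefs

# Independent Gaussian design with three strong signals (illustrative only).
rng = np.random.default_rng(0)
n, p = 2000, 50
support = np.array([0, 1, 2])
beta = np.zeros(p)
beta[support] = 3.0
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

selected, coefs = marginal_regression_select(X, y, k=3)
print("selected covariates:", selected)
```

The cost is a single pass over the columns, which is why the approach scales to ultra high-dimensional settings. With correlated designs the marginal slopes can differ sharply from the joint coefficients, which is the situation the incoherence conditions address.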


StatsPapers:
On the support recovery of marginal regression. https://t.co/orjM9r3sd9

Sample Sizes: None.

Authors: 3

Total Words: 10883

Unique Words: 2361

We consider Hadamard product parametrization as a change-of-variable
(over-parametrization) technique for solving least squares problems in the
context of linear regression. Despite the non-convexity and exponentially many
saddle points induced by the change-of-variable, we show that under certain
conditions, this over-parametrization leads to implicit regularization: if we
directly apply gradient descent to the residual sum of squares with
sufficiently small initial values, then under a proper early stopping rule, the
iterates converge to a nearly sparse rate-optimal solution with relatively
better accuracy than explicit regularized approaches. In particular, the
resulting estimator does not suffer from extra bias due to explicit penalties,
and can achieve the parametric root-$n$ rate (independent of the dimension)
under proper conditions on the signal-to-noise ratio. We perform simulations to
compare our methods with high dimensional linear regression with explicit
regularizations. Our results illustrate advantages of using...
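
The idea can be sketched in a few lines: write the coefficient vector as a Hadamard product $\beta = g \circ l$, run gradient descent on the residual sum of squares from a small initialization, and stop early. The specific initialization ($g = \alpha \mathbf{1}$, $l = 0$), the step size, and the validation-based stopping rule below are assumptions chosen for illustration; the abstract only says that initial values are small and that a proper early stopping rule is used.

```python
import numpy as np

def hadamard_gd(X, y, X_val, y_val, alpha=1e-3, step=0.05, n_iter=1000):
    """Gradient descent on (1/2n)||y - X(g*l)||^2 over (g, l).
    Returns the iterate with the lowest validation error (assumed stopping rule)."""
    n, p = X.shape
    g = np.full(p, alpha)  # small initial values
    l = np.zeros(p)
    best_err, best_beta = np.inf, g * l
    for _ in range(n_iter):
        grad_beta = X.T @ (X @ (g * l) - y) / n
        # Chain rule: d/dg = grad_beta * l and d/dl = grad_beta * g.
        g, l = g - step * grad_beta * l, l - step * grad_beta * g
        beta = g * l
        val_err = np.mean((X_val @ beta - y_val) ** 2)
        if val_err < best_err:
            best_err, best_beta = val_err, beta.copy()
    return best_beta

# Sparse high-dimensional example: p > n with three nonzero coefficients.
rng = np.random.default_rng(0)
n, p = 100, 200
beta_star = np.zeros(p)
beta_star[:3] = 1.0
X = rng.normal(size=(n, p))
y = X @ beta_star + 0.5 * rng.normal(size=n)
X_val = rng.normal(size=(n, p))
y_val = X_val @ beta_star + 0.5 * rng.normal(size=n)

beta_hat = hadamard_gd(X, y, X_val, y_val)
print("estimation error:", np.linalg.norm(beta_hat - beta_star))
```

No penalty term appears in the objective. Sparsity comes from the dynamics: coordinates with strong signal grow quickly away from the small initialization, while the rest stay close to zero until late in the run.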

StatsPapers:
Implicit Regularization via Hadamard Product Over-Parametrization in High-Dimensional Linear Regression. https://t.co/ZYgkEnwIJ6

Sample Sizes: None.

Authors: 3

Total Words: 18386

Unique Words: 3475

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).
