Top 4 Arxiv Papers Today in Statistics Theory


2.017 Mikeys
#1. One-shot distributed ridge regression in high dimensions
Edgar Dobriban, Yue Sheng
In many areas, practitioners need to analyze large datasets that challenge conventional single-machine computing. To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Datasets are spread out over several computing units, which do most of the analysis locally, and communicate short messages. Here we study a fundamental and highly important problem in this area: How to do ridge regression in a distributed computing environment? Ridge regression is an extremely popular method for supervised learning, and has several optimality properties, thus it is important to study. We study one-shot methods that construct weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high dimensional random-effects model where each predictor has a small effect, we discover several new phenomena. 1. Infinite-worker limit: The distributed estimator works well for very large numbers of machines, a phenomenon we call "infinite-worker limit". 2....
more | pdf | html
Figures
Tweets
arxivml: "One-shot distributed ridge regression in high dimensions", Edgar Dobriban, Yue Sheng https://t.co/V5qfvzt3tT
arxiv_cs_LG: One-shot distributed ridge regression in high dimensions. Edgar Dobriban and Yue Sheng https://t.co/RShwYMpu5G
StatsPapers: One-shot distributed ridge regression in high dimensions. https://t.co/ccxksx3YKG
Github

Distributed ridge

Repository: dist_ridge
User: dobriban
Language: MATLAB
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 19246
Unqiue Words: 3441

2.007 Mikeys
#2. High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence
Parthe Pandit, Mojtaba Sahraee-Ardakan, Arash A. Amini, Sundeep Rangan, Alyson K. Fletcher
We consider the problem of estimating the parameters of a multivariate Bernoulli process with auto-regressive feedback in the high-dimensional setting where the number of samples available is much less than the number of parameters. This problem arises in learning interconnections of networks of dynamical systems with spiking or binary-valued data. We allow the process to depend on its past up to a lag $p$, for a general $p \ge 1$, allowing for more realistic modeling in many applications. We propose and analyze an $\ell_1$-regularized maximum likelihood estimator (MLE) under the assumption that the parameter tensor is approximately sparse. Rigorous analysis of such estimators is made challenging by the dependent and non-Gaussian nature of the process as well as the presence of the nonlinearities and multi-level feedback. We derive precise upper bounds on the mean-squared estimation error in terms of the number of samples, dimensions of the process, the lag $p$ and other key statistical properties of the model. The ideas presented...
more | pdf | html
Figures
Tweets
arxivml: "High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence", Parthe Pandit, Mojtaba Sahraee-Arda… https://t.co/8FGzbfwxV4
Memoirs: High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence. https://t.co/uKVpiPBdMv
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 14504
Unqiue Words: 3312

2.001 Mikeys
#3. On the support recovery of marginal regression
S. Jalil Kazemitabar, Arash A. Amini, Ameet Talwalkar
Leading methods for support recovery in high-dimensional regression, such as Lasso, have been well-studied and their limitations in the context of correlated design have been characterized with precise incoherence conditions. In this work, we present a similar treatment of selection consistency for marginal regression (MR), a computationally efficient family of methods with connections to decision trees. Selection based on marginal regression is also referred to as covariate screening or independence screening and is a popular approach in applied work, especially in ultra high-dimensional settings. We identify the underlying factors---which we denote as \emph{MR incoherence}---affecting MR's support recovery performance. Our near complete characterization provides a much more nuanced and optimistic view of MR in comparison to previous works. To ground our results, we provide a broad taxonomy of results for leading feature selection methods, relating the behavior of Lasso, OMP, SIS, and MR. We also lay the foundation for...
more | pdf | html
Figures
None.
Tweets
StatsPapers: On the support recovery of marginal regression. https://t.co/orjM9r3sd9
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 10883
Unqiue Words: 2361

2.001 Mikeys
#4. Implicit Regularization via Hadamard Product Over-Parametrization in High-Dimensional Linear Regression
Peng Zhao, Yun Yang, Qiao-Chu He
We consider Hadamard product parametrization as a change-of-variable (over-parametrization) technique for solving least square problems in the context of linear regression. Despite the non-convexity and exponentially many saddle points induced by the change-of-variable, we show that under certain conditions, this over-parametrization leads to implicit regularization: if we directly apply gradient descent to the residual sum of squares with sufficiently small initial values, then under proper early stopping rule, the iterates converge to a nearly sparse rate-optimal solution with relatively better accuracy than explicit regularized approaches. In particular, the resulting estimator does not suffer from extra bias due to explicit penalties, and can achieve the parametric root-$n$ rate (independent of the dimension) under proper conditions on the signal-to-noise ratio. We perform simulations to compare our methods with high dimensional linear regression with explicit regularizations. Our results illustrate advantages of using...
more | pdf | html
Figures
Tweets
StatsPapers: Implicit Regularization via Hadamard Product Over-Parametrization in High-Dimensional Linear Regression. https://t.co/ZYgkEnwIJ6
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 18386
Unqiue Words: 3475

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 100,376 papers.

Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Categories
All
Astrophysics
Cosmology and Nongalactic Astrophysics
Earth and Planetary Astrophysics
Astrophysics of Galaxies
High Energy Astrophysical Phenomena
Instrumentation and Methods for Astrophysics
Solar and Stellar Astrophysics
Condensed Matter
Disordered Systems and Neural Networks
Mesoscale and Nanoscale Physics
Materials Science
Other Condensed Matter
Quantum Gases
Soft Condensed Matter
Statistical Mechanics
Strongly Correlated Electrons
Superconductivity
Computer Science
Artificial Intelligence
Hardware Architecture
Computational Complexity
Computational Engineering, Finance, and Science
Computational Geometry
Computation and Language
Cryptography and Security
Computer Vision and Pattern Recognition
Computers and Society
Databases
Distributed, Parallel, and Cluster Computing
Digital Libraries
Discrete Mathematics
Data Structures and Algorithms
Emerging Technologies
Formal Languages and Automata Theory
General Literature
Graphics
Computer Science and Game Theory
Human-Computer Interaction
Information Retrieval
Information Theory
Machine Learning
Logic in Computer Science
Multiagent Systems
Multimedia
Mathematical Software
Numerical Analysis
Neural and Evolutionary Computing
Networking and Internet Architecture
Other Computer Science
Operating Systems
Performance
Programming Languages
Robotics
Symbolic Computation
Sound
Software Engineering
Social and Information Networks
Systems and Control
Economics
Econometrics
General Economics
Theoretical Economics
Electrical Engineering and Systems Science
Audio and Speech Processing
Image and Video Processing
Signal Processing
General Relativity and Quantum Cosmology
General Relativity and Quantum Cosmology
High Energy Physics - Experiment
High Energy Physics - Experiment
High Energy Physics - Lattice
High Energy Physics - Lattice
High Energy Physics - Phenomenology
High Energy Physics - Phenomenology
High Energy Physics - Theory
High Energy Physics - Theory
Mathematics
Commutative Algebra
Algebraic Geometry
Analysis of PDEs
Algebraic Topology
Classical Analysis and ODEs
Combinatorics
Category Theory
Complex Variables
Differential Geometry
Dynamical Systems
Functional Analysis
General Mathematics
General Topology
Group Theory
Geometric Topology
History and Overview
Information Theory
K-Theory and Homology
Logic
Metric Geometry
Mathematical Physics
Numerical Analysis
Number Theory
Operator Algebras
Optimization and Control
Probability
Quantum Algebra
Rings and Algebras
Representation Theory
Symplectic Geometry
Spectral Theory
Statistics Theory
Mathematical Physics
Mathematical Physics
Nonlinear Sciences
Adaptation and Self-Organizing Systems
Chaotic Dynamics
Cellular Automata and Lattice Gases
Pattern Formation and Solitons
Exactly Solvable and Integrable Systems
Nuclear Experiment
Nuclear Experiment
Nuclear Theory
Nuclear Theory
Physics
Accelerator Physics
Atmospheric and Oceanic Physics
Applied Physics
Atomic and Molecular Clusters
Atomic Physics
Biological Physics
Chemical Physics
Classical Physics
Computational Physics
Data Analysis, Statistics and Probability
Physics Education
Fluid Dynamics
General Physics
Geophysics
History and Philosophy of Physics
Instrumentation and Detectors
Medical Physics
Optics
Plasma Physics
Popular Physics
Physics and Society
Space Physics
Quantitative Biology
Biomolecules
Cell Behavior
Genomics
Molecular Networks
Neurons and Cognition
Other Quantitative Biology
Populations and Evolution
Quantitative Methods
Subcellular Processes
Tissues and Organs
Quantitative Finance
Computational Finance
Economics
General Finance
Mathematical Finance
Portfolio Management
Pricing of Securities
Risk Management
Statistical Finance
Trading and Market Microstructure
Quantum Physics
Quantum Physics
Statistics
Applications
Computation
Methodology
Machine Learning
Other Statistics
Statistics Theory
Feedback
Online
Stats
Tracking 100,376 papers.