Top 10 Arxiv Papers Today in Statistics Theory


0.0 Mikeys
#1. Prediction and estimation consistency of sparse multi-class penalized optimal scoring
Irina Gaynanova
Sparse linear discriminant analysis via penalized optimal scoring is a successful tool for classification in high-dimensional settings. While the variable selection consistency of sparse optimal scoring has been established, the corresponding prediction and estimation consistency results have been lacking. We bridge this gap by providing probabilistic bounds on the out-of-sample prediction error and estimation error of multi-class penalized optimal scoring, allowing for a diverging number of classes.
Figures
None.
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 9846
Unique Words: 1923

0.0 Mikeys
#2. Asymptotic conditional inference via a Steining of selection probabilities
Snigdha Panigrahi
Many scientific studies are modeled as hierarchical procedures where the starting point of data analysis is based on pilot samples that are employed to determine parameters of interest. With the availability of more data, the scientist is tasked with conducting a meta-analysis based on the augmented data sets that combines the explorations from the pilot stage with a confirmatory study. Casting these two-stage procedures into a conditional framework, inference is based on a carved likelihood. Such a likelihood is obtained by conditioning the law of the augmented data (from both stages) upon the selection carried out on the first-stage data. In fact, conditional inference in hierarchically modeled investigations or, equivalently, in settings where some samples are reserved for inference, is asymptotically equivalent to a Gaussian randomization scheme. Identifying the probabilistic behavior of the selection event under Gaussian perturbation to be very different from heavy tailed randomizations in Tian and Taylor (2018),...
Figures
None.
Tweets
mathSTb: Snigdha Panigrahi : Asymptotic conditional inference via a Steining of selection probabilities https://t.co/kyi0uZbUxn https://t.co/Gmi4Nz7rYP
StatsPapers: Asymptotic conditional inference via a Steining of selection probabilities. https://t.co/XJHTgKI5mM
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 15275
Unique Words: 2437

0.0 Mikeys
#3. Sparse space-time models: Concentration Inequalities and Lasso
Guilherme Ost, Patricia Reynaud-Bouret
Inspired by Kalikow-type decompositions, we introduce a new stochastic model of infinite neuronal networks, for which we establish oracle inequalities for Lasso methods and restricted eigenvalue properties for the associated Gram matrix with high probability. These results hold even if the network is only partially observed. The main argument relies on the fact that concentration inequalities can easily be derived whenever the transition probabilities of the underlying process admit a sparse space-time representation.
Figures
None.
Github
None.
Youtube
None.
Other stats
Sample Sizes : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Authors: 2
Total Words: 13947
Unique Words: 2721

0.0 Mikeys
#4. Asymptotically Optimal Quickest Change Detection In Multistream Data - Part 1: General Stochastic Models
Alexander Tartakovsky
Assume that there are multiple data streams (channels, sensors) and in each stream the process of interest produces generally dependent and non-identically distributed observations. When the process is in a normal mode (in-control), the (pre-change) distribution is known, but when the process becomes abnormal there is parametric uncertainty, i.e., the post-change (out-of-control) distribution is known only partially, up to a parameter. Both the change point and the post-change parameter are unknown. Moreover, the change affects an unknown subset of streams, so that the number of affected streams and their location are unknown in advance. A good changepoint detection procedure should detect the change as soon as possible after its occurrence while controlling the risk of false alarms. We consider a Bayesian setup with a given prior distribution of the change point and propose two sequential mixture-based change detection rules: one mixes a Shiryaev-type statistic over both the unknown subset of affected streams and the unknown...
Figures
None.
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 15087
Unique Words: 2318

0.0 Mikeys
#5. Asymptotically and computationally efficient tensorial JADE
Joni Virta, Niko Lietzén, Pauliina Ilmonen, Klaus Nordhausen
In this work, we propose a novel method for tensorial independent component analysis. Our approach is based on TJADE and $k$-JADE, two recently proposed generalizations of the classical JADE algorithm. Our novel method achieves the consistency and the limiting distribution of TJADE under mild assumptions, and at the same time offers notable improvement in computational speed. Detailed mathematical proofs of the statistical properties of our method are given and, as a special case, a conjecture on the properties of $k$-JADE is resolved. Simulations and timing comparisons demonstrate remarkable gain in speed. Moreover, the desired efficiency is obtained approximately for finite samples. The method is applied successfully to large-scale video data, for which neither TJADE nor $k$-JADE is feasible.
Tweets
StatsPapers: Asymptotically and computationally efficient tensorial JADE. https://t.co/ey0PNZsfcu
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 15220
Unique Words: 3054

0.0 Mikeys
#6. A Unified Framework for Testing High Dimensional Parameters: A Data-Adaptive Approach
Cheng Zhou, Xinsheng Zhang, Wenxin Zhou, Han Liu
High-dimensional hypothesis testing deals with models in which the number of parameters is significantly larger than the sample size. The existing literature develops a variety of individual tests. Some of them are sensitive to dense and small disturbances, and others are sensitive to sparse and large disturbances. Hence, the power of these tests depends on the assumed alternative scenario. This paper provides a unified framework for developing new tests which are adaptive to a large variety of alternative scenarios in high dimensions. In particular, our framework includes arbitrary hypotheses which can be tested using high dimensional $U$-statistic based vectors. Under this framework, we first develop a broad family of tests based on a novel variant of the $L_p$-norm with $p\in \{1,\dots,\infty\}$. We then combine these tests to construct a data-adaptive test that is simultaneously powerful under various alternative scenarios. To obtain the asymptotic distributions of these tests, we utilize the multiplier bootstrap for...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 29122
Unique Words: 4672

0.0 Mikeys
#7. A Bayesian nonparametric approach for generalized Bradley-Terry models in random environment
Sylvain Le Corff, Matthieu Lerasle, Elodie Vernet
This paper deals with the estimation of the unknown distribution of hidden random variables from the observation of pairwise comparisons between these variables. This problem is inspired by recent developments on Bradley-Terry models in random environment, since this framework happens to be relevant for predicting, for instance, the outcome of a championship from the observation of a few contests per team. This paper provides three contributions on a Bayesian nonparametric approach to solve this problem. First, we establish contraction rates of the posterior distribution. We also propose a Markov Chain Monte Carlo algorithm to approximately sample from this posterior distribution, inspired by a recent Bayesian nonparametric method for hidden Markov models. Finally, the performance of this algorithm is assessed by comparing predictions of the outcome of a championship based on the actual values of the teams with those obtained by sampling from the estimated posterior distribution.
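As a rough illustration of the setting (not the paper's estimator or its MCMC algorithm), the classical Bradley-Terry model assigns team i a positive strength and lets i beat j with probability strength_i / (strength_i + strength_j). The sketch below draws hidden strengths at random and simulates one contest per pair. The uniform strength distribution, the function names, and the single round-robin design are all assumptions made for this example; the generalized model in the paper may differ.

```python
import random


def bt_win_probability(strength_i: float, strength_j: float) -> float:
    """Probability that i beats j under the classical Bradley-Terry model."""
    if strength_i <= 0 or strength_j <= 0:
        raise ValueError("strengths must be positive")
    return strength_i / (strength_i + strength_j)


def simulate_round_robin(strengths, rng):
    """Play every pair once and return the number of wins per team."""
    wins = [0] * len(strengths)
    for i in range(len(strengths)):
        for j in range(i + 1, len(strengths)):
            if rng.random() < bt_win_probability(strengths[i], strengths[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    return wins


rng = random.Random(0)
# Hidden strengths drawn i.i.d. from an arbitrary (assumed) distribution;
# in the paper's problem this distribution is the unknown to be estimated.
strengths = [rng.uniform(0.5, 2.0) for _ in range(6)]
wins = simulate_round_robin(strengths, rng)
print(wins)
```

Only the win counts would be observed in practice; the strengths and their distribution stay hidden, which is what makes the estimation problem nontrivial.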
Figures
None.
Tweets
StatsPapers: A Bayesian nonparametric approach for generalized Bradley-Terry models in random environment. https://t.co/RuKlCEyh9r
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 13219
Unique Words: 2497

0.0 Mikeys
#8. Variational Approximation Accuracy in Bayesian Non-negative Matrix Factorization
Naoki Hayashi
Non-negative matrix factorization (NMF) is a knowledge discovery method that is used in many fields; its variational inference and Gibbs sampling methods are also well known. However, the variational approximation accuracy has not yet been clarified, since NMF is not statistically regular and the prior used in the variational Bayesian NMF (VBNMF) has zero or divergence points. In this paper, using algebraic geometrical methods, we theoretically analyze the difference of the negative log evidence/marginal likelihood (free energy) between VBNMF and Bayesian NMF, and give an asymptotic lower bound of the approximation accuracy. The results quantitatively show how well the VBNMF algorithm can approximate Bayesian NMF.
Figures
None.
Tweets
arxivml: "Variational Approximation Accuracy in Bayesian Non-negative Matrix Factorization", Naoki Hayashi https://t.co/LwJXbS0uGG
mathSTb: Naoki Hayashi : Variational Approximation Accuracy in Bayesian Non-negative Matrix Factorization https://t.co/RiSHziS4UV
Memoirs: Variational Approximation Accuracy in Bayesian Non-negative Matrix Factorization. https://t.co/gRIKrqhzMS
nhayashi1994: RT @mathSTb: Naoki Hayashi : Variational Approximation Accuracy in Bayesian Non-negative Matrix Factorization https://t.co/RiSHziS4UV
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 5665
Unique Words: 1344

0.0 Mikeys
#9. Efficient Statistics, in High Dimensions, from Truncated Samples
Constantinos Daskalakis, Themis Gouleakis, Christos Tzamos, Manolis Zampetakis
We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a $d$-variate normal ${\cal N}(\mathbf{\mu},\mathbf{\Sigma})$ means that a sample is revealed only if it falls in some subset $S \subseteq \mathbb{R}^d$; otherwise the samples are hidden and their count in proportion to the revealed samples is also hidden. We show that the mean $\mathbf{\mu}$ and covariance matrix $\mathbf{\Sigma}$ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to $S$, and $S$ has non-trivial measure under the unknown $d$-variate normal distribution. Additionally, we show that without oracle access to $S$, any non-trivial estimation is impossible.
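To make the sampling model concrete, here is a minimal one-dimensional sketch. It is not the paper's algorithm: it only simulates truncated samples through a membership oracle and shows why the naive sample mean is biased. The set S = (0, ∞), the function names, and the sample size are assumptions chosen for illustration.

```python
import random


def draw_truncated(mu, sigma, in_set, n, rng):
    """Return n draws from N(mu, sigma^2) that fall in S.

    Draws outside S are discarded without being counted, mirroring the
    setting where hidden samples and their proportion are unobserved.
    """
    kept = []
    while len(kept) < n:
        x = rng.gauss(mu, sigma)
        if in_set(x):  # membership oracle for S
            kept.append(x)
    return kept


def in_positive_half_line(x):
    """Hypothetical truncation set S = (0, infinity)."""
    return x > 0.0


rng = random.Random(1)
samples = draw_truncated(0.0, 1.0, in_positive_half_line, 5000, rng)
naive_mean = sum(samples) / len(samples)
print(naive_mean)  # noticeably above the true mean 0.0
```

For a standard normal truncated to the positive half-line, the mean of the revealed samples is sqrt(2/pi), roughly 0.80, rather than 0, so averaging the revealed samples does not recover the underlying mean. Correcting for this kind of bias in high dimensions is the problem the abstract describes.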
Tweets
Memoirs: Efficient Statistics, in High Dimensions, from Truncated Samples. https://t.co/UvT9bSNL4O
Dinesh1Bhandari: RT @Memoirs: Efficient Statistics, in High Dimensions, from Truncated Samples. https://t.co/UvT9bSNL4O
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 10282
Unique Words: 2119

0.0 Mikeys
#10. Multiscale change point detection for dependent data
Holger Dette, Theresa Schüler, Mathias Vetter
In this paper we study the theoretical properties of the simultaneous multiscale change point estimator (SMUCE) proposed by Frick et al. (2014) in regression models with dependent error processes. Empirical studies show that in this case the change point estimate is inconsistent, but it is not known if alternatives suggested in the literature for correlated data are consistent. We propose a modification of SMUCE scaling the basic statistic by the long run variance of the error process, which is estimated by a difference-type variance estimator calculated from local means from different blocks. For this modification we prove model consistency for physical dependent error processes and illustrate the finite sample performance by means of a simulation study.
Tweets
MathPaper: Multiscale change point detection for dependent data. https://t.co/8UeZI5BPaf
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 7974
Unique Words: 1939

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 72,893 papers.
