##### #1. Prediction and estimation consistency of sparse multi-class penalized optimal scoring
###### Irina Gaynanova
Sparse linear discriminant analysis via penalized optimal scoring is a successful tool for classification in high-dimensional settings. While the variable selection consistency of sparse optimal scoring has been established, the corresponding prediction and estimation consistency results have been lacking. We bridge this gap by providing probabilistic bounds on the out-of-sample prediction error and the estimation error of multi-class penalized optimal scoring, allowing for a diverging number of classes.
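For two classes the penalized optimal scoring criterion $\|Y\theta - X\beta\|^2 + \lambda\|\beta\|_1$ becomes much simpler. Under the usual normalization $D = Y^\top Y/n = \mathrm{diag}(n_1/n, n_2/n)$ (assumed here), the score vector $\theta$ is fixed up to sign by the constraints $\theta^\top D\theta = 1$ and $\theta^\top D\mathbf{1} = 0$. What remains is a lasso regression of the scored labels $Y\theta$ on the centered features.

The sketch below assumes this $K = 2$ special case and solves the lasso by proximal gradient (ISTA). The function names and the choice of ISTA are illustrative; this is not the paper's multi-class estimator.

```python
import numpy as np


def scored_labels(y):
    """Map 0/1 labels to optimal scores for K = 2 (unique up to sign).

    theta solves theta' D theta = 1 and theta' D 1 = 0 with D = diag(n1/n, n2/n).
    """
    n1, n2 = np.sum(y == 0), np.sum(y == 1)
    theta = np.array([np.sqrt(n2 / n1), -np.sqrt(n1 / n2)])
    return theta[y]  # equals Y @ theta for the n x 2 class-indicator matrix Y


def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_optimal_scoring_2class(X, y, lam, n_iter=500):
    """ISTA for (1/2n) * ||Y theta - Xc beta||^2 + lam * ||beta||_1."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # centered features; the scored labels already have mean 0
    z = scored_labels(y)
    step = 1.0 / np.linalg.eigvalsh(Xc.T @ Xc / n).max()  # 1 / Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ beta - z) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

On simulated data where only a few features differ between the classes, the penalty should keep essentially those features. In the multi-class case, the fixed $\theta$ is replaced by $K-1$ score vectors that are typically estimated jointly with $\beta$.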
##### #2. Asymptotic conditional inference via a Steining of selection probabilities
###### Snigdha Panigrahi
Many scientific studies are modeled as hierarchical procedures in which the starting point of the data analysis is a set of pilot samples used to determine the parameters of interest. As more data become available, the scientist is tasked with conducting a meta-analysis on the augmented datasets that combines the explorations from the pilot stage with a confirmatory study. Casting these two-stage procedures into a conditional framework, inference is based on a carved likelihood. Such a likelihood is obtained by conditioning the law of the augmented data (from both stages) on the selection carried out on the first-stage data. In fact, conditional inference in hierarchically modeled investigations, or equivalently in settings where some samples are reserved for inference, is asymptotically equivalent to a Gaussian randomization scheme. Identifying the probabilistic behavior of the selection event under Gaussian perturbation to be very different from heavy-tailed randomizations in Tian and Taylor (2018),...
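The equivalence with Gaussian randomization can be seen in the simplest case, a Gaussian mean. Suppose the pilot stage uses the first $n_1$ of $n$ observations with variance $\sigma^2$. Then the pilot mean decomposes as $\bar X_1 = \bar X + W$, where $W \sim N(0, \sigma^2(1/n_1 - 1/n))$ is independent of the full-data mean $\bar X$. Selecting on $\bar X_1$ is therefore selecting on $\bar X$ perturbed by Gaussian noise. This holds exactly for Gaussian data and asymptotically otherwise by the central limit theorem.

The simulation below checks both moments empirically. The choices of $n$, $n_1$, $\sigma$ and the function name are arbitrary.

```python
import random


def carving_decomposition(n=40, n1=10, reps=20000, sigma=1.0, seed=0):
    """Empirical Cov(full mean, W) and Var(W), where W = pilot mean - full mean."""
    rng = random.Random(seed)
    full_means, noises = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        pilot_mean = sum(x[:n1]) / n1  # statistic the selection step looks at
        full_mean = sum(x) / n         # statistic used for inference on augmented data
        full_means.append(full_mean)
        noises.append(pilot_mean - full_mean)  # the implicit Gaussian "randomization"
    mf = sum(full_means) / reps
    mw = sum(noises) / reps
    cov = sum((f - mf) * (w - mw) for f, w in zip(full_means, noises)) / (reps - 1)
    var_w = sum((w - mw) ** 2 for w in noises) / (reps - 1)
    return cov, var_w
```

With the defaults, theory gives $\operatorname{Cov}(\bar X, W) = 0$ and $\operatorname{Var}(W) = 1/10 - 1/40 = 0.075$.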
##### #3. Sparse space-time models: Concentration Inequalities and Lasso
###### Guilherme Ost, Patricia Reynaud-Bouret
Inspired by Kalikow-type decompositions, we introduce a new stochastic model of infinite neuronal networks, for which we establish, with high probability, oracle inequalities for Lasso methods and restricted eigenvalue properties for the associated Gram matrix. These results hold even if the network is only partially observed. The main argument relies on the fact that concentration inequalities can easily be derived whenever the transition probabilities of the underlying process admit a sparse space-time representation.
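A toy version of the estimation problem: regress the spiking indicator of one neuron on the previous states of all observed neurons. The Lasso should single out the few neurons that actually enter its transition probability.

The linear transition probability, network size, parents and penalty level below are hypothetical choices. The Lasso is solved by cyclic coordinate descent. Nothing here reproduces the paper's Kalikow-type model or its oracle inequalities.

```python
import numpy as np


def simulate_network(T=3000, n_neurons=20, parents=(3, 7), seed=0):
    """Hypothetical toy network: neuron 0 is driven by the previous state of `parents`."""
    rng = np.random.default_rng(seed)
    X = (rng.random((T, n_neurons)) < 0.3).astype(float)  # other neurons: Bernoulli(0.3)
    idx = list(parents)
    for t in range(1, T):
        prob = 0.1 + 0.4 * X[t - 1, idx].sum()  # sparse linear transition probability
        X[t, 0] = float(rng.random() < prob)
    return X


def lasso_cd(A, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/2n) * ||yc - Ac beta||^2 + lam * ||beta||_1."""
    n, p = A.shape
    Ac = A - A.mean(axis=0)
    r = y - y.mean()  # residual for beta = 0
    col_sq = (Ac ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of column j with the partial residual that excludes coordinate j
            rho = Ac[:, j] @ r / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= Ac[:, j] * (new - beta[j])  # keep the residual in sync
            beta[j] = new
    return beta
```

Usage: `lasso_cd(X[:-1], X[1:, 0], lam=0.02)` regresses neuron 0 at time $t$ on all neurons at time $t-1$. The largest coefficients should then sit on the two simulated parents.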
##### #4. Asymptotically Optimal Quickest Change Detection In Multistream Data - Part 1: General Stochastic Models
###### Alexander Tartakovsky
Assume that there are multiple data streams (channels, sensors) and that in each stream the process of interest produces generally dependent and non-identically distributed observations. When the process is in a normal mode (in control), the (pre-change) distribution is known, but when the process becomes abnormal there is parametric uncertainty, i.e., the post-change (out-of-control) distribution is known only up to a parameter. Both the change point and the post-change parameter are unknown. Moreover, the change affects an unknown subset of streams, so the number of affected streams and their locations are unknown in advance. A good change point detection procedure should detect the change as soon as possible after its occurrence while controlling the risk of false alarms. We consider a Bayesian setup with a given prior distribution of the change point and propose two sequential mixture-based change detection rules, one of which mixes a Shiryaev-type statistic over both the unknown subset of affected streams and the unknown...
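The mixture over affected subsets becomes tractable under an assumption made only for this sketch: each stream is affected independently with a hypothetical prior probability $p$. The sum over subsets of products of per-stream likelihood ratios then factorizes as $\prod_i (1 - p + p\,\Lambda_i)$.

The sketch below also makes three simplifications:
- it uses the Shiryaev-Roberts statistic (the flat-prior limit of Shiryaev's statistic) instead of a Shiryaev statistic with a specific change-point prior;
- it assumes independent Gaussian streams;
- it assumes a known post-change mean $\theta$.

It therefore illustrates only the mixing over affected streams, not the authors' rules.

```python
import math
import random


def mixture_sr_alarm(data, theta, p, threshold):
    """First time a subset-mixture Shiryaev-Roberts statistic reaches `threshold`.

    data[i][t] is observation t of stream i: N(0, 1) before the change and
    N(theta, 1) after it in each affected stream. Each stream is assumed to be
    affected independently with prior probability p (hypothetical prior).
    """
    n_streams, horizon = len(data), len(data[0])
    cum = [[0.0] for _ in range(n_streams)]  # cum[i][t]: summed log-LRs of stream i up to t
    for n in range(1, horizon + 1):
        for i in range(n_streams):
            x = data[i][n - 1]
            cum[i].append(cum[i][-1] + theta * x - theta ** 2 / 2.0)
        stat = 0.0
        for k in range(1, n + 1):  # sum over candidate change points k <= n
            term = 1.0
            for i in range(n_streams):
                # mixing over subsets factorizes: prod_i (1 - p + p * LR_i(k, n))
                term *= (1.0 - p) + p * math.exp(cum[i][n] - cum[i][k - 1])
            stat += term
        if stat >= threshold:
            return n
    return None


def simulate_streams(n_streams=5, horizon=200, change_at=50, affected=(0, 2), shift=1.0, seed=42):
    """Independent N(0, 1) streams; the `affected` ones shift to N(shift, 1) from index change_at."""
    rng = random.Random(seed)
    return [[rng.gauss(shift if (i in affected and t >= change_at) else 0.0, 1.0)
             for t in range(horizon)] for i in range(n_streams)]
```

In this setup $R_n - n$ is a martingale before the change. For threshold $A$, the probability of a false alarm within the first $m$ observations is therefore at most $m/A$, which gives one way to choose the threshold.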
##### #5. Asymptotically and computationally efficient tensorial JADE
###### Joni Virta, Niko Lietzén, Pauliina Ilmonen, Klaus Nordhausen
In this work, we propose a novel method for tensorial independent component analysis. Our approach is based on TJADE and $k$-JADE, two recently proposed generalizations of the classical JADE algorithm. Under mild assumptions, our method achieves the same consistency and limiting distribution as TJADE, while offering a notable improvement in computational speed. Detailed mathematical proofs of the statistical properties of our method are given and, as a special case, a conjecture on the properties of $k$-JADE is resolved. Simulations and timing comparisons demonstrate a remarkable gain in speed. Moreover, the desired efficiency is approximately attained in finite samples. The method is applied successfully to large-scale video data, for which neither TJADE nor $k$-JADE is feasible.
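Tensorial ICA is usually formulated, and is assumed here, as $X = S \times_1 \Omega_1 \times_2 \Omega_2 \cdots \times_r \Omega_r$. Here $\times_k$ is the mode-$k$ product and each mode has its own mixing matrix $\Omega_k$.

The sketch below implements only this building block, using 0-based modes. It does not implement TJADE or $k$-JADE, and the function names are illustrative.

```python
import numpy as np


def mode_product(T, M, k):
    """Mode-k product T x_k M (0-based k): multiplies every mode-k fiber of T by M."""
    # tensordot contracts M's columns with axis k of T and puts the new axis first
    return np.moveaxis(np.tensordot(M, T, axes=(1, k)), 0, k)


def mix(S, matrices):
    """Apply one matrix per mode: S x_1 M_1 x_2 M_2 ... x_r M_r."""
    X = S
    for k, M in enumerate(matrices):
        X = mode_product(X, M, k)
    return X
```

For $r = 2$ the model reduces to $X = \Omega_1 S \Omega_2^\top$. Applying the inverse matrices mode by mode recovers $S$, since products along different modes commute. Estimating the unmixing matrices from data is the task of the JADE-type methods discussed in the abstract.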
##### #6. A Unified Framework for Testing High Dimensional Parameters: A Data-Adaptive Approach
###### Cheng Zhou, Xinsheng Zhang, Wenxin Zhou, Han Liu
High-dimensional hypothesis testing deals with models in which the number of parameters is significantly larger than the sample size. The existing literature develops a variety of individual tests: some are sensitive to dense and small disturbances, while others are sensitive to sparse and large ones. Hence, the power of these tests depends on the assumed alternative scenario. This paper provides a unified framework for developing new tests that are adaptive to a large variety of alternative scenarios in high dimensions. In particular, our framework includes arbitrary hypotheses that can be tested using high-dimensional $U$-statistic-based vectors. Under this framework, we first develop a broad family of tests based on a novel variant of the $L_p$-norm with $p\in \{1,\dots,\infty\}$. We then combine these tests to construct a data-adaptive test that is simultaneously powerful under various alternative scenarios. To obtain the asymptotic distributions of these tests, we utilize the multiplier bootstrap for...
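A minimal version of the adaptive idea uses the simplest hypothesis, $H_0: E[X] = 0$. It proceeds in three steps:
1. compute one $L_p$-type statistic per $p$;
2. turn each statistic into a multiplier-bootstrap p-value;
3. calibrate the minimum p-value with the same bootstrap draws.

Several choices here are assumptions of this sketch: the plain sums $\sum_j |\bar X_j|^p$ (with $\max_j |\bar X_j|$ for $p = \infty$), the particular set of $p$ values, and the min-p combination rule. The paper's statistics are a novel variant of the $L_p$-norm applied to $U$-statistic-based vectors.

```python
import numpy as np


def adaptive_lp_test(X, ps=(1, 2, 3, np.inf), n_boot=500, seed=0):
    """Min-p combination of L_p-type tests of H0: E[X] = 0, calibrated by a multiplier bootstrap."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)

    def lp_stats(v):
        # One statistic per p: sum |v_j|^p for finite p, max |v_j| for p = inf.
        return np.stack([np.max(np.abs(v), axis=-1) if np.isinf(p)
                         else np.sum(np.abs(v) ** p, axis=-1) for p in ps], axis=-1)

    obs = lp_stats(X.mean(axis=0))           # shape (len(ps),)
    e = rng.standard_normal((n_boot, n))     # Gaussian multipliers
    boot = lp_stats(e @ Xc / n)              # shape (n_boot, len(ps))
    p_obs = (1 + (boot >= obs).sum(axis=0)) / (n_boot + 1)
    # p-value of each bootstrap draw within the bootstrap sample, separately for each p
    p_boot = (boot[None, :, :] >= boot[:, None, :]).mean(axis=1)
    # Reject for a small minimum p-value; calibrate it with the same bootstrap draws.
    return (1 + np.sum(p_boot.min(axis=1) <= p_obs.min())) / (n_boot + 1)
```

Under a sparse alternative the $p = \infty$ component tends to drive the minimum, while under a dense alternative the small-$p$ components tend to. That is the kind of adaptivity the abstract describes.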
##### #7. A Bayesian nonparametric approach for generalized Bradley-Terry models in random environment
###### Sylvain Le Corff, Matthieu Lerasle, Elodie Vernet
This paper deals with the estimation of the unknown distribution of hidden random variables from observed pairwise comparisons between these variables. The problem is inspired by recent developments on Bradley-Terry models in random environment, since this framework is relevant for predicting, for instance, the outcome of a championship from the observation of a few contests per team. This paper makes three contributions to a Bayesian nonparametric approach to this problem. First, we establish contraction rates of the posterior distribution. Second, we propose a Markov chain Monte Carlo algorithm to approximately sample from this posterior distribution, inspired by a recent Bayesian nonparametric method for hidden Markov models. Finally, the performance of this algorithm is assessed by comparing predictions of the outcome of a championship based on the actual values of the teams with those obtained by sampling from the estimated posterior distribution.
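The sketch below shows how team values translate into championship predictions. It assumes the classical Bradley-Terry rule, under which team $i$ beats team $j$ with probability $V_i/(V_i+V_j)$. It also assumes a single round-robin in which the team with the most wins takes the title, with ties broken at random.

Both the format and the function name are hypothetical. The paper's generalized model in random environment and its posterior sampler are not reproduced here.

```python
import random


def championship_win_probs(strengths, n_sims=2000, seed=0):
    """Monte Carlo title probabilities for a single round-robin under Bradley-Terry."""
    rng = random.Random(seed)
    k = len(strengths)
    titles = [0] * k
    for _ in range(n_sims):
        wins = [0] * k
        for i in range(k):
            for j in range(i + 1, k):
                # Bradley-Terry: i beats j with probability V_i / (V_i + V_j)
                if rng.random() < strengths[i] / (strengths[i] + strengths[j]):
                    wins[i] += 1
                else:
                    wins[j] += 1
        best = max(wins)
        leaders = [t for t in range(k) if wins[t] == best]
        titles[rng.choice(leaders)] += 1  # break ties uniformly at random
    return [c / n_sims for c in titles]
```

Feeding in the actual team values gives the reference prediction. Averaging the output over value vectors drawn from an estimated posterior gives the kind of posterior predictive comparison the abstract describes.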
##### #8. Variational Approximation Accuracy in Bayesian Non-negative Matrix Factorization
###### Naoki Hayashi
Non-negative matrix factorization (NMF) is a knowledge discovery method used in many fields, and its variational inference and Gibbs sampling methods are well known. However, the accuracy of the variational approximation has not yet been clarified, since NMF is not statistically regular and the prior used in variational Bayesian NMF (VBNMF) has zero or divergence points. In this paper, using algebraic geometric methods, we theoretically analyze the difference in the negative log evidence, i.e., the negative log marginal likelihood (free energy), between VBNMF and Bayesian NMF, and give an asymptotic lower bound on the approximation accuracy. The results show quantitatively how well the VBNMF algorithm can approximate Bayesian NMF.
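For orientation, the factorization itself can be computed without any Bayesian machinery. The sketch below uses the Lee-Seung multiplicative updates for the Frobenius loss, a standard point-estimate algorithm swapped in here for illustration. It is neither the VBNMF algorithm nor the Gibbs sampler whose accuracy the paper analyzes. The small `eps` guard is an implementation convenience.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for min ||V - W H||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))  # Frobenius reconstruction error
    return W, H, errors
```

The Bayesian treatment places priors on `W` and `H`, and the paper's question is how far the variational free energy departs from the exact Bayesian one.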
##### #9. Efficient Statistics, in High Dimensions, from Truncated Samples
###### Constantinos Daskalakis, Themis Gouleakis, Christos Tzamos, Manolis Zampetakis
We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating with arbitrary accuracy the parameters of a multivariate normal distribution from truncated samples. Observing truncated samples from a $d$-variate normal ${\cal N}(\mathbf{\mu},\mathbf{\Sigma})$ means that a sample is revealed only if it falls in some subset $S \subseteq \mathbb{R}^d$; otherwise the sample is hidden, and the proportion of hidden to revealed samples is also unknown. We show that the mean $\mathbf{\mu}$ and covariance matrix $\mathbf{\Sigma}$ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to $S$ and $S$ has non-trivial measure under the unknown $d$-variate normal distribution. Additionally, we show that without oracle access to $S$, any non-trivial estimation is impossible.
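The role of the membership oracle can be seen in one dimension. For $N(\mu, 1)$ truncated to $S$, the log-likelihood of one observation is $\log\phi(x-\mu) - \log P_\mu(S)$. Its derivative in $\mu$ is $x - E_\mu[Z \mid Z \in S]$. A sample $z$ from the current truncated model, drawn by rejection sampling with the oracle, therefore gives an unbiased stochastic gradient $x - z$.

The sketch below runs averaged stochastic gradient ascent with this estimate. It assumes unit variance, a hypothetical $S = [0, \infty)$ and a constant step size. It is a one-dimensional illustration, not the authors' $d$-dimensional algorithm.

```python
import random


def in_S(x):
    """Membership oracle for the hypothetical truncation set S = [0, inf)."""
    return x >= 0.0


def sample_truncated(rng, mu, oracle):
    """Rejection sampling from N(mu, 1) restricted to S, using only the oracle."""
    while True:
        z = rng.gauss(mu, 1.0)
        if oracle(z):
            return z


def sgd_truncated_mean(data, oracle, steps=20000, lr=0.05, seed=0):
    """Averaged SGD on the truncated-normal log-likelihood of the mean (unit variance)."""
    rng = random.Random(seed)
    mu = sum(data) / len(data)  # start at the naive mean, which is biased by truncation
    running_sum = 0.0
    for _ in range(steps):
        x = rng.choice(data)                   # one observed (revealed) sample
        z = sample_truncated(rng, mu, oracle)  # one sample from the current model on S
        mu += lr * (x - z)                     # unbiased estimate of d loglik / d mu
        running_sum += mu
    return running_sum / steps                 # Polyak averaging of the iterates
```

With samples from $N(0.5, 1)$ truncated to $[0, \infty)$, the naive mean is close to $E[Z \mid Z \ge 0] \approx 1.01$. The oracle-based estimate should move back toward $0.5$.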
##### #10. Multiscale change point detection for dependent data
###### Holger Dette, Theresa Schüler, Mathias Vetter
In this paper we study the theoretical properties of the simultaneous multiscale change point estimator (SMUCE) proposed by Frick et al. (2014) in regression models with dependent error processes. Empirical studies show that in this case the change point estimate is inconsistent, but it is not known whether alternatives suggested in the literature for correlated data are consistent. We propose a modification of SMUCE that scales the basic statistic by the long-run variance of the error process, which is estimated by a difference-type variance estimator computed from local means of different blocks. For this modification we prove model consistency for physically dependent error processes and illustrate the finite-sample performance by means of a simulation study.
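The abstract describes the variance estimator only as difference-type and computed from local block means. One construction of that kind, assumed here and not necessarily the authors' exact estimator, averages squared differences of consecutive block means. For block length $m$ and $k$ blocks, $\operatorname{Var}(\bar Y_j) \approx \sigma^2_{\mathrm{LR}}/m$, which suggests

$$\hat\sigma^2_{\mathrm{LR}} = \frac{m}{2(k-1)}\sum_{j=1}^{k-1}(\bar Y_{j+1} - \bar Y_j)^2 .$$

Differencing removes piecewise-constant mean shifts, except in the few differences between blocks on either side of a change point. The function names below are illustrative.

```python
import random


def difference_lrv(y, block_len):
    """Difference-type long-run variance estimate from means of consecutive blocks."""
    k = len(y) // block_len
    means = [sum(y[j * block_len:(j + 1) * block_len]) / block_len for j in range(k)]
    diffs = [(means[j + 1] - means[j]) ** 2 for j in range(k - 1)]
    # Var(block mean) is about LRV / block_len, and differencing two blocks doubles it.
    return block_len * sum(diffs) / (2 * (k - 1))


def ar1_with_jump(n=50000, phi=0.5, jump_at=25000, jump=2.0, seed=0):
    """Piecewise-constant signal with one jump plus AR(1) errors (unit innovation variance)."""
    rng = random.Random(seed)
    e, out = 0.0, []
    for t in range(n):
        e = phi * e + rng.gauss(0.0, 1.0)
        out.append((jump if t >= jump_at else 0.0) + e)
    return out
```

For AR(1) errors with $\phi = 0.5$ and unit innovation variance, the long-run variance is $1/(1-\phi)^2 = 4$, while the marginal variance is only $1/(1-\phi^2) \approx 1.33$. Scaling by the marginal variance would therefore understate how much local means fluctuate.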
