Sparse linear discriminant analysis via penalized optimal scoring is a
successful tool for classification in high-dimensional settings. While the
variable selection consistency of sparse optimal scoring has been established,
the corresponding prediction and estimation consistency results have been
lacking. We bridge this gap by providing probabilistic bounds on out-of-sample
prediction error and estimation error of multi-class penalized optimal scoring
allowing for a diverging number of classes.

Many scientific studies are modeled as hierarchical procedures where the
starting point of data analysis is based on pilot samples that are employed to
determine parameters of interest. With the availability of more data, the
scientist is tasked with conducting a meta-analysis based on the augmented
data sets that combines the explorations from the pilot stage with a
confirmatory study. Casting these two-staged procedures into a conditional
framework, inference is based on a carved likelihood. Such a likelihood is
obtained by conditioning the law of the augmented data (from both stages)
upon the selection carried out on the first-stage data. In fact, conditional
inference in hierarchically modeled investigations or, equivalently, in
settings where some samples are reserved for inference, is asymptotically
equivalent to a Gaussian randomization scheme. Identifying the probabilistic
behavior of the selection event under Gaussian perturbation to be very
different from heavy tailed randomizations in Tian and Taylor (2018),...

Inspired by Kalikow-type decompositions, we introduce a new stochastic model
of infinite neuronal networks, for which we establish oracle inequalities for
Lasso methods and restricted eigenvalue properties for the associated Gram
matrix with high probability. These results hold even if the network is only
partially observed. The main argument relies on the fact that concentration
inequalities can easily be derived whenever the transition probabilities of the
underlying process admit a sparse space-time representation.

Assume that there are multiple data streams (channels, sensors) and in each
stream the process of interest produces generally dependent and non-identically
distributed observations. When the process is in a normal mode (in-control),
the (pre-change) distribution is known, but when the process becomes abnormal
there is parametric uncertainty, i.e., the post-change (out-of-control)
distribution is known only partially up to a parameter. Both the change point
and the post-change parameter are unknown. Moreover, the change affects an
unknown subset of streams, so that the number of affected streams and their
locations are unknown in advance. A good changepoint detection procedure should
detect the change as soon as possible after its occurrence while controlling
the risk of false alarms. We consider a Bayesian setup with a given prior
distribution of the change point and propose two sequential mixture-based
change detection rules: one mixes a Shiryaev-type statistic over both the
unknown subset of affected streams and the unknown...

In this work, we propose a novel method for tensorial independent component
analysis. Our approach is based on TJADE and $ k $-JADE, two recently proposed
generalizations of the classical JADE algorithm. Our novel method achieves the
consistency and the limiting distribution of TJADE under mild assumptions, and
at the same time offers notable improvement in computational speed. Detailed
mathematical proofs of the statistical properties of our method are given and,
as a special case, a conjecture on the properties of $ k $-JADE is resolved.
Simulations and timing comparisons demonstrate a remarkable gain in speed.
Moreover, the desired efficiency is obtained approximately for finite samples.
The method is applied successfully to large-scale video data, for which neither
TJADE nor $ k $-JADE is feasible.

High-dimensional hypothesis testing deals with models in which the number of
parameters is significantly larger than the sample size. The existing literature
develops a variety of individual tests. Some of them are sensitive to dense
and small disturbances, and others are sensitive to sparse and large
disturbances. Hence, the power of these tests depends on the assumed
alternative scenario. This paper provides a unified framework for developing
new tests which are adaptive to a large variety of alternative scenarios in
high dimensions. In particular, our framework includes arbitrary hypotheses
which can be tested using high dimensional $U$-statistic based vectors. Under
this framework, we first develop a broad family of tests based on a novel
variant of the $L_p$-norm with $p\in \{1,\dots,\infty\}$. We then combine these
tests to construct a data-adaptive test that is simultaneously powerful under
various alternative scenarios. To obtain the asymptotic distributions of these
tests, we utilize the multiplier bootstrap for...

This paper deals with the estimation of the unknown distribution of hidden
random variables from the observation of pairwise comparisons between these
variables. This problem is inspired by recent developments on Bradley-Terry
models in random environments, a framework that is relevant for predicting,
for instance, the outcome of a championship from the observation of a few
contests per team. This paper makes three contributions to a Bayesian
nonparametric approach to this problem. First, we establish contraction
rates of the posterior distribution. We also propose a Markov chain Monte Carlo
algorithm, inspired by a recent Bayesian nonparametric method for hidden Markov
models, to approximately sample from this posterior distribution. Finally,
the performance of this algorithm is assessed by comparing predictions of
the outcome of a championship based on the actual values of the teams with those
obtained by sampling from the estimated posterior distribution.
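
To make the observation model concrete, the following is a minimal sketch, assuming the standard Bradley-Terry link in which a team of strength $v_i$ beats a team of strength $v_j$ with probability $v_i/(v_i+v_j)$. The function names, the number of teams, and the Gamma distribution used for the hidden strengths are hypothetical choices made only for illustration; the sketch simulates the data and does not implement the posterior sampler.

```python
import random

def win_probability(v_i, v_j):
    """Bradley-Terry probability that strength v_i beats strength v_j."""
    return v_i / (v_i + v_j)

def simulate_contests(strengths, contests_per_pair, rng):
    """Return wins[i][j] = number of times team i beat team j."""
    n = len(strengths)
    wins = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = win_probability(strengths[i], strengths[j])
            for _ in range(contests_per_pair):
                if rng.random() < p:
                    wins[i][j] += 1
                else:
                    wins[j][i] += 1
    return wins

rng = random.Random(0)
# Hidden strengths: i.i.d. draws from a distribution that the statistician
# does not know (Gamma(2, 1) is an arbitrary assumption for this sketch).
strengths = [rng.gammavariate(2.0, 1.0) for _ in range(6)]
# Only the pairwise outcomes are observed, never the strengths themselves.
wins = simulate_contests(strengths, contests_per_pair=3, rng=rng)
```

Only the matrix `wins` would be available to the estimation procedure; the list `strengths` plays the role of the hidden random variables.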

Non-negative matrix factorization (NMF) is a knowledge discovery method that
is used in many fields; its variational inference and Gibbs sampling
methods are also well known. However, the accuracy of the variational
approximation has not yet been clarified, since NMF is not statistically regular
and the prior used in the variational Bayesian NMF (VBNMF) has zero or
divergence points. In this
paper, using algebraic geometrical methods, we theoretically analyze the
difference of the negative log evidence/marginal likelihood (free energy)
between VBNMF and Bayesian NMF, and give an asymptotic lower bound on the
approximation accuracy. The results quantitatively show how well the VBNMF
algorithm can approximate Bayesian NMF.

We provide an efficient algorithm for the classical problem, going back to
Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the
parameters of a multivariate normal distribution from truncated samples.
Truncated samples from a $d$-variate normal
${\cal N}(\mathbf{\mu},\mathbf{\Sigma})$ means that a sample is revealed only
if it falls in some subset $S \subseteq \mathbb{R}^d$; otherwise the sample is
hidden, and the count of hidden samples in proportion to the revealed samples
is also hidden. We show that
the mean $\mathbf{\mu}$ and covariance matrix $\mathbf{\Sigma}$ can be
estimated with arbitrary accuracy in polynomial time, as long as we have oracle
access to $S$, and $S$ has non-trivial measure under the unknown $d$-variate
normal distribution. Additionally, we show that without oracle access to $S$,
any non-trivial estimation is impossible.
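
The sampling model can be illustrated with a minimal one-dimensional sketch. It assumes a hypothetical truncation set $S = [0, \infty)$ and true parameters $\mu = 0$, $\sigma = 1$, with hypothetical function names; it shows only how truncated samples arise and why the naive sample mean is biased, not the estimation algorithm itself.

```python
import math
import random

def truncated_samples(mu, sigma, in_S, n, rng):
    """Draw from N(mu, sigma^2); reveal a draw only if the oracle accepts it."""
    revealed = []
    while len(revealed) < n:
        x = rng.gauss(mu, sigma)
        if in_S(x):  # membership oracle for the truncation set S
            revealed.append(x)
        # Rejected draws are dropped and their count is not recorded.
    return revealed

def in_S(x):
    """Hypothetical truncation set S = [0, inf), assumed for this sketch."""
    return x >= 0.0

rng = random.Random(1)
xs = truncated_samples(0.0, 1.0, in_S, 20000, rng)
naive_mean = sum(xs) / len(xs)
# For a standard normal truncated to [0, inf) the mean is sqrt(2 / pi),
# so the naive mean is far from the true mu = 0.
expected_truncated_mean = math.sqrt(2.0 / math.pi)
```

In this example `naive_mean` concentrates around `expected_truncated_mean` rather than around the true mean, which is why a dedicated estimator with access to the oracle `in_S` is needed.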

In this paper we study the theoretical properties of the simultaneous
multiscale change point estimator (SMUCE) proposed by Frick et al. (2014) in
regression models with dependent error processes. Empirical studies show that
in this case the change point estimate is inconsistent, but it is not known if
alternatives suggested in the literature for correlated data are consistent. We
propose a modification of SMUCE that scales the basic statistic by the long-run
variance of the error process, which is estimated by a difference-type variance
estimator calculated from local means from different blocks. For this
modification we prove model consistency for physical dependent error processes
and illustrate the finite sample performance by means of a simulation study.
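
A generic difference-type long-run variance estimator built from block means can be sketched as follows. This is an illustrative form under stated assumptions, not necessarily the exact estimator of the paper: the block length `m`, the function names, and the normalization $m / (2(K-1))$ are assumptions. The idea is that a mean of $m$ consecutive errors has variance roughly equal to the long-run variance divided by $m$, and that differencing neighbouring block means cancels a piecewise constant signal except in blocks containing a change point.

```python
import random

def block_means(y, m):
    """Means of consecutive non-overlapping blocks of length m."""
    k = len(y) // m  # an incomplete final block is ignored
    return [sum(y[b * m:(b + 1) * m]) / m for b in range(k)]

def long_run_variance_diff(y, m):
    """Difference-type estimate: m / (2 (K - 1)) * sum of squared differences
    of neighbouring block means, where K is the number of blocks."""
    means = block_means(y, m)
    k = len(means)
    if k < 2:
        raise ValueError("need at least two complete blocks")
    ss = sum((means[b + 1] - means[b]) ** 2 for b in range(k - 1))
    return m * ss / (2.0 * (k - 1))

rng = random.Random(2)
# Independent N(0, 1) errors, for which the long-run variance equals 1.
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
estimate = long_run_variance_diff(noise, m=20)
```

For dependent errors the block length would have to grow with the sample size so that the block means capture the autocovariances; the choice `m=20` above is arbitrary.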

