#1. Minimax Rates in Network Analysis: Graphon Estimation, Community Detection and Hypothesis Testing
Chao Gao, Zongming Ma
This paper surveys some recent developments in fundamental limits and optimal algorithms for network analysis. We focus on minimax optimal rates in three fundamental problems of network analysis: graphon estimation, community detection, and hypothesis testing. For each problem, we review state-of-the-art results in the literature followed by general principles behind the optimal procedures that lead to minimax estimation and testing. This allows us to connect problems in network analysis to other statistical inference problems from a general perspective.
#2. The autoregression bootstrap for kernel estimates of smooth nonlinear functional time series
Johannes T. N. Krebs, Jürgen E. Franke
Functional time series have become an integral part of both functional data and time series analysis. This paper deals with the functional autoregressive model of order 1 and the autoregression bootstrap for smooth functions. The regression operator is estimated in the framework developed by Ferraty and Vieu [2004] and Ferraty et al. [2007], which is here extended to the double functional case under an assumption of stationary ergodic data that dates back to Laib and Louani [2010]. The main result of this article is the characterization of the asymptotic consistency of the bootstrapped regression operator.
#3. State-dependent jump activity estimation for Markovian semimartingales
Fabian Mies
The jump behavior of an infinitely active Itô semimartingale can be conveniently characterized by a jump activity index of Blumenthal-Getoor type, typically assumed to be constant in time. We study Markovian semimartingales with a non-constant, state-dependent jump activity index and a non-vanishing continuous diffusion component. Nonparametric estimators for the functional jump activity index as well as for the drift function are proposed and shown to be asymptotically normal under combined high-frequency and long-time-span asymptotics. The results are based on a novel uniform bound on the Markov generator of the jump diffusion.
#4. On a minimum distance procedure for threshold selection in tail analysis
Holger Drees, Anja Janßen, Sidney I. Resnick, Tiandong Wang
Power-law distributions have been widely observed in different areas of scientific research. Practical estimation issues include how to select a threshold above which observations follow a power-law distribution and then how to estimate the power-law tail index. A minimum distance selection procedure (MDSP) is proposed in Clauset et al. (2009) and has been widely adopted in practice, especially in the analyses of social networks. However, theoretical justifications for this selection procedure remain scant. In this paper, we study the asymptotic behavior of the selected threshold and the corresponding power-law index given by the MDSP. We find that the MDSP tends to choose too high a threshold level and leads to Hill estimates with large variances and root mean squared errors for simulated data with Pareto-like tails.
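To make the selection rule concrete, the following is a minimal sketch of a minimum distance threshold selection of the kind described above: for each candidate number k of upper order statistics, the tail index is fitted with the Hill estimator and the Kolmogorov-Smirnov distance between the empirical and fitted Pareto distributions of the exceedance ratios is computed; the k with the smallest distance is selected. The function names (`hill_estimate`, `ks_distance`, `mdsp`) and the `k_min` parameter are illustrative assumptions, not taken from the paper, and the sketch assumes positive data without ties.

```python
import math
import random


def hill_estimate(desc, k):
    """Hill estimate of 1/alpha from the k largest values.

    `desc` must be sorted in descending order; desc[k] acts as the threshold.
    """
    threshold = desc[k]
    return sum(math.log(desc[i] / threshold) for i in range(k)) / k


def ks_distance(desc, k):
    """KS distance between the empirical CDF of the exceedance ratios
    desc[i] / desc[k] (i < k) and the Pareto CDF fitted via the Hill estimate."""
    alpha = 1.0 / hill_estimate(desc, k)  # assumes no ties, so the estimate is > 0
    threshold = desc[k]
    # Ratios in ascending order, all >= 1.
    ratios = [desc[i] / threshold for i in range(k - 1, -1, -1)]
    d = 0.0
    for j, y in enumerate(ratios):
        fitted_cdf = 1.0 - y ** (-alpha)
        # The empirical CDF jumps from j/k to (j+1)/k at y.
        d = max(d, abs(fitted_cdf - j / k), abs(fitted_cdf - (j + 1) / k))
    return d


def mdsp(sample, k_min=2):
    """Return (k, threshold, alpha_hat) minimising the KS distance over k."""
    desc = sorted(sample, reverse=True)
    best_k = min(range(k_min, len(desc)), key=lambda k: ks_distance(desc, k))
    return best_k, desc[best_k], 1.0 / hill_estimate(desc, best_k)


if __name__ == "__main__":
    rng = random.Random(0)
    # Exact Pareto sample with tail index 2 (illustrative choice).
    sample = [(1.0 - rng.random()) ** (-0.5) for _ in range(500)]
    k, threshold, alpha_hat = mdsp(sample)
    print(k, threshold, alpha_hat)
```

The search is quadratic in the sample size, which is acceptable for a sketch; repeating the demo over many simulated samples is one way to observe how variable the selected k and the resulting Hill estimate are.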
#5. A Schur transform for spatial stochastic processes
James Mathews
The variance, higher order moments, covariance, and joint moments or cumulants are shown to be special cases of a certain tensor in $V^{\otimes n}$ defined in terms of a collection $X_1,...,X_n$ of $V$-valued random variables, for an appropriate finite-dimensional real vector space $V$. A statistical transform is proposed from such collections--finite spatial stochastic processes--to numerical tuples using the Schur-Weyl decomposition of $V^{\otimes n}$. It is analogous to the Fourier transform, replacing the periodicity group $\mathbb{Z}$, $\mathbb{R}$, or $U(1)$ with the permutation group $S_{n}$. As a test case, we apply the transform to one of the datasets used for benchmarking the Continuous Registration Challenge, the thoracic 4D Computed Tomography (CT) scans from the M.D. Anderson Cancer Center available for download from DIR-Lab. Further applications to morphometry and statistical shape analysis are suggested.
#6. Towards Characterising Bayesian Network Models under Selection
Angelos P. Armen, Robin J. Evans
Real-life statistical samples are often plagued by selection bias, which complicates drawing conclusions about the general population. When learning causal relationships between the variables is of interest, the sample may be assumed to be from a distribution in a causal Bayesian network (BN) model under selection. Understanding the constraints in the model under selection is the first step towards recovering causal structure in the original model. The conditional-independence (CI) constraints in a BN model under selection have already been characterised; there exist, however, additional, non-CI constraints in such models. In this work, some initial results are provided that simplify the characterisation problem. In addition, an algorithm is designed for identifying compelled ancestors (definite causes) from a completed partially directed acyclic graph (CPDAG). Finally, a non-CI, non-factorisation constraint in a BN model under selection is computed for the first time.
#7. Multiscale change point detection for dependent data
Holger Dette, Theresa Schüler, Mathias Vetter
In this paper we study the theoretical properties of the simultaneous multiscale change point estimator (SMUCE) proposed by Frick et al. (2014) in regression models with dependent error processes. Empirical studies show that in this case the change point estimate is inconsistent, but it is not known whether the alternatives suggested in the literature for correlated data are consistent. We propose a modification of SMUCE that scales the basic statistic by the long run variance of the error process, which is estimated by a difference-type variance estimator calculated from local means from different blocks. For this modification we prove model consistency for physically dependent error processes and illustrate the finite sample performance by means of a simulation study.
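As an illustration of the idea of a difference-type variance estimator built from local block means, here is a minimal generic sketch. It is not claimed to be the exact estimator of the paper: it assumes non-overlapping blocks of a fixed length `m`, differences of adjacent block means, and the normalisation m / (2(B - 1)) for B blocks; the names `block_means` and `long_run_variance` are hypothetical. Differencing removes a piecewise constant signal except in the few blocks next to a change point.

```python
import random


def block_means(y, m):
    """Means of consecutive non-overlapping blocks of length m (remainder dropped)."""
    n_blocks = len(y) // m
    return [sum(y[j * m:(j + 1) * m]) / m for j in range(n_blocks)]


def long_run_variance(y, m):
    """Difference-type estimate of the long run variance from adjacent block means.

    For stationary errors, m * Var(block mean) approaches the long run variance
    as m grows, and halving the squared differences of adjacent block means
    estimates Var(block mean) when the blocks are nearly uncorrelated.
    """
    means = block_means(y, m)
    if len(means) < 2:
        raise ValueError("need at least two complete blocks")
    squared_diffs = sum((means[j + 1] - means[j]) ** 2 for j in range(len(means) - 1))
    return m * squared_diffs / (2 * (len(means) - 1))


if __name__ == "__main__":
    rng = random.Random(1)
    # AR(1) errors with coefficient 0.5 (illustrative choice).
    errors, prev = [], 0.0
    for _ in range(20000):
        prev = 0.5 * prev + rng.gauss(0.0, 1.0)
        errors.append(prev)
    print(long_run_variance(errors, 50))
```

The block length trades bias against variance: short blocks miss part of the serial dependence, while long blocks leave few differences to average.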
#8. Composite likelihood estimation for a Gaussian process under fixed domain asymptotics
François Bachoc, Moreno Bevilacqua, Daira Velandia
We study composite likelihood estimation of the covariance parameters with data from a one-dimensional Gaussian process with exponential covariance function under fixed domain asymptotics. We show that the weighted pairwise maximum likelihood estimator of the microergodic parameter can be consistent or inconsistent, depending on the range of admissible parameter values in the likelihood optimization. In contrast, the weighted pairwise conditional maximum likelihood estimator is always consistent. Both estimators are also asymptotically Gaussian when they are consistent, with asymptotic variance larger or strictly larger than that of the maximum likelihood estimator. A simulation study is presented in order to compare the finite sample behavior of the pairwise likelihood estimators with their asymptotic distributions.
#9. Partial recovery bounds for clustering with the relaxed $K$means
Christophe Giraud, Nicolas Verzelen
We investigate the clustering performance of the relaxed $K$means in the setting of the sub-Gaussian Mixture Model (sGMM) and the Stochastic Block Model (SBM). After identifying the appropriate signal-to-noise ratio (SNR), we prove that the misclassification error decays exponentially fast with respect to this SNR. These partial recovery bounds for the relaxed $K$means improve upon results currently known in the sGMM setting. In the SBM setting, applying the relaxed $K$means SDP makes it possible to handle general connection probabilities, whereas other SDPs investigated in the literature are restricted to the assortative case (where within-group probabilities are larger than between-group probabilities). Again, this partial recovery bound complements the state-of-the-art results. Altogether, these results demonstrate the versatility of the relaxed $K$means.
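For orientation, the semidefinite relaxation of $K$means usually attributed to Peng and Wei can be written as below. That this standard form is the one meant by "relaxed $K$means" here is an assumption, and the notation ($X$ for the $n \times p$ data matrix, $\mathcal{C}_K$ for the feasible set) is introduced only for this sketch.

```latex
\[
\widehat{B} \in \operatorname*{arg\,max}_{B \in \mathcal{C}_K} \langle X X^{\top}, B \rangle,
\qquad
\mathcal{C}_K = \bigl\{ B \in \mathbb{R}^{n \times n} :
  B \succeq 0,\ B_{ab} \ge 0 \ \text{for all } a, b,\ B \mathbf{1} = \mathbf{1},\ \operatorname{tr}(B) = K \bigr\}.
\]
```

Exact $K$means corresponds to restricting $B$ to matrices of the form $\sum_{k=1}^{K} \mathbf{1}_{G_k} \mathbf{1}_{G_k}^{\top} / |G_k|$ for a partition $G_1, \dots, G_K$ of the $n$ points; each such matrix lies in $\mathcal{C}_K$, which is what makes the program a convex relaxation.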
#10. Bootstrapping Max Statistics in High Dimensions: Near-Parametric Rates Under Weak Variance Decay and Application to Functional Data Analysis
Miles E. Lopes, Zhenhua Lin, Hans-Georg Mueller
In recent years, bootstrap methods have drawn attention for their ability to approximate the laws of "max statistics" in high-dimensional problems. A leading example of such a statistic is the coordinate-wise maximum of a sample average of $n$ random vectors in $\mathbb{R}^p$. Existing results for this statistic show that the bootstrap can work when $n\ll p$, and rates of approximation (in Kolmogorov distance) have been obtained with only logarithmic dependence in $p$. Nevertheless, one of the challenging aspects of this setting is that established rates tend to scale like $n^{-1/6}$ as a function of $n$. The main purpose of this paper is to demonstrate that improvement in rate is possible when extra model structure is available. Specifically, we show that if the coordinate-wise variances of the observations exhibit decay, then a nearly $n^{-1/2}$ rate can be achieved, independent of $p$. Furthermore, a surprising aspect of this dimension-free rate is that it holds even when the decay is very weak. As a numerical illustration,...
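The max statistic discussed in this abstract can be illustrated with a generic Gaussian multiplier bootstrap, sketched below under stated assumptions. This is a standard textbook-style scheme and is not claimed to be the procedure analysed in the paper; the function names, the `n_boot` default, and the variance decay profile used in the demo are hypothetical choices made for illustration.

```python
import math
import random


def column_means(X):
    """Coordinate-wise sample means of a list of equal-length rows."""
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]


def multiplier_bootstrap_max(X, n_boot=200, rng=None):
    """Sorted bootstrap draws of max_j n^{-1/2} * sum_i xi_i * (X_ij - mean_j),
    where the multipliers xi_i are i.i.d. standard Gaussian."""
    rng = rng or random.Random()
    n, p = len(X), len(X[0])
    means = column_means(X)
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    draws = []
    for _ in range(n_boot):
        multipliers = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sums = [0.0] * p
        for w, row in zip(multipliers, centered):
            for j in range(p):
                sums[j] += w * row[j]
        draws.append(max(sums) / math.sqrt(n))
    return sorted(draws)


def quantile(sorted_draws, level):
    """Empirical quantile: the ceil(level * B)-th smallest of B sorted draws."""
    idx = math.ceil(level * len(sorted_draws)) - 1
    return sorted_draws[min(len(sorted_draws) - 1, max(0, idx))]


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 50, 200  # more coordinates than observations
    # Coordinate standard deviations decaying like j^{-1/2} (illustrative).
    X = [[rng.gauss(0.0, (j + 1) ** -0.5) for j in range(p)] for _ in range(n)]
    draws = multiplier_bootstrap_max(X, rng=rng)
    print(quantile(draws, 0.95))
```

The 0.95 quantile of the draws serves as an approximate critical value for the coordinate-wise maximum of the centred and scaled sample mean.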
