#1. Asymptotically Exact Variational Bayes for High-Dimensional Binary Regression Models
Augusto Fasano, Daniele Durante, Giacomo Zanella
State-of-the-art methods for Bayesian inference on regression models with binary responses are either computationally impractical or inaccurate in high dimensions. To fill this gap, we propose a novel variational approximation for the posterior distribution of the coefficients in high-dimensional probit regression. Our method leverages a representation with global and local variables but, unlike classical mean-field approximations, it avoids a fully factorized approximation and instead assumes a factorization only for the local variables. We prove that the resulting variational approximation belongs to a tractable class of unified skew-normal distributions that preserves the skewness of the actual posterior and, unlike state-of-the-art variational Bayes solutions, converges to the exact posterior as the number of predictors p increases. A scalable coordinate ascent variational algorithm is proposed to obtain the optimal parameters of the approximating densities. As we show with both theoretical results and an application to...
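The partially factorized idea can be sketched with coordinate ascent under assumptions that are not stated in the abstract: a Gaussian prior $\beta \sim N(0, \nu^2 I)$ and the usual augmentation $z_i \sim N(x_i^\top \beta, 1)$, $y_i = 1(z_i > 0)$. If $q(\beta, z) = p(\beta \mid z) \prod_i q(z_i)$, each $q(z_i)$ turns out to be a truncated normal and $q(\beta \mid z)$ stays Gaussian, so only the local means $E[z_i]$ need to be iterated. The function names and update formulas below are our own derivation for illustration, not the authors' published algorithm; working with $n \times n$ matrices keeps the cost manageable when p exceeds n.

```python
import numpy as np
from scipy.stats import norm


def truncated_mean(m, s, sg):
    """Mean of N(m, s^2) truncated to (0, inf) if sg = +1, or to (-inf, 0) if sg = -1."""
    a = sg * m / s
    # Inverse Mills ratio computed on the log scale for numerical stability
    return m + sg * s * np.exp(norm.logpdf(a) - norm.logcdf(a))


def pfm_vb_probit(X, y, nu2=1.0, n_iter=200, tol=1e-8):
    """Hypothetical partially factorized CAVI for probit regression.

    Assumed model: y_i = 1(z_i > 0), z_i ~ N(x_i' beta, 1), beta ~ N(0, nu2 * I).
    Assumed approximation: q(beta, z) = p(beta | z) * prod_i q(z_i),
    so only the local variables z_i are factorized.
    """
    n = X.shape[0]
    Kinv = np.linalg.inv(np.eye(n) + nu2 * X @ X.T)  # n x n, cheap when p >> n
    H = np.eye(n) - Kinv                              # equals X V X' by Woodbury
    sigma = 1.0 / np.sqrt(1.0 - np.diag(H))           # sd of each q(z_i) before truncation
    sg = 2.0 * np.asarray(y, float) - 1.0             # truncation side for each z_i
    mu = np.zeros(n)
    Ez = truncated_mean(mu, sigma, sg)
    for _ in range(n_iter):
        mu_old = mu.copy()
        for i in range(n):
            # Location of q(z_i): sigma_i^2 * sum_{j != i} H_ij E[z_j]
            mu[i] = sigma[i] ** 2 * (H[i] @ Ez - H[i, i] * Ez[i])
            Ez[i] = truncated_mean(mu[i], sigma[i], sg[i])
        if np.max(np.abs(mu - mu_old)) < tol:
            break
    # E[beta] = V X' E[z], rewritten with the push-through identity
    beta_mean = nu2 * X.T @ (Kinv @ Ez)
    return beta_mean, Ez
```

Because $q(\beta \mid z)$ is the exact conditional, the approximate posterior mean of $\beta$ follows in closed form once the local means have converged.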
#2. A nonparametric framework for inferring orders of categorical data from category-real ordered pairs
Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong
Given a dataset of careers and incomes, how large is the difference in income between any pair of careers? Given a dataset of travel-time records, how much more time do we need to spend when choosing public transportation mode $A$ instead of $B$? In this paper, we propose a framework that infers orders of categories, as well as the magnitudes of the differences in real values between each pair of categories, using the estimation statistics framework. Beyond reporting whether an order of categories exists, our framework also reports the magnitude of the difference between each consecutive pair of categories in the order. On large datasets, our framework scales well compared with the existing framework. The proposed framework has been applied to two real-world case studies: 1) ordering careers by income based on information from 350,000 households living in Khon Kaen province, Thailand, and 2) ordering sectors by closing prices based on the closing prices of 1060 companies on NASDAQ stock markets between 2000 and 2016. The...
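A simplified version of this idea can be sketched with percentile-bootstrap confidence intervals for differences in means, one of the standard tools of estimation statistics. The pairwise rule (category c is placed above d when the interval for mean(c) minus mean(d) lies entirely above zero), the ranking by number of such wins, and all function names are hypothetical choices for illustration; the paper's framework may construct the order differently.

```python
import numpy as np


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, rng=None):
    """Observed mean(a) - mean(b) and its percentile bootstrap confidence interval."""
    rng = np.random.default_rng(rng)
    a, b = np.asarray(a, float), np.asarray(b, float)
    # Resample each group with replacement, independently of the other
    ia = rng.integers(0, a.size, size=(n_boot, a.size))
    ib = rng.integers(0, b.size, size=(n_boot, b.size))
    diffs = a[ia].mean(axis=1) - b[ib].mean(axis=1)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return a.mean() - b.mean(), lo, hi


def order_categories(groups, **boot_kw):
    """Hypothetical ordering rule. groups maps category -> 1-D array of real values."""
    cats = list(groups)
    wins = {c: 0 for c in cats}
    for i, c in enumerate(cats):
        for d in cats[i + 1:]:
            _, lo, hi = bootstrap_mean_diff_ci(groups[c], groups[d], **boot_kw)
            if lo > 0:        # c credibly above d
                wins[c] += 1
            elif hi < 0:      # d credibly above c
                wins[d] += 1
    order = sorted(cats, key=lambda c: wins[c], reverse=True)
    # Magnitude (estimate, lower, upper) between consecutive categories in the order
    steps = [
        (upper_c, lower_c, bootstrap_mean_diff_ci(groups[upper_c], groups[lower_c], **boot_kw))
        for upper_c, lower_c in zip(order[:-1], order[1:])
    ]
    return order, steps
```

Reporting the interval for each consecutive pair, rather than only the order, mirrors the abstract's point that the magnitude of each step matters as much as its direction.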
#3. Causal inference using Bayesian non-parametric quasi-experimental design
Max Hinne, Marcel A. J. van Gerven, Luca Ambrogioni
The de facto standard for causal inference is the randomized controlled trial, where one compares a manipulated group with a control group in order to determine the effect of an intervention. However, this research design is not always feasible due to pragmatic or ethical concerns. In these situations, quasi-experimental designs may provide a solution, as they allow for causal conclusions at the cost of additional design assumptions. In this paper, we provide a generic framework for quasi-experimental design using Bayesian model comparison, and we show how it can be used as an alternative to several common research designs. We provide a theoretical motivation for a Gaussian process based approach and demonstrate its convenient use in a number of simulations. Finally, we apply the framework to determine the effect of population-based thresholds for municipality funding in France, of the 2005 smoking ban in Sicily on the number of acute coronary events, and of an alleged historical phantom border in the...
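The model-comparison view can be sketched for a regression-discontinuity setting: a continuous model M0 fits one Gaussian process to all observations, a discontinuous model M1 fits independent Gaussian processes on either side of the threshold, and the log Bayes factor compares their marginal likelihoods. This is a minimal sketch assuming a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters (lengthscale, variance, noise); the paper's handling of hyperparameters and its effect-size estimates are not reproduced, and the function names are illustrative.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_log_marginal(x, y, noise=0.1, **kern):
    """log N(y | 0, K + noise^2 I) for a zero-mean GP with fixed hyperparameters."""
    K = rbf_kernel(x, x, **kern) + noise ** 2 * np.eye(x.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * x.size * np.log(2 * np.pi)


def discontinuity_log_bayes_factor(x, y, threshold, **gp):
    """log BF of M1 (separate GPs left/right of threshold) versus M0 (one GP)."""
    left = x < threshold
    log_m0 = gp_log_marginal(x, y, **gp)
    log_m1 = gp_log_marginal(x[left], y[left], **gp) + gp_log_marginal(x[~left], y[~left], **gp)
    return log_m1 - log_m0
```

A large positive value favors a discontinuity at the threshold, which is the quasi-experimental evidence of an effect; values near or below zero favor the continuous model.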
#4. Assessing the uncertainty in statistical evidence with the possibility of model misspecification using a non-parametric bootstrap
Mark L. Taper, Subhash R Lele, José-Miguel Ponciano, Brian Dennis
Empirical evidence, e.g. the observed likelihood ratio, is an estimator of the difference between the divergences of two competing models (or model sets) from the true generating mechanism. It is unclear how to use such empirical evidence in scientific practice. Scientists usually want to know "how often would I get this level of evidence?" The answer to this question depends on the true generating mechanism along with the models under consideration. In many situations, having observed the data, we can approximate the true generating mechanism non-parametrically by assuming far less structure than the parametric models being compared. We use a resampling method based on the non-parametric estimate of the true generating mechanism to estimate a confidence interval for the empirical evidence that is robust to model misspecification. Such a confidence interval tells us how variable the empirical evidence would be if the experiment (or observational study) were to be replicated. In our simulations, variability in empirical evidence...
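A minimal sketch of the resampling idea, assuming the empirical evidence is the maximized log likelihood ratio between two hypothetical candidate models (normal vs. Laplace, both fitted by maximum likelihood) and that the empirical distribution of the data serves as the non-parametric estimate of the generating mechanism. Each bootstrap replicate refits both models and recomputes the evidence; percentile limits of the replicates give the interval. The model pair, the percentile interval, and the function names are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy import stats


def log_lik_normal(x):
    """Maximized log-likelihood of a normal model (ddof=0 gives the MLE of sd)."""
    return stats.norm.logpdf(x, x.mean(), x.std()).sum()


def log_lik_laplace(x):
    """Maximized log-likelihood of a Laplace model (median and mean absolute deviation)."""
    loc = np.median(x)
    scale = np.mean(np.abs(x - loc))
    return stats.laplace.logpdf(x, loc, scale).sum()


def evidence(x):
    """Empirical evidence: observed log likelihood ratio, normal vs. Laplace."""
    return log_lik_normal(x) - log_lik_laplace(x)


def bootstrap_evidence_ci(x, n_boot=2000, alpha=0.05, seed=0):
    """Observed evidence and a percentile interval from non-parametric resampling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    reps = np.array([evidence(rng.choice(x, x.size, replace=True)) for _ in range(n_boot)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return evidence(x), lo, hi
```

Feeding it data from, say, a t distribution with 3 degrees of freedom makes both candidate models misspecified, which is the situation the interval is meant to handle.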
#5. GET: Global envelopes in R
Mari Myllymäki, Tomáš Mrkvička
This work describes the R package GET, which implements global envelopes. These can be employed for central regions of functional or multivariate data, for graphical Monte Carlo and permutation tests where the test statistic is multivariate or functional, and for global confidence and prediction bands. The intrinsic graphical interpretation property is introduced for global envelopes, and the global envelopes included in the GET package that have this property are described and compared. Examples of different uses of global envelopes and their implementation in the GET package are presented, including global envelopes for single and several one- or two-dimensional functions, goodness-of-fit and permutation tests, graphical functional analysis of variance (ANOVA) and general linear model (GLM), comparison of distributions, and confidence bands in polynomial regression.
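GET is an R package, so the following is not its API; it is a Python sketch of one simple global envelope, based on the maximum absolute deviation (MAD) from the mean of simulated functions. Given an observed function and s functions simulated under the null hypothesis on a common grid, the band is the mean plus or minus u, where u is the k-th largest simulated deviation with k = floor(α(s + 1)). The observed function leaves the band at some grid point exactly when its own maximum deviation exceeds u, which illustrates the kind of graphical interpretation the abstract refers to. The function name and the choice of measure are assumptions; GET implements several envelope types, and this sketch does not claim to match any of them exactly.

```python
import numpy as np


def mad_global_envelope(T_obs, T_sim, alpha=0.05):
    """Hypothetical helper: (lower, upper, p_value) for a MAD global envelope.

    T_obs: shape (m,), observed function on a grid of m points.
    T_sim: shape (s, m), functions simulated under the null hypothesis.
    """
    T_obs = np.asarray(T_obs, float)
    T_sim = np.asarray(T_sim, float)
    s = T_sim.shape[0]
    k = int(np.floor(alpha * (s + 1)))
    if k < 1:
        raise ValueError("need at least 1/alpha - 1 simulations")
    center = T_sim.mean(axis=0)                     # pointwise central function
    dev_sim = np.abs(T_sim - center).max(axis=1)    # max deviation of each simulation
    dev_obs = np.abs(T_obs - center).max()          # max deviation of the observed curve
    u = np.sort(dev_sim)[-k]                        # k-th largest simulated deviation
    p_value = (1 + np.sum(dev_sim >= dev_obs)) / (s + 1)  # Monte Carlo p-value
    return center - u, center + u, p_value
```

With s = 999 and α = 0.05, the band width is set by the 50th largest simulated deviation, so the same number summarizes the test and draws the envelope.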
#6. Akaike's Bayesian information criterion (ABIC) or not ABIC for geophysical inversion
Peiliang Xu
Akaike's Bayesian information criterion (ABIC) has been widely used in geophysical inversion and beyond. However, little has been done to investigate its statistical aspects. We present an alternative derivation of the marginal distribution of measurements, whose maximization led directly to Akaike's invention of ABIC. We show that ABIC amounts to statistically estimating the variance of measurements and the prior variance by maximizing the marginal distribution of measurements. Determining the regularization parameter on the basis of ABIC is actually equivalent to estimating the relative weighting factor between the variance of measurements and the prior variance for geophysical inverse problems. We show that if the noise level of measurements is unknown, ABIC tends to produce a substantially biased estimate of the variance of measurements. In particular, since the prior mean is generally unknown but arbitrarily treated as zero in geophysical inversion, ABIC does not produce a reasonable estimate of the prior variance either.
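For a linear Gaussian inverse problem the quantities in the abstract can be made concrete under assumptions stated here: y = A x + e with e ~ N(0, σ² I) and a zero-mean prior x ~ N(0, τ² I), so the marginal distribution of the measurements is y ~ N(0, σ² I + τ² A Aᵀ). Following Akaike's definition, ABIC = −2 log(maximized marginal likelihood) + 2 × (number of hyperparameters), here two. The posterior mean of x is the Tikhonov solution (AᵀA + λ I)⁻¹ Aᵀ y with λ = σ²/τ², which is why estimating that ratio amounts to choosing the regularization parameter. The isotropic prior, the numerical optimizer, and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal


def neg_log_marginal(log_params, A, y):
    """Negative log marginal likelihood of y; parameters are log(sigma^2), log(tau^2)."""
    s2, t2 = np.exp(log_params)                 # measurement and prior variances
    cov = s2 * np.eye(len(y)) + t2 * A @ A.T    # covariance of y after integrating out x
    return -multivariate_normal.logpdf(y, mean=np.zeros(len(y)), cov=cov)


def abic(A, y):
    """ABIC value, estimated variances, and implied regularization parameter."""
    res = minimize(neg_log_marginal, x0=np.zeros(2), args=(A, y), method="Nelder-Mead")
    s2, t2 = np.exp(res.x)
    value = 2.0 * res.fun + 2 * 2   # -2 log max marginal + 2 * (number of hyperparameters)
    return value, s2, t2, s2 / t2   # last entry: lambda = sigma^2 / tau^2
```

Because the prior mean is fixed at zero in this marginal, any mismatch between that assumption and the true x is absorbed into the estimate of τ², which is consistent with the abstract's warning about the prior variance.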
#7. How bettering the best? Answers via blending models and cluster formulations in density-based clustering
Alessandro Casa, Luca Scrucca, Giovanna Menardi
With the recent growth in data availability and complexity, and the associated proliferation of elaborate modeling approaches, model selection tools have become a lifeline, providing objective criteria to deal with this increasingly challenging landscape. In fact, basing predictions and inference on a single model may be limiting, if not harmful; ensemble approaches, which combine different models, have been proposed to overcome the selection step and have proven fruitful, especially in the supervised learning framework. Conversely, these approaches have been scarcely explored in the unsupervised setting. In this work, we focus on the model-based clustering formulation, where a plethora of mixture models, with different numbers of components and parametrizations, is typically estimated. We propose an ensemble clustering approach that circumvents the single-best-model paradigm while improving the stability and robustness of the partitions. A new density estimator, being a convex linear combination of the density estimates in the ensemble,...
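The abstract is cut off before explaining how the ensemble weights are chosen. As one possible choice, assumed here only for illustration, the weights of the convex combination can be fitted by maximum likelihood with EM, treating each model's density estimate at the data points as fixed. The input matrix would hold, for instance, densities evaluated from mixture models with different numbers of components and parametrizations; the function name is hypothetical.

```python
import numpy as np


def ensemble_weights(dens, n_iter=500, tol=1e-10):
    """EM for the weights of a convex combination of fixed density estimates.

    dens: (n, K) array with dens[i, k] = f_k(x_i) from the k-th model in the ensemble.
    Maximizes sum_i log(sum_k w_k f_k(x_i)) over the probability simplex.
    """
    n, K = dens.shape
    w = np.full(K, 1.0 / K)                       # start from equal weights
    for _ in range(n_iter):
        resp = dens * w                           # E-step: unnormalized responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        w_new = resp.mean(axis=0)                 # M-step: average responsibility per model
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```

The ensemble density at the data points is then `dens @ w`, a convex combination because the weights are non-negative and sum to one. Fitting the weights on the same data used to build the densities favors overfitted members, so held-out evaluation would be a natural refinement.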
