#1. Stable Multiple Time Step Simulation/Prediction from Lagged Dynamic Network Regression Models
Abhirup Mallik, Zack W. Almquist
Recent developments in computers and automated data collection strategies have greatly increased the interest in statistical modeling of dynamic networks. Many of the statistical models employed for inference on large-scale dynamic networks suffer from limited forward simulation/prediction ability. A major problem with many of the forward simulation procedures is the tendency for the model to become degenerate in only a few time steps, i.e., the simulation/prediction procedure results in either null graphs or complete graphs. Here, we describe an algorithm for simulating a sequence of networks generated from lagged dynamic network regression models DNR(V), a sub-family of TERGMs. We introduce a smoothed estimator for forward prediction based on smoothing of the change statistics obtained for a dynamic network regression model. We focus on the implementation of the algorithm, providing a series of motivating examples with comparisons to dynamic network models from the literature. We find that our algorithm significantly improves...
#2. A practical example for the non-linear Bayesian filtering of model parameters
Matthieu Bulté, Jonas Latz, Elisabeth Ullmann
In this tutorial we consider the non-linear Bayesian filtering of static parameters in a time-dependent model. We outline the theoretical background and discuss appropriate solvers. We focus on particle-based filters and present Sequential Importance Sampling (SIS) and Sequential Monte Carlo (SMC). Throughout the paper we illustrate the concepts and techniques with a practical example using real-world data. The task is to estimate the gravitational acceleration of the Earth $g$ by using observations collected from a simple pendulum. Importantly, the particle filters enable the adaptive updating of the estimate for $g$ as new observations become available. For tutorial purposes we provide the data set and a Python implementation of the particle filters.
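As a rough illustration of Sequential Importance Sampling for a static parameter, the sketch below estimates $g$ from noisy period measurements. It is not the authors' implementation. The small-angle model $T = 2\pi\sqrt{L/g}$, the uniform prior on $[5, 15]$, the Gaussian noise level, the synthetic data, and the name `sis_estimate_g` are all assumptions made for this example.

```python
import math
import random


def sis_estimate_g(periods, length, sigma, n_particles=2000, seed=0):
    """Return the running posterior-mean estimate of g after each observation."""
    rng = random.Random(seed)
    # Particles are drawn once from an assumed uniform prior and never moved.
    particles = [rng.uniform(5.0, 15.0) for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    estimates = []
    for t_obs in periods:
        # Sequential update: add the Gaussian log-likelihood of the new observation.
        for k, g in enumerate(particles):
            t_model = 2.0 * math.pi * math.sqrt(length / g)
            log_w[k] += -0.5 * ((t_obs - t_model) / sigma) ** 2
        # Normalise in a numerically stable way before averaging.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        estimates.append(sum(wi * g for wi, g in zip(w, particles)) / total)
    return estimates


# Synthetic observations (assumed values, not the tutorial's data set).
rng = random.Random(1)
true_g, length, sigma = 9.81, 1.0, 0.05
periods = [
    2.0 * math.pi * math.sqrt(length / true_g) + rng.gauss(0.0, sigma)
    for _ in range(30)
]
estimates = sis_estimate_g(periods, length, sigma)
print(estimates[0], estimates[-1])
```

Because the particles stay fixed, the weights tend to concentrate on a few particles as observations accumulate. Resampling steps of the kind used in Sequential Monte Carlo are the usual remedy.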
#3. Fast computation of p-values for the permutation test based on Pearson's correlation coefficient and other statistical tests
Jean-Marie Droz
Permutation tests are among the simplest and most widely used statistical tools. Their p-values can be computed by a straightforward sampling of permutations. However, this way of computing p-values is often so slow that it is replaced by an approximation, which is accurate only for part of the interesting range of parameters. Moreover, the accuracy of the approximation can usually not be improved by increasing the computation time. We introduce a new sampling-based algorithm which uses the fast Fourier transform to compute p-values for the permutation test based on Pearson's correlation coefficient. The algorithm is practically and asymptotically faster than straightforward sampling. Typically, its complexity is logarithmic in the input size, while the complexity of straightforward sampling is linear. The idea behind the algorithm can also be used to accelerate the computation of p-values for many other common statistical tests. The algorithm is easy to implement, but its analysis involves results from the representation theory...
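For reference, the straightforward sampling baseline that the abstract contrasts against can be sketched as follows. This is only the plain Monte Carlo permutation test, not the FFT-based algorithm of the paper. The function names, the two-sided test, and the add-one p-value correction are choices made for this example.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo p-value: shuffle y, recompute r, count extremes."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(pearson_r(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


x = [0.5, 1.1, 1.9, 3.2, 4.0, 5.3, 5.9, 7.1]
y = [1.2, 0.9, 2.5, 3.1, 3.8, 5.0, 6.4, 6.6]
print(permutation_p_value(x, y))
```

Each sampled permutation costs one pass over the data, which is the linear per-sample cost the abstract refers to.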
#4. A distributed regression analysis application based on SAS software. Part I: Linear and logistic regression
Qoua L. Her, Yury Vilk, Jessica Young, Zilu Zhang, Jessica M. Malenfant, Sarah Malek, Sengwee Toh
Previous work has demonstrated the feasibility and value of conducting distributed regression analysis (DRA), a privacy-protecting analytic method that performs multivariable-adjusted regression analysis with only summary-level information from participating sites. To our knowledge, there are no DRA applications in SAS, the statistical software used by several large national distributed data networks (DDNs), including the Sentinel System and PCORnet. SAS/IML is available to perform the required matrix computations for DRA in the SAS system. However, not all data partners in these large DDNs have access to SAS/IML, which is licensed separately. In this first article of a two-paper series, we describe a DRA application developed for use in Base SAS and SAS/STAT modules for linear and logistic DRA within horizontally partitioned DDNs and its successful tests.
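The core idea for linear regression on horizontally partitioned data can be sketched in a few lines: each site shares only $X^\top X$ and $X^\top y$, and the coordinator sums them and solves the normal equations. The sketch below is in Python rather than SAS and is not the application described in the paper. All function names and the toy data are hypothetical.

```python
def site_summaries(rows):
    """rows: list of (x_vector, y) held by one site. Returns (X'X, X'y)."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def pooled_ols(summaries):
    """Coordinator step: add the site summaries, then solve the normal equations."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


# Toy data: first column is the intercept, y = 1 + 2 * x exactly.
site_a = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
site_b = [([1.0, 3.0], 7.0), ([1.0, 4.0], 9.0)]
beta = pooled_ols([site_summaries(site_a), site_summaries(site_b)])
print(beta)
```

No individual-level rows leave a site, yet the coefficients equal those from fitting the pooled data. Logistic regression needs an iterative exchange of summaries and is not covered by this sketch.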
#5. Correlated pseudo-marginal Metropolis-Hastings using quasi-Newton proposals
Johan Dahlin, Adrian Wills, Brett Ninness
Pseudo-marginal Metropolis-Hastings (pmMH) is a versatile algorithm for sampling from target distributions which are not easy to evaluate point-wise. However, pmMH requires good proposal distributions to sample efficiently from the target, which can be problematic to construct in practice. This is especially a problem for high-dimensional targets when the standard random-walk proposal is inefficient. We extend pmMH to allow for constructing the proposal based on information from multiple past iterations. As a consequence, quasi-Newton (qN) methods can be employed to form proposals which utilize gradient information to guide the Markov chain to areas of high probability and to construct approximations of the local curvature to scale step sizes. The proposed method is demonstrated on several problems which indicate that qN proposals can perform better than other common Hessian-based proposals.
#6. MPS: An R package for modelling new families of distributions
Mahdi Teimouri
We introduce an \verb|R| package, called \verb|MPS|, for computing the probability density function, computing the cumulative distribution function, computing the quantile function, simulating random variables, and estimating the parameters of 24 new shifted families of distributions. Adding an extra shift (location) parameter to each family yields more flexibility. Since the maximum likelihood estimators may fail to exist in some situations, we adopt the well-known maximum product spacings approach to estimate the parameters of the 24 new shifted families of distributions. The performance of the \verb|MPS| package in computing the cdf and pdf and in simulating random samples is checked by examples. The performance of the maximum product spacings approach is demonstrated by running the \verb|MPS| package on three sets of real data. As will be shown, for the first set, the maximum likelihood estimators break down but the \verb|MPS| package finds estimates. For the second set, adding the location parameter leads to acceptance the...
#7. Adaptive Approximation Error Models for Efficient Uncertainty Quantification with Application to Multiphase Subsurface Fluid Flow
Tiangang Cui, Colin Fox, Michael J O'Sullivan
Sample-based Bayesian inference provides a route to uncertainty quantification in the geosciences, though it is very computationally demanding in the na\"ive form that requires simulating an accurate computer model at each iteration. We present a new approach that adaptively builds a stochastic model for the error induced by a reduced model. This enables sampling from the correct target distribution at reduced computational cost, while avoiding appreciable loss of statistical efficiency. We build on recent simplified conditions for adaptive Markov chain Monte Carlo algorithms to give practical approximation schemes and algorithms with guaranteed convergence. We demonstrate the efficacy of our new approach on two computational examples, including calibration of a large-scale numerical model of a real geothermal reservoir, that show good computational and statistical efficiencies on both synthetic and measured data sets.
#8. Iterative proportional scaling revisited: a modern optimization perspective
Yiyuan She, Shao Tang
This paper revisits the classic iterative proportional scaling (IPS) from a modern optimization perspective. In contrast to the criticisms made in the literature, we show that based on a coordinate descent characterization, IPS can be slightly modified to deliver coefficient estimates, and from a majorization-minimization standpoint, IPS can be extended to handle log-affine models with features not necessarily binary-valued or nonnegative. Furthermore, some state-of-the-art optimization techniques such as block-wise computation, randomization and momentum-based acceleration can be employed to provide more scalable IPS algorithms, as well as some regularized variants of IPS for concurrent feature selection.
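For readers unfamiliar with the classic procedure, a minimal sketch of IPS on a two-way table is given below: rows and columns are rescaled in turn until the fitted table matches the target margins. This is only the textbook version, not the modified or accelerated algorithms of the paper. The function name `ips` and the example table are hypothetical.

```python
def ips(table, row_targets, col_targets, n_iter=100, tol=1e-10):
    """Rescale a positive two-way table until its margins match the targets."""
    fit = [row[:] for row in table]
    for _ in range(n_iter):
        # Row step: scale each row to its target total.
        for i, row in enumerate(fit):
            s = sum(row)
            fit[i] = [v * row_targets[i] / s for v in row]
        # Column step: scale each column to its target total.
        for j in range(len(col_targets)):
            s = sum(row[j] for row in fit)
            for row in fit:
                row[j] *= col_targets[j] / s
        # Columns are exact after the column step, so check the rows only.
        err = max(abs(sum(row) - t) for row, t in zip(fit, row_targets))
        if err < tol:
            break
    return fit


fit = ips([[1.0, 2.0], [3.0, 4.0]], [40.0, 60.0], [30.0, 70.0])
for row in fit:
    print(row)
```

Each step multiplies a whole row or column by a constant, so the odds ratios of the starting table are preserved while the margins move to their targets.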
#9. Biologically Plausible Online Principal Component Analysis Without Recurrent Neural Dynamics
Victor Minden, Cengiz Pehlevan, Dmitri B. Chklovskii
Artificial neural networks that learn to perform Principal Component Analysis (PCA) and related tasks using strictly local learning rules have been previously derived based on the principle of similarity matching: similar pairs of inputs should map to similar pairs of outputs. However, the operation of these networks (and of similar networks) requires a fixed-point iteration to determine the output corresponding to a given input, which means that dynamics must operate on a faster time scale than the variation of the input. Further, during these fast dynamics such networks typically "disable" learning, updating synaptic weights only once the fixed-point iteration has been resolved. Here, we derive a network for PCA-based dimensionality reduction that avoids this fast fixed-point iteration. The key novelty of our approach is a modification of the similarity matching objective to encourage near-diagonality of a synaptic weight matrix. We then approximately invert this matrix using a Taylor series approximation, replacing the previous...
#10. plsRglm: Partial least squares linear and generalized linear regression for processing incomplete datasets by cross-validation and bootstrap techniques with R
F. Bertrand, M. Maumy-Bertrand
The aim of the plsRglm package is to deal with complete and incomplete datasets through several new techniques or, at least, some which were not yet implemented in R. Indeed, not only does it make available the extension of the PLS regression to the generalized linear regression models, but also bootstrap techniques, leave-one-out and repeated $k$-fold cross-validation. In addition, graphical displays help the user to assess the significance of the predictors when using bootstrap techniques. Biplots (Fig. 4) can be used to delve into the relationship between individuals and variables.
