### Top 10 arXiv Papers Today in Statistics

##### #1. Bayesian Inference for Big Spatial Data Using Non-stationary Spectral Simulation
###### Hou-Cheng Yang, Jonathan R. Bradley
It is increasingly understood that the assumption of stationarity is unrealistic for many spatial processes. In this article, we combine dimension expansion with a spectral method to model big non-stationary spatial fields in a computationally efficient manner. Specifically, we use Mejia and Rodriguez-Iturbe (1974)'s spectral simulation approach to simulate a spatial process with a covariogram at locations that have an expanded dimension. We introduce Bayesian hierarchical modelling to dimension expansion, which originally has only been modeled using a method of moments approach. In particular, we simulate from the posterior distribution using a collapsed Gibbs sampler. Our method is both full rank and non-stationary, and can be applied to big spatial data because it does not involve storing and inverting large covariance matrices. Additionally, we have fewer parameters than many other non-stationary spatial models. We demonstrate the wide applicability of our approach using a simulation study, and an application using ozone data...
###### Tweets
StatsPapers: Bayesian Inference for Big Spatial Data Using Non-stationary Spectral Simulation. https://t.co/hoTHURUn8a
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 8941
Unique Words: 2224

##### #2. Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
###### Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder
We consider a discrete optimization based approach for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized problems at scales much larger than what was conventionally considered possible in the statistics and machine learning communities. Despite their usefulness, MIP-based approaches are significantly slower compared to relatively mature algorithms based on $\ell_1$-regularization and relatives. We aim to bridge this computational gap by developing new MIP-based algorithms for $\ell_0$-regularized classification. We propose two classes of scalable algorithms: an exact algorithm that can handle $p\approx 50,000$ features in a few minutes, and approximate algorithms that can address instances with $p\approx 10^6$ in times comparable to fast $\ell_1$-based algorithms. Our exact algorithm is based on the novel idea of integrality generation, which...
###### Tweets
arxivml: "Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives", Antoine Dedieu, Hussein Hazi… https://t.co/oTzJ7juP18
arxiv_cs_LG: Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives. Antoine Dedieu, Hussein Hazimeh, and Rahul Mazumder https://t.co/1tjrVwKTsP
Memoirs: Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives. https://t.co/U8HRAowRa4
###### Github

Efficient Algorithms for L0 Regularized Learning

Repository: L0Learn
User: hazimehh
Language: C++
Stargazers: 48
Subscribers: 8
Forks: 12
Open Issues: 2
###### Other stats
Sample Sizes: [805, 140, 420]
Authors: 3
Total Words: 16705
Unique Words: 3570

##### #3. Neglecting Uncertainties Leads to Suboptimal Decisions About Home-Owners Flood Risk Management
###### Mahkameh Zarekarizi, Vivek Srikrishnan, Klaus Keller
Homeowners around the world elevate houses to manage flood risks. Deciding how high to elevate the house poses a nontrivial decision problem. The U.S. Federal Emergency Management Agency (FEMA) recommends elevating a house to the Base Flood Elevation (the elevation of the 100-yr flood) plus a freeboard. This recommendation neglects many uncertainties. Here we use a multi-objective robust decision-making framework to analyze this decision in the face of deep uncertainties. We find strong interactions between the economic, engineering, and Earth science uncertainties, illustrating the need for an integrated analysis. We show that considering deep uncertainties surrounding flood hazards, the discount rate, the house lifetime, and the fragility increases the economically optimal house elevation to values well above the recommendation by FEMA. An improved decision-support for home-owners has the potential to drastically improve decisions and outcomes.
###### Tweets
StatsPapers: Neglecting Uncertainties Leads to Suboptimal Decisions About Home-Owners Flood Risk Management. https://t.co/fhyUhO98h7
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #4. Coarsened mixtures of hierarchical skew normal kernels for flow cytometry analyses
###### Shai Gorsky, Cliburn Chan, Li Ma
Flow cytometry (FCM) is the standard multi-parameter assay used to measure single cell phenotype and functionality. It is commonly used to quantify the relative frequencies of cell subsets in blood and disaggregated tissues. A typical analysis of FCM data involves cell classification - the identification of cell subgroups in the sample - and comparisons of the cell subgroups across samples. While modern experiments often necessitate the collection and processing of samples in multiple batches, analysis of FCM data across batches is challenging because the locations in the marker space of cell subsets may vary across samples. Differences across samples may occur because of true biological variation or technical reasons such as antibody lot effects or instrument optics. An important step in comparative analyses of multi-sample FCM data is cross-sample calibration, whose goal is to align cell subsets across multiple samples in the presence of variations in locations, so that variation due to technical reasons is minimized and true...
###### Tweets
StatsPapers: Coarsened mixtures of hierarchical skew normal kernels for flow cytometry analyses. https://t.co/BtaL0CGgFV
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #5. Bayesian inference of dynamics from partial and noisy observations using data assimilation and machine learning
###### Marc Bocquet, Julien Brajard, Alberto Carrassi, Laurent Bertino
The reconstruction from observations of high-dimensional chaotic dynamics such as geophysical flows is hampered by (i) the partial and noisy observations that can realistically be obtained, (ii) the need to learn from long time series of data, and (iii) the unstable nature of the dynamics. To achieve such inference from the observations over long time series, it has been suggested to combine data assimilation and machine learning in several ways. We show how to unify these approaches from a Bayesian perspective using expectation-maximization and coordinate descents. Implementations and approximations of these methods are also discussed. Finally, we numerically and successfully test the approach on two relevant low-order chaotic models with distinct identifiability.
###### Tweets
BrundageBot: Bayesian inference of dynamics from partial and noisy observations using data assimilation and machine learning. Marc Bocquet, Julien Brajard, Alberto Carrassi, and Laurent Bertino https://t.co/hxt3fIcxoO
arxiv_cs_LG: Bayesian inference of dynamics from partial and noisy observations using data assimilation and machine learning. Marc Bocquet, Julien Brajard, Alberto Carrassi, and Laurent Bertino https://t.co/5lgitwBrp3
Memoirs: Bayesian inference of dynamics from partial and noisy observations using data assimilation and machine learning. https://t.co/9TnQlsDbZI
arxivml: "Bayesian inference of dynamics from partial and noisy observations using data assimilation and machine learning", … https://t.co/5sVdHpJj8r
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #6. Multifidelity Approximate Bayesian Computation with Sequential Monte Carlo Parameter Sampling
###### Thomas P. Prescott, Ruth E. Baker
Multifidelity approximate Bayesian computation (MF-ABC) is a likelihood-free technique for parameter inference that exploits model approximations to significantly increase the speed of ABC algorithms (Prescott and Baker, 2020). Previous work has considered MF-ABC only in the context of rejection sampling, which does not explore parameter space particularly efficiently. In this work, we integrate the multifidelity approach with the ABC sequential Monte Carlo (ABC-SMC) algorithm into a new MF-ABC-SMC algorithm. We show that the improvements generated by each of ABC-SMC and MF-ABC to the efficiency of generating Monte Carlo samples and estimates from the ABC posterior are amplified when the two techniques are used together.
###### Tweets
ABC_Research: Prescott and Baker develop Multifidelity Approximate Bayesian Computation with Sequential Monte Carlo Parameter Sampling https://t.co/ZZFVpi92Sx @ruth_baker
StatsPapers: Multifidelity Approximate Bayesian Computation with Sequential Monte Carlo Parameter Sampling. https://t.co/tGqEV5SI4n
###### Github

Multifidelity Approximate Bayesian Computation with Sequential Monte Carlo Parameter Sampling

Repository: mf-abc-smc
User: tpprescott
Language: Julia
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 16769
Unique Words: 2921

##### #7. Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings
###### Christian P. Robert, Wu Changye
In this chapter, we review some of the most standard MCMC tools used in Bayesian computation, along with vignettes on standard misunderstandings of these approaches taken from Q&A's on the forum Cross-validated answered by the first author.
###### Tweets
JRBerrendero: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/YzFUuuhpCL https://t.co/r3R0ugRgzt
tmasada: [2001.06249] Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings https://t.co/X2ZCfkrcIw
StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
JAdP: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
leeswijzer: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
ballforest: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
RexDouglass: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
xiangze750: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
masatokun_markn: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
dizzy_my_future: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
PratheepaJ: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
ustazz: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
MishakinSergey: RT @StatsPapers: Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings. https://t.co/wKVaIUcHwj
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #8. Causal models for dynamical systems
###### Jonas Peters, Stefan Bauer, Niklas Pfister
A probabilistic model describes a system in its observational state. In many situations, however, we are interested in the system's response under interventions. The class of structural causal models provides a language that allows us to model the behaviour under interventions. It can be taken as a starting point to answer a plethora of causal questions, including the identification of causal effects or causal structure learning. In this chapter, we provide a natural and straightforward extension of this concept to dynamical systems, focusing on continuous time models. In particular, we introduce two types of causal kinetic models that differ in how the randomness enters into the model: it may either be considered as observational noise or as systematic driving noise. In both cases, we define interventions and therefore provide a possible starting point for causal inference. In this sense, the book chapter provides more questions than answers. The focus of the proposed causal kinetic models lies on the dynamics themselves...
###### Tweets
StatsPapers: Causal models for dynamical systems. https://t.co/Wz2hgxnSFo
dizzy_my_future: RT @StatsPapers: Causal models for dynamical systems. https://t.co/Wz2hgxnSFo
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #9. Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates
###### Ping Zhou, Zhen Yu, Jingyi Ma, Maozai Tian
Distributed statistical inference has recently attracted immense attention. Herein, we study the asymptotic efficiency of the maximum likelihood estimator (MLE), the one-step MLE, and the aggregated estimating equation estimator for generalized linear models with a diverging number of covariates. Then a novel method is proposed to obtain an asymptotically efficient estimator for large-scale distributed data by two rounds of communication between local machines and the central server. The assumption on the number of machines in this paper is more relaxed and thus practical for real-world applications. Simulations and a case study demonstrate the satisfactory finite-sample performance of the proposed estimators.
###### Tweets
arxiv_cs_LG: Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates. Ping Zhou, Zhen Yu, Jingyi Ma, and Maozai Tian https://t.co/kfD1qsdesf
Memoirs: Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates. https://t.co/h5lbxmbUVa
arxivml: "Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates"… https://t.co/f6AIORHsIi
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #10. Optimal Crossover Designs for Generalized Linear Models
###### Jeevan Jankar, Abhyuday Mandal, Jie Yang
We identify locally $D$-optimal crossover designs for generalized linear models. We use generalized estimating equations to estimate the model parameters along with their variances. To capture the dependency among the observations coming from the same subject, we propose six different correlation structures. We identify the optimal allocations of units for different sequences of treatments. For two-treatment crossover designs, we show via simulations that the optimal allocations are reasonably robust to different choices of the correlation structures. We discuss a real example of multiple treatment crossover experiments using Latin square designs. Using a simulation study, we show that a two-stage design with our locally $D$-optimal design at the second stage is more efficient than the uniform design, especially when the responses from the same subject are correlated.
###### Tweets
StatsPapers: Optimal Crossover Designs for Generalized Linear Models. https://t.co/UCVTsCv3FO
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 255,445 papers.
