Top 9 arXiv Papers Today in Methodology


0.0 Mikeys
#1. Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches
Glen P. Martin, Matthew Sperrin, Kym I. E. Snell, Iain Buchan, Richard D. Riley
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of multi-morbidity, there is growing need for CPMs to simultaneously predict risks for each of multiple future outcomes. A common approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop prognostic CPMs for multiple outcomes. We consider four methods, ranging in complexity and assumed conditional independence assumptions: namely, probabilistic classifier chain, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on...
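To make the conditional independence point concrete, here is a minimal sketch in Python. The joint probabilities below are made-up numbers for one hypothetical patient (that is, already conditional on that patient's covariates); they do not come from the paper. The sketch compares the true risk of both outcomes occurring with the product of the two separately derived marginal risks.

```python
# Hypothetical joint distribution of two binary outcomes (Y1, Y2) for a single
# patient, conditional on that patient's covariates. Illustrative numbers only.
joint = {
    (0, 0): 0.60,
    (0, 1): 0.10,
    (1, 0): 0.10,
    (1, 1): 0.20,
}


def marginal(joint_probs, index):
    """Marginal risk that outcome `index` equals 1."""
    return sum(p for outcome, p in joint_probs.items() if outcome[index] == 1)


p1 = marginal(joint, 0)            # about 0.30
p2 = marginal(joint, 1)            # about 0.30
product_of_marginals = p1 * p2     # about 0.09
true_joint_risk = joint[(1, 1)]    # 0.20 by construction

print(f"product of marginals: {product_of_marginals:.2f}")
print(f"true joint risk:      {true_joint_risk:.2f}")
```

Because the two outcomes are positively dependent in this toy table, multiplying the marginal risks understates the risk of both occurring; the two numbers would agree only if the outcomes were conditionally independent.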
Figures
None.
Tweets
MatthewSperrin: Preprint: Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches https://t.co/mbmdDlMMzO work with @glen_martin1 @Richard_D_Riley @Kym_Snell and @profbuchan - comments welcome!
StatsPapers: Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches. https://t.co/zyQT62VyBP
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

0.0 Mikeys
#2. Bayesian inference for treatment effects under nested subsets of controls
Spencer Woody, Carlos M. Carvalho, Jared S. Murray
When constructing a model to estimate the causal effect of a treatment, it is necessary to control for other factors which may have confounding effects. Because the ignorability assumption is not testable, however, it is usually unclear which set of controls is appropriate, and effect estimation is generally sensitive to this choice. A common approach in this case is to fit several models, each with a different set of controls, but it is difficult to reconcile inference under the multiple resulting posterior distributions for the treatment effect. Therefore we propose a two-stage approach to measure the sensitivity of effect estimation with respect to control specification. In the first stage, a model is fit with all available controls using a prior carefully selected to adjust for confounding. In the second stage, posterior distributions are calculated for the treatment effect under nested sets of controls by propagating posterior uncertainty in the original model. We demonstrate how our approach can be used to detect the most...
Figures
None.
Tweets
StatsPapers: Bayesian inference for treatment effects under nested subsets of controls. https://t.co/PTcwQhNOkK
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

0.0 Mikeys
#3. A Monte Carlo EM Algorithm for the Parameter Estimation of Aggregated Hawkes Processes
Leigh Shlomovich, Edward Cohen, Niall Adams, Lekha Patel
A key difficulty that arises from real event data is imprecision in the recording of event time-stamps. In many cases, retaining event times with a high precision is expensive due to the sheer volume of activity. Combined with practical limits on the accuracy of measurements, aggregated data is common. In order to use point processes to model such event data, tools for handling parameter estimation are essential. Here we consider parameter estimation of the Hawkes process, a type of self-exciting point process that has found application in the modeling of financial stock markets, earthquakes and social media cascades. We develop a novel optimization approach to parameter estimation of aggregated Hawkes processes using a Monte Carlo Expectation-Maximization (MC-EM) algorithm. Through a detailed simulation study, we demonstrate that existing methods are capable of producing severely biased and highly variable parameter estimates and that our novel MC-EM method significantly outperforms them in all studied circumstances. These...
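As a rough illustration of what "aggregated" Hawkes data looks like, the sketch below simulates a self-exciting process and then bins the event times into counts. It is not the paper's MC-EM estimator. The exponential kernel, the parameter names (`mu`, `alpha`, `beta`), and the use of Ogata's thinning method for simulation are all assumptions chosen for illustration.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Simulate event times on [0, horizon) by thinning (assumes alpha / beta < 1)."""
    events = []
    t = 0.0
    while True:
        # The intensity only decays until the next event, so its value just
        # after time t is an upper bound for the proposal step.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


def aggregate(events, horizon, width):
    """Replace exact time-stamps by counts per bin of the given width."""
    n_bins = math.ceil(horizon / width)
    counts = [0] * n_bins
    for s in events:
        counts[min(int(s // width), n_bins - 1)] += 1
    return counts


rng = random.Random(0)
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=10.0, rng=rng)
counts = aggregate(events, horizon=10.0, width=1.0)
print(len(events), counts)
```

Only `counts` would be available to an analyst working with aggregated data; the exact values in `events` are the latent quantities that an estimation method for aggregated Hawkes processes has to account for.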
Figures
None.
Tweets
eakcohen: New preprint from Leigh, another one of my brilliant PhD students. Understanding the repercussions of aggregating/binning event data and developing methodologies for handling data of this type is one of the areas my group is currently interested in https://t.co/LjD278fpiU https://t.co/s8ZVvbRdOZ
StatsPapers: A Monte Carlo EM Algorithm for the Parameter Estimation of Aggregated Hawkes Processes. https://t.co/5P48bpfIcR
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

0.0 Mikeys
#4. Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown
Wen Wei Loh, Beatrijs Moerkerke, Tom Loeys, Stijn Vansteelandt
With multiple potential mediators on the causal pathway from a treatment to an outcome, we consider the problem of decomposing the effects along multiple possible causal path(s) through each distinct mediator. Under Pearl's path-specific effects framework (Pearl, 2001; Avin et al., 2005), such fine-grained decompositions necessitate stringent assumptions, such as correctly specifying the causal structure among the mediators, and there being no unobserved confounding among the mediators. In contrast, interventional direct and indirect effects for multiple mediators (Vansteelandt and Daniel, 2017) can be identified under much weaker conditions, while providing scientifically relevant causal interpretations. Nonetheless, current estimation approaches require (correctly) specifying a model for the joint mediator distribution, which can be difficult when there is a high-dimensional set of possibly continuous and non-continuous mediators. In this article, we avoid the need for modeling this distribution, by building on a definition...
Figures
None.
Tweets
StatsPapers: Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown. https://t.co/35VbZZlRJt
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

0.0 Mikeys
#5. Understanding the stochastic partial differential equation approach to smoothing
David L Miller, Richard Glennie, Andrew E Seaton
Correlation and smoothness are terms used to describe a wide variety of random quantities. In time, space, and many other domains, they both imply the same idea: quantities that occur closer together are more similar than those further apart. Two popular statistical models that represent this idea are basis-penalty smoothers (Wood, 2017) and stochastic partial differential equations (SPDE) (Lindgren et al., 2011). In this paper, we discuss how the SPDE can be interpreted as a smoothing penalty and can be fitted using the R package mgcv, allowing practitioners with existing knowledge of smoothing penalties to better understand the implementation and theory behind the SPDE approach.
Figures
None.
Tweets
StatsPapers: Understanding the stochastic partial differential equation approach to smoothing. https://t.co/hc2lM27uK4
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 7608
Unique Words: 2248

0.0 Mikeys
#6. Maximum Likelihood Estimation of Spatially Varying Coefficient Models for Large Data with an Application to Real Estate Price Prediction
Jakob A. Dambon, Fabio Sigrist, Reinhard Furrer
In regression models for spatial data, it is often assumed that the marginal effects of covariates on the response are constant over space. In practice, this assumption might often be questionable. In this article, we show how a Gaussian process-based spatially varying coefficient (SVC) model can be estimated using maximum likelihood estimation (MLE). In addition, we present an approach that scales to large data by applying covariance tapering. We compare our methodology to existing methods such as a Bayesian approach using the stochastic partial differential equation (SPDE) link, geographically weighted regression (GWR), and eigenvector spatial filtering (ESF) in both a simulation study and an application where the goal is to predict prices of real estate apartments in Switzerland. The results from both the simulation study and application show that the MLE approach results in increased predictive accuracy and more precise estimates. Since we use a model-based approach, we can also provide predictive variances. In contrast to...
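The idea behind covariance tapering can be sketched as follows: a covariance function is multiplied elementwise by a compactly supported correlation function, so that pairs of locations farther apart than the taper range get an exact zero and the covariance matrix becomes sparse. The abstract does not say which covariance or taper function the authors use; the exponential covariance and the Wendland-1 taper below, as well as all function and parameter names, are assumptions for illustration.

```python
import math


def exponential_cov(h, variance, scale):
    """Exponential covariance at distance h (an assumed choice of kernel)."""
    return variance * math.exp(-h / scale)


def wendland1_taper(h, taper_range):
    """Wendland-1 taper: (1 - r)^4 * (1 + 4r) for r = h / taper_range < 1, else 0."""
    r = h / taper_range
    return (1 - r) ** 4 * (1 + 4 * r) if r < 1 else 0.0


def tapered_cov_matrix(points, variance, scale, taper_range):
    """Elementwise product of the covariance and the taper for all pairs of points."""
    matrix = []
    for p in points:
        row = []
        for q in points:
            h = math.dist(p, q)
            row.append(exponential_cov(h, variance, scale) * wendland1_taper(h, taper_range))
        matrix.append(row)
    return matrix


# Three hypothetical locations; the third is farther than the taper range
# from the other two, so its cross-covariances are set to exactly zero.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
cov = tapered_cov_matrix(points, variance=2.0, scale=1.0, taper_range=1.0)
for row in cov:
    print([round(value, 4) for value in row])
```

In practice the zeros would be stored in a sparse matrix format so that factorizations needed for the likelihood scale to large data; the dense list-of-lists here is only to keep the sketch dependency-free.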
Figures
None.
Tweets
StatsPapers: Maximum Likelihood Estimation of Spatially Varying Coefficient Models for Large Data with an Application to Real Estate Price Prediction. https://t.co/M6P301Bzuh
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

0.0 Mikeys
#7. A numerically stable algorithm for integrating Bayesian models using Markov melding
Andrew A. Manderson, Robert J. B. Goudie
When statistical analyses consider multiple data sources, Markov melding provides a method for combining the source-specific Bayesian models. Models often contain different quantities of information due to variation in the richness of model-specific data, or availability of model-specific prior information. We show that this can make the multi-stage Markov chain Monte Carlo sampler employed by Markov melding unstable and unreliable. We propose a robust multi-stage algorithm that estimates the required prior marginal self-density ratios using weighted samples, dramatically improving accuracy in the tails of the distribution, thus stabilising the algorithm and providing reliable inference. We demonstrate our approach using an evidence synthesis for inferring HIV prevalence, and an evidence synthesis of A/H1N1 influenza.
Figures
None.
Tweets
StatsPapers: A numerically stable algorithm for integrating Bayesian models using Markov melding. https://t.co/WEeo0BzJGt
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

0.0 Mikeys
#8. A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis
Christopher J. Urban, Daniel J. Bauer
Deep learning methods are the gold standard for non-linear statistical modeling in computer vision and in natural language processing but are rarely used in psychometrics. To bridge this gap, we present a novel deep learning algorithm for exploratory item factor analysis (IFA). Our approach combines a deep artificial neural network (ANN) model called a variational autoencoder (VAE) with recent work that uses regularization for exploratory factor analysis. We first provide overviews of ANNs and VAEs. We then describe how to conduct exploratory IFA with a VAE and demonstrate our approach in two empirical examples and in two simulated examples. Our empirical results were consistent with existing psychological theory across random starting values. Our simulations suggest that the VAE consistently recovers the data generating factor pattern with moderate-sized samples. Secondary loadings were underestimated with a complex factor structure and intercept parameter estimates were moderately biased with both simple and complex...
Figures
None.
Tweets
BrundageBot: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis. Christopher J. Urban and Daniel J. Bauer https://t.co/wlesX9lrXX
arxiv_cs_LG: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis. Christopher J. Urban and Daniel J. Bauer https://t.co/AWvFMsALQy
StatsPapers: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis. https://t.co/B4tvtnYArZ
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

0.0 Mikeys
#9. Knockoffs with Side Information
Zhimei Ren, Emmanuel Candès
We consider the problem of assessing the importance of multiple variables or factors from a dataset when side information is available. In principle, using side information can allow the statistician to pay attention to variables with a greater potential, which in turn, may lead to more discoveries. We introduce an adaptive knockoff filter, which generalizes the knockoff procedure (Barber and Candès, 2015; Candès et al., 2018) in that it uses both the data at hand and side information to adaptively order the variables under study and focus on those that are most promising. Adaptive knockoffs controls the finite-sample false discovery rate (FDR) and we demonstrate its power by comparing it with other structured multiple testing methods. We also apply our methodology to real genetic data in order to find associations between genetic variants and various phenotypes such as Crohn's disease and lipid levels. Here, adaptive knockoffs makes more discoveries than reported in previous studies on the same datasets.
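For readers unfamiliar with the baseline procedure that the adaptive filter generalizes, the sketch below shows the selection step of the standard knockoff+ filter of Barber and Candès (2015). It is not the adaptive method from this paper and uses no side information. It assumes the knockoff statistics `w` have already been computed (a large positive value suggests a real signal, and signs of null variables are symmetric); the function name and the example values are hypothetical.

```python
def knockoff_plus_select(w, q):
    """Select variables with the knockoff+ threshold at target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns the indices j with w_j >= t, or an empty list if no t qualifies.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []


# Made-up statistics for eight variables.
w = [5.0, 4.0, 3.5, -1.0, 2.0, 0.5, -0.3, 3.0]
selected = knockoff_plus_select(w, q=0.25)
print(selected)  # [0, 1, 2, 4, 7]
```

The ratio in the loop is an estimate of the false discovery proportion at threshold `t`: negative statistics stand in for false positives among the positive ones. As the abstract describes it, the adaptive variant changes the order in which variables are examined by using side information, rather than ordering by the magnitude of the statistics alone.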
Figures
None.
Tweets
StatsPapers: Knockoffs with Side Information. https://t.co/HZ2OylEJ01
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 257,103 papers.
