### Top 9 arXiv Papers Today in Methodology

##### #1. Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches
###### Glen P. Martin, Matthew Sperrin, Kym I. E. Snell, Iain Buchan, Richard D. Riley
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of multi-morbidity, there is growing need for CPMs to simultaneously predict risks for each of multiple future outcomes. A common approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop prognostic CPMs for multiple outcomes. We consider four methods, ranging in complexity and assumed conditional independence assumptions: namely, probabilistic classifier chain, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on...
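The point about conditional independence can be illustrated with a small arithmetic sketch. It is not one of the paper's methods. It only uses the identity that for two binary outcomes, P(Y1=1, Y2=1) = p1 * p2 + Cov(Y1, Y2), so multiplying separately predicted risks is exact only when the conditional covariance is zero. The function names and the numbers are hypothetical.

```python
def product_of_marginals(p1: float, p2: float) -> float:
    """Joint risk P(Y1=1, Y2=1 | x) if the outcomes are conditionally independent."""
    return p1 * p2


def joint_risk(p1: float, p2: float, cov: float) -> float:
    """Joint risk for two binary outcomes with conditional covariance `cov`.

    Uses P(Y1=1, Y2=1) = p1 * p2 + Cov(Y1, Y2), which holds for binary variables.
    """
    joint = p1 * p2 + cov
    lower = max(0.0, p1 + p2 - 1.0)  # Frechet lower bound on the joint probability
    upper = min(p1, p2)              # Frechet upper bound on the joint probability
    if not lower <= joint <= upper:
        raise ValueError("covariance is not feasible for these marginals")
    return joint


# Hypothetical risks for one patient (illustrative numbers, not from the paper).
p1, p2, cov = 0.20, 0.30, 0.04
naive = product_of_marginals(p1, p2)
true_joint = joint_risk(p1, p2, cov)
print(f"product of marginals: {naive:.3f}")
print(f"joint risk with dependence: {true_joint:.3f}")
```

With these assumed numbers the product of marginals is 0.06 while the joint risk is 0.10, so ignoring a positive dependence understates the risk of both outcomes occurring.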
###### Tweets
MatthewSperrin: Preprint: Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches https://t.co/mbmdDlMMzO work with @glen_martin1 @Richard_D_Riley @Kym_Snell and @profbuchan - comments welcome!
StatsPapers: Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches. https://t.co/zyQT62VyBP
###### Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

##### #2. Bayesian inference for treatment effects under nested subsets of controls
###### Spencer Woody, Carlos M. Carvalho, Jared S. Murray
When constructing a model to estimate the causal effect of a treatment, it is necessary to control for other factors which may have confounding effects. Because the ignorability assumption is not testable, however, it is usually unclear which set of controls is appropriate, and effect estimation is generally sensitive to this choice. A common approach in this case is to fit several models, each with a different set of controls, but it is difficult to reconcile inference under the multiple resulting posterior distributions for the treatment effect. Therefore we propose a two-stage approach to measure the sensitivity of effect estimation with respect to control specification. In the first stage, a model is fit with all available controls using a prior carefully selected to adjust for confounding. In the second stage, posterior distributions are calculated for the treatment effect under nested sets of controls by propagating posterior uncertainty in the original model. We demonstrate how our approach can be used to detect the most...
###### Tweets
StatsPapers: Bayesian inference for treatment effects under nested subsets of controls. https://t.co/PTcwQhNOkK
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #3. A Monte Carlo EM Algorithm for the Parameter Estimation of Aggregated Hawkes Processes
###### Leigh Shlomovich, Edward Cohen, Niall Adams, Lekha Patel
A key difficulty that arises from real event data is imprecision in the recording of event time-stamps. In many cases, retaining event times with a high precision is expensive due to the sheer volume of activity. Combined with practical limits on the accuracy of measurements, aggregated data is common. In order to use point processes to model such event data, tools for handling parameter estimation are essential. Here we consider parameter estimation of the Hawkes process, a type of self-exciting point process that has found application in the modeling of financial stock markets, earthquakes and social media cascades. We develop a novel optimization approach to parameter estimation of aggregated Hawkes processes using a Monte Carlo Expectation-Maximization (MC-EM) algorithm. Through a detailed simulation study, we demonstrate that existing methods are capable of producing severely biased and highly variable parameter estimates and that our novel MC-EM method significantly outperforms them in all studied circumstances. These...
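To make the setting concrete, the sketch below shows a Hawkes conditional intensity and what aggregating event times into bins looks like. It assumes an exponential excitation kernel, which the abstract does not specify, and it does not implement the MC-EM estimator. The function and parameter names (`mu`, `alpha`, `beta`, `bin_width`) are hypothetical.

```python
import math
from typing import List, Sequence


def hawkes_intensity(t: float, events: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity mu + sum_i alpha * exp(-beta * (t - t_i)) over t_i < t.

    The exponential kernel is an assumption made for this illustration.
    """
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)
    return mu + excitation


def aggregate_counts(events: Sequence[float], bin_width: float,
                     horizon: float) -> List[int]:
    """Replace exact event times in [0, horizon) with counts per bin."""
    n_bins = math.ceil(horizon / bin_width)
    counts = [0] * n_bins
    for s in events:
        if 0.0 <= s < horizon:
            # Guard the index against floating-point edge cases near the horizon.
            counts[min(int(s // bin_width), n_bins - 1)] += 1
    return counts


# Hypothetical event times; after aggregation only the counts are observed.
events = [0.1, 0.4, 1.2, 2.9]
counts = aggregate_counts(events, bin_width=1.0, horizon=3.0)
print(counts)
print(hawkes_intensity(1.0, events, mu=0.5, alpha=0.8, beta=1.0))
```

Once the data are reduced to `counts`, the exact times needed to evaluate the intensity are no longer available, which is the estimation problem the abstract describes.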
###### Tweets
eakcohen: New preprint from Leigh, another one of my brilliant PhD students. Understanding the repercussions of aggregating/binning event data and developing methodologies for handling data of this type is one of the areas my group is currently interested in https://t.co/LjD278fpiU https://t.co/s8ZVvbRdOZ
StatsPapers: A Monte Carlo EM Algorithm for the Parameter Estimation of Aggregated Hawkes Processes. https://t.co/5P48bpfIcR
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #4. Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown
###### Wen Wei Loh, Beatrijs Moerkerke, Tom Loeys, Stijn Vansteelandt
With multiple potential mediators on the causal pathway from a treatment to an outcome, we consider the problem of decomposing the effects along multiple possible causal path(s) through each distinct mediator. Under Pearl's path-specific effects framework (Pearl, 2001; Avin et al., 2005), such fine-grained decompositions necessitate stringent assumptions, such as correctly specifying the causal structure among the mediators, and there being no unobserved confounding among the mediators. In contrast, interventional direct and indirect effects for multiple mediators (Vansteelandt and Daniel, 2017) can be identified under much weaker conditions, while providing scientifically relevant causal interpretations. Nonetheless, current estimation approaches require (correctly) specifying a model for the joint mediator distribution, which can be difficult when there is a high-dimensional set of possibly continuous and non-continuous mediators. In this article, we avoid the need for modeling this distribution, by building on a definition...
###### Tweets
StatsPapers: Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown. https://t.co/35VbZZlRJt
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #5. Understanding the stochastic partial differential equation approach to smoothing
###### David L Miller, Richard Glennie, Andrew E Seaton
Correlation and smoothness are terms used to describe a wide variety of random quantities. In time, space, and many other domains, they both imply the same idea: quantities that occur closer together are more similar than those further apart. Two popular statistical models that represent this idea are basis-penalty smoothers (Wood, 2017) and stochastic partial differential equations (SPDE) (Lindgren et al., 2011). In this paper, we discuss how the SPDE can be interpreted as a smoothing penalty and can be fitted using the R package mgcv, allowing practitioners with existing knowledge of smoothing penalties to better understand the implementation and theory behind the SPDE approach.
###### Tweets
StatsPapers: Understanding the stochastic partial differential equation approach to smoothing. https://t.co/hc2lM27uK4
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 7608
Unique Words: 2248

##### #6. Maximum Likelihood Estimation of Spatially Varying Coefficient Models for Large Data with an Application to Real Estate Price Prediction
###### Jakob A. Dambon, Fabio Sigrist, Reinhard Furrer
In regression models for spatial data, it is often assumed that the marginal effects of covariates on the response are constant over space. In practice, this assumption might often be questionable. In this article, we show how a Gaussian process-based spatially varying coefficient (SVC) model can be estimated using maximum likelihood estimation (MLE). In addition, we present an approach that scales to large data by applying covariance tapering. We compare our methodology to existing methods such as a Bayesian approach using the stochastic partial differential equation (SPDE) link, geographically weighted regression (GWR), and eigenvector spatial filtering (ESF) in both a simulation study and an application where the goal is to predict prices of real estate apartments in Switzerland. The results from both the simulation study and application show that the MLE approach results in increased predictive accuracy and more precise estimates. Since we use a model-based approach, we can also provide predictive variances. In contrast to...
###### Tweets
StatsPapers: Maximum Likelihood Estimation of Spatially Varying Coefficient Models for Large Data with an Application to Real Estate Price Prediction. https://t.co/M6P301Bzuh
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #7. A numerically stable algorithm for integrating Bayesian models using Markov melding
###### Andrew A. Manderson, Robert J. B. Goudie
When statistical analyses consider multiple data sources, Markov melding provides a method for combining the source-specific Bayesian models. Models often contain different quantities of information due to variation in the richness of model-specific data, or availability of model-specific prior information. We show that this can make the multi-stage Markov chain Monte Carlo sampler employed by Markov melding unstable and unreliable. We propose a robust multi-stage algorithm that estimates the required prior marginal self-density ratios using weighted samples, dramatically improving accuracy in the tails of the distribution, thus stabilising the algorithm and providing reliable inference. We demonstrate our approach using an evidence synthesis for inferring HIV prevalence, and an evidence synthesis of A/H1N1 influenza.
###### Tweets
StatsPapers: A numerically stable algorithm for integrating Bayesian models using Markov melding. https://t.co/WEeo0BzJGt
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #8. A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis
###### Christopher J. Urban, Daniel J. Bauer
Deep learning methods are the gold standard for non-linear statistical modeling in computer vision and in natural language processing but are rarely used in psychometrics. To bridge this gap, we present a novel deep learning algorithm for exploratory item factor analysis (IFA). Our approach combines a deep artificial neural network (ANN) model called a variational autoencoder (VAE) with recent work that uses regularization for exploratory factor analysis. We first provide overviews of ANNs and VAEs. We then describe how to conduct exploratory IFA with a VAE and demonstrate our approach in two empirical examples and in two simulated examples. Our empirical results were consistent with existing psychological theory across random starting values. Our simulations suggest that the VAE consistently recovers the data generating factor pattern with moderate-sized samples. Secondary loadings were underestimated with a complex factor structure and intercept parameter estimates were moderately biased with both simple and complex...
###### Tweets
BrundageBot: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis. Christopher J. Urban and Daniel J. Bauer https://t.co/wlesX9lrXX
arxiv_cs_LG: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis. Christopher J. Urban and Daniel J. Bauer https://t.co/AWvFMsALQy
StatsPapers: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis. https://t.co/B4tvtnYArZ
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #9. Knockoffs with Side Information
###### Zhimei Ren, Emmanuel Candès
We consider the problem of assessing the importance of multiple variables or factors from a dataset when side information is available. In principle, using side information can allow the statistician to pay attention to variables with a greater potential, which in turn, may lead to more discoveries. We introduce an adaptive knockoff filter, which generalizes the knockoff procedure (Barber and Candès, 2015; Candès et al., 2018) in that it uses both the data at hand and side information to adaptively order the variables under study and focus on those that are most promising. Adaptive knockoffs controls the finite-sample false discovery rate (FDR) and we demonstrate its power by comparing it with other structured multiple testing methods. We also apply our methodology to real genetic data in order to find associations between genetic variants and various phenotypes such as Crohn's disease and lipid levels. Here, adaptive knockoffs makes more discoveries than reported in previous studies on the same datasets.
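For context, the selection step of the baseline (non-adaptive) knockoff+ filter that this work generalizes can be sketched as follows. This is not the adaptive procedure from the paper and uses no side information. It assumes the feature statistics W_j have already been computed, with large positive values indicating evidence against the null. The function name and the example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def knockoff_plus_select(w: Sequence[float], q: float) -> Tuple[float, List[int]]:
    """Baseline knockoff+ selection at target FDR level q.

    Finds the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    then selects every j with W_j >= t.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        n_neg = sum(1 for x in w if x <= -t)  # estimates the number of false discoveries
        n_pos = sum(1 for x in w if x >= t)   # number of discoveries at threshold t
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t, [j for j, x in enumerate(w) if x >= t]
    return math.inf, []  # no threshold meets the target level, so select nothing


# Hypothetical statistics for eight variables (illustrative values only).
w = [5.0, 4.0, 3.0, -1.0, 2.0, 6.0, 0.5, -0.3]
threshold, selected = knockoff_plus_select(w, q=0.25)
print(threshold, selected)
```

The baseline treats all variables symmetrically when scanning thresholds. The adaptive filter described in the abstract instead uses side information to decide the order in which variables are examined.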
###### Tweets
StatsPapers: Knockoffs with Side Information. https://t.co/HZ2OylEJ01
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 257,103 papers.
