Top 10 arXiv Papers Today in Methodology


2.027 Mikeys
#1. A causal inference framework for cancer cluster investigations using publicly available data
Rachel C. Nethery, Yue Yang, Anna J. Brown, Francesca Dominici
Often, a community becomes alarmed when high rates of cancer are noticed, and residents suspect that the cancer cases could be caused by a known source of hazard. In response, the CDC recommends that departments of health perform a standardized incidence ratio (SIR) analysis to determine whether the observed cancer incidence is higher than expected. This approach has several limitations that are well documented in the literature. In this paper we propose a novel causal inference approach to cancer cluster investigations, rooted in the potential outcomes framework. Assuming that a source of hazard representing a potential cause of increased cancer rates in the community is identified a priori, we introduce a new estimand called the causal SIR (cSIR). The cSIR is a ratio defined as the expected cancer incidence in the exposed population divided by the expected cancer incidence under the (counterfactual) scenario of no exposure. To estimate the cSIR we need to overcome two main challenges: 1) identify unexposed populations that are...
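The two ratios named in this abstract can be sketched in a few lines. This is only an illustration of the definitions as stated above, not the authors' estimation procedure: the function names and all numbers are hypothetical, and the hard part the paper addresses (estimating the counterfactual incidence) is assumed to be done already.

```python
def standardized_incidence_ratio(observed_cases, expected_cases):
    """Classical SIR: observed case count divided by the expected count."""
    if expected_cases <= 0:
        raise ValueError("expected_cases must be positive")
    return observed_cases / expected_cases


def causal_sir(expected_under_exposure, expected_under_no_exposure):
    """cSIR: expected incidence in the exposed population divided by the
    expected incidence under the counterfactual scenario of no exposure."""
    if expected_under_no_exposure <= 0:
        raise ValueError("expected_under_no_exposure must be positive")
    return expected_under_exposure / expected_under_no_exposure


# Made-up numbers, for illustration only.
print(standardized_incidence_ratio(30, 20))  # ratio above 1: more cases than expected
print(causal_sir(30.0, 24.0))                # ratio above 1: exposure raises expected incidence
```

A value above 1 in either ratio indicates elevated incidence; the difference lies in what the denominator represents (a reference-population expectation versus a no-exposure counterfactual).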
Figures
None.
Tweets
StatsPapers: A causal inference framework for cancer cluster investigations using publicly available data. https://t.co/hrCpwGRhNY
dizzy_my_future: RT @StatsPapers: A causal inference framework for cancer cluster investigations using publicly available data. https://t.co/hrCpwGRhNY
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 11373
Unique Words: 2763

2.01 Mikeys
#2. Sparsely Observed Functional Time Series: Estimation and Prediction
Tomáš Rubín, Victor M. Panaretos
Functional time series analysis, whether based on time or frequency domain methodology, has traditionally been carried out under the assumption of complete observation of the constituent series of curves, assumed stationary. Nevertheless, as is often the case with independent functional data, it may well happen that the data available to the analyst are not the actual sequence of curves, but relatively few and noisy measurements per curve, potentially at different locations in each curve's domain. Under this sparse sampling regime, neither the established estimators of the time series' dynamics, nor their corresponding theoretical analysis will apply. The subject of this paper is to tackle the problem of estimating the dynamics and of recovering the latent process of smooth curves in the sparse regime. Assuming smoothness of the latent curves, we construct a consistent nonparametric estimator of the series' spectral density operator and use it to develop a frequency-domain recovery approach that predicts the latent curve at a given...
Figures
None.
Tweets
StatsPapers: Sparsely Observed Functional Time Series: Estimation and Prediction. https://t.co/zSi6YBbZ6c
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 23675
Unique Words: 4165

2.01 Mikeys
#3. Minimax Posterior Convergence Rates and Model Selection Consistency in High-dimensional DAG Models based on Sparse Cholesky Factors
Kyoungjae Lee, Jaeyong Lee, Lizhen Lin
In this paper, we study the high-dimensional sparse directed acyclic graph (DAG) models under the empirical sparse Cholesky prior. Among our results, strong model selection consistency or graph selection consistency is obtained under more general conditions than those in the existing literature. Compared to Cao, Khare and Ghosh (2017), the required conditions are weakened in terms of the dimensionality, sparsity and lower bound of the nonzero elements in the Cholesky factor. Furthermore, our result does not require the irrepresentable condition, which is necessary for Lasso type methods. We also derive the posterior convergence rates for precision matrices and Cholesky factors with respect to various matrix norms. The obtained posterior convergence rates are the fastest among those of the existing Bayesian approaches. In particular, we prove that our posterior convergence rates for Cholesky factors are the minimax or at least nearly minimax depending on the relative size of true sparseness for the entire dimension. The simulation...
Figures
None.
Tweets
StatsPapers: Minimax Posterior Convergence Rates and Model Selection Consistency in High-dimensional DAG Models based on Sparse Cholesky Factors. https://t.co/T0AjMd97lK
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 21594
Unique Words: 3237

2.01 Mikeys
#4. Quantile Regression Modeling of Recurrent Event Risk
Huijuan Ma, Limin Peng, Chiung-Yu Huang, Haoda Fu
Progression of chronic disease is often manifested by repeated occurrences of disease-related events over time. Delineating the heterogeneity in the risk of such recurrent events can provide valuable scientific insight for guiding customized disease management. In this paper, we present a new modeling framework for recurrent event data, which renders a flexible and robust characterization of individual multiplicative risk of recurrent event through quantile regression that accommodates both observed covariates and unobservable frailty. The proposed modeling requires no distributional specification of the unobservable frailty, while permitting the exploration of dynamic covariate effects. We develop estimation and inference procedures for the proposed model through a novel adaptation of the principle of conditional score. The asymptotic properties of the proposed estimator, including the uniform consistency and weak convergence, are established. Extensive simulation studies demonstrate satisfactory finite-sample performance of the...
Figures
None.
Tweets
StatsPapers: Quantile Regression Modeling of Recurrent Event Risk. https://t.co/Gjtsxzkswj
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 12999
Unique Words: 2592

2.001 Mikeys
#5. What is really needed to justify ignoring the response mechanism for modelling purposes?
John C Galati
With incomplete data, the standard argument for when the response mechanism can be ignored for modelling purposes requires that realised Missing at Random (MAR) holds for each density in the model and that distinctness of parameters holds for the model's parameter space. We explain why the distinctness of parameters criterion is too general because it allows the validity of an analysis to be determined by a factor different from any of (i) the observed data, (ii) the likelihood used to analyse the data and (iii) the analyst's assumptions about the underlying data generation process. We further explain why realised MAR alone, when applied appropriately, provides sufficient justification for ignoring the response mechanism when making direct likelihood inferences from incomplete data.
Figures
None.
Tweets
StatsPapers: What is really needed to justify ignoring the response mechanism for modelling purposes?. https://t.co/VqRb5qevHy
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 1833
Unique Words: 571

2.001 Mikeys
#6. An Overview of Semiparametric Extensions of Finite Mixture Models
Sijia Xiang, Weixin Yao, Guangren Yang
Finite mixture models have been a very important tool for exploring complex data structures in many scientific areas, for example, economics, epidemiology, and finance. In the past decade, semiparametric techniques have been widely introduced into traditional finite mixture models, and so semiparametric mixture models have experienced exciting development in methodologies, theories and applications. In this article, we provide a selective overview of newly developed semiparametric mixture models and discuss their estimation methodologies, theoretical properties where applicable, and some open questions. Recent developments are also discussed.
Figures
None.
Tweets
StatsPapers: An Overview of Semiparametric Extensions of Finite Mixture Models. https://t.co/rRwtcMMQWG
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 10701
Unique Words: 2606

2.0 Mikeys
#7. The Augmented Synthetic Control Method
Eli Ben-Michael, Avi Feller, Jesse Rothstein
The synthetic control method (SCM) is a popular approach for estimating the impact of a treatment on a single unit in panel data settings. The "synthetic control" is a weighted average of control units that balances the treated unit's pre-treatment outcomes as closely as possible. The curse of dimensionality, however, means that SCM does not generally achieve exact balance, which can bias the SCM estimate. We propose an extension, Augmented SCM, which uses an outcome model to estimate the bias due to covariate imbalance and then de-biases the original SCM estimate, analogous to bias correction for inexact matching. We motivate this approach by showing that SCM is a (regularized) inverse propensity score weighting estimator, with pre-treatment outcomes as covariates and a ridge penalty on the propensity score coefficients. We give theoretical guarantees for specific cases and propose a new inference procedure. We demonstrate gains from Augmented SCM with extensive simulation studies and apply this framework to canonical...
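The de-biasing step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the augsynth package's API: the SCM weights are taken as given rather than optimized, the function names are hypothetical, and `outcome_model` is a stand-in for any fitted model that maps a unit's pre-treatment outcomes to a predicted post-treatment outcome.

```python
def scm_estimate(weights, control_post):
    """Synthetic control: weighted average of control units' post-treatment outcomes."""
    return sum(w * y for w, y in zip(weights, control_post))


def augmented_scm_estimate(weights, control_post, treated_pre, control_pre, outcome_model):
    """SCM estimate plus an outcome-model estimate of the bias from imbalance.

    The correction is the model's prediction for the treated unit minus the
    weighted average of its predictions for the control units.
    """
    base = scm_estimate(weights, control_post)
    predicted_treated = outcome_model(treated_pre)
    predicted_synthetic = sum(w * outcome_model(x) for w, x in zip(weights, control_pre))
    return base + (predicted_treated - predicted_synthetic)


# Made-up data: two control units, two pre-treatment periods.
weights = [0.5, 0.5]                     # assumed given, non-negative, summing to 1
control_pre = [[1.0, 2.0], [3.0, 4.0]]   # weighted average is [2.0, 3.0]
treated_pre = [2.5, 3.5]                 # not exactly matched by the synthetic control
control_post = [3.0, 5.0]


def toy_model(pre_outcomes):
    # Hypothetical outcome model: last pre-treatment outcome plus a unit trend.
    return pre_outcomes[-1] + 1.0


print(scm_estimate(weights, control_post))
print(augmented_scm_estimate(weights, control_post, treated_pre, control_pre, toy_model))
```

When the weights balance the pre-treatment outcomes exactly, the correction term is zero for a linear model and the two estimates coincide; the adjustment only matters when balance is inexact.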
Figures
None.
Tweets
F_Bethke: New paper explaining the "Augmented Synthetic Control Method" by Ben-Michael et al. Also provides new augsynth #rstats package. https://t.co/2Mok5we0bQ #DataScience #Econometrics
StatsPapers: The Augmented Synthetic Control Method. https://t.co/L1Yigm3QJV
econometriclub: RT @F_Bethke: New paper explaining the "Augmented Synthetic Control Method" by Ben-Michael et al. Also provides new augsynth #rstats package. https://t.co/HrNqJVLnN7 #DataScience #Econometrics
afranks53: RT @StatsPapers: The Augmented Synthetic Control Method. https://t.co/L1Yigm3QJV
lihua_lei_stat: RT @StatsPapers: The Augmented Synthetic Control Method. https://t.co/L1Yigm3QJV
ehkonomulka: RT @StatsPapers: The Augmented Synthetic Control Method. https://t.co/L1Yigm3QJV
Github

Augmented Synthetic Control Method

Repository: augsynth
User: ebenmichael
Language: R
Stargazers: 2
Subscribers: 0
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 19346
Unique Words: 3589

1.972 Mikeys
#8. Kernel Smoothing of the Treatment Effect CDF
Jonathan Levy, Mark van der Laan
We provide a CV-TMLE estimator for a kernel smoothed version of the cumulative distribution of the random variable giving the treatment effect or so-called blip for a randomly drawn individual. We must first assume the treatment effect or so-called blip distribution is continuous. We then derive the efficient influence curve of the kernel smoothed version of the blip CDF. Our CV-TMLE estimator is asymptotically efficient under two conditions, one of which involves a second order remainder term which, in this case, shows us that knowledge of the treatment mechanism does not guarantee a consistent estimate. The remainder term also teaches us exactly how well we need to estimate the nuisance parameters to guarantee asymptotic efficiency. Through simulations we verify theoretical properties of the estimator and show the importance of machine learning over conventional regression approaches to fitting the nuisance parameters. We also derive the bias and variance of the estimator, the orders of which are analogous to a kernel density...
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 5583
Unique Words: 1559

0.0 Mikeys
#9. Model selection by minimum description length: Lower-bound sample sizes for the Fisher information approximation
Daniel W. Heck, Morten Moshagen, Edgar Erdfelder
The Fisher information approximation (FIA) is an implementation of the minimum description length principle for model selection. Unlike information criteria such as AIC or BIC, it has the advantage of taking the functional form of a model into account. Unfortunately, FIA can be misleading in finite samples, resulting in an inversion of the correct rank order of complexity terms for competing models in the worst case. As a remedy, we propose a lower-bound $N'$ for the sample size that suffices to preclude such errors. We illustrate the approach using three examples from the family of multinomial processing tree models.
Figures
None.
Tweets
StatsPapers: Model selection by minimum description length: Lower-bound sample sizes for the Fisher information approximation. https://t.co/GFHdVXszYi
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 4534
Unique Words: 1352

0.0 Mikeys
#10. Improving Safety of the Continual Reassessment Method via a Modified Allocation Rule
Pavel Mozgunov, Thomas Jaki
This paper proposes a novel criterion for the allocation of patients in Phase I dose-escalation clinical trials aiming to find the maximum tolerated dose (MTD). Conventionally, using a model-based approach the next patient is allocated to the dose with the toxicity estimate closest (in terms of the absolute or squared distance) to the maximum acceptable toxicity. This approach, however, ignores the uncertainty in point estimates and ethical concerns of assigning a lot of patients to overly toxic doses. Motivated by recent discussions in the theory of estimation in restricted parameter spaces, we propose a criterion which accounts for both of these issues. The criterion requires a specification of one additional parameter only which has a simple and intuitive interpretation. We incorporate the proposed criterion into the one-parameter Bayesian continual reassessment method (CRM) and show, using simulations, that it results in the same proportion of correct selections on average as the original design, but in fewer mean number of...
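The conventional allocation rule that this abstract takes as its starting point can be sketched as below. This shows only the baseline rule (closest point estimate to the target), not the modified criterion the paper proposes, whose form is not given here. The function name and the toxicity estimates are hypothetical.

```python
def next_dose(toxicity_estimates, target_toxicity):
    """Return the index of the dose whose estimated toxicity probability is
    closest to the maximum acceptable toxicity.

    Absolute distance is used; squared distance selects the same dose because
    squaring preserves the ordering of non-negative distances.
    """
    if not toxicity_estimates:
        raise ValueError("at least one dose is required")
    return min(
        range(len(toxicity_estimates)),
        key=lambda i: abs(toxicity_estimates[i] - target_toxicity),
    )


# Made-up point estimates for five dose levels and a 25% target.
estimates = [0.05, 0.12, 0.22, 0.38, 0.55]
print(next_dose(estimates, 0.25))  # index of the selected dose (0-based)
```

Note that the rule looks only at point estimates: a dose estimated at 0.22 with wide uncertainty is treated the same as one estimated at 0.22 with narrow uncertainty, which is the limitation the abstract highlights.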
Figures
None.
Tweets
StatsPapers: Improving Safety of the Continual Reassessment Method via a Modified Allocation Rule. https://t.co/ZhXd1pgLtz
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 8371
Unique Words: 2099

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter activity).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 57,756 papers.
