Top 10 arXiv Papers Today in Methodology


2.04 Mikeys
#1. Semantic and Cognitive Tools to Aid Statistical Inference: Replace Confidence and Significance by Compatibility and Surprise
Zad R. Chow, Sander Greenland
Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review some simple proposals to aid researchers in interpreting statistical outputs. These proposals emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. The latter treat statistics as referring to targeted hypotheses conditional on background assumptions. In contrast, we advise reinterpretation of P-values and interval estimates in unconditional terms, in which they describe compatibility of data with the entire set of analysis assumptions. We use the Shannon transform of the P-value $p$, also known as the surprisal or S-value $s=-\log(p)$, to provide a measure of the information supplied by the testing procedure against these...
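A minimal sketch of the S-value transform mentioned in the abstract. It assumes base-2 logarithms, so that surprisal is measured in bits (the excerpt above does not state the base), and the helper name `s_value` is hypothetical:

```python
import math


def s_value(p: float) -> float:
    """Return the surprisal log2(1/p), in bits, for a P-value p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # log2(1/p) equals -log2(p); this form avoids printing -0.0 when p == 1.
    return math.log2(1.0 / p)


if __name__ == "__main__":
    for p in (0.05, 0.25, 1.0):
        print(f"p = {p:<4} -> s = {s_value(p):.2f} bits")
```

Under this base-2 assumption, p = 0.05 carries roughly 4.3 bits of information against the tested model, p = 0.25 carries exactly 2 bits, and p = 1 carries none.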
Tweets
dailyzad: Also, S-values are one of the topics we do a deep dive on in the first of our recent pair of papers about improving statistical interpretations https://t.co/Cyv3u2knX7 https://t.co/C9tb8NnEjT
dailyzad: In paper 1, we discuss: - A comprehensive discussion of P-value issues and their reconciliation with S-values - Testing alternatives rather than just the null - Graphical functions/tables to present alternative results 2/7 https://t.co/Cyv3u2knX7 https://t.co/EBWhPPpVpG
dailyzad: THREAD Happy to say that two papers by @Lester_Domes and I, on how we all can improve statistical teaching, reviewing, and practice via cognitive/semantic tools are up on arXiv 1/7 1: https://t.co/Cyv3u2knX7 2: https://t.co/la8HpJXmMr #statstwitter #epitwitter #datascience https://t.co/ghtZCXITZm
rtorkar: Very nice paper by Zad R. Chow and @Lester_Domes where they discuss the S-value, i.e., S=-log(p). I will introduce this in a course since I believe it conceptually makes the whole p-value thingy easier to understand! https://t.co/fZzXihrxs1 Quotes: 1/3
Mr1Paleo: #RT @PaleoFoundation: RT @dailyzad: THREAD Happy to say that two papers by @Lester_Domes and I, on how we all can improve statistical teaching, reviewing, and practice via cognitive/semantic tools are up on arXiv 1/7 1: https://t.co/LNd26oVyfx 2: … https://t.co/BoT9uaNb1L
BioPapers: Semantic and Cognitive Tools to Aid Statistical Inference: Replace Confidence and Significance by Compatibility and Surprise. https://t.co/T6zGsmOSaV
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 8692
Unique Words: 3076

2.027 Mikeys
#2. Bayesian Analysis of Multidimensional Functional Data
John Shamshoian, Damla Senturk, Shafali Jeste, Donatello Telesca
Multi-dimensional functional data arises in numerous modern scientific experimental and observational studies. In this paper we focus on longitudinal functional data, a structured form of multidimensional functional data. Operating within a longitudinal functional framework we aim to capture low dimensional interpretable features. We propose a computationally efficient nonparametric Bayesian method to simultaneously smooth observed data, estimate conditional functional means and functional covariance surfaces. Statistical inference is based on Monte Carlo samples from the posterior measure through adaptive blocked Gibbs sampling. Several operative characteristics associated with the proposed modeling framework are assessed comparatively in a simulated environment. We illustrate the application of our work in two case studies. The first case study involves age-specific fertility collected over time for various countries. The second case study is an implicit learning experiment in children with Autism Spectrum Disorder (ASD).
Tweets
StatsPapers: Bayesian Analysis of Multidimensional Functional Data. https://t.co/VVRfsekO2H
389jan: RT @StatsPapers: Bayesian Analysis of Multidimensional Functional Data. https://t.co/VVRfsekO2H
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.009 Mikeys
#3. To Aid Statistical Inference, Emphasize Unconditional Descriptions of Statistics
Sander Greenland, Zad R. Chow
We have elsewhere reviewed proposals to reform terminology and improve interpretations of conventional statistics by emphasizing logical and information concepts over probability concepts. We here give detailed reasons and methods for reinterpreting statistics (including but not limited to) P-values and interval estimates in unconditional terms, which describe compatibility of observations with an entire set of analysis assumptions, rather than just a narrow target hypothesis. Such reinterpretations help avoid overconfident inferences whenever there is uncertainty about the assumptions used to derive and compute the statistical results. Examples of such assumptions include not only standard statistical modeling assumptions, but also assumptions about absence of systematic errors, protocol violations, and data corruption. Unconditional descriptions introduce uncertainty about such assumptions directly into statistical presentations of results, rather than leaving that only to the informal discussion that ensues. We thus...
Tweets
dailyzad: In paper 2: - Why unconditional interpretations of statistics need to be emphasized - Why terminology change is needed for reform - How discussion needs to move on to decisions and their costs 3/7 https://t.co/la8HpJXmMr https://t.co/nuQErXTwja
BioPapers: To Aid Statistical Inference, Emphasize Unconditional Descriptions of Statistics. https://t.co/vJ9a7bGSBK
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

2.001 Mikeys
#4. Conditional Information and Inference in Response-Adaptive Allocation Designs
Adam Lane
Response-adaptive allocation designs refer to a class of designs where the probability an observation is assigned to a treatment is changed throughout an experiment based on the accrued responses. Such procedures result in random treatment sample sizes. Most of the current literature considers unconditional inference procedures in the analysis of response-adaptive allocation designs. The focus of this work is inference conditional on the observed treatment sample sizes. The inverse of information is a description of the large sample variance of the parameter estimates. A simple form for the conditional information relative to unconditional information is derived. It is found that conditional information can be greater than unconditional information. It is also shown that the variance of the conditional maximum likelihood estimate can be less than the variance of the unconditional maximum likelihood estimate. Finally, a conditional bootstrap procedure is developed that, in the majority of cases examined, resulted in narrower...
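To illustrate why such designs produce random treatment sample sizes, here is a minimal simulation sketch. The allocation rule (assign arm A with probability proportional to its smoothed success count), the function name `simulate_allocation`, and the success probabilities are all illustrative assumptions, not the designs or the conditional inference procedure studied in the paper:

```python
import random


def simulate_allocation(n, p_success, seed=0):
    """Simulate n patients under a simple response-adaptive rule.

    p_success maps arm name ("A" or "B") to its assumed success probability.
    Returns the number of patients assigned to each arm.
    """
    rng = random.Random(seed)
    successes = {"A": 0, "B": 0}
    counts = {"A": 0, "B": 0}
    for _ in range(n):
        # Allocation probability depends on the responses accrued so far.
        prob_a = (successes["A"] + 1) / (successes["A"] + successes["B"] + 2)
        arm = "A" if rng.random() < prob_a else "B"
        counts[arm] += 1
        if rng.random() < p_success[arm]:
            successes[arm] += 1
    return counts


if __name__ == "__main__":
    # Different seeds give different arm sizes, even with the same settings.
    for seed in range(3):
        print(seed, simulate_allocation(100, {"A": 0.7, "B": 0.3}, seed=seed))
```

Because the arm sizes vary from run to run, an analysis can either average over that variation (unconditional inference) or condition on the sizes actually observed, which is the focus of the abstract.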
Tweets
StatsPapers: Conditional Information and Inference in Response-Adaptive Allocation Designs. https://t.co/L4ckzwFtYR
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 13161
Unique Words: 2676

2.001 Mikeys
#5. Regularization in Generalized Semiparametric Mixed-Effects Model for Longitudinal Data
M. Taavoni, M. Arashi
This paper considers the problem of simultaneous variable selection and estimation in the generalized semiparametric mixed-effects model for longitudinal data when the number of parameters diverges with the sample size. A penalization type of generalized estimating equation method is proposed while using regression spline to approximate the nonparametric component. Our approach applies SCAD to the estimating equation objective function in order to simultaneously estimate parameters and select the important variables. The proposed procedure involves the specification of the posterior distribution of the random effects, which cannot be evaluated in a closed form. However, it is possible to approximate this posterior distribution by producing random draws from the distribution using a Metropolis algorithm, which does not require the specification of the posterior distribution. For practical implementation, we develop an appropriate iterative algorithm to select the significant variables and estimate the nonzero coefficient functions....
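For reference, the SCAD penalty named in the abstract can be sketched as follows. This shows only the standard penalty function, not the authors' penalized estimating-equation procedure; the function name `scad_penalty` and the default a = 3.7 (a value commonly used in the SCAD literature) are assumptions made for illustration:

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Evaluate the SCAD penalty at coefficient theta with tuning parameter lam."""
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the lasso (linear) penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for theta in (0.5, 2.0, 10.0):
        print(theta, scad_penalty(theta, lam=1.0))
```

The flat region for large coefficients is what lets SCAD select variables without heavily shrinking the large effects it keeps.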
Tweets
StatsPapers: Regularization in Generalized Semiparametric Mixed-Effects Model for Longitudinal Data. https://t.co/cIHT1p54tZ
Other stats
Sample Sizes : [50, 100]
Authors: 2
Total Words: 11317
Unique Words: 2821

2.001 Mikeys
#6. Interpretable Principal Components Analysis for Multilevel Multivariate Functional Data, with Application to EEG Experiments
Jun Zhang, Greg J Siegle, Wendy D'Andrea, Robert T Krafty
Many studies collect functional data from multiple subjects that have both multilevel and multivariate structures. An example of such data comes from popular neuroscience experiments where participants' brain activity is recorded using modalities such as EEG and summarized as power within multiple time-varying frequency bands within multiple electrodes, or brain regions. Summarizing the joint variation across multiple frequency bands for both whole-brain variability between subjects, as well as location-variation within subjects, can help to explain neural reactions to stimuli. This article introduces a novel approach to conducting interpretable principal components analysis on multilevel multivariate functional data that decomposes total variation into subject-level and replicate-within-subject-level (i.e. electrode-level) variation, and provides interpretable components that can be both sparse among variates (e.g. frequency bands) and have localized support over time within each frequency band. The sparsity and localization...
Tweets
StatsPapers: Interpretable Principal Components Analysis for Multilevel Multivariate Functional Data, with Application to EEG Experiments. https://t.co/WyaZXgjsBt
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.0 Mikeys
#7. Improved Semiparametric Analysis of Polygenic Gene-Environment Interactions in Case-Control Studies
Tianying Wang, Alex Asher
Standard logistic regression analysis of case-control data has low power to detect gene-environment interactions, but until recently it was the only method that could be used on complex polygenic data for which parametric distributional models are not feasible. Under the assumption of gene-environment independence in the underlying population, Stalder et al. (2017, Biometrika, 104, 801-812) developed a retrospective method that treats both genetic and environmental variables nonparametrically. However, the mathematical symmetry of genetic and environmental variables is overlooked. We propose an improvement to the method of Stalder et al. (2017) that increases the efficiency of the estimates with no additional assumptions and modest computational cost. This improvement is achieved by treating the genetic and environmental variables symmetrically to generate two sets of parameter estimates that are combined to generate a more efficient estimate. We employ a semiparametric framework to develop the asymptotic theory of the estimator,...
Github

R package caseControlGE

Repository: caseControlGE
User: alexasher
Language: R
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 9089
Unique Words: 2283

1.997 Mikeys
#8. Multiclass classification of growth curves using random change points and heterogeneous random effects
Vincent Chin, Jarod Y. L. Lee, Louise M. Ryan, Robert Kohn, Scott A. Sisson
Faltering growth among children is a nutritional problem prevalent in low to medium income countries; it is generally defined as a slower rate of growth compared to a reference healthy population of the same age and gender. As faltering is closely associated with reduced physical, intellectual and economic productivity potential, it is important to identify faltered children and be able to characterise different growth patterns so that targeted treatments can be designed and administered. We introduce a multiclass classification model for growth trajectory that flexibly extends a current classification approach called the broken stick model, which is a piecewise linear model with breaks at fixed knot locations. Heterogeneity in growth patterns among children is captured using mixture distributed random effects, whereby the mixture components determine the classification of children into subgroups. The mixture distribution is modelled using a Dirichlet process prior, which avoids the need to choose the "true" number of mixture...
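To make the baseline concrete: a broken stick model is a piecewise linear curve with breaks at fixed knots. Below is a minimal sketch using a truncated-line basis. The function name `broken_stick`, the knot locations, and the coefficients are made up for illustration, and the paper's extension (random change points and mixture-distributed random effects) is not shown:

```python
def broken_stick(age, intercept, slope, knots, slope_changes):
    """Evaluate a piecewise linear growth curve at a given age.

    knots are the fixed break locations; slope_changes[i] is the change
    in slope that takes effect once age passes knots[i].
    """
    value = intercept + slope * age
    for knot, delta in zip(knots, slope_changes):
        # Each term is zero before its knot and linear after it.
        value += delta * max(age - knot, 0.0)
    return value


if __name__ == "__main__":
    # Hypothetical curve: growth slows at 6 and again at 12 months.
    for age in (0, 6, 12, 18):
        print(age, broken_stick(age, 50.0, 2.0, [6.0, 12.0], [-1.0, -0.5]))
```

With fixed knots, every child's curve bends at the same ages; the abstract describes relaxing exactly this restriction.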
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 8532
Unique Words: 2449

1.997 Mikeys
#9. A goodness-of-fit test for the functional linear model with functional response
Eduardo García-Portugués, Javier Álvarez-Liébana, Gonzalo Álvarez-Pérez, Wenceslao González-Manteiga
The Functional Linear Model with Functional Response (FLMFR) is one of the most fundamental models to assess the relation between two functional random variables. In this paper, we propose a novel goodness-of-fit test for the FLMFR against a general, unspecified, alternative. The test statistic is formulated in terms of a Cramér-von Mises norm over a doubly-projected empirical process which, using geometrical arguments, yields an easy-to-compute weighted quadratic norm. A resampling procedure calibrates the test through a wild bootstrap on the residuals and the use of convenient computational procedures. As a sideways contribution, and since the statistic requires a reliable estimator of the FLMFR, we discuss and compare several regularized estimators, providing a new one specifically convenient for our test. The finite sample behavior of the test, regarding power and size, is illustrated via a complete simulation study. Also, the new proposal is compared with previous significance tests. Two novel real datasets illustrate the...
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

1.997 Mikeys
#10. Efficient and Robust Estimation of Linear Regression with Normal Errors
Alain Desgagné
Linear regression with normally distributed errors - including particular cases such as ANOVA, Student's t-test or location-scale inference - is a widely used statistical procedure. In this case the ordinary least squares estimator possesses remarkable properties but is very sensitive to outliers. Several robust alternatives have been proposed, but there is still significant room for improvement. This paper thus proposes an original method of estimation that offers the best efficiency simultaneously in the absence and the presence of outliers, both for the estimation of the regression coefficients and the scale parameter. The approach first consists in broadening the normal assumption of the errors to a mixture of the normal and the filtered-log-Pareto (FLP), an original distribution designed to represent the outliers. The expectation-maximization (EM) algorithm is then adapted and we obtain the N-FLP estimators of the regression coefficients, the scale parameter and the proportion of outliers, along with probabilities of each...
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 8684
Unique Words: 1969

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 192,930 papers.
