### Top 10 arXiv Papers Today in Methodology

##### #1. Semantic and Cognitive Tools to Aid Statistical Inference: Replace Confidence and Significance by Compatibility and Surprise
###### Zad R. Chow, Sander Greenland
Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review some simple proposals to aid researchers in interpreting statistical outputs. These proposals emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. The latter treat statistics as referring to targeted hypotheses conditional on background assumptions. In contrast, we advise reinterpretation of P-values and interval estimates in unconditional terms, in which they describe compatibility of data with the entire set of analysis assumptions. We use the Shannon transform of the P-value $p$, also known as the surprisal or S-value $s=-\log(p)$, to provide a measure of the information supplied by the testing procedure against these...
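As a rough illustration of the S-value mentioned in the abstract above, the following Python sketch computes the surprisal of a P-value. The function name `s_value` and the base-2 default (which expresses the result in bits) are assumptions made for this example; the abstract itself does not state a logarithm base.

```python
import math


def s_value(p: float, base: float = 2.0) -> float:
    """Return the surprisal s = -log_base(p) of a P-value.

    The base-2 default is an assumption for illustration; it gives
    the result in bits.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    # log(1/p) equals -log(p) and avoids returning -0.0 when p == 1.
    return math.log(1.0 / p) / math.log(base)


if __name__ == "__main__":
    for p in (0.5, 0.25, 0.05, 0.005):
        print(f"p = {p:<6} s = {s_value(p):.2f} bits")
```

With base 2, a P-value of 0.05 corresponds to about 4.3 bits, and smaller P-values give larger S-values.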
###### Tweets
dailyzad: Also, S-values are one of the topics we do a deep dive on in the first of our recent pair of papers about improving statistical interpretations https://t.co/Cyv3u2knX7 https://t.co/C9tb8NnEjT
dailyzad: In paper 1, we discuss: - A comprehensive discussion of P-value issues and their reconciliation with S-values - Testing alternatives rather than just the null - Graphical functions/tables to present alternative results 2/7 https://t.co/Cyv3u2knX7 https://t.co/EBWhPPpVpG
dailyzad: THREAD Happy to say that two papers by @Lester_Domes and I, on how we all can improve statistical teaching, reviewing, and practice via cognitive/semantic tools are up on arXiv 1/7 1: https://t.co/Cyv3u2knX7 2: https://t.co/la8HpJXmMr #statstwitter #epitwitter #datascience https://t.co/ghtZCXITZm
rtorkar: Very nice paper by Zad R. Chow and @Lester_Domes where they discuss the S-value, i.e., S=-log(p). I will introduce this in a course since I believe it conceptually makes the whole p-value thingy easier to understand! https://t.co/fZzXihrxs1 Quotes: 1/3
Mr1Paleo: #RT @PaleoFoundation: RT @dailyzad: THREAD Happy to say that two papers by @Lester_Domes and I, on how we all can improve statistical teaching, reviewing, and practice via cognitive/semantic tools are up on arXiv 1/7 1: https://t.co/LNd26oVyfx 2: … https://t.co/BoT9uaNb1L
BioPapers: Semantic and Cognitive Tools to Aid Statistical Inference: Replace Confidence and Significance by Compatibility and Surprise. https://t.co/T6zGsmOSaV
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 8692
Unique Words: 3076

##### #2. Bayesian Analysis of Multidimensional Functional Data
###### John Shamshoian, Damla Senturk, Shafali Jeste, Donatello Telesca
Multi-dimensional functional data arises in numerous modern scientific experimental and observational studies. In this paper we focus on longitudinal functional data, a structured form of multidimensional functional data. Operating within a longitudinal functional framework we aim to capture low dimensional interpretable features. We propose a computationally efficient nonparametric Bayesian method to simultaneously smooth observed data, estimate conditional functional means and functional covariance surfaces. Statistical inference is based on Monte Carlo samples from the posterior measure through adaptive blocked Gibbs sampling. Several operative characteristics associated with the proposed modeling framework are assessed comparatively in a simulated environment. We illustrate the application of our work in two case studies. The first case study involves age-specific fertility collected over time for various countries. The second case study is an implicit learning experiment in children with Autism Spectrum Disorder (ASD).
###### Tweets
StatsPapers: Bayesian Analysis of Multidimensional Functional Data. https://t.co/VVRfsekO2H
389jan: RT @StatsPapers: Bayesian Analysis of Multidimensional Functional Data. https://t.co/VVRfsekO2H
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #3. To Aid Statistical Inference, Emphasize Unconditional Descriptions of Statistics
###### Sander Greenland, Zad R. Chow
We have elsewhere reviewed proposals to reform terminology and improve interpretations of conventional statistics by emphasizing logical and information concepts over probability concepts. We here give detailed reasons and methods for reinterpreting statistics (including but not limited to) P-values and interval estimates in unconditional terms, which describe compatibility of observations with an entire set of analysis assumptions, rather than just a narrow target hypothesis. Such reinterpretations help avoid overconfident inferences whenever there is uncertainty about the assumptions used to derive and compute the statistical results. Examples of such assumptions include not only standard statistical modeling assumptions, but also assumptions about absence of systematic errors, protocol violations, and data corruption. Unconditional descriptions introduce uncertainty about such assumptions directly into statistical presentations of results, rather than leaving that only to the informal discussion that ensues. We thus...
###### Tweets
dailyzad: In paper 2: - Why unconditional interpretations of statistics need to be emphasized - Why terminology change is needed for reform - How discussion needs to move on to decisions and their costs 3/7 https://t.co/la8HpJXmMr https://t.co/nuQErXTwja
BioPapers: To Aid Statistical Inference, Emphasize Unconditional Descriptions of Statistics. https://t.co/vJ9a7bGSBK
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #4. Conditional Information and Inference in Response-Adaptive Allocation Designs
Response-adaptive allocation designs refer to a class of designs where the probability an observation is assigned to a treatment is changed throughout an experiment based on the accrued responses. Such procedures result in random treatment sample sizes. Most of the current literature considers unconditional inference procedures in the analysis of response-adaptive allocation designs. The focus of this work is inference conditional on the observed treatment sample sizes. The inverse of information is a description of the large sample variance of the parameter estimates. A simple form for the conditional information relative to unconditional information is derived. It is found that conditional information can be greater than unconditional information. It is also shown that the variance of the conditional maximum likelihood estimate can be less than the variance of the unconditional maximum likelihood estimate. Finally, a conditional bootstrap procedure is developed that, in the majority of cases examined, resulted in narrower...
###### Tweets
StatsPapers: Conditional Information and Inference in Response-Adaptive Allocation Designs. https://t.co/L4ckzwFtYR
###### Other stats
Sample Sizes : None.
Authors: 1
Total Words: 13161
Unique Words: 2676

##### #5. Regularization in Generalized Semiparametric Mixed-Effects Model for Longitudinal Data
###### M. Taavoni, M. Arashi
This paper considers the problem of simultaneous variable selection and estimation in the generalized semiparametric mixed-effects model for longitudinal data when the number of parameters diverges with the sample size. A penalization type of generalized estimating equation method is proposed while using regression spline to approximate the nonparametric component. Our approach applies SCAD to the estimating equation objective function in order to simultaneously estimate parameters and select the important variables. The proposed procedure involves the specification of the posterior distribution of the random effects, which cannot be evaluated in a closed form. However, it is possible to approximate this posterior distribution by producing random draws from the distribution using a Metropolis algorithm, which does not require the specification of the posterior distribution. For practical implementation, we develop an appropriate iterative algorithm to select the significant variables and estimate the nonzero coefficient functions....
###### Tweets
StatsPapers: Regularization in Generalized Semiparametric Mixed-Effects Model for Longitudinal Data. https://t.co/cIHT1p54tZ
###### Other stats
Sample Sizes : [50, 100]
Authors: 2
Total Words: 11317
Unique Words: 2821

##### #6. Interpretable Principal Components Analysis for Multilevel Multivariate Functional Data, with Application to EEG Experiments
###### Jun Zhang, Greg J Siegle, Wendy D'Andrea, Robert T Krafty
Many studies collect functional data from multiple subjects that have both multilevel and multivariate structures. An example of such data comes from popular neuroscience experiments where participants' brain activity is recorded using modalities such as EEG and summarized as power within multiple time-varying frequency bands within multiple electrodes, or brain regions. Summarizing the joint variation across multiple frequency bands for both whole-brain variability between subjects, as well as location-variation within subjects, can help to explain neural reactions to stimuli. This article introduces a novel approach to conducting interpretable principal components analysis on multilevel multivariate functional data that decomposes total variation into subject-level and replicate-within-subject-level (i.e. electrode-level) variation, and provides interpretable components that can be both sparse among variates (e.g. frequency bands) and have localized support over time within each frequency band. The sparsity and localization...
###### Tweets
StatsPapers: Interpretable Principal Components Analysis for Multilevel Multivariate Functional Data, with Application to EEG Experiments. https://t.co/WyaZXgjsBt
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #7. Improved Semiparametric Analysis of Polygenic Gene-Environment Interactions in Case-Control Studies
###### Tianying Wang, Alex Asher
Standard logistic regression analysis of case-control data has low power to detect gene-environment interactions, but until recently it was the only method that could be used on complex polygenic data for which parametric distributional models are not feasible. Under the assumption of gene-environment independence in the underlying population, Stalder et al. (2017, Biometrika, 104, 801-812) developed a retrospective method that treats both genetic and environmental variables nonparametrically. However, the mathematical symmetry of genetic and environmental variables is overlooked. We propose an improvement to the method of Stalder et al. (2017) that increases the efficiency of the estimates with no additional assumptions and modest computational cost. This improvement is achieved by treating the genetic and environmental variables symmetrically to generate two sets of parameter estimates that are combined to generate a more efficient estimate. We employ a semiparametric framework to develop the asymptotic theory of the estimator,...
###### Github

R package caseControlGE

Repository: caseControlGE
User: alexasher
Language: R
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 9089
Unique Words: 2283

##### #8. Multiclass classification of growth curves using random change points and heterogeneous random effects
###### Vincent Chin, Jarod Y. L. Lee, Louise M. Ryan, Robert Kohn, Scott A. Sisson
Faltering growth among children is a nutritional problem prevalent in low to medium income countries; it is generally defined as a slower rate of growth compared to a reference healthy population of the same age and gender. As faltering is closely associated with reduced physical, intellectual and economic productivity potential, it is important to identify faltered children and be able to characterise different growth patterns so that targeted treatments can be designed and administered. We introduce a multiclass classification model for growth trajectory that flexibly extends a current classification approach called the broken stick model, which is a piecewise linear model with breaks at fixed knot locations. Heterogeneity in growth patterns among children is captured using mixture distributed random effects, whereby the mixture components determine the classification of children into subgroups. The mixture distribution is modelled using a Dirichlet process prior, which avoids the need to choose the "true" number of mixture...
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 8532
Unique Words: 2449

##### #9. A goodness-of-fit test for the functional linear model with functional response
###### Eduardo García-Portugués, Javier Álvarez-Liébana, Gonzalo Álvarez-Pérez, Wenceslao González-Manteiga
The Functional Linear Model with Functional Response (FLMFR) is one of the most fundamental models to assess the relation between two functional random variables. In this paper, we propose a novel goodness-of-fit test for the FLMFR against a general, unspecified, alternative. The test statistic is formulated in terms of a Cramér-von Mises norm over a doubly-projected empirical process which, using geometrical arguments, yields an easy-to-compute weighted quadratic norm. A resampling procedure calibrates the test through a wild bootstrap on the residuals and the use of convenient computational procedures. As a sideways contribution, and since the statistic requires a reliable estimator of the FLMFR, we discuss and compare several regularized estimators, providing a new one specifically convenient for our test. The finite sample behavior of the test, regarding power and size, is illustrated via a complete simulation study. Also, the new proposal is compared with previous significance tests. Two novel real datasets illustrate the...
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #10. Efficient and Robust Estimation of Linear Regression with Normal Errors
###### Alain Desgagné
Linear regression with normally distributed errors - including particular cases such as ANOVA, Student's t-test or location-scale inference - is a widely used statistical procedure. In this case the ordinary least squares estimator possesses remarkable properties but is very sensitive to outliers. Several robust alternatives have been proposed, but there is still significant room for improvement. This paper thus proposes an original method of estimation that offers the best efficiency simultaneously in the absence and the presence of outliers, both for the estimation of the regression coefficients and the scale parameter. The approach first consists in broadening the normal assumption of the errors to a mixture of the normal and the filtered-log-Pareto (FLP), an original distribution designed to represent the outliers. The expectation-maximization (EM) algorithm is then adapted and we obtain the N-FLP estimators of the regression coefficients, the scale parameter and the proportion of outliers, along with probabilities of each...
###### Other stats
Sample Sizes : None.
Authors: 1
Total Words: 8684
Unique Words: 1969

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

Tracking 192,930 papers.