Researchers often misinterpret and misrepresent statistical outputs. This
abuse has led to a large literature on modification or replacement of testing
thresholds and P-values with confidence intervals, Bayes factors, and other
devices. Because the core problems appear cognitive rather than statistical, we
review some simple proposals to aid researchers in interpreting statistical
outputs. These proposals emphasize logical and information concepts over
probability, and thus may be more robust to common misinterpretations than are
traditional descriptions. The latter treat statistics as referring to targeted
hypotheses conditional on background assumptions. In contrast, we advise
reinterpretation of P-values and interval estimates in unconditional terms, in
which they describe compatibility of data with the entire set of analysis
assumptions. We use the Shannon transform of the P-value $p$, also known as the
surprisal or S-value $s=-\log(p)$, to provide a measure of the information
supplied by the testing procedure against these...
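
As a small illustration of the surprisal transform described above, the following sketch converts P-values to S-values. The base-2 logarithm (which expresses the result in bits) and the function name `s_value` are assumptions made for this example; the abstract writes the transform without stating a base.

```python
import math


def s_value(p: float, base: float = 2.0) -> float:
    """Surprisal (S-value) of a P-value: s = -log_base(p).

    The default base of 2 is an assumption; it gives the result in bits.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p, base)


# Illustrative P-values only; they are not taken from any study.
for p in (0.5, 0.25, 0.05, 0.005):
    print(f"p = {p:<6} s = {s_value(p):.2f} bits")
```

Under this base-2 convention, p = 0.05 corresponds to about 4.3 bits, slightly more than the 4 bits carried by seeing four heads in a row in fair coin tossing.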

dailyzad:
Also, S-values are one of the topics we do a deep dive on in the first of our recent pair of papers about improving statistical interpretations
https://t.co/Cyv3u2knX7 https://t.co/C9tb8NnEjT

dailyzad:
In paper 1, we discuss:
- A comprehensive discussion of P-value issues and their reconciliation with S-values
- Testing alternatives rather than just the null
- Graphical functions/tables to present alternative results 2/7
https://t.co/Cyv3u2knX7 https://t.co/EBWhPPpVpG

dailyzad:
THREAD
Happy to say that two papers by @Lester_Domes and I, on how we all can improve statistical teaching, reviewing, and practice via cognitive/semantic tools are up on arXiv 1/7
1: https://t.co/Cyv3u2knX7
2: https://t.co/la8HpJXmMr
#statstwitter #epitwitter #datascience https://t.co/ghtZCXITZm

rtorkar:
Very nice paper by Zad R. Chow and @Lester_Domes where they discuss the S-value, i.e., S=-log(p). I will introduce this in a course since I believe it conceptually makes the whole p-value thingy easier to understand! https://t.co/fZzXihrxs1 Quotes: 1/3

BioPapers:
Semantic and Cognitive Tools to Aid Statistical Inference: Replace Confidence and Significance by Compatibility and Surprise. https://t.co/T6zGsmOSaV

Multidimensional functional data arise in numerous modern scientific
experimental and observational studies. In this paper we focus on longitudinal
functional data, a structured form of multidimensional functional data.
Operating within a longitudinal functional framework we aim to capture low
dimensional interpretable features. We propose a computationally efficient
nonparametric Bayesian method to simultaneously smooth observed data, estimate
conditional functional means and functional covariance surfaces. Statistical
inference is based on Monte Carlo samples from the posterior measure through
adaptive blocked Gibbs sampling. Several operative characteristics associated
with the proposed modeling framework are assessed comparatively in a simulated
environment. We illustrate the application of our work in two case studies. The
first case study involves age-specific fertility collected over time for
various countries. The second case study is an implicit learning experiment in
children with Autism Spectrum Disorder (ASD).

StatsPapers:
Bayesian Analysis of Multidimensional Functional Data. https://t.co/VVRfsekO2H

We have elsewhere reviewed proposals to reform terminology and improve
interpretations of conventional statistics by emphasizing logical and
information concepts over probability concepts. We here give detailed reasons
and methods for reinterpreting statistics (including but not limited to
P-values and interval estimates) in unconditional terms, which describe
compatibility of observations with an entire set of analysis assumptions,
rather than just a narrow target hypothesis. Such reinterpretations help avoid
overconfident inferences whenever there is uncertainty about the assumptions
used to derive and compute the statistical results. Examples of such
assumptions include not only standard statistical modeling assumptions, but
also assumptions about absence of systematic errors, protocol violations, and
data corruption. Unconditional descriptions introduce uncertainty about such
assumptions directly into statistical presentations of results, rather than
leaving that only to the informal discussion that ensues. We thus...

dailyzad:
In paper 2:
- Why unconditional interpretations of statistics need to be emphasized
- Why terminology change is needed for reform
- How discussion needs to move on to decisions and their costs 3/7
https://t.co/la8HpJXmMr https://t.co/nuQErXTwja

BioPapers:
To Aid Statistical Inference, Emphasize Unconditional Descriptions of Statistics. https://t.co/vJ9a7bGSBK

Response-adaptive allocation designs refer to a class of designs where the
probability an observation is assigned to a treatment is changed throughout an
experiment based on the accrued responses. Such procedures result in random
treatment sample sizes. Most of the current literature considers unconditional
inference procedures in the analysis of response-adaptive allocation designs.
The focus of this work is inference conditional on the observed treatment
sample sizes. The inverse of information is a description of the large sample
variance of the parameter estimates. A simple form for the conditional
information relative to unconditional information is derived. It is found that
conditional information can be greater than unconditional information. It is
also shown that the variance of the conditional maximum likelihood estimate can
be less than the variance of the unconditional maximum likelihood estimate.
Finally, a conditional bootstrap procedure is developed that, in the majority
of cases examined, resulted in narrower...
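
To make concrete why response-adaptive allocation produces random treatment sample sizes, here is a minimal simulation sketch. It uses the randomized play-the-winner urn rule as one well-known example of such a design; this is an assumption for illustration, not necessarily the design studied in the paper. The function name `rpw_trial` and the success probabilities are hypothetical.

```python
import random


def rpw_trial(n, p_success, seed=None):
    """Simulate one trial under a randomized play-the-winner urn.

    The urn starts with one ball per arm. Each subject is assigned by
    drawing a ball (with replacement); a success adds a ball for the same
    arm, a failure adds a ball for the other arm.
    Returns the number of subjects assigned to each arm.
    """
    rng = random.Random(seed)
    urn = {"A": 1, "B": 1}
    counts = {"A": 0, "B": 0}
    for _ in range(n):
        prob_a = urn["A"] / (urn["A"] + urn["B"])
        arm = "A" if rng.random() < prob_a else "B"
        counts[arm] += 1
        success = rng.random() < p_success[arm]
        other = "B" if arm == "A" else "A"
        urn[arm if success else other] += 1
    return counts


# Hypothetical success probabilities; the sample size for arm A
# varies from one simulated trial to the next.
sizes = [rpw_trial(100, {"A": 0.7, "B": 0.4}, seed=s)["A"] for s in range(5)]
print(sizes)
```

Because the arm sizes differ across replications, an analysis can either average over that variation (unconditional inference) or fix the observed sizes (conditional inference), which is the distinction the abstract draws.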

StatsPapers:
Conditional Information and Inference in Response-Adaptive Allocation Designs. https://t.co/L4ckzwFtYR

This paper considers the problem of simultaneous variable selection and
estimation in the generalized semiparametric mixed-effects model for
longitudinal data when the number of parameters diverges with the sample size.
A penalization type of generalized estimating equation method is proposed while
using regression splines to approximate the nonparametric component. Our
approach applies SCAD to the estimating equation objective function in order to
simultaneously estimate parameters and select the important variables. The
proposed procedure involves the specification of the posterior distribution of
the random effects, which cannot be evaluated in a closed form. However, it is
possible to approximate this posterior distribution by producing random draws
from the distribution using a Metropolis algorithm, which does not require the
specification of the posterior distribution. For practical implementation, we
develop an appropriate iterative algorithm to select the significant variables
and estimate the nonzero coefficient functions....
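
For readers unfamiliar with the SCAD penalty mentioned above, the sketch below evaluates the standard SCAD penalty function and its derivative for a single coefficient. It shows only the penalty itself, not the penalized estimating equations or the Metropolis step of the paper. The function names are hypothetical, and the default `a = 3.7` is the conventional choice rather than a value stated in the abstract.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient (requires lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same as the lasso penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


# Illustrative values only.
for theta in (0.5, 2.0, 10.0):
    print(theta, scad_penalty(theta, lam=1.0), scad_derivative(theta, lam=1.0))
```

The derivative drops to zero once the coefficient exceeds `a * lam`, which is what lets SCAD select variables without heavily biasing large coefficients.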

StatsPapers:
Regularization in Generalized Semiparametric Mixed-Effects Model for Longitudinal Data. https://t.co/cIHT1p54tZ

Many studies collect functional data from multiple subjects that have both
multilevel and multivariate structures. An example of such data comes from
popular neuroscience experiments where participants' brain activity is recorded
using modalities such as EEG and summarized as power within multiple
time-varying frequency bands within multiple electrodes, or brain regions.
Summarizing the joint variation across multiple frequency bands for both
whole-brain variability between subjects, as well as location-variation within
subjects, can help to explain neural reactions to stimuli. This article
introduces a novel approach to conducting interpretable principal components
analysis on multilevel multivariate functional data that decomposes total
variation into subject-level and replicate-within-subject-level (i.e.
electrode-level) variation, and provides interpretable components that can be
both sparse among variates (e.g. frequency bands) and have localized support
over time within each frequency band. The sparsity and localization...

StatsPapers:
Interpretable Principal Components Analysis for Multilevel Multivariate Functional Data, with Application to EEG Experiments. https://t.co/WyaZXgjsBt

Standard logistic regression analysis of case-control data has low power to
detect gene-environment interactions, but until recently it was the only method
that could be used on complex polygenic data for which parametric
distributional models are not feasible. Under the assumption of
gene-environment independence in the underlying population, Stalder et al.
(2017, Biometrika, 104, 801-812) developed a retrospective method that treats
both genetic and environmental variables nonparametrically. However, the
mathematical symmetry of genetic and environmental variables is overlooked. We
propose an improvement to the method of Stalder et al. (2017) that increases
the efficiency of the estimates with no additional assumptions and modest
computational cost. This improvement is achieved by treating the genetic and
environmental variables symmetrically to generate two sets of parameter
estimates that are combined to generate a more efficient estimate. We employ a
semiparametric framework to develop the asymptotic theory of the estimator,...

Faltering growth among children is a nutritional problem prevalent in low- to
medium-income countries; it is generally defined as a slower rate of growth
compared to a reference healthy population of the same age and gender. As
faltering is closely associated with reduced physical, intellectual and
economic productivity potential, it is important to identify faltered children
and be able to characterise different growth patterns so that targeted
treatments can be designed and administered. We introduce a multiclass
classification model for growth trajectory that flexibly extends a current
classification approach called the broken stick model, which is a piecewise
linear model with breaks at fixed knot locations. Heterogeneity in growth
patterns among children is captured using mixture distributed random effects,
whereby the mixture components determine the classification of children into
subgroups. The mixture distribution is modelled using a Dirichlet process
prior, which avoids the need to choose the "true" number of mixture...
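
The broken stick model described above is a piecewise linear curve with breaks at fixed knots. The sketch below evaluates one such curve using a truncated-line parameterization, which is one common way to write a piecewise linear function and is an assumption here. The function name, knot locations, and coefficients are hypothetical and do not come from the paper; the random effects and Dirichlet process mixture are not shown.

```python
def broken_stick(age, knots, intercept, slopes):
    """Evaluate a continuous piecewise-linear curve at `age`.

    slopes[0] applies before the first knot and slopes[k] applies
    after knots[k - 1], so len(slopes) must equal len(knots) + 1.
    """
    if len(slopes) != len(knots) + 1:
        raise ValueError("need exactly one more slope than knots")
    value = intercept + slopes[0] * age
    for k, knot in enumerate(knots, start=1):
        # Each knot changes the slope by (slopes[k] - slopes[k - 1]).
        value += (slopes[k] - slopes[k - 1]) * max(age - knot, 0.0)
    return value


# Hypothetical trajectory: knots at 6 and 12 months, growth slowing over time.
for age in (0, 6, 12, 24):
    print(age, broken_stick(age, knots=[6, 12], intercept=50.0, slopes=[2.0, 1.0, 0.5]))
```

In a mixed-effects version, each child would have their own intercept and slopes, and the mixture components over those random effects would define the growth-pattern subgroups.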

The Functional Linear Model with Functional Response (FLMFR) is one of the
most fundamental models to assess the relation between two functional random
variables. In this paper, we propose a novel goodness-of-fit test for the FLMFR
against a general, unspecified, alternative. The test statistic is formulated
in terms of a Cramér-von Mises norm over a doubly-projected empirical process
which, using geometrical arguments, yields an easy-to-compute weighted
quadratic norm. A resampling procedure calibrates the test through a wild
bootstrap on the residuals and the use of convenient computational procedures. As
a side contribution, and since the statistic requires a reliable
estimator of the FLMFR, we discuss and compare several regularized estimators,
providing a new one specifically convenient for our test. The finite sample
behavior of the test, regarding power and size, is illustrated via a complete
simulation study. Also, the new proposal is compared with previous significance
tests. Two novel real datasets illustrate the...

Linear regression with normally distributed errors - including particular
cases such as ANOVA, Student's t-test or location-scale inference - is a widely
used statistical procedure. In this case the ordinary least squares estimator
possesses remarkable properties but is very sensitive to outliers. Several
robust alternatives have been proposed, but there is still significant room for
improvement. This paper thus proposes an original method of estimation that
offers the best efficiency simultaneously in the absence and the presence of
outliers, both for the estimation of the regression coefficients and the scale
parameter. The approach first consists in broadening the normal assumption of
the errors to a mixture of the normal and the filtered-log-Pareto (FLP), an
original distribution designed to represent the outliers. The
expectation-maximization (EM) algorithm is then adapted and we obtain the N-FLP
estimators of the regression coefficients, the scale parameter and the
proportion of outliers, along with probabilities of each...

