We propose a new centered autologistic spatio-temporal model for binary data
on a lattice. The centering makes the autoregression coefficients
interpretable by separating the large-scale structure of the model,
corresponding to an expected mean, from the small-scale structure,
corresponding to the autocorrelation. We discuss the existence of the joint law
of the process and show by simulation the benefit of this kind of centering. We
propose the maximum pseudo-likelihood estimator and show its efficiency, and we
also propose a method to choose the best neighborhood structure. The method is
applied to model and fit epidemiological data about Esca disease in a vineyard
of the Bordeaux region.
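
As a rough illustration of what centering does, the sketch below computes a conditional probability for one site in a purely spatial, single-time-step setting. The abstract does not give the spatio-temporal form, so the function names, the parameter `eta`, and the use of each neighbor's own expected value as the centering term are assumptions made for illustration only.

```python
import math

def expit(t):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def centered_conditional_prob(i, z, x_beta, neighbors, eta):
    """P(z_i = 1 | neighbors) under a centered autologistic form (assumed).

    z         : list of 0/1 values on the lattice sites
    x_beta    : list of large-scale linear predictors, one per site
    neighbors : dict mapping a site index to its neighbor indices
    eta       : autoregression coefficient (hypothetical name)
    """
    # Each neighbor contributes its deviation from its own expected value,
    # so the sum is zero on average and x_beta keeps its meaning as the mean.
    centered_sum = sum(z[j] - expit(x_beta[j]) for j in neighbors[i])
    return expit(x_beta[i] + eta * centered_sum)

# Three sites on a line; site 1 has neighbors 0 and 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x_beta = [0.0, 0.0, 0.0]  # expected value 0.5 at every site
p_balanced = centered_conditional_prob(1, [1, 0, 0], x_beta, neighbors, eta=0.8)
p_excess = centered_conditional_prob(1, [1, 0, 1], x_beta, neighbors, eta=0.8)
```

With one neighbor at 1 and one at 0, the centered sum vanishes and the probability equals the large-scale mean; with both neighbors at 1, the probability rises above it.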

Sample Sizes : None.

Authors: 3

Total Words: 10114

Unique Words: 2272

Subtractive dither is a powerful method for removing the signal dependence of
quantization noise for coarsely-quantized signals. However, estimation from
dithered measurements often naively applies the sample mean or midrange, even
when the total noise is not well described with a Gaussian or uniform
distribution. We show that the generalized Gaussian distribution approximately
describes subtractively-dithered, quantized samples of a Gaussian distribution.
Furthermore, a generalized Gaussian fit leads to simple estimators based on
order statistics that match the performance of more complicated maximum
likelihood estimators requiring iterative solvers. The order statistics-based
estimators outperform both the sample mean and midrange for nontrivial sums of
Gaussian and uniform noise. Additional analysis of the generalized Gaussian
approximation yields rules of thumb for determining when and how to apply
dither to quantized measurements.
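
A minimal sketch of subtractive dither follows, assuming a constant signal, a uniform quantizer with a hypothetical step size `step`, and no additional Gaussian noise (so it covers only the uniform-noise limit, not the mixed case the abstract studies). All names and values are illustrative.

```python
import random

def quantize(x, step):
    """Uniform quantizer: round x to the nearest multiple of step."""
    return step * round(x / step)

def subtractive_dither_sample(x, step, rng):
    """Add uniform dither before quantizing, then subtract the same dither."""
    d = rng.uniform(-step / 2, step / 2)
    return quantize(x + d, step) - d

def sample_mean(values):
    return sum(values) / len(values)

def midrange(values):
    return (min(values) + max(values)) / 2

rng = random.Random(0)   # fixed seed so the run is repeatable
step = 1.0               # coarse quantization step (assumed)
true_value = 0.3         # constant signal lying between two quantizer levels
n = 2000

# Without dither every sample collapses to the same level, so the error
# depends entirely on where the signal sits relative to the levels.
plain = [quantize(true_value, step) for _ in range(n)]

# With subtractive dither the error of each sample is uniform on
# [-step/2, step/2] regardless of the signal value.
dithered = [subtractive_dither_sample(true_value, step, rng) for _ in range(n)]

plain_error = abs(sample_mean(plain) - true_value)
mean_error = abs(sample_mean(dithered) - true_value)
midrange_error = abs(midrange(dithered) - true_value)
```

In this noise-free setting the total error is purely uniform, which is the regime where the midrange is expected to do well; once Gaussian noise is added, neither the mean nor the midrange need be the best choice, which is the gap the abstract addresses.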

Sample Sizes : None.

Authors: 3

Total Words: 8922

Unique Words: 2255

We study heterogeneity in the effect of a mindset intervention on
student-level performance through an observational dataset from the National
Study of Learning Mindsets (NSLM). Our analysis uses machine learning (ML) to
address the following associated problems: assessing treatment group overlap
and covariate balance, imputing conditional average treatment effects, and
interpreting imputed effects. By comparing several different model families we
illustrate the flexibility of both off-the-shelf and purpose-built estimators.
We find that the mindset intervention has a positive average effect of 0.26,
95% CI [0.22, 0.30], and that heterogeneity in the range of [0.1, 0.4] is
moderated by school-level achievement, poverty concentration, urbanicity,
and student prior expectations.

Sample Sizes : None.

Authors: 1

Total Words: 4534

Unique Words: 1558

Cardiovascular diseases (CVDs) are the number one cause of death globally. The
WHO estimated that CVDs caused 17.9 million deaths (31% of all global deaths)
in 2016. Surprising as it may seem, CVDs can be easily prevented by altering
lifestyle to avoid risk factors. The only requirement is to know your risk
beforehand. The Thai CV Risk score is a trustworthy tool for forecasting the
risk of a future cardiovascular event for Thais. This study is an external
validation of the Thai CV Risk score. We aim to answer two key questions.
First, is the Thai CV Risk score, developed using a dataset of people from the
central and north-western parts of Thailand, applicable to people from other
parts of the country? Second, does the Thai CV Risk score, developed for the
general public, work for hospital patients, who tend to have higher risk? We
answer these two questions using a dataset of 1,025 patients (319 males, 35-70
years old) from Lansaka Hospital in southern Thailand. In brief, we find that
the Thai CV Risk score works for southern Thais...

Sample Sizes : None.

Authors: 4

Total Words: 1531

Unique Words: 645

We introduce a pliable lasso method for estimation of interaction effects in
the Cox proportional hazards model framework. The pliable lasso is a linear
model that includes interactions between covariates X and a set of modifying
variables Z and assumes sparsity of the main effects and interaction effects.
The hierarchical penalty excludes interaction effects when the corresponding
main effects are zero: this avoids overfitting and an explosion of model
complexity. We extend this method to the Cox model for survival data,
incorporating modifiers that are either fixed or varying in time into the
partial likelihood. For example, this allows modeling of survival times that
differ based on interactions of genes with age, gender, or other demographic
information. The optimization is done by blockwise coordinate descent on a
second-order approximation of the objective.
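
The model structure described above can be written out as follows. This is a sketch under assumed notation: $\eta_i$ for the linear predictor of subject $i$, $x_{ij}$ for covariate $j$, $z_i$ for the vector of modifying variables, and $\beta_j$, $\theta_j$ for the main and interaction effects are symbols chosen here, not taken from the abstract, and the full model may include further terms.

\[
\eta_i = \sum_{j=1}^{p} x_{ij}\left(\beta_j + z_i^{\top}\theta_j\right),
\qquad
\theta_j = 0 \ \text{whenever} \ \beta_j = 0 .
\]

In the Cox setting, $\eta_i$ would enter the partial likelihood through the relative risk $\exp(\eta_i)$; the hierarchy condition on the right is what keeps an interaction out of the model unless its main effect is present.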

Sample Sizes : None.

Authors: 2

Total Words: 5845

Unique Words: 1331

The question in this paper is whether R&D efforts affect education
performance in small classes. Merging two datasets collected from the PISA
studies and the World Development Indicators and using Learning Bayesian
Networks, we prove the existence of a statistical causal relationship between
investment in R&D of a country and its education performance (PISA scores). We
also prove that the effect of R&D on education is long term, as a country has
to invest for at least 10 years before beginning to improve the level of young
pupils.

Sample Sizes : None.

Authors: 2

Total Words: 10182

Unique Words: 2954

Evaluating the return on ad spend (ROAS), the causal effect of advertising on
sales, is critical to advertisers for understanding the performance of their
existing marketing strategy as well as how to improve and optimize it. Media
Mix Modeling (MMM) has been used as a convenient analytical tool to address the
problem using observational data. However, it is well recognized that MMM
suffers from various fundamental challenges: data collection, model
specification and selection bias due to ad targeting, among others
\citep{chan2017,wolfe2016}.
In this paper, we study the challenge associated with measuring the impact of
search ads in MMM, namely the selection bias due to ad targeting. Using causal
diagrams of the search ad environment, we derive a statistically principled
method for bias correction based on the \textit{back-door} criterion
\citep{pearl2013causality}. We use case studies to show that the method
provides promising results by comparison with results from randomized
experiments. We also report a more complex case...
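
To make the back-door idea concrete, here is a minimal sketch of stratified adjustment on toy data. It is not the paper's method: the confounder `z` (standing in for something like query demand), the binary exposure `x`, the outcome `y`, and all numbers are hypothetical.

```python
from collections import defaultdict

def naive_effect(rows):
    """Difference in mean outcome between exposed and unexposed rows."""
    y1 = [y for _, x, y in rows if x == 1]
    y0 = [y for _, x, y in rows if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def backdoor_effect(rows):
    """Back-door adjusted effect: average the within-stratum differences
    over the marginal distribution of the confounder z.

    Assumes every stratum of z contains both exposed and unexposed rows;
    otherwise a division by zero occurs.
    """
    n = 0
    count_z = defaultdict(int)
    count_zx = defaultdict(int)
    sum_y = defaultdict(float)
    for z, x, y in rows:
        n += 1
        count_z[z] += 1
        count_zx[(z, x)] += 1
        sum_y[(z, x)] += y
    effect = 0.0
    for z, nz in count_z.items():
        mean1 = sum_y[(z, 1)] / count_zx[(z, 1)]
        mean0 = sum_y[(z, 0)] / count_zx[(z, 0)]
        effect += (nz / n) * (mean1 - mean0)
    return effect

# Toy rows (z, x, y): high-demand periods (z=1) see more ads and more sales,
# while the effect of the ad inside each stratum is exactly 1.0.
rows = (
    [(0, 0, 1.0)] * 8 + [(0, 1, 2.0)] * 2 +
    [(1, 0, 5.0)] * 2 + [(1, 1, 6.0)] * 8
)

naive = naive_effect(rows)        # inflated by targeting
adjusted = backdoor_effect(rows)  # recovers the within-stratum effect
```

The naive comparison mixes the ad effect with the demand difference between strata, whereas conditioning on `z` removes that selection bias in this toy case.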

Sample Sizes : None.

Authors: 7

Total Words: 11025

Unique Words: 2676

Judging a gymnastics routine is a noisy process, and the performance of
judges varies widely. In collaboration with the Fédération Internationale
de Gymnastique (FIG) and Longines, we are designing and implementing an
improved statistical engine to analyze the performance of gymnastics judges
during and after major competitions like the Olympic Games and the World
Championships. The engine, called the Judge Evaluation Program (JEP), has three
objectives: (1) provide constructive feedback to judges, executive committees
and national federations; (2) assign the best judges to the most important
competitions; (3) detect bias and outright cheating.
Using data from international gymnastics competitions held during the
2013--2016 Olympic cycle, we first develop a marking score evaluating the
accuracy of the marks given by gymnastics judges. Judging a gymnastics routine
is a random process, and we can model this process very accurately using
heteroscedastic random variables. The marking score scales the difference
between the mark...

Sample Sizes : None.

Authors: 2

Total Words: 9117

Unique Words: 2472

Increasing installations of distributed electricity generation have vastly
increased the need for stochastic generation and demand data. However, the
effects of such installations are uncertain, as high-quality data is not always
available before an installation is completed. In particular, there is a need
for stochastic models of demand and generation profiles for unobserved
prosumers. The model formulated in this paper bridges the gap between the
limited available empirical data, and the large amount of high-quality,
stochastic demand and generation data required for network and system analysis.
The approach employs clustering analysis and a Dirichlet-categorical
hierarchical model of the features of unobserved prosumers. Based on the data
of clusters of prosumers, Markov chain models of demand and generation profiles
are constructed from empirical data, and synthetic demand profiles are
subsequently sampled from these. The sampled traces are cross-validated and
show a good statistical fit to the observed data, and then two case...
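
The Markov chain step can be sketched as below: estimate transition probabilities from observed traces, then sample a synthetic trace. The discretization into `low`/`mid`/`high` levels, the toy traces, and all function names are assumptions for illustration; the clustering and Dirichlet-categorical layers are not shown.

```python
import random
from collections import Counter, defaultdict

def fit_transition_counts(traces):
    """Count observed transitions between consecutive states."""
    counts = defaultdict(Counter)
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    return counts

def transition_probs(counts):
    """Normalize each row of counts into a probability distribution."""
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {nxt: c / total for nxt, c in row.items()}
    return probs

def sample_trace(probs, start, length, rng):
    """Sample a synthetic trace of the given length from the fitted chain."""
    state = start
    trace = [state]
    for _ in range(length - 1):
        row = probs.get(state)
        if not row:
            break  # state with no observed outgoing transition
        states = list(row)
        weights = [row[s] for s in states]
        state = rng.choices(states, weights=weights)[0]
        trace.append(state)
    return trace

# Hypothetical discretized demand traces for one cluster of prosumers.
observed = [
    ["low", "low", "mid", "high", "mid", "low"],
    ["mid", "high", "high", "mid", "low", "low"],
]

counts = fit_transition_counts(observed)
probs = transition_probs(counts)
synthetic = sample_trace(probs, start="low", length=24, rng=random.Random(1))
```

A first-order chain only reproduces one-step dependence, so a practical model would likely need time-of-day dependent transitions; that refinement is left out here.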

Sample Sizes : None.

Authors: 3

Total Words: 7750

Unique Words: 2292

Hierarchical random effect models are used for different purposes in clinical
research and other areas. In general, the main focus is on population
parameters related to the expected treatment effects or group differences among
all units of an upper level (e.g. subjects in many settings). Optimal designs
for estimation of population parameters are well established for many models.
However, optimal designs for prediction for the individual units may be
different. Several settings are identified in which individual prediction may
be of interest. In this paper we determine optimal designs for the individual
predictions, e.g. in multi-center trials, and compare them to a conventional
balanced design with respect to treatment allocation. Our investigations show
that balanced designs are far from optimal if the treatment effects vary
strongly compared to the residual error, and that more subjects should be
recruited to the active (new) treatment in multi-center trials. Nevertheless,
efficiency loss may be limited resulting in a moderate...

Sample Sizes : None.

Authors: 3

Total Words: 3448

Unique Words: 901

