Often, a community becomes alarmed when high rates of cancer are noticed, and
residents suspect that the cancer cases could be caused by a known source of
hazard. In response, the CDC recommends that departments of health perform a
standardized incidence ratio (SIR) analysis to determine whether the observed
cancer incidence is higher than expected. This approach has several limitations
that are well documented in the literature. In this paper we propose a novel
causal inference approach to cancer cluster investigations, rooted in the
potential outcomes framework. Assuming that a source of hazard representing a
potential cause of increased cancer rates in the community is identified a
priori, we introduce a new estimand called the causal SIR (cSIR). The cSIR is a
ratio defined as the expected cancer incidence in the exposed population
divided by the expected cancer incidence under the (counterfactual) scenario of
no exposure. To estimate the cSIR we need to overcome two main challenges: 1)
identify unexposed populations that are...

Authors: 4

Total Words: 11373

Unique Words: 2763

Functional time series analysis, whether based on time or frequency domain
methodology, has traditionally been carried out under the assumption of
complete observation of the constituent series of curves, assumed stationary.
Nevertheless, as is often the case with independent functional data, it may
well happen that the data available to the analyst are not the actual sequence
of curves, but relatively few and noisy measurements per curve, potentially at
different locations in each curve's domain. Under this sparse sampling regime,
neither the established estimators of the time series' dynamics, nor their
corresponding theoretical analysis will apply. The subject of this paper is to
tackle the problem of estimating the dynamics and of recovering the latent
process of smooth curves in the sparse regime. Assuming smoothness of the
latent curves, we construct a consistent nonparametric estimator of the series'
spectral density operator and use it to develop a frequency-domain recovery
approach that predicts the latent curve at a given...

Authors: 2

Total Words: 23675

Unique Words: 4165

In this paper, we study the high-dimensional sparse directed acyclic graph
(DAG) models under the empirical sparse Cholesky prior. Among our results,
strong model selection consistency or graph selection consistency is obtained
under more general conditions than those in the existing literature. Compared
to Cao, Khare and Ghosh (2017), the required conditions are weakened in terms
of the dimensionality, sparsity and lower bound of the nonzero elements in the
Cholesky factor. Furthermore, our result does not require the irrepresentable
condition, which is necessary for Lasso type methods. We also derive the
posterior convergence rates for precision matrices and Cholesky factors with
respect to various matrix norms. The obtained posterior convergence rates are
the fastest among those of the existing Bayesian approaches. In particular, we
prove that our posterior convergence rates for Cholesky factors are the minimax
or at least nearly minimax depending on the relative size of true sparseness
for the entire dimension. The simulation...

Authors: 3

Total Words: 21594

Unique Words: 3237

Progression of chronic disease is often manifested by repeated occurrences of
disease-related events over time. Delineating the heterogeneity in the risk of
such recurrent events can provide valuable scientific insight for guiding
customized disease management. In this paper, we present a new modeling
framework for recurrent event data, which renders a flexible and robust
characterization of individual multiplicative risk of recurrent event through
quantile regression that accommodates both observed covariates and unobservable
frailty. The proposed modeling requires no distributional specification of the
unobservable frailty, while permitting the exploration of dynamic covariate
effects. We develop estimation and inference procedures for the proposed model
through a novel adaptation of the principle of conditional score. The
asymptotic properties of the proposed estimator, including the uniform
consistency and weak convergence, are established. Extensive simulation studies
demonstrate satisfactory finite-sample performance of the...

Authors: 4

Total Words: 12999

Unique Words: 2592

With incomplete data, the standard argument for when the response mechanism
can be ignored for modelling purposes requires that realised Missing at Random
(MAR) holds for each density in the model and that distinctness of parameters
holds for the model's parameter space. We explain why the distinctness of
parameters criterion is too general because it allows the validity of an
analysis to be determined by a factor different from any of (i) the observed
data, (ii) the likelihood used to analyse the data and (iii) the analyst's
assumptions about the underlying data generation process. We further explain
why realised MAR alone, when applied appropriately, provides sufficient
justification for ignoring the response mechanism when making direct likelihood
inferences from incomplete data.

Authors: 1

Total Words: 1833

Unique Words: 571

Finite mixture models have been an important tool for exploring complex
data structures in many scientific areas, for example, economics, epidemiology,
and finance. In the past decade, semiparametric techniques have been widely
introduced into traditional finite mixture models, and so semiparametric
mixture models have experienced exciting development in methodologies, theories
and applications. In this article, we provide a selective overview of
newly-developed semiparametric mixture models and discuss their estimation
methodologies, theoretical properties where applicable, and recent developments
along with some open questions.

Authors: 3

Total Words: 10701

Unique Words: 2606

The synthetic control method (SCM) is a popular approach for estimating the
impact of a treatment on a single unit in panel data settings. The "synthetic
control" is a weighted average of control units that balances the treated
unit's pre-treatment outcomes as closely as possible. The curse of
dimensionality, however, means that SCM does not generally achieve exact
balance, which can bias the SCM estimate. We propose an extension, Augmented
SCM, which uses an outcome model to estimate the bias due to covariate
imbalance and then de-biases the original SCM estimate, analogous to bias
correction for inexact matching. We motivate this approach by showing that SCM
is a (regularized) inverse propensity score weighting estimator, with
pre-treatment outcomes as covariates and a ridge penalty on the propensity
score coefficients. We give theoretical guarantees for specific cases and
propose a new inference procedure. We demonstrate gains from Augmented SCM with
extensive simulation studies and apply this framework to canonical...

Authors: 3

Total Words: 19346

Unique Words: 3589

We provide a CV-TMLE estimator for a kernel smoothed version of the
cumulative distribution of the random variable giving the treatment effect or
so-called blip for a randomly drawn individual. We must first assume the
treatment effect or so-called blip distribution is continuous. We then derive
the efficient influence curve of the kernel smoothed version of the blip CDF.
Our CV-TMLE estimator is asymptotically efficient under two conditions, one of
which involves a second order remainder term which, in this case, shows us that
knowledge of the treatment mechanism does not guarantee a consistent estimate.
The remainder term also teaches us exactly how well we need to estimate the
nuisance parameters to guarantee asymptotic efficiency. Through simulations we
verify theoretical properties of the estimator and show the importance of
machine learning over conventional regression approaches to fitting the
nuisance parameters. We also derive the bias and variance of the estimator, the
orders of which are analogous to a kernel density...

Authors: 2

Total Words: 5583

Unique Words: 1559

The Fisher information approximation (FIA) is an implementation of the
minimum description length principle for model selection. Unlike information
criteria such as AIC or BIC, it has the advantage of taking the functional form
of a model into account. Unfortunately, FIA can be misleading in finite
samples, resulting in an inversion of the correct rank order of complexity
terms for competing models in the worst case. As a remedy, we propose a
lower bound $N'$ for the sample size that suffices to preclude such errors. We
illustrate the approach using three examples from the family of multinomial
processing tree models.
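
As a rough illustration of the rank-order issue described above, the sketch below assumes the commonly stated form of the FIA complexity term, (k/2) ln(N / 2π) + ln ∫ sqrt(det I(θ)) dθ, and derives from it the sample size above which the model with more free parameters also receives the larger penalty. The function names, the closed-form threshold, and the example values are illustrative assumptions and are not taken from the paper, whose definition of $N'$ may differ.

```python
import math


def fia_penalty(k, n, log_geom):
    """Assumed FIA complexity term: (k/2) * ln(n / (2*pi)) + log_geom.

    k        -- number of free parameters
    n        -- sample size
    log_geom -- ln of the integral of sqrt(det I(theta)) over the parameter
                space (supplied by the caller; hypothetical values below)
    """
    return 0.5 * k * math.log(n / (2 * math.pi)) + log_geom


def lower_bound_n(k_small, c_small, k_large, c_large):
    """Sample size above which the model with more parameters gets the
    larger assumed FIA penalty (obtained by equating the two penalties)."""
    if k_large <= k_small:
        raise ValueError("k_large must exceed k_small")
    return 2 * math.pi * math.exp(2 * (c_small - c_large) / (k_large - k_small))


# Hypothetical models: A has 2 parameters, B has 4 parameters.
n_prime = lower_bound_n(k_small=2, c_small=1.5, k_large=4, c_large=-0.5)
for n in (0.9 * n_prime, 1.1 * n_prime):
    print(n, fia_penalty(2, n, 1.5), fia_penalty(4, n, -0.5))
```

Below the threshold the larger model is assigned the smaller penalty, which is the inversion the abstract warns about; above it the order matches the parameter counts.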

Authors: 3

Total Words: 4534

Unique Words: 1352

This paper proposes a novel criterion for the allocation of patients in
Phase I dose-escalation clinical trials aiming to find the maximum tolerated
dose (MTD). Conventionally, using a model-based approach, the next patient is
allocated to the dose with the toxicity estimate closest (in terms of the
absolute or squared distance) to the maximum acceptable toxicity. This
approach, however, ignores the uncertainty in point estimates and ethical
concerns of assigning a lot of patients to overly toxic doses. Motivated by
recent discussions in the theory of estimation in restricted parameter spaces,
we propose a criterion which accounts for both of these issues. The criterion
requires the specification of only one additional parameter, which has a simple
and intuitive interpretation. We incorporate the proposed criterion into the
one-parameter Bayesian continual reassessment method (CRM) and show, using
simulations, that it results in the same proportion of correct selections on
average as the original design, but in fewer mean number of...

Authors: 2

Total Words: 8371

Unique Words: 2099

