In this essay I discuss potential outcome and graphical approaches to
causality, and their relevance for empirical work in economics. I review some
of the work on directed acyclic graphs, including the recent "The Book of Why,"
by Pearl and MacKenzie. I also discuss the potential outcome framework
developed by Rubin and coauthors, building on work by Neyman. I then discuss
the relative merits of these approaches for empirical work in economics,
focusing on the questions each answers well, and why much of the work in
economics is closer in spirit to the potential outcome framework.

autoregress:
Fun to see snippets of twitter convos with @eliasbareinboim, @PHuenermund @Jabaluck et al incorporated in this new Imbens paper.
https://t.co/NeFCRAF74M
Almost certain the conversation is far from over... :) https://t.co/JzaDx5stCa

hmmlowe:
This is a really great read for
(a) economists new to DAGs
(b) economists wanting to see twitter debates referenced in academic papers
Next step: @Jabaluck should include his tweets on google scholar
https://t.co/NkBsnsrZWd https://t.co/65Evmd5LlG

Regressing a scalar response on a random function is nowadays a common
situation. In the nonparametric setting, this paper paves the way for making
the local linear regression based on a projection approach a prominent method
for solving this regression problem. Our asymptotic results demonstrate that
the functional local linear regression outperforms its functional local
constant counterpart. Beyond the estimation of the regression operator itself,
the local linear regression is also a useful tool for predicting the functional
derivative of the regression operator, a promising mathematical object on its
own. The local linear estimator of the functional derivative is shown to be
consistent. On simulated datasets we illustrate good finite sample properties
of both proposed methods. On a real data example of a single-functional index
model we indicate how the functional derivative of the regression operator
provides an original, fast, and widely applicable estimation method.
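
The contrast between local constant and local linear fitting can be illustrated in the much simpler scalar-covariate case. The sketch below is not the paper's functional, projection-based estimator: it assumes a one-dimensional covariate and a Gaussian kernel, and the function names and the bandwidth `h` are hypothetical choices made for illustration only.

```python
import numpy as np


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in both estimators."""
    return np.exp(-0.5 * u ** 2)


def local_constant(x, y, x0, h):
    """Nadaraya-Watson estimate of E[Y | X = x0]: a kernel-weighted mean of y."""
    w = gaussian_kernel((x - x0) / h)
    return np.sum(w * y) / np.sum(w)


def local_linear(x, y, x0, h):
    """Kernel-weighted least-squares line fitted around x0.

    Returns (value, slope): the intercept estimates the regression function
    at x0 and the slope estimates its derivative there.
    """
    w = gaussian_kernel((x - x0) / h)
    Z = np.column_stack([np.ones_like(x), x - x0])  # design centred at x0
    WZ = Z * w[:, None]                             # rows of Z scaled by weights
    coef = np.linalg.solve(Z.T @ WZ, WZ.T @ y)      # solves (Z'WZ) b = Z'Wy
    return coef[0], coef[1]
```

In this scalar setting the slope returned by `local_linear` plays the role that the functional derivative plays in the abstract. On exactly linear data the local linear fit reproduces the line even at a boundary point, whereas the local constant fit is biased there; that is a property of this toy case and not a substitute for the paper's asymptotic results.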

In a wide variety of situations, anomalies in the behaviour of a complex
system, whose health is monitored through the observation of a random vector
$X = (X_1, \ldots, X_d)$ valued in $\mathbb{R}^d$, correspond to the simultaneous
occurrence of extreme values for certain subgroups $\alpha \subset \{1, \ldots, d\}$
of variables $X_j$. Under the heavy-tail assumption, which is precisely appropriate
for modeling these phenomena, statistical methods relying on multivariate
extreme value theory have been developed in the past few years for identifying
such events/subgroups. This paper takes this approach much further by means
of a novel mixture model that describes the distribution of extremal
observations and in which the anomaly type $\alpha$ is viewed as a latent
variable. One may then take advantage of the model by assigning to any extreme
point a posterior probability for each anomaly type $\alpha$, defining
implicitly a similarity measure between anomalies. It is explained at length
how the latter makes it possible to cluster extreme observations...
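
The posterior assignment mentioned above is, in any finite mixture, a Bayes-rule computation. As a generic sketch rather than the paper's exact model, write $\pi_\alpha$ for the prior weight of anomaly type $\alpha$ and $f_\alpha$ for the density of extreme observations of that type (both symbols are assumptions introduced here for illustration). For an extreme point $x$:

$$
P(\alpha \mid x) = \frac{\pi_\alpha \, f_\alpha(x)}{\sum_{\beta} \pi_\beta \, f_\beta(x)}
$$

Two extreme points can then be compared through their vectors of posterior probabilities over anomaly types, which is one way a similarity measure of the kind described could arise.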

Most prediction models that are used in medical research fail to accurately
predict health outcomes due to methodological limitations. Using routinely
collected patient data, we explore the use of a Cox proportional hazard (PH)
model within a latent class framework to model survival of patients with
chronic heart failure (CHF). We identify subgroups of patients based on their
risk with the aid of available covariates. We allow each subgroup to have its
own risk model. We choose an optimum number of classes based on the reported
Bayesian information criterion (BIC). We assess the discriminative ability of
the chosen model using the area under the receiver operating characteristic
curve (AUC) for all the cross-validated and bootstrapped samples. We conduct a
simulation study to compare the predictive performance of our models. Our
proposed latent class model outperforms the standard one-class Cox PH model.
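
For reference, the standard form of the BIC used to compare models with different numbers of classes is given below. The notation is an assumption made here: $\hat{\ell}_K$ is the maximized log-likelihood of the $K$-class model, $p_K$ its number of free parameters, and $n$ the sample size. The abstract does not state which likelihood or which sample size enters the criterion.

$$
\mathrm{BIC}(K) = -2\,\hat{\ell}_K + p_K \log n
$$

Under this convention the preferred number of classes is the $K$ with the smallest $\mathrm{BIC}(K)$; the penalty term $p_K \log n$ grows with each added class-specific risk model.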

We develop factor copula models for analysing the dependence among mixed
continuous and discrete responses. Factor copula models are canonical vine
copulas that involve both observed and latent variables, hence they allow tail,
asymmetric and non-linear dependence. They can be explained as conditional
independence models with latent variables that don't necessarily have an
additive latent structure. We focus on important issues that would interest the
social data analyst, such as model selection and goodness-of-fit. Our general
methodology is demonstrated with an extensive simulation study and illustrated
by re-analysing three mixed response datasets. Our study suggests that there
can be a substantial improvement over the standard factor model for mixed data
and makes the argument for moving to factor copula models.

Suppose we are using a generalized linear model to predict a scalar outcome
$Y$ given a covariate vector $X$. We consider two related problems and propose
a methodology for both. In the first problem, every data point in a large
dataset has both $Y$ and $X$ known, but we wish to use a subset of the data to
limit computational costs. In the second problem, sometimes called "measurement
constraints," $Y$ is expensive to measure and initially is available only for a
small portion of the data. The goal is to select another subset of data where
$Y$ will also be measured. We focus on the more challenging but less
well-studied measurement constraint problem. A popular approach for the first
problem is sampling. However, most existing sampling algorithms require that $Y$ be
measured at all data points, so they cannot be used under measurement
constraints. We propose an optimal sampling procedure for massive datasets
under measurement constraints (OSUMC). We show consistency and asymptotic
normality of estimators from a general class of sampling...
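
The measurement-constraint setting can be sketched with a generic inverse-probability-weighted subsample fit for logistic regression. This is not the OSUMC procedure: the sampling probabilities below, proportional to the covariate norm, are a placeholder chosen only because they depend on $X$ alone, and `measure_y`, `subsample_estimate`, and `fit_weighted_logistic` are hypothetical names introduced for this illustration.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=25):
    """Newton's method for a weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                        # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X   # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def subsample_estimate(X, measure_y, r, rng):
    """Draw r points using X only, measure Y there, and fit a weighted GLM.

    measure_y(idx) stands in for the expensive measurement step: it is the
    only place where responses are read, and only for the sampled indices.
    """
    n = X.shape[0]
    norms = np.linalg.norm(X, axis=1)
    probs = norms / norms.sum()          # placeholder probabilities (assumption)
    idx = rng.choice(n, size=r, replace=True, p=probs)
    y_sub = measure_y(idx)
    w = 1.0 / (n * probs[idx])           # inverse-probability weights
    return fit_weighted_logistic(X[idx], y_sub, w)
```

The design point worth noting is that the sampling probabilities are computed before any response is observed, which is what separates this setting from subsampling schemes whose probabilities depend on $Y$ at every data point.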

A growing number of methods aim to assess the challenging question of
treatment effect variation in observational studies. This special section of
"Observational Studies" reports the results of a workshop conducted at the 2018
Atlantic Causal Inference Conference designed to understand the similarities
and differences across these methods. We invited eight groups of researchers to
analyze a synthetic observational data set that was generated using a recent
large-scale randomized trial in education. Overall, participants employed a
diverse set of methods, ranging from matching and flexible outcome modeling to
semiparametric estimation and ensemble approaches. While there was broad
consensus on the topline estimate, there were also large differences in
estimated treatment effect moderation. This highlights the fact that estimating
varying treatment effects in observational studies is often more challenging
than estimating the average treatment effect alone. We suggest several
directions for future work arising from this workshop.

