The introduction of optical tracking data across sports has made it possible
to dissect athletic performance at a level unfathomable a decade ago.
One specific area that has seen substantial benefit is sports science, as
high-resolution coordinate data gives sports scientists to-the-second
estimates of external load metrics, such as acceleration load and high-speed
running distance, traditionally used to understand the physical toll a game
takes on an athlete. Unfortunately, collecting this data requires installation
of expensive hardware and paying costly licensing fees to data providers,
restricting its availability. Algorithms have been developed that allow a
traditional broadcast feed to be converted to x-y coordinate data, making
tracking data easier to acquire, but coordinates are available for an athlete
only when that player is within the camera frame. The resulting gaps make
player load estimates inaccurate, limiting the usefulness of this data for
sports scientists. In this research, we develop...
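
To make the off-camera problem concrete, the following is a minimal sketch of how high-speed running distance could be computed from x-y coordinates. The function name, the 10 Hz sampling rate and the 5.5 m/s threshold are illustrative assumptions, not values taken from the work described above.

```python
import math

def high_speed_running_distance(xs, ys, hz, speed_threshold=5.5):
    """Sum the distance of frame-to-frame steps whose speed exceeds the threshold.

    xs, ys: positions in metres, one sample per frame.
    hz: sampling rate in frames per second.
    speed_threshold: metres per second; 5.5 is an illustrative value only.
    """
    dt = 1.0 / hz
    total = 0.0
    for i in range(1, len(xs)):
        step = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        if step / dt > speed_threshold:
            total += step
    return total

# Synthetic track: five steps at 2 m/s, then five steps at 7 m/s, sampled at 10 Hz.
xs = [0.0]
for speed in [2.0] * 5 + [7.0] * 5:
    xs.append(xs[-1] + speed / 10)
ys = [0.0] * len(xs)

full = high_speed_running_distance(xs, ys, hz=10)
# Dropping the last three frames mimics the player leaving the camera view.
visible = high_speed_running_distance(xs[:-3], ys[:-3], hz=10)
print(full, visible)
```

In this synthetic track the truncated series recovers only part of the sprint distance, which is the kind of underestimate that missing frames can introduce.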

Black-box algorithms have had astonishing success in some settings. But their
unpredictable brittleness has provoked serious concern and increased scrutiny.
For any given black-box algorithm, understanding where it might fail is
extraordinarily challenging. In contrast, understanding which settings are not
appropriate for black-box deployment requires no more than understanding
how such algorithms are developed. We introduce a framework that isolates four
problem-features -- measurement, adaptability, resilience, and agnosis -- which
need to be carefully considered before selecting an algorithm. This paper lays
out a principled framework, justified through careful decomposition of the
system components used to develop black-box algorithms, for people to
understand and discuss where black-box algorithms are appropriate and, more
frequently, where they are not appropriate.

Clinical prediction models (CPMs) are used to predict clinically relevant
outcomes or events. Typically, prognostic CPMs are derived to predict the risk
of a single future outcome. However, with rising emphasis on the prediction of
multi-morbidity, there is growing need for CPMs to simultaneously predict risks
for each of multiple future outcomes. A common approach to multi-outcome risk
prediction is to derive a CPM for each outcome separately, then multiply the
predicted risks. This approach is only valid if the outcomes are conditionally
independent given the covariates, and it fails to exploit the potential
relationships between the outcomes. This paper outlines several approaches that
could be used to develop prognostic CPMs for multiple outcomes. We consider
four methods that vary in complexity and in the conditional independence
they assume: probabilistic classifier chains, multinomial logistic
regression, multivariate logistic regression, and a Bayesian probit model.
These are compared with methods that rely on...
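
As a small worked illustration of why multiplying separately predicted risks can mislead, consider a single patient with two correlated binary outcomes. The joint probabilities below are invented for illustration and do not come from any fitted model.

```python
# Hypothetical joint distribution of two binary outcomes (Y1, Y2) for one patient,
# i.e. conditional on that patient's covariates. The numbers are invented.
joint = {(1, 1): 0.20, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.60}

p_y1 = joint[(1, 1)] + joint[(1, 0)]  # marginal risk of outcome 1
p_y2 = joint[(1, 1)] + joint[(0, 1)]  # marginal risk of outcome 2

# Multiplying marginal risks assumes conditional independence of the outcomes.
product_of_marginals = p_y1 * p_y2

# The chain-rule factorisation P(Y1) * P(Y2 | Y1) makes no such assumption.
p_y2_given_y1 = joint[(1, 1)] / p_y1
chain_rule_risk = p_y1 * p_y2_given_y1

true_joint_risk = joint[(1, 1)]
print(product_of_marginals, chain_rule_risk, true_joint_risk)
```

Here the product of marginals gives roughly 0.09 while the true risk of both outcomes is 0.20. The chain-rule factorisation, on which a probabilistic classifier chain is based, recovers the joint risk exactly when the conditional probability is known.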

Online learning to rank is a core problem in machine learning. In Lattimore
et al. (2018), a novel online learning algorithm was proposed based on
topological sorting. That paper provided a set of self-normalized
inequalities used (a) in the algorithm as an iteration criterion and (b) to
provide an upper bound on cumulative regret, which is a measure of algorithm
performance. In this work, we use the method of mixtures and asymptotic
expansions of a certain implicit function to provide a tighter, iterated-log-like
boundary for the inequalities, and as a consequence improve both the algorithm
itself and its performance estimation.

The problem of maximizing (or minimizing) the agreement between clusterings,
subject to given marginals, can be formally posed under a common framework for
several agreement measures. Until now, it was possible to find its solution
only through numerical algorithms. Here, an explicit solution is shown for the
case where the two clusterings have two clusters each.
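
For intuition about the two-by-two case, the sketch below enumerates every contingency table compatible with the given marginals and evaluates an agreement measure on each. It is a brute-force numerical search, not the explicit solution referred to above. The Rand index as the agreement measure and the function names are assumptions made for illustration.

```python
from math import comb

def rand_index(table):
    """Rand index computed from a contingency table of two clusterings."""
    n = sum(sum(row) for row in table)
    same_both = sum(comb(c, 2) for row in table for c in row)
    same_rows = sum(comb(sum(row), 2) for row in table)
    same_cols = sum(comb(sum(col), 2) for col in zip(*table))
    total = comb(n, 2)
    return (total + 2 * same_both - same_rows - same_cols) / total

def extreme_rand_2x2(row_sizes, col_sizes):
    """Enumerate all 2x2 tables with the given marginals; return the best and worst."""
    a, n = row_sizes[0], sum(row_sizes)
    b = col_sizes[0]
    results = {}
    # With fixed marginals, the whole table is determined by the cell n11.
    for n11 in range(max(0, a + b - n), min(a, b) + 1):
        table = [[n11, a - n11], [b - n11, n - a - b + n11]]
        results[n11] = rand_index(table)
    best = max(results, key=results.get)
    worst = min(results, key=results.get)
    return (best, results[best]), (worst, results[worst])

best, worst = extreme_rand_2x2((4, 2), (3, 3))
print(best, worst)
```

Because a two-by-two table with fixed marginals has a single free cell, the search is one-dimensional.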

Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly
important role in the diagnosis of prostate cancer. Various computer-aided
detection algorithms have been proposed for automated prostate cancer detection
by combining information from various mpMRI data components. However, there
exist other features of mpMRI, including the spatial correlation between voxels
and between-patient heterogeneity in the mpMRI parameters, that have not been
fully explored in the literature but could potentially improve cancer detection
if leveraged appropriately. This paper proposes novel voxel-wise Bayesian
classifiers for prostate cancer that account for the spatial correlation and
between-patient heterogeneity in mpMRI. Modeling the spatial correlation is
challenging due to the extremely high dimensionality of the data, and we consider
three computationally efficient approaches: a Nearest Neighbor Gaussian
Process (NNGP), a knot-based reduced-rank approximation, and a conditional
autoregressive (CAR) model. The...

When constructing a model to estimate the causal effect of a treatment, it is
necessary to control for other factors which may have confounding effects.
Because the ignorability assumption is not testable, however, it is usually
unclear which set of controls is appropriate, and effect estimation is
generally sensitive to this choice. A common approach in this case is to fit
several models, each with a different set of controls, but it is difficult to
reconcile inference under the multiple resulting posterior distributions for
the treatment effect. Therefore, we propose a two-stage approach to measure the
sensitivity of effect estimation with respect to control specification. In the
first stage, a model is fit with all available controls using a prior carefully
selected to adjust for confounding. In the second stage, posterior
distributions are calculated for the treatment effect under nested sets of
controls by propagating posterior uncertainty in the original model. We
demonstrate how our approach can be used to detect the most...

A key difficulty that arises from real event data is imprecision in the
recording of event time-stamps. In many cases, retaining event times with
high precision is expensive due to the sheer volume of activity. Together with
practical limits on the accuracy of measurements, this makes aggregated data
common. In
order to use point processes to model such event data, tools for handling
parameter estimation are essential. Here we consider parameter estimation of
the Hawkes process, a type of self-exciting point process that has found
application in the modeling of financial stock markets, earthquakes and social
media cascades. We develop a novel optimization approach to parameter
estimation of aggregated Hawkes processes using a Monte Carlo
Expectation-Maximization (MC-EM) algorithm. Through a detailed simulation
study, we demonstrate that existing methods can produce severely
biased and highly variable parameter estimates and that our novel MC-EM method
significantly outperforms them in all studied circumstances. These...
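
To illustrate what aggregation does to Hawkes event data, the sketch below evaluates a self-exciting conditional intensity and then bins exact event times into counts. The exponential kernel, its parameterisation (mu, alpha, beta) and the function names are assumptions chosen for illustration. The sketch does not implement the MC-EM estimator.

```python
import math
from collections import Counter

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity mu + sum of alpha*beta*exp(-beta*(t - t_i)) over t_i < t.

    Assumes an exponential excitation kernel (an illustrative choice).
    """
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )

def aggregate(times, bin_width, n_bins):
    """Replace exact event times by counts per bin of width bin_width."""
    counts = Counter(int(t // bin_width) for t in times)
    return [counts.get(k, 0) for k in range(n_bins)]

events = [0.2, 0.9, 1.1, 3.7]  # made-up event times
binned = aggregate(events, bin_width=1.0, n_bins=4)

# Each past event raises the intensity above the baseline mu.
baseline = hawkes_intensity(0.1, events, mu=0.5, alpha=0.8, beta=1.0)
excited = hawkes_intensity(1.2, events, mu=0.5, alpha=0.8, beta=1.0)
print(binned, baseline, excited)
```

After binning, only the counts are observed, so the ordering and spacing of events within a bin, which drive the intensity, are lost.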

With multiple potential mediators on the causal pathway from a treatment to
an outcome, we consider the problem of decomposing the effects along multiple
possible causal paths through each distinct mediator. Under Pearl's
path-specific effects framework (Pearl, 2001; Avin et al., 2005), such
fine-grained decompositions necessitate stringent assumptions, such as
correct specification of the causal structure among the mediators and the
absence of unobserved confounding among the mediators. In contrast, interventional
direct and indirect effects for multiple mediators (Vansteelandt and Daniel,
2017) can be identified under much weaker conditions, while providing
scientifically relevant causal interpretations. Nonetheless, current estimation
approaches require (correctly) specifying a model for the joint mediator
distribution, which can be difficult when there is a high-dimensional set of
possibly continuous and non-continuous mediators. In this article, we avoid the
need to model this distribution by building on a definition...

A Bayesian approach to network model selection is presented for a
general class of network models referred to as congruence class models
(CCMs). CCMs form a broad class that includes as special cases several common
network models, such as the Erd\H{o}s-R\'{e}nyi-Gilbert model, stochastic block
model and many exponential random graph models. Because a wide range of models
can be specified as CCMs, investigators are better able than with current
approaches to select a model consistent with the generative mechanisms
associated with the observed network. In addition, the approach allows for
incorporation of prior information. We utilize the proposed Bayesian network
model selection approach for CCMs to investigate several mechanisms that may be
responsible for the structure of patient-sharing networks, which are associated
with the cost and quality of medical care. We found evidence in support of
heterogeneity in sociality but not of selective mixing by provider type or
degree.

