##### #1. Estimating locomotor demands during team play from broadcast-derived tracking data
###### Jacob Mortensen, Luke Bornn
The introduction of optical tracking data across sports has made it possible to dissect athletic performance at a level unfathomable a decade ago. One area that has benefited substantially is sports science: high-resolution coordinate data gives sports scientists to-the-second estimates of the external load metrics, such as acceleration load and high-speed running distance, traditionally used to understand the physical toll a game takes on an athlete. Unfortunately, collecting this data requires installing expensive hardware and paying costly licensing fees to data providers, which restricts its availability. Algorithms have been developed that convert a traditional broadcast feed into x-y coordinate data, making tracking data easier to acquire, but coordinates are available for an athlete only when that player is within the camera frame. This inevitably leads to inaccurate player load estimates, limiting the usefulness of this data for sports scientists. In this research, we develop...
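To make the off-camera problem concrete, here is a minimal sketch of how two external load metrics, total distance and high-speed running distance, might be computed from frame-by-frame x-y coordinates, and how frames missing from a broadcast feed bias them downward. The frame rate (10 fps), the 5.5 m/s high-speed threshold, and the function name `external_load` are hypothetical choices for illustration; they are not taken from the paper, which goes on to model the missing movement rather than ignore it.

```python
import math
from typing import List, Optional, Tuple

# A frame is an (x, y) position in metres, or None when the player is off camera.
Frame = Optional[Tuple[float, float]]


def external_load(track: List[Frame], fps: float = 10.0,
                  hsr_threshold: float = 5.5) -> Tuple[float, float]:
    """Return (total distance, high-speed running distance) in metres.

    fps and hsr_threshold (m/s) are illustrative defaults, not values from the paper.
    """
    dt = 1.0 / fps
    total = 0.0
    high_speed = 0.0
    for prev, curr in zip(track, track[1:]):
        if prev is None or curr is None:
            # Player off camera: the movement in this interval is simply lost.
            continue
        step = math.dist(prev, curr)
        total += step
        if step / dt > hsr_threshold:
            high_speed += step
    return total, high_speed


# A player running in a straight line at 6 m/s for 2 seconds (20 frames at 10 fps).
full = [(0.6 * i, 0.0) for i in range(20)]
# The same run with frames 5-14 missing because the camera panned away.
gapped = [p if not 5 <= i <= 14 else None for i, p in enumerate(full)]

print(external_load(full))    # about (11.4, 11.4)
print(external_load(gapped))  # about (4.8, 4.8): most of the run goes unrecorded
```

Skipping intervals with a missing endpoint is the naive choice; it shows why broadcast-derived load estimates understate the true demand.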
##### #2. The Principled Prediction-Problem Ontology: when black box algorithms are (not) appropriate
###### Jordan Rodu, Michael Baiocchi
Black-box algorithms have had astonishing success in some settings. But their unpredictable brittleness has provoked serious concern and increased scrutiny. For any given black-box algorithm, understanding where it might fail is extraordinarily challenging. In contrast, understanding which settings are not appropriate for black-box deployment requires only an understanding of how these algorithms are developed. We introduce a framework that isolates four problem features (measurement, adaptability, resilience, and agnosis) that need to be carefully considered before selecting an algorithm. This paper lays out a principled framework, justified through careful decomposition of the system components used to develop black-box algorithms, for people to understand and discuss where black-box algorithms are appropriate and, more frequently, where they are not.
##### #3. Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches
###### Glen P. Martin, Matthew Sperrin, Kym I. E. Snell, Iain Buchan, Richard D. Riley
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of multi-morbidity, there is a growing need for CPMs that simultaneously predict the risk of each of multiple future outcomes. A common approach to multi-outcome risk prediction is to derive a separate CPM for each outcome and then multiply the predicted risks. This approach is valid only if the outcomes are conditionally independent given the covariates, and it fails to exploit potential relationships between the outcomes. This paper outlines several approaches that could be used to develop prognostic CPMs for multiple outcomes. We consider four methods, which differ in complexity and in their conditional independence assumptions: namely, a probabilistic classifier chain, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on...
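The conditional independence caveat can be checked numerically. The sketch below assumes one way of using multinomial logistic regression for two binary outcomes: each combination (Y1, Y2) is treated as a category, and a softmax of linear predictors gives the joint risks for one patient. The linear predictor values are hypothetical, not fitted to any data. Multiplying the two marginal risks, as the separate-models approach effectively does, then differs from the joint risk whenever the outcomes are dependent.

```python
import math

# Hypothetical linear predictors for one patient, one per outcome combination
# (Y1, Y2); (0, 0) is the reference category with score 0.
scores = {(0, 0): 0.0, (0, 1): -1.2, (1, 0): -1.0, (1, 1): -0.3}


def softmax(s):
    """Convert category scores to probabilities (max subtracted for stability)."""
    m = max(s.values())
    exps = {k: math.exp(v - m) for k, v in s.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


joint = softmax(scores)
# Marginal risks are obtained by summing over the other outcome.
p1 = joint[(1, 0)] + joint[(1, 1)]
p2 = joint[(0, 1)] + joint[(1, 1)]

print(f"joint risk P(Y1=1, Y2=1) = {joint[(1, 1)]:.3f}")
print(f"product of marginals     = {p1 * p2:.3f}")  # understates the joint risk here
```

With these illustrative scores the outcomes are positively associated, so the product of marginal risks understates the risk of developing both.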
##### #4. TopRank+: A Refinement of TopRank Algorithm
###### Victor de la Pena, Haolin Zou
Online learning to rank is a core problem in machine learning. Lattimore et al. (2018) proposed a novel online learning-to-rank algorithm based on topological sorting. In that paper, they used a set of self-normalized inequalities (a) within the algorithm, as a criterion at each iteration, and (b) to derive an upper bound on the cumulative regret, a measure of algorithm performance. In this work, we use the method of mixtures and asymptotic expansions of certain implicit functions to provide a tighter, iterated-log-like boundary for these inequalities and, as a consequence, improve both the algorithm itself and its performance estimate.
##### #5. Explicit agreement extremes for a $2\times2$ table with given marginals
###### José E. Chacón
The problem of maximizing (or minimizing) the agreement between clusterings, subject to given marginals, can be formally posed under a common framework for several agreement measures. Until now, it was possible to find its solution only through numerical algorithms. Here, an explicit solution is shown for the case where the two clusterings have two clusters each.
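For intuition about the numerical baseline that an explicit solution replaces: with both marginals fixed, a $2\times2$ table has a single free cell $n_{11}$, ranging from $\max(0, r_1 + c_1 - n)$ to $\min(r_1, c_1)$, so the extremes can be found by brute-force enumeration. The sketch below uses the Rand index as one example agreement measure; the choice of measure and the function names are assumptions for illustration, and this is not the paper's explicit formula.

```python
from math import comb


def rand_index(n11, n12, n21, n22):
    """Rand index of two 2-cluster partitions from their contingency table."""
    n = n11 + n12 + n21 + n22
    cells = (n11, n12, n21, n22)
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    # Agreeing pairs = pairs together in both + pairs apart in both.
    agreements = (comb(n, 2) + 2 * sum(comb(c, 2) for c in cells)
                  - sum(comb(r, 2) for r in rows) - sum(comb(c, 2) for c in cols))
    return agreements / comb(n, 2)


def agreement_extremes(r1, r2, c1, c2):
    """Enumerate all tables with the given marginals; return (min, max) Rand index."""
    n = r1 + r2
    if n != c1 + c2:
        raise ValueError("row and column marginals must have the same total")
    candidates = []
    # With fixed marginals the table has one free cell, n11.
    for n11 in range(max(0, r1 + c1 - n), min(r1, c1) + 1):
        table = (n11, r1 - n11, c1 - n11, n - r1 - c1 + n11)
        candidates.append((rand_index(*table), table))
    return min(candidates), max(candidates)


(lo, lo_table), (hi, hi_table) = agreement_extremes(5, 5, 5, 5)
print(lo, lo_table)  # minimum Rand index 21/45, attained at (2, 3, 3, 2)
print(hi, hi_table)  # maximum Rand index 1.0, attained at (5, 0, 0, 5)
```

Enumeration costs time linear in the range of $n_{11}$, which is cheap for a single table but gives no closed-form insight; that is the gap an explicit solution closes.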
##### #6. Bayesian Spatial Models for Voxel-wise Prostate Cancer Classification Using Multi-parametric MRI Data
###### Jin Jin, Lin Zhang, Ethan Leng, Gregory J. Metzger, Joseph S. Koopmeiners
Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly important role in the diagnosis of prostate cancer. Various computer-aided detection algorithms have been proposed for automated prostate cancer detection that combine information from multiple mpMRI data components. However, other features of mpMRI, including the spatial correlation between voxels and between-patient heterogeneity in the mpMRI parameters, have not been fully explored in the literature but could potentially improve cancer detection if leveraged appropriately. This paper proposes novel voxel-wise Bayesian classifiers for prostate cancer that account for the spatial correlation and between-patient heterogeneity in mpMRI. Modeling the spatial correlation is challenging due to the extremely high dimensionality of the data, and we consider three computationally efficient approaches using a Nearest Neighbor Gaussian Process (NNGP), a knot-based reduced-rank approximation, and a conditional autoregressive (CAR) model, respectively. The...
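Part of why a CAR model is computationally attractive is that its precision matrix is sparse. The sketch below builds a generic proper CAR precision $Q = \tau(D - \rho W)$ for a small 2-D grid of voxels, where $W$ is a 4-neighbour adjacency matrix and $D$ is the diagonal matrix of neighbour counts. The grid size, the neighbourhood structure, and the values of $\rho$ and $\tau$ are assumptions for illustration; the paper's exact CAR specification is not given in this abstract.

```python
import numpy as np


def car_precision(nrow: int, ncol: int, rho: float = 0.9, tau: float = 1.0) -> np.ndarray:
    """Proper CAR precision tau * (D - rho * W) on an nrow x ncol voxel grid.

    Uses 4-neighbour adjacency; |rho| < 1 keeps Q positive definite.
    """
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:  # neighbour below
                j = i + ncol
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncol:  # neighbour to the right
                j = i + 1
                W[i, j] = W[j, i] = 1.0
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)


Q = car_precision(4, 5)
print(Q.shape, np.count_nonzero(Q))  # each row has at most 5 non-zeros
```

A dense array is used here only for readability; at voxel scale one would store $Q$ in a sparse format (for example with `scipy.sparse`) so that the number of stored entries grows linearly with the number of voxels.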
##### #7. Bayesian inference for treatment effects under nested subsets of controls
###### Spencer Woody, Carlos M. Carvalho, Jared S. Murray
When constructing a model to estimate the causal effect of a treatment, it is necessary to control for other factors that may have confounding effects. Because the ignorability assumption is not testable, however, it is usually unclear which set of controls is appropriate, and effect estimation is generally sensitive to this choice. A common approach in this case is to fit several models, each with a different set of controls, but it is difficult to reconcile inference under the resulting multiple posterior distributions for the treatment effect. We therefore propose a two-stage approach to measure the sensitivity of effect estimation to the specification of controls. In the first stage, a model is fit with all available controls using a prior carefully selected to adjust for confounding. In the second stage, posterior distributions for the treatment effect are calculated under nested sets of controls by propagating posterior uncertainty from the original model. We demonstrate how our approach can be used to detect the most...
##### #8. A Monte Carlo EM Algorithm for the Parameter Estimation of Aggregated Hawkes Processes
###### Leigh Shlomovich, Edward Cohen, Niall Adams, Lekha Patel
A key difficulty with real event data is imprecision in the recording of event time-stamps. In many cases, retaining event times with high precision is expensive due to the sheer volume of activity. Combined with practical limits on measurement accuracy, this makes aggregated data common. To model such event data with point processes, tools for parameter estimation are essential. Here we consider parameter estimation for the Hawkes process, a type of self-exciting point process that has found application in modeling financial stock markets, earthquakes, and social media cascades. We develop a novel optimization approach to parameter estimation for aggregated Hawkes processes using a Monte Carlo Expectation-Maximization (MC-EM) algorithm. Through a detailed simulation study, we demonstrate that existing methods can produce severely biased and highly variable parameter estimates and that our novel MC-EM method significantly outperforms them in all studied circumstances. These...
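To show what "aggregated" Hawkes data looks like, the sketch below simulates exact event times from a Hawkes process with an exponential kernel using Ogata's thinning algorithm, then bins them into counts per interval, discarding the time-stamps that the estimation problem treats as unobserved. The parameterization $\lambda(t) = \mu + \sum_{t_i < t} \alpha\beta e^{-\beta(t - t_i)}$ (stationary for $\alpha < 1$), the parameter values, and the function names are assumptions for illustration; the paper's MC-EM estimator itself is not shown.

```python
import math
import random
from typing import List, Optional


def intensity(t: float, events: List[float], mu: float, alpha: float, beta: float) -> float:
    # lambda = mu + sum of alpha * beta * exp(-beta * (t - t_i)) over events t_i <= t.
    # Including t_i == t gives the right-limit lambda(t+), used as the thinning bound.
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti <= t)


def simulate_hawkes(mu: float, alpha: float, beta: float, T: float,
                    seed: Optional[int] = None) -> List[float]:
    """Ogata thinning for an exponential-kernel Hawkes process on [0, T]."""
    rng = random.Random(seed)
    events: List[float] = []
    t = 0.0
    while True:
        # The kernel only decays between events, so the current intensity
        # bounds lambda until the next accepted event.
        lam_bar = intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(lam_bar)
        if t > T:
            return events
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)


def aggregate(events: List[float], T: float, width: float) -> List[int]:
    """Bin exact event times into counts per interval of the given width."""
    n_bins = math.ceil(T / width)
    counts = [0] * n_bins
    for t in events:
        counts[min(int(t // width), n_bins - 1)] += 1
    return counts


events = simulate_hawkes(mu=0.5, alpha=0.5, beta=2.0, T=100.0, seed=1)
counts = aggregate(events, T=100.0, width=1.0)
print(len(events), counts[:10])  # only the counts would be available in practice
```

An estimator that sees only `counts` must account for every ordering of events within each bin, which is what makes direct likelihood-based estimation from aggregated data difficult.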
##### #9. Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown
###### Wen Wei Loh, Beatrijs Moerkerke, Tom Loeys, Stijn Vansteelandt
With multiple potential mediators on the causal pathway from a treatment to an outcome, we consider the problem of decomposing the effects along multiple possible causal path(s) through each distinct mediator. Under Pearl's path-specific effects framework (Pearl, 2001; Avin et al., 2005), such fine-grained decompositions necessitate stringent assumptions, such as correctly specifying the causal structure among the mediators, and there being no unobserved confounding among the mediators. In contrast, interventional direct and indirect effects for multiple mediators (Vansteelandt and Daniel, 2017) can be identified under much weaker conditions, while providing scientifically relevant causal interpretations. Nonetheless, current estimation approaches require (correctly) specifying a model for the joint mediator distribution, which can be difficult when there is a high-dimensional set of possibly continuous and non-continuous mediators. In this article, we avoid the need to model this distribution by building on a definition...
##### #10. Investigation of Patient-sharing Networks Using a Bayesian Network Model Selection Approach for Congruence Class Models
###### Ravi Goyal, Victor De Gruttola
A Bayesian approach to network model selection is presented for a general class of network models known as congruence class models (CCMs). CCMs form a broad class that includes as special cases several common network models, such as the Erdős-Rényi-Gilbert model, the stochastic block model, and many exponential random graph models. Because so many models can be specified as CCMs, investigators are better able than with current approaches to select a model consistent with the generative mechanisms associated with the observed network. In addition, the approach allows prior information to be incorporated. We use the proposed Bayesian network model selection approach for CCMs to investigate several mechanisms that may be responsible for the structure of patient-sharing networks, which are associated with the cost and quality of medical care. We found evidence in support of heterogeneity in sociality but not of selective mixing by either provider type or degree.
