##### #1. The effect of geographic sampling on extreme precipitation: from models to observations and back again
###### Mark D. Risser, Michael F. Wehner
In light of the significant uncertainties present in global climate models' characterization of precipitation extremes, it is important to properly use observational data sets to determine whether a particular climate model is suitable for simulating extremes. In this paper, we identify two problems with traditional approaches for comparing global climate models and observational data products with respect to extremes: first, daily gridded products are a suboptimal data source to use for this comparison, and second, failing to account for the geographic locations of weather station data can paint a misleading picture with respect to model performance. To demonstrate these problems, we utilize in situ measurements of daily precipitation along with a spatial statistical extreme value analysis to evaluate and compare model performance with respect to extreme climatology. As an illustration, we use model output from five early submissions to the HighResMIP subproject of the CMIP6 experiment (Haarsma et al., 2016), comparing integrated...
##### #2. Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networks
###### Timmy Li, Yi Huang, James Evans, Ishanu Chattopadhyay
Large-scale trends in urban crime and global terrorism are well-predicted by socio-economic drivers, but focused, event-level predictions have had limited success. Standard machine learning approaches are promising, but they lack interpretability, are generally interpolative, and are ineffective for precise future interventions with costly and wasteful false positives. Here, we introduce Granger Network inference as a new forecasting approach for individual infractions, with demonstrated performance far surpassing past results, yet transparent enough to validate and extend social theory. Considering the problem of predicting crime in the City of Chicago, we achieve an average AUC of ~90% for events predicted a week in advance within spatial tiles approximately $1000$ ft across. Instead of pre-supposing that crimes unfold across contiguous spaces akin to diffusive systems, we learn the local transport rules from data. As our key insights, we uncover indications of suburban bias -- how law-enforcement response is modulated by...
##### #3. Causality-based tests to detect the influence of confounders on mobile health diagnostic applications: a comparison with restricted permutations
###### Elias Chaibub Neto, Meghasyam Tummalacherla, Lara Mangravite, Larsson Omberg
Machine learning practice is often impacted by confounders. Confounding can be particularly severe in remote digital health studies where the participants self-select to enter the study. While many different confounding adjustment approaches have been proposed in the literature, most of these methods rely on modeling assumptions, and it is unclear how robust they are to violations of these assumptions. This realization has recently motivated the development of restricted permutation methods to quantify the influence of observed confounders on the predictive performance of a machine learning model and to evaluate whether confounding adjustment methods are working as expected. In this paper we show, nonetheless, that restricted permutations can generate biased estimates of the contribution of the confounders to the predictive performance of a learner, and we propose an alternative approach to tackle this problem. By viewing a classification task from a causality perspective, we are able to leverage conditional independence tests between...
##### #4. A Lattice and Random Intermediate Point Sampling Design for Animal Movement
###### Elizabeth Eisenhauer, Ephraim Hanks
Animal movement studies have become ubiquitous in animal ecology for estimation of space use and analysis of movement behavior. In these studies, animal movement data are primarily collected at regular time intervals. We propose an irregular sampling design which could lead to greater efficiency and information gain in animal movement studies. Our novel sampling design, called lattice and random intermediate point (LARI), combines samples at regular and random time intervals. We compare the LARI sampling design to regular sampling designs in an example with common black carpenter ant location data, an example with guppy location data, and a simulation study of movement with a point of attraction. We modify a general stochastic differential equation model to allow for irregular time intervals and use this framework to compare sampling designs. When parameters are estimated reasonably well, regular sampling results in greater precision and accuracy in prediction of missing data. However, in each of the data and simulation examples...
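
The abstract describes LARI only as a combination of samples at regular and random time intervals. As a rough illustration, the sketch below assumes one uniformly drawn observation time inside each interval of a regular lattice; the function name `lari_times` and this exact placement rule are assumptions for illustration, not the authors' specification.

```python
import random


def lari_times(start, stop, n_intervals, rng=None):
    """Return lattice times plus one random intermediate time per interval.

    Assumption: exactly one uniform draw between consecutive lattice points.
    """
    rng = rng or random.Random()
    step = (stop - start) / n_intervals
    lattice = [start + i * step for i in range(n_intervals + 1)]
    times = []
    for left, right in zip(lattice, lattice[1:]):
        times.append(left)                      # regular (lattice) sample
        times.append(rng.uniform(left, right))  # random intermediate sample
    times.append(lattice[-1])                   # closing lattice sample
    return times


# Hypothetical schedule: 5 lattice intervals over 10 time units.
times = lari_times(0.0, 10.0, 5, random.Random(42))
```

A regular design with the same budget would instead place all 11 observations on an evenly spaced grid, which is the comparison the abstract refers to.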
##### #5. Bayesian Prediction of Volleyball Sets Using the Truncated Skellam and the Ordered Multinomial Models
###### Ioannis Ntzoufras, Vasilis Palaskas, Sotiris Drikos
In this work, we focus on building Bayesian models to analyze the outcome of a volleyball game as recorded by the difference of the winning sets for the Greek A1 men's League of the regular season 2016/17. More specifically, the first and foremost challenge is to find appropriate models for the response outcome which cannot be based on the usual Poisson or binomial assumptions. Here we use two major approaches: a) an ordinal multinomial logistic regression model and b) a model based on a truncated version of the Skellam distribution. For the first model, we consider the set difference as an ordinal response variable within the framework of multinomial logistic regression models. Concerning the second model, we adjust the Skellam distribution in order to account for the volleyball rules. We fit and compare both models with the same covariate structure as in Karlis & Ntzoufras (2003). Both models are fitted, illustrated and compared using data from the Greek Volleyball League for 2016/17.
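
To make the truncation idea concrete, here is a minimal sketch of a Skellam distribution restricted to the set differences a best-of-five match can produce. The support {-3, -2, -1, 1, 2, 3}, the function names, and the rates 1.5 and 1.0 are assumptions for illustration; the abstract does not give the authors' exact parameterization.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability mass at n, computed on the log scale (lam > 0)."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def skellam_pmf(k, lam1, lam2, terms=200):
    """P(X - Y = k) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2).

    Uses the convolution sum over Y = j, truncated after `terms` terms.
    """
    start = max(0, -k)  # need both j >= 0 and j + k >= 0
    return sum(
        poisson_pmf(j + k, lam1) * poisson_pmf(j, lam2)
        for j in range(start, start + terms)
    )


# Assumed support: a match ends 3-0, 3-1 or 3-2 for either side.
SET_DIFFS = (-3, -2, -1, 1, 2, 3)


def truncated_skellam_pmf(lam1, lam2, support=SET_DIFFS):
    """Skellam probabilities renormalized over the allowed set differences."""
    weights = {k: skellam_pmf(k, lam1, lam2) for k in support}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical rates for the home and away teams.
probs = truncated_skellam_pmf(1.5, 1.0)
home_win_probability = sum(p for k, p in probs.items() if k > 0)
```

Renormalizing over the allowed values removes the probability mass that an untruncated Skellam would place on impossible outcomes such as a difference of 0 or 4.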
##### #6. Combinatorial Models of Cross-Country Dual Meets: What is a Big Victory?
###### Kurt S. Riedel
Combinatorial/probabilistic models for cross-country dual meets are proposed. The first model assumes that all runners are equally likely to finish in any possible order. The second model assumes that each team is selected from a large identically distributed population of potential runners, with each potential runner's ranking determined by the initial draw from the combined population.
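
The first model can be enumerated exactly for small teams. The sketch below assumes a simplified scoring rule that the abstract does not state: two teams of five runners, every runner scores, and a team's score is the sum of its finishing places. Under equally likely finish orders, every set of places for one team is equally likely, so counting subsets gives the exact score distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def score_distribution(runners_per_team=5):
    """Exact distribution of one team's score when all orders are equally likely.

    Assumption: score = sum of finishing places, all runners count.
    """
    total_runners = 2 * runners_per_team
    places = range(1, total_runners + 1)
    # Each subset of places that team A could occupy is equally likely.
    counts = Counter(
        sum(team_a) for team_a in combinations(places, runners_per_team)
    )
    n_outcomes = sum(counts.values())
    return {score: Fraction(c, n_outcomes) for score, c in sorted(counts.items())}


dist = score_distribution(5)
# Probability that one team takes the first five places (score 1+2+3+4+5).
sweep_probability = dist[15]
```

A table like `dist` gives one way to judge how unusual a given margin is under the null model of equally matched runners.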
##### #7. Identifying predictive biomarkers of CIMAvaxEGF success in advanced Lung Cancer Patients
###### Patricia Luaces, Lizet Sanchez, Danay Saavedra, Tania Crombet, Wim Van der Elst, Ariel Alonso, Geert Molenberghs, Agustin Lage
Objectives: To identify predictive biomarkers of CIMAvaxEGF success in the treatment of Non-Small Cell Lung Cancer Patients. Methods: Data from a clinical trial evaluating the effect on survival time of CIMAvax-EGF versus best supportive care were analyzed retrospectively following the causal inference approach. Pre-treatment potential predictive biomarkers included basal serum EGF concentration, peripheral blood parameters and immunosenescence biomarkers (the proportion of CD8+CD28- T cells, CD4+ and CD8+ T cells, CD4/CD8 ratio and CD19+ B cells). The 33 patients with complete information were included. The predictive causal information (PCI) was calculated for all possible models. The model with a minimum number of predictors, but with high prediction accuracy (PCI>0.7) was selected. Good, rare and poor responder patients were identified using the predictive probability of treatment success. Results: The mean of PCI increased from 0.486, when only one predictor is considered, to 0.98 using the multivariate approach with all...
##### #8. The use of registry data to extrapolate overall survival results from randomised controlled trials
###### Reynaldo Martina, Keith Abrams, Sylwia Bujkiewicz, David Jenkins, Pascale Dequen, Michael Lees, Frank A. Corvino, Jessica Davies
Background: Pre-marketing authorisation estimates of survival are generally restricted to those observed directly in randomised controlled trials (RCTs). However, for regulatory and Health Technology Assessment (HTA) decision-making a longer time horizon is often required than is studied in RCTs. Therefore, extrapolation is required to estimate long-term treatment effect. Registry data can provide evidence to support extrapolation of treatment effects from RCTs, which are considered the main sources of evidence of effect for new drug applications. A number of methods are available to extrapolate survival data, such as Exponential, Weibull, Gompertz, log-logistic or log-normal parametric models. The different methods have varying functional forms and can result in different survival estimates. Methods: The aim of this paper was to use registry data to supplement the relatively short term RCT data to obtain long term estimates of effect. No formal hypotheses were tested. We explore the above parametric regression models as well as...
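
To illustrate why the choice of functional form matters for extrapolation, the sketch below calibrates an exponential and a Weibull survival curve to the same observed point and then compares them at a longer horizon. All numbers (60% survival at 2 years, Weibull shape 1.5, 10-year horizon) are hypothetical and are not taken from the paper.

```python
import math


def exponential_survival(t, rate):
    """S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_scale_matching(t_obs, s_obs, shape):
    """Scale such that the Weibull curve passes through (t_obs, s_obs)."""
    return t_obs / (-math.log(s_obs)) ** (1.0 / shape)


# Hypothetical end-of-trial observation: 60% survival at 2 years.
t_obs, s_obs = 2.0, 0.60
rate = -math.log(s_obs) / t_obs
shape = 1.5  # hypothetical shape > 1, i.e. an increasing hazard
scale = weibull_scale_matching(t_obs, s_obs, shape)

horizon = 10.0  # hypothetical long-term time point
exp_at_horizon = exponential_survival(horizon, rate)
weibull_at_horizon = weibull_survival(horizon, scale, shape)
```

Both curves agree at the end of follow-up yet diverge beyond it, which is where external evidence such as registry data can help discriminate between candidate models.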
##### #9. Human Immunodeficiency Virus (HIV) Cases in the Philippines: Analysis and Forecasting
###### Analaine May A. Tatoy, Roel F. Ceballos
Reports from the Health Department in the Philippines show that cases of Human Immunodeficiency Virus (HIV) are increasing despite management and control efforts by the government. Worldwide, the Philippines has one of the fastest growing numbers of HIV cases. The aim of the study is to analyze HIV cases by determining the best model for forecasting the future number of cases. The data set, containing 132 observations, was retrieved from the National HIV/AIDS and STI Surveillance and Strategic Information Unit (NHSSS) of the Department of Health. This data set was divided into two parts, one for model building and another for forecast evaluation. The original series has an increasing trend and is nonstationary with indication of non-constant variance. Box-Cox transformation and ordinary differencing were performed on the series. The differenced series is stationary, and tentative models were identified through ACF and PACF plots. The SARIMA model with the smallest AIC value was chosen. The chosen model then undergoes diagnostic checking. The residuals of...
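
The two preprocessing steps named in the abstract, a Box-Cox transformation followed by ordinary differencing, can be sketched in a few lines. The series `cases` is made-up illustrative data, and the choice of lambda = 0 (the log transform) is an assumption; the abstract does not report the value used.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def difference(values, lag=1):
    """Ordinary differencing: x[t] - x[t - lag]."""
    return [later - earlier for earlier, later in zip(values, values[lag:])]


# Made-up monthly counts with an upward trend and growing variance.
cases = [10, 12, 15, 19, 24, 30, 38, 47]

transformed = box_cox(cases, 0)            # stabilize the variance
differenced = difference(transformed, 1)   # remove the trend
```

The differenced series is what one would then inspect with ACF and PACF plots to propose tentative model orders.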
