Top 10 arXiv Papers Today in Statistics


0.0 Mikeys
#1. Estimating locomotor demands during team play from broadcast-derived tracking data
Jacob Mortensen, Luke Bornn
The introduction of optical tracking data across sports has given rise to the ability to dissect athletic performance at a level unfathomable a decade ago. One specific area that has seen substantial benefit is sports science, as high resolution coordinate data permits sports scientists to have to-the-second estimates of external load metrics, such as acceleration load and high speed running distance, traditionally used to understand the physical toll a game takes on an athlete. Unfortunately, collecting this data requires installation of expensive hardware and paying costly licensing fees to data providers, restricting its availability. Algorithms have been developed that allow a traditional broadcast feed to be converted to x-y coordinate data, making tracking data easier to acquire, but coordinates are available for an athlete only when that player is within the camera frame. Obviously, this leads to inaccuracies in player load estimates, limiting the usefulness of this data for sports scientists. In this research, we develop...
Figures
None.
Tweets
StatsPapers: Estimating locomotor demands during team play from broadcast-derived tracking data. https://t.co/0iU3uXPqWI
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

0.0 Mikeys
#2. The Principled Prediction-Problem Ontology: when black box algorithms are (not) appropriate
Jordan Rodu, Michael Baiocchi
Black-box algorithms have had astonishing success in some settings. But their unpredictable brittleness has provoked serious concern and increased scrutiny. For any given black-box algorithm understanding where it might fail is extraordinarily challenging. In contrast, understanding which settings are not appropriate for black-box deployment requires no more than understanding simply how they are developed. We introduce a framework that isolates four problem-features -- measurement, adaptability, resilience, and agnosis -- which need to be carefully considered before selecting an algorithm. This paper lays out a principled framework, justified through careful decomposition of the system components used to develop black-box algorithms, for people to understand and discuss where black-box algorithms are appropriate and, more frequently, where they are not appropriate.
Figures
None.
Tweets
StatsPapers: The Principled Prediction-Problem Ontology: when black box algorithms are (not) appropriate. https://t.co/DtT2RojQ6U
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

0.0 Mikeys
#3. Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches
Glen P. Martin, Matthew Sperrin, Kym I. E. Snell, Iain Buchan, Richard D. Riley
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of multi-morbidity, there is growing need for CPMs to simultaneously predict risks for each of multiple future outcomes. A common approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop prognostic CPMs for multiple outcomes. We consider four methods, ranging in complexity and assumed conditional independence assumptions: namely, probabilistic classifier chain, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on...
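The conditional-independence caveat in this abstract can be illustrated with a small numeric sketch. The joint probabilities below are hypothetical values chosen for illustration (they are not from the paper), and `marginal_risk` is an illustrative helper, not part of any method the authors describe. The sketch only shows that multiplying separately estimated risks can misstate the joint risk when the two outcomes are dependent.

```python
# Hypothetical joint distribution of two binary outcomes (A, B) for one
# covariate pattern. These numbers are assumptions made for illustration.
joint = {(1, 1): 0.20, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.60}


def marginal_risk(joint_probs, index):
    """Sum the joint probabilities in which the outcome at `index` occurs."""
    return sum(p for outcome, p in joint_probs.items() if outcome[index] == 1)


risk_a = marginal_risk(joint, 0)      # risk of outcome A alone
risk_b = marginal_risk(joint, 1)      # risk of outcome B alone
product_of_risks = risk_a * risk_b    # what "multiply the risks" would report
true_joint_risk = joint[(1, 1)]       # actual probability that both occur

print(f"product of marginal risks: {product_of_risks:.2f}")
print(f"true joint risk:           {true_joint_risk:.2f}")
```

With these assumed numbers the two outcomes are positively associated, so the product of the marginal risks understates the probability that both occur.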
Figures
None.
Tweets
MatthewSperrin: Preprint: Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches https://t.co/mbmdDlMMzO work with @glen_martin1 @Richard_D_Riley @Kym_Snell and @profbuchan - comments welcome!
StatsPapers: Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches. https://t.co/zyQT62VyBP
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

0.0 Mikeys
#4. TopRank+: A Refinement of TopRank Algorithm
Victor de la Pena, Haolin Zou
Online learning to rank is a core problem in machine learning. In Lattimore et al. (2018), a novel online learning algorithm was proposed based on topological sorting. In that paper, they provided a set of self-normalized inequalities (a) as a criterion in the iterations of the algorithm and (b) to provide an upper bound for cumulative regret, which is a measure of algorithm performance. In this work, we utilized the method of mixtures and asymptotic expansions of a certain implicit function to provide a tighter, iterated-log-like boundary for the inequalities and, as a consequence, improve both the algorithm itself and its performance estimation.
Figures
None.
Tweets
arxivml: "TopRank+: A Refinement of TopRank Algorithm", Victor de la Pena, Haolin Zou https://t.co/svmYvIeLiH
StatsPapers: TopRank+: A Refinement of TopRank Algorithm. https://t.co/jlPCXolDc6
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

0.0 Mikeys
#5. Explicit agreement extremes for a $2\times2$ table with given marginals
José E. Chacón
The problem of maximizing (or minimizing) the agreement between clusterings, subject to given marginals, can be formally posed under a common framework for several agreement measures. Until now, it was possible to find its solution only through numerical algorithms. Here, an explicit solution is shown for the case where the two clusterings have two clusters each.
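To make the problem statement concrete, here is a minimal brute-force sketch of the numerical route that the abstract contrasts with an explicit solution. It assumes the Rand index as the agreement measure (the abstract does not name a specific one), and the function names `rand_index` and `rand_extremes` are hypothetical. It does not reproduce the paper's explicit formula. It relies only on the fact that a $2\times2$ table with fixed marginals has a single free cell.

```python
from math import comb


def rand_index(table):
    """Rand index of two clusterings summarized by a contingency table."""
    n = sum(sum(row) for row in table)
    total_pairs = comb(n, 2)
    same_both = sum(comb(x, 2) for row in table for x in row)
    same_rows = sum(comb(sum(row), 2) for row in table)
    same_cols = sum(comb(sum(col), 2) for col in zip(*table))
    # Agreements: pairs joined in both clusterings plus pairs split in both.
    agreements = total_pairs + 2 * same_both - same_rows - same_cols
    return agreements / total_pairs


def rand_extremes(row_sums, col_sums):
    """Enumerate every 2x2 table with the given marginals; return (min, max)."""
    r1, r2 = row_sums
    c1, c2 = col_sums
    n = r1 + r2
    if n != c1 + c2:
        raise ValueError("row and column sums must have the same total")
    # The top-left cell determines the whole table once marginals are fixed.
    lowest, highest = max(0, r1 + c1 - n), min(r1, c1)
    values = []
    for n11 in range(lowest, highest + 1):
        table = [[n11, r1 - n11], [c1 - n11, r2 - (c1 - n11)]]
        values.append(rand_index(table))
    return min(values), max(values)


ri_min, ri_max = rand_extremes((5, 5), (5, 5))
print(ri_min, ri_max)
```

Enumeration is cheap here because only one cell is free; the search range for that cell follows directly from requiring all four cells to be non-negative.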
Figures
None.
Tweets
arxivml: "Explicit agreement extremes for a $2\times2$ table with given marginals", José E. Chacón https://t.co/Lik7q1Duza
StatsPapers: Explicit agreement extremes for a $2\times2$ table with given marginals. https://t.co/4qnSO7pBrc
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 0
Unique Words: 0

0.0 Mikeys
#6. Bayesian Spatial Models for Voxel-wise Prostate Cancer Classification Using Multi-parametric MRI Data
Jin Jin, Lin Zhang, Ethan Leng, Gregory J. Metzger, Joseph S. Koopmeiners
Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly important role in the diagnosis of prostate cancer. Various computer-aided detection algorithms have been proposed for automated prostate cancer detection by combining information from various mpMRI data components. However, there exist other features of mpMRI, including the spatial correlation between voxels and between-patient heterogeneity in the mpMRI parameters, that have not been fully explored in the literature but could potentially improve cancer detection if leveraged appropriately. This paper proposes novel voxel-wise Bayesian classifiers for prostate cancer that account for the spatial correlation and between-patient heterogeneity in mpMRI. Modeling the spatial correlation is challenging due to the extreme high dimensionality of the data, and we consider three computationally efficient approaches using Nearest Neighbor Gaussian Process (NNGP), knot-based reduced-rank approximation, and a conditional autoregressive (CAR) model, respectively. The...
Figures
None.
Tweets
StatsPapers: Bayesian Spatial Models for Voxel-wise Prostate Cancer Classification Using Multi-parametric MRI Data. https://t.co/dp9feUkwmu
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

0.0 Mikeys
#7. Bayesian inference for treatment effects under nested subsets of controls
Spencer Woody, Carlos M. Carvalho, Jared S. Murray
When constructing a model to estimate the causal effect of a treatment, it is necessary to control for other factors which may have confounding effects. Because the ignorability assumption is not testable, however, it is usually unclear which set of controls is appropriate, and effect estimation is generally sensitive to this choice. A common approach in this case is to fit several models, each with a different set of controls, but it is difficult to reconcile inference under the multiple resulting posterior distributions for the treatment effect. Therefore we propose a two-stage approach to measure the sensitivity of effect estimation with respect to control specification. In the first stage, a model is fit with all available controls using a prior carefully selected to adjust for confounding. In the second stage, posterior distributions are calculated for the treatment effect under nested sets of controls by propagating posterior uncertainty in the original model. We demonstrate how our approach can be used to detect the most...
Figures
None.
Tweets
StatsPapers: Bayesian inference for treatment effects under nested subsets of controls. https://t.co/PTcwQhNOkK
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

0.0 Mikeys
#8. A Monte Carlo EM Algorithm for the Parameter Estimation of Aggregated Hawkes Processes
Leigh Shlomovich, Edward Cohen, Niall Adams, Lekha Patel
A key difficulty that arises from real event data is imprecision in the recording of event time-stamps. In many cases, retaining event times with a high precision is expensive due to the sheer volume of activity. Combined with practical limits on the accuracy of measurements, aggregated data is common. In order to use point processes to model such event data, tools for handling parameter estimation are essential. Here we consider parameter estimation of the Hawkes process, a type of self-exciting point process that has found application in the modeling of financial stock markets, earthquakes and social media cascades. We develop a novel optimization approach to parameter estimation of aggregated Hawkes processes using a Monte Carlo Expectation-Maximization (MC-EM) algorithm. Through a detailed simulation study, we demonstrate that existing methods are capable of producing severely biased and highly variable parameter estimates and that our novel MC-EM method significantly outperforms them in all studied circumstances. These...
Figures
None.
Tweets
eakcohen: New preprint from Leigh, another one of my brilliant PhD students. Understanding the repercussions of aggregating/binning event data and developing methodologies for handling data of this type is one of the areas my group is currently interested in https://t.co/LjD278fpiU https://t.co/s8ZVvbRdOZ
StatsPapers: A Monte Carlo EM Algorithm for the Parameter Estimation of Aggregated Hawkes Processes. https://t.co/5P48bpfIcR
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

0.0 Mikeys
#9. Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown
Wen Wei Loh, Beatrijs Moerkerke, Tom Loeys, Stijn Vansteelandt
With multiple potential mediators on the causal pathway from a treatment to an outcome, we consider the problem of decomposing the effects along multiple possible causal path(s) through each distinct mediator. Under Pearl's path-specific effects framework (Pearl, 2001; Avin et al., 2005), such fine-grained decompositions necessitate stringent assumptions, such as correctly specifying the causal structure among the mediators, and there being no unobserved confounding among the mediators. In contrast, interventional direct and indirect effects for multiple mediators (Vansteelandt and Daniel, 2017) can be identified under much weaker conditions, while providing scientifically relevant causal interpretations. Nonetheless, current estimation approaches require (correctly) specifying a model for the joint mediator distribution, which can be difficult when there is a high-dimensional set of possibly continuous and non-continuous mediators. In this article, we avoid the need for modeling this distribution, by building on a definition...
Figures
None.
Tweets
StatsPapers: Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown. https://t.co/35VbZZlRJt
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

0.0 Mikeys
#10. Investigation of Patient-sharing Networks Using a Bayesian Network Model Selection Approach for Congruence Class Models
Ravi Goyal, Victor De Gruttola
A Bayesian approach to conduct network model selection is presented for a general class of network models referred to as the congruence class models (CCMs). CCMs form a broad class that includes as special cases several common network models, such as the Erdős-Rényi-Gilbert model, stochastic block model and many exponential random graph models. Due to the range of models able to be specified as a CCM, investigators are better able to select a model consistent with generative mechanisms associated with the observed network compared to current approaches. In addition, the approach allows for incorporation of prior information. We utilize the proposed Bayesian network model selection approach for CCMs to investigate several mechanisms that may be responsible for the structure of patient-sharing networks, which are associated with the cost and quality of medical care. We found evidence in support of heterogeneity in sociality but not selective mixing by provider type nor degree.
Figures
None.
Tweets
StatsPapers: Investigation of Patient-sharing Networks Using a Bayesian Network Model Selection Approach for Congruence Class Models. https://t.co/GxQDM0IYaM
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 257,273 papers.
