Top 10 arXiv Papers Today in Statistics


2.055 Mikeys
#1. The effect of geographic sampling on extreme precipitation: from models to observations and back again
Mark D. Risser, Michael F. Wehner
In light of the significant uncertainties present in global climate models' characterization of precipitation extremes, it is important to properly use observational data sets to determine whether a particular climate model is suitable for simulating extremes. In this paper, we identify two problems with traditional approaches for comparing global climate models and observational data products with respect to extremes: first, daily gridded products are a suboptimal data source to use for this comparison, and second, failing to account for the geographic locations of weather station data can paint a misleading picture with respect to model performance. To demonstrate these problems, we utilize in situ measurements of daily precipitation along with a spatial statistical extreme value analysis to evaluate and compare model performance with respect to extreme climatology. As an illustration, we use model output from five early submissions to the HighResMIP subproject of the CMIP6 experiment (Haarsma et al., 2016), comparing integrated...
Figures
None.
Tweets
arxiv_org: The effect of geographic sampling on extreme precipitation: from models to observations a... https://t.co/7SVkj2xA1p https://t.co/LNaiNccat6
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

2.024 Mikeys
#2. Harmonic Mean Point Processes: Proportional Rate Error Minimization for Obtundation Prediction
Yoonjung Kim, Jeremy C. Weiss
In healthcare, the highest risk individuals for morbidity and mortality are rarely those with the greatest modifiable risk. By contrast, many machine learning formulations implicitly attend to the highest risk individuals. We focus on this problem in point processes, a popular modeling technique for the analysis of the temporal event sequences in electronic health records (EHR) data with applications in risk stratification and risk score systems. We show that optimization of the log-likelihood function also gives disproportionate attention to high risk individuals and leads to poor prediction results for low risk individuals compared to ones at high risk. We characterize the problem and propose an adjusted log-likelihood formulation as a new objective for point processes. We demonstrate the benefits of our method in simulations and in EHR data of patients admitted to the critical care unit for intracerebral hemorrhage.
Figures
None.
Tweets
arxiv_org: Harmonic Mean Point Processes: Proportional Rate Error Minimization for Obtundation Predi... https://t.co/jVVAAzpp5W https://t.co/D6f6KszYrC
arxivml: "Harmonic Mean Point Processes: Proportional Rate Error Minimization for Obtundation Prediction", Yoonjung Kim, Jer… https://t.co/dM6R5LlHQm
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

2.023 Mikeys
#3. Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features
Shingo Yashima, Atsushi Nitanda, Taiji Suzuki
Although kernel methods are widely used in many learning problems, they have poor scalability to large datasets. To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms. In this study, we consider solving a binary classification problem using random features and stochastic gradient descent. In recent research, an exponential convergence rate of the expected classification error under the strong low-noise condition has been shown. We extend these analyses to a random features setting, analyzing the error induced by the approximation of random features in terms of the distance between the generated hypothesis including population risk minimizers and empirical risk minimizers when using general Lipschitz loss functions, to show that an exponential convergence of the expected classification error is achieved even if random features approximation is applied. Additionally, we demonstrate that the convergence rate does not depend on the...
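The setting in this abstract (binary classification with random features trained by stochastic gradient descent) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's setup: the RBF kernel and its random Fourier feature map, the logistic loss (one example of a Lipschitz loss), the toy two-cluster data, and the names `make_random_features`, `sgd_logistic`, and `demo`.

```python
import math
import random


def make_random_features(dim, num_features, gamma, rng):
    """Random Fourier features approximating exp(-gamma * ||x - y||^2)."""
    # Frequencies are drawn from N(0, 2 * gamma * I); phases are uniform.
    scale = math.sqrt(2.0 * gamma)
    omegas = [[rng.gauss(0.0, scale) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def phi(x):
        return [norm * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return phi


def sgd_logistic(samples, phi, num_features, steps, lam, rng):
    """SGD on the L2-regularized logistic loss in random-feature space."""
    w = [0.0] * num_features
    for t in range(1, steps + 1):
        x, y = samples[rng.randrange(len(samples))]
        z = phi(x)
        margin = y * sum(w_j * z_j for w_j, z_j in zip(w, z))
        # Derivative of log(1 + exp(-margin)) with respect to the margin;
        # the margin is clamped only to avoid overflow in exp.
        g = -1.0 / (1.0 + math.exp(min(margin, 50.0)))
        eta = 1.0 / (lam * (t + 10))  # decaying step size (illustrative choice)
        w = [w_j - eta * (g * y * z_j + lam * w_j) for w_j, z_j in zip(w, z)]
    return w


def demo(seed=0):
    """Train on two well-separated clusters and return training accuracy."""
    rng = random.Random(seed)
    samples = []
    for _ in range(200):
        y = rng.choice([-1, 1])
        samples.append(([y + rng.gauss(0.0, 0.3), y + rng.gauss(0.0, 0.3)], y))
    phi = make_random_features(dim=2, num_features=50, gamma=0.5, rng=rng)
    w = sgd_logistic(samples, phi, 50, steps=2000, lam=0.01, rng=rng)
    correct = sum(1 for x, y in samples
                  if y * sum(a * b for a, b in zip(w, phi(x))) > 0)
    return correct / len(samples)
```

Calling `demo()` trains on the toy data and returns the fraction of training points classified correctly. The well-separated clusters are only a loose stand-in for a low-noise condition.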
Figures
None.
Tweets
BrundageBot: Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features. Shingo Yashima, Atsushi Nitanda, and Taiji Suzuki https://t.co/UHSlu56ks6
arxivml: "Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features", Shingo Yashima, … https://t.co/YYe9f8Wi14
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

2.02 Mikeys
#4. Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networks
Timmy Li, Yi Huang, James Evans, Ishanu Chattopadhyay
Large-scale trends in urban crime and global terrorism are well-predicted by socio-economic drivers, but focused, event-level predictions have had limited success. Standard machine learning approaches are promising, but lack interpretability, are generally interpolative, and ineffective for precise future interventions with costly and wasteful false positives. Here, we introduce Granger Network inference as a new forecasting approach for individual infractions with demonstrated performance far surpassing past results, yet transparent enough to validate and extend social theory. Considering the problem of predicting crime in the City of Chicago, we achieve an average AUC of ~90% for events predicted a week in advance within spatial tiles approximately 1000 ft across. Instead of pre-supposing that crimes unfold across contiguous spaces akin to diffusive systems, we learn the local transport rules from data. As our key insights, we uncover indications of suburban bias -- how law-enforcement response is modulated by...
Figures
None.
Tweets
arxivml: "Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networ… https://t.co/9q7d4i9gaQ
SRoyLee: Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger N - https://t.co/Rkt3PEPvaC
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.012 Mikeys
#5. Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data
Jesus Lago, Karel De Brabandere, Fjo De Ridder, Bart De Schutter
Due to the increasing integration of solar power into the electrical grid, forecasting short-term solar irradiance has become key for many applications, e.g. operational planning, power purchases, reserve activation, etc. In this context, as solar generators are geographically dispersed and ground measurements are not always easy to obtain, it is very important to have general models that can predict solar irradiance without the need of local data. In this paper, a model that can perform short-term forecasting of solar irradiance in any general location without the need of ground measurements is proposed. To do so, the model considers satellite-based measurements and weather-based forecasts, and employs a deep neural network structure that is able to generalize across locations; particularly, the network is trained only using a small subset of sites where ground data is available, and the model is able to generalize to a much larger number of locations where ground data does not exist. As a case study, 25 locations in The...
Figures
None.
Tweets
tweet_nakasho: A paper that uses machine learning on satellite data to predict solar irradiance... it feels like solar-related papers keep coming out one after another!? https://t.co/3hfkx7YX1j
arxivml: "Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data", Jes… https://t.co/6YVAcw1YHD
arxiv_cs_LG: Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data. Jesus Lago, Karel De Brabandere, Fjo De Ridder, and Bart De Schutter https://t.co/dGEJ6PfgmL
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.009 Mikeys
#6. Causality-based tests to detect the influence of confounders on mobile health diagnostic applications: a comparison with restricted permutations
Elias Chaibub Neto, Meghasyam Tummalacherla, Lara Mangravite, Larsson Omberg
Machine learning practice is often impacted by confounders. Confounding can be particularly severe in remote digital health studies where the participants self-select to enter the study. While many different confounding adjustment approaches have been proposed in the literature, most of these methods rely on modeling assumptions, and it is unclear how robust they are to violations of these assumptions. This realization has recently motivated the development of restricted permutation methods to quantify the influence of observed confounders on the predictive performance of machine learning models and evaluate if confounding adjustment methods are working as expected. In this paper we show, nonetheless, that restricted permutations can generate biased estimates of the contribution of the confounders to the predictive performance of a learner, and we propose an alternative approach to tackle this problem. By viewing a classification task from a causality perspective, we are able to leverage conditional independence tests between...
Figures
None.
Tweets
arxiv_org: Causality-based tests to detect the influence of confounders on mobile health diagnostic... https://t.co/sZbC8EqLDK https://t.co/mR7S4YzLIX
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.007 Mikeys
#7. Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
Benjamin Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, Rich Caruana
Recent methods for training generalized additive models (GAMs) with pairwise interactions achieve state-of-the-art accuracy on a variety of datasets. Adding interactions to GAMs, however, introduces an identifiability problem: effects can be freely moved between main effects and interaction effects without changing the model predictions. In some cases, this can lead to contradictory interpretations of the same underlying function. This is a critical problem because a central motivation of GAMs is model interpretability. In this paper, we use the Functional ANOVA decomposition to uniquely define interaction effects and thus produce identifiable additive models with purified interactions. To compute this decomposition, we present a fast, exact, mass-moving algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to several datasets and show large disparity, including contradictions, between the apparent and the purified effects.
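The identifiability problem described in this abstract can be seen on a small table. The sketch below is not the paper's algorithm: it assumes a two-feature piecewise-constant model stored as an intercept, two main-effect vectors, and one interaction table, and it assumes uniform weights over cells, in which case a single pass of moving row and column means out of the interaction is enough. The names `purify` and `predict` are hypothetical.

```python
def purify(intercept, main_rows, main_cols, interaction):
    """Move row/column means of an interaction table into the main effects.

    Assumes uniform weights over cells (an illustrative simplification).
    Returns a new (intercept, rows, cols, interaction) with identical
    predictions, in which every row and column of the interaction has mean 0.
    """
    n_rows, n_cols = len(interaction), len(interaction[0])
    inter = [row[:] for row in interaction]
    rows = list(main_rows)
    cols = list(main_cols)

    # Move each row's mean from the interaction into the row main effect.
    for i in range(n_rows):
        m = sum(inter[i]) / n_cols
        rows[i] += m
        inter[i] = [v - m for v in inter[i]]

    # Move each column's mean into the column main effect.
    for j in range(n_cols):
        m = sum(inter[i][j] for i in range(n_rows)) / n_rows
        cols[j] += m
        for i in range(n_rows):
            inter[i][j] -= m

    # Center the main effects and move their means into the intercept.
    row_mean = sum(rows) / n_rows
    col_mean = sum(cols) / n_cols
    rows = [r - row_mean for r in rows]
    cols = [c - col_mean for c in cols]
    return intercept + row_mean + col_mean, rows, cols, inter


def predict(intercept, rows, cols, inter, i, j):
    """Model output for cell (i, j)."""
    return intercept + rows[i] + cols[j] + inter[i][j]


# An AND-like function written entirely as an "interaction".
raw = (0.0, [0.0, 0.0], [0.0, 0.0], [[0.0, 0.0], [0.0, 1.0]])
pure = purify(*raw)
```

Here `raw` and `pure` give the same prediction for every cell, yet `raw` attributes everything to the interaction while `pure` has non-zero main effects. That gap between apparent and purified effects is the kind of disparity the abstract refers to.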
Figures
None.
Tweets
arxivml: "Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additi… https://t.co/vcRz1cIGH2
arxiv_cs_LG: Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models. Benjamin Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, and Rich Caruana https://t.co/KS84le5bgL
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

2.006 Mikeys
#8. Nonconvex Stochastic Nested Optimization via Stochastic ADMM
Zhongruo Wang
We consider the stochastic nested composition optimization problem where the objective is a composition of two expected-value functions. We propose the stochastic ADMM to solve this complicated objective. In order to find an $\epsilon$-stationary point where the expected norm of the subgradient of the corresponding augmented Lagrangian is smaller than $\epsilon$, the total sample complexity of our method is $\mathcal{O}(\epsilon^{-3})$ for the online case and $\mathcal{O} \Bigl((2N_1 + N_2) + (2N_1 + N_2)^{1/2}\epsilon^{-2}\Bigr)$ for the finite sum case. The computational complexity is consistent with the proximal version proposed in \cite{zhang2019multi}, but our algorithm can solve more general problems when the proximal mapping of the penalty is not easy to compute.
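As a reading aid, "a composition of two expected-value functions" is commonly written in the form below. The symbols $F$, $f$, $g$, $\xi$, and $\zeta$ are notation assumed here for illustration and are not taken from the paper, which may additionally include a penalty term and the constraint that ADMM splits on.

```latex
\min_{x} \; F(x) = f\bigl(g(x)\bigr), \qquad
f(y) = \mathbb{E}_{\xi}\bigl[f_{\xi}(y)\bigr], \quad
g(x) = \mathbb{E}_{\zeta}\bigl[g_{\zeta}(x)\bigr].
```

The difficulty in this form is that an unbiased sample of the inner function $g$ does not yield an unbiased gradient of $F$, because $g$ sits inside the nonlinear outer function $f$.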
Figures
None.
Tweets
arxivml: "Nonconvex Stochastic Nested Optimization via Stochastic ADMM", Zhongruo Wang https://t.co/oFVsPppiNb
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 0
Unique Words: 0

2.006 Mikeys
#9. Sparse estimation via $\ell_q$ optimization method in high-dimensional linear regression
Xin Li, Yaohua Hu, Chong Li, Xiaoqi Yang, Tianzi Jiang
In this paper, we discuss the statistical properties of the $\ell_q$ optimization methods $(0<q\leq 1)$, including the $\ell_q$ minimization method and the $\ell_q$ regularization method, for estimating a sparse parameter from noisy observations in high-dimensional linear regression with either a deterministic or random design. For this purpose, we introduce a general $q$-restricted eigenvalue condition (REC) and provide its sufficient conditions in terms of several widely-used regularity conditions such as sparse eigenvalue condition, restricted isometry property, and mutual incoherence property. By virtue of the $q$-REC, we exhibit the stable recovery property of the $\ell_q$ optimization methods for either deterministic or random designs by showing that the $\ell_2$ recovery bound $O(\epsilon^2)$ for the $\ell_q$ minimization method and the oracle inequality and $\ell_2$ recovery bound $O(\lambda^{\frac{2}{2-q}}s)$ for the $\ell_q$ regularization method hold respectively with high probability. The results in this paper are...
Figures
None.
Tweets
arxivml: "Sparse estimation via $\ell_q$ optimization method in high-dimensional linear regression", Xin Li, Yaohua Hu, Chon… https://t.co/BOlrOZfAMB
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

2.004 Mikeys
#10. Clustered Gaussian process model with an application to solar irradiance emulation
Chih-Li Sung, Benjamin Haaland, Youngdeok Hwang, Siyuan Lu
Gaussian processes are one of the important approaches for emulating computer simulations. However, the stationarity assumption of a Gaussian process and its intractability for large-scale datasets limit its applicability in practice. In this article, we propose a clustered Gaussian process model which segments the input data into multiple clusters, in each of which a Gaussian process is fitted. Stochastic expectation-maximization is employed to efficiently fit the model. In our simulations as well as a real application to solar irradiance emulation, our proposed method had smaller mean square error than its main competitors, with competitive computation time, and provided valuable insights from data by discovering the clusters.
Figures
None.
Tweets
tweet_nakasho: A study that models large-scale solar irradiance data using something called a clustered Gaussian process. Time to study Gaussian processes. https://t.co/OMqVHMQ4Ln
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 11397
Unique Words: 2740

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 222,125 papers.
