### Top 10 arXiv Papers Today in Statistics

##### #1. The effect of geographic sampling on extreme precipitation: from models to observations and back again
###### Mark D. Risser, Michael F. Wehner
In light of the significant uncertainties present in global climate models' characterization of precipitation extremes, it is important to properly use observational data sets to determine whether a particular climate model is suitable for simulating extremes. In this paper, we identify two problems with traditional approaches for comparing global climate models and observational data products with respect to extremes: first, daily gridded products are a suboptimal data source to use for this comparison, and second, failing to account for the geographic locations of weather station data can paint a misleading picture with respect to model performance. To demonstrate these problems, we utilize in situ measurements of daily precipitation along with a spatial statistical extreme value analysis to evaluate and compare model performance with respect to extreme climatology. As an illustration, we use model output from five early submissions to the HighResMIP subproject of the CMIP6 experiment (Haarsma et al., 2016), comparing integrated...
###### Tweets
arxiv_org: The effect of geographic sampling on extreme precipitation: from models to observations a... https://t.co/7SVkj2xA1p https://t.co/LNaiNccat6
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #2. Harmonic Mean Point Processes: Proportional Rate Error Minimization for Obtundation Prediction
###### Yoonjung Kim, Jeremy C. Weiss
In healthcare, the highest risk individuals for morbidity and mortality are rarely those with the greatest modifiable risk. By contrast, many machine learning formulations implicitly attend to the highest risk individuals. We focus on this problem in point processes, a popular modeling technique for the analysis of the temporal event sequences in electronic health records (EHR) data with applications in risk stratification and risk score systems. We show that optimization of the log-likelihood function also gives disproportionate attention to high risk individuals and leads to poor prediction results for low risk individuals compared to ones at high risk. We characterize the problem and propose an adjusted log-likelihood formulation as a new objective for point processes. We demonstrate the benefits of our method in simulations and in EHR data of patients admitted to the critical care unit for intracerebral hemorrhage.
###### Tweets
arxiv_org: Harmonic Mean Point Processes: Proportional Rate Error Minimization for Obtundation Predi... https://t.co/jVVAAzpp5W https://t.co/D6f6KszYrC
arxivml: "Harmonic Mean Point Processes: Proportional Rate Error Minimization for Obtundation Prediction", Yoonjung Kim, Jer… https://t.co/dM6R5LlHQm
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #3. Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features
###### Shingo Yashima, Atsushi Nitanda, Taiji Suzuki
Although kernel methods are widely used in many learning problems, they have poor scalability to large datasets. To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms. In this study, we consider solving a binary classification problem using random features and stochastic gradient descent. In recent research, an exponential convergence rate of the expected classification error under the strong low-noise condition has been shown. We extend these analyses to a random features setting, analyzing the error induced by the approximation of random features in terms of the distance between the generated hypothesis including population risk minimizers and empirical risk minimizers when using general Lipschitz loss functions, to show that an exponential convergence of the expected classification error is achieved even if random features approximation is applied. Additionally, we demonstrate that the convergence rate does not depend on the...
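The setting described in this abstract, random features combined with stochastic gradient descent for binary classification, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' setup: the RBF kernel, the logistic loss, the plain (non-averaged) SGD with a decaying step size, the toy two-blob data, and the names `rff_map`, `sgd_logistic` and `make_blobs`.

```python
import numpy as np

def rff_map(X, W, b):
    # z(x) = sqrt(2/D) * cos(W^T x + b); z(x)·z(x') approximates an RBF kernel.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def sgd_logistic(Z, y, epochs=20, lr=0.5, reg=1e-4, seed=0):
    # Plain SGD on the L2-regularized logistic loss; labels y are in {-1, +1}.
    rng = np.random.default_rng(seed)
    theta = np.zeros(Z.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            margin = y[i] * (Z[i] @ theta)
            # Gradient of log(1 + exp(-margin)) plus the ridge term.
            grad = -y[i] * Z[i] / (1.0 + np.exp(margin)) + reg * theta
            theta -= lr / np.sqrt(t) * grad
    return theta

def make_blobs(n, rng):
    # Two well-separated Gaussian blobs: a crude stand-in for a
    # low-noise classification problem (hypothetical toy data).
    y = rng.choice([-1.0, 1.0], size=n)
    X = 2.0 * y[:, None] + 0.5 * rng.standard_normal((n, 2))
    return X, y

rng = np.random.default_rng(0)
gamma, D = 0.5, 200  # kernel exp(-gamma * ||x - x'||^2), D random features
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(2, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

X_train, y_train = make_blobs(400, rng)
X_test, y_test = make_blobs(200, rng)
theta = sgd_logistic(rff_map(X_train, W, b), y_train)
accuracy = np.mean(np.sign(rff_map(X_test, W, b) @ theta) == y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The point of the construction is that the model is linear in the `D` random features, so each SGD step costs O(D) regardless of the number of training points, which is where the scalability gain over an exact kernel method comes from.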
###### Tweets
BrundageBot: Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features. Shingo Yashima, Atsushi Nitanda, and Taiji Suzuki https://t.co/UHSlu56ks6
arxivml: "Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features", Shingo Yashima, … https://t.co/YYe9f8Wi14
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #4. Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networks
###### Timmy Li, Yi Huang, James Evans, Ishanu Chattopadhyay
Large-scale trends in urban crime and global terrorism are well-predicted by socio-economic drivers, but focused, event-level predictions have had limited success. Standard machine learning approaches are promising, but lack interpretability, are generally interpolative, and are ineffective for precise future interventions, with costly and wasteful false positives. Here, we introduce Granger Network inference as a new forecasting approach for individual infractions with demonstrated performance far surpassing past results, yet transparent enough to validate and extend social theory. Considering the problem of predicting crime in the City of Chicago, we achieve an average AUC of ~90% for events predicted a week in advance within spatial tiles approximately $1000$ ft across. Instead of pre-supposing that crimes unfold across contiguous spaces akin to diffusive systems, we learn the local transport rules from data. As our key insights, we uncover indications of suburban bias -- how law-enforcement response is modulated by...
###### Tweets
arxivml: "Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networ… https://t.co/9q7d4i9gaQ
SRoyLee: Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger N - https://t.co/Rkt3PEPvaC
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #5. Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data
###### Jesus Lago, Karel De Brabandere, Fjo De Ridder, Bart De Schutter
Due to the increasing integration of solar power into the electrical grid, forecasting short-term solar irradiance has become key for many applications, e.g. operational planning, power purchases, reserve activation, etc. In this context, as solar generators are geographically dispersed and ground measurements are not always easy to obtain, it is very important to have general models that can predict solar irradiance without the need for local data. In this paper, a model that can perform short-term forecasting of solar irradiance in any general location without the need for ground measurements is proposed. To do so, the model considers satellite-based measurements and weather-based forecasts, and employs a deep neural network structure that is able to generalize across locations; in particular, the network is trained using only a small subset of sites where ground data is available, and the model is able to generalize to a much larger number of locations where ground data does not exist. As a case study, 25 locations in The...
###### Tweets
tweet_nakasho: A paper that uses machine learning on satellite data to forecast solar irradiance... it feels like solar-related papers are coming out one after another!? https://t.co/3hfkx7YX1j
arxivml: "Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data", Jes… https://t.co/6YVAcw1YHD
arxiv_cs_LG: Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data. Jesus Lago, Karel De Brabandere, Fjo De Ridder, and Bart De Schutter https://t.co/dGEJ6PfgmL
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #6. Causality-based tests to detect the influence of confounders on mobile health diagnostic applications: a comparison with restricted permutations
###### Elias Chaibub Neto, Meghasyam Tummalacherla, Lara Mangravite, Larsson Omberg
Machine learning practice is often impacted by confounders. Confounding can be particularly severe in remote digital health studies where the participants self-select to enter the study. While many different confounding adjustment approaches have been proposed in the literature, most of these methods rely on modeling assumptions, and it is unclear how robust they are to violations of these assumptions. This realization has recently motivated the development of restricted permutation methods to quantify the influence of observed confounders on the predictive performance of machine learning models and evaluate if confounding adjustment methods are working as expected. In this paper we show, nonetheless, that restricted permutations can generate biased estimates of the contribution of the confounders to the predictive performance of a learner, and we propose an alternative approach to tackle this problem. By viewing a classification task from a causality perspective, we are able to leverage conditional independence tests between...
###### Tweets
arxiv_org: Causality-based tests to detect the influence of confounders on mobile health diagnostic... https://t.co/sZbC8EqLDK https://t.co/mR7S4YzLIX
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

##### #7. Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
###### Benjamin Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, Rich Caruana
Recent methods for training generalized additive models (GAMs) with pairwise interactions achieve state-of-the-art accuracy on a variety of datasets. Adding interactions to GAMs, however, introduces an identifiability problem: effects can be freely moved between main effects and interaction effects without changing the model predictions. In some cases, this can lead to contradictory interpretations of the same underlying function. This is a critical problem because a central motivation of GAMs is model interpretability. In this paper, we use the Functional ANOVA decomposition to uniquely define interaction effects and thus produce identifiable additive models with purified interactions. To compute this decomposition, we present a fast, exact, mass-moving algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to several datasets and show large disparity, including contradictions, between the apparent and the purified effects.
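The identifiability issue and the idea of moving mass between effects can be made concrete with a small sketch for a single pairwise interaction stored as a table over bins. This is written in the spirit of the mass-moving idea from the abstract and is not the authors' implementation; the function name `purify`, the density table `w`, and the stopping rule are assumptions made for illustration.

```python
import numpy as np

def purify(f12, w, tol=1e-12, max_iter=1000):
    """Move weighted row/column means out of a pairwise interaction table.

    f12[i, j] is the interaction value in bin (i, j); w[i, j] > 0 is the
    (assumed) data density of that bin. Returns intercept, two main effects
    and the purified interaction; together they reproduce f12.
    """
    f12 = np.array(f12, dtype=float)  # work on a copy
    f1 = np.zeros(f12.shape[0])
    f2 = np.zeros(f12.shape[1])
    for _ in range(max_iter):
        row_means = (f12 * w).sum(axis=1) / w.sum(axis=1)
        f12 -= row_means[:, None]     # take mass out of each row ...
        f1 += row_means               # ... and give it to main effect 1
        col_means = (f12 * w).sum(axis=0) / w.sum(axis=0)
        f12 -= col_means[None, :]     # same for columns and main effect 2
        f2 += col_means
        if max(np.abs(row_means).max(), np.abs(col_means).max()) < tol:
            break
    # Center the main effects and move their means into the intercept.
    m1 = f1 @ w.sum(axis=1) / w.sum()
    m2 = f2 @ w.sum(axis=0) / w.sum()
    return m1 + m2, f1 - m1, f2 - m2, f12

rng = np.random.default_rng(0)
# A raw "interaction" that secretly contains a main effect of feature 1.
raw = rng.normal(size=(4, 5)) + np.arange(4)[:, None]
w = rng.uniform(0.5, 1.5, size=(4, 5))

f0, f1, f2, pure = purify(raw, w)
rebuilt = f0 + f1[:, None] + f2[None, :] + pure
print(np.allclose(rebuilt, raw))
```

The predictions are unchanged (`rebuilt` matches `raw`), but the purified table has zero weighted mean along every row and column, so whatever could be explained by a main effect has been moved there.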
###### Tweets
arxivml: "Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additi… https://t.co/vcRz1cIGH2
arxiv_cs_LG: Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models. Benjamin Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, and Rich Caruana https://t.co/KS84le5bgL
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

##### #8. Nonconvex Stochastic Nested Optimization via Stochastic ADMM
###### Zhongruo Wang
We consider the stochastic nested composition optimization problem where the objective is a composition of two expected-value functions. We propose a stochastic ADMM to solve this complicated objective. In order to find an $\epsilon$ stationary point where the expected norm of the subgradient of the corresponding augmented Lagrangian is smaller than $\epsilon$, the total sample complexity of our method is $\mathcal{O}(\epsilon^{-3})$ for the online case and $\mathcal{O} \Bigl((2N_1 + N_2) + (2N_1 + N_2)^{1/2}\epsilon^{-2}\Bigr)$ for the finite sum case. The computational complexity is consistent with the proximal version proposed in \cite{zhang2019multi}, but our algorithm can solve more general problems when the proximal mapping of the penalty is not easy to compute.
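For orientation, the problem class named in this abstract is commonly written as below. The symbols $f_v$, $g_w$, $h$, $A$, $B$, $c$, $\lambda$ and $\rho$ are illustrative assumptions, not notation taken from the paper, and the linearly constrained form is the generic ADMM template rather than the author's exact formulation.

$$
\min_{x}\; f\bigl(g(x)\bigr), \qquad f(y) = \mathbb{E}_{v}\bigl[f_v(y)\bigr], \quad g(x) = \mathbb{E}_{w}\bigl[g_w(x)\bigr]
$$

$$
\min_{x,\,z}\; f\bigl(g(x)\bigr) + h(z) \quad \text{s.t.} \quad Ax + Bz = c
$$

$$
L_\rho(x, z, \lambda) = f\bigl(g(x)\bigr) + h(z) + \langle \lambda,\, Ax + Bz - c \rangle + \frac{\rho}{2}\,\lVert Ax + Bz - c \rVert^2
$$

The first line is the nested composition of two expectations, the second splits off a penalty $h$ through a linear constraint, and the third is the augmented Lagrangian whose (sub)gradient norm would define an $\epsilon$ stationary point under these assumptions.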
###### Tweets
arxivml: "Nonconvex Stochastic Nested Optimization via Stochastic ADMM", Zhongruo Wang https://t.co/oFVsPppiNb
###### Other stats
Sample Sizes : None.
Authors: 1
Total Words: 0
Unique Words: 0

##### #9. Sparse estimation via $\ell_q$ optimization method in high-dimensional linear regression
###### Xin Li, Yaohua Hu, Chong Li, Xiaoqi Yang, Tianzi Jiang
In this paper, we discuss the statistical properties of the $\ell_q$ optimization methods $(0<q\leq 1)$, including the $\ell_q$ minimization method and the $\ell_q$ regularization method, for estimating a sparse parameter from noisy observations in high-dimensional linear regression with either a deterministic or random design. For this purpose, we introduce a general $q$-restricted eigenvalue condition (REC) and provide its sufficient conditions in terms of several widely-used regularity conditions such as sparse eigenvalue condition, restricted isometry property, and mutual incoherence property. By virtue of the $q$-REC, we exhibit the stable recovery property of the $\ell_q$ optimization methods for either deterministic or random designs by showing that the $\ell_2$ recovery bound $O(\epsilon^2)$ for the $\ell_q$ minimization method and the oracle inequality and $\ell_2$ recovery bound $O(\lambda^{\frac{2}{2-q}}s)$ for the $\ell_q$ regularization method hold respectively with high probability. The results in this paper are...
###### Tweets
arxivml: "Sparse estimation via $\ell_q$ optimization method in high-dimensional linear regression", Xin Li, Yaohua Hu, Chon… https://t.co/BOlrOZfAMB
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

##### #10. Clustered Gaussian process model with an application to solar irradiance emulation
###### Chih-Li Sung, Benjamin Haaland, Youngdeok Hwang, Siyuan Lu
Gaussian processes are an important approach for emulating computer simulations. However, the stationarity assumption of a Gaussian process and its intractability for large-scale datasets limit its applicability in practice. In this article, we propose a clustered Gaussian process model which segments the input data into multiple clusters, in each of which a Gaussian process model is fitted. Stochastic expectation-maximization is employed to efficiently fit the model. In our simulations as well as a real application to solar irradiance emulation, our proposed method had a smaller mean squared error than its main competitors, with competitive computation time, and provides valuable insights from data by discovering the clusters.
###### Tweets
tweet_nakasho: A piece on using something called a clustered Gaussian process to model large-scale solar irradiance data. Time to study Gaussian processes. https://t.co/OMqVHMQ4Ln
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 11397
Unique Words: 2740

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

Tracking 222,125 papers.
