### Top 10 arXiv Papers Today in Statistics

##### #1. Analysis of Irregular Spatial Data with Machine Learning: Classification of Building Patterns with a Graph Convolutional Neural Network
###### Xiongfeng Yan, Tinghua Ai
Machine learning methods such as convolutional neural networks (CNNs) are becoming an integral part of scientific research in many disciplines, but spatial vector data often cannot be analyzed with these powerful learning methods because of their irregularities. With the aid of the graph Fourier transform and the convolution theorem, it is possible to express convolution as a point-wise product in the Fourier domain and construct a CNN learning architecture on graphs for the analysis of irregular spatial data. In this study, we used the classification of building patterns as a case study to test this method. Experiments showed that the method achieved outstanding results in identifying regular and irregular patterns and improved significantly on other methods.
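
The spectral construction mentioned in the abstract can be written compactly. The block below is the textbook form of graph convolution, not necessarily the exact parameterization used in the paper. The symbols are assumptions introduced here: $L$ is the graph Laplacian, $U$ its eigenvector matrix, $\Lambda$ the diagonal matrix of eigenvalues, $x$ a signal on the graph nodes, and $g_\theta$ a learnable spectral filter.

$$
L = U \Lambda U^{\top}, \qquad \hat{x} = U^{\top} x, \qquad g_\theta \star x = U \, g_\theta(\Lambda) \, U^{\top} x
$$

Here $\hat{x}$ is the graph Fourier transform of $x$. Because $g_\theta(\Lambda)$ is diagonal, filtering is a point-wise product in the Fourier domain, which is how convolution carries over to data without a regular grid.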
###### Tweets
arxivml: "Analysis of Irregular Spatial Data with Machine Learning: Classification of Building Patterns with a Graph Convolu… https://t.co/XwMdLOb5RC
nmfeeds: [O] https://t.co/PhJx9JIFPv Analysis of Irregular Spatial Data with Machine Learning: Classification of Building Patterns ...
Memoirs: Analysis of Irregular Spatial Data with Machine Learning: Classification of Building Patterns with a Graph Convolutional Neural Network. https://t.co/StsSt7Bv5c
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 2872
Unique Words: 1081

##### #2. Deep Bayesian Inversion
###### Jonas Adler, Ozan Öktem
Characterizing statistical properties of solutions of inverse problems is essential for decision making. Bayesian inversion offers a tractable framework for this purpose, but current approaches are computationally unfeasible for most realistic imaging applications in the clinic. We introduce two novel deep learning based methods for solving large-scale inverse problems using Bayesian inversion: a sampling based method using a WGAN with a novel mini-discriminator and a direct approach that trains a neural network using a novel loss function. The performance of both methods is demonstrated on image reconstruction in ultra low dose 3D helical CT. We compute the posterior mean and standard deviation of the 3D images followed by a hypothesis test to assess whether a "dark spot" in the liver of a cancer stricken patient is present. Both methods are computationally efficient and our evaluation shows very promising performance that clearly supports the claim that Bayesian inversion is usable for 3D imaging in time critical applications.
###### Tweets
BrundageBot: Deep Bayesian Inversion. Jonas Adler and Ozan Öktem https://t.co/XteCMmsI8s
arxivml: "Deep Bayesian Inversion", Jonas Adler, Ozan Öktem https://t.co/b8PgZrq6I1
nmfeeds: [O] https://t.co/TvYuH2wp2H Deep Bayesian Inversion. Characterizing statistical properties of solutions of inverse problem...
StatsPapers: Deep Bayesian Inversion. https://t.co/zWIOyxWY2P
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 13166
Unique Words: 3185

##### #3. An Overview of Semiparametric Extensions of Finite Mixture Models
###### Sijia Xiang, Weixin Yao, Guangren Yang
Finite mixture models have been a very important tool for exploring complex data structures in many scientific areas, such as economics, epidemiology, and finance. In the past decade, semiparametric techniques have been widely introduced into traditional finite mixture models, and semiparametric mixture models have seen exciting development in methodology, theory, and applications. In this article, we provide a selective overview of newly developed semiparametric mixture models and discuss their estimation methodologies, their theoretical properties where applicable, recent developments, and some open questions.
###### Tweets
StatsPapers: An Overview of Semiparametric Extensions of Finite Mixture Models. https://t.co/rRwtcMMQWG
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 10701
Unique Words: 2606

##### #4. A flexible sequential Monte Carlo algorithm for shape-constrained regression
###### Kenyon Ng, Kevin Murray, Berwin A. Turlach
We propose an algorithm that is capable of imposing shape constraints on regression curves, without requiring the constraints to be written as closed-form expressions, nor assuming the functional form of the loss function. Our algorithm, which is based on Sequential Monte Carlo-Simulated Annealing, only relies on an indicator function that assesses whether or not the constraints are fulfilled, thus allowing us to enforce various complex constraints by specifying an appropriate indicator function without altering other parts of the algorithm. We demonstrate our algorithm by fitting rational function models subject to monotonicity and continuity constraints. The algorithm was implemented using R (R Core Team, 2018) and the code is freely available on GitHub.
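
The idea of enforcing a shape constraint purely through an indicator function can be illustrated with a much simpler optimizer than the one in the paper. The sketch below is not the authors' Sequential Monte Carlo-Simulated Annealing algorithm and is not their R code. It is a single-chain simulated annealing loop in Python, with a quadratic model, made-up data, and hypothetical function names, shown only to make the role of the indicator concrete.

```python
import math
import random

def predict(coef, x):
    """Quadratic curve a + b*x + c*x^2 (a stand-in model, not the paper's)."""
    return coef[0] + coef[1] * x + coef[2] * x * x

def loss(coef, xs, ys):
    """Sum of squared errors; any other loss could be swapped in."""
    return sum((predict(coef, x) - y) ** 2 for x, y in zip(xs, ys))

def is_monotone(coef, grid):
    """Indicator: True if the curve is non-decreasing on the grid."""
    values = [predict(coef, x) for x in grid]
    return all(b >= a for a, b in zip(values, values[1:]))

def anneal(xs, ys, indicator, n_iter=5000, step=0.1, seed=0):
    """Simulated annealing that rejects every proposal failing the indicator."""
    rng = random.Random(seed)
    current = [0.0, 0.0, 0.0]  # constant curve, trivially feasible
    current_loss = loss(current, xs, ys)
    best, best_loss = current[:], current_loss
    for i in range(n_iter):
        temp = 1.0 / (1 + i)  # simple cooling schedule (an assumption)
        proposal = [c + rng.gauss(0.0, step) for c in current]
        if not indicator(proposal):
            continue  # the constraint is only ever consulted here
        prop_loss = loss(proposal, xs, ys)
        delta = prop_loss - current_loss
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = proposal, prop_loss
            if current_loss < best_loss:
                best, best_loss = current[:], current_loss
    return best, best_loss

# Made-up, roughly increasing data with a few local dips.
xs = [i / 10 for i in range(11)]
ys = [0.0, 0.15, 0.1, 0.35, 0.3, 0.5, 0.65, 0.6, 0.8, 0.85, 1.0]
grid = [i / 100 for i in range(101)]

best, best_loss = anneal(xs, ys, lambda c: is_monotone(c, grid))
```

Changing the constraint means passing a different indicator to `anneal`; the loss and the search loop stay untouched, which is the property the abstract highlights.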
###### Tweets
StatsPapers: A flexible sequential Monte Carlo algorithm for shape-constrained regression. https://t.co/B2ZYUVdKLz
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 5846
Unique Words: 1872

##### #5. A New SVDD-Based Multivariate Non-parametric Process Capability Index
###### Deovrat Kakde, Arin Chaudhuri, Diana Shaw
The process capability index (PCI) is a commonly used statistic for measuring the ability of a process to operate within given specifications or to produce products that meet the required quality specifications. A PCI can be univariate or multivariate, depending on the number of process specifications or quality characteristics of interest. Most PCIs make distributional assumptions that are often unrealistic in practice. This paper proposes a new multivariate non-parametric process capability index. The index can be used when the distribution of the process or quality parameters is unknown or does not follow a commonly used distribution such as the multivariate normal.
###### Tweets
arxiv_org: A New SVDD-Based Multivariate Non-parametric Process Capability Index. https://t.co/ckbpdR0H2a https://t.co/MwA7yYxdRT
arxivml: "A New SVDD-Based Multivariate Non-parametric Process Capability Index", Deovrat Kakde, Arin Chaudhuri, Diana Shaw https://t.co/R5sRvxiYZi
StatsPapers: A New SVDD-Based Multivariate Non-parametric Process Capability Index. https://t.co/SRSH8CwG24
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 4135
Unique Words: 1266

##### #6. plsRglm: Partial least squares linear and generalized linear regression for processing incomplete datasets by cross-validation and bootstrap techniques with R
###### F. Bertrand, M. Maumy-Bertrand
The aim of the plsRglm package is to deal with complete and incomplete datasets through several new techniques or, at least, some which were not yet implemented in R. Indeed, not only does it make available the extension of the PLS regression to the generalized linear regression models, but also bootstrap techniques, leave-one-out and repeated $k$-fold cross-validation. In addition, graphical displays help the user to assess the significance of the predictors when using bootstrap techniques. Biplots (Fig. 4) can be used to delve into the relationship between individuals and variables.
###### Tweets
StatsPapers: plsRglm: Partial least squares linear and generalized linear regression for processing incomplete datasets by cross-validation and bootstrap techniques with R. https://t.co/DSVlMa6iSq
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 2212
Unique Words: 976

##### #7. Statistical post-processing of dual-resolution ensemble forecasts
###### Sándor Baran, Martin Leutbecher, Marianna Szabó, Zied Ben Bouallègue
The computational cost as well as the probabilistic skill of ensemble forecasts depends on the spatial resolution of the numerical weather prediction model and the ensemble size. Periodically, e.g. when more computational resources become available, it is appropriate to reassess the balance between resolution and ensemble size. Recently, it has been proposed to investigate this balance in the context of dual-resolution ensembles, which use members with two different resolutions to make probabilistic forecasts. This study investigates whether statistical post-processing of such dual-resolution ensemble forecasts changes the conclusions regarding the optimal dual-resolution configuration. Medium-range dual-resolution ensemble forecasts of 2-metre temperature have been calibrated using ensemble model output statistics. The forecasts are produced with ECMWF's Integrated Forecast System and have horizontal resolutions between 18 km and 45 km. The ensemble sizes range from 8 to 254 members. The forecasts are verified with SYNOP...
###### Tweets
StatsPapers: Statistical post-processing of dual-resolution ensemble forecasts. https://t.co/kZfA0sNXbz
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 9735
Unique Words: 2270

##### #8. Receiver Operating Characteristic Curves and Confidence Bands for Support Vector Machines
###### Daniel J. Luckett, Eric B. Laber, Samer S. El-Kamary, Cheng Fan, Ravi Jhaveri, Charles M. Perou, Fatma M. Shebl, Michael R. Kosorok
Many problems that appear in biomedical decision making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The costs of false positives and false negatives vary across application domains and receiver operating characteristic (ROC) curves provide a visual representation of this trade-off. Nonparametric estimators for the ROC curve, such as a weighted support vector machine (SVM), are desirable because they are robust to model misspecification. While weighted SVMs have great potential for estimating ROC curves, their theoretical properties were heretofore underdeveloped. We propose a method for constructing confidence bands for the SVM ROC curve and provide the theoretical justification for the SVM ROC curve by showing that the risk function of the estimated decision rule is uniformly consistent across the weight parameter. We demonstrate the proposed confidence band method and the superior sensitivity and specificity of the weighted SVM compared to commonly used...
###### Tweets
arxiv_org: Receiver Operating Characteristic Curves and Confidence Bands for Support Vector Machines. https://t.co/DrZTs3WR4R https://t.co/g61WrLS9cA
HubBucket: RT @arxiv_org: Receiver Operating Characteristic Curves and Confidence Bands for Support Vector Machines. https://t.co/DrZTs3WR4R https://t…
###### Other stats
Sample Sizes: None.
Authors: 8
Total Words: 11320
Unique Words: 2363

##### #9. Dynamic Assortment Selection under the Nested Logit Models
###### Xi Chen, Yining Wang, Yuan Zhou
We study a stylized dynamic assortment planning problem during a selling season of finite length $T$, by considering a nested multinomial logit model with $M$ nests and $N$ items per nest. Our policy simultaneously learns customers' choice behavior and makes dynamic decisions on assortments based on the current knowledge. It achieves the regret at the order of $\tilde{O}(\sqrt{MNT}+MN^2)$, where $M$ is the number of nests and $N$ is the number of products in each nest. We further provide a lower bound result of $\Omega(\sqrt{MT})$, which shows the optimality of the upper bound when $T>M$ and $N$ is small. However, the $N^2$ term in the upper bound is not ideal for applications where $N$ is large as compared to $T$. To address this issue, we further generalize our first policy by introducing a discretization technique, which leads to a regret of $\tilde{O}(\sqrt{M}T^{2/3}+MNT^{1/3})$ with a specific choice of discretization granularity. It improves the previous regret bound whenever $N>T^{1/3}$. We provide numerical results to...
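
The claim that the second bound improves on the first whenever $N>T^{1/3}$ can be checked term by term. The comparison below ignores the logarithmic factors hidden in $\tilde{O}(\cdot)$ and uses only the symbols from the abstract.

$$
\sqrt{MNT} > \sqrt{M}\,T^{2/3} \iff \sqrt{N} > T^{1/6} \iff N > T^{1/3},
\qquad
MN^2 > MNT^{1/3} \iff N > T^{1/3}
$$

Both terms of $\sqrt{MNT}+MN^2$ therefore exceed the corresponding terms of $\sqrt{M}T^{2/3}+MNT^{1/3}$ under the same condition, $N>T^{1/3}$.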
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 17873
Unique Words: 3153

##### #10. HMLasso: Lasso for High Dimensional and Highly Missing Data
###### Masaaki Takada, Hironori Fujisawa, Takeichiro Nishikawa
Sparse regression methods such as Lasso have achieved great success in dealing with high dimensional data for several decades. However, few methods are applicable to missing data, which often occur in high dimensional data. Recently, CoCoLasso was proposed to deal with high dimensional missing data, but it still struggles when the data are highly missing. In this paper, we propose a novel Lasso-type regression technique for Highly Missing data, called "HMLasso". We use the mean imputed covariance matrix, which is notorious in general due to its estimation bias for missing data. However, we effectively incorporate it into Lasso by using a useful connection with the pairwise covariance matrix. The resulting optimization problem can be seen as a weighted modification of CoCoLasso with the missing ratios, and is quite effective for highly missing data. To the best of our knowledge, this is the first method that can efficiently deal with both high dimensional and highly missing data. We show that the proposed method is beneficial with regard to...
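
One elementary link between the two covariance matrices named in the abstract can be checked numerically. The sketch below rests on assumptions introduced here rather than on the paper's exact definitions: columns are centered by their observed means, the mean imputed covariance divides by the sample size $n$, and the pairwise covariance divides by the number of rows in which both variables are observed. Under those conventions each mean imputed entry equals the pairwise entry scaled by the fraction of jointly observed rows. The data and function names are hypothetical.

```python
# Toy data: rows are samples, None marks a missing entry (made-up values).
rows = [
    [1.0, 2.0, None],
    [2.0, None, 1.0],
    [3.0, 6.0, 2.0],
    [None, 8.0, 5.0],
    [5.0, 9.0, None],
    [6.0, None, 7.0],
]
n, p = len(rows), len(rows[0])

# Column means computed over observed entries only.
means = []
for j in range(p):
    seen = [r[j] for r in rows if r[j] is not None]
    means.append(sum(seen) / len(seen))

# Mean imputation after centering: a missing entry becomes exactly 0.
z = [[0.0 if r[j] is None else r[j] - means[j] for j in range(p)] for r in rows]

def imputed_cov(j, k):
    """Covariance of the mean imputed data, normalized by n."""
    return sum(z[i][j] * z[i][k] for i in range(n)) / n

def pair_count(j, k):
    """Number of rows in which both variables are observed."""
    return sum(1 for r in rows if r[j] is not None and r[k] is not None)

def pairwise_cov(j, k):
    """Covariance over jointly observed rows, normalized by their count."""
    total = sum(
        z[i][j] * z[i][k]
        for i in range(n)
        if rows[i][j] is not None and rows[i][k] is not None
    )
    return total / pair_count(j, k)

# Largest deviation from: imputed = (jointly observed fraction) * pairwise.
max_gap = max(
    abs(imputed_cov(j, k) - pair_count(j, k) / n * pairwise_cov(j, k))
    for j in range(p)
    for k in range(p)
)
```

The scaling factor `pair_count(j, k) / n` shrinks entries with many missing pairs toward zero, which is one way to see the estimation bias the abstract attributes to mean imputation.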
###### Tweets
arxivml: "HMLasso: Lasso for High Dimensional and Highly Missing Data", Masaaki Takada, Hironori Fujisawa, Takeichiro Nishik… https://t.co/UQRkIuStTK
FerrumA: Lasso regression for highly missing data: https://t.co/LPKqAKQr60 (How good, practical for large p and p>n?)
FerrumA: LASSO regression for highly missing data: https://t.co/LPKqAKQr60 (experiments for p << n).
ComputerPapers: HMLasso: Lasso for High Dimensional and Highly Missing Data. https://t.co/U55nlLKzrs
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 8323
Unique Words: 1961

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 72,893 papers.
