Machine learning methods such as convolutional neural networks (CNNs) are
becoming an integral part of scientific research in many disciplines, but spatial
vector data often cannot be analyzed with these powerful learning methods
because of their irregularity. With the aid of the graph Fourier transform and the
convolution theorem, it is possible to express convolution as a point-wise
product in the Fourier domain and to construct a CNN learning architecture on
graphs for analyzing irregular spatial data. In this study, we used the
classification of building patterns as a case study to test this method.
Experiments showed that the method achieves outstanding results in
identifying regular and irregular patterns and improves significantly
on other methods.
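
As a rough illustration of the convolution theorem on graphs mentioned above, the following minimal sketch filters a signal on a toy graph by multiplying point-wise in the graph Fourier domain. It is not the paper's architecture: the normalized Laplacian, the 4-node ring graph, the heat-kernel style filter, and the function names `graph_fourier_basis` and `spectral_convolution` are all assumptions chosen for illustration.

```python
import numpy as np

def graph_fourier_basis(adjacency):
    """Eigen-decompose the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.

    The eigenvectors play the role of Fourier modes on the graph.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)
    d_inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    d_inv_sqrt[nonzero] = degree[nonzero] ** -0.5
    laplacian = np.eye(len(adjacency)) - d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues, eigenvectors

def spectral_convolution(signal, spectral_filter, eigenvectors):
    """Filter a graph signal: transform, multiply point-wise, transform back."""
    signal_hat = eigenvectors.T @ signal        # graph Fourier transform
    filtered_hat = spectral_filter * signal_hat  # point-wise product
    return eigenvectors @ filtered_hat           # inverse transform

# Toy example (assumed): 4 nodes arranged in a ring.
adjacency = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
eigenvalues, U = graph_fourier_basis(adjacency)
x = np.array([1.0, 0.0, 0.0, 0.0])        # impulse on node 0
low_pass = np.exp(-2.0 * eigenvalues)     # hypothetical smoothing filter
y = spectral_convolution(x, low_pass, U)  # smoothed signal
```

In a learned model the fixed `low_pass` vector would be replaced by trainable filter parameters; here it is fixed only to keep the sketch short.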

Authors: 2

Total Words: 2872

Unique Words: 1081

Characterizing statistical properties of solutions of inverse problems is
essential for decision making. Bayesian inversion offers a tractable framework
for this purpose, but current approaches are computationally unfeasible for
most realistic imaging applications in the clinic. We introduce two novel deep
learning based methods for solving large-scale inverse problems using Bayesian
inversion: a sampling based method using a WGAN with a novel mini-discriminator
and a direct approach that trains a neural network using a novel loss function.
The performance of both methods is demonstrated on image reconstruction in
ultra low dose 3D helical CT. We compute the posterior mean and standard
deviation of the 3D images followed by a hypothesis test to assess whether a
"dark spot" in the liver of a cancer stricken patient is present. Both methods
are computationally efficient and our evaluation shows very promising
performance that clearly supports the claim that Bayesian inversion is usable
for 3D imaging in time critical applications.

Authors: 2

Total Words: 13166

Unique Words: 3185

Finite mixture models have been a very important tool for exploring complex
data structures in many scientific areas, such as economics, epidemiology, and
finance. In the past decade, semiparametric techniques have been widely
introduced into traditional finite mixture models, and semiparametric
mixture models have seen exciting developments in methodology, theory,
and applications. In this article, we provide a selective overview of
newly developed semiparametric mixture models and discuss their estimation
methodologies and their theoretical properties where applicable.
Recent developments and some open questions are also discussed.

Authors: 3

Total Words: 10701

Unique Words: 2606

We propose an algorithm that is capable of imposing shape constraints on
regression curves without requiring the constraints to be written as
closed-form expressions or assuming a functional form for the loss function.
Our algorithm, which is based on Sequential Monte Carlo-Simulated Annealing,
only relies on an indicator function that assesses whether or not the
constraints are fulfilled, thus allowing us to enforce various complex
constraints by specifying an appropriate indicator function without altering
other parts of the algorithm. We demonstrate our algorithm by fitting rational
function models subject to monotonicity and continuity constraints. The
algorithm was implemented using R (R Core Team, 2018) and the code is freely
available on GitHub.
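
The role of the indicator function described above can be sketched as follows. This is not the authors' R implementation and it does not reproduce Sequential Monte Carlo-Simulated Annealing; it only shows, under stated assumptions, how a single feasibility check can gate an annealing-style acceptance step. The names `constraints_satisfied` and `accept`, the grid-based check, and the Metropolis-style acceptance rule are all hypothetical choices for illustration.

```python
import numpy as np
from numpy.polynomial.polynomial import polyval

def constraints_satisfied(num_coefs, den_coefs, grid):
    """Indicator: is p(x)/q(x) continuous and non-decreasing on the grid?

    Coefficients are in increasing order of degree. The check is only
    approximate because it inspects grid points, not the whole interval.
    """
    denominator = polyval(grid, den_coefs)
    # Continuity proxy: the denominator must not vanish or change sign.
    if np.any(denominator == 0):
        return False
    if np.any(np.sign(denominator) != np.sign(denominator[0])):
        return False
    values = polyval(grid, num_coefs) / denominator
    # Monotonicity: successive differences must be non-negative.
    return bool(np.all(np.diff(values) >= 0))

def accept(candidate_loss, current_loss, feasible, temperature, rng):
    """Annealing-style acceptance in which infeasible candidates are always rejected."""
    if not feasible:
        return False
    if candidate_loss <= current_loss:
        return True
    return bool(rng.random() < np.exp(-(candidate_loss - current_loss) / temperature))

grid = np.linspace(0.0, 1.0, 101)
increasing = constraints_satisfied([0.0, 1.0], [1.0, 1.0], grid)  # x / (1 + x)
has_pole = constraints_satisfied([1.0], [-0.5, 1.0], grid)        # 1 / (x - 0.5)
```

Because the constraints enter only through `constraints_satisfied`, swapping in a different shape constraint would mean changing that one function while the acceptance step stays the same.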

Authors: 3

Total Words: 5846

Unique Words: 1872

The process capability index (PCI) is a commonly used statistic that measures
the ability of a process to operate within given specifications or to produce
products that meet the required quality specifications. A PCI can be univariate
or multivariate depending upon the number of process specifications or quality
characteristics of interest. Most PCIs make distributional assumptions which
are often unrealistic in practice.
This paper proposes a new multivariate non-parametric process capability
index. This index can be used when the distribution of the process or quality
parameters is either unknown or does not follow a commonly used distribution
such as the multivariate normal.

Authors: 3

Total Words: 4135

Unique Words: 1266

The aim of the plsRglm package is to deal with complete and incomplete
datasets through several new techniques or, at least, some which were not yet
implemented in R. Indeed, not only does it make available the extension of the
PLS regression to the generalized linear regression models, but also bootstrap
techniques, leave-one-out and repeated $k$-fold cross-validation. In addition,
graphical displays help the user to assess the significance of the predictors
when using bootstrap techniques. Biplots (Fig. 4) can be used to delve into the
relationship between individuals and variables.

Authors: 2

Total Words: 2212

Unique Words: 976

The computational cost as well as the probabilistic skill of ensemble
forecasts depends on the spatial resolution of the numerical weather prediction
model and the ensemble size. Periodically, e.g. when more computational
resources become available, it is appropriate to reassess the balance between
resolution and ensemble size. Recently, it has been proposed to investigate
this balance in the context of dual-resolution ensembles, which use members
with two different resolutions to make probabilistic forecasts. This study
investigates whether statistical post-processing of such dual-resolution
ensemble forecasts changes the conclusions regarding the optimal
dual-resolution configuration.
Medium-range dual-resolution ensemble forecasts of 2-metre temperature have
been calibrated using ensemble model output statistics. The forecasts are
produced with ECMWF's Integrated Forecast System and have horizontal
resolutions between 18 km and 45 km. The ensemble sizes range from 8 to 254
members. The forecasts are verified with SYNOP...

Authors: 4

Total Words: 9735

Unique Words: 2270

Many problems that appear in biomedical decision making, such as diagnosing
disease and predicting response to treatment, can be expressed as binary
classification problems. The costs of false positives and false negatives vary
across application domains and receiver operating characteristic (ROC) curves
provide a visual representation of this trade-off. Nonparametric estimators for
the ROC curve, such as a weighted support vector machine (SVM), are desirable
because they are robust to model misspecification. While weighted SVMs have
great potential for estimating ROC curves, their theoretical properties were
heretofore underdeveloped. We propose a method for constructing confidence
bands for the SVM ROC curve and provide the theoretical justification for the
SVM ROC curve by showing that the risk function of the estimated decision rule
is uniformly consistent across the weight parameter. We demonstrate the
proposed confidence band method and the superior sensitivity and specificity of
the weighted SVM compared to commonly used...

Authors: 8

Total Words: 11320

Unique Words: 2363

We study a stylized dynamic assortment planning problem during a selling
season of finite length $T$, by considering a nested multinomial logit model
with $M$ nests and $N$ items per nest. Our policy simultaneously learns
customers' choice behavior and makes dynamic decisions on assortments based on
the current knowledge. It achieves regret of order
$\tilde{O}(\sqrt{MNT}+MN^2)$, where $M$ is the number of nests and $N$ is the
number of products in each nest. We further provide a lower bound result of
$\Omega(\sqrt{MT})$, which shows the optimality of the upper bound when $T>M$
and $N$ is small. However, the $N^2$ term in the upper bound is not ideal for
applications where $N$ is large as compared to $T$. To address this issue, we
further generalize our first policy by introducing a discretization technique,
which leads to a regret of $\tilde{O}(\sqrt{M}T^{2/3}+MNT^{1/3})$ with a
specific choice of discretization granularity. It improves the previous regret
bound whenever $N>T^{1/3}$. We provide numerical results to...
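
The threshold $N>T^{1/3}$ quoted above can be checked by comparing the two stated bounds term by term. The comparison below is only a sketch: it ignores constants and the logarithmic factors hidden by $\tilde{O}$, and it compares the bounds themselves, not the actual regret of either policy.

```latex
\begin{align*}
\sqrt{M}\,T^{2/3} < \sqrt{MNT}
  &\iff T^{1/6} < N^{1/2}
  \iff N > T^{1/3}, \\
MNT^{1/3} < MN^{2}
  &\iff T^{1/3} < N .
\end{align*}
```

Both terms of the discretized bound are therefore smaller than the corresponding terms of the first bound exactly when $N>T^{1/3}$.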

Authors: 3

Total Words: 17873

Unique Words: 3153

Sparse regression such as Lasso has achieved great success in dealing with
high dimensional data for several decades. However, there are few methods
applicable to missing data, which often occurs in high dimensional data.
Recently, CoCoLasso was proposed to deal with high dimensional missing data,
but it still struggles when the data are highly missing. In this paper, we propose a
novel Lasso-type regression technique for Highly Missing data, called
`HMLasso'. We use the mean imputed covariance matrix, which is notorious in
general due to its estimation bias for missing data. However, we effectively
incorporate it into Lasso, by using a useful connection with the pairwise
covariance matrix. The resulting optimization problem can be seen as a weighted
modification of CoCoLasso with the missing ratios, and is quite effective for
highly missing data. To the best of our knowledge, this is the first method
that can efficiently deal with both high dimensional and highly missing data.
We show that the proposed method is beneficial with regard to...
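
One way to read the connection between the mean imputed and pairwise covariance matrices mentioned above is sketched below. It rests on assumptions not stated in the abstract: each column is centered by its observed mean, the pairwise covariance for columns $j$ and $k$ divides by the number $n_{jk}$ of rows where both are observed, and the function name `covariance_connection` is hypothetical. Under those assumptions the mean imputed matrix equals the pairwise matrix scaled entry-wise by the observed ratio $n_{jk}/n$.

```python
import numpy as np

def covariance_connection(X_missing):
    """Return (S_imp, S_pair, R) for data with NaN marking missing entries.

    Assumes every pair of columns is jointly observed in at least one row.
    """
    observed = ~np.isnan(X_missing)
    n = X_missing.shape[0]
    col_means = np.nanmean(X_missing, axis=0)
    # Center by observed means; missing entries become 0 (mean imputation).
    Z = np.where(observed, X_missing - col_means, 0.0)
    cross = Z.T @ Z
    mask = observed.astype(float)
    pair_counts = mask.T @ mask       # n_jk: rows where both j and k are observed
    S_imp = cross / n                 # mean imputed covariance
    S_pair = cross / pair_counts      # pairwise covariance
    R = pair_counts / n               # observed ratios
    return S_imp, S_pair, R

# Synthetic data (assumed): 200 rows, 4 columns, about 40% missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
observed = rng.random((200, 4)) > 0.4
observed[0] = True                    # guarantee at least one complete row
X_missing = np.where(observed, X, np.nan)

S_imp, S_pair, R = covariance_connection(X_missing)
identity_holds = np.allclose(S_imp, R * S_pair)
```

The entry-wise factor `R` shrinks entries with few joint observations, which is consistent with the abstract's description of a modification weighted by the missing ratios.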

Authors: 3

Total Words: 8323

Unique Words: 1961

