Support vector data description (SVDD) is a popular anomaly detection
technique. The SVDD classifier partitions the whole data space into an
$\textit{inlier}$ region, which consists of the region $\textit{near}$ the
training data, and an $\textit{outlier}$ region, which consists of points
$\textit{away}$ from the training data. The computation of the SVDD classifier
requires a kernel function, for which the Gaussian kernel is a common choice.
The Gaussian kernel has a bandwidth parameter, and it is important to set the
value of this parameter correctly for good results. A small bandwidth leads to
overfitting such that the resulting SVDD classifier overestimates the number of
anomalies, whereas a large bandwidth leads to underfitting and an inability to
detect many anomalies. In this paper, we present a new unsupervised method for
selecting the Gaussian kernel bandwidth. Our method, which exploits the
low-rank representation of the kernel matrix to suggest a kernel bandwidth
value, is competitive with existing bandwidth selection methods.
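
To make the role of the bandwidth concrete, here is a minimal sketch. It is not the paper's trace criterion: it only builds a Gaussian kernel matrix for a few toy points and reports the average off-diagonal entry. The parameterization exp(-||x - y||^2 / (2 s^2)), the toy points, and the helper names are assumptions made for illustration.

```python
import math

def gaussian_kernel_matrix(points, bandwidth):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)) (assumed convention)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2.0 * bandwidth ** 2)) for b in points]
            for a in points]

def mean_off_diagonal(K):
    """Average similarity between distinct points."""
    n = len(K)
    total = sum(K[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

# Hypothetical toy data: the corners of the unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Tiny bandwidth: K is close to the identity (every point looks isolated).
small = mean_off_diagonal(gaussian_kernel_matrix(points, 0.05))

# Huge bandwidth: K is close to the all-ones matrix (every point looks alike).
large = mean_off_diagonal(gaussian_kernel_matrix(points, 50.0))
```

At the small-bandwidth extreme the kernel matrix is nearly full rank, and at the large-bandwidth extreme it is nearly rank one. This is plausibly the kind of low-rank structure the abstract alludes to, though the paper's actual criterion is not reproduced here.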

arxivml:
"The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description",
Arin Chaudhuri, Deovrat K…
https://t.co/CJhMzD1SDj

StatsPapers:
The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description. https://t.co/ef4dYaLUYL

Sample Sizes : None.

Authors: 9

Total Words: 7299

Unique Words: 1718

The problem of dimension reduction is of increasing importance in modern data
analysis. In this paper, we consider modeling the collection of points in a
high dimensional space as a union of low dimensional subspaces. In particular,
we propose a highly scalable sampling-based algorithm that clusters the entire
data set by first applying spectral clustering to a small random sample and then
classifying or labeling the remaining out-of-sample points. The key idea is
that this random subset borrows information across the entire data set and that
the problem of clustering points can be replaced with the more efficient and
robust problem of "clustering sub-clusters". We provide theoretical guarantees
for our procedure. The numerical results indicate we outperform other
state-of-the-art subspace clustering algorithms with respect to accuracy and
speed.

arxivml:
"Subspace Clustering through Sub-Clusters",
Weiwei Li, Jan Hannig, Sayan Mukherjee
https://t.co/9OfaJHb4Q4

StatsPapers:
Subspace Clustering through Sub-Clusters. https://t.co/EeSIluBu91

Sample Sizes : None.

Authors: 3

Total Words: 11729

Unique Words: 2552

In unsupervised learning, dimensionality reduction is an important tool for
data exploration and visualization. Because these aims are typically
open-ended, it can be useful to frame the problem as looking for patterns that
are enriched in one dataset relative to another. These pairs of datasets occur
commonly, for instance a population of interest vs. control or signal vs.
signal-free recordings. However, there are few methods that work on sets of data
as opposed to data points or sequences. Here, we present a probabilistic model
for dimensionality reduction to discover signal that is enriched in the target
dataset relative to the background dataset. The data in these sets do not need
to be paired or grouped beyond set membership. By using a probabilistic model
where some structure is shared amongst the two datasets and some is unique to
the target dataset, we are able to recover interesting structure in the latent
space of the target dataset. The method also has the advantages of a
probabilistic model, namely that it allows for...

arxiv_org:
Unsupervised learning with contrastive latent variable models. https://t.co/eCTpcFVu51 https://t.co/WOKDshSSr2

arxivml:
"Unsupervised learning with contrastive latent variable models",
Kristen Severson, Soumya Ghosh, Kenney Ng
https://t.co/fYoduOCZhR

StatsPapers:
Unsupervised learning with contrastive latent variable models. https://t.co/0Dm0GtzD6T

Sample Sizes : None.

Authors: 3

Total Words: 8137

Unique Words: 2256

In machine learning, a nonparametric forecasting algorithm for time series
data has been proposed, called the kernel spectral hidden Markov model (KSHMM).
In this paper, we propose a technique for short-term wind-speed prediction
based on KSHMM. We numerically compared the performance of our KSHMM-based
forecasting technique to other machine learning techniques, using
wind-speed data offered by the National Renewable Energy Laboratory. Our
results demonstrate that, compared to these methods, the proposed technique
offers comparable or better performance.

arxiv_org:
Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/m01SaAnkUh https://t.co/EUdVpBwEVg

BrundageBot:
Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. Shunsuke Tsuzuki and Yu Nishiyama https://t.co/9YWG97kvwP

arxivml:
"Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models",
Shunsuke Tsuzuki, Yu Nishiyama
https://t.co/F6Gk7WrUE2

Memoirs:
Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/o8vdexmWNN

Sample Sizes : None.

Authors: 2

Total Words: 6788

Unique Words: 2013

This paper considers a multi-armed bandit game where the number of arms is
much larger than the maximum budget and is effectively infinite. We
characterize necessary and sufficient conditions on the total budget for an
algorithm to return an $\epsilon$-good arm with probability at least
$1 - \delta$. In such situations, the sample complexity depends on $\epsilon$,
$\delta$, and the so-called reservoir distribution $\nu$ from which the means of
the arms are drawn i.i.d. While a substantial literature has developed around
analyzing specific cases of $\nu$ such as the beta distribution, our analysis
makes no assumption about the form of $\nu$. Our algorithm is based on
successive halving with the surprising exception that arms start to be
discarded after just a single pull, requiring an analysis that goes beyond
concentration alone. The provable correctness of this algorithm also provides
an explanation for the empirical observation that the most aggressive bracket
of the Hyperband algorithm of Li et al. (2017) for hyperparameter tuning...
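
For readers unfamiliar with successive halving, a generic sketch follows. It is not the paper's algorithm or its budget analysis. The `pull` callback, the doubling of `pulls_per_round`, and the toy reservoir are assumptions chosen for illustration. The first round uses a single pull per arm before discarding half of them, which echoes the behaviour the abstract describes.

```python
import random

def successive_halving(num_arms, pull, pulls_per_round=1):
    """Keep the better half of the surviving arms each round (generic sketch).

    pull(arm) returns one reward sample for the given arm index.
    """
    survivors = list(range(num_arms))
    totals = [0.0] * num_arms
    counts = [0] * num_arms
    while len(survivors) > 1:
        for arm in survivors:
            for _ in range(pulls_per_round):
                totals[arm] += pull(arm)
                counts[arm] += 1
        # Rank survivors by empirical mean reward, best first.
        survivors.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
        pulls_per_round *= 2  # assumed schedule: spend more pulls on fewer arms
    return survivors[0]

# Noise-free check: with deterministic rewards the best arm must survive.
means = [0.1, 0.9, 0.5, 0.3, 0.7]
best = successive_halving(len(means), lambda arm: means[arm])

# Noisy Bernoulli arms whose means are drawn from a hypothetical uniform reservoir.
rng = random.Random(0)
reservoir_means = [rng.random() for _ in range(64)]
noisy_choice = successive_halving(
    64, lambda arm: 1.0 if rng.random() < reservoir_means[arm] else 0.0
)
```

In the noisy case the returned arm is not guaranteed to be the best one. Quantifying how large the budget must be for an $\epsilon$-good arm is what the abstract says the paper analyzes.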

arxiv_org:
Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs. https://t.co/OPJLccFAvh https://t.co/gKChc3JqqT

arxivml:
"Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs",
Maryam Aziz, Kevin Jamieson, Javed Aslam
https://t.co/s649xKYtp3

Memoirs:
Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs. https://t.co/WCr1xdLagm

Sample Sizes : None.

Authors: 3

Total Words: 9363

Unique Words: 2139

Discovering new physical products and processes often demands enormous
experimentation and expensive simulation. To design a new product with certain
target characteristics, an extensive search is performed in the design space by
trying out a large number of design combinations before reaching the target
characteristics. However, forward searching for the target design becomes
prohibitive when the target is itself moving or only partially understood. To
address this bottleneck, we propose to use backward prediction by leveraging
the rich data generated during earlier exploration and construct a machine
learning framework to predict the design parameters for any target in a single
step. This poses two technical challenges: the first arises from the one-to-many
mapping when learning the inverse problem, and the second from a user
specifying the target specifications only partially. To overcome the
challenges, we formulate this problem as conditional density estimation under
high-dimensional setting with incomplete input...

arxiv_org:
Hybrid Generative-Discriminative Models for Inverse Materials Design. https://t.co/WBhZYBW5Bh https://t.co/kfzXKaTRhB

arxivml:
"Hybrid Generative-Discriminative Models for Inverse Materials Design",
Phuoc Nguyen, Truyen Tran, Sunil Gupta, San…
https://t.co/Vw5n1ZTGqN

Memoirs:
Hybrid Generative-Discriminative Models for Inverse Materials Design. https://t.co/ivUaBr3sbS

CondMatPhys:
Hybrid Generative-Discriminative Models for Inverse Materials Design https://t.co/Kqp27IjiQY

Sample Sizes : [4]

Authors: 5

Total Words: 7895

Unique Words: 2690

Deep learning involves a difficult non-convex optimization problem, which is
often solved by stochastic gradient (SG) methods. While SG is usually
effective, it may not be robust in some situations. Recently, Newton methods
have been investigated as an alternative optimization technique, but nearly all
existing studies consider only fully-connected feedforward neural networks.
They do not investigate other types of networks such as Convolutional Neural
Networks (CNN), which are more commonly used in deep-learning applications. One
reason is that Newton methods for CNN involve complicated operations, and so
far no work has conducted a thorough investigation. In this work, we give
details of all building blocks including function, gradient, and Jacobian
evaluation, and Gauss-Newton matrix-vector products. These basic components are
very important because with them further developments of Newton methods for CNN
become possible. We show that an efficient MATLAB implementation can be done in
just several hundred lines of code and...

BrundageBot:
Newton Methods for Convolutional Neural Networks. Chien-Chih Wang, Kent Loong Tan, and Chih-Jen Lin https://t.co/s2BMyphVMH

arxivml:
"Newton Methods for Convolutional Neural Networks",
Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin
https://t.co/3PRprZhLiV

StatsPapers:
Newton Methods for Convolutional Neural Networks. https://t.co/XQW2m3a43W

Sample Sizes : None.

Authors: 3

Total Words: 14465

Unique Words: 2818

We develop a prediction-based prescriptive model for learning optimal
personalized treatments for patients based on their Electronic Health Records
(EHRs). Our approach consists of: (i) predicting future outcomes under each
possible therapy using a robustified nonlinear model, and (ii) adopting a
randomized prescriptive policy determined by the predicted outcomes. We show
theoretical results that guarantee the out-of-sample predictive power of the
model, and prove the optimality of the randomized strategy in terms of the
expected true future outcome. We apply the proposed methodology to develop
optimal therapies for patients with type 2 diabetes or hypertension using EHRs
from a major safety-net hospital in New England, and show that our algorithm
leads to the most significant reduction of the HbA1c, for diabetics, or
systolic blood pressure, for patients with hypertension, compared to the
alternatives. We demonstrate that our approach outperforms the standard of care
under the robustified nonlinear predictive model.

arxivml:
"Learning Optimal Personalized Treatment Rules Using Robust Regression Informed K-NN",
Ruidi Chen, Ioannis Paschali…
https://t.co/o3Yl1SGp16

Memoirs:
Learning Optimal Personalized Treatment Rules Using Robust Regression Informed K-NN. https://t.co/gJhb5v6xHy

Sample Sizes : None.

Authors: 2

Total Words: 4508

Unique Words: 1565

We consider the problem of learning the level set for which a noisy black-box
function exceeds a given threshold.
To efficiently reconstruct the level set, we investigate Gaussian process
(GP) metamodels. Our focus is on strongly stochastic samplers, in particular
with heavy-tailed simulation noise and low signal-to-noise ratio.
To guard against noise misspecification, we assess the performance of three
variants: (i) GPs with Student-$t$ observations; (ii) Student-$t$ processes
(TPs); and (iii) classification GPs modeling the sign of the response. As a
fourth extension, we study GP surrogates with monotonicity constraints that are
relevant when the level set is known to be connected. In conjunction with these
metamodels, we analyze several acquisition functions for guiding the sequential
experimental designs, extending existing stepwise uncertainty reduction
criteria to the stochastic contour-finding context. This also motivates our
development of (approximate) updating formulas to efficiently compute such
acquisition...

arxiv_org:
Evaluating Gaussian Process Metamodels and Sequential Designs for Noisy Level Set Estimat... https://t.co/XDsJDnEnjC https://t.co/XhTshK08tz

Sample Sizes : None.

Authors: 3

Total Words: 17125

Unique Words: 4186

In this paper, we propose an acceleration scheme for online memory-limited
PCA methods. Our scheme converges to the first $k>1$ eigenvectors in a single
data pass. We provide empirical convergence results of our scheme based on the
spiked covariance model. Our scheme does not require any predefined parameters
such as the eigengap and hence is well suited to streaming data
scenarios. Furthermore, we apply our scheme to challenging time-varying systems
where online PCA methods fail to converge. Specifically, we discuss a family of
time-varying systems that are based on Molecular Dynamics simulations where
batch PCA converges to the actual analytic solution of such systems.
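
As background on what an online, memory-limited PCA update looks like, here is a minimal single-pass sketch of Oja's rule with explicit renormalization for the top eigenvector only ($k = 1$). It is a standard baseline, not the accelerated scheme proposed in the paper. The step size, the initialization, and the synthetic two-dimensional data are assumptions for illustration.

```python
import math
import random

def oja_top_eigenvector(samples, learning_rate=0.01):
    """One pass of Oja's rule with renormalization, storing only one vector."""
    dim = len(samples[0])
    w = [1.0 / math.sqrt(dim)] * dim  # assumed initialization: uniform unit vector
    for x in samples:
        # Projection of the new sample onto the current estimate.
        y = sum(wi * xi for wi, xi in zip(w, x))
        # Stochastic step towards the dominant direction, then renormalize.
        w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w

# Hypothetical stream: variance 9 along the first axis, 0.25 along the second,
# so the leading eigenvector of the covariance is (+/-1, 0).
rng = random.Random(0)
samples = [(3.0 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1)) for _ in range(2000)]
w = oja_top_eigenvector(samples)
```

The fixed `learning_rate` here is exactly the kind of hand-set parameter that, according to the abstract, the proposed scheme aims to avoid.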

Sample Sizes : None.

Authors: 2

Total Words: 3638

Unique Words: 1210

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real time) based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

*Tracking 58,338 papers.*
