### Top 10 arXiv Papers Today in Machine Learning

##### #1. The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description
###### Arin Chaudhuri, Deovrat Kakde, Carol Sadek, Wenhao Hu, Hansi Jiang, Seunghyun Kong, Yuewei Liao, Sergiy Peredriy, Haoyu Wang
Support vector data description (SVDD) is a popular anomaly detection technique. The SVDD classifier partitions the whole data space into an *inlier* region, which consists of the region *near* the training data, and an *outlier* region, which consists of points *away* from the training data. The computation of the SVDD classifier requires a kernel function, for which the Gaussian kernel is a common choice. The Gaussian kernel has a bandwidth parameter, and it is important to set the value of this parameter correctly for good results. A small bandwidth leads to overfitting such that the resulting SVDD classifier overestimates the number of anomalies, whereas a large bandwidth leads to underfitting and an inability to detect many anomalies. In this paper, we present a new unsupervised method for selecting the Gaussian kernel bandwidth. Our method, which exploits the low-rank representation of the kernel matrix to suggest a kernel bandwidth value, is competitive with existing bandwidth selection methods.
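To make the bandwidth trade-off in this abstract concrete, here is a minimal sketch of how a Gaussian kernel matrix changes with the bandwidth. It assumes the common parameterization $k(x, y) = \exp(-\lVert x - y \rVert^2 / (2 s^2))$, which may differ from the one used in the paper, and the helper names and toy points are hypothetical. It does not implement the trace criterion itself.

```python
import math

def gaussian_kernel_matrix(points, bandwidth):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).

    Assumption: this is the usual Gaussian kernel parameterization,
    not necessarily the exact form used in the paper.
    """
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return K

def mean_off_diagonal(K):
    """Average similarity between distinct points."""
    n = len(K)
    total = sum(K[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

# Hypothetical toy data: the corners of the unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Tiny bandwidth: K is close to the identity, so every point looks isolated.
small = mean_off_diagonal(gaussian_kernel_matrix(points, 0.05))
# Huge bandwidth: K is close to all ones, so every point looks alike.
large = mean_off_diagonal(gaussian_kernel_matrix(points, 50.0))
```

With a very small bandwidth the off-diagonal entries vanish (the overfitting regime), and with a very large one they approach 1 (the underfitting regime). A bandwidth selection method has to pick a value between these two extremes.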
###### Tweets
arxivml: "The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description", Arin Chaudhuri, Deovrat K… https://t.co/CJhMzD1SDj
StatsPapers: The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description. https://t.co/ef4dYaLUYL
###### Other stats
Sample Sizes : None.
Authors: 9
Total Words: 7299
Unique Words: 1718

##### #2. Subspace Clustering through Sub-Clusters
###### Weiwei Li, Jan Hannig, Sayan Mukherjee
The problem of dimension reduction is of increasing importance in modern data analysis. In this paper, we consider modeling the collection of points in a high dimensional space as a union of low dimensional subspaces. In particular we propose a highly scalable sampling based algorithm that clusters the entire data via first spectral clustering of a small random sample followed by classifying or labeling the remaining out of sample points. The key idea is that this random subset borrows information across the entire data set and that the problem of clustering points can be replaced with the more efficient and robust problem of "clustering sub-clusters". We provide theoretical guarantees for our procedure. The numerical results indicate we outperform other state-of-the-art subspace clustering algorithms with respect to accuracy and speed.
###### Tweets
arxivml: "Subspace Clustering through Sub-Clusters", Weiwei Li, Jan Hannig, Sayan Mukherjee https://t.co/9OfaJHb4Q4
StatsPapers: Subspace Clustering through Sub-Clusters. https://t.co/EeSIluBu91
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 11729
Unique Words: 2552

##### #3. Unsupervised learning with contrastive latent variable models
###### Kristen Severson, Soumya Ghosh, Kenney Ng
In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. These pairs of datasets occur commonly, for instance a population of interest vs. control or signal vs. signal-free recordings. However, there are few methods that work on sets of data as opposed to data points or sequences. Here, we present a probabilistic model for dimensionality reduction to discover signal that is enriched in the target dataset relative to the background dataset. The data in these sets do not need to be paired or grouped beyond set membership. By using a probabilistic model where some structure is shared between the two datasets and some is unique to the target dataset, we are able to recover interesting structure in the latent space of the target dataset. The method also has the advantages of a probabilistic model, namely that it allows for...
###### Tweets
arxiv_org: Unsupervised learning with contrastive latent variable models. https://t.co/eCTpcFVu51 https://t.co/WOKDshSSr2
arxivml: "Unsupervised learning with contrastive latent variable models", Kristen Severson, Soumya Ghosh, Kenney Ng https://t.co/fYoduOCZhR
nmfeeds: [O] https://t.co/JGmYonWfJ8 Unsupervised learning with contrastive latent variable models. In unsupervised learning, dimen...
StatsPapers: Unsupervised learning with contrastive latent variable models. https://t.co/0Dm0GtzD6T
Rosenchild: RT @arxiv_org: Unsupervised learning with contrastive latent variable models. https://t.co/eCTpcFVu51 https://t.co/WOKDshSSr2
kuronekodaisuki: RT @arxiv_org: Unsupervised learning with contrastive latent variable models. https://t.co/eCTpcFVu51 https://t.co/WOKDshSSr2
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 8137
Unique Words: 2256

##### #4. Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models
###### Shunsuke Tsuzuki, Yu Nishiyama
In machine learning, a nonparametric forecasting algorithm for time series data has been proposed, called the kernel spectral hidden Markov model (KSHMM). In this paper, we propose a technique for short-term wind-speed prediction based on KSHMM. We numerically compared the performance of our KSHMM-based forecasting technique to other machine learning techniques, using wind-speed data offered by the National Renewable Energy Laboratory. Our results demonstrate that, compared to these methods, the proposed technique offers comparable or better performance.
###### Tweets
arxiv_org: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/m01SaAnkUh https://t.co/EUdVpBwEVg
BrundageBot: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. Shunsuke Tsuzuki and Yu Nishiyama https://t.co/9YWG97kvwP
arxivml: "Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models", Shunsuke Tsuzuki, Yu Nishiyama https://t.co/F6Gk7WrUE2
nmfeeds: [O] https://t.co/h1dLaAlWAF Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. In machine learn...
Memoirs: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/o8vdexmWNN
Rosenchild: RT @arxiv_org: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/m01SaAnkUh https://t.co/EUdVpBwEVg
mench90: RT @arxiv_org: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/m01SaAnkUh https://t.co/EUdVpBwEVg
gaialive: RT @arxiv_org: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/m01SaAnkUh https://t.co/EUdVpBwEVg
puneethmishra: RT @arxiv_org: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/m01SaAnkUh https://t.co/EUdVpBwEVg
shubh_300595: RT @arxiv_org: Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models. https://t.co/m01SaAnkUh https://t.co/EUdVpBwEVg
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 6788
Unique Words: 2013

##### #5. Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs
###### Maryam Aziz, Kevin Jamieson, Javed Aslam
This paper considers a multi-armed bandit game where the number of arms is much larger than the maximum budget and is effectively infinite. We characterize necessary and sufficient conditions on the total budget for an algorithm to return an $\epsilon$-good arm with probability at least $1 - \delta$. In such situations, the sample complexity depends on $\epsilon$, $\delta$ and the so-called reservoir distribution $\nu$ from which the means of the arms are drawn iid. While a substantial literature has developed around analyzing specific cases of $\nu$ such as the beta distribution, our analysis makes no assumption about the form of $\nu$. Our algorithm is based on successive halving with the surprising exception that arms start to be discarded after just a single pull, requiring an analysis that goes beyond concentration alone. The provable correctness of this algorithm also provides an explanation for the empirical observation that the most aggressive bracket of the Hyperband algorithm of Li et al. (2017) for hyperparameter tuning...
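The successive halving idea mentioned in this abstract can be sketched as follows. This is a generic textbook-style version, not the authors' algorithm: the Bernoulli rewards, the uniform reservoir, the doubling schedule, and all names are assumptions made for illustration. The only detail taken from the abstract is that arms can already be discarded after a single pull in the first round.

```python
import random

def successive_halving(arm_means, rng, first_round_pulls=1):
    """Return the index of the last surviving arm.

    Each round pulls every surviving arm the same number of times, keeps
    the better half by empirical mean, and doubles the pulls per arm.
    Assumption: rewards are Bernoulli with the given means.
    """
    survivors = list(range(len(arm_means)))
    pulls = first_round_pulls
    while len(survivors) > 1:
        scores = {}
        for arm in survivors:
            wins = sum(1 for _ in range(pulls) if rng.random() < arm_means[arm])
            scores[arm] = wins / pulls
        # Stable sort, best empirical mean first; ties keep their prior order.
        survivors.sort(key=lambda a: scores[a], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
        pulls *= 2
    return survivors[0]

rng = random.Random(0)
# Hypothetical reservoir: arm means drawn iid from Uniform(0, 1).
arm_means = [rng.random() for _ in range(64)]
chosen = successive_halving(arm_means, rng)
```

With 64 sampled arms, the first round spends one pull per arm and immediately drops half of them, so most of the budget goes to the arms that survive longest.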
###### Tweets
arxiv_org: Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs. https://t.co/OPJLccFAvh https://t.co/gKChc3JqqT
arxivml: "Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs", Maryam Aziz, Kevin Jamieson, Javed Aslam https://t.co/s649xKYtp3
Memoirs: Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs. https://t.co/WCr1xdLagm
DrPjenFI: RT @arxiv_org: Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs. https://t.co/OPJLccFAvh https://t.co/gKChc3JqqT
lelayf: RT @arxiv_org: Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs. https://t.co/OPJLccFAvh https://t.co/gKChc3JqqT
festivalWon: RT @arxiv_org: Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs. https://t.co/OPJLccFAvh https://t.co/gKChc3JqqT
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 9363
Unique Words: 2139

##### #6. Hybrid Generative-Discriminative Models for Inverse Materials Design
###### Phuoc Nguyen, Truyen Tran, Sunil Gupta, Santu Rana, Svetha Venkatesh
Discovering new physical products and processes often demands enormous experimentation and expensive simulation. To design a new product with certain target characteristics, an extensive search is performed in the design space by trying out a large number of design combinations before reaching the target characteristics. However, forward searching for the target design becomes prohibitive when the target is itself moving or only partially understood. To address this bottleneck, we propose to use backward prediction by leveraging the rich data generated during earlier exploration and construct a machine learning framework to predict the design parameters for any target in a single step. This poses two technical challenges: the first arises from the one-to-many mapping when learning the inverse problem, and the second from a user specifying the target specifications only partially. To overcome the challenges, we formulate this problem as conditional density estimation in a high-dimensional setting with incomplete input...
###### Tweets
arxiv_org: Hybrid Generative-Discriminative Models for Inverse Materials Design. https://t.co/WBhZYBW5Bh https://t.co/kfzXKaTRhB
arxivml: "Hybrid Generative-Discriminative Models for Inverse Materials Design", Phuoc Nguyen, Truyen Tran, Sunil Gupta, San… https://t.co/Vw5n1ZTGqN
nmfeeds: [O] https://t.co/UgOm4myOxr Hybrid Generative-Discriminative Models for Inverse Materials Design. Discovering new physical...
Memoirs: Hybrid Generative-Discriminative Models for Inverse Materials Design. https://t.co/ivUaBr3sbS
CondMatPhys: Hybrid Generative-Discriminative Models for Inverse Materials Design https://t.co/Kqp27IjiQY
###### Other stats
Sample Sizes : [4]
Authors: 5
Total Words: 7895
Unique Words: 2690

##### #7. Newton Methods for Convolutional Neural Networks
###### Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin
Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but nearly all existing studies consider only fully-connected feedforward neural networks. They do not investigate other types of networks such as Convolutional Neural Networks (CNN), which are more commonly used in deep-learning applications. One reason is that Newton methods for CNN involve complicated operations, and so far no works have conducted a thorough investigation. In this work, we give details of all building blocks including function, gradient, and Jacobian evaluation, and Gauss-Newton matrix-vector products. These basic components are very important because with them further developments of Newton methods for CNN become possible. We show that an efficient MATLAB implementation can be done in just several hundred lines of code and...
###### Tweets
BrundageBot: Newton Methods for Convolutional Neural Networks. Chien-Chih Wang, Kent Loong Tan, and Chih-Jen Lin https://t.co/s2BMyphVMH
arxivml: "Newton Methods for Convolutional Neural Networks", Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin https://t.co/3PRprZhLiV
nmfeeds: [O] https://t.co/MjLe5uxCK6 Newton Methods for Convolutional Neural Networks. Deep learning involves a difficult non-conve...
StatsPapers: Newton Methods for Convolutional Neural Networks. https://t.co/XQW2m3a43W
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 14465
Unique Words: 2818

##### #8. Learning Optimal Personalized Treatment Rules Using Robust Regression Informed K-NN
###### Ruidi Chen, Ioannis Paschalidis
We develop a prediction-based prescriptive model for learning optimal personalized treatments for patients based on their Electronic Health Records (EHRs). Our approach consists of: (i) predicting future outcomes under each possible therapy using a robustified nonlinear model, and (ii) adopting a randomized prescriptive policy determined by the predicted outcomes. We show theoretical results that guarantee the out-of-sample predictive power of the model, and prove the optimality of the randomized strategy in terms of the expected true future outcome. We apply the proposed methodology to develop optimal therapies for patients with type 2 diabetes or hypertension using EHRs from a major safety-net hospital in New England, and show that our algorithm leads to the most significant reduction of the HbA1c, for diabetics, or systolic blood pressure, for patients with hypertension, compared to the alternatives. We demonstrate that our approach outperforms the standard of care under the robustified nonlinear predictive model.
###### Tweets
arxivml: "Learning Optimal Personalized Treatment Rules Using Robust Regression Informed K-NN", Ruidi Chen, Ioannis Paschali… https://t.co/o3Yl1SGp16
Memoirs: Learning Optimal Personalized Treatment Rules Using Robust Regression Informed K-NN. https://t.co/gJhb5v6xHy
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 4508
Unique Words: 1565

##### #9. Evaluating Gaussian Process Metamodels and Sequential Designs for Noisy Level Set Estimation
###### Xiong Lyu, Mickael Binois, Michael Ludkovski
We consider the problem of learning the level set for which a noisy black-box function exceeds a given threshold. To efficiently reconstruct the level set, we investigate Gaussian process (GP) metamodels. Our focus is on strongly stochastic samplers, in particular with heavy-tailed simulation noise and low signal-to-noise ratio. To guard against noise misspecification, we assess the performance of three variants: (i) GPs with Student-$t$ observations; (ii) Student-$t$ processes (TPs); and (iii) classification GPs modeling the sign of the response. As a fourth extension, we study GP surrogates with monotonicity constraints that are relevant when the level set is known to be connected. In conjunction with these metamodels, we analyze several acquisition functions for guiding the sequential experimental designs, extending existing stepwise uncertainty reduction criteria to the stochastic contour-finding context. This also motivates our development of (approximate) updating formulas to efficiently compute such acquisition...
###### Tweets
arxiv_org: Evaluating Gaussian Process Metamodels and Sequential Designs for Noisy Level Set Estimat... https://t.co/XDsJDnEnjC https://t.co/XhTshK08tz
HubBucket: RT @arxiv_org: Evaluating Gaussian Process Metamodels and Sequential Designs for Noisy Level Set Estimat... https://t.co/XDsJDnEnjC https:/…
DrPjenFI: RT @arxiv_org: Evaluating Gaussian Process Metamodels and Sequential Designs for Noisy Level Set Estimat... https://t.co/XDsJDnEnjC https:/…
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 17125
Unique Words: 4186

##### #10. An Acceleration Scheme for Memory Limited, Streaming PCA
###### Salaheddin Alakkari, John Dingliana
In this paper, we propose an acceleration scheme for online memory-limited PCA methods. Our scheme converges to the first $k>1$ eigenvectors in a single data pass. We provide empirical convergence results of our scheme based on the spiked covariance model. Our scheme does not require any predefined parameters such as the eigengap and hence is well suited to streaming data scenarios. Furthermore, we apply our scheme to challenging time-varying systems where online PCA methods fail to converge. Specifically, we discuss a family of time-varying systems that are based on Molecular Dynamics simulations where batch PCA converges to the actual analytic solution of such systems.
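For readers unfamiliar with online memory-limited PCA, the sketch below shows the kind of baseline such an acceleration scheme would build on: the classic Oja update for a single leading eigenvector, run on data from a simple spiked covariance model. It is not the paper's acceleration scheme. The step-size schedule, the spike and noise levels, and all function names are assumptions chosen for illustration.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def oja_top_eigenvector(stream, dim):
    """One-pass estimate of the leading eigenvector using Oja's rule.

    Only the current estimate w is stored (O(dim) memory); each sample is
    seen once and then discarded.
    """
    w = normalize([1.0] * dim)
    for t, x in enumerate(stream):
        step = 1.0 / (10.0 + t)  # assumed decaying step size
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = normalize([wi + step * proj * xi for xi, wi in zip(x, w)])
    return w

def spiked_samples(n, dim, spike_std, noise_std, rng):
    """Yield samples with one strong direction (the first axis) plus isotropic noise."""
    for _ in range(n):
        z = rng.gauss(0.0, spike_std)
        yield [(z if i == 0 else 0.0) + rng.gauss(0.0, noise_std) for i in range(dim)]

rng = random.Random(0)
w = oja_top_eigenvector(spiked_samples(2000, 5, 5.0, 0.1, rng), dim=5)
# Alignment with the true spike direction e_1 (1.0 means perfect recovery).
alignment = abs(w[0])
```

Because the spike here is much stronger than the noise, the plain update already aligns well with the first axis. The single-pass setting becomes harder when the gap between leading eigenvalues is small.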
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 3638
Unique Words: 1210

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 58,338 papers.
