Top 9 Arxiv Papers Today in Machine Learning


2.067 Mikeys
#1. Amortized Monte Carlo Integration
Adam Goliński, Frank Wood, Tom Rainforth
Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions - a computational pipeline which is inefficient when the target function(s) are known upfront. In this paper, we address this inefficiency by introducing AMCI, a method for amortizing Monte Carlo integration directly. AMCI operates similarly to amortized inference but produces three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, samples are produced separately from each amortized proposal, before being combined to an overall estimate of the expectation. We show that while existing approaches are fundamentally limited in the level of accuracy they can achieve, AMCI can theoretically produce arbitrarily small errors for any integrable target function using only a single sample from each proposal at runtime. We further show that it is able to...
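As a rough illustration of the three-proposal idea in the abstract above, the sketch below assumes the expectation is split as (E1+ - E1-) / E2, where E1+ and E1- integrate the positive and negative parts of the target function against the joint density and E2 is the marginal likelihood. This decomposition, the toy Gaussian model, and the hand-picked Gaussian proposals are all assumptions made for illustration. In AMCI the proposals are learned and amortized, and this is not the authors' code.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def joint(x, y):
    """Toy model (assumption): x ~ N(0, 1) and y | x ~ N(x, 1)."""
    return normal_pdf(x, 0.0, 1.0) * normal_pdf(y, x, 1.0)


def importance_estimate(integrand, q_mean, q_std, n, rng):
    """Estimate the integral of `integrand` using a Gaussian proposal."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(q_mean, q_std)
        total += integrand(x) / normal_pdf(x, q_mean, q_std)
    return total / n


def three_proposal_estimate(f, y, n=20000, seed=0):
    """Estimate E[f(x) | y] as (E1+ - E1-) / E2 with separate proposals."""
    rng = random.Random(seed)
    # Hypothetical fixed proposals centred on the known posterior mean y / 2.
    e1_pos = importance_estimate(
        lambda x: max(f(x), 0.0) * joint(x, y), y / 2.0, 1.0, n, rng)
    e1_neg = importance_estimate(
        lambda x: max(-f(x), 0.0) * joint(x, y), y / 2.0, 1.0, n, rng)
    # For E2 the exact posterior N(y / 2, 1 / 2) is used as the proposal,
    # so every importance weight equals the marginal likelihood p(y).
    e2 = importance_estimate(
        lambda x: joint(x, y), y / 2.0, math.sqrt(0.5), n, rng)
    return (e1_pos - e1_neg) / e2


estimate = three_proposal_estimate(lambda x: x, y=1.0)
```

For this toy model the posterior is N(y/2, 1/2), so with y = 1 the estimate should land close to 0.5. The E2 term shows why a well-matched proposal needs very few samples: its importance weights are constant.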
Figures
None.
Tweets
BrundageBot: Amortized Monte Carlo Integration. Adam Goliński, Frank Wood, and Tom Rainforth https://t.co/0jiEtLcqsX
arxivml: "Amortized Monte Carlo Integration", Adam Goliński, Frank Wood, Tom Rainforth https://t.co/ggsxnRJOec
arxiv_cs_LG: Amortized Monte Carlo Integration. Adam Goliński, Frank Wood, and Tom Rainforth https://t.co/N17p7G3wHe
StatsPapers: Amortized Monte Carlo Integration. https://t.co/laesU9H4cq
Github

Code to accompany "Amortized Monte Carlo Integration" ICML 2019

Repository: amci
User: talesa
Language: None
Stargazers: 5
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 11545
Unique Words: 2590

2.035 Mikeys
#2. Least Angle Regression in Tangent Space and LASSO for Generalized Linear Model
Yoshihiro Hirose
We propose sparse estimation methods for generalized linear models, which run Least Angle Regression (LARS) and the Least Absolute Shrinkage and Selection Operator (LASSO) in the tangent space of the manifold of the statistical model. Our approach is to roughly approximate the statistical model and to subsequently use exact calculations. LARS was proposed as an efficient algorithm for parameter estimation and variable selection for the normal linear model. The LARS algorithm is described in terms of Euclidean geometry, regarding correlation as the metric of the space. Since the LARS algorithm only works in Euclidean space, we transform a manifold of the statistical model into the tangent space at the origin. In generalized linear regression, this transformation allows us to run the original LARS algorithm for generalized linear models. The proposed methods are efficient and perform well. Real-data analysis shows that the proposed methods produce results similar to those of the $l_1$-penalized maximum likelihood estimation...
Figures
None.
Tweets
arxivml: "Least Angle Regression in Tangent Space and LASSO for Generalized Linear Model", Yoshihiro Hirose https://t.co/qA6Z1vmMHk
arxiv_cs_LG: Least Angle Regression in Tangent Space and LASSO for Generalized Linear Model. Yoshihiro Hirose https://t.co/HGBdxP6lLG
StatsPapers: Least Angle Regression in Tangent Space and LASSO for Generalized Linear Model. https://t.co/ktNG47qAcm
331prime: RT @StatsPapers: Least Angle Regression in Tangent Space and LASSO for Generalized Linear Model. https://t.co/ktNG47qAcm
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0

2.03 Mikeys
#3. An Adaptive Approach for Anomaly Detector Selection and Fine-Tuning in Time Series
Hui Ye, Xiaopeng Ma, Qingfeng Pan, Huaqiang Fang, Hang Xiang, Tongzhen Shao
Anomaly detection in time series is a hot topic in time series data mining. The characteristics of each anomaly detector determine the kinds of abnormal data it detects well. No single detector is optimal for all types of anomalies. Moreover, anomaly detection remains difficult in industrial production because, for example, a single detector cannot be optimal across different time windows of the same time series. This paper proposes an adaptive model, called ATSDLN (Adaptive Time Series Detector Learning Network), that selects an appropriate detector and run-time parameters for anomaly detection based on the characteristics of the time series. We take the time series as the input of the model, and learn the time series representation through FCN. To adaptively select detectors and run-time parameters according to the input time series, the outputs of FCN are the inputs of two sub-networks: the detector selection network and the run-time parameters selection network. In addition, the way that the variable...
Figures
None.
Tweets
arxivml: "An Adaptive Approach for Anomaly Detector Selection and Fine-Tuning in Time Series", Hui Ye, Xiaopeng Ma, Qingfeng… https://t.co/o2rEpb87fD
arxiv_cs_LG: An Adaptive Approach for Anomaly Detector Selection and Fine-Tuning in Time Series. Hui Ye, Xiaopeng Ma, Qingfeng Pan, Huaqiang Fang, Hang Xiang, and Tongzhen Shao https://t.co/9C7OPa4u7I
StatsPapers: An Adaptive Approach for Anomaly Detector Selection and Fine-Tuning in Time Series. https://t.co/kZZ6uvF8c5
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 0
Unique Words: 0

2.03 Mikeys
#4. A discriminative approach for finding and characterizing positivity violations using decision trees
Ehud Karavani, Peter Bak, Yishai Shimoni
The assumption of positivity in causal inference (also known as common support and covariate overlap) is necessary to obtain valid causal estimates. Therefore, confirming that it holds in a given dataset is an important first step of any causal analysis. Most common methods to date are insufficient for discovering non-positivity, as they do not scale for modern high-dimensional covariate spaces, or they cannot pinpoint the subpopulation violating positivity. To overcome these issues, we suggest harnessing decision trees for detecting violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the...
Figures
None.
Tweets
arxivml: "A discriminative approach for finding and characterizing positivity violations using decision trees", Ehud Karavan… https://t.co/DRif0DJSIj
arxiv_cs_LG: A discriminative approach for finding and characterizing positivity violations using decision trees. Ehud Karavani, Peter Bak, and Yishai Shimoni https://t.co/EJrgkzOuza
StatsPapers: A discriminative approach for finding and characterizing positivity violations using decision trees. https://t.co/2jbMtLZjrV
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.03 Mikeys
#5. optimalFlow: Optimal-transport approach to flow cytometry gating and population matching
Eustasio del Barrio, Hristo Inouzhe, Jean-Michel Loubes, Carlos Matrán, Agustín Mayo-Íscar
Data used in flow cytometry present pronounced variability due to biological and technical reasons. Biological variability is a well-known phenomenon produced by measurements on different individuals with different characteristics such as age, sex, etc. The use of different measurement settings, the variation of conditions during experiments, and the different types of flow cytometers are some of the technical sources of variability. This high variability makes it difficult to use supervised machine learning for identification of cell populations. We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces prototype cytometries for the different groups. We show that supervised learning restricted to the new groups performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show...
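To give a feel for what a Wasserstein barycenter "prototype" is, the sketch below works in one dimension, where the equal-weight 2-Wasserstein barycenter of equal-size samples is simply the average of their sorted values. This is a deliberately simplified illustration with made-up numbers. Real cytometry data are multi-dimensional, and the function names here are hypothetical rather than part of optimalFlow.

```python
import math


def wasserstein2_1d(a, b):
    """2-Wasserstein distance between two equal-size 1D samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    # In 1D the optimal coupling matches the i-th smallest points.
    pairs = zip(sorted(a), sorted(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(a))


def barycenter_1d(samples):
    """Equal-weight 2-Wasserstein barycenter of equal-size 1D samples."""
    sorted_samples = [sorted(s) for s in samples]
    n = len(sorted_samples[0])
    if any(len(s) != n for s in sorted_samples):
        raise ValueError("samples must have the same size")
    # Average the order statistics (empirical quantiles) across samples.
    return [sum(column) / len(sorted_samples) for column in zip(*sorted_samples)]


# Two made-up 1D "measurements" that differ by a shift.
group = [[0.0, 2.0, 1.0], [4.0, 2.0, 3.0]]
prototype = barycenter_1d(group)
distances = [wasserstein2_1d(sample, prototype) for sample in group]
```

Here the prototype sits halfway between the two samples and is equally far from both, which is the sense in which a barycenter can act as a template for a group.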
Figures
None.
Tweets
arxivml: "optimalFlow: Optimal-transport approach to flow cytometry gating and population matching", Eustasio del Barrio, Hr… https://t.co/c3XnU95an3
arxiv_cs_LG: optimalFlow: Optimal-transport approach to flow cytometry gating and population matching. Eustasio del Barrio, Hristo Inouzhe, Jean-Michel Loubes, Carlos Matrán, and Agustín Mayo-Íscar https://t.co/6euDZsRbdS
StatsPapers: optimalFlow: Optimal-transport approach to flow cytometry gating and population matching. https://t.co/LeMLpo8JsU
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

2.017 Mikeys
#6. Robust data-driven discovery of governing physical laws using a new subsampling-based sparse Bayesian method to tackle four challenges (large noise, outliers, data integration, and extrapolation)
Sheng Zhang, Guang Lin
The derivation of physical laws is a dominant topic in scientific research. We propose a new method capable of discovering physical laws from data to tackle four challenges faced by previous methods. The four challenges are: (1) large noise in the data, (2) outliers in the data, (3) integrating the data collected from different experiments, and (4) extrapolating the solutions to areas that have no available data. To resolve these four challenges, we try to discover the governing differential equations and develop a model-discovering method based on sparse Bayesian inference and subsampling. The subsampling technique is used here for improving the accuracy of the Bayesian learning algorithm, while elsewhere it is usually employed for estimating statistics or speeding up algorithms. The optimal subsampling size is moderate, neither too small nor too big. Another merit of our method is that it can work with limited data by virtue of Bayesian inference. We demonstrate how to use our method to tackle the four aforementioned...
Figures
Tweets
arxivml: "Robust data-driven discovery of governing physical laws using a new subsampling-based sparse Bayesian method to ta… https://t.co/iGZiYjoHDn
StatsPapers: Robust data-driven discovery of governing physical laws using a new subsampling-based sparse Bayesian method to tackle four challenges (large noise, outliers, data integration, and extrapolation). https://t.co/ml00WAGYn3
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 11441
Unique Words: 2745

2.013 Mikeys
#7. Clustering Activity-Travel Behavior Time Series using Topological Data Analysis
Renjie Chen, Jingyue Zhang, Nalini Ravishanker, Karthik Konduri
Over the last few years, traffic data has been exploding and the transportation discipline has entered the era of big data. This brings new opportunities for data-driven analysis, but it also challenges traditional analytic methods. This paper proposes a new Divide and Combine based approach to K-means clustering of activity-travel behavior time series using features that are derived using tools in Time Series Analysis and Topological Data Analysis. Clustering data from five waves of the National Household Travel Survey ranging from 1990 to 2017 suggests that activity-travel patterns of individuals over the last three decades can be grouped into three clusters. Results also provide evidence in support of recent claims about differences in activity-travel patterns of different survey cohorts. The proposed method is generally applicable and is not limited only to activity-travel behavior analysis in transportation studies. Driving behavior, travel mode choice, household vehicle ownership, when being characterized...
Figures
Tweets
arxivml: "Clustering Activity-Travel Behavior Time Series using Topological Data Analysis", Renjie Chen, Jingyue Zhang, Nali… https://t.co/CNdHtq5uGc
arxiv_cs_LG: Clustering Activity-Travel Behavior Time Series using Topological Data Analysis. Renjie Chen, Jingyue Zhang, Nalini Ravishanker, and Karthik Konduri https://t.co/QoQkBRaldK
StatsPapers: Clustering Activity-Travel Behavior Time Series using Topological Data Analysis. https://t.co/iMVXci5ilz
cristobalvega: RT @StatsPapers: Clustering Activity-Travel Behavior Time Series using Topological Data Analysis. https://t.co/iMVXci5ilz
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 6936
Unique Words: 2129

2.011 Mikeys
#8. Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing
Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie Su
SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted l1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. In this paper, we develop an asymptotically exact characterization of the SLOPE solution under Gaussian random designs through solving the SLOPE problem using approximate message passing (AMP). This algorithmic approach allows us to approximate the SLOPE solution via the much more amenable AMP iterates. Explicitly, we characterize the asymptotic dynamics of the AMP iterates relying on a recently developed state evolution analysis for non-separable penalties, thereby overcoming the difficulty caused by the sorted l1 penalty. Moreover, we prove that the AMP iterates converge to the SLOPE solution in an asymptotic sense, and numerical simulations show that the convergence is surprisingly fast. Our proof rests on a novel technique...
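The sorted l1 penalty mentioned in the abstract above can be written out in a few lines. The sketch below assumes the usual formulation in which a non-increasing weight sequence is paired with the coefficient magnitudes sorted in decreasing order, so the largest coefficient receives the largest weight. The function name and the example numbers are illustrative only.

```python
def sorted_l1_penalty(coefficients, lambdas):
    """Sum of lambdas[i] times the i-th largest absolute coefficient."""
    if len(coefficients) != len(lambdas):
        raise ValueError("need one weight per coefficient")
    non_increasing = all(a >= b for a, b in zip(lambdas, lambdas[1:]))
    if not non_increasing or any(lam < 0 for lam in lambdas):
        raise ValueError("lambdas must be non-negative and non-increasing")
    # Sorting couples all coordinates, which is why the penalty is
    # non-separable: a coefficient's weight depends on its rank.
    magnitudes = sorted((abs(c) for c in coefficients), reverse=True)
    return sum(lam * mag for lam, mag in zip(lambdas, magnitudes))


beta = [1.0, -3.0, 2.0]
slope_value = sorted_l1_penalty(beta, [3.0, 2.0, 1.0])  # 3*3 + 2*2 + 1*1
lasso_value = sorted_l1_penalty(beta, [2.0, 2.0, 2.0])  # constant weights
```

With constant weights the sorting has no effect and the penalty reduces to the ordinary l1 (LASSO) penalty, here 2 * (1 + 3 + 2).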
Figures
None.
Tweets
arxivml: "Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing", Zhiqi Bu, Jason Klusows… https://t.co/lV2O0jgHO4
arxiv_cs_LG: Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing. Zhiqi Bu, Jason Klusowski, Cynthia Rush, and Weijie Su https://t.co/leLx1F1cwf
Memoirs: Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing. https://t.co/LUWRg6ZfNg
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

2.011 Mikeys
#9. Output-weighted optimal sampling for Bayesian regression and rare event statistics using few samples
Themistoklis P. Sapsis
For many important problems the quantity of interest (or output) is an unknown function of the parameter space (or input), which is a random vector with known statistics. Since the dependence of the output on this random vector is unknown, the challenge is to identify its statistics using the minimum number of function evaluations. This is a problem that can be seen in the context of active learning or optimal experimental design. We employ Bayesian regression to represent the derived model uncertainty due to a finite and small number of input-output pairs. In this context we evaluate existing methods for optimal sample selection, such as model error minimization and mutual information maximization. We show that the commonly employed criteria in the literature do not take into account the output values of the existing input-output pairs. To overcome this deficiency we introduce a new criterion that explicitly takes into account the values of the output for the existing samples and adaptively selects inputs from regions or...
Figures
Tweets
arxivml: "Output-weighted optimal sampling for Bayesian regression and rare event statistics using few samples", Themistokli… https://t.co/Vzm2B9dreR
arxiv_cs_LG: Output-weighted optimal sampling for Bayesian regression and rare event statistics using few samples. Themistoklis P. Sapsis https://t.co/VD1ttYLNsL
Memoirs: Output-weighted optimal sampling for Bayesian regression and rare event statistics using few samples. https://t.co/VrWG8ks24b
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 10466
Unique Words: 2222

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 160,428 papers.
