### Top 10 Arxiv Papers Today in Machine Learning

##### #1. Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models
###### Vincent Le Guen, Nicolas Thome
This paper addresses the problem of time series forecasting for non-stationary signals and multiple future steps prediction. To handle this challenging task, we introduce the Shape and Time Distortion Loss (STDL), a new objective function dedicated to training deep neural networks. STDL aims at accurately predicting sudden changes, and explicitly incorporates two terms supporting precise shape and temporal change detection. We introduce a differentiable loss function suitable for training deep neural nets, and provide a custom back-prop implementation for speeding up optimization. We also introduce a variant of STDL, which provides a smooth generalization of temporally-constrained Dynamic Time Warping (DTW). Experiments carried out on various non-stationary datasets reveal the very good behaviour of STDL compared to models trained with the standard Mean Squared Error (MSE) loss function, and also to DTW and variants. STDL is also agnostic to the choice of the model, and we highlight its benefit for training fully connected...
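To make the motivation concrete, here is a minimal sketch comparing the two baselines the abstract names, MSE and classic DTW, on a step signal whose predicted change arrives one step late. This is not the STDL loss itself; the function names `mse` and `dtw_distance` and the toy series are assumptions chosen for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dtw_distance(a, b):
    """Classic dynamic time warping with squared pointwise cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


target = [0.0, 0.0, 1.0, 1.0, 1.0]      # step occurs at t = 2
prediction = [0.0, 0.0, 0.0, 1.0, 1.0]  # same shape, step one step late

print(mse(target, prediction))           # 0.2
print(dtw_distance(target, prediction))  # 0.0
```

MSE penalizes the late step without distinguishing it from a shape error, while unconstrained DTW ignores the delay entirely. That gap is one way to read why the abstract combines a shape term with a temporal term.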
###### Tweets
BrundageBot: Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models. Vincent Le Guen and Nicolas Thome https://t.co/OuStdjLCja
evolvingstuff: Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models paper: https://t.co/jgRT3tC5Df code: https://t.co/NIC6XjXlyn https://t.co/dopIL5qYYu
StatsPapers: Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models. https://t.co/qJqJfAKPDz
iamknighton: RT @evolvingstuff: Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models paper: https://t.co/jgRT3tC5Df code: ht…
treasured_write: RT @evolvingstuff: Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models paper: https://t.co/jgRT3tC5Df code: ht…
jeandut14000: RT @evolvingstuff: Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models paper: https://t.co/jgRT3tC5Df code: ht…
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #2. Explaining Visual Models by Causal Attribution
###### Álvaro Parafita, Jordi Vitrià
Model explanations based on pure observational data cannot compute the effects of features reliably, due to their inability to estimate how each factor alteration could affect the rest. We argue that explanations should be based on the causal model of the data and the derived intervened causal models, that represent the data distribution subject to interventions. With these models, we can compute counterfactuals, new samples that will inform us how the model reacts to feature changes on our input. We propose a novel explanation methodology based on Causal Counterfactuals and identify the limitations of current Image Generative Models in their application to counterfactual creation.
###### Tweets
BrundageBot: Explaining Visual Models by Causal Attribution. Álvaro Parafita and Jordi Vitrià https://t.co/HjWaNXBqPe
alvaro_parafita: Our paper "Explaining Visual Models by Causal Attribution" got accepted for the #ICCV2019 Workshop on Interpreting and Explaining Visual Artificial Intelligence Models! @bitenmascarado https://t.co/PLDfReecKC
StatsPapers: Explaining Visual Models by Causal Attribution. https://t.co/XMSeSUyfWX
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #3. Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks
###### Sekitoshi Kanai, Yasutoshi Ida, Yasuhiro Fujiwara, Masanori Yamada, Shuichi Adachi
We propose Absum, which is a regularization method for improving adversarial robustness of convolutional neural networks (CNNs). Although CNNs can accurately recognize images, recent studies have shown that the convolution operations in CNNs commonly have structural sensitivity to specific noise composed of Fourier basis functions. By exploiting this sensitivity, they proposed a simple black-box adversarial attack: Single Fourier attack. To reduce structural sensitivity, we can use regularization of convolution filter weights since the sensitivity of linear transform can be assessed by the norm of the weights. However, standard regularization methods can prevent minimization of the loss function because they impose a tight constraint for obtaining high robustness. To solve this problem, Absum imposes a loose constraint; it penalizes the absolute values of the summation of the parameters in the convolution layers. Absum can improve robustness against single Fourier attack while being as simple and efficient as standard...
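The penalty described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' implementation: the abstract only says the method penalizes the absolute value of the summation of convolution parameters, so summing per 2D kernel, the weight layout `[out_channels][in_channels][kh][kw]`, and the names `absum_penalty` and `l1_penalty` are all assumptions.

```python
def absum_penalty(weights):
    """Sum over kernels of |sum of that kernel's entries|.

    weights: nested lists shaped [out_channels][in_channels][kh][kw]
    (assumed layout; the per-kernel grouping is also an assumption).
    """
    total = 0.0
    for out_filters in weights:
        for kernel in out_filters:
            kernel_sum = sum(sum(row) for row in kernel)
            total += abs(kernel_sum)
    return total


def l1_penalty(weights):
    """Standard L1 penalty: sum of |w| over every parameter, for comparison."""
    return sum(
        abs(w)
        for out_filters in weights
        for kernel in out_filters
        for row in kernel
        for w in row
    )


# One output channel, one input channel, one 2x2 kernel whose entries cancel.
weights = [[[[1.0, -1.0], [-1.0, 1.0]]]]

print(absum_penalty(weights))  # 0.0
print(l1_penalty(weights))     # 4.0
```

The kernel with cancelling entries incurs no penalty under this sketch but a full penalty under L1, which is one sense in which the constraint is looser than standard weight regularization.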
###### Tweets
BrundageBot: Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks. Sekitoshi Kanai, Yasutoshi Ida, Yasuhiro Fujiwara, Masanori Yamada, and Shuichi Adachi https://t.co/jjVBYKbzLo
StatsPapers: Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks. https://t.co/xW8tUg4g4s
arxiv_cs_cv_pr: Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks. Sekitoshi Kanai, Yasutoshi Ida, Yasuhiro Fujiwara, Masanori Yamada, and Shuichi Adachi https://t.co/3DdW0YLCSt
###### Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

##### #4. Relaxed Softmax for learning from Positive and Unlabeled data
###### Ugo Tanielian, Flavian Vasile
In recent years, the softmax model and its fast approximations have become the de-facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation, two fields that fall into the framework of learning from Positive and Unlabeled data. In this paper, we stress the different drawbacks of the current family of softmax losses and sampling schemes when applied in a Positive and Unlabeled learning setup. We propose both a Relaxed Softmax loss (RS) and a new negative sampling scheme based on Boltzmann formulation. We show that the new training objective is better suited for the tasks of density estimation, item similarity and next-event prediction by driving uplifts in performance on textual and recommendation datasets against classical softmax.
###### Tweets
arxiv_org: Relaxed Softmax for learning from Positive and Unlabeled data. https://t.co/hqt7lJwSG8 https://t.co/rBaR0bfd6q
BrundageBot: Relaxed Softmax for learning from Positive and Unlabeled data. Ugo Tanielian and Flavian Vasile https://t.co/54GDoHiLvB
arxivml: "Relaxed Softmax for learning from Positive and Unlabeled data", Ugo Tanielian, Flavian Vasile https://t.co/R5QJwKDj0R
arxiv_cs_LG: Relaxed Softmax for learning from Positive and Unlabeled data. Ugo Tanielian and Flavian Vasile https://t.co/PZdn5X5AKj
StatsPapers: Relaxed Softmax for learning from Positive and Unlabeled data. https://t.co/UhevoW5sOm
arxiv_cscl: Relaxed Softmax for learning from Positive and Unlabeled data https://t.co/itM4losNsT
treasured_write: RT @BrundageBot: Relaxed Softmax for learning from Positive and Unlabeled data. Ugo Tanielian and Flavian Vasile https://t.co/54GDoHiLvB
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #5. Properties of Laplacian Pyramids for Extension and Denoising
###### William Leeb
We analyze the Laplacian pyramids algorithm of Rabin and Coifman for extending and denoising a function sampled on a discrete set of points. We provide mild conditions under which the algorithm converges, and prove stability bounds on the extended function. We also consider the iterative application of truncated Laplacian pyramids kernels for denoising signals by non-local means.
###### Tweets
arxiv_org: Properties of Laplacian Pyramids for Extension and Denoising. https://t.co/WShqSkMQkZ https://t.co/9QQCAQEEdr
arxivml: "Properties of Laplacian Pyramids for Extension and Denoising", William Leeb https://t.co/cnLERscw6J
arxiv_cs_LG: Properties of Laplacian Pyramids for Extension and Denoising. William Leeb https://t.co/Xhfj1jVhM4
StatsPapers: Properties of Laplacian Pyramids for Extension and Denoising. https://t.co/KPW0E94YmQ
udmrzn: RT @arxiv_org: Properties of Laplacian Pyramids for Extension and Denoising. https://t.co/WShqSkMQkZ https://t.co/9QQCAQEEdr
###### Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0

##### #6. On Efficient Multilevel Clustering via Wasserstein Distances
###### Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Dinh Phung
We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. We propose several variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. Consistency properties are established for the estimates of both local and global clusters. Finally, the experimental results with both synthetic and real data are presented to demonstrate the flexibility and scalability of the proposed approach.
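For readers unfamiliar with the metric, here is a minimal sketch of the Wasserstein-1 distance in the simplest setting: two empirical measures on the real line with the same number of equally weighted points, where the optimal transport plan matches sorted samples. The paper works with more general discrete measures and barycenters; the function name `wasserstein_1d` and the toy data are assumptions for illustration.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size, uniformly weighted
    samples on the real line: mean absolute gap between sorted points."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


# Shifting every atom by 1 moves the measure a distance of exactly 1.
print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))  # 1.0
```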
###### Tweets
StatsPapers: On Efficient Multilevel Clustering via Wasserstein Distances. https://t.co/F4vZehw8CR
###### Other stats
Sample Sizes: None.
Authors: 7
Total Words: 13997
Unique Words: 2823

##### #7. Value function estimation in Markov reward processes: Instance-dependent $\ell_\infty$-bounds for policy evaluation
###### Ashwin Pananjady, Martin J. Wainwright
Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon, discounted MRP in the $\ell_\infty$-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated based on the observed data. We show that these approaches are minimax-optimal up to constant factors over natural sub-classes of MRPs. Our analysis makes use of a...
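The plug-in approach mentioned above can be sketched as follows: estimate the transition matrix and reward vector from samples, then solve the Bellman equation $V = \hat r + \gamma \hat P V$ for the estimated model. This is a minimal illustration, not the paper's estimator or its robust variant. The data format `samples[s] = [(reward, next_state), ...]`, the function name `plug_in_value`, and the use of fixed-point iteration instead of a direct linear solve are assumptions.

```python
def plug_in_value(samples, gamma, iters=2000):
    """Plug-in value estimate for a discounted Markov reward process.

    samples[s] is a non-empty list of (reward, next_state) draws from state s,
    mimicking a synchronous sampling model (assumed data format).
    """
    n = len(samples)
    p_hat = [[0.0] * n for _ in range(n)]  # empirical transition matrix
    r_hat = [0.0] * n                      # empirical mean rewards
    for s, draws in enumerate(samples):
        for reward, nxt in draws:
            p_hat[s][nxt] += 1.0 / len(draws)
            r_hat[s] += reward / len(draws)

    # Fixed-point iteration V <- r_hat + gamma * P_hat V; contracts for gamma < 1.
    v = [0.0] * n
    for _ in range(iters):
        v = [
            r_hat[s] + gamma * sum(p_hat[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v


# Toy chain: state 0 pays reward 1 and moves to state 1, which is absorbing
# with reward 0.
samples = [
    [(1.0, 1), (1.0, 1)],
    [(0.0, 1), (0.0, 1)],
]
print(plug_in_value(samples, gamma=0.5))
```

Errors in such an estimate would be measured in the $\ell_\infty$-norm, that is, by the largest absolute error over states.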
###### Tweets
StatsPapers: Value function estimation in Markov reward processes: Instance-dependent $\ell_\infty$-bounds for policy evaluation. https://t.co/zYV0TQzvW8
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #8. Stacking Models for Nearly Optimal Link Prediction in Complex Networks
###### Amir Ghasemian, Homa Hosseinmardi, Aram Galstyan, Edoardo M. Airoldi, Aaron Clauset
Most real-world networks are incompletely observed. Algorithms that can accurately predict which links are missing can dramatically speed up the collection of network data and improve the validity of network models. Many algorithms now exist for predicting missing links, given a partially observed network, but it has remained unknown whether a single best predictor exists, how link predictability varies across methods and networks from different domains, and how close to optimality current methods are. We answer these questions by systematically evaluating 203 individual link predictor algorithms, representing three popular families of methods, applied to a large corpus of 548 structurally diverse networks from six scientific domains. We first show that individual algorithms exhibit a broad diversity of prediction errors, such that no one predictor or family is best, or worst, across all realistic inputs. We then exploit this diversity via meta-learning to construct a series of "stacked" models that combine predictors into a single...
###### Tweets
alexvespi: Stacking Models for Nearly Optimal Link Prediction in Complex Networks “Applied to a broad range of synthetic networks, for which we may analytically calculate optimal performance, these stacked models achieve optimal or nearly optimal levels of accuracy” https://t.co/hLCtygSOFx https://t.co/VOhwtXIO4F
aaronclauset: Excited to share a new preprint "Stacking models for nearly optimal link prediction in complex networks," led by @Amir_Ghasemian and @HomaHosseinmar1, with @aram_galstyan and @eairoldi: https://t.co/iCxftB6xjF Here’s a little summary: 1/7 https://t.co/k0VVdsV4ce
net_science: Stacking Models for Nearly Optimal Link Prediction in Complex Networks. (arXiv:1909.07578v1 [https://t.co/E3LUKJpMju]) https://t.co/T1CFh8xujP
BrundageBot: Stacking Models for Nearly Optimal Link Prediction in Complex Networks. Amir Ghasemian, Homa Hosseinmardi, Aram Galstyan, Edoardo M. Airoldi, and Aaron Clauset https://t.co/tFGrLqHQwv
arxiv_cs_LG: Stacking Models for Nearly Optimal Link Prediction in Complex Networks. Amir Ghasemian, Homa Hosseinmardi, Aram Galstyan, Edoardo M. Airoldi, and Aaron Clauset https://t.co/WOKu8OXKCf
###### Github

This page is a companion for our paper on optimal link prediction, written by Amir Ghasemian, Homa Hosseinmardi, Aram Galstyan, Edoardo M. Airoldi, and Aaron Clauset. (arXiv:1909.07578)

User: Aghasemian
Language: Python
Stargazers: 1
Subscribers: 2
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes: None.
Authors: 5
Total Words: 19905
Unique Words: 3619

##### #9. Compositional uncertainty in deep Gaussian processes
###### Ivan Ustyuzhaninov, Ieva Kazlauskaite, Markus Kaiser, Erik Bodin, Neill D. F. Campbell, Carl Henrik Ek
Gaussian processes (GPs) are nonparametric priors over functions, and fitting a GP to the data implies computing the posterior distribution of the functions consistent with the observed data. Similarly, deep Gaussian processes (DGPs) [Damianou:2013] should allow us to compute the posterior distribution of compositions of multiple functions giving rise to the observations. However, exact Bayesian inference is usually intractable for DGPs, motivating the use of various approximations. We show that the simplifying assumptions for a common type of variational inference approximation imply that all but one layer of a DGP collapse to a deterministic transformation. We argue that such an inference scheme is suboptimal, not taking advantage of the potential of the model to discover the compositional structure in the data, and propose possible modifications addressing this issue.
###### Tweets
arxiv_in_review: #NeurIPS2019 Compositional uncertainty in deep Gaussian processes. (arXiv:1909.07698v1 [stat.ML]) https://t.co/2VKltwnnBg
arxiv_cs_LG: Compositional uncertainty in deep Gaussian processes. Ivan Ustyuzhaninov, Ieva Kazlauskaite, Markus Kaiser, Erik Bodin, Neill D. F. Campbell, and Carl Henrik Ek https://t.co/ySR7x8J1dM
###### Other stats
Sample Sizes: None.
Authors: 6
Total Words: 2848
Unique Words: 882

##### #10. BLOCCS: Block Sparse Canonical Correlation Analysis With Application To Interpretable Omics Integration
###### Omid Shams Solari, Rojin Safavi, James B. Brown
We introduce Block Sparse Canonical Correlation Analysis which estimates multiple pairs of canonical directions (together a "block") at once, resulting in significantly improved orthogonality of the sparse directions which, we demonstrate, translates to more interpretable solutions. Our approach builds on the sparse CCA method of (Solari, Brown, and Bickel 2019) in that we also express the bi-convex objective of our block formulation as a concave minimization problem over an orthogonal k-frame in a unit Euclidean ball, which in turn, due to concavity of the objective, is shrunk to a Stiefel manifold, which is optimized via a gradient descent algorithm. Our simulations show that our method outperforms existing sCCA algorithms and implementations in terms of computational cost and stability, mainly due to the drastic shrinkage of our search space, and the correlation within and orthogonality between pairs of estimated canonical covariates. Finally, we apply our method, available as an R-package called BLOCCS, to multi-omic data on...
###### Tweets
arxiv_in_review: #AAAI2020 BLOCCS: Block Sparse Canonical Correlation Analysis With Application To Interpretable Omics Integration. (arXiv:1909.07944v1 [stat.ML]) https://t.co/f1lFrio8b8
arxiv_cs_LG: BLOCCS: Block Sparse Canonical Correlation Analysis With Application To Interpretable Omics Integration. Omid Shams Solari, Rojin Safavi, and James B. Brown https://t.co/4lRlXvF5WH
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 5862
Unique Words: 1852

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).


Tracking 192,915 papers.