### Top 10 Arxiv Papers Today in Machine Learning

##### #1. N2D: (Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding
###### Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, Ian Craddock
Deep clustering has increasingly been demonstrating superiority over conventional shallow clustering algorithms. Deep clustering algorithms usually combine representation learning with deep neural networks to achieve this performance, typically optimizing a clustering and non-clustering loss. In such cases, an autoencoder is typically connected with a clustering network, and the final clustering is jointly learned by both the autoencoder and clustering network. Instead, we propose to learn an autoencoded embedding and then search this further for the underlying manifold. For simplicity, we then cluster this with a shallow clustering algorithm, rather than a deeper network. We study a number of local and global manifold learning methods on both the raw data and autoencoded embedding, concluding that UMAP in our framework is best able to find the most clusterable manifold in the embedding, suggesting local manifold learning on an autoencoded embedding is effective for discovering higher quality clusters. We...
###### Tweets
BrundageBot: N2D:(Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding. Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, and Ian Craddock https://t.co/1AMhoVocOk
arxiv_cs_LG: N2D:(Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding. Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, and Ian Craddock https://t.co/NBZrhujjj2
###### GitHub

A deep clustering algorithm. Code to reproduce results for the paper N2D: (Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding.

Repository: n2d
User: rymc
Language: Python
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 6412
Unique Words: 1961

##### #2. Performing Deep Recurrent Double Q-Learning for Atari Games
###### Felipe Moreno-Vera
Currently, many applications in Machine Learning are based on defining new models to extract more information from data. In this case, Deep Reinforcement Learning, whose most common application is in video games such as Atari and Mario, has had an impact on how computers can learn by themselves using only the information, called rewards, obtained from their actions. Many algorithms have been modeled and implemented based on Deep Recurrent Q-Learning, proposed by DeepMind and used in AlphaZero and Go. In this document, we propose Deep Recurrent Double Q-Learning, an implementation of Deep Reinforcement Learning that uses Double Q-Learning algorithms and recurrent networks such as LSTM and DRQN.
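To make the Double Q-Learning part of this abstract concrete, here is a minimal sketch of how a bootstrapped target is usually formed in that family of methods: one set of value estimates picks the next action and a second set evaluates it. The function names, the plain-list inputs, and the default discount are assumptions for illustration; the recurrent (LSTM) network the paper describes is omitted, and this is not the author's code.

```python
def q_learning_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard Q-learning target: the same estimates select and evaluate."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target for one transition (illustrative sketch).

    next_q_online: action values for the next state from the online estimator.
    next_q_target: action values for the next state from the target estimator.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Select the greedy action with the online estimates ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... but evaluate that action with the target estimates.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    online = [1.0, 2.0, 0.5]   # hypothetical values, online estimator
    target = [1.5, 0.8, 3.0]   # hypothetical values, target estimator
    print(q_learning_target(1.0, target, gamma=0.5))
    print(double_q_target(1.0, online, target, gamma=0.5))
```

Decoupling selection from evaluation is what keeps a single overestimated action value in one estimator from being copied straight into the target.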
###### Tweets
BrundageBot: Performing Deep Recurrent Double Q-Learning for Atari Games. Felipe Moreno-Vera https://t.co/DKWkfRGeKb
arxivml: "Performing Deep Recurrent Double Q-Learning for Atari Games", Felipe Moreno-Vera https://t.co/pKsn8TOFfc
arxiv_cs_LG: Performing Deep Recurrent Double Q-Learning for Atari Games. Felipe Moreno-Vera https://t.co/r97S4TB1QU
SciFi: Performing Deep Recurrent Double Q-Learning for Atari Games. https://t.co/swdxOKCtwK
###### Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0

##### #3. Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It
###### Grzegorz Dudek
The standard method of generating random weights and biases in feedforward neural networks with random hidden nodes selects them both from the uniform distribution over the same fixed interval. In this work, we show the drawbacks of this approach and propose a new method of generating random parameters. This method ensures the most nonlinear fragments of sigmoids, which are most useful in modeling target function nonlinearity, are kept in the input hypercube. In addition, we show how to generate activation functions with uniformly distributed slope angles.
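One way to see the kind of drawback this abstract points at is a small simulation of the standard method only. The sketch below assumes a single input scaled to [0, 1], a sigmoid node h(x) = 1 / (1 + exp(-(a*x + b))), and both a and b drawn from U(-1, 1); the interval, the one-dimensional setting, and the function name are illustrative assumptions, and the paper's proposed generator is not reproduced here. The sigmoid's inflection point, where it is steepest, sits at x0 = -b / a, and the code counts how often x0 falls outside the input interval.

```python
import random


def inflection_outside_fraction(n_nodes=100_000, bound=1.0, seed=0):
    """Fraction of random sigmoid nodes whose inflection point is outside [0, 1].

    Weights a and biases b are both drawn from U(-bound, bound), i.e. the
    'same fixed interval' setup described as the standard method.
    """
    rng = random.Random(seed)
    outside = 0
    for _ in range(n_nodes):
        a = rng.uniform(-bound, bound)
        b = rng.uniform(-bound, bound)
        if a == 0.0:
            # A zero weight gives a constant node with no inflection point.
            outside += 1
            continue
        x0 = -b / a  # where a * x + b == 0
        if not (0.0 <= x0 <= 1.0):
            outside += 1
    return outside / n_nodes


if __name__ == "__main__":
    print(inflection_outside_fraction())
```

Under these assumptions x0 lands in [0, 1] only when a and b have opposite signs and |b| <= |a|, which happens with probability 1/4, so roughly three quarters of the nodes place their steepest region outside the data range.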
###### Tweets
BrundageBot: Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. Grzegorz Dudek https://t.co/rPZZmF7gN5
arxivml: "Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Me… https://t.co/I1VZcvWQsR
arxiv_cs_LG: Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. Grzegorz Dudek https://t.co/IOF8e3VMdy
StatsPapers: Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. https://t.co/wLJJ43PelS
###### Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0

##### #4. Linear Stochastic Bandits Under Safety Constraints
###### Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
Bandit algorithms have various applications in safety-critical systems, where it is important to respect the system constraints that rely on the bandit's unknown parameters at every round. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe actions and must act conservatively in ensuring that her actions satisfy the safety constraint at all rounds (at least with high probability). For these bandits, we propose a new UCB-based algorithm called Safe-LUCB, which includes necessary modifications to respect safety constraints. The algorithm has two phases. During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe set. Once this goal is achieved, the algorithm begins a safe exploration-exploitation phase where the learner gradually expands her estimate of...
###### Tweets
BrundageBot: Linear Stochastic Bandits Under Safety Constraints. Sanae Amani, Mahnoosh Alizadeh, and Christos Thrampoulidis https://t.co/OrsHj9lGLa
arxiv_cs_LG: Linear Stochastic Bandits Under Safety Constraints. Sanae Amani, Mahnoosh Alizadeh, and Christos Thrampoulidis https://t.co/LIhrmjP4l1
StatsPapers: Linear Stochastic Bandits Under Safety Constraints. https://t.co/sQavKrokOZ
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #5. Effect of Activation Functions on the Training of Overparametrized Neural Nets
###### Abhishek Panigrahi, Abhishek Shetty, Navin Goyal
It is well-known that overparametrized neural networks trained using gradient-based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized networks under reasonable assumptions. The limiting case when the network size approaches infinity has also been considered. These results either assume that the activation function is ReLU or they crucially depend on the minimum eigenvalue of a certain Gram matrix depending on the data, random initialization and the activation function. In the latter case, existing works only prove that this minimum eigenvalue is non-zero and do not provide quantitative bounds. On the empirical side, a contemporary line of investigations has proposed a number of alternative activation functions which tend to perform better than ReLU at least in some settings but no clear understanding has emerged. This state of affairs underscores the importance of theoretically understanding the impact of...
###### Tweets
BrundageBot: Effect of Activation Functions on the Training of Overparametrized Neural Nets. Abhishek Panigrahi, Abhishek Shetty, and Navin Goyal https://t.co/xqJGlDM80s
arxiv_cs_LG: Effect of Activation Functions on the Training of Overparametrized Neural Nets. Abhishek Panigrahi, Abhishek Shetty, and Navin Goyal https://t.co/4v80qhZ5aO
StatsPapers: Effect of Activation Functions on the Training of Overparametrized Neural Nets. https://t.co/MM35h1ox4N
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #6. ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search
###### Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, Ruijun Xu
One-shot neural architecture search features fast training of a supernet in a single run. A pivotal issue for this weight-sharing approach is the lack of scalability. A simple adjustment with an identity block renders a scalable supernet, but it causes unstable training, which makes the subsequent model ranking unreliable. In this paper, we introduce a linearly equivalent transformation to soothe training turbulence, and prove that the transformed path has the same representational power as the original one. The overall method is named SCARLET (SCAlable supeRnet with Linearly Equivalent Transformation). We show through experiments that linearly equivalent transformations can indeed harmonize the supernet training. With an EfficientNet-like search space and a multi-objective reinforced evolutionary backend, it generates a series of competitive models: Scarlet-A achieves 76.9% Top-1 accuracy on ImageNet which outperforms EfficientNet-B0 by a large margin; the shallower Scarlet-B exemplifies the proposed...
###### Tweets
BrundageBot: ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search. Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu https://t.co/H7aFUMfAhE
arxiv_cs_LG: ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search. Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu https://t.co/W4JRP717Ab
###### Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

##### #7. AI Predicts Independent Construction Safety Outcomes from Universal Attributes
###### Henrietta Baker, Matthew R. Hallowell, Antoine J.-P. Tixier
This paper significantly improves on, and completes the validation of, the approach proposed in "Application of Machine Learning to Construction Injury Prediction" (Tixier et al. 2016 [1]). As in the original study, we use NLP to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes (here, these outcomes are injury severity, injury type, body part impacted, and incident type). However, in this study, safety outcomes were not extracted via NLP but are independent (human annotations), eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original study. Other improvements brought by the current study include the use of (1) a much larger dataset, (2) two new models (XGBoost and linear SVM), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute...
###### Tweets
BrundageBot: AI Predicts Independent Construction Safety Outcomes from Universal Attributes. Henrietta Baker, Matthew R. Hallowell, and Antoine J. -P. Tixier https://t.co/hzU5rcE9zF
arxiv_cs_LG: AI Predicts Independent Construction Safety Outcomes from Universal Attributes. Henrietta Baker, Matthew R. Hallowell, and Antoine J. -P. Tixier https://t.co/StKfhpgFD2
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #8. NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization
###### Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel. Alistarh et al. (2017) describe two variants of data-parallel SGD that quantize and encode gradients to lessen communication costs. For the first variant, QSGD, they provide strong theoretical guarantees. For the second variant, which we call QSGDinf, they demonstrate impressive empirical gains for distributed training of large neural networks. Building on their work, we propose an alternative scheme for quantizing gradients and show that it yields stronger theoretical guarantees than exist for QSGD while matching the empirical performance of QSGDinf.
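The gradient quantization idea in this abstract can be sketched with a small stochastic-rounding routine: each coordinate is normalized by the vector's Euclidean norm, rounded at random to one of the two nearest levels in [0, 1] so that its expectation is unchanged, and rescaled. The function names and the exponentially spaced level set 0, 2^-s, ..., 1/2, 1 are assumptions chosen to illustrate nonuniform levels; the encoding step and the paper's exact scheme are not reproduced.

```python
import bisect
import math
import random


def uniform_levels(s):
    """s + 1 evenly spaced levels: 0, 1/s, ..., 1."""
    return [i / s for i in range(s + 1)]


def exponential_levels(s):
    """Nonuniform levels 0, 2^-s, ..., 1/2, 1 (an assumed spacing)."""
    return [0.0] + [2.0 ** -j for j in range(s, -1, -1)]


def stochastic_quantize(vector, levels, rng):
    """Unbiased stochastic rounding of |v_i| / ||v|| onto sorted levels.

    levels must be sorted, start at 0.0, and end at 1.0.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return [0.0] * len(vector)
    quantized = []
    for v in vector:
        r = min(abs(v) / norm, 1.0)          # normalized magnitude in [0, 1]
        i = bisect.bisect_left(levels, r)    # first level >= r
        hi = levels[i]
        if hi == r:
            q = r                            # already on a level
        else:
            lo = levels[i - 1]
            # Round up with probability proportional to the distance from lo,
            # so that the expected value of q equals r.
            q = hi if rng.random() < (r - lo) / (hi - lo) else lo
        quantized.append(math.copysign(norm * q, v))
    return quantized


if __name__ == "__main__":
    rng = random.Random(0)
    gradient = [0.3, -0.7, 0.2]  # hypothetical gradient vector
    print(stochastic_quantize(gradient, uniform_levels(4), rng))
    print(stochastic_quantize(gradient, exponential_levels(3), rng))
```

Because normalized gradient coordinates tend to be small, placing more levels near zero is one plausible motivation for a nonuniform grid; the routine itself works with any sorted level set that spans [0, 1].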
###### Tweets
BrundageBot: NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization. Ali Ramezani-Kebrya, Fartash Faghri, and Daniel M. Roy https://t.co/O6bigks65T
arxivml: "NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization", Ali Ramezani-Kebrya,… https://t.co/u7gAoOAsC7
arxiv_cs_LG: NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization. Ali Ramezani-Kebrya, Fartash Faghri, and Daniel M. Roy https://t.co/JwM9jGI9EL
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #9. Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures
###### Johannes Günther, Nadia M. Ady, Alex Kearney, Michael R. Dawson, Patrick M. Pilarski
Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rate or step size). To begin to address this challenge, we examine the use of online step-size adaptation using a sensor-rich robotic arm. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. We show that TIDBD is a practical alternative to classic Temporal-Difference...
###### Tweets
BrundageBot: Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures. Johannes Günther, Nadia M. Ady, Alex Kearney, Michael R. Dawson, and Patrick M. Pilarski https://t.co/2itJl6wLve
SciFi: Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures. https://t.co/WjFmGeywA3
###### Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

##### #10. M-BERT: Injecting Multimodal Information in the BERT Structure
###### Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, Mohammed Ehsan Hoque
Multimodal language analysis is an emerging research area in natural language processing that models language in a multimodal manner. It aims to understand language from the modalities of text, visual, and acoustic by modeling both intra-modal and cross-modal interactions. BERT (Bidirectional Encoder Representations from Transformers) provides strong contextual language representations after training on large-scale unlabeled corpora. Fine-tuning the vanilla BERT model has shown promising results in building state-of-the-art models for diverse NLP tasks like question answering and language inference. However, fine-tuning BERT in the presence of information from other modalities remains an open research problem. In this paper, we inject multimodal information within the input space of the BERT network for modeling multimodal language. The proposed injection method allows BERT to reach a new state of the art of 84.38% binary accuracy on the CMU-MOSI dataset (multimodal sentiment analysis) with a gap of 5.98 percent to the previous state...
###### Tweets
BrundageBot: M-BERT: Injecting Multimodal Information in the BERT Structure. Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, and Mohammed Ehsan Hoque https://t.co/oUZVeujqyC
StatsPapers: M-BERT: Injecting Multimodal Information in the BERT Structure. https://t.co/Ta38W7H2kx
###### Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 174,809 papers.
