Top 10 arXiv Papers Today in Machine Learning


2.746 Mikeys
#1. N2D: (Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding
Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, Ian Craddock
Deep clustering has increasingly been demonstrating superiority over conventional shallow clustering algorithms. Deep clustering algorithms usually combine representation learning with deep neural networks to achieve this performance, typically optimizing a clustering and non-clustering loss. In such cases, an autoencoder is typically connected with a clustering network, and the final clustering is jointly learned by both the autoencoder and clustering network. Instead, we propose to learn an autoencoded embedding and then search this further for the underlying manifold. For simplicity, we then cluster this with a shallow clustering algorithm, rather than a deeper network. We study a number of local and global manifold learning methods on both the raw data and autoencoded embedding, concluding that UMAP in our framework is best able to find the most clusterable manifold in the embedding, suggesting local manifold learning on an autoencoded embedding is effective for discovering higher quality clusters. We...
Figures
Tweets
BrundageBot: N2D:(Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding. Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, and Ian Craddock https://t.co/1AMhoVocOk
arxiv_cs_LG: N2D:(Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding. Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, and Ian Craddock https://t.co/NBZrhujjj2
Github

A deep clustering algorithm. Code to reproduce results for the paper N2D: (Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding.

Repository: n2d
User: rymc
Language: Python
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 6412
Unique Words: 1961

2.701 Mikeys
#2. Performing Deep Recurrent Double Q-Learning for Atari Games
Felipe Moreno-Vera
Many current applications of machine learning are based on defining new models to extract more information from data. Deep reinforcement learning, whose most common application is video games such as Atari and Mario, has changed how computers can learn by themselves using only the information, called rewards, obtained from their actions. Many algorithms have been modeled and implemented based on Deep Recurrent Q-Learning, proposed by DeepMind and used in AlphaZero and Go. In this document, we propose Deep Recurrent Double Q-Learning, an implementation of deep reinforcement learning that uses Double Q-Learning algorithms and recurrent networks such as LSTM and DRQN.
Figures
None.
Tweets
BrundageBot: Performing Deep Recurrent Double Q-Learning for Atari Games. Felipe Moreno-Vera https://t.co/DKWkfRGeKb
arxivml: "Performing Deep Recurrent Double Q-Learning for Atari Games", Felipe Moreno-Vera https://t.co/pKsn8TOFfc
arxiv_cs_LG: Performing Deep Recurrent Double Q-Learning for Atari Games. Felipe Moreno-Vera https://t.co/r97S4TB1QU
SciFi: Performing Deep Recurrent Double Q-Learning for Atari Games. https://t.co/swdxOKCtwK
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0
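
The abstract of #2 names Double Q-Learning without showing its update rule. The following is a minimal tabular sketch of that rule only, not the paper's deep recurrent model: the 4-state chain environment, the random behavior policy, and the names `double_q_update` and `train` are hypothetical and chosen purely for illustration.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state, done,
                    alpha, gamma, rng):
    """One tabular Double Q-learning step on two value tables."""
    # Pick at random which table learns; the other one evaluates.
    if rng.random() < 0.5:
        learn, evaluate = q_a, q_b
    else:
        learn, evaluate = q_b, q_a
    if done:
        target = reward
    else:
        # The learning table selects the action, the other table scores it.
        n_actions = len(learn[next_state])
        best = max(range(n_actions), key=lambda a: learn[next_state][a])
        target = reward + gamma * evaluate[next_state][best]
    learn[state][action] += alpha * (target - learn[state][action])


def train(episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Toy chain (assumed): states 0..3, action 0 = left, action 1 = right."""
    rng = random.Random(seed)
    n_states, n_actions, goal = 4, 2, 3
    q_a = [[0.0] * n_actions for _ in range(n_states)]
    q_b = [[0.0] * n_actions for _ in range(n_states)]
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            action = rng.randrange(n_actions)  # purely random behavior policy
            if action == 1:
                next_state = min(state + 1, goal)
            else:
                next_state = max(state - 1, 0)
            done = next_state == goal
            reward = 1.0 if done else 0.0
            double_q_update(q_a, q_b, state, action, reward, next_state,
                            done, alpha, gamma, rng)
            if done:
                break
            state = next_state
    return q_a, q_b


q_a, q_b = train()
# Greedy action per non-terminal state, using the sum of both tables.
print([max((0, 1), key=lambda a: q_a[s][a] + q_b[s][a]) for s in range(3)])
```

Keeping action selection and action evaluation in separate tables is what distinguishes this update from plain Q-learning; a deep variant would presumably replace the tables with networks, which this sketch does not attempt.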

2.691 Mikeys
#3. Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It
Grzegorz Dudek
The standard method of generating random weights and biases in feedforward neural networks with random hidden nodes selects them both from the uniform distribution over the same fixed interval. In this work, we show the drawbacks of this approach and propose a new method of generating random parameters. This method ensures the most nonlinear fragments of sigmoids, which are most useful in modeling target function nonlinearity, are kept in the input hypercube. In addition, we show how to generate activation functions with uniformly distributed slope angles.
Figures
None.
Tweets
BrundageBot: Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. Grzegorz Dudek https://t.co/rPZZmF7gN5
arxivml: "Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Me… https://t.co/I1VZcvWQsR
arxiv_cs_LG: Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. Grzegorz Dudek https://t.co/IOF8e3VMdy
StatsPapers: Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. https://t.co/wLJJ43PelS
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0
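
To make the "standard method" described in the abstract of #3 concrete, here is a small sketch for a one-dimensional input. It draws each node's weight and bias from the same uniform interval and counts how many sigmoids have their inflection point inside the input range. The interval [-1, 1], the input range [0, 1], the use of the inflection point as a crude proxy for a sigmoid's nonlinear region, and both function names are assumptions of this sketch, not details taken from the paper.

```python
import random


def standard_random_params(n_nodes, low=-1.0, high=1.0, seed=0):
    """Draw (weight, bias) pairs for 1-D sigmoid nodes, both from U(low, high)."""
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high))
            for _ in range(n_nodes)]


def fraction_inflection_inside(params, x_min=0.0, x_max=1.0):
    """Fraction of nodes whose inflection point -b/a lies in [x_min, x_max]."""
    inside = 0
    for a, b in params:
        if a == 0.0:
            continue  # constant node: no inflection point
        x0 = -b / a  # sigmoid(a*x + b) is steepest where a*x + b = 0
        if x_min <= x0 <= x_max:
            inside += 1
    return inside / len(params)


params = standard_random_params(10_000)
print(f"share of nodes centered inside the input range: "
      f"{fraction_inflection_inside(params):.3f}")
```

Under these assumptions only about a quarter of the nodes end up centered inside the input range, which gives one informal reading of the drawback the abstract refers to.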

2.479 Mikeys
#4. Linear Stochastic Bandits Under Safety Constraints
Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
Bandit algorithms have various applications in safety-critical systems, where it is important to respect the system constraints that rely on the bandit's unknown parameters at every round. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe actions and must act conservatively in ensuring that her actions satisfy the safety constraint at all rounds (at least with high probability). For these bandits, we propose a new UCB-based algorithm called Safe-LUCB, which includes necessary modifications to respect safety constraints. The algorithm has two phases. During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe set. Once this goal is achieved, the algorithm begins a safe exploration-exploitation phase where the learner gradually expands their estimate of...
Figures
None.
Tweets
BrundageBot: Linear Stochastic Bandits Under Safety Constraints. Sanae Amani, Mahnoosh Alizadeh, and Christos Thrampoulidis https://t.co/OrsHj9lGLa
arxiv_cs_LG: Linear Stochastic Bandits Under Safety Constraints. Sanae Amani, Mahnoosh Alizadeh, and Christos Thrampoulidis https://t.co/LIhrmjP4l1
StatsPapers: Linear Stochastic Bandits Under Safety Constraints. https://t.co/sQavKrokOZ
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.479 Mikeys
#5. Effect of Activation Functions on the Training of Overparametrized Neural Nets
Abhishek Panigrahi, Abhishek Shetty, Navin Goyal
It is well-known that overparametrized neural networks trained using gradient-based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized networks under reasonable assumptions. The limiting case when the network size approaches infinity has also been considered. These results either assume that the activation function is ReLU or they crucially depend on the minimum eigenvalue of a certain Gram matrix depending on the data, random initialization and the activation function. In the latter case, existing works only prove that this minimum eigenvalue is non-zero and do not provide quantitative bounds. On the empirical side, a contemporary line of investigations has proposed a number of alternative activation functions which tend to perform better than ReLU at least in some settings but no clear understanding has emerged. This state of affairs underscores the importance of theoretically understanding the impact of...
Figures
None.
Tweets
BrundageBot: Effect of Activation Functions on the Training of Overparametrized Neural Nets. Abhishek Panigrahi, Abhishek Shetty, and Navin Goyal https://t.co/xqJGlDM80s
arxiv_cs_LG: Effect of Activation Functions on the Training of Overparametrized Neural Nets. Abhishek Panigrahi, Abhishek Shetty, and Navin Goyal https://t.co/4v80qhZ5aO
StatsPapers: Effect of Activation Functions on the Training of Overparametrized Neural Nets. https://t.co/MM35h1ox4N
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.302 Mikeys
#6. ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search
Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, Ruijun Xu
One-shot neural architecture search features fast training of a supernet in a single run. A pivotal issue for this weight-sharing approach is the lack of scalability. A simple adjustment with an identity block renders a scalable supernet, but it causes unstable training, which makes the subsequent model ranking unreliable. In this paper, we introduce linearly equivalent transformation to soothe training turbulence, providing a proof that such a transformed path is identical to the original one in representational power. The overall method is named SCARLET (SCAlable supeRnet with Linearly Equivalent Transformation). We show through experiments that linearly equivalent transformations can indeed harmonize the supernet training. With an EfficientNet-like search space and a multi-objective reinforced evolutionary backend, it generates a series of competitive models: Scarlet-A achieves 76.9% Top-1 accuracy on ImageNet which outperforms EfficientNet-B0 by a large margin; the shallower Scarlet-B exemplifies the proposed...
Figures
None.
Tweets
BrundageBot: ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search. Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu https://t.co/H7aFUMfAhE
arxiv_cs_LG: ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search. Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu https://t.co/W4JRP717Ab
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

2.302 Mikeys
#7. AI Predicts Independent Construction Safety Outcomes from Universal Attributes
Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier
This paper significantly improves on, and completes the validation of, the approach proposed in "Application of Machine Learning to Construction Injury Prediction" (Tixier et al. 2016 [1]). As in the original study, we use NLP to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes (here, these outcomes are injury severity, injury type, body part impacted, and incident type). However, in this study, safety outcomes were not extracted via NLP but are independent (human annotations), eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original study. Other improvements brought by the current study include the use of (1) a much larger dataset, (2) two new models (XGBoost and linear SVM), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute...
Figures
None.
Tweets
BrundageBot: AI Predicts Independent Construction Safety Outcomes from Universal Attributes. Henrietta Baker, Matthew R. Hallowell, and Antoine J. -P. Tixier https://t.co/hzU5rcE9zF
arxiv_cs_LG: AI Predicts Independent Construction Safety Outcomes from Universal Attributes. Henrietta Baker, Matthew R. Hallowell, and Antoine J. -P. Tixier https://t.co/StKfhpgFD2
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.302 Mikeys
#8. NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization
Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel. Alistarh et al. (2017) describe two variants of data-parallel SGD that quantize and encode gradients to lessen communication costs. For the first variant, QSGD, they provide strong theoretical guarantees. For the second variant, which we call QSGDinf, they demonstrate impressive empirical gains for distributed training of large neural networks. Building on their work, we propose an alternative scheme for quantizing gradients and show that it yields stronger theoretical guarantees than exist for QSGD while matching the empirical performance of QSGDinf.
Figures
None.
Tweets
BrundageBot: NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization. Ali Ramezani-Kebrya, Fartash Faghri, and Daniel M. Roy https://t.co/O6bigks65T
arxivml: "NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization", Ali Ramezani-Kebrya,… https://t.co/u7gAoOAsC7
arxiv_cs_LG: NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization. Ali Ramezani-Kebrya, Fartash Faghri, and Daniel M. Roy https://t.co/JwM9jGI9EL
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0
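
The abstract of #8 contrasts uniform and nonuniform gradient quantization. As a rough illustration of the general idea only, the sketch below stochastically rounds each normalized gradient coordinate to one of its two neighboring levels so that the result is unbiased in expectation. The specific grids (evenly spaced versus powers of 1/2), the function names, and the omission of any encoding step are assumptions of this sketch and should not be read as the paper's exact scheme.

```python
import math
import random


def make_levels(num_levels, uniform=True):
    """Return sorted quantization levels in [0, 1], always including 0 and 1."""
    if uniform:
        return [i / num_levels for i in range(num_levels + 1)]
    # Assumed nonuniform grid: 0 followed by powers of 1/2 up to 1.
    return [0.0] + [2.0 ** -(num_levels - 1 - i) for i in range(num_levels)]


def quantize(vec, levels, rng):
    """Stochastically round |x| / ||vec|| to a neighboring level (unbiased)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    out = []
    for x in vec:
        r = min(1.0, abs(x) / norm)  # clamp against floating-point overshoot
        hi_idx = next(i for i, level in enumerate(levels) if level >= r)
        if hi_idx == 0:
            out.append(0.0)  # r is exactly zero
            continue
        lo, hi = levels[hi_idx - 1], levels[hi_idx]
        p_up = (r - lo) / (hi - lo)  # chosen so that E[level] equals r
        level = hi if rng.random() < p_up else lo
        out.append(math.copysign(norm * level, x))
    return out


rng = random.Random(0)
grad = [3.0, -4.0]
print(quantize(grad, make_levels(3, uniform=True), rng))
print(quantize(grad, make_levels(3, uniform=False), rng))
```

A nonuniform grid of this kind places more levels near zero, where the normalized coordinates of a high-dimensional gradient tend to concentrate; whether that matches the paper's motivation in detail is not something this listing confirms.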

2.298 Mikeys
#9. Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures
Johannes Günther, Nadia M. Ady, Alex Kearney, Michael R. Dawson, Patrick M. Pilarski
Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rate or step size). To begin to address this challenge, we examine the use of online step-size adaptation using a sensor-rich robotic arm. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. We show that TIDBD is a practical alternative for classic Temporal-Difference...
Figures
None.
Tweets
BrundageBot: Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures. Johannes Günther, Nadia M. Ady, Alex Kearney, Michael R. Dawson, and Patrick M. Pilarski https://t.co/2itJl6wLve
SciFi: Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures. https://t.co/WjFmGeywA3
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

2.287 Mikeys
#10. M-BERT: Injecting Multimodal Information in the BERT Structure
Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, Mohammed Ehsan Hoque
Multimodal language analysis is an emerging research area in natural language processing that models language in a multimodal manner. It aims to understand language from the modalities of text, visual, and acoustic by modeling both intra-modal and cross-modal interactions. BERT (Bidirectional Encoder Representations from Transformers) provides strong contextual language representations after training on large-scale unlabeled corpora. Fine-tuning the vanilla BERT model has shown promising results in building state-of-the-art models for diverse NLP tasks like question answering and language inference. However, fine-tuning BERT in the presence of information from other modalities remains an open research problem. In this paper, we inject multimodal information within the input space of BERT network for modeling multimodal language. The proposed injection method allows BERT to reach a new state of the art of 84.38% binary accuracy on CMU-MOSI dataset (multimodal sentiment analysis) with a gap of 5.98 percent to the previous state...
Figures
None.
Tweets
BrundageBot: M-BERT: Injecting Multimodal Information in the BERT Structure. Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, and Mohammed Ehsan Hoque https://t.co/oUZVeujqyC
StatsPapers: M-BERT: Injecting Multimodal Information in the BERT Structure. https://t.co/Ta38W7H2kx
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 174,809 papers.
