Deep clustering has increasingly been demonstrating superiority over
conventional shallow clustering algorithms. Deep clustering algorithms usually
combine representation learning with deep neural networks to achieve this
performance, typically optimizing a clustering and non-clustering loss. In such
cases, an autoencoder is typically connected with a clustering network, and the
final clustering is jointly learned by both the autoencoder and clustering
network. Instead, we propose to learn an autoencoded embedding and then search
this further for the underlying manifold. For simplicity, we then cluster this
with a shallow clustering algorithm, rather than a deeper network. We study a
number of local and global manifold learning methods on both the raw data and
autoencoded embedding, concluding that UMAP in our framework is best able to
find the most clusterable manifold in the embedding, suggesting local manifold
learning on an autoencoded embedding is effective for discovering higher
quality clusters. We...

BrundageBot:
N2D:(Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding. Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, and Ian Craddock https://t.co/1AMhoVocOk

arxiv_cs_LG:
N2D:(Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding. Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, and Ian Craddock https://t.co/NBZrhujjj2

A deep clustering algorithm. Code to reproduce results for the paper N2D: (Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding.

Sample Sizes : None.

Authors: 4

Total Words: 6412

Unique Words: 1961

Many current applications of machine learning rely on defining new models to
extract more information from data. Deep Reinforcement Learning, whose most
common applications are video games such as Atari and Mario, has changed how
computers can learn by themselves using only the information, called rewards,
obtained from their actions. Many algorithms have been modeled and implemented
based on Deep Recurrent Q-Learning, proposed by DeepMind and used in AlphaZero
and Go. In this document, we propose Deep Recurrent Double Q-Learning, an
implementation of Deep Reinforcement Learning that uses Double Q-Learning
algorithms and recurrent networks such as LSTM and DRQN.
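
As a rough illustration of the Double Q-Learning idea this abstract builds on, here is a minimal tabular sketch. It is not the paper's recurrent network: it only shows the core rule, in which two value tables are kept and one table selects the greedy next action while the other evaluates it. The function name `double_q_update`, the tables `q_a` and `q_b`, the toy one-step environment, and the values of `alpha` and `gamma` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99, done=False, rng=random):
    """Apply one tabular Double Q-Learning update in place."""
    # Randomly pick which table is updated; the other one evaluates.
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a

    if done:
        target = reward
    else:
        # The table being updated chooses the greedy next action ...
        best = max(actions, key=lambda a: select[(next_state, a)])
        # ... and the other table supplies that action's value estimate.
        target = reward + gamma * evaluate[(next_state, best)]

    select[(state, action)] += alpha * (target - select[(state, action)])


q_a, q_b = defaultdict(float), defaultdict(float)
actions = [0, 1]
rng = random.Random(0)
for _ in range(200):
    # Hypothetical one-step episode: action 1 in state "s" pays reward 1.
    a = rng.choice(actions)
    double_q_update(q_a, q_b, "s", a, float(a), "end", actions,
                    done=True, rng=rng)
```

Separating action selection from action evaluation is what distinguishes this rule from plain Q-Learning, where a single table does both and tends to overestimate values. A deep variant would replace the two tables with two networks.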

BrundageBot:
Performing Deep Recurrent Double Q-Learning for Atari Games. Felipe Moreno-Vera https://t.co/DKWkfRGeKb

arxivml:
"Performing Deep Recurrent Double Q-Learning for Atari Games",
Felipe Moreno-Vera
https://t.co/pKsn8TOFfc

arxiv_cs_LG:
Performing Deep Recurrent Double Q-Learning for Atari Games. Felipe Moreno-Vera https://t.co/r97S4TB1QU

SciFi:
Performing Deep Recurrent Double Q-Learning for Atari Games. https://t.co/swdxOKCtwK

Sample Sizes : None.

Authors: 1

Total Words: 0

Unique Words: 0

The standard method of generating random weights and biases in feedforward
neural networks with random hidden nodes selects them both from the uniform
distribution over the same fixed interval. In this work, we show the drawbacks
of this approach and propose a new method of generating random parameters. This
method ensures the most nonlinear fragments of sigmoids, which are most useful
in modeling target function nonlinearity, are kept in the input hypercube. In
addition, we show how to generate activation functions with uniformly
distributed slope angles.
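
To make the drawback of the standard method concrete, the sketch below draws a weight `a` and a bias `b` from the same uniform interval for a one-dimensional sigmoid node `1 / (1 + exp(-(a*x + b)))`, and checks where the sigmoid's inflection point `x = -b/a` falls. Using the inflection point as a proxy for the centre of the node's nonlinear transition, the input interval `[0, 1]`, the parameter interval `[-1, 1]`, and the function names are assumptions of this example, not details taken from the paper.

```python
import random


def inflection_point(a, b):
    """Input x at which the sigmoid 1 / (1 + exp(-(a*x + b))) equals 0.5."""
    return -b / a


def fraction_inside(n_nodes, u=1.0, seed=0):
    """Share of random nodes whose inflection point lies in [0, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_nodes):
        a = rng.uniform(-u, u)  # weight, standard method
        b = rng.uniform(-u, u)  # bias, same interval as the weight
        if a != 0.0 and 0.0 <= inflection_point(a, b) <= 1.0:
            inside += 1
    return inside / n_nodes


share = fraction_inside(100_000)
print(f"inflection point inside [0, 1]: {share:.3f}")
```

With both parameters drawn from the same symmetric interval, the inflection point lands in `[0, 1]` only when `a` and `b` have opposite signs and `|b| <= |a|`, which happens with probability 1/4. Under these assumptions, roughly three quarters of the nodes have their transition centred outside the input interval.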

BrundageBot:
Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. Grzegorz Dudek https://t.co/rPZZmF7gN5

arxivml:
"Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Me…
https://t.co/I1VZcvWQsR

arxiv_cs_LG:
Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. Grzegorz Dudek https://t.co/IOF8e3VMdy

StatsPapers:
Generating Random Parameters in Feedforward Neural Networks with Random Hidden Nodes: Drawbacks of the Standard Method and How to Improve It. https://t.co/wLJJ43PelS

Sample Sizes : None.

Authors: 1

Total Words: 0

Unique Words: 0

Bandit algorithms have various applications in safety-critical systems, where
it is important to respect the system constraints that rely on the bandit's
unknown parameters at every round. In this paper, we formulate a linear
stochastic multi-armed bandit problem with safety constraints that depend
(linearly) on an unknown parameter vector. As such, the learner is unable to
identify all safe actions and must act conservatively in ensuring that her
actions satisfy the safety constraint at all rounds (at least with high
probability). For these bandits, we propose a new UCB-based algorithm called
Safe-LUCB, which includes necessary modifications to respect safety
constraints. The algorithm has two phases. During the pure exploration phase
the learner chooses her actions at random from a restricted set of safe actions
with the goal of learning a good approximation of the entire unknown safe set.
Once this goal is achieved, the algorithm begins a safe
exploration-exploitation phase where the learner gradually expands her
estimate of...

BrundageBot:
Linear Stochastic Bandits Under Safety Constraints. Sanae Amani, Mahnoosh Alizadeh, and Christos Thrampoulidis https://t.co/OrsHj9lGLa

arxiv_cs_LG:
Linear Stochastic Bandits Under Safety Constraints. Sanae Amani, Mahnoosh Alizadeh, and Christos Thrampoulidis https://t.co/LIhrmjP4l1

StatsPapers:
Linear Stochastic Bandits Under Safety Constraints. https://t.co/sQavKrokOZ

Sample Sizes : None.

Authors: 3

Total Words: 0

Unique Words: 0

It is well-known that overparametrized neural networks trained using
gradient-based methods quickly achieve small training error with appropriate
hyperparameter settings. Recent papers have proved this statement theoretically
for highly overparametrized networks under reasonable assumptions. The limiting
case when the network size approaches infinity has also been considered. These
results either assume that the activation function is ReLU or they crucially
depend on the minimum eigenvalue of a certain Gram matrix depending on the
data, random initialization and the activation function. In the latter case,
existing works only prove that this minimum eigenvalue is non-zero and do not
provide quantitative bounds. On the empirical side, a contemporary line of
investigations has proposed a number of alternative activation functions which
tend to perform better than ReLU at least in some settings but no clear
understanding has emerged. This state of affairs underscores the importance of
theoretically understanding the impact of...

BrundageBot:
Effect of Activation Functions on the Training of Overparametrized Neural Nets. Abhishek Panigrahi, Abhishek Shetty, and Navin Goyal https://t.co/xqJGlDM80s

arxiv_cs_LG:
Effect of Activation Functions on the Training of Overparametrized Neural Nets. Abhishek Panigrahi, Abhishek Shetty, and Navin Goyal https://t.co/4v80qhZ5aO

StatsPapers:
Effect of Activation Functions on the Training of Overparametrized Neural Nets. https://t.co/MM35h1ox4N

Sample Sizes : None.

Authors: 3

Total Words: 0

Unique Words: 0

One-shot neural architecture search features fast training of a supernet in a
single run. A pivotal issue for this weight-sharing approach is the lack of
scalability. A simple adjustment with an identity block renders a scalable
supernet, but it causes unstable training, which makes the subsequent model
ranking unreliable. In this paper, we introduce a linearly equivalent
transformation to soothe training turbulence, and prove that the transformed
path is identical to the original one in representational power. The overall
method is named SCARLET (SCAlable
supeRnet with Linearly Equivalent Transformation). We show through experiments
that linearly equivalent transformations can indeed harmonize the supernet
training. With an EfficientNet-like search space and a multi-objective
reinforced evolutionary backend, it generates a series of competitive models:
Scarlet-A achieves 76.9% Top-1 accuracy on ImageNet which outperforms
EfficientNet-B0 by a large margin; the shallower Scarlet-B exemplifies the
proposed...

BrundageBot:
ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search. Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu https://t.co/H7aFUMfAhE

arxiv_cs_LG:
ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search. Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu https://t.co/W4JRP717Ab

Sample Sizes : None.

Authors: 5

Total Words: 0

Unique Words: 0

This paper significantly improves on, and completes the validation of, the
approach proposed in "Application of Machine Learning to Construction Injury
Prediction" (Tixier et al. 2016 [1]). As in the original study, we use NLP to extract
fundamental attributes from raw incident reports and machine learning models
are trained to predict safety outcomes (here, these outcomes are injury
severity, injury type, bodypart impacted, and incident type). However, in this
study, safety outcomes were not extracted via NLP but are independent (human
annotations), eliminating any potential source of artificial correlation
between predictors and predictands. Results show that attributes are still
highly predictive, confirming the validity of the original study. Other
improvements brought by the current study include the use of (1) a much larger
dataset, (2) two new models (XGBoost and linear SVM), (3) model stacking, (4) a
more straightforward experimental setup with more appropriate performance
metrics, and (5) an analysis of per-category attribute...

BrundageBot:
AI Predicts Independent Construction Safety Outcomes from Universal Attributes. Henrietta Baker, Matthew R. Hallowell, and Antoine J. -P. Tixier https://t.co/hzU5rcE9zF

arxiv_cs_LG:
AI Predicts Independent Construction Safety Outcomes from Universal Attributes. Henrietta Baker, Matthew R. Hallowell, and Antoine J. -P. Tixier https://t.co/StKfhpgFD2

Sample Sizes : None.

Authors: 3

Total Words: 0

Unique Words: 0

As the size and complexity of models and datasets grow, so does the need for
communication-efficient variants of stochastic gradient descent that can be
deployed on clusters to perform model fitting in parallel. Alistarh et al.
(2017) describe two variants of data-parallel SGD that quantize and encode
gradients to lessen communication costs. For the first variant, QSGD, they
provide strong theoretical guarantees. For the second variant, which we call
QSGDinf, they demonstrate impressive empirical gains for distributed training
of large neural networks. Building on their work, we propose an alternative
scheme for quantizing gradients and show that it yields stronger theoretical
guarantees than exist for QSGD while matching the empirical performance of
QSGDinf.
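
The kind of stochastic gradient quantization discussed in this abstract can be sketched as follows. Each coordinate's magnitude is normalised by the vector's Euclidean norm and randomly rounded to one of two neighbouring levels in `[0, 1]`, with probabilities chosen so the result is unbiased. Uniformly spaced levels correspond to a QSGD-style scheme. The exponentially spaced `nonuniform` levels here are only a hypothetical stand-in for a nonuniform scheme, not the paper's exact construction, and the function name and level counts are likewise assumptions.

```python
import math
import random


def quantize(vec, levels, rng):
    """Stochastically round |v_i| / ||v|| to neighbouring levels (unbiased).

    `levels` must be sorted ascending, start at 0.0 and end at 1.0.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    out = []
    for x in vec:
        r = min(abs(x) / norm, 1.0)  # normalised magnitude in [0, 1]
        # Index of the first level that is >= r.
        hi_idx = next(i for i, lvl in enumerate(levels) if lvl >= r)
        if hi_idx == 0:
            q = levels[0]
        else:
            lo, hi = levels[hi_idx - 1], levels[hi_idx]
            p_up = (r - lo) / (hi - lo)  # chance of rounding up
            q = hi if rng.random() < p_up else lo
        out.append(math.copysign(norm * q, x))
    return out


rng = random.Random(0)
grad = [0.3, -1.2, 0.05, 0.7]
uniform = [i / 4 for i in range(5)]         # 0, 1/4, 1/2, 3/4, 1
nonuniform = [0.0, 0.125, 0.25, 0.5, 1.0]   # denser near zero

# Averaging many quantized copies should recover the original gradient.
trials = 20_000
mean = [0.0] * len(grad)
for _ in range(trials):
    q = quantize(grad, nonuniform, rng)
    mean = [m + v / trials for m, v in zip(mean, q)]
```

Because each coordinate takes one of a few levels plus a sign, only the norm and small integer codes need to be transmitted. Placing more levels near zero gives finer resolution to the small normalised magnitudes that are typical of high-dimensional gradients.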

BrundageBot:
NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization. Ali Ramezani-Kebrya, Fartash Faghri, and Daniel M. Roy https://t.co/O6bigks65T

arxivml:
"NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization",
Ali Ramezani-Kebrya,…
https://t.co/u7gAoOAsC7

arxiv_cs_LG:
NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization. Ali Ramezani-Kebrya, Fartash Faghri, and Daniel M. Roy https://t.co/JwM9jGI9EL

Sample Sizes : None.

Authors: 3

Total Words: 0

Unique Words: 0

Predictions and predictive knowledge have seen recent success in improving
not only robot control but also other applications ranging from industrial
process control to rehabilitation. A property that makes these predictive
approaches well suited for robotics is that they can be learned online and
incrementally through interaction with the environment. However, a remaining
challenge for many prediction-learning approaches is an appropriate choice of
prediction-learning parameters, especially parameters that control the
magnitude of a learning machine's updates to its predictions (the learning rate
or step size). To begin to address this challenge, we examine the use of online
step-size adaptation using a sensor-rich robotic arm. Our method of choice,
Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step
sizes on a feature level; importantly, TIDBD allows step-size tuning and
representation learning to occur at the same time. We show that TIDBD is a
practical alternative to classic Temporal-Difference...
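
To give a feel for feature-level step-size adaptation, the sketch below implements IDBD (Incremental Delta-Bar-Delta, Sutton 1992), the supervised-learning method that TIDBD extends to temporal-difference learning. It is not TIDBD itself: there are no eligibility traces or bootstrapped targets. The toy regression target, the meta step size `theta`, and the initial step size of 0.05 are assumptions made for this example.

```python
import math
import random


def idbd_step(w, beta, h, x, y, theta=0.01):
    """One IDBD update for a linear predictor; mutates w, beta and h."""
    delta = y - sum(wi * xi for wi, xi in zip(w, x))  # prediction error
    for i, xi in enumerate(x):
        # Meta step: adjust the log step size using the correlation of
        # the current update with the trace of recent updates.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])  # per-feature step size
        w[i] += alpha * delta * xi
        # Decay the trace, then add the newest weight change.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


rng = random.Random(0)
w, h = [0.0, 0.0], [0.0, 0.0]
beta = [math.log(0.05)] * 2  # initial step size 0.05 for every feature
for _ in range(2000):
    x = [rng.choice([-1.0, 1.0]), rng.choice([-1.0, 1.0])]
    y = 2.0 * x[0]  # hypothetical target: only the first feature matters
    idbd_step(w, beta, h, x, y)
```

Each feature carries its own step size `exp(beta[i])`, learned online alongside the weights, so no single global learning rate has to be tuned by hand.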

BrundageBot:
Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures. Johannes Günther, Nadia M. Ady, Alex Kearney, Michael R. Dawson, and Patrick M. Pilarski https://t.co/2itJl6wLve

SciFi:
Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures. https://t.co/WjFmGeywA3

Sample Sizes : None.

Authors: 5

Total Words: 0

Unique Words: 0

Multimodal language analysis is an emerging research area in natural language
processing that models language in a multimodal manner. It aims to understand
language from the text, visual, and acoustic modalities by modeling both
intra-modal and cross-modal interactions. BERT (Bidirectional Encoder
Representations from Transformers) provides strong contextual language
representations after training on large-scale unlabeled corpora. Fine-tuning
the vanilla BERT model has shown promising results in building state-of-the-art
models for diverse NLP tasks like question answering and language inference.
However, fine-tuning BERT in the presence of information from other modalities
remains an open research problem. In this paper, we inject multimodal
information within the input space of the BERT network for modeling multimodal
language. The proposed injection method allows BERT to reach a new state of the
art of $84.38\%$ binary accuracy on the CMU-MOSI dataset (multimodal sentiment
analysis) with a gap of 5.98 percent to the previous state...

BrundageBot:
M-BERT: Injecting Multimodal Information in the BERT Structure. Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, and Mohammed Ehsan Hoque https://t.co/oUZVeujqyC

StatsPapers:
M-BERT: Injecting Multimodal Information in the BERT Structure. https://t.co/Ta38W7H2kx

Sample Sizes : None.

Authors: 5

Total Words: 0

Unique Words: 0

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real time) based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

*Tracking 174,809 papers.*
