Top 8 arXiv Papers Today in Audio and Speech Processing


2.047 Mikeys
#1. Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen
Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works use linear interpolation or a fusion network to integrate external language models. However, these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge-distillation-based training approach for integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the training of the sequence-to-sequence model. Thus, the language model plays the role of the teacher. This approach adds no external component to the sequence-to-sequence model at test time, and it can also be combined with shallow fusion during decoding. Experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, a relative reduction of...
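A minimal sketch of the general distillation idea, assuming the common form in which each decoder step's loss interpolates cross-entropy on the ground-truth token with cross-entropy against the teacher language model's distribution. The function names and the interpolation weight `weight` are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, target_index, teacher_probs, weight=0.5):
    """Loss for one decoder step (hypothetical helper).

    Interpolates cross-entropy on the ground-truth token (hard label) with
    cross-entropy against the teacher language model's distribution (soft labels).
    `weight` controls how much the teacher contributes.
    """
    student_probs = softmax(student_logits)
    hard_ce = -math.log(student_probs[target_index])
    soft_ce = -sum(t * math.log(p) for t, p in zip(teacher_probs, student_probs))
    return (1.0 - weight) * hard_ce + weight * soft_ce


# Toy 4-symbol vocabulary: the teacher LM favors symbol 3.
loss = distillation_loss([0.5, 0.1, -0.2, 1.0], target_index=3,
                         teacher_probs=[0.1, 0.1, 0.1, 0.7], weight=0.3)
```

In a full system, this per-step loss would be summed over the output sequence, and `teacher_probs` would come from the RNN language model conditioned on the preceding tokens. Because the teacher is only used to produce training targets, nothing extra runs at test time.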
Figures
None.
Tweets
BrundageBot: Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition. Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, and Zhengqi Wen https://t.co/Lju71uZ1Hh
arxiv_cscl: Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition https://t.co/eLhrdSSSR2
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

2.038 Mikeys
#2. Effective Incorporation of Speaker Information in Utterance Encoding in Dialog
Tianyu Zhao, Tatsuya Kawahara
In dialog studies, we often encode a dialog using a hierarchical encoder where each utterance is converted into an utterance vector, and then a sequence of utterance vectors is converted into a dialog vector. Since knowing who produced which utterance is essential to understanding a dialog, conventional methods tried integrating speaker labels into utterance vectors. We found this approach problematic in cases where speaker annotations are inconsistent across dialogs. A relative speaker modeling method is proposed to address the problem. Experimental evaluations on dialog act recognition and response generation show that the proposed method yields superior and more consistent performance.
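One simple reading of relative speaker modeling, shown here as an illustrative assumption rather than the authors' formulation: replace absolute speaker IDs with flags saying whether each utterance shares a speaker with a reference utterance (by default the last one). The helper `relative_speaker_labels` is hypothetical.

```python
def relative_speaker_labels(speakers, reference_index=-1):
    """Hypothetical helper: 1 if an utterance's speaker matches the reference
    utterance's speaker, else 0. Absolute label names are discarded."""
    reference = speakers[reference_index]
    return [1 if s == reference else 0 for s in speakers]


# Two dialogs annotated with different naming conventions.
dialog_a = ["A", "B", "A", "B"]
dialog_b = ["agent", "caller", "agent", "caller"]
print(relative_speaker_labels(dialog_a))  # [0, 1, 0, 1]
print(relative_speaker_labels(dialog_b))  # [0, 1, 0, 1]
```

Both dialogs map to the same relative labels even though their absolute annotations disagree, which is the kind of inconsistency the abstract describes.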
Figures
None.
Tweets
BrundageBot: Effective Incorporation of Speaker Information in Utterance Encoding in Dialog. Tianyu Zhao and Tatsuya Kawahara https://t.co/djf8R6GH9G
arxiv_in_review: #acl2019nlp Effective Incorporation of Speaker Information in Utterance Encoding in Dialog. (arXiv:1907.05599v1 [eess.AS]) https://t.co/m7HnLF6uqI
arxivml: "Effective Incorporation of Speaker Information in Utterance Encoding in Dialog", Tianyu Zhao, Tatsuya Kawahara https://t.co/SdL8jbtYmb
arxiv_cs_LG: Effective Incorporation of Speaker Information in Utterance Encoding in Dialog. Tianyu Zhao and Tatsuya Kawahara https://t.co/5S2BiRbhNF
Memoirs: Effective Incorporation of Speaker Information in Utterance Encoding in Dialog. https://t.co/OP1iQftAMk
arxiv_cscl: Effective Incorporation of Speaker Information in Utterance Encoding in Dialog https://t.co/UiG4CDu81P
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 5166
Unique Words: 1671

2.036 Mikeys
#3. Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors
Nooshin Maghsoodi, Hossein Sameti, Hossein Zeinali, Themos Stafylakis
In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states, and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-specific i-vector extractors on top of each HMM and extract well-localized i-vectors, each modelling only the phonetic content corresponding to a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the i-vector estimates. Experiments on RSR2015 part III show that the proposed method attains 1.52% and 1.77% Equal Error Rate (EER) for male and female speakers, respectively, outperforming state-of-the-art methods such as x-vectors trained on vast amounts of data. Furthermore, these results are attained by a single system trained entirely on RSR2015,...
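Results here are reported as Equal Error Rate (EER), the operating point where the false-rejection and false-acceptance rates are equal. A minimal sketch of estimating it from target and non-target trial scores by sweeping thresholds, assuming higher scores mean "same speaker"; `equal_error_rate` is a hypothetical helper, and evaluation toolkits typically interpolate the ROC curve more precisely.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by trying every observed score as a threshold.

    A trial is accepted when its score >= threshold. Returns the mean of the
    false-rejection and false-acceptance rates at the threshold where the
    two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(target_scores + nontarget_scores):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, eer = gap, (frr + far) / 2
    return eer


targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25
```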
Figures
Tweets
arxiv_cscl: Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors https://t.co/OIikOkBv3a
arxiv_cscl: Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors https://t.co/OIikOkjTEA
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 9357
Unique Words: 2724

2.015 Mikeys
#4. Teach an all-rounder with experts in different domains
Zhao You, Dan Su, Dong Yu
In many automatic speech recognition (ASR) tasks, an ideal model has to be applicable over multiple domains. In this paper, we propose to teach an all-rounder with experts in different domains. Concretely, we build a multi-domain acoustic model by applying the teacher-student training framework. First, for each domain, a teacher model (domain-dependent model) is trained by fine-tuning a multi-condition model on a domain-specific data subset. Then all these teacher models are used to teach one single student model simultaneously. We perform experiments on two predefined domain setups: one consists of domains with different speaking styles, and the other consists of near-field, far-field, and noisy far-field speech. Moreover, two types of models are examined: the deep feedforward sequential memory network (DFSMN) and long short-term memory (LSTM). Experimental results show that the model trained with this framework outperforms not only the multi-condition model but also the domain-dependent models. In particular, our training method provides up to 10.4% relative character error...
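A minimal sketch of the multi-teacher objective, assuming each training frame is tagged with its domain and supervised by the soft targets of that domain's teacher. `multi_teacher_loss` and the callable-based model interface are hypothetical stand-ins for real acoustic models.

```python
import math


def cross_entropy(targets, predicted):
    """Cross-entropy H(targets, predicted) over one frame's output posteriors."""
    return -sum(t * math.log(p) for t, p in zip(targets, predicted) if t > 0)


def multi_teacher_loss(batch, teachers, student):
    """Average loss where each frame is supervised by the teacher of its own domain.

    batch:    list of (features, domain) pairs
    teachers: dict mapping domain name -> callable returning soft targets
    student:  callable returning the student's posteriors
    """
    total = 0.0
    for features, domain in batch:
        soft_targets = teachers[domain](features)
        total += cross_entropy(soft_targets, student(features))
    return total / len(batch)


# Toy models with two output units (hypothetical values).
teachers = {"near": lambda x: [0.9, 0.1], "far": lambda x: [0.2, 0.8]}
student = lambda x: [0.5, 0.5]
print(multi_teacher_loss([(None, "near"), (None, "far")], teachers, student))
```

Since every teacher is fine-tuned from the same multi-condition model, all teachers share the student's output units, so pooled data from all domains can train the student in one pass.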
Figures
None.
Tweets
arxivml: "Teach an all-rounder with experts in different domains", Zhao You, Dan Su, Dong Yu https://t.co/6bh8IYkvef
arxiv_cscl: Teach an all-rounder with experts in different domains https://t.co/xv7e4sOtZi
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.012 Mikeys
#5. A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
Modern Automatic Speech Recognition (ASR) systems rely on distributed deep learning for fast training. To enable efficient distributed training, the training algorithm must be able to converge with a large mini-batch size. In this work, we discovered that Asynchronous Decentralized Parallel Stochastic Gradient Descent (ADPSGD) can work with a much larger batch size than the commonly used Synchronous SGD (SSGD) algorithm. On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3X as large as the one used in SSGD, thus enabling training at a much larger scale. Further, we proposed a Hierarchical-ADPSGD (H-ADPSGD) system in which learners on the same computing node construct a super learner via a fast allreduce implementation, and super learners run the ADPSGD algorithm among themselves. On a cluster of 64 NVIDIA V100 GPUs connected via a 100Gb/s Ethernet network, our system is able to train SWB-2000 to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test-set and a 13.2% WER...
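The core of decentralized parallel SGD is that learners average model weights with their neighbors instead of synchronizing through a global allreduce. The sketch below is a simplified, synchronous simulation of that idea (closer to synchronous decentralized SGD than to the asynchronous ADPSGD in the paper) on a toy scalar problem; the ring topology, uniform 1/3 mixing weights, and function name are assumptions for illustration.

```python
def decentralized_sgd(targets, lr=0.1, steps=200):
    """Simulate learners on a ring; learner i minimizes 0.5 * (w - targets[i])**2.

    Each step: a local gradient step on every learner, then each learner
    averages its weight with its two ring neighbors. Assumes >= 3 learners
    so the two neighbors are distinct.
    """
    n = len(targets)
    weights = [0.0] * n
    for _ in range(steps):
        # 1) Local SGD step: the gradient of 0.5 * (w - c)**2 is (w - c).
        stepped = [w - lr * (w - c) for w, c in zip(weights, targets)]
        # 2) Gossip averaging with the left and right ring neighbors.
        weights = [
            (stepped[i - 1] + stepped[i] + stepped[(i + 1) % n]) / 3.0
            for i in range(n)
        ]
    return weights


print(decentralized_sgd([1.0, 2.0, 3.0, 4.0]))  # all close to 2.5
```

The averaging step preserves the mean of the weights, so the mean converges to the global optimum (the mean of the targets) while individual learners stay close to it. In the hierarchical H-ADPSGD design, each entry of `weights` would correspond to a super learner whose value is first allreduced across the GPUs on one node.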
Figures
None.
Tweets
arxivml: "A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition", Wei Zhang, Xiaodong Cui, Ul… https://t.co/9G7JwOASzk
StatsPapers: A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition. https://t.co/Ad1VQMrqLX
tiagoooliveira: RT @StatsPapers: A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition. https://t.co/Ad1VQMrqLX
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 9
Total Words: 3833
Unique Words: 1378

2.009 Mikeys
#6. Voice Pathology Detection Using Deep Learning: a Preliminary Study
Pavol Harar, Jesus B. Alonso-Hernandez, Jiri Mekyska, Zoltan Galaz, Radim Burget, Zdenek Smekal
This paper describes a preliminary investigation of Voice Pathology Detection using Deep Neural Networks (DNN). We used voice recordings of the sustained vowel /a/ produced at normal pitch from the German Saarbruecken Voice Database (SVD). This corpus contains voice recordings and electroglottograph signals of more than 2,000 speakers. The idea behind this experiment is to use convolutional layers in combination with recurrent Long Short-Term Memory (LSTM) layers on the raw audio signal. Each recording was split into 64 ms Hamming-windowed segments with 30 ms overlap. Our trained model achieved 71.36% accuracy with 65.04% sensitivity and 77.67% specificity on 206 validation files, and 68.08% accuracy with 66.75% sensitivity and 77.89% specificity on 874 testing files. This is a promising result in favor of this approach because it is comparable to a similar, previously published experiment that used a different methodology. Further investigation is needed to achieve state-of-the-art results.
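The segmentation step can be written down directly: a 64 ms window with 30 ms overlap implies a 34 ms hop. A minimal sketch, assuming the signal is a list of samples and that trailing samples which do not fill a whole segment are dropped; the sample rate is a parameter because the abstract does not state it, and the 16 kHz in the usage line is only an example.

```python
import math


def hamming(length):
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def segment(signal, sample_rate, window_ms=64, overlap_ms=30):
    """Split a signal into Hamming-windowed segments of window_ms with overlap_ms overlap.

    Trailing samples that do not fill a whole segment are dropped (an assumption).
    """
    window_len = sample_rate * window_ms // 1000
    hop = sample_rate * (window_ms - overlap_ms) // 1000
    window = hamming(window_len)
    segments = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = signal[start:start + window_len]
        segments.append([x * w for x, w in zip(frame, window)])
    return segments


segs = segment([1.0] * 16000, 16000)
print(len(segs), len(segs[0]))  # 28 1024
```

At 16 kHz, one second of audio yields 28 segments of 1024 samples each, with a hop of 544 samples.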
Figures
None.
Tweets
Memoirs: Voice Pathology Detection Using Deep Learning: a Preliminary Study. https://t.co/oJ8NsWAWgE
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 3565
Unique Words: 1378

2.009 Mikeys
#7. Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks
Diego Perna, Andrea Tagarelli
Respiratory diseases are among the most common causes of severe illness and death worldwide. Prevention and early diagnosis are essential to limit or even reverse the trend that characterizes the diffusion of such diseases. In this regard, the development of advanced computational tools for the analysis of respiratory auscultation sounds can become a game changer for detecting disease-related anomalies, or diseases themselves. In this work, we propose a novel learning framework for respiratory auscultation sound data. Our approach combines state-of-the-art feature extraction techniques and advanced deep-neural-network architectures. Remarkably, to the best of our knowledge, we are the first to model a recurrent-neural-network based learning framework to support the clinician in detecting respiratory diseases, at either level of abnormal sounds or pathology classes. Results obtained on the ICBHI benchmark dataset show that our approach outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks, thus...
Figures
Tweets
arxivml: "Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks", Diego Perna, Andr… https://t.co/o4bBkfMnVS
Memoirs: Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks. https://t.co/ytTNisk0eZ
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 5288
Unique Words: 1944

2.007 Mikeys
#8. BUT VOiCES 2019 System Description
Hossein Zeinali, Pavel Matějka, Ladislav Mošner, Oldřich Plchot, Anna Silnova, Ondřej Novotný, Ján Profant, Ondřej Glembek, Lukáš Burget
This is a description of our effort in the VOiCES 2019 Speaker Recognition Challenge. All systems in the fixed condition are based on the x-vector paradigm with different features and DNN topologies. The single best system reaches 1.2% EER, and a fusion of 3 systems yields 1.0% EER, which is a 15% relative improvement. The open condition allowed us to use external data, which we did for the PLDA adaptation, achieving a relative improvement of slightly less than 10%. In the submission to the open condition, we used 3 x-vector systems and also one i-vector-based system.
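The abstract does not say how the three systems were fused. A common approach in speaker recognition is linear score-level fusion, where each trial's fused score is a weighted sum of the individual systems' scores plus a bias, with weights usually trained by logistic regression on held-out data. A minimal sketch under that assumption; `fuse_scores` and the example weights are hypothetical.

```python
def fuse_scores(system_scores, weights, bias=0.0):
    """Linear score-level fusion.

    system_scores[k][t] is system k's score for trial t;
    fused[t] = bias + sum_k weights[k] * system_scores[k][t].
    """
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    n_trials = len(system_scores[0])
    return [
        bias + sum(w * scores[t] for w, scores in zip(weights, system_scores))
        for t in range(n_trials)
    ]


# Three hypothetical systems scoring the same two trials.
fused = fuse_scores([[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]], [0.5, 0.25, 0.25])
print(fused)  # [1.25, 2.25]
```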
Figures
None.
Tweets
arxiv_cscl: BUT VOiCES 2019 System Description https://t.co/6uTKlrewpH
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 9
Total Words: 3179
Unique Words: 1359

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 158,360 papers.
