Top 8 arXiv Papers Today in Sound


2.052 Mikeys
#1. Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise
Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert
We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise. In real scenarios, these distortion sources may occur simultaneously and reducing them implies combining the corresponding distortion-specific filters. As these filters interact with each other, they must be jointly optimized. We propose to model the target and residual signals after linear echo cancellation and dereverberation using a multichannel Gaussian modeling framework and to jointly represent their spectra by means of a neural network. We develop an iterative block-coordinate ascent algorithm to update all the filters. We evaluate our system on real recordings of acoustic echo, reverberation and noise acquired with a smart speaker in various situations. The proposed approach outperforms in terms of overall distortion a cascade of the individual approaches and a joint reduction approach which does not rely on a spectral model of the target and residual signals.
Figures
None.
Tweets
arxivml: "Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise", Guillaume Carbajal, Romain Seri… https://t.co/vlTKLD0I0l
arxiv_cs_LG: Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise. Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, and Eric Humbert https://t.co/CccPz9Tlbq
StatsPapers: Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise. https://t.co/duyW8AUrQb
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.008 Mikeys
#2. Improving Universal Sound Separation Using Sound Classification
Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis
Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic sources from an open domain, regardless of their class. In this paper, we utilize the semantic information learned by sound classifier networks trained on a vast amount of diverse sounds to improve universal sound separation. In particular, we show that semantic embeddings extracted from a sound classifier can be used to condition a separation network, providing it with useful additional information. This approach is especially useful in an iterative setup, where source estimates from an initial separation stage and their corresponding classifier-derived embeddings are fed to a second separation network. By performing a thorough...
Figures
Tweets
BrundageBot: Improving Universal Sound Separation Using Sound Classification. Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, and Daniel P. W. Ellis https://t.co/QybNf8A72x
StatsPapers: Improving Universal Sound Separation Using Sound Classification. https://t.co/vZZdYpamih
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 4265
Unique Words: 1491

2.001 Mikeys
#3. Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement
Zhong-Qiu Wang, Scott Wisdom, Kevin Wilson, John R. Hershey
This work investigates alternation between spectral separation using masking-based networks and spatial separation using multichannel beamforming. In this framework, the spectral separation is performed using a mask-based deep network. The result of mask-based separation is used, in turn, to estimate a spatial beamformer. The output of the beamformer is fed back into another mask-based separation network. We explore multiple ways of computing time-varying covariance matrices to improve beamforming, including factorizing the spatial covariance into a time-varying amplitude component and time-invariant spatial component. For the subsequent mask-based filtering, we consider different modes, including masking the noisy input, masking the beamformer output, and a hybrid approach combining both. Our best method first uses spectral separation, then spatial beamforming, and finally a spectral post-filter, and demonstrates an average improvement of 2.8 dB over baseline mask-based separation, across four different reverberant speech...
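Using the output of mask-based separation "to estimate a spatial beamformer" usually starts from a mask-weighted spatial covariance matrix, computed per frequency bin as the sum over frames of m_t x_t x_t^H divided by the sum of m_t. The sketch below shows only that time-invariant estimate for a single bin, in plain Python with complex numbers; it does not reproduce the paper's time-varying factorization or its beamformer, and `masked_covariance` is a hypothetical name.

```python
def masked_covariance(frames, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    frames: list of T snapshots, each a list of M complex STFT values (one per mic)
    mask:   list of T mask values in [0, 1] from the separation network
    Returns an M x M estimate of sum_t m_t x_t x_t^H / sum_t m_t.
    """
    num_mics = len(frames[0])
    total = sum(mask)
    if total == 0:
        raise ValueError("mask is all zeros; covariance is undefined")
    cov = [[0j] * num_mics for _ in range(num_mics)]
    for x, m in zip(frames, mask):
        # Accumulate the mask-weighted outer product x x^H.
        for i in range(num_mics):
            for j in range(num_mics):
                cov[i][j] += m * x[i] * x[j].conjugate()
    # Normalize by the total mask weight.
    return [[c / total for c in row] for row in cov]
```

Frames the mask attributes to the target dominate the estimate, so the resulting matrix can then feed a standard beamformer design such as MVDR.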
Figures
None.
Tweets
StatsPapers: Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement. https://t.co/OPL9QRnoWA
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.0 Mikeys
#4. Music theme recognition using CNN and self-attention
Manoj Sukhavasi, Sainath Adapa
We present an efficient architecture to detect moods/themes in music tracks on the autotagging-moodtheme subset of the MTG-Jamendo dataset. Our approach consists of two blocks: a CNN block based on the MobileNetV2 architecture, and a self-attention block from the Transformer architecture to capture long-term temporal characteristics. We show that our proposed model produces a significant improvement over the baseline model. Our model (team name: AMLAG) achieves 4th place on the PR-AUC-macro leaderboard in MediaEval 2019: Emotion and Theme Recognition in Music Using Jamendo.
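The self-attention block lets every time step of the CNN feature sequence attend to every other time step, which is how it captures long-term temporal structure. The plain-Python sketch below shows single-head scaled dot-product self-attention with identity query/key/value projections for brevity; a real Transformer block learns those projections and typically uses several heads, and the paper's exact configuration is not given in this abstract.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention (identity projections).

    frames: list of T vectors of dimension d, e.g. CNN features per time step.
    Returns T vectors, each a softmax-weighted average of all input frames.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in frames:
        # Similarity of this query frame to every key frame.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in frames]
        weights = softmax(scores)
        # Weighted sum of value frames, channel by channel.
        out.append([sum(w * v[c] for w, v in zip(weights, frames)) for c in range(d)])
    return out
```

The output keeps the sequence length, so it can be pooled over time afterwards to produce clip-level mood/theme tags.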
Figures
None.
Tweets
Memoirs: Music theme recognition using CNN and self-attention. https://t.co/X0EKxza016
Github

4th position solution to the MediaEval - The 2019 Emotion and Themes in Music using Jamendo

Repository: mediaeval-2019-moodtheme-detection
User: sainathadapa
Language: Jupyter Notebook
Stargazers: 5
Subscribers: 2
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 2108
Unique Words: 1032

1.998 Mikeys
#5. VOICe: A Sound Event Detection Dataset For Generalizable Domain Adaptation
Shayan Gharib, Konstantinos Drossos, Eemi Fagerlund, Tuomas Virtanen
The performance of sound event detection methods can degrade significantly when they are used in unseen conditions (e.g., different recording devices or ambient noise). Domain adaptation is a promising way to tackle this problem. In this paper, we present VOICe, the first dataset for the development and evaluation of domain adaptation methods for sound event detection. VOICe consists of mixtures with three different sound events ("baby crying", "glass breaking", and "gunshot"), which are superimposed on three different categories of acoustic scenes: vehicle, outdoors, and indoors. The mixtures are also offered without any background noise. VOICe is freely available online (https://doi.org/10.5281/zenodo.3514950). In addition, we evaluate the performance of an adversarial-training-based domain adaptation method on VOICe.
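Superimposing an event on a background scene is commonly done by scaling the background to a chosen event-to-background ratio (SNR) before adding the two signals. The abstract does not state how VOICe sets its mixing levels, so the SNR parameter and the `mix_at_snr` helper below are assumptions for illustration only; a zero-energy background simply returns the clean event, mirroring the no-background mixtures the dataset also offers.

```python
import math


def mix_at_snr(event, background, snr_db):
    """Superimpose `event` on `background` at a target SNR in dB (hypothetical helper).

    Both inputs are equal-length lists of samples. The background is scaled so
    that 10 * log10(E_event / E_scaled_background) equals snr_db.
    """
    if len(event) != len(background):
        raise ValueError("event and background must have the same length")
    e_event = sum(x * x for x in event)
    e_bg = sum(x * x for x in background)
    if e_bg == 0:
        return list(event)  # silent background: the mixture is just the clean event
    # Gain that brings the background energy to e_event / 10^(snr_db / 10).
    gain = math.sqrt(e_event / (e_bg * 10 ** (snr_db / 10)))
    return [s + gain * b for s, b in zip(event, background)]
```

Lower `snr_db` values bury the event deeper in the scene, which is one way to create harder conditions from the same source clips.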
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

1.998 Mikeys
#6. A Spatial Sampling Approach to Wave Field Synthesis: PBAP and Huygens Arrays
Julius O. Smith III
A simple approach to microphone and speaker arrays is described in which the microphone array is regarded as a sampling grid for the acoustic field, and the corresponding speaker array is treated as a "spatial digital-to-analog converter" that reconstructs the acoustic field from its spatial samples. Advantages of this approach include ease of understanding and teaching, ease of deployment, effective practical guidelines for deployment, and significant computational savings in special cases. In particular, in the far-field case (acoustic sources many wavelengths away from a linear array of speakers), it is possible to quantize source angles slightly so that no processing per speaker is required beyond pure integer delay. Smoothly moving sources are obtained using well-known delay-line interpolation techniques such as linear (cross-fading) and Lagrange (polynomial) interpolation between/among speakers. We call the far-field line-array case Planewave-Based Angle Panning (PBAP), in reference to the well-known Vector-Based Amplitude...
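For a far-field source at angle θ from broadside, a uniformly spaced line array with spacing d needs a per-speaker delay step of d·sin θ·f_s/c samples. Snapping θ so that this step is a whole number k makes every speaker delay an integer multiple n·k, which is one way to read the "pure integer delay" claim above. The sketch assumes uniform spacing, a plane-wave delay model, and c = 343 m/s; `quantize_angle` and `speaker_delays` are hypothetical names, and the paper's own formulation may differ.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly room temperature)


def quantize_angle(theta_rad, spacing_m, sample_rate_hz, c=SPEED_OF_SOUND):
    """Snap a far-field source angle so the inter-speaker delay is a whole number of samples.

    Returns (quantized angle in radians, per-speaker delay step k in samples).
    """
    # Exact (fractional) delay between adjacent speakers, in samples.
    step = spacing_m * math.sin(theta_rad) * sample_rate_hz / c
    k = round(step)
    # |sin(theta)| <= 1 limits how large the integer step can be.
    k_max = math.floor(spacing_m * sample_rate_hz / c)
    k = max(-k_max, min(k_max, k))
    theta_q = math.asin(k * c / (spacing_m * sample_rate_hz))
    return theta_q, k


def speaker_delays(num_speakers, k):
    """Integer delay in samples for speakers n = 0..N-1, given step k."""
    delays = [n * k for n in range(num_speakers)]
    offset = min(delays)  # shift so every delay is non-negative (causal)
    return [dl - offset for dl in delays]
```

With d = 0.5 m and f_s = 48 kHz, a source at 30° snaps to about 30.01° with k = 35, so four speakers use delays of 0, 35, 70 and 105 samples and no fractional-delay filtering is needed.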
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 0
Unique Words: 0

1.998 Mikeys
#7. N-HANS: Introducing the Augsburg Neuro-Holistic Audio-eNhancement System
Shuo Liu, Gil Keren, Björn Schuller
N-HANS is a Python toolkit for in-the-wild audio enhancement, including speech, music, and general audio denoising, separation, and selective noise or source suppression. Its functionality is built on two neural network models that share the same architecture but are trained separately. The models consist of stacks of residual blocks, each conditioned on additional speech or environmental noise recordings so that they adapt to different unseen speakers or environments in real life. In addition to a Python API, a command-line interface is provided to researchers and developers; both are documented at https://github.com/N-HANS/N-HANS. Experimental results indicate that N-HANS achieves outstanding performance, reaching very high audio and speech quality and supporting its reliable use in real-life audio and speech-related tasks.
Figures
None.
Tweets
arxivml: "N-HANS: Introducing the Augsburg Neuro-Holistic Audio-eNhancement System", Shuo Liu, Gil Keren, Björn Schuller https://t.co/AcMn3V3FmR
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

1.985 Mikeys
#8. Moving to Communicate, Moving to Interact: Patterns of Body Motion in Musical Duo Performance
Laura Bishop, Carlos Cancino-Chacón, Werner Goebl
Skilled ensemble musicians coordinate with high precision, even when improvising or interpreting loosely defined notation. Successful coordination is supported primarily through shared attention to the musical output; however, musicians also interact visually, particularly when the musical timing is irregular. This study investigated the performance conditions that encourage visual signalling and interaction between ensemble members. Piano and clarinet duos rehearsed a new piece as their body motion was recorded. Analyses of head movement showed that performers communicated gesturally following held notes. Gesture patterns became more consistent as duos rehearsed, though consistency dropped again during a final performance given under no-visual-contact conditions. Movements were smoother and interperformer coordination was stronger during irregularly timed passages than elsewhere in the piece, suggesting heightened visual interaction. Performers moved more after rehearsing than before, and more when they could see each other than...
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 225,721 papers.
