Top 6 arXiv Papers Today in Sound


2.035 Mikeys
#1. Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features
Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto
This paper presents a simple yet effective method to achieve prosody transfer from a reference speech signal to synthesized speech. The main idea is to incorporate well-known acoustic correlates of prosody such as pitch and loudness contours of the reference speech into a modern neural text-to-speech (TTS) synthesizer such as Tacotron2 (TC2). More specifically, a small set of acoustic features is extracted from the reference audio and then used to condition a TC2 synthesizer. The trained model is evaluated using subjective listening tests, and novel objective evaluations of prosody transfer are proposed. Listening tests show that the synthesized speech is rated as highly natural and that prosody is successfully transferred from the reference speech signal to the synthesized signal.
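To make "a small set of acoustic features" concrete, here is a minimal sketch of summarizing a reference signal by global pitch and loudness statistics. It is not the authors' feature extractor: the autocorrelation pitch estimator, the choice of mean and standard deviation, the frame sizes, and all function names are assumptions made for illustration, and every frame is treated as voiced.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def loudness_db(frame):
    """Root-mean-square level of one frame, in dB relative to full scale."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))

def pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Crude pitch estimate: the lag with the largest autocorrelation.

    Assumption: the frame is voiced. No voicing decision is made here.
    """
    lo = int(sample_rate / fmax)
    hi = int(sample_rate / fmin)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag

def mean(values):
    return sum(values) / len(values)

def std(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def global_prosody_features(samples, sample_rate, frame_len=512, hop=256):
    """Summarize frame-level pitch and loudness contours as four global numbers."""
    frames = frame_signal(samples, frame_len, hop)
    pitch = [pitch_hz(f, sample_rate) for f in frames]
    loud = [loudness_db(f) for f in frames]
    return {
        "pitch_mean_hz": mean(pitch),
        "pitch_std_hz": std(pitch),
        "loudness_mean_db": mean(loud),
        "loudness_std_db": std(loud),
    }

# Synthetic stand-in for reference speech: a 200 Hz tone at half amplitude.
SAMPLE_RATE = 8000
reference = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
             for n in range(2000)]
features = global_prosody_features(reference, SAMPLE_RATE)
print(features)
```

In a conditioning setup like the one the abstract describes, a vector such as `features` would be passed to the synthesizer alongside the text; how that is wired into Tacotron2 is not specified in the abstract and is not shown here.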
Figures
None.
Tweets
arxiv_cs_LG: Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features. Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, and Jervis Pinto https://t.co/tgskamOtle
Memoirs: Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features. https://t.co/botmr1vryx
arxiv_cscl: Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features https://t.co/UuCL0NVh8K
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

2.012 Mikeys
#2. Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise
Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert
We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise. In real scenarios, these distortion sources may occur simultaneously and reducing them implies combining the corresponding distortion-specific filters. As these filters interact with each other, they must be jointly optimized. We propose to model the target and residual signals after linear echo cancellation and dereverberation using a multichannel Gaussian modeling framework and to jointly represent their spectra by means of a neural network. We develop an iterative block-coordinate ascent algorithm to update all the filters. We evaluate our system on real recordings of acoustic echo, reverberation and noise acquired with a smart speaker in various situations. In terms of overall distortion, the proposed approach outperforms both a cascade of the individual approaches and a joint reduction approach that does not rely on a spectral model of the target and residual signals.
Figures
None.
Tweets
arxivml: "Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise", Guillaume Carbajal, Romain Seri… https://t.co/vlTKLD0I0l
arxiv_cs_LG: Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise. Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, and Eric Humbert https://t.co/CccPz9Tlbq
StatsPapers: Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise. https://t.co/duyW8AUrQb
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

1.998 Mikeys
#3. Improving Universal Sound Separation Using Sound Classification
Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis
Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic sources from an open domain, regardless of their class. In this paper, we utilize the semantic information learned by sound classifier networks trained on a vast amount of diverse sounds to improve universal sound separation. In particular, we show that semantic embeddings extracted from a sound classifier can be used to condition a separation network, providing it with useful additional information. This approach is especially useful in an iterative setup, where source estimates from an initial separation stage and their corresponding classifier-derived embeddings are fed to a second separation network. By performing a thorough...
Figures
Tweets
BrundageBot: Improving Universal Sound Separation Using Sound Classification. Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, and Daniel P. W. Ellis https://t.co/QybNf8A72x
StatsPapers: Improving Universal Sound Separation Using Sound Classification. https://t.co/vZZdYpamih
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 4265
Unique Words: 1491

1.998 Mikeys
#4. Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement
Zhong-Qiu Wang, Scott Wisdom, Kevin Wilson, John R. Hershey
This work investigates alternation between spectral separation using masking-based networks and spatial separation using multichannel beamforming. In this framework, the spectral separation is performed using a mask-based deep network. The result of mask-based separation is used, in turn, to estimate a spatial beamformer. The output of the beamformer is fed back into another mask-based separation network. We explore multiple ways of computing time-varying covariance matrices to improve beamforming, including factorizing the spatial covariance into a time-varying amplitude component and time-invariant spatial component. For the subsequent mask-based filtering, we consider different modes, including masking the noisy input, masking the beamformer output, and a hybrid approach combining both. Our best method first uses spectral separation, then spatial beamforming, and finally a spectral post-filter, and demonstrates an average improvement of 2.8 dB over baseline mask-based separation, across four different reverberant speech...
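The step in which a mask is used to estimate a spatial beamformer can be sketched for a single frequency bin and two microphones. This is a generic mask-based MVDR illustration under stated assumptions, not the authors' formulation: the mask is an oracle rather than a network output, the covariances are time-invariant, the beamformer uses a commonly used reference-channel form, and all names are hypothetical.

```python
import cmath
import random

def masked_covariance(frames, weights):
    """Weighted average of the outer products y y^H over frames (2 channels)."""
    total = sum(weights)
    cov = [[0j, 0j], [0j, 0j]]
    for y, w in zip(frames, weights):
        for i in range(2):
            for j in range(2):
                cov[i][j] += w * y[i] * y[j].conjugate() / total
    return cov

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 complex matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matmul_2x2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mvdr_weights(speech_cov, noise_cov, ref=0):
    """Reference-channel MVDR: column `ref` of inv(noise_cov) @ speech_cov,
    divided by the trace of that product."""
    g = matmul_2x2(inverse_2x2(noise_cov), speech_cov)
    trace = g[0][0] + g[1][1]
    return [g[0][ref] / trace, g[1][ref] / trace]

def beamform(weights, y):
    """Apply the beamformer: w^H y."""
    return sum(w.conjugate() * x for w, x in zip(weights, y))

# Toy data: 50 speech-only frames followed by 50 noise-only frames.
rng = random.Random(0)

def cgauss():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))

steering = [1 + 0j, cmath.exp(0.7j)]  # assumed target direction
speech_frames = [[s * d for d in steering]
                 for s in (cgauss() for _ in range(50))]
noise_frames = [[cgauss(), cgauss()] for _ in range(50)]
frames = speech_frames + noise_frames
mask = [1.0] * 50 + [0.0] * 50  # oracle speech mask (assumption)

speech_cov = masked_covariance(frames, mask)
noise_cov = masked_covariance(frames, [1.0 - m for m in mask])
w = mvdr_weights(speech_cov, noise_cov)

# The target direction should pass with unit gain (distortionless response).
response = beamform(w, steering)
noise_in = sum(abs(y[0]) ** 2 for y in noise_frames) / len(noise_frames)
noise_out = sum(abs(beamform(w, y)) ** 2 for y in noise_frames) / len(noise_frames)
print(abs(response), noise_in, noise_out)
```

In the alternating scheme the abstract describes, the beamformer output would then be fed to another mask-based network; that second stage is not modeled here.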
Figures
None.
Tweets
StatsPapers: Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement. https://t.co/OPL9QRnoWA
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

1.996 Mikeys
#5. Moving to Communicate, Moving to Interact: Patterns of Body Motion in Musical Duo Performance
Laura Bishop, Carlos Cancino-Chacón, Werner Goebl
Skilled ensemble musicians coordinate with high precision, even when improvising or interpreting loosely-defined notation. Successful coordination is supported primarily through shared attention to the musical output; however, musicians also interact visually, particularly when the musical timing is irregular. This study investigated the performance conditions that encourage visual signalling and interaction between ensemble members. Piano and clarinet duos rehearsed a new piece as their body motion was recorded. Analyses of head movement showed that performers communicated gesturally following held notes. Gesture patterns became more consistent as duos rehearsed, though consistency dropped again during a final performance given under no-visual-contact conditions. Movements were smoother and interperformer coordination was stronger during irregularly-timed passages than elsewhere in the piece, suggesting heightened visual interaction. Performers moved more after rehearsing than before, and more when they could see each other than...
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

1.989 Mikeys
#6. Designing Virtual Soundscapes for Alzheimer's Disease Care
Frédéric Voisin
The sound environment is a prime source of conscious and unconscious information which allows listeners to place themselves, to communicate, to feel, to remember. The author describes the process of designing a new interactive audio apparatus for Alzheimer's care, in the context of an active multidisciplinary research project led by the author in collaboration with a long-term care centre (EHPAD) in Burgundy (France), a geriatrician, a gerontologist, psychologists and caregivers. The apparatus, named Madeleines Sonores in reference to Proust's madeleine, has provided virtual soundscapes around the clock for a year to 14 elderly people hosted in the dedicated Alzheimer's unit of the care centre. Empirical aspects of sonic interactivity are discussed in relation to dementia and to the activity of caring. Scientific studies have been initiated to evaluate the benefits of such a device in Alzheimer's disease therapy and in dementia care.
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 226,515 papers.
