Top 10 arXiv Papers Today in Sound


2.043 Mikeys
#1. Neural Wavetable: a playable wavetable synthesizer using neural networks
Lamtharn Hantrakul, Li-Chia Yang
We present Neural Wavetable, a proof-of-concept wavetable synthesizer that uses neural networks to generate playable wavetables. The system can produce new, distinct waveforms through the interpolation of traditional wavetables in an autoencoder's latent space. It is available as a VST/AU plugin for use in a Digital Audio Workstation.
Figures
Tweets
BrundageBot: Neural Wavetable: a playable wavetable synthesizer using neural networks. Lamtharn Hantrakul and Li-Chia Yang https://t.co/0JT6h7TX5j
arxivml: "Neural Wavetable: a playable wavetable synthesizer using neural networks", Lamtharn Hantrakul, Li-Chia Yang https://t.co/Q50z5WpRML
nmfeeds: [O] https://t.co/fws5xZourt Neural Wavetable: a playable wavetable synthesizer using neural networks. We present Neural Wa...
Memoirs: Neural Wavetable: a playable wavetable synthesizer using neural networks. https://t.co/0lGfI8jcVR
Github

Neural Wavetable: a playable wavetable synthesizer using neural networks

Repository: Neural_Wavetable_Synthesizer
User: RichardYang40148
Language: C++
Stargazers: 15
Subscribers: 3
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 871
Unique Words: 483
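
The latent-space interpolation mentioned in the Neural Wavetable abstract can be illustrated with a minimal sketch. It assumes two wavetables have already been encoded into latent vectors; the vectors below and the name `lerp_latent` are hypothetical, and the autoencoder's encoder and decoder are not shown, so this is only the interpolation step, not the authors' implementation.

```python
def lerp_latent(z_a, z_b, alpha):
    """Linearly interpolate between two latent vectors (alpha in [0, 1])."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


# Hypothetical latent codes for two wavetables; a real system would
# obtain these from a trained encoder rather than hard-coding them.
z_sine = [0.0, 1.0, -0.5]
z_saw = [1.0, 0.0, 0.5]

# Sweep alpha from 0 to 1 to get a series of in-between latent codes.
steps = [lerp_latent(z_sine, z_saw, k / 4) for k in range(5)]
```

Each intermediate code would then be passed through the decoder to obtain a new waveform.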

2.004 Mikeys
#2. To bee or not to bee: Investigating machine learning approaches for beehive sound recognition
Inês Nolasco, Emmanouil Benetos
In this work, we explore the potential of machine learning methods for the problem of beehive sound recognition. A major contribution of this work is the creation and release of annotations for a selection of beehive recordings. By experimenting with both support vector machines and convolutional neural networks, we explore important aspects to be considered in the development of beehive sound recognition systems using machine learning approaches.
Figures
Tweets
ComputerPapers: To bee or not to bee: Investigating machine learning approaches for beehive sound recognition. https://t.co/TumURYKoqj
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 4228
Unique Words: 1410

2.004 Mikeys
#3. Audio-based identification of beehive states
Inês Nolasco, Alessandro Terenzi, Stefania Cecchi, Simone Orcioni, Helen L. Bear, Emmanouil Benetos
The absence of the queen in a beehive is a very strong indicator of the need for beekeeper intervention. Manually searching for the queen is an arduous recurrent task for beekeepers that disrupts the normal life cycle of the beehive and can be a source of stress for bees. Sound is an indicator for signalling different states of the beehive, including the absence of the queen bee. In this work, we apply machine learning methods to automatically recognise different states in a beehive using audio as input. The system is built on top of a method for beehive sound recognition in order to detect bee sounds from other external sounds. We investigate both support vector machines and convolutional neural networks for beehive state recognition, using audio data of beehives collected from the NU-Hive project. Results indicate the potential of machine learning methods as well as the challenges of generalizing the system to new hives.
Figures
Tweets
ComputerPapers: Audio-based identification of beehive states. https://t.co/2V0Qy5ATsJ
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 4414
Unique Words: 1549

0.0 Mikeys
#4. Efficient Neural Audio Synthesis
Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu
Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24kHz 16-bit audio 4x faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity...
Figures
None.
Tweets
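hiho_karuta: The WaveRNN paper uses NLL values for its quantitative evaluation, but come to think of it, how do they get a single NLL out of the Dual Softmax? Since it is dual, my understanding was that you would get two NLLs... https://t.co/IFQMENUAWG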
Swall0wTech: [1802.08435] Efficient Neural Audio Synthesis https://t.co/FYt9Sof2p7
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 10
Total Words: 7272
Unique Words: 2122
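
The dual softmax in the WaveRNN abstract, and the question raised in the tweet about a single NLL, can be made concrete with a small sketch. It assumes each 16-bit sample is split into a coarse half (high 8 bits) and a fine half (low 8 bits), each predicted by its own softmax, with the fine half conditioned on the coarse half. Under that assumption the chain rule gives one joint NLL per sample. The function names are hypothetical, and the listing does not confirm that the paper's reported NLL is computed this way.

```python
import math


def split_sample(sample_16bit):
    """Split an unsigned 16-bit sample into (coarse, fine) 8-bit halves."""
    if not 0 <= sample_16bit <= 0xFFFF:
        raise ValueError("sample must fit in 16 unsigned bits")
    return sample_16bit >> 8, sample_16bit & 0xFF


def join_sample(coarse, fine):
    """Recombine the two 8-bit halves into the original 16-bit sample."""
    return (coarse << 8) | fine


def joint_nll(p_coarse, p_fine_given_coarse):
    """NLL (in nats) of one sample via the chain rule:
    -log p(coarse) - log p(fine | coarse)."""
    return -(math.log(p_coarse) + math.log(p_fine_given_coarse))
```

For example, `split_sample(0xABCD)` yields the pair `(0xAB, 0xCD)`, and probabilities of 0.5 and 0.25 for the two halves combine into a single NLL of `log(8)` nats.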

0.0 Mikeys
#5. Conditioning Deep Generative Raw Audio Models for Structured Automatic Music
Rachel Manzelli, Vijay Thakkar, Ali Siahkamari, Brian Kulis
Existing automatic music generation approaches that feature deep learning can be broadly classified into two types: raw audio models and symbolic models. Symbolic models, which train and generate at the note level, are currently the more prevalent approach; these models can capture long-range dependencies of melodic structure, but fail to grasp the nuances and richness of raw audio generations. Raw audio models, such as DeepMind's WaveNet, train directly on sampled audio waveforms, allowing them to produce realistic-sounding, albeit unstructured music. In this paper, we propose an automatic music generation methodology combining both of these approaches to create structured, realistic-sounding compositions. We consider a Long Short Term Memory network to learn the melodic structure of different styles of music, and then use the unique symbolic generations from this model as a conditioning input to a WaveNet-based raw audio generator, creating a model for automatic, novel music. We then evaluate this approach by showcasing results...
Figures
Tweets
satory074: » [1806.09905] Conditioning Deep Generative Raw Audio Models for Structured Automatic Music https://t.co/WTPZbyt0vG
nmfeeds: [O] https://t.co/4NUnvcnNyN Conditioning Deep Generative Raw Audio Models for Structured Automatic Music. Existing automat...
Memoirs: Conditioning Deep Generative Raw Audio Models for Structured Automatic Music. https://t.co/M3w47o3JSe
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 4998
Unique Words: 1675

0.0 Mikeys
#6. Exploring End-to-End Techniques for Low-Resource Speech Recognition
Vladimir Bataev, Maxim Korenevsky, Ivan Medennikov, Alexander Zatvornitskiy
In this work we present a simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We investigate the performance of different neural network architectures, including fully convolutional, recurrent, and ResNet with GRU. Different features and normalization techniques are compared as well. We also propose a modification of the CTC loss that uses segmentation during training, which improves decoding with a small beam size. Our best model achieves a word error rate of 45.8%, which is, to our knowledge, the best reported result for end-to-end systems using in-domain data for this task.
Figures
None.
Tweets
nmfeeds: [O] https://t.co/ox84o1hZDc Exploring End-to-End Techniques for Low-Resource Speech Recognition. In this work we present s...
nmfeeds: [CL] https://t.co/ox84o1hZDc Exploring End-to-End Techniques for Low-Resource Speech Recognition. In this work we present ...
languageML: RT @arxiv_cscl: Exploring End-to-End Techniques for Low-Resource Speech Recognition https://t.co/FhaDyHpaYi
pywirrarika: RT @arxiv_cscl: Exploring End-to-End Techniques for Low-Resource Speech Recognition https://t.co/FhaDyHpaYi
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 3585
Unique Words: 1473
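
The word error rate quoted in the abstract above is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and the example sentences are made up for illustration and do not come from the paper or the Babel data.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only: one substitution and one deletion
# against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.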

0.0 Mikeys
#7. Foreign English Accent Adjustment by Learning Phonetic Patterns
Fedor Kitashov, Elizaveta Svitanko, Debojyoti Dutta
State-of-the-art automatic speech recognition (ASR) systems struggle with the lack of data for rare accents. For sufficiently large datasets, neural engines tend to outshine statistical models in most natural language processing problems. However, a speech accent remains a challenge for both approaches. Phonologists manually create general rules describing a speaker's accent, but their results remain underutilized. In this paper, we propose a model that automatically retrieves phonological generalizations from a small dataset. This method leverages the difference in pronunciation between a particular dialect and General American English (GAE) and creates new accented samples of words. The proposed model is able to learn all generalizations that previously were manually obtained by phonologists. We use this statistical method to generate a million phonological variations of words from the CMU Pronouncing Dictionary and train a sequence-to-sequence RNN to recognize accented words with 59% accuracy.
Figures
None.
Tweets
quantum_tunnel: Foreign English Accent Adjustment by Learning Phonetic Patterns. (arXiv:1807.03625v1 [https://t.co/N8U5whzgZw]) https://t.co/juhJXs2tRf
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 2415
Unique Words: 1124

0.0 Mikeys
#8. The NES Music Database: A multi-instrumental dataset with expressive performance attributes
Chris Donahue, Huanru Henry Mao, Julian McAuley
Existing research on music generation focuses on composition, but often ignores the expressive performance characteristics required for plausible renditions of resultant pieces. In this paper, we introduce the Nintendo Entertainment System Music Database (NES-MDB), a large corpus allowing for separate examination of the tasks of composition and performance. NES-MDB contains thousands of multi-instrumental songs composed for playback by the compositionally-constrained NES audio synthesizer. For each song, the dataset contains a musical score for four instrument voices as well as expressive attributes for the dynamics and timbre of each voice. Unlike datasets comprised of General MIDI files, NES-MDB includes all of the information needed to render exact acoustic performances of the original compositions. Alongside the dataset, we provide a tool that renders generated compositions as NES-style audio by emulating the device's audio processor. Additionally, we establish baselines for the tasks of composition, which consists of learning...
Figures
Tweets
schmilblick42: @chrisdonahuey presenting work @ismir2018 on "The NES Music Database: A multi-instrumental dataset with expressive performance attributes." Yes, that's NES for Nintendo Entertainment System! Paper and dataset links below. https://t.co/fFZ4hfWQmR https://t.co/oBywaRdOOI https://t.co/AUGvbcVPCQ
Github

The NES Music Database: use machine learning to compose music for the Nintendo Entertainment System!

Repository: nesmdb
User: chrisdonahue
Language: Python
Stargazers: 187
Subscribers: 9
Forks: 16
Open Issues: 2
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 5685
Unique Words: 2216

0.0 Mikeys
#9. Subjective and objective experiments on the influence of speaker's gender on the unvoiced segments
A Madhavaraj, T V Ananthapadmanabha, A G Ramakrishnan
Subjective and objective experiments are conducted to understand the extent to which a speaker's gender influences the acoustics of unvoiced (U) sounds. U segments of utterances are replaced by the corresponding segments of a speaker of opposite gender to prepare modified utterances. Humans are asked to judge if the modified utterance is spoken by one or two speakers. The experiments show that human subjects are unable to distinguish the modified from the original. Thus, listeners are able to identify the U segments irrespective of the gender, which may be based on some speaker-independent invariant acoustic cues. To test if this finding is purely a perceptual phenomenon, objective experiments are also conducted. Gender specific HMM based phoneme recognition systems are trained using the TIMIT training set and tested on (a) utterances spoken by the same gender (b) utterances spoken by the opposite gender and (c) the modified utterances of the test set. As expected, the performance is the highest for case (a) and the lowest for...
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 3671
Unique Words: 1184

0.0 Mikeys
#10. FloWaveNet : A Generative Flow for Raw Audio
Sungwon Kim, Sang-gil Lee, Jongyoon Song, Sungroh Yoon
Most modern text-to-speech architectures use a WaveNet vocoder to synthesize high-fidelity waveform audio, but its slow autoregressive sampling scheme has limited practical applications. The recently proposed Parallel WaveNet achieves real-time audio synthesis by incorporating Inverse Autoregressive Flow (IAF) for parallel sampling. However, Parallel WaveNet requires a two-stage training pipeline with a well-trained teacher network and is prone to mode collapse when trained with probability distillation alone. We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single maximum likelihood loss without any additional auxiliary terms and is inherently parallel due to the flow-based transformation. The model can efficiently sample raw audio in real time with a clarity comparable to the original WaveNet and ClariNet. Code and samples for all models, including our FloWaveNet, are available via GitHub: https://github.com/ksw0306/FloWaveNet
Figures
None.
Tweets
PyTorch: "FloWaveNet : A Generative Flow for Raw Audio" from Seoul National University Their research was scooped by a few days, yet, they spent the time to write and release a paper and code. Great spirit folks! Paper: https://t.co/G7GIcld23q Code: https://t.co/oQEUV5WqAx https://t.co/Hx1SLCoG4P
FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
ballforest: RT @FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
syoyo: RT @FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
heiga_zen: RT @FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
r9y9: RT @FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
SythonUK: RT @FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
garygarywang: RT @FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
10_1100011_01: RT @FeitengLi: #FlowWaveNet FloWaveNet : A Generative Flow for Raw Audio https://t.co/wlEQqcb6La
Github

A Pytorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"

Repository: FloWaveNet
User: ksw0306
Language: Python
Stargazers: 219
Subscribers: 17
Forks: 42
Open Issues: 2
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 3430
Unique Words: 1205
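
The FloWaveNet abstract's claims of a single maximum likelihood loss and inherently parallel sampling both follow from using an invertible transformation with a cheap log-determinant. The sketch below shows one common building block of flow models, an affine coupling layer, under loud assumptions: the `conditioner` is a toy arithmetic stand-in for a neural network, all names are hypothetical, and this is not taken from the authors' repository.

```python
import math


def conditioner(x_a):
    """Toy stand-in for a neural network: maps one half of the input
    to a (log_scale, shift) pair for each element of the other half."""
    s = sum(x_a)
    return [0.1 * s for _ in x_a], [0.5 * s for _ in x_a]


def coupling_forward(x_a, x_b):
    """Data -> latent. x_a passes through unchanged; x_b is shifted and scaled."""
    log_s, t = conditioner(x_a)
    z_b = [(b - ti) * math.exp(-ls) for b, ti, ls in zip(x_b, t, log_s)]
    log_det = -sum(log_s)  # log |det dz/dx| of this layer
    return x_a, z_b, log_det


def coupling_inverse(z_a, z_b):
    """Latent -> data. Every element is computed independently, so
    sampling needs no sequential (autoregressive) loop."""
    log_s, t = conditioner(z_a)
    x_b = [zb * math.exp(ls) + ti for zb, ti, ls in zip(z_b, t, log_s)]
    return z_a, x_b
```

With a layer like this, the training objective is the change-of-variables log-likelihood, log p(x) = log p(z) + log_det, which is the kind of single maximum likelihood loss the abstract refers to.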

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 57,756 papers.
