Fundamental frequency is one of the most important characteristics of speech
and audio signals. Harmonic model-based fundamental frequency estimators offer
a higher estimation accuracy and robustness against noise than the widely used
autocorrelation-based methods. However, the traditional harmonic model-based
estimators do not take the temporal smoothness of the fundamental frequency,
the model order, and the voicing into account as they process each data segment
independently. In this paper, a fully Bayesian fundamental frequency tracking
algorithm based on the harmonic model and a first-order Markov process model is
proposed. Smoothness priors are imposed on the fundamental frequencies, model
orders, and voicing using first-order Markov process models. Using these Markov
models, fundamental frequency estimation and voicing detection errors can be
reduced. Using the harmonic model, the proposed fundamental frequency tracker
has an improved robustness to noise. An analytical form of the likelihood
function, which can be computed...
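
To make the harmonic model concrete, here is a minimal sketch of single-frame fundamental frequency estimation by harmonic summation, a simple approximation to the least-squares harmonic-model estimator. It is not the Bayesian tracker proposed in the paper: it processes one segment in isolation, with no Markov smoothness priors, no model-order selection, and no voicing detection. The function names, the 1 Hz search grid, and the test signal are assumptions chosen for illustration.

```python
import cmath
import math


def synth_harmonic(f0, fs, n_samples, amps):
    """Synthesize a sum of zero-phase cosines at integer multiples of f0."""
    return [
        sum(a * math.cos(2 * math.pi * (l + 1) * f0 * n / fs)
            for l, a in enumerate(amps))
        for n in range(n_samples)
    ]


def harmonic_summation_f0(x, fs, f_min, f_max, n_harmonics):
    """Grid-search the f0 (in Hz) whose harmonics capture the most energy."""
    best_f0, best_score = None, -1.0
    for f0 in range(f_min, f_max + 1):  # assumed 1 Hz candidate grid
        score = 0.0
        for l in range(1, n_harmonics + 1):
            w = 2 * math.pi * l * f0 / fs  # l-th harmonic, radians/sample
            proj = sum(xn * cmath.exp(-1j * w * n) for n, xn in enumerate(x))
            score += abs(proj) ** 2
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0


# Hypothetical test signal: 200 Hz fundamental with three equal harmonics.
signal = synth_harmonic(f0=200.0, fs=8000, n_samples=400, amps=[1.0, 1.0, 1.0])
estimate = harmonic_summation_f0(signal, fs=8000, f_min=60, f_max=400,
                                 n_harmonics=3)
print(estimate)
```

Because each frame is scored independently, an estimator like this can jump between octaves from one frame to the next, which is the kind of error the smoothness priors described in the abstract are meant to reduce.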

arxiv_cs_LG:
Bayesian Pitch Tracking Based on the Harmonic Model. Liming Shi, Jesper Kjaer Nielsen, Jesper Rindom Jensen, Max A. Little, and Mads Graesboll Christensen https://t.co/qSDx23zItE

Authors: 5

Record companies invest billions of dollars in new talent around the globe
each year. Gaining insight into what actually makes a hit song would provide
tremendous benefits for the music industry. In this research we tackle this
question by focussing on the dance hit song classification problem. A database
of dance hit songs from 1985 until 2013 is built, including basic musical
features, as well as more advanced features that capture a temporal aspect. A
number of different classifiers are used to build and test dance hit prediction
models. The resulting best model has a good performance when predicting whether
a song is a "top 10" dance hit versus a lower listed position.

arxivml:
"Dance Hit Song Prediction",
Dorien Herremans, David Martens, Kenneth Sörensen
https://t.co/IOjmK4YxG2

StatsPapers:
Dance Hit Song Prediction. https://t.co/qe3NFQk0Y5

Authors: 3

Speech separation has been studied widely for single-channel close-talk
recordings over the past few years; the solutions developed so far mostly
operate in the frequency domain. Recently, a raw audio waveform separation
network (TasNet) was introduced for single-channel data, achieving high Si-SNR
(scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio)
compared with the state-of-the-art frequency-domain solution. In this study, we
incorporate effective components of TasNet into a frequency-domain separation
method. We compare both for alternative scenarios. We introduce a solution for
directly optimizing the separation criterion in frequency-domain networks. In
addition to objective and subjective speech separation measurements, we
evaluate the separation performance on a speech recognition task as well. We
study the speech separation problem for far-field data (more similar to
naturalistic audio streams) and develop multi-channel solutions for both
frequency- and time-domain separators utilizing spectral, spatial...
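
As a rough illustration of the Si-SNR metric named above, the sketch below uses the commonly cited definition: remove the mean from both signals, project the estimate onto the reference, and compare the energy of that projection with the energy of the residual. The function name, the `eps` guard, and the toy signals are assumptions; the paper's own evaluation code may differ.

```python
import math


def si_snr(estimate, reference, eps=1e-12):
    """Scale-invariant source-to-noise ratio in dB for equal-length signals."""
    n = len(reference)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [v - mean_e for v in estimate]   # zero-mean estimate
    r = [v - mean_r for v in reference]  # zero-mean reference

    # Project the estimate onto the reference to get the target component.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r)
    scale = dot / (ref_energy + eps)
    target = [scale * b for b in r]
    noise = [a - t for a, t in zip(e, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10 * math.log10(target_energy / (noise_energy + eps))


# Toy signals (assumed): a sine reference plus a small orthogonal distortion.
reference = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
distortion = [0.1 * math.cos(2 * math.pi * 13 * n / 100) for n in range(100)]
estimate = [s + d for s, d in zip(reference, distortion)]

score = si_snr(estimate, reference)
score_rescaled = si_snr([3.0 * v for v in estimate], reference)
print(score, score_rescaled)
```

Rescaling the estimate leaves the score essentially unchanged, which is the property that gives the metric its name.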

arxivml:
"A comprehensive study of speech separation: spectrogram vs waveform separation",
Fahimeh Bahmaninezhad, Jian Wu, R…
https://t.co/EqIsjLivA8

Sample Sizes : [257, 25713, 25715, 40, 40]

Authors: 7

Total Words: 4291

Unique Words: 1559

We extend frequency-domain blind source separation based on independent
vector analysis to the case where there are more microphones than sources. The
signal is modelled as non-Gaussian sources in a Gaussian background. The
proposed algorithm is based on a parametrization of the demixing matrix
decreasing the number of parameters to estimate. Furthermore, orthogonal
constraints between the signal and background subspaces are imposed to
regularize the separation. The problem can then be posed as a constrained
likelihood maximization. We propose efficient alternating updates guaranteed to
converge to a stationary point of the cost function. The performance of the
algorithm is assessed on simulated signals. We find that the separation
performance is on par with that of the conventional determined algorithm at a
fraction of the computational cost.

Authors: 2

Total Words: 4079

Unique Words: 1504

With the aim of constructing a biologically plausible model of machine
listening, we study the representation of a multicomponent stationary signal by
a wavelet scattering network. First, we show that renormalizing second-order
nodes by their first-order parents gives a simple numerical criterion to
establish whether two neighboring components will interfere psychoacoustically.
Secondly, we generalize the `one or two components' framework to three sine
waves or more, and show that a network of depth $M = \log_2 N$ suffices to
characterize the relative amplitudes of the first $N$ terms in a Fourier
series, while enjoying properties of invariance to frequency transposition and
component-wise phase shifts.

Authors: 1

This paper presents the sound event localization and detection (SELD) task
setup for the DCASE 2019 challenge. The goal of the SELD task is to detect the
temporal activities of a known set of sound event classes, and further localize
them in space when active. As part of the challenge, a synthesized dataset with
each sound event associated with a spatial coordinate represented using azimuth
and elevation angles is provided. These sound events are spatialized using
real-life impulse responses collected at multiple spatial coordinates in five
different rooms with varying dimensions and material properties. A baseline
SELD method employing a convolutional recurrent neural network is used to
generate benchmark scores for this reverberant dataset. The benchmark scores
are obtained using the recommended cross-validation setup.
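
For readers unfamiliar with azimuth/elevation coordinates, the sketch below converts a pair of angles to a unit direction vector and measures the angle between two directions, one common way to express a localization error. The axis convention (azimuth measured in the horizontal plane from the x-axis, elevation measured up from that plane) and the function names are assumptions, not the dataset's or the baseline's documented definitions.

```python
import math


def angles_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to a Cartesian unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.cos(el) * math.cos(az),  # x
        math.cos(el) * math.sin(az),  # y
        math.sin(el),                 # z
    )


def angular_error_deg(a, b):
    """Angle in degrees between two unit vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # clamp to guard against rounding
    return math.degrees(math.acos(dot))


# Hypothetical reference and estimated directions of arrival.
reference_doa = angles_to_unit_vector(azimuth_deg=0.0, elevation_deg=0.0)
estimated_doa = angles_to_unit_vector(azimuth_deg=90.0, elevation_deg=0.0)
error = angular_error_deg(reference_doa, estimated_doa)
print(error)
```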

Authors: 3

Bioacoustic sensors, sometimes known as autonomous recording units (ARUs),
can record sounds of wildlife over long periods of time in scalable and
minimally invasive ways. Deriving per-species abundance estimates from these
sensors requires detection, classification, and quantification of animal
vocalizations as individual acoustic events. Yet, variability in ambient noise,
both over time and across sensors, hinders the reliability of current automated
systems for sound event detection (SED), such as convolutional neural networks
(CNN) in the time-frequency domain. In this article, we develop, benchmark, and
combine several machine listening techniques to improve the generalizability of
SED models across heterogeneous acoustic environments. As a case study, we
consider the problem of detecting avian flight calls from a ten-hour recording
of nocturnal bird migration, recorded by a network of six ARUs in the presence
of heterogeneous background noise. Starting from a CNN yielding
state-of-the-art accuracy on this task, we introduce...

Authors: 5

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

*Tracking 129,961 papers.*
