##### #1. Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection
###### Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, Mounya Elhilali
Sound event detection is a challenging task, especially for scenes with multiple simultaneous events. While event classification methods tend to be fairly accurate, event localization presents additional challenges, especially when large amounts of labeled data are not available. Task 4 of the DCASE 2018 challenge presents an event detection task that requires accuracy in both the segmentation and the recognition of events while providing only weakly labeled training data. Supervised methods can produce accurate event labels but are limited in event segmentation when the training data lack event timestamps. On the other hand, unsupervised methods that model the acoustic properties of the audio can produce accurate event boundaries but are not guided by the characteristics of event classes and sound categories. We present a hybrid approach that combines acoustic-driven event boundary detection with supervised label inference using a deep neural network. This framework leverages the benefits of both unsupervised and supervised methodologies and...
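As an illustration of how segment boundaries and class predictions can be combined, here is a minimal sketch. It assumes the unsupervised detector yields `(start, end)` frame ranges and a supervised classifier yields per-frame class probabilities. Each segment is labeled with every class whose mean probability over the segment clears a threshold. The function name `label_segments`, the 0.5 threshold, and the averaging rule are hypothetical choices, not the paper's inference procedure.

```python
from typing import Dict, List, Tuple


def label_segments(
    boundaries: List[Tuple[int, int]],
    frame_probs: List[Dict[str, float]],
    threshold: float = 0.5,
) -> List[Tuple[int, int, List[str]]]:
    """Attach class labels to acoustically detected segments (hypothetical helper).

    boundaries:  (start, end) frame indices, end exclusive, assumed to come
                 from an unsupervised acoustic boundary detector.
    frame_probs: per-frame class probabilities, assumed to come from a
                 supervised classifier trained on weak labels.
    """
    events = []
    for start, end in boundaries:
        frames = frame_probs[start:end]
        if not frames:
            continue  # skip empty or out-of-range segments
        # Average each class's probability over the frames of this segment.
        mean = {c: sum(f[c] for f in frames) / len(frames) for c in frames[0]}
        # Keep every class whose segment-level score clears the threshold.
        labels = sorted(c for c, p in mean.items() if p >= threshold)
        events.append((start, end, labels))
    return events


probs = [{"speech": 0.9, "dog": 0.1}] * 3 + [{"speech": 0.2, "dog": 0.8}] * 2
print(label_segments([(0, 3), (3, 5)], probs))
```

In this sketch the boundaries come entirely from the acoustic side and the labels entirely from the classifier. That split mirrors the division of labor described in the abstract.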
##### #2. Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection
###### Tomi Kinnunen, Rosa González Hautamäki, Ville Vestman, Md Sahidullah
We consider technology-assisted mimicry attacks in the context of automatic speaker verification (ASV). We use ASV itself to select the target speakers to be attacked by human-based mimicry. We recorded six naive mimics, for whom we selected target celebrities from the VoxCeleb1 and VoxCeleb2 corpora (7,365 potential targets) using an i-vector system. Each attacker attempts to mimic the selected target, and the utterances are subjected to ASV tests using an independently developed x-vector system. Our main finding is negative: even though some of the attacker scores against the target speakers increased slightly, our mimics did not succeed in spoofing the x-vector system. Interestingly, however, the relative ordering of the selected targets (closest, furthest, median) is consistent between the systems, which suggests some level of transferability between them.
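The following minimal sketch makes the target-selection step concrete. It ranks candidate targets by cosine similarity between the attacker's speaker embedding and each candidate's embedding, then picks the closest, median, and furthest candidates. Several details are assumptions for illustration: cosine scoring, the toy two-dimensional vectors, and the name `select_targets`. The paper's i-vector back end may score candidates differently.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_targets(attacker: Sequence[float], candidates: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Pick closest, median, and furthest candidates (hypothetical helper)."""
    # Sort candidate names from most to least similar to the attacker.
    ranked = sorted(candidates, key=lambda name: cosine(attacker, candidates[name]), reverse=True)
    return {
        "closest": ranked[0],
        "median": ranked[len(ranked) // 2],  # upper-middle entry for even counts
        "furthest": ranked[-1],
    }


pool = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 1.0]}
print(select_targets([1.0, 0.0], pool))
```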
##### #3. Plug-In Stochastic Gradient Method
###### Yu Sun, Brendt Wohlberg, Ulugbek S. Kamilov
Plug-and-play priors (PnP) is a popular framework for regularized signal reconstruction that uses advanced denoisers within an iterative algorithm. In this paper, we discuss our recent online variant of PnP, which uses only a subset of the measurements at every iteration and is therefore scalable to very large datasets. We additionally present novel convergence results for both the batch and the online PnP algorithms.
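The online idea can be sketched as a stochastic proximal-gradient loop: take a gradient step on a least-squares data term using only a random minibatch of measurements, then apply a plug-in denoiser. This sketch makes two simplifications. First, the "denoiser" is a box projection, the proximal map of a box indicator, standing in for the learned denoisers PnP normally uses. Second, the names `pnp_sgd`, `box_denoiser`, the step size, and the batch size are all hypothetical. It is not the authors' algorithm or code.

```python
import random
from typing import Callable, List

Vector = List[float]


def box_denoiser(x: Vector, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Stand-in 'denoiser': projection onto a box (proximal map of its indicator)."""
    return [min(max(v, lo), hi) for v in x]


def pnp_sgd(
    A: List[Vector],
    y: Vector,
    denoiser: Callable[[Vector], Vector],
    step: float = 0.05,
    batch: int = 5,
    iters: int = 3000,
    seed: int = 0,
) -> Vector:
    """Online PnP sketch for least squares: x <- D(x - step * minibatch gradient)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        # Use only a random subset of the measurements at this iteration.
        for i in rng.sample(range(n), batch):
            residual = sum(A[i][j] * x[j] for j in range(d)) - y[i]
            for j in range(d):
                grad[j] += A[i][j] * residual / batch
        # Gradient step on the data term, then the plug-in denoiser.
        x = denoiser([x[j] - step * grad[j] for j in range(d)])
    return x
```

Each iteration touches `batch` rows of `A` rather than all of them, which is the property that makes online variants attractive for large datasets. Swapping `box_denoiser` for any other callable of the same shape is the "plug-in" part.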
##### #4. A new insight into the secondary path modeling problem in active noise control
###### Meiling Hu, Jing Lu, Jun Wang, Jinpei Xue
Secondary path modeling is a critical problem in active noise control (ANC). Usually, either an offline or an online modeling process with additive random noise is required to ensure the convergence of the ANC system. In this paper, the close relationship between the feedforward ANC system and the stereo acoustic echo cancellation (SAEC) system is revealed. Accordingly, the convergence behavior of the ANC system can be analyzed by investigating the joint autocorrelation matrix of the reference and the filtered reference signals. It is proved that straightforward secondary path modeling can be carried out without injecting any additive noise, as long as the control filter is sufficiently long. Furthermore, by taking advantage of the time-varying characteristic of the control filter, effective modeling of the secondary path can even be achieved without any restriction on the control filter length. Simulation and experimental results both validate the theoretical analysis.
##### #5. Feature Analysis for Classification of Physical Actions using surface EMG Data
###### Anish C. Turlapaty, Balakrishna Gokaraju
Based on recent health statistics, there are several thousand people with limb disabilities and gait disorders who require medical assistance. Robot-assisted rehabilitation therapy can help them recover and return to a normal life. In this scenario, a successful methodology is to use EMG-signal-based information to control the support robotics. For this mechanism to function properly, the EMG signal from the muscles has to be sensed, the biological motor intention has to be decoded, and the resulting information has to be communicated to the controller of the robot. Accurate detection of the motor intention requires pattern-recognition-based categorical identification. Hence, in this paper, we propose an improved classification framework that identifies the relevant features driving the pattern recognition algorithm. Major contributions include a set of modified spectral-moment-based features and a relevant inter-channel correlation feature that contribute to an improved classification...
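For orientation, the two feature families named above can be sketched in their conventional forms. The first is the spectral moment $M_k = \sum_j f_j^k P_j$ of a power spectrum, with mean frequency $M_1 / M_0$. The second is the Pearson correlation between two channels. These are the standard, unmodified definitions. The paper's modified moments are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def spectral_moments(
    freqs: Sequence[float], power: Sequence[float], orders: Iterable[int] = (0, 1, 2)
) -> Dict[int, float]:
    """Conventional spectral moments M_k = sum_j f_j**k * P_j."""
    return {k: sum((f ** k) * p for f, p in zip(freqs, power)) for k in orders}


def mean_frequency(freqs: Sequence[float], power: Sequence[float]) -> float:
    """Mean frequency M_1 / M_0, a common EMG spectral feature."""
    m = spectral_moments(freqs, power, orders=(0, 1))
    return m[1] / m[0]


def channel_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length EMG channels."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


print(mean_frequency([10, 20, 30], [1, 2, 1]))
```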
##### #6. Codeword Position Index based Sparse Code Multiple Access System
###### Ke Lai, Lei Wen, Jing Lei, Gaojie Chen, Pei Xiao, Amine Maaref
In this letter, a novel variation of sparse code multiple access (SCMA), called codeword position index based SCMA (CPI-SCMA), is proposed. In this scheme, information is transmitted not only by the codewords in an M-point SCMA codebook but also by the indices of the codeword positions in a data block. As a result, both power efficiency and transmission efficiency (TE) can be improved; moreover, CPI-SCMA can achieve better error rate performance than conventional SCMA (C-SCMA).
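A rough way to see why position indices carry extra information is to count bits under an index-modulation-style model. Suppose a block has N codeword positions, of which K are active, and each active position carries one codeword from an M-point codebook. Then the choice of active positions adds $\lfloor \log_2 \binom{N}{K} \rfloor$ bits on top of $K \log_2 M$ codeword bits. This counting model and the name `cpi_bits_per_block` are assumptions for illustration. The letter's exact mapping may differ.

```python
import math


def cpi_bits_per_block(num_positions: int, num_active: int, codebook_size: int) -> int:
    """Bits per data block under a hypothetical index-modulation-style model.

    num_positions: codeword positions (slots) in one data block (N)
    num_active:    positions that actually carry a codeword (K)
    codebook_size: M, assumed to be a power of two
    """
    if not 0 < num_active <= num_positions:
        raise ValueError("need 0 < num_active <= num_positions")
    if codebook_size < 2 or codebook_size & (codebook_size - 1):
        raise ValueError("codebook_size must be a power of two >= 2")
    # floor(log2(C(N, K))) bits select which positions are active.
    index_bits = math.comb(num_positions, num_active).bit_length() - 1
    # Each active position carries log2(M) bits through its codeword.
    codeword_bits = num_active * (codebook_size.bit_length() - 1)
    return index_bits + codeword_bits


print(cpi_bits_per_block(4, 2, 4))
```

For example, N = 4, K = 2, M = 4 gives 2 index bits plus 4 codeword bits, which is 6 bits, while only half of the positions are active.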
##### #7. MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System
###### Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass
The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers. It is a form of multi-target speaker detection based on real-world telephone conversations. Data recordings are generated from call center customer-agent conversations. Each conversation is represented by a single i-vector. Given a pool of training and development data from non-Blacklist and Blacklist speakers, the task is to measure how accurately one can detect 1) whether a test recording is spoken by a Blacklist speaker, and 2) which specific Blacklist speaker was talking.
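The two detection questions can be illustrated with a simple scoring sketch. Each test i-vector is compared to one representative vector per Blacklist speaker by cosine similarity. The maximum score, compared against a threshold, answers question 1), and the arg-max speaker answers question 2). The per-speaker representative vectors, the threshold, and the name `blacklist_decision` are hypothetical. This is not necessarily the challenge's official baseline back end.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two i-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def blacklist_decision(
    test_vec: Sequence[float], blacklist: Dict[str, Sequence[float]], threshold: float
) -> Tuple[bool, str, float]:
    """Return (is_blacklisted, best_speaker, best_score) (hypothetical helper)."""
    scores = {spk: cosine(test_vec, vec) for spk, vec in blacklist.items()}
    best = max(scores, key=scores.get)
    # Question 1: any Blacklist speaker above threshold? Question 2: which one?
    return scores[best] >= threshold, best, scores[best]


bl = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
print(blacklist_decision([0.9, 0.1], bl, threshold=0.8))
```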
##### #8. Pattern Synthesis via Complex-Coefficient Weight Vector Orthogonal Decomposition--Part I: Fundamentals
###### Xuejing Zhang, Zishu He, Xuepan Zhang
This paper presents a new array response control scheme named complex-coefficient weight vector orthogonal decomposition ($\textrm{C}^2\textrm{-WORD}$) and its application to pattern synthesis. The proposed $\textrm{C}^2\textrm{-WORD}$ algorithm is a modified version of the existing WORD approach. We extend WORD by allowing a complex-valued combining coefficient in $\textrm{C}^2\textrm{-WORD}$, and we find the optimal combining coefficient by maximizing the white noise gain (WNG). Our algorithm offers a closed-form expression to precisely control the array response level at a given point, starting from an arbitrarily specified weight vector. In addition, it results in smaller pattern variations at the uncontrolled angles. Elaborate analysis shows that the proposed $\textrm{C}^2\textrm{-WORD}$ scheme performs at least as well as the state-of-the-art $\textrm{A}^\textrm{2}\textrm{RC}$ or WORD approach. By applying $\textrm{C}^2\textrm{-WORD}$ successively, we present a flexible and effective approach to pattern synthesis. Numerical...
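The white noise gain that $\textrm{C}^2\textrm{-WORD}$ maximizes has the standard definition $\mathrm{WNG} = |\mathbf{w}^H \mathbf{a}(\theta_0)|^2 / (\mathbf{w}^H \mathbf{w})$. The sketch below computes it for a uniform linear array. The half-wavelength spacing and the helper names are assumptions, and the $\textrm{C}^2\textrm{-WORD}$ update itself is not implemented. By the Cauchy-Schwarz inequality, WNG is at most N, with equality for the matched weight $\mathbf{w} \propto \mathbf{a}(\theta_0)$.

```python
import cmath
import math
from typing import List, Sequence


def steering_vector(n: int, theta_deg: float, spacing: float = 0.5) -> List[complex]:
    """ULA steering vector; element spacing given in wavelengths (assumed 0.5)."""
    phase = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * k) for k in range(n)]


def white_noise_gain(w: Sequence[complex], a: Sequence[complex]) -> float:
    """WNG = |w^H a|^2 / (w^H w)."""
    num = abs(sum(wi.conjugate() * ai for wi, ai in zip(w, a))) ** 2
    den = sum(abs(wi) ** 2 for wi in w)
    return num / den


a = steering_vector(8, 20.0)
print(white_noise_gain(a, a))  # matched weight: close to 8, the number of elements
```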
##### #9. Robust Differential Received Signal Strength-Based Localization
###### Yongchang Hu, Geert Leus
Source localization based on signal strength measurements has become very popular due to its practical simplicity. However, the severe nonlinearity and non-convexity make the related optimization problem mathematically difficult to solve, especially when the transmit power or the path-loss exponent (PLE) is unknown. Moreover, even if the PLE is known but imperfectly estimated, or the anchor location information is inaccurate, the constructed data model becomes uncertain, again making the problem hard to solve. This paper focuses in particular on differential received signal strength (DRSS)-based localization with model uncertainties in the case of unknown transmit power and PLE. A new whitened model for DRSS-based localization with unknown transmit powers is first presented and investigated. Assuming the PLE is known, we introduce two estimators based on an exact data model, an advanced best linear unbiased estimator (A-BLUE) and a Lagrangian estimator (LE), and then we present a robust semidefinite programming (SDP)-based...
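The reason DRSS sidesteps an unknown transmit power can be shown with the noise-free log-distance model $P_i = P_0 - 10\gamma \log_{10}(d_i/d_0)$. Subtracting a reference anchor's reading cancels $P_0$ and leaves a quantity that depends only on the distances and the PLE $\gamma$. This sketch uses that textbook model with hypothetical helper names. The paper's whitened model and its estimators are not reproduced.

```python
import math
from typing import List, Sequence


def rss_dbm(tx_power_dbm: float, distance: float, ple: float, d0: float = 1.0) -> float:
    """Noise-free log-distance path-loss model (assumed)."""
    return tx_power_dbm - 10.0 * ple * math.log10(distance / d0)


def drss(rss: Sequence[float], ref: int = 0) -> List[float]:
    """Differences against a reference anchor; the transmit power cancels."""
    return [r - rss[ref] for i, r in enumerate(rss) if i != ref]


dists = [1.0, 2.0, 4.0]
low = [rss_dbm(0.0, d, ple=3.0) for d in dists]
high = [rss_dbm(20.0, d, ple=3.0) for d in dists]
print(drss(low), drss(high))  # same values despite a 20 dB transmit-power change
```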
##### #10. Ballistocardiogram-based Authentication using Convolutional Neural Networks
###### Joshua Hebert, Brittany Lewis, Hang Cai, Krishna K. Venkatasubramanian, Matthew Provost, Kelly Charlebois
The goal of this work is to demonstrate the use of the ballistocardiogram (BCG) signal, derived using head-mounted wearable devices, as a viable biometric for authentication. The BCG signal is a measure of a person's body acceleration resulting from the heart's ejection of blood. It characterizes the cardiac cycle and can be derived non-invasively by measuring subtle movements of a person's extremities. In this paper, we use several versions of the BCG signal, derived from the accelerometer and gyroscope sensors on a Smart Eyewear (SEW) device, for authentication. The derived BCG signals are used to train a convolutional neural network (CNN) as an authentication model, which is personalized for each subject. We evaluate our authentication models using data from 12 subjects and show that our approach has an equal error rate (EER) of 3.5% immediately after training and 13% after about 2 months, in the worst case. We also explore the use of our authentication approach for people with motor disabilities. Our...
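The EER figures above refer to the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal way to estimate it is to sweep a threshold over the observed scores, keep the threshold where FAR and FRR are closest, and report their average. This coarse sweep, the "accept if score >= threshold" convention, and the name `equal_error_rate` are assumptions. Evaluation toolkits often interpolate the ROC curve instead.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        # Accept when score >= t: impostors above t are false accepts,
        # genuine scores below t are false rejects.
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # perfectly separated scores
```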
