Multimodal Speech Emotion Recognition and Ambiguity Resolution
Identifying emotion from speech is a non-trivial task, owing in part to the ambiguity inherent in the definition of emotion itself. In this work, we adopt a feature-engineering-based approach to speech emotion recognition. Formalizing the problem as multi-class classification, we compare two categories of models. For both, we extract eight hand-crafted features from the audio signal. In the first approach, the extracted features are used to train six traditional machine learning classifiers; the second approach is based on deep learning, wherein a baseline feed-forward neural network and an LSTM-based classifier are trained over the same features. To resolve ambiguity in communication, we also incorporate features from the text domain. We report accuracy, F-score, precision, and recall for the different experimental settings in which we evaluate our models. Overall, we show that lightweight machine learning models trained over a few hand-crafted features achieve performance comparable to the current deep-learning-based state of the art for emotion recognition.
Author
Gaurav Sahu
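The abstract does not enumerate the eight hand-crafted features, name the six classifiers, or identify the corpus used, so the sketch below is illustrative only: it computes a handful of common prosodic and spectral descriptors with librosa and trains a single lightweight classifier, in the spirit of the paper's traditional machine learning approach. The dataset contents, feature choices, and use of RandomForestClassifier are placeholder assumptions, not the paper's exact setup.

```python
# Illustrative sketch only: the abstract does not list the eight hand-crafted
# features, so we compute a few common prosodic/spectral descriptors with
# librosa and train one lightweight classifier. Paths, labels, and the
# choice of RandomForestClassifier are hypothetical placeholders.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def extract_features(wav_path, sr=16000):
    """Return a small fixed-length vector of hand-crafted audio features."""
    y, sr = librosa.load(wav_path, sr=sr)
    rms = librosa.feature.rms(y=y)                        # frame energy
    zcr = librosa.feature.zero_crossing_rate(y)           # noisiness proxy
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)         # pitch track
    return np.array([rms.mean(), rms.std(), zcr.mean(), centroid.mean(),
                     rolloff.mean(), np.nanmean(f0), np.nanstd(f0),
                     np.mean(np.abs(y))])                 # 8-dim vector

# Placeholder: fill with real (wav_path, emotion_label) pairs from an
# emotional speech corpus.
dataset = [("audio/example_angry.wav", "angry"),
           ("audio/example_sad.wav", "sad")]

X = np.stack([extract_features(p) for p, _ in dataset])
y = np.array([label for _, label in dataset])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
# classification_report covers the metrics the paper reports:
# accuracy, F-score, precision, and recall.
print(classification_report(y_te, clf.predict(X_te)))
```

The deep learning branch can be sketched in the same hedged spirit. Below is a minimal feed-forward baseline over the same fixed-length feature vectors; the abstract also mentions an LSTM-based classifier and text-domain features (which could plausibly be concatenated onto the audio vector before the first layer), but the layer sizes, class count, and training loop here are toy placeholders rather than the paper's configuration.

```python
# Minimal feed-forward baseline over the same fixed-length features,
# standing in for the paper's deep learning approach. Input dimension,
# hidden sizes, class count, and the random training data are toy
# placeholder assumptions.
import torch
import torch.nn as nn

class FeedForwardBaseline(nn.Module):
    def __init__(self, in_dim=8, n_classes=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = FeedForwardBaseline()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

xb = torch.randn(32, 8)            # batch of placeholder feature vectors
yb = torch.randint(0, 4, (32,))    # placeholder emotion labels
for _ in range(10):                # toy training loop
    opt.zero_grad()
    loss_fn(model(xb), yb).backward()
    opt.step()
```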

Read it. Rate it.
#1. Which part of the paper did you read?

#2. The paper contains new data or analyses that is openly accessible?
#3. The conclusion is supported by the data and analyses?
#4. The conclusion is of scientific interest?
#5. The result is likely to lead to future research?

Github
User:
None (add)
Repo:
None (add)
Stargazers:
0
Forks:
0
Open Issues:
0
Network:
0
Subscribers:
0
Language:
None
Youtube
Link:
None (add)
Views:
0
Likes:
0
Dislikes:
0
Favorites:
0
Comments:
0
Other
Sample Sizes (N=):
Inserted:
Words Total:
Words Unique:
Source:
Abstract:
None
04/14/19 06:03PM
4,849
1,834
Tweets
jaialkdanel: RT @arxiv_org: Multimodal Speech Emotion Recognition and Ambiguity Resolution. https://t.co/1AZVbLdIqB https://t.co/nBJcQ5SGVP
HubDataScience: RT @arxiv_org: Multimodal Speech Emotion Recognition and Ambiguity Resolution. https://t.co/1AZVbLdIqB https://t.co/nBJcQ5SGVP
syoyo: RT @arxiv_org: Multimodal Speech Emotion Recognition and Ambiguity Resolution. https://t.co/1AZVbLdIqB https://t.co/nBJcQ5SGVP
arxiv_org: Multimodal Speech Emotion Recognition and Ambiguity Resolution. https://t.co/1AZVbLdIqB https://t.co/nBJcQ5SGVP
arxiv_cscl: Multimodal Speech Emotion Recognition and Ambiguity Resolution https://t.co/6AmhGgbeJa
arxiv_cscl: Multimodal Speech Emotion Recognition and Ambiguity Resolution https://t.co/6AmhGgsQ7K
arxivml: "Multimodal Speech Emotion Recognition and Ambiguity Resolution", Gaurav Sahu https://t.co/O7tcb5oOLN
arxiv_cs_LG: Multimodal Speech Emotion Recognition and Ambiguity Resolution. Gaurav Sahu https://t.co/vDmpNbEucc
BrundageBot: Multimodal Speech Emotion Recognition and Ambiguity Resolution. Gaurav Sahu https://t.co/W6CBMYNlDJ
arxiv_cscl: Multimodal Speech Emotion Recognition and Ambiguity Resolution https://t.co/6AmhGgsQ7K
Images
Related