Performance Improvement in Speech Based Emotion Recognition With DWT and ANOVA
DOI: https://doi.org/10.26713/cma.v14i3.2389

Keywords: Speech emotion recognition, DWT, ANOVA, SVM, MLP

Abstract
With technological advancements, the need for machines to understand human speech, i.e., Human-Computer Interaction (HCI), has become vital. For natural interaction, emotion detection in speech is essential. Time-domain features can identify a few emotions, whereas others are inherently determined by frequency-domain features. With wavelet-based features, the majority of emotion-discriminating features can be identified. A common observation is that the happy emotion is frequently misclassified as anger; the proposed feature vector reduces this misclassification. Spectral features and Discrete Wavelet Transform (DWT) features together form the proposed feature vector. Feature selection is performed using the statistical analysis-of-variance (ANOVA) test, and the model is verified using SVM and MLP classifiers. In this work, the speech emotion recognition system is evaluated on a German audio database (EMODB) and recognizes happy and angry emotions with better accuracy than state-of-the-art algorithms. For four emotion classes (happy, angry, neutral and sad), the proposed model's performance with DWT features improved by 3% over baseline features for both classifiers, viz. SVM and MLP.
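The pipeline described above (wavelet decomposition of the speech signal into sub-bands, energy-style features per sub-band, and ANOVA-based ranking of discriminative features) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the Haar wavelet, three decomposition levels, log-energy features, and the hand-rolled one-way ANOVA F-statistic are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                           # zero-pad to even length
        x = np.append(x, 0.0)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)     # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)     # high-pass (detail)
    return a, d

def dwt_energy_features(signal, levels=3):
    """Log-energy of each DWT sub-band, a common wavelet-based feature."""
    feats = []
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        feats.append(np.log(np.sum(d ** 2) + 1e-12))   # detail-band energy
    feats.append(np.log(np.sum(a ** 2) + 1e-12))       # final approximation band
    return np.array(feats)

def anova_f_scores(X, y):
    """One-way ANOVA F-statistic per feature; higher = more class-discriminative."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    ss_between = sum(len(X[y == c]) * (X[y == c].mean(axis=0) - grand_mean) ** 2
                     for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    df_b, df_w = len(classes) - 1, len(X) - len(classes)
    return (ss_between / df_b) / (ss_within / df_w + 1e-12)
```

In a full system, the features with the highest F-scores would be concatenated with spectral features and fed to an SVM or MLP classifier; a multi-level wavelet library such as PyWavelets would typically replace the hand-written Haar step.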
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CCAL) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.