Performance Improvement in Speech Based Emotion Recognition With DWT and ANOVA
DOI: https://doi.org/10.26713/cma.v14i3.2389

Keywords: Speech emotion recognition, DWT, ANOVA, SVM, MLP

Abstract
With technological advancements, the need for machines to understand human speech, i.e., Human-Computer Interaction (HCI), has become vital. For natural interaction, emotion detection in speech is essential. Time-domain features can identify a few emotions, whereas others are inherently determined by frequency-domain features. With wavelet-based features, the majority of emotion-discriminating features can be identified. A common observation is that the happy emotion is frequently misclassified as anger; the proposed feature vector reduces this misclassification. Spectral features and Discrete Wavelet Transform (DWT) features together form the proposed feature vector. Feature selection is performed using the statistical analysis-of-variance (ANOVA) test, and the model is verified using SVM and MLP classifiers. In this work, the speech emotion recognition system is evaluated on a German audio database (EMODB) and recognizes happy and angry emotions with better accuracy than state-of-the-art algorithms. For four emotion classes (happy, angry, neutral and sad), the proposed model's performance with DWT features improved by 3% over baseline features for both classifiers, viz. SVM and MLP.
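The pipeline described above (wavelet decomposition of the speech signal into sub-bands, energy-style features per sub-band, and ANOVA-based ranking of discriminative features) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the Haar wavelet, three decomposition levels, log-energy features, and the hand-rolled one-way ANOVA F-statistic are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                           # zero-pad to even length
        x = np.append(x, 0.0)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)     # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)     # high-pass (detail)
    return a, d

def dwt_energy_features(signal, levels=3):
    """Log-energy of each DWT sub-band, a common wavelet-based feature."""
    feats = []
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        feats.append(np.log(np.sum(d ** 2) + 1e-12))   # detail-band energy
    feats.append(np.log(np.sum(a ** 2) + 1e-12))       # final approximation band
    return np.array(feats)

def anova_f_scores(X, y):
    """One-way ANOVA F-statistic per feature; higher = more class-discriminative."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    ss_between = sum(len(X[y == c]) * (X[y == c].mean(axis=0) - grand_mean) ** 2
                     for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    df_b, df_w = len(classes) - 1, len(X) - len(classes)
    return (ss_between / df_b) / (ss_within / df_w + 1e-12)
```

In a full system, the features with the highest F-scores would be concatenated with spectral features and fed to an SVM or MLP classifier; a multi-level wavelet library such as PyWavelets would typically replace the hand-written Haar step.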
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CCAL) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.