Speech-based emotion recognition and speaker identification: static vs. dynamic mode of speech representation
Žurnal Sibirskogo federalʹnogo universiteta. Matematika i fizika, Vol. 9 (2016) no. 4, pp. 518-523
See the article record from the source Math-Net.Ru
In this paper we present the performance of different machine learning algorithms on the problems of speech-based Emotion Recognition (ER) and Speaker Identification (SI) in static and dynamic modes of speech signal representation. The study takes a multi-corpus, multi-language approach: three databases for the SI task and four databases for the ER task, covering three languages (German, English and Japanese), were used to evaluate the models. More than 45 machine learning algorithms were applied to these tasks in both modes, and the results are presented here together with a discussion.
Keywords: emotion recognition from speech, speaker identification from speech, machine learning algorithms, speaker adaptive emotion recognition from speech.
@article{JSFU_2016_9_4_a15,
author = {Maxim Sidorov and Wolfgang Minker and Eugene S. Semenkin},
title = {Speech-based emotion recognition and speaker identification: static vs. dynamic mode of speech representation},
journal = {\v{Z}urnal Sibirskogo federalʹnogo universiteta. Matematika i fizika},
pages = {518--523},
publisher = {mathdoc},
volume = {9},
number = {4},
year = {2016},
language = {en},
url = {http://geodesic.mathdoc.fr/item/JSFU_2016_9_4_a15/}
}
TY  - JOUR
AU  - Maxim Sidorov
AU  - Wolfgang Minker
AU  - Eugene S. Semenkin
TI  - Speech-based emotion recognition and speaker identification: static vs. dynamic mode of speech representation
JO  - Žurnal Sibirskogo federalʹnogo universiteta. Matematika i fizika
PY  - 2016
SP  - 518
EP  - 523
VL  - 9
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/JSFU_2016_9_4_a15/
LA  - en
ID  - JSFU_2016_9_4_a15
ER  - 
%0 Journal Article
%A Maxim Sidorov
%A Wolfgang Minker
%A Eugene S. Semenkin
%T Speech-based emotion recognition and speaker identification: static vs. dynamic mode of speech representation
%J Žurnal Sibirskogo federalʹnogo universiteta. Matematika i fizika
%D 2016
%P 518-523
%V 9
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/JSFU_2016_9_4_a15/
%G en
%F JSFU_2016_9_4_a15
Maxim Sidorov; Wolfgang Minker; Eugene S. Semenkin. Speech-based emotion recognition and speaker identification: static vs. dynamic mode of speech representation. Žurnal Sibirskogo federalʹnogo universiteta. Matematika i fizika, Vol. 9 (2016) no. 4, pp. 518-523. http://geodesic.mathdoc.fr/item/JSFU_2016_9_4_a15/