Speech Unit Category based Short Utterance Speaker Recognition
Computer Science and Information Systems, Volume 9 (2012) no. 4.

See the article record from the source Computer Science and Information Systems website

Information about speech units such as vowels, consonants, and syllables can serve as knowledge in text-independent Short Utterance Speaker Recognition (SUSR), much as it does in text-dependent speaker recognition. In such tasks, the data available for each speech unit is often insufficient, especially at recognition time. It is therefore impractical to use the full set of speech units, because some units may not be well trained. To address this, we propose using speech unit categories rather than individual phones for SUSR: similar speech units are grouped together, which alleviates the sparse-data problem. We define Vowel, Consonant, and Syllable Categories (VC, CC, and SC) with Standard Chinese (Putonghua) as a reference. A speech utterance is recognized into VC, CC, and SC sequences, which are used to train a Universal Background Model (UBM) for each speech unit category during training and to perform speech unit category dependent speaker recognition during recognition. Experiments with a Gaussian Mixture Model-Universal Background Model (GMM-UBM) based system show relative equal error rate (EER) reductions of 54.50% and 40.95% from the minimum EERs of VCs and SCs, respectively, for 2-second test utterances, compared with existing SUSR systems.
Keywords: Short Utterance Speaker Recognition, Vowel Categories, Universal Background Vowel Category Model
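
As a rough illustration of the idea summarized in the abstract, the following Python sketch pools frames from similar phones into categories, scores a test utterance with a category-dependent log-likelihood ratio between a speaker model and a UBM, and computes a relative EER reduction. The category map, the diagonal-covariance GMM representation, and all function names are assumptions made for illustration; the abstract does not give the authors' actual category definitions or implementation.

```python
import math
from collections import defaultdict

# Hypothetical grouping of phone labels into vowel categories; the paper's
# actual VC/CC/SC definitions are not given in the abstract.
PHONE_TO_CATEGORY = {
    "a": "VC1", "ia": "VC1", "ua": "VC1",
    "i": "VC2", "ei": "VC2",
    "u": "VC3", "ou": "VC3",
}


def pool_by_category(labelled_frames):
    """Group (phone, feature_vector) pairs by category, skipping unmapped phones."""
    pooled = defaultdict(list)
    for phone, frame in labelled_frames:
        category = PHONE_TO_CATEGORY.get(phone)
        if category is not None:
            pooled[category].append(frame)
    return dict(pooled)


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one frame under a diagonal-covariance GMM.

    `gmm` is a list of (weight, means, variances) tuples (assumed layout).
    """
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        component_logs.append(log_p)
    peak = max(component_logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))


def category_llr_score(pooled, speaker_models, ubms):
    """Average per-frame log-likelihood ratio (speaker vs. UBM).

    Only categories that have both a speaker model and a UBM contribute.
    """
    total, count = 0.0, 0
    for category, frames in pooled.items():
        if category not in speaker_models or category not in ubms:
            continue
        for frame in frames:
            total += gmm_log_likelihood(frame, speaker_models[category])
            total -= gmm_log_likelihood(frame, ubms[category])
            count += 1
    return total / count if count else 0.0


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

In this sketch, a phone seen only once in a short test utterance still contributes evidence, because its frames are scored against a model trained on all phones in the same category. A real system would use multidimensional acoustic features and MAP-adapted speaker models rather than the toy structures assumed here.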
@article{CSIS_2012_9_4_a3,
     author = {Nakhat Fatima and Xiaojun Wu and Thomas Fang Zheng},
     title = {Speech {Unit} {Category} based {Short} {Utterance} {Speaker} {Recognition}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {9},
     number = {4},
     year = {2012},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2012_9_4_a3/}
}
TY  - JOUR
AU  - Nakhat Fatima
AU  - Xiaojun Wu
AU  - Thomas Fang Zheng
TI  - Speech Unit Category based Short Utterance Speaker Recognition
JO  - Computer Science and Information Systems
PY  - 2012
VL  - 9
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2012_9_4_a3/
ID  - CSIS_2012_9_4_a3
ER  - 
%0 Journal Article
%A Nakhat Fatima
%A Xiaojun Wu
%A Thomas Fang Zheng
%T Speech Unit Category based Short Utterance Speaker Recognition
%J Computer Science and Information Systems
%D 2012
%V 9
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2012_9_4_a3/
%F CSIS_2012_9_4_a3
Nakhat Fatima; Xiaojun Wu; Thomas Fang Zheng. Speech Unit Category based Short Utterance Speaker Recognition. Computer Science and Information Systems, Volume 9 (2012) no. 4. http://geodesic.mathdoc.fr/item/CSIS_2012_9_4_a3/