Training a speaker verification system on unlabelled data
Matematičeskoe modelirovanie, Vol. 27 (2015) no. 7, pp. 51-57.

See the article record from the source Math-Net.Ru

In this article we consider a method for labelling speaker data using clustering techniques. Labelling problems arise when one needs to use a speaker database recorded over new channels, for example mobile devices. The newly labelled database can then be used to build a speaker verification system. The article describes the speaker verification task along with several methods for solving it that are based on GMM-UBM, as well as channel normalization techniques that may improve recognition quality. Methods based on supervectors and PLDA are also presented. We also study the quality of the labelling obtained by clustering with different metrics. The resulting labelled database is then used to train several PLDA models. These models are then fused and used to solve a speaker verification task on i-vectors from the NIST i-vector Machine Learning Challenge 2014.
Keywords: pattern recognition, automatic speaker verification, clustering, PLDA.
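
The labelling step described in the abstract can be illustrated with a minimal sketch. It assumes fixed-length speaker vectors (standing in for i-vectors), cosine distance as the metric, and average-linkage agglomerative clustering; the abstract does not state which linkage or metric the authors settled on, so these choices, the function names, and the toy data are all assumptions made for illustration only.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_labels(vectors, n_clusters):
    """Average-linkage agglomerative clustering (illustrative choice).

    Returns one pseudo-label (cluster index) per input vector.
    """
    n = len(vectors)
    # Pairwise distances are computed once and reused for every merge.
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per vector
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(dist[i][j]
                            for i in clusters[p] for j in clusters[q])
                d = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p].extend(clusters[q])  # merge the closest pair
        del clusters[q]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def make_toy_vectors(seed=0):
    """Hypothetical toy data: 3 'speakers', 5 noisy vectors each."""
    rng = random.Random(seed)
    centers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    vectors, truth = [], []
    for speaker, center in enumerate(centers):
        for _ in range(5):
            vectors.append([x + rng.gauss(0.0, 0.05) for x in center])
            truth.append(speaker)
    return vectors, truth


vectors, truth = make_toy_vectors()
labels = agglomerative_labels(vectors, n_clusters=3)
```

The resulting `labels` play the role of speaker identities when training a supervised back end such as PLDA. In practice the number of clusters is unknown and the choice of metric affects label purity, which is the question the abstract says the article studies.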
@article{MM_2015_27_7_a8,
     author = {A. V. Ermilov and I. M. Gostev},
     title = {Training a speaker verification system on unlabelled data},
     journal = {Matemati\v{c}eskoe modelirovanie},
     pages = {51--57},
     publisher = {mathdoc},
     volume = {27},
     number = {7},
     year = {2015},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MM_2015_27_7_a8/}
}
TY  - JOUR
AU  - A. V. Ermilov
AU  - I. M. Gostev
TI  - Training a speaker verification system on unlabelled data
JO  - Matematičeskoe modelirovanie
PY  - 2015
SP  - 51
EP  - 57
VL  - 27
IS  - 7
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MM_2015_27_7_a8/
LA  - ru
ID  - MM_2015_27_7_a8
ER  - 
%0 Journal Article
%A A. V. Ermilov
%A I. M. Gostev
%T Training a speaker verification system on unlabelled data
%J Matematičeskoe modelirovanie
%D 2015
%P 51-57
%V 27
%N 7
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MM_2015_27_7_a8/
%G ru
%F MM_2015_27_7_a8
A. V. Ermilov; I. M. Gostev. Training a speaker verification system on unlabelled data. Matematičeskoe modelirovanie, Vol. 27 (2015) no. 7, pp. 51-57. http://geodesic.mathdoc.fr/item/MM_2015_27_7_a8/

[1] B. Gold, N. Morgan, D. Ellis, Speech and audio signal processing: processing and perception of speech and music, John Wiley & Sons, 2011

[2] D. A. Reynolds, R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models”, IEEE Transactions on Speech and Audio Processing, 3:1 (1995), 72–83

[3] D. Doroshin, M. Tkachenko, N. Lubimov et al., “Application of $l_1$ Estimation of Gaussian Mixture Model Parameters for Language Identification”, Speech and Computer, Springer International Publishing, 2013, 41–45

[4] U. Simon, I. Lapidot, H. Guterman, “Comparison between Normalizations for SVM-GMM Supervectors Speaker Verification”, IEEE 26th Convention of Electrical and Electronics Engineers in Israel (IEEEI) (2010)

[5] N. Dehak, P. Kenny, R. Dehak, “Front-end factor analysis for speaker verification”, IEEE Transactions on Audio, Speech, and Language Processing, 19:4 (2011), 788–798

[6] N. Lyubimov, M. Nastasenko, M. Kotov et al., “Exploiting Non-negative Matrix Factorization with Linear Constraints in Noise-Robust Speaker Identification”, Speech and Computer, Springer International Publishing, 2014, 200–208

[7] W. M. Campbell, D. E. Sturim, D. A. Reynolds, “Support vector machines using GMM supervectors for speaker verification”, IEEE Signal Processing Letters, 13:5 (2006), 308–311

[8] M. Senoussaoui, P. Kenny, N. Dehak, “An i-vector Extractor Suitable for Speaker Recognition with both Microphone and Telephone Speech”, Proc. Odyssey Speaker and Language Recognition Workshop (Brno, Czech Republic, June 2010), 6 pp.

[9] P. Kenny, “Bayesian Speaker Verification with Heavy-Tailed Priors”, Odyssey (2010), 14

[10] E. Khoury, L. El Shafey, M. Ferras, “Hierarchical speaker clustering methods for the NIST i-vector Challenge”, Odyssey: The Speaker and Language Recognition Workshop (2014), No EPFL-CONF-198439

[11] J. H. Ward (Jr.), “Hierarchical grouping to optimize an objective function”, Journal of the American statistical association, 58:301 (1963), 236–244

[12] G. N. Lance, W. T. Williams, “A general theory of classificatory sorting strategies. II: Clustering systems”, Computer Journal, 10:3 (1967), 271–277

[13] L. Breiman, “Bagging predictors”, Machine learning, 24:2 (1996), 123–140

[14] NIST SRE Homepage, 2014, http://www.nist.gov/itl/iad/mig/sre.cfm