Analytical review and classification of methods
News of the Kabardin-Balkar scientific center of RAS, no. 1 (2022), pp. 41-58.

See the article record from the source Math-Net.Ru

This paper presents an overview of methods and algorithms for feature extraction that transform an acoustic signal into a sequence of vectors for solving problems of segmentation, classification, identification, or speech recognition. A classification of feature extraction methods according to their underlying mathematical approaches is proposed. The algorithms and techniques of spectral analysis most widely used in the design of speech recognition systems are discussed. The review clearly demonstrates the complexity of the acoustic processing problem: finding a representation that reduces the dimensionality of the model, preserves the completeness of the linguistic information and, importantly, is robust to variability in speaker, transmission channel, and environment. The analysis of existing feature extraction methods is useful for selecting a technology when designing this key element of a speech system.
Keywords: speech recognition, Fourier analysis, cepstral analysis, linear prediction, methods for feature extraction.
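To make the pipeline described in the abstract concrete, the following is a minimal sketch (not taken from the paper) of the classical cepstral feature extraction chain named in the keywords: framing, windowing, short-time Fourier analysis, a mel filterbank, log compression, and a discrete cosine transform yielding cepstral coefficient vectors. It assumes only numpy and scipy; all frame sizes, filter counts, and function names are illustrative choices, not the authors' implementation.

import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_filters=26, n_fft=512, n_ceps=13):
    # 1. Pre-emphasis: boost high frequencies attenuated in the speech spectrum.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Split into short overlapping frames where speech is quasi-stationary.
    flen, fstep = int(frame_len * sample_rate), int(frame_step * sample_rate)
    n_frames = 1 + max(0, (len(emphasized) - flen) // fstep)
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)

    # 3. Power spectrum of each frame (short-time Fourier analysis).
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # 4. Triangular mel filterbank approximating the ear's frequency resolution.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # 5. Log compression and DCT decorrelate the filterbank energies (cepstrum).
    energies = np.log(np.dot(power, fbank.T) + np.finfo(float).eps)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_ceps]

if __name__ == "__main__":
    t = np.linspace(0, 1, 16000, endpoint=False)
    test_tone = np.sin(2 * np.pi * 440 * t)   # synthetic 440 Hz tone
    print(mfcc(test_tone).shape)              # (frames, 13) feature vectors

Linear-prediction or wavelet-based front ends of the kind surveyed in the review would replace the Fourier/filterbank stages with a different spectral model while keeping the same frame-by-frame conversion of the signal into feature vectors.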
@article{IZKAB_2022_1_a3,
     author = {I. A. Gurtueva and K. Ch. Bzhikhatlov},
     title = {Analytical review and classification of methods},
     journal = {News of the Kabardin-Balkar scientific center of RAS},
     pages = {41--58},
     publisher = {mathdoc},
     number = {1},
     year = {2022},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/IZKAB_2022_1_a3/}
}
I. A. Gurtueva; K. Ch. Bzhikhatlov. Analytical review and classification of methods. News of the Kabardin-Balkar scientific center of RAS, no. 1 (2022), pp. 41-58. http://geodesic.mathdoc.fr/item/IZKAB_2022_1_a3/
