See the article's record from the Math-Net.Ru source
@article{IZKAB_2020_6_a2,
    author = {I. A. Gurtueva},
    title = {Modern problems of automatic speech recognition},
    journal = {News of the Kabardin-Balkar scientific center of RAS},
    pages = {20--33},
    publisher = {mathdoc},
    number = {6},
    year = {2020},
    language = {ru},
    url = {http://geodesic.mathdoc.fr/item/IZKAB_2020_6_a2/}
}
I. A. Gurtueva. Modern problems of automatic speech recognition. News of the Kabardin-Balkar scientific center of RAS, no. 6 (2020), pp. 20-33. http://geodesic.mathdoc.fr/item/IZKAB_2020_6_a2/
[1] M. Campbell, A. J. Hoane, F.-h. Hsu, “Deep Blue”, Artificial Intelligence, 134 (2002), 57–83 | DOI | Zbl
[2] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search”, Nature, 529 (2016), 484–489 | DOI
[3] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos et al., Deep Speech 2: End-to-end speech recognition in English and Mandarin, 2015, arXiv: 1512.02595
[4] T. T. Kristjansson, J. R. Hershey, P. A. Olsen, S. J. Rennie, R. A. Gopinath, “Super-human multi-talker speech recognition: the IBM 2006 Speech Separation Challenge system”, Proc. Interspeech, 12 (2006), 155
[5] C. Weng, D. Yu, M. L. Seltzer, J. Droppo, “Single-channel mixed speech recognition using deep neural networks”, Proc. IEEE ICASSP, 2014, 5632–5636
[6] D. S. Pallett, “A look at NIST's benchmark ASR tests: past, present and future”, IEEE Automatic Speech Recognition and Understanding Workshop, 2003, 483–488 | DOI
[7] P. Price, W. M. Fisher, J. Bernstein, D. S. Pallett, “The DARPA 1000-word resource management database for continuous speech recognition”, Proc. IEEE ICASSP, 1988, 651–654 | MR
[8] D. B. Paul, J. M. Baker, “The design for the Wall Street Journal-based CSR corpus”, Proceedings of the workshop on Speech and Natural Language, 1992, 357–362 | DOI
[9] D. Graff, Z. Wu, R. MacIntyre, M. Liberman, “The 1996 broadcast news speech and language-model corpus”, Proceedings of the DARPA Workshop on Spoken Language technology, 1997, 11–14 | MR
[10] A. Ljolje, “The AT&T 2001 LVCSR system”, NIST LVCSR Workshop, 2001
[11] D. Philipov, Interactive Voice Text Editing Using New Speech Technologies from Yandex, 2014 https://habr.com/ru/company/yandex/blog/243813/
[12] S. F. Chen, B. Kingsbury, L. Mangu, D. Povey, G. Saon, H. Soltau, G. Zweig, “Advances in speech transcription at IBM under the DARPA EARS program”, IEEE Trans. Audio, Speech, and Language Processing, 14 (2006), 1596–1608 | DOI
[13] F. Seide, G. Li, D. Yu, “Conversational speech transcription using context-dependent deep neural networks”, Proc. Interspeech, 2011, 437–440
[14] S. Matsoukas, J. L. Gauvain, G. Adda, T. Colthurst, C. L. Kao, O. Kimball, L. Lamel, F. Lefevre, J. Z. Ma, J. Makhoul et al., “Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system”, IEEE Transactions on Audio, Speech, and Language Processing, 14 (2006), 1541–1556 | DOI
[15] A. Stolcke, B. Chen, H. Franco, V. R. R. Gadde, M. Graciarena, M. Y. Hwang, K. Kirchhoff, A. Mandal, N. Morgan, X. Lei et al., “Recent innovations in speech-to-text transcription at SRI-ICSI-UW”, IEEE Transactions on Audio, Speech, and Language Processing, 14 (2006), 1729–1744 | DOI
[16] J. L. Gauvain, L. Lamel, H. Schwenk, G. Adda, L. Chen, F. Lefevre, “Conversational telephone speech recognition”, Proc. IEEE ICASSP, 1 (2003), 1–212
[17] G. Evermann, H. Y. Chan, M. J. F. Gales, T. Hain, X. Liu, D. Mrva, L. Wang, P. C. Woodland, “Development of the 2003 CU-HTK conversational telephone speech transcription system”, Proc. IEEE ICASSP, 1 (2004), 1–249
[18] D. B. Fry, “Theoretical aspects of mechanical speech recognition”, J. British Inst. Radio Engr., 1959, 211–229
[19] T. K. Vintsyuk, “Speech discrimination by dynamic programming”, Kibernetika, 4 (1968), 81–88 | DOI | MR
[20] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimal decoding algorithm”, IEEE Trans. Information Theory, IT-13 (1967), 260–269 | DOI | Zbl
[21] D. R. Reddy, An approach to computer speech recognition by direct analysis of the speech wave, Tech. Report No. C549, Stanford Univ., Computer Science Dept., 1966. | MR
[22] V. M. Velichko, N. G. Zagoruyko, “Automatic recognition of 200 words”, Int. J. Man-Machine Studies, 1970, no. 2 | Zbl
[23] H. Sakoe, S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”, IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-26:1 (1978), 43–49 | MR | Zbl
[24] L. R. Rabiner et al., “Speaker independent recognition of isolated words using clustering techniques”, IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-27 (1979), 336–349 | MR | Zbl
[25] D. Klatt, “Review of the ARPA speech understanding project”, J.A.S.A., 62:6 (1977), 1324–1366
[26] B. Lowerre, “The HARPY speech understanding system”, Trends in Speech Recognition, ed. W. Lea, Speech Science Pub., 1990, 576–586 | DOI | MR
[27] L. R. Rabiner, B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
[28] S. Katagiri, “Speech pattern recognition using neural networks”, Pattern Recognition in Speech and Language Processing, eds. W. Chou, B.-H. Juang, CRC Press, 2003, 115–147
[29] C. S. Myers, L. R. Rabiner, “A level building dynamic time warping algorithm for connected word recognition”, IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-29 (1981), 284–297 | DOI | Zbl
[30] C. H. Lee, L. R. Rabiner, “A frame synchronous network search algorithm for connected word recognition”, IEEE Trans. Acoustics, Speech, Signal Proc., 37:11 (1989), 1649–1658 | DOI
[31] J. S. Bridle, M. D. Brown, “Connected word recognition using whole word templates”, Proc. Inst. Acoust. Autumn Conf., 1979, 25–28
[32] B.-H. Juang, S. Furui, “Automatic speech recognition and understanding: A first step toward natural human-machine communication”, Proc. IEEE, 88:8 (2000), 1142–1165 | DOI | MR
[33] W. Chou, “Minimum classification error (MCE) approach in pattern recognition”, Pattern Recognition in Speech and Language Processing, eds. W. Chou, B.-H. Juang, CRC Press, 2003, 1–49
[34] C. J. Leggetter, P. C. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models”, Computer Speech and Language, 1995, no. 9, 171–185 | DOI
[35] A. P. Varga, R. K. Moore, “Hidden Markov model decomposition of speech and noise”, Proc. ICASSP, 1990, 845–848
[36] M. J. F. Gales, S. J. Young, Parallel model combination for speech recognition in noise, Technical Report CUED/F-INFENG/TR135, 1993
[37] K. Shinoda, C. H. Lee, “A structural Bayes approach to speaker adaptation”, IEEE Trans. Speech and Audio Proc., 9:3 (2001), 276–287 | DOI
[38] A. Stolcke, J. Droppo, “Comparing Human and Machine Errors in Conversational Speech Transcription”, Proc. Interspeech, 2017, 137–141
[39] G. Saon, G. Kurata, T. Sercu, K. Audhkhasi, S. Thomas, D. Dimitriadis, X. Cui, B. Ramabhadran, M. Picheny, L.-L. Lim, B. Roomi, P. Hall, “English Conversational Telephone Speech Recognition by Humans and Machines”, INTERSPEECH, 2017 | Zbl
[40] R. P. Lippmann, “Speech recognition by machines and humans”, Speech Communication, 22:1 (1997), 1–15 | DOI
[41] M. L. Glenn, S. M. Strassel, H. Lee, K. Maeda, R. Zakhary, X. Li, “Transcription Methods for Consistency, Volume and Efficiency”, Proceedings of the International Conference on Language Resources and Evaluation, LREC (Malta, 2010)
[42] A. Hannun, Speech Recognition Is Not Solved, 2017. https://awni.github.io/speech-recognition
[43] C. Han, J. O'Sullivan, Y. Luo, J. Herrero, A. D. Mehta, N. Mesgarani, “Speaker-independent auditory attention decoding without access to clean speech sources”, Sci Adv., 5:5 (2019), eaav6134. (PMID: 31106271; PMCID: PMC652)