Adaptive noise cancellation for robust speech recognition in noisy environments
Proceedings of the Yerevan State University. Physical and mathematical sciences, Vol. 58 (2024), no. 1, pp. 22-29.

See the article record from the source Math-Net.Ru

In this paper, we address the challenges that arise when noise cancellation and automatic speech recognition (ASR) models are combined. When these models are combined directly, word recognition accuracy often degrades because the distribution of the input data seen by the ASR model changes. To overcome this limitation, we propose a novel method for combining the models that improves the ability of the speech recognition model to perform well in noisy environments. The key feature of the proposed method is a mechanism that controls the aggressiveness of noise reduction. This mechanism lets us tailor the noise reduction process to the specific requirements of the ASR model without any retraining. As a result, the method is applicable to any ASR model, which facilitates its implementation in practical scenarios.
Keywords: automatic speech recognition, noise robustness, noise cancellation, domain adaptation
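
The abstract does not specify how the aggressiveness of noise reduction is controlled. As a purely illustrative sketch, and not the author's method, one simple way to expose such a control is to interpolate between the per-bin suppression gains produced by a denoiser and a gain of 1 (no suppression), then pick the interpolation weight that gives the lowest error for a fixed, unmodified ASR model. The names soften_gains, apply_gains, pick_alpha, the parameter alpha, and the asr_error callable are all hypothetical and introduced here only for illustration.

```python
from typing import Callable, List, Sequence


def soften_gains(gains: Sequence[float], alpha: float) -> List[float]:
    """Blend denoiser gains with unity gain.

    alpha = 0.0 leaves the signal untouched; alpha = 1.0 applies the
    denoiser's gains in full. (Assumed control knob, not from the paper.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * g + (1.0 - alpha) for g in gains]


def apply_gains(noisy_mag: Sequence[float], gains: Sequence[float],
                alpha: float) -> List[float]:
    """Scale each noisy magnitude by its softened gain."""
    return [s * x for s, x in zip(soften_gains(gains, alpha), noisy_mag)]


def pick_alpha(noisy_mag: Sequence[float], gains: Sequence[float],
               asr_error: Callable[[List[float]], float],
               candidates: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    """Return the candidate alpha with the lowest ASR error.

    asr_error is a hypothetical callable that scores the denoised input
    with a frozen ASR model; no ASR retraining is involved.
    """
    return min(candidates,
               key=lambda a: asr_error(apply_gains(noisy_mag, gains, a)))
```

In this sketch the ASR model is only queried through asr_error, so the weight can be tuned per recognizer on a small validation set while both models stay frozen.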
@article{UZERU_2024_58_1_a3,
     author = {D. S. Karamyan},
     title = {Adaptive noise cancellation for robust speech recognition in noisy environments},
     journal = {Proceedings of the Yerevan State University. Physical and mathematical sciences},
     pages = {22--29},
     publisher = {mathdoc},
     volume = {58},
     number = {1},
     year = {2024},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/UZERU_2024_58_1_a3/}
}
D. S. Karamyan. Adaptive noise cancellation for robust speech recognition in noisy environments. Proceedings of the Yerevan State University. Physical and mathematical sciences, Vol. 58 (2024), no. 1, pp. 22-29. http://geodesic.mathdoc.fr/item/UZERU_2024_58_1_a3/
