Mathematical methods in automatic speech recognition

Mihov, Stoyan

Mathematics and Education in Mathematics, Volume 49 (2020), pp. 114-122

This article was harvested from the Bulgarian Digital Mathematics Library.

Abstract

We present the mathematical methods used in Automatic Speech Recognition. The presentation is divided into three parts. We start with a short overview of the vocal tract and the corresponding acoustic equations. We then introduce the digital signal processing performed on the speech signal in order to extract Mel-frequency cepstral coefficients, which correspond to the articulation configuration. In the second part we present an approach to acoustic modeling based on time-delay deep neural networks and discuss the methodology for the machine learning of the acoustic model. In the third part we describe the use of finite-state f-transducers for representing the language model. For decoding the signal we briefly present the Viterbi and beam-search algorithms over a Hidden Markov Model represented as an f-transducer. Finally, we show experimental results for automatic speech recognition of the Bulgarian language.
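
The decoding step named in the abstract can be illustrated with a minimal sketch of Viterbi decoding over a small discrete Hidden Markov Model. This is not the paper's implementation: the two states `s0` and `s1`, the three observation symbols, and all probabilities are hypothetical toy values chosen only to make the example runnable, and a plain dictionary-based HMM stands in for the f-transducer representation the paper describes.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    All model parameters are log-probabilities stored in nested dicts.
    """
    if not observations:
        return 0.0, []

    # best[s]: log-probability of the best path that ends in state s.
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []

    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            step[s] = best[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        backpointers.append(pointers)
        best = step

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: two hidden states, three observation symbols.
STATES = ["s0", "s1"]
LOG_START = to_log({"s0": 0.6, "s1": 0.4})
LOG_TRANS = to_log({"s0": {"s0": 0.7, "s1": 0.3},
                    "s1": {"s0": 0.4, "s1": 0.6}})
LOG_EMIT = to_log({"s0": {0: 0.5, 1: 0.4, 2: 0.1},
                   "s1": {0: 0.1, 1: 0.3, 2: 0.6}})

best_log_prob, best_path = viterbi([0, 1, 2], STATES,
                                   LOG_START, LOG_TRANS, LOG_EMIT)
```

Working in the log domain avoids numerical underflow on long observation sequences. A beam-search variant would, under the same assumptions, keep only the highest-scoring states in `best` at each step instead of all of them, trading exactness for speed.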

Keywords: speech corpus, automatic speech recognition, low-resource language; MSC: 68T10, 94A12, 68T07, 68Q45

@incollection{MEM_2020_49_a10,
     author = {Mihov, Stoyan},
     title = {Mathematical methods in automatic speech recognition},
     booktitle = {},
     series = {Mathematics and Education in Mathematics},
     pages = {114--122},
     year = {2020},
     volume = {49},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/MEM_2020_49_a10/}
}

TY  - JOUR
AU  - Mihov, Stoyan
TI  - Mathematical methods in automatic speech recognition
JO  - Mathematics and Education in Mathematics
PY  - 2020
SP  - 114
EP  - 122
VL  - 49
UR  - http://geodesic.mathdoc.fr/item/MEM_2020_49_a10/
LA  - en
ID  - MEM_2020_49_a10
ER  -

%0 Journal Article
%A Mihov, Stoyan
%T Mathematical methods in automatic speech recognition
%J Mathematics and Education in Mathematics
%D 2020
%P 114-122
%V 49
%U http://geodesic.mathdoc.fr/item/MEM_2020_49_a10/
%G en
%F MEM_2020_49_a10

Mihov, Stoyan. Mathematical methods in automatic speech recognition. Mathematics and Education in Mathematics, Volume 49 (2020), pp. 114-122. http://geodesic.mathdoc.fr/item/MEM_2020_49_a10/
