The basic elements for cognitive model of speech perception mechanism on the base of multi-agent recursive intellect
News of the Kabardin-Balkar scientific center of RAS, no. 3 (2019), pp. 3-14.

See the article record from the source Math-Net.Ru

This paper analyzes the generalized architecture used in almost all modern automatic speech recognition systems and argues that a fundamentally new approach to speech recognition is needed. A formal description of the structure of the speech perception act is proposed as a general theoretical basis for developing universal automatic speech recognition systems that remain effective under high noise and in “cocktail party” situations. A general structural dynamics of the speech recognition process is developed that takes into account both the linguistic and the extra-linguistic aspects of a speech message. The concept of an articulation event is proposed as the minimal basic pattern for recognizing a sound image. The recognition process is structured according to the functional determinants of the situation. Two points are fundamental to the approach: the numerous sources of information accompanying the sound message must be analyzed, and the search for an invariant is abandoned. Multi-agent systems were chosen as the formal means of implementation. The multi-agent approach makes it possible to differentiate and analyze sounds of different nature, which sets the proposed model apart and gives it advantages in “cocktail party” situations and in tasks where the noise level is extremely high.
Keywords: artificial intelligence, multi-agent systems, speech recognition, artificial neural networks.
@article{IZKAB_2019_3_a0,
     author = {Z. V. Nagoev and I. A. Gurtueva},
     title = {The basic elements for cognitive model of speech perception mechanism on the base of multi-agent recursive intellect},
     journal = {News of the Kabardin-Balkar scientific center of RAS},
     pages = {3--14},
     publisher = {mathdoc},
     number = {3},
     year = {2019},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/IZKAB_2019_3_a0/}
}
Z. V. Nagoev; I. A. Gurtueva. The basic elements for cognitive model of speech perception mechanism on the base of multi-agent recursive intellect. News of the Kabardin-Balkar scientific center of RAS, no. 3 (2019), pp. 3-14. http://geodesic.mathdoc.fr/item/IZKAB_2019_3_a0/