Short-Term Memory Mechanisms in the Goal-Directed Behavior of the Neural Network Agents
Matematičeskaâ biologiâ i bioinformatika, Tome 8 (2013), pp. 419-431.

See the article record from the Math-Net.Ru source

Modern machine learning methods are not able to achieve a level of adaptability comparable to that observed in animal behavior in complex environments with numerous goals. This fact necessitates investigating general principles for the formation of complex control systems capable of providing effective goal-directed behavior. We have developed an original neuroevolutionary model for agents situated in stochastic environments with a hierarchy of goals. The paper analyzes the evolutionary dynamics of the agents' behavioral strategies. The results demonstrate that evolution produces neural network controllers that allow agents to store information in short-term memory via several neurodynamical mechanisms and to use it for behavior based on alternative actions. While studying the neuronal basis of the agents' behavior, we found that groups of neurons could be responsible for different stages of behavior.
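The abstract describes evolving recurrent neural network controllers whose internal state serves as short-term memory. A minimal, self-contained sketch of such a neuroevolutionary loop is given below; the network sizes, the toy delayed-recall task, and all function names are hypothetical illustrations, not the model from the paper.

```python
import math
import random

# Toy dimensions of the recurrent controller (hypothetical, for illustration).
N_IN, N_HID, N_OUT = 3, 4, 2

def new_genome():
    # Genome = all connection weights of the recurrent network.
    n = N_HID * (N_IN + N_HID) + N_OUT * N_HID
    return [random.gauss(0.0, 0.5) for _ in range(n)]

def step(genome, x, h):
    # One update of the controller: the hidden state h carries
    # short-term memory between time steps.
    w, k = genome, 0
    new_h = []
    for _ in range(N_HID):
        s = sum(w[k + j] * v for j, v in enumerate(x + h))
        k += N_IN + N_HID
        new_h.append(math.tanh(s))
    out = []
    for _ in range(N_OUT):
        s = sum(w[k + j] * v for j, v in enumerate(new_h))
        k += N_HID
        out.append(math.tanh(s))
    return out, new_h

def fitness(genome):
    # Toy delayed-recall task: the agent must report, at a later probe
    # step, the sign of a cue seen earlier -- impossible without memory.
    score = 0.0
    for cue in (-1.0, 1.0, -1.0, 1.0):
        h = [0.0] * N_HID
        _, h = step(genome, [cue, 0.0, 0.0], h)   # cue step
        _, h = step(genome, [0.0, 0.0, 0.0], h)   # delay step
        out, h = step(genome, [0.0, 0.0, 1.0], h)  # probe step
        score += cue * out[0]  # reward recalling the cue's sign
    return score

def evolve(generations=30, pop_size=20):
    # Simple truncation selection with Gaussian weight mutation.
    random.seed(0)
    pop = [new_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        pop = parents + [
            [w + random.gauss(0.0, 0.1) for w in random.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
    return max(pop, key=fitness)
```

Calling `evolve()` returns the best genome found; its fitness can be compared against a freshly drawn random genome to see the effect of selection. This is only a schematic of the general technique (direct weight evolution of a recurrent controller), not the paper's specific environment or goal hierarchy.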
@article{MBB_2013_8_a8,
     author = {K. V. Lakhman and M. S. Burtsev},
     title = {Short-Term {Memory} {Mechanisms} in the {Goal-Directed} {Behavior} of the {Neural} {Network} {Agents}},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {419--431},
     publisher = {mathdoc},
     volume = {8},
     year = {2013},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2013_8_a8/}
}
TY  - JOUR
AU  - K. V. Lakhman
AU  - M. S. Burtsev
TI  - Short-Term Memory Mechanisms in the Goal-Directed Behavior of the Neural Network Agents
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2013
SP  - 419
EP  - 431
VL  - 8
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MBB_2013_8_a8/
LA  - ru
ID  - MBB_2013_8_a8
ER  - 
%0 Journal Article
%A K. V. Lakhman
%A M. S. Burtsev
%T Short-Term Memory Mechanisms in the Goal-Directed Behavior of the Neural Network Agents
%J Matematičeskaâ biologiâ i bioinformatika
%D 2013
%P 419-431
%V 8
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MBB_2013_8_a8/
%G ru
%F MBB_2013_8_a8
K. V. Lakhman; M. S. Burtsev. Short-Term Memory Mechanisms in the Goal-Directed Behavior of the Neural Network Agents. Matematičeskaâ biologiâ i bioinformatika, Tome 8 (2013), pp. 419-431. http://geodesic.mathdoc.fr/item/MBB_2013_8_a8/

[1] Botvinick M. M., Niv Y., Barto A. C., “Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective”, Cognition, 113:3 (2009), 262–280 <ext-link ext-link-type='doi' href='https://doi.org/10.1016/j.cognition.2008.08.011'>10.1016/j.cognition.2008.08.011</ext-link>

[2] Sutton R. S., Barto A. G., Reinforcement Learning: An Introduction, MIT Press, 1998

[3] Sutton R. S., Precup D., Singh S., “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning”, Artificial Intelligence, 112 (1999), 181–211 <ext-link ext-link-type='doi' href='https://doi.org/10.1016/S0004-3702(99)00052-1'>10.1016/S0004-3702(99)00052-1</ext-link><ext-link ext-link-type='mr-item-id' href='http://mathscinet.ams.org/mathscinet-getitem?mr=1716644'>1716644</ext-link><ext-link ext-link-type='zbl-item-id' href='https://zbmath.org/?q=an:0996.68151'>0996.68151</ext-link>

[4] Sutton R. S., Rafols E. J., Koop A., “Temporal abstraction in temporal-difference networks”, Proceedings of NIPS-18, MIT Press, 2006, 1313–1320

[5] Sutton R. S., Modayil J., Delp M., Degris T., Pilarski P. M., White A., Precup D., “Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction”, The 10th International Conference on Autonomous Agents and Multiagent Systems, v. 2, International Foundation for Autonomous Agents and Multiagent Systems, 2011, 761–768

[6] Barto A. G., Mahadevan S., “Recent advances in hierarchical reinforcement learning”, Discrete Event Dynamic Systems, 13:1–2 (2003), 41–77 <ext-link ext-link-type='doi' href='https://doi.org/10.1023/A:1022140919877'>10.1023/A:1022140919877</ext-link><ext-link ext-link-type='mr-item-id' href='http://mathscinet.ams.org/mathscinet-getitem?mr=1972050'>1972050</ext-link><ext-link ext-link-type='zbl-item-id' href='https://zbmath.org/?q=an:1018.93035'>1018.93035</ext-link>

[7] Singh S., Lewis R. L., Barto A. G., “Where do rewards come from?”, Proceedings of the 31st Annual Meeting of the Cognitive Science Society, Cognitive Science Society, 2009, 2601–2606

[8] Sandamirskaya Y., Schöner G., “An embodied account of serial order: How instabilities drive sequence generation”, Neural Networks, 23:10 (2010), 1164–1179 <ext-link ext-link-type='doi' href='https://doi.org/10.1016/j.neunet.2010.07.012'>10.1016/j.neunet.2010.07.012</ext-link>

[9] Komarov M. A., Osipov G. V., Burtsev M. S., “Adaptive functional systems: Learning with chaos”, Chaos, 20:4 (2010), 045119 <ext-link ext-link-type='doi' href='https://doi.org/10.1063/1.3521250'>10.1063/1.3521250</ext-link>

[10] Floreano D., Mondada F., “Automatic creation of an autonomous agent: genetic evolution of a neural-network driven robot”, From animals to animats 3, Proceedings of the third international conference on Simulation of adaptive behavior, MIT Press, 1994, 421–430

[11] Floreano D., Dürr P., Mattiussi C., “Neuroevolution: from architectures to learning”, Evolutionary Intelligence, 1 (2008), 47–62 <ext-link ext-link-type='doi' href='https://doi.org/10.1007/s12065-007-0002-4'>10.1007/s12065-007-0002-4</ext-link>

[12] Schrum J., Miikkulainen R., “Evolving multimodal networks for multitask games”, IEEE Transactions on Computational Intelligence and AI in Games, 4:2 (2012), 94–111 <ext-link ext-link-type='doi' href='https://doi.org/10.1109/TCIAIG.2012.2193399'>10.1109/TCIAIG.2012.2193399</ext-link>

[13] Kaelbling L. P., Littman M. L., Moore A. W., “Reinforcement learning: a survey”, Journal of Artificial Intelligence Research, 4 (1996), 237–285

[14] Hochreiter S., Informatik F. F., Bengio Y., Frasconi P., Schmidhuber J., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies”, Field Guide to Dynamical Recurrent Networks, eds. Kolen J., Kremer S., IEEE Press, 2001

[15] Botvinick M. M., Plaut D. C., “Short-term memory for serial order: A recurrent neural network model”, Psychological Review, 113 (2006), 201–233 <ext-link ext-link-type='doi' href='https://doi.org/10.1037/0033-295X.113.2.201'>10.1037/0033-295X.113.2.201</ext-link>

[16] Grossberg S., “Contour enhancement, short term memory, and constancies in reverberating neural networks”, Studies in Applied Mathematics, 52:3 (1973), 213–257 <ext-link ext-link-type='mr-item-id' href='http://mathscinet.ams.org/mathscinet-getitem?mr=359862'>359862</ext-link><ext-link ext-link-type='zbl-item-id' href='https://zbmath.org/?q=an:0281.92005'>0281.92005</ext-link>

[17] Anokhin P., Biology and Neurophysiology of the Conditioned Reflex and Its Role in Adaptive Behavior, Pergamon Press, 1974

[18] Edelman G., Neural Darwinism: The Theory of Neuronal Group Selection, Basic Books, 1987

[19] Taylor J. S., Raes J., “Duplication and divergence: the evolution of new genes and old ideas”, Annual Review of Genetics, 38 (2004), 615–643 <ext-link ext-link-type='doi' href='https://doi.org/10.1146/annurev.genet.38.072902.092831'>10.1146/annurev.genet.38.072902.092831</ext-link>

[20] Stanley K. O., Miikkulainen R., “Evolving neural networks through augmenting topologies”, Evolutionary Computation, 10:2 (2002), 99–127 <ext-link ext-link-type='doi' href='https://doi.org/10.1162/106365602320169811'>10.1162/106365602320169811</ext-link>