Reinforcement learning of a spiking neural network in the task of control of an agent in a virtual discrete environment
Russian journal of nonlinear dynamics, Vol. 7 (2011) no. 4, pp. 859-875.

See the article record from the Math-Net.Ru source

A method of reinforcement learning for a spiking neural network that controls a robot or a virtual agent is described. Using spiking neurons as the key elements of the network makes it possible to exploit the spatial and temporal structure of the input sensory information. The network is trained with reinforcement signals that come from the external environment and reflect the success of the agent's recent actions. Maximization of the received reinforcement is achieved via modulated minimization of the neurons' informational entropy, which depends on the neurons' weights. The resulting weight-update rules are close to the modulated synaptic plasticity observed in real neurons. The reinforcement learning algorithm was tested on a resource-search task in a virtual discrete environment.
Keywords: spiking neuron, adaptive control, reinforcement learning, informational entropy.
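The abstract describes reward-modulated synaptic plasticity: weight changes driven by spike timing and gated by an external reinforcement signal. A minimal illustrative sketch of one such update step is below; the exponential eligibility trace, the global scalar reward, and the parameter names `lr` and `tau` are assumptions for illustration, not the paper's entropy-minimization rule.

```python
import numpy as np

def reward_modulated_update(weights, pre_spike_times, post_spike_time,
                            reward, lr=0.01, tau=20.0):
    """One reward-modulated plasticity step for a single spiking neuron.

    Each synapse gets an eligibility trace that decays exponentially with
    the delay between its pre-synaptic spike and the post-synaptic spike;
    the reward signal then gates whether (and in which direction) the
    trace is written into the weights. Illustrative sketch only.
    """
    # Pre spikes that arrive shortly before the post spike are most eligible;
    # pre spikes after the post spike (dt <= 0) contribute nothing here.
    dt = post_spike_time - pre_spike_times
    eligibility = np.where(dt > 0, np.exp(-dt / tau), 0.0)
    # A positive reward strengthens eligible synapses, a negative one weakens them.
    return weights + lr * reward * eligibility

# Three synapses with pre-synaptic spikes at 5, 15 and 30 ms; post spike at 20 ms.
w = np.array([0.5, 0.5, 0.5])
pre = np.array([5.0, 15.0, 30.0])
w_new = reward_modulated_update(w, pre, post_spike_time=20.0, reward=1.0)
```

With a positive reward, the synapse whose pre spike immediately preceded the post spike is strengthened the most, while the synapse whose pre spike came after the post spike is unchanged.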
@article{ND_2011_7_4_a8,
     author = {O. Yu. Sinyavskiy and A. I. Kobrin},
     title = {Reinforcement learning of a spiking neural network in the task of control of an agent in a virtual discrete environment},
     journal = {Russian journal of nonlinear dynamics},
     pages = {859--875},
     publisher = {mathdoc},
     volume = {7},
     number = {4},
     year = {2011},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/ND_2011_7_4_a8/}
}
TY  - JOUR
AU  - O. Yu. Sinyavskiy
AU  - A. I. Kobrin
TI  - Reinforcement learning of a spiking neural network in the task of control of an agent in a virtual discrete environment
JO  - Russian journal of nonlinear dynamics
PY  - 2011
SP  - 859
EP  - 875
VL  - 7
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ND_2011_7_4_a8/
LA  - ru
ID  - ND_2011_7_4_a8
ER  - 
%0 Journal Article
%A O. Yu. Sinyavskiy
%A A. I. Kobrin
%T Reinforcement learning of a spiking neural network in the task of control of an agent in a virtual discrete environment
%J Russian journal of nonlinear dynamics
%D 2011
%P 859-875
%V 7
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ND_2011_7_4_a8/
%G ru
%F ND_2011_7_4_a8
O. Yu. Sinyavskiy; A. I. Kobrin. Reinforcement learning of a spiking neural network in the task of control of an agent in a virtual discrete environment. Russian journal of nonlinear dynamics, Vol. 7 (2011) no. 4, pp. 859-875. http://geodesic.mathdoc.fr/item/ND_2011_7_4_a8/

[1] Nicholls J. G., Martin A. R., Wallace B. G., Fuchs P. A., From neuron to brain: A cellular and molecular approach to the function of the nervous system, 4th ed., Sinauer Associates, Sunderland, MA, 2001, 679 pp.; Nikolls Dzh. G., Martin A. R., Vallas B. Dzh., Fuks P. A., Ot neirona k mozgu, URSS, M., 2003, 688 pp.

[2] Gerstner W., Kistler W. M., Spiking neuron models: Single neurons, populations, plasticity, Cambridge Univ. Press, Cambridge, 2002, 494 pp. | MR | Zbl

[3] Melamed O., Gerstner W., Maass W., Tsodyks M., Markram H., “Coding and learning of behavioral sequences”, Trends in Neurosciences, 27:1 (2004), 11–14 | DOI

[4] Maass W., “Networks of spiking neurons: The third generation of neural network models”, Trans. Soc. Comput. Simul. Int., 14:4 (1997), 1659–1671

[5] Rieke F., Warland D., de Ruyter van Steveninck R., Bialek W., Spikes: Exploring the neural code, Computational Neurosciences series, MIT Press, Cambridge, MA, 1997, 395 pp. | MR

[6] Di Paolo E. A., “Spike-timing dependent plasticity for evolved robots”, Adaptive Behavior, 10:3–4 (2002), 243–263 | DOI

[7] Saggie K., Keinan A., Ruppin E., “Solving a delayed response task with spiking and McCulloch–Pitts agents”, Advances in Artificial Life, Proc. of the 7th European Conf. on Artificial Life (ECAL) (Dortmund, Germany, 2003), eds. W. Banzhaf, T. Christaller, P. Dittrich, J. T. Kim, J. Ziegler, Springer, Berlin–Heidelberg, 2003, 199–208 | DOI

[8] Antonelo E. A., Schrauwen B., Stroobandt D., “Mobile robot control in the road sign problem using Reservoir Computing networks”, IEEE Internat. Conf. on Robotics and Automation (ICRA) (Pasadena, CA, 2008), eds. S. Hutchinson et al., Pasadena, CA, 2008, 911–916 | DOI

[9] Queiroz M. S., Braga A., Berredo R. C., “Reinforcement learning of a simple control task using the spike response model”, Neurocomputing, 70:1–3 (2006), 14–20 | DOI

[10] Lee K., Kwon D.-S., “Synaptic plasticity model of a spiking neural network for reinforcement learning”, Neurocomputing, 71:13 (2008), 3037–3043 | DOI

[11] Florian R. V., “A reinforcement learning algorithm for spiking neural networks”, SYNASC'05, Proc. of the 7th Internat. Symp. on Symbolic and Numeric Algorithms for Scientific Computing (Timisoara, Romania, 2005), Timisoara, 2005, 299–306

[12] Florian R. V., “Spiking neural controllers for pushing objects around”, From Animals to Animats 9 (SAB'06), Proc. of the 9th Internat. Conf. on the Simulation of Adaptive Behavior (Rome, Italy, 2006), Lecture Notes in Artificial Intelligence, 4095, eds. S. Nolfi, G. Baldassare, R. Calabretta, J. Hallam, D. Marocco, O. Miglino, J.-A. Meyer, D. Parisi, Springer, Berlin–Heidelberg, 2006, 570–581

[13] Burgsteiner H., “Training networks of biologically realistic spiking neurons for real-time robot control”, Proc. of the 9th Internat. Conf. on Engineering Applications of Neural Networks (Lille, France, 2005), Lille, 2005, 129–136

[14] Floreano D., Zufferey J.-C., Mattiussi C., “Evolving spiking neurons from wheels to wings”, Dynamic Systems Approach for Embodiment and Sociality, 6 (2003), 65–70

[15] Wiles J., Ball D., Heath S., Nolan C., Stratton P., “Spike-time robotics: A rapid response circuit for a robot that seeks temporally varying stimuli”, Australian J. of Intelligent Information Processing Systems, 11:1 (2010), 10 pp.

[16] Damper R. I., French R. L. B., Scutt T. W., “ARBIB: An autonomous robot based on inspirations from biology”, Robotics and Autonomous Systems, 31:4 (1998), 247–274 | DOI

[17] Alnajjar F., Murase K., “A simple Aplysia-like spiking neural network to generate adaptive behavior in autonomous robots”, Adaptive Behavior, 16:5 (2008), 306–324 | DOI

[18] Soula H., Alwan A., Beslon G., “Learning at the edge of chaos: Temporal coupling of spiking neurons controller for autonomous robotic”, Proc. of American Association for Artificial Intelligence (AAAI) Spring Symposia on Developmental Robotic (Stanford, CA, 2005), eds. D. Bank, L. Meeden, AAAI Press, Menlo Park, CA, 2005, 6 pp.

[19] Nolfi S., Floreano D., “Synthesis of autonomous robots through evolution”, Trends in Cognitive Sciences, 6:1 (2002), 31–37 | DOI

[20] Joshi P., Maass W., “Movement generation with circuits of spiking neurons”, Neural Computation, 17:8 (2005), 1715–1738 | DOI | Zbl

[21] Carrillo R., Ros E., Boucheny C., Coenen O. J.-M. D., “A real-time spiking cerebellum model for learning robot control”, Biosystems, 94:1–2 (2008), 18–27 | DOI

[22] Boucheny Ch., Carrillo R., Ros E., Coenen O. J.-M. D., “Real-time spiking neural network: An adaptive cerebellar model”, Proc. of the 8th Internat. Work-Conf. on Artificial Neural Networks, Computational Intelligence and Bioinspired Systems, Lecture Notes in Computer Science, 3512, eds. J. Cabestany, A. Prieto, F. Sandoval Hernández, Springer, Berlin–Heidelberg, 2005, 136–144 | DOI

[23] Manoonpong P., Wörgötter F., Pasemann F., “Biological inspiration for mechanical design and control of autonomous walking robots: Towards life-like robots”, The International Journal of Applied Biomedical Engineering (IJABME), 3:1 (2010), 1–12

[24] Maass W., Natschläger T., Markram H., “Real-time computing without stable states: A new framework for neural computation based on perturbations”, Neural Computation, 14:11 (2002), 2531–2560 | DOI | MR | Zbl

[25] Sutton R. S., Barto A. G., Reinforcement learning: An introduction, MIT Press, Cambridge, MA, 1998, 323 pp.

[26] Baxter J., Weaver L., Bartlett P. L., Direct gradient-based reinforcement learning: II. Gradient ascent algorithms and experiments, Technical report, Australian National University, Research School of Information Sciences and Engineering, 1999

[27] Bellman R., “A Markovian decision process”, J. Math. Mech., 6 (1957), 679–684 | MR | Zbl

[28] Farries M. A., Fairhall A. L., “Reinforcement learning with modulated spike timing-dependent synaptic plasticity”, J. Neurophysiol., 98 (2007), 3648–3665 | DOI

[29] Baras D., Meir R., “Reinforcement learning, spike-time-dependent plasticity and the BCM rule”, Neural Computation, 19:8 (2007), 2245–2279 | DOI | MR | Zbl

[30] Sinyavskii O. Yu., Kobrin A. I., “Ispolzovanie informatsionnykh kharakteristik potoka impulsnykh signalov dlya obucheniya spaikovykh neironnykh setei” [Using informational characteristics of a spike-signal flow for training spiking neural networks], Integrirovannye modeli i myagkie vychisleniya v iskusstvennom intellekte [Integrated models and soft computing in artificial intelligence], Sb. nauchn. tr. (Kolomna, 2009), v. 2, M., 2009, 678–687

[31] Levine M. W., Shefner J. M., Fundamentals of sensation and perception, 2nd ed., Brooks/Cole, Pacific Grove, CA, 1991, 675 pp.

[32] Rejeb L., Guessoum Z., M'Hallah R., “An adaptive approach for the exploration-exploitation dilemma for learning agents”, Multi-Agent Systems and Applications IV, 4th Internat. Central and Eastern European Conf. on Multi-Agent Systems (Budapest, Hungary, 2005), Lecture Notes in Comput. Sci., 3690, eds. M. Pechoucek, P. Petta, L. Zsolt Varga, Springer, Berlin, 2005, 316–325 | DOI

[33] Pfister J. P., Toyoizumi T., Barber D., Gerstner W., “Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning”, Neural Comput., 18:6 (2006), 1318–1348 | DOI | MR | Zbl

[34] Sinyavskii O. Yu., Kobrin A. I., “Obuchenie spaikovogo neirona s uchitelem v zadache detektirovaniya prostranstvenno-vremennogo impulsnogo patterna” [Supervised learning of a spiking neuron in the task of detecting a spatiotemporal spike pattern], Neirokompyutery: razrabotka i primenenie, 8 (2010), 69–76 | Zbl

[35] Bartlett P. L., Baxter J., A biologically plausible and locally optimal learning algorithm for spiking neurons, 2000 http://arp.anu.edu.au/ftp/papers/jon/brains.pdf.gz

[36] Bi G. Q., Poo M. M., “Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type”, The Journal of Neuroscience, 18:24 (1998), 10464–10472

[37] Legenstein R., Pecevski D., Maass W., “A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback”, PLoS Comput. Biol., 4:10 (2008), e1000180 | DOI | MR

[38] Izhikevich E. M., “Solving the distal reward problem through linkage of STDP and dopamine signaling”, Cerebral Cortex, 17 (2007), 2443–2452 | DOI

[39] Frémaux N., Sprekeler H., Gerstner W., “Functional requirements for reward-modulated spike-timing-dependent plasticity”, The Journal of Neuroscience, 30:40 (2010), 13326–13337 | DOI