Reinforcement learning for manipulator control
Russian journal of nonlinear dynamics, Volume 8 (2012), no. 4, pp. 689-704.

View the article record from the source Math-Net.Ru

We present a method for constructing a manipulator control system based on a reinforcement learning algorithm. The learning algorithm uses information about the actions performed and their quality with respect to the desired behaviour, expressed as a "reward". The goal of the learning algorithm is to construct a control system that maximizes the total reward. The learning algorithm and the resulting control system were tested on a manipulator collision-avoidance problem.
Keywords: reinforcement learning, manipulator, control, Newton–Euler algorithm.
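The abstract only outlines the approach, so the following is a minimal Python sketch of the general idea: a controller trained by maximizing total reward, with a penalty for collisions. The two-link kinematic arm, the linear policy, the reward shape and the naive random-search optimizer are illustrative assumptions and do not reproduce the algorithm or the Newton-Euler dynamic model used in the paper.

# Minimal policy-search sketch for reward-driven manipulator control.
# All names, constants and the reward shape below are illustrative assumptions,
# NOT the formulation used in the paper (which relies on a Newton-Euler dynamic model).
import numpy as np

LINKS = np.array([1.0, 0.8])      # link lengths of a planar 2-link arm
TARGET = np.array([1.2, 0.9])     # desired end-effector position
OBSTACLE = np.array([0.6, 0.6])   # centre of a circular obstacle to avoid
OBSTACLE_R = 0.25                 # obstacle radius
DT, STEPS = 0.05, 60              # time step and episode length


def forward_kinematics(q):
    """End-effector position of the 2-link arm for joint angles q."""
    x = LINKS[0] * np.cos(q[0]) + LINKS[1] * np.cos(q[0] + q[1])
    y = LINKS[0] * np.sin(q[0]) + LINKS[1] * np.sin(q[0] + q[1])
    return np.array([x, y])


def rollout(w):
    """Run one episode with a linear policy u = w @ features; return the total reward."""
    q = np.zeros(2)
    total_reward = 0.0
    for _ in range(STEPS):
        ee = forward_kinematics(q)
        features = np.concatenate([TARGET - ee, np.cos(q), np.sin(q)])
        u = np.clip(w @ features, -2.0, 2.0)    # joint-velocity command
        q = q + DT * u                          # purely kinematic update
        ee = forward_kinematics(q)
        reward = -np.linalg.norm(ee - TARGET)   # get closer to the target...
        if np.linalg.norm(ee - OBSTACLE) < OBSTACLE_R:
            reward -= 10.0                      # ...while avoiding the obstacle
        total_reward += reward
    return total_reward


def train(iterations=300, pop=20, sigma=0.1, seed=0):
    """Naive random-search policy optimization maximizing the total reward."""
    rng = np.random.default_rng(seed)
    w = np.zeros((2, 6))
    best = rollout(w)
    for _ in range(iterations):
        for _ in range(pop):
            candidate = w + sigma * rng.standard_normal(w.shape)
            r = rollout(candidate)
            if r > best:
                best, w = r, candidate
    return w, best


if __name__ == "__main__":
    w, best = train()
    print("best total reward:", round(best, 3))

Any gradient-based policy-search method, for example the natural actor-critic of [9], could replace the naive random search in train(); the point of the sketch is only the episodic structure of learning from a scalar total reward.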
@article{ND_2012_8_4_a1,
     author = {Nataly P. Koshmanova and Dmitry S. Trifonov and Vladimir E. Pavlovsky},
     title = {Reinforcement learning for manipulator control},
     journal = {Russian journal of nonlinear dynamics},
     pages = {689--704},
     publisher = {mathdoc},
     volume = {8},
     number = {4},
     year = {2012},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/ND_2012_8_4_a1/}
}
TY  - JOUR
AU  - Nataly P. Koshmanova
AU  - Dmitry S. Trifonov
AU  - Vladimir E. Pavlovsky
TI  - Reinforcement learning for manipulator control
JO  - Russian journal of nonlinear dynamics
PY  - 2012
SP  - 689
EP  - 704
VL  - 8
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ND_2012_8_4_a1/
LA  - ru
ID  - ND_2012_8_4_a1
ER  - 
%0 Journal Article
%A Nataly P. Koshmanova
%A Dmitry S. Trifonov
%A Vladimir E. Pavlovsky
%T Reinforcement learning for manipulator control
%J Russian journal of nonlinear dynamics
%D 2012
%P 689-704
%V 8
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ND_2012_8_4_a1/
%G ru
%F ND_2012_8_4_a1
Nataly P. Koshmanova; Dmitry S. Trifonov; Vladimir E. Pavlovsky. Reinforcement learning for manipulator control. Russian journal of nonlinear dynamics, Volume 8 (2012), no. 4, pp. 689-704. http://geodesic.mathdoc.fr/item/ND_2012_8_4_a1/

[1] Yurevich E. I., Osnovy robototekhniki (Fundamentals of Robotics), 2nd ed., BKhV-Peterburg, St. Petersburg, 2005, 7–207

[2] Spong M. W., Hutchinson S., Vidyasagar M., Robot modeling and control, Wiley, New York, 2005, 1–328

[3] Craig J. J., Introduction to robotics: Mechanics and control, 3rd ed., Addison-Wesley, Reading, MA, 1986, 19–256

[4] Buss S. R., “Introduction to inverse kinematics with Jacobian transpose, pseudoinverse, and damped least squares methods”, IEEE J. Robot. Autom., 3 (2004), 681–685

[5] Khatib O., “A unified approach for motion and force control of robot manipulators: The operational space formulation”, IEEE J. Robot. Autom., 3 (1987), 43–53 | DOI

[6] Khatib O., Yokoi K., Chang K., Ruspini D., Holmberg R., Casal A., Baader A., “Force strategies for cooperative tasks in multiple mobile manipulation systems”, Proc. of the Internat. Symposium on Robotics Research (Herrsching, Germany, 1995), eds. G. Giralt, G. Hirzinger, 333–342

[7] Stückler J., Behnke S., “Compliant task-space control with back-drivable servo actuators”, Proc. of RoboCup International Symposium (Istanbul, Turkey, 2011), 78–98

[8] Sutton R., Barto A., Reinforcement learning: An Introduction, MIT Press, Cambridge, MA, 1998, 322 pp.

[9] Peters J., Schaal S., “Natural actor-critic”, Neurocomputing, 71 (2008), 1180–1190 | DOI

[10] van Hasselt H., Wiering M. A., “Reinforcement learning in continuous action spaces”, IEEE Internat. Symp. on Approximate Dynamic Programming and Reinforcement Learning, 2007, 272–279

[11] Peters J., Vijayakumar S., Schaal S., “Reinforcement learning for humanoid robotics”, 3rd IEEE-RAS Internat. Conf. on Humanoid Robots (Karlsruhe, Germany, 2003), 1–20

[12] Peters J., Schaal S., “Reinforcement learning of motor skills with policy gradients”, Neural Netw., 21 (2008), 682–697 | DOI

[13] Girgin S., Preux P., “Basis expansion in natural actor critic methods”, Recent Advances in Reinforcement Learning, Proc. of the 8th European Workshop (Villeneuve d'Ascq, France, June 30–July 3, 2008), 110–123

[14] Schaal S., “Learning robot control”, The handbook of brain theory and neural networks, 2nd ed., ed. M. A. Arbib, MIT Press, Cambridge, MA, 2003, 983–987

[15] Vengerov D., “A gradient-based reinforcement learning approach to dynamic pricing in partially-observable environments”, Future Generation Computer Systems, 24:7 (2008), 687–693 | DOI

[16] Schaal S., Peters J., Nakanishi J., Ijspeert A., “Learning movement primitives”, Robotics Research, 15 (2004), 561–572

[17] Ito M., Noda K., Hoshino Y., Tani J., “Dynamic and interactive generation of object handling behaviours by a small humanoid robot using a dynamic neural network model”, Neural Netw., 19:3 (2006), 323–337 | DOI

[18] Ude A., “Trajectory generation from noisy positions of object features for teaching robot paths”, Robotics and Autonomous Systems, 11 (1994), 113–127 | DOI

[19] Tso S., Liu K., “Hidden Markov model for intelligent extraction of robot trajectory command from demonstrated trajectories”, Proc. IEEE Internat. Conf. on Industrial Technology (ICIT), 1996, 294–298

[20] Yang J., Xu Y., Chen C., “Human action learning via hidden Markov model”, IEEE Trans. Syst. Man Cybernet., 27:1 (1997), 34–44 | DOI