Learn to Human-level Control in Dynamic Environment Using Incremental Batch Interrupting Temporal Abstraction
Computer Science and Information Systems, Volume 13 (2016) no. 2.

See the article record from the source: Computer Science and Information Systems website

The temporal world is dynamic and variable. Many machine learning algorithms are difficult to apply directly to practical control applications, whereas hierarchical reinforcement learning can handle them. It is also common to have partial solutions available, called options, which are learned from knowledge or predefined by the system and which solve sub-tasks of the problem. Options can be reused for policy determination in control. Many traditional semi-Markov decision process methods take advantage of them, but most treat the option as a primitive object. As a result, given the uncertainty and variability of the environment, they cannot deal effectively with real-world control problems. Based on the idea of interrupting options in a dynamic environment, a Q-learning control method that uses temporal abstraction, named I-QOption, is introduced. The I-QOption approach combines the idea of interruption with the characteristics of a dynamic environment so that it can learn and improve a control policy in that environment. The Q-learning framework helps it learn from interaction with raw data and achieve human-level control. The I-QOption algorithm is applied to grid world, a benchmark dynamic environment used for evaluation. The experimental results show that the proposed algorithm can learn and improve its policy effectively in a dynamic environment.
Keywords: hierarchical reinforcement learning, option, reinforcement learning, online learning, dynamic environment
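To make the idea of interrupting an option more concrete, the following is a minimal sketch of the generic interruption rule and the semi-Markov Q-learning update from the options literature. It is not the I-QOption algorithm itself, whose details are not given in this record. All names here (should_interrupt, smdp_q_update, the option labels, the state labels) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def should_interrupt(q, state, option, options):
    """Return True when continuing `option` in `state` looks worse than
    switching to the best available option (generic interruption rule)."""
    best_value = max(q[(state, o)] for o in options)
    return q[(state, option)] < best_value


def smdp_q_update(q, state, option, discounted_return, k, next_state,
                  options, alpha=0.1, gamma=0.9):
    """Semi-Markov Q-learning update after `option` ran for k primitive steps.

    `discounted_return` is the reward accumulated during those k steps,
    already discounted step by step with `gamma`.
    """
    best_next = max(q[(next_state, o)] for o in options)
    target = discounted_return + (gamma ** k) * best_next
    q[(state, option)] += alpha * (target - q[(state, option)])
    return q[(state, option)]


# Hypothetical two-option example on abstract state labels.
options = ["go_left", "go_right"]
q = defaultdict(float)          # unseen (state, option) pairs default to 0.0
q[("s1", "go_right")] = 1.0     # assumed prior estimate, for illustration only
```

In a dynamic environment the value estimates can shift while an option is still running, so checking should_interrupt at each primitive step lets the agent abandon an option that has become a poor choice instead of treating it as an indivisible action.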
@article{CSIS_2016_13_2_a14,
     author = {Yuchen Fu and Zhipeng Xu and Fei Zhu and Quan Liu and Xiaoke Zhou},
     title = {Learn to {Human-level} {Control} in {Dynamic} {Environment} {Using} {Incremental} {Batch} {Interrupting} {Temporal} {Abstraction}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {13},
     number = {2},
     year = {2016},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2016_13_2_a14/}
}
TY  - JOUR
AU  - Yuchen Fu
AU  - Zhipeng Xu
AU  - Fei Zhu
AU  - Quan Liu
AU  - Xiaoke Zhou
TI  - Learn to Human-level Control in Dynamic Environment Using Incremental Batch Interrupting Temporal Abstraction
JO  - Computer Science and Information Systems
PY  - 2016
VL  - 13
IS  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2016_13_2_a14/
ID  - CSIS_2016_13_2_a14
ER  - 
%0 Journal Article
%A Yuchen Fu
%A Zhipeng Xu
%A Fei Zhu
%A Quan Liu
%A Xiaoke Zhou
%T Learn to Human-level Control in Dynamic Environment Using Incremental Batch Interrupting Temporal Abstraction
%J Computer Science and Information Systems
%D 2016
%V 13
%N 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2016_13_2_a14/
%F CSIS_2016_13_2_a14
Yuchen Fu; Zhipeng Xu; Fei Zhu; Quan Liu; Xiaoke Zhou. Learn to Human-level Control in Dynamic Environment Using Incremental Batch Interrupting Temporal Abstraction. Computer Science and Information Systems, Volume 13 (2016) no. 2. http://geodesic.mathdoc.fr/item/CSIS_2016_13_2_a14/