TY - JOUR
T1 - Behavior learning based on a policy gradient method
T2 - Separation of environmental dynamics and state-values in policies
AU - Ishihara, Seiji
AU - Igarashi, Harukazu
PY - 2009/1/1
Y1 - 2009/1/1
N2 - Policy gradient methods are useful approaches to reinforcement learning. When such a method is applied to behavior learning, the decision problem at each time step can be treated as the problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
AB - Policy gradient methods are useful approaches to reinforcement learning. When such a method is applied to behavior learning, the decision problem at each time step can be treated as the problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
KW - Policy gradient method
KW - Pursuit problem
KW - Reinforcement learning
KW - State transition probabilities
UR - http://www.scopus.com/inward/record.url?scp=70350148263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350148263&partnerID=8YFLogxK
U2 - 10.1541/ieejeiss.129.1737
DO - 10.1541/ieejeiss.129.1737
M3 - Article
AN - SCOPUS:70350148263
SN - 0385-4221
VL - 129
SP - 1737-1746+15
JO - IEEJ Transactions on Electronics, Information and Systems
JF - IEEJ Transactions on Electronics, Information and Systems
IS - 9
ER -