TY - GEN
T1 - Behavior learning based on a policy gradient method
T2 - 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008
AU - Ishihara, Seiji
AU - Igarashi, Harukazu
PY - 2008
Y1 - 2008
N2 - Policy gradient methods are very useful approaches in reinforcement learning. In our policy gradient approach to behavior learning of agents, we define an agent's decision problem at each time step as a problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters representing environmental dynamics and state-value functions. We derive separate learning rules for the two types of parameters so that the two sets of parameters can be learned independently. Separating these two types of parameters makes it possible to reuse state-value functions for agents in environments with different dynamics, even if the dynamics are stochastic. Our simulation experiments on learning hunter-agent policies in pursuit problems show the effectiveness of our method.
UR - http://www.scopus.com/inward/record.url?scp=58349115123&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58349115123&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-89197-0_18
DO - 10.1007/978-3-540-89197-0_18
M3 - Conference contribution
AN - SCOPUS:58349115123
SN - 354089196X
SN - 9783540891963
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 164
EP - 174
BT - PRICAI 2008
Y2 - 15 December 2008 through 19 December 2008
ER -