Behavior learning based on a policy gradient method: Separation of environmental dynamics and state-values in policies

Ishihara Seiji, Igarashi Harukazu

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

Policy gradient methods are useful approaches to reinforcement learning. When such a method is applied to behavior learning, the decision problem at each time step can be treated as a problem of minimizing an objective function. In this paper, we present an objective function consisting of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.

Original language: English
Pages (from-to): 1737-1746+15
Journal: IEEJ Transactions on Electronics, Information and Systems
Volume: 129
Issue number: 9
DOI
Publication status: Published - Jan 1 2009

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
