Behavior learning based on a policy gradient method: Separation of environmental dynamics and state values in policies

Seiji Ishihara, Harukazu Igarashi

Research output: Conference contribution

1 Citation (Scopus)

Abstract

Policy gradient methods are very useful approaches in reinforcement learning. In our policy gradient approach to behavior learning of agents, we define an agent's decision problem at each time step as a problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters, representing environmental dynamics and state-value functions, and we derive separate learning rules for the two types so that the two sets of parameters can be learned independently. Separating these two types of parameters makes it possible to reuse state-value functions for agents under different environmental dynamics, even when the dynamics are stochastic. Our simulation experiments on learning hunter-agent policies in pursuit problems show the effectiveness of our method.
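As a rough illustration of the separation described in the abstract, the sketch below assumes a Boltzmann (softmax) policy whose action preference is the expected next-state value under a learned transition model, with one parameter set (`omega`) for environmental dynamics and another (`theta`) for state values, each receiving its own policy-gradient update. The names, shapes, and specific learning rule here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch only: a softmax policy over E[V(s')|s,a], where the
# transition model P(s'|s,a) is parameterized by `omega` and the state
# values V(s) by `theta`. The two gradients are computed and applied
# separately, mirroring the separation described in the abstract.

n_states, n_actions, T = 5, 3, 1.0        # tiny MDP; T is the temperature

omega = np.zeros((n_states, n_actions, n_states))  # dynamics parameters
theta = np.zeros(n_states)                         # state-value parameters

def transition_model(omega):
    """P(s'|s,a): row-wise softmax over the dynamics parameters."""
    e = np.exp(omega - omega.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def policy(s, omega, theta):
    """pi(a|s) proportional to exp(E[V(s')|s,a] / T)."""
    q = transition_model(omega)[s] @ theta         # expected next-state value
    p = np.exp((q - q.max()) / T)
    return p / p.sum()

def log_policy_grads(s, a, omega, theta):
    """Gradient of log pi(a|s) w.r.t. each parameter set, kept separate."""
    P = transition_model(omega)
    q = P[s] @ theta
    pi = policy(s, omega, theta)
    # w.r.t. theta: dq_b/dV(s') = P(s'|s,b), then the softmax-policy rule.
    g_theta = (P[s, a] - pi @ P[s]) / T
    # w.r.t. omega: the softmax Jacobian of P gives
    # dq_b/domega[s,b,:] = P[s,b] * (theta - q_b).
    g_omega = np.zeros_like(omega)
    for b in range(n_actions):
        g_omega[s, b] = ((b == a) - pi[b]) / T * P[s, b] * (theta - q[b])
    return g_omega, g_theta

# One REINFORCE-style update from a logged step (state s, action a, return G):
s, a, G, lr = 2, 1, 1.0, 0.1
g_omega, g_theta = log_policy_grads(s, a, omega, theta)
omega += lr * G * g_omega   # dynamics parameters are updated on their own
theta += lr * G * g_theta   # state values are updated independently
```

Because `g_omega` and `g_theta` never mix, a learned `theta` could in principle be carried over to an environment with different dynamics by relearning `omega` alone, which is the kind of reuse the abstract points to.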

Original language: English
Title of host publication: PRICAI 2008
Subtitle of host publication: Trends in Artificial Intelligence - 10th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Pages: 164-174
Number of pages: 11
DOI
Publication status: Published - 2008
Externally published: Yes
Event: 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008 - Hanoi, Viet Nam
Duration: 15 Dec 2008 → 19 Dec 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5351 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008
Country/Territory: Viet Nam
City: Hanoi
Period: 08/12/15 → 08/12/19

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (General)
