TY - JOUR
T1 - Reinforcement learning for POMDP using state classification
AU - Dung, Le Tien
AU - Komeda, Takashi
AU - Takagi, Motoki
PY - 2008/8
Y1 - 2008/8
N2 - Reinforcement learning (RL) has been widely used to solve problems with little feedback from the environment. Q learning can solve Markov decision processes (MDPs) quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. We present a new combination of RL and RNN to find a good policy for POMDPs in a shorter learning time. This method contains two phases: first, the state space is divided into two groups (a fully observable state group and a hidden state group); second, a Q value table is used to store the values of fully observable states and an RNN is used to approximate the values of hidden states. Results of experiments on two grid world problems show that the proposed method enables an agent to acquire a policy with better learning performance than the method using only an RNN.
AB - Reinforcement learning (RL) has been widely used to solve problems with little feedback from the environment. Q learning can solve Markov decision processes (MDPs) quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. We present a new combination of RL and RNN to find a good policy for POMDPs in a shorter learning time. This method contains two phases: first, the state space is divided into two groups (a fully observable state group and a hidden state group); second, a Q value table is used to store the values of fully observable states and an RNN is used to approximate the values of hidden states. Results of experiments on two grid world problems show that the proposed method enables an agent to acquire a policy with better learning performance than the method using only an RNN.
UR - http://www.scopus.com/inward/record.url?scp=49749119543&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49749119543&partnerID=8YFLogxK
U2 - 10.1080/08839510802170538
DO - 10.1080/08839510802170538
M3 - Article
AN - SCOPUS:49749119543
SN - 0883-9514
VL - 22
SP - 761
EP - 779
JO - Applied Artificial Intelligence
JF - Applied Artificial Intelligence
IS - 7-8
ER -