TY - GEN

T1 - Mixed reinforcement learning for partially observable Markov decision process

AU - Dung, Le Tien

AU - Komeda, Takashi

AU - Takagi, Motoki

PY - 2007

Y1 - 2007

N2 - Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired by using only an RNN, with better learning performance.

AB - Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired by using only an RNN, with better learning performance.

UR - http://www.scopus.com/inward/record.url?scp=34948826477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34948826477&partnerID=8YFLogxK

U2 - 10.1109/CIRA.2007.382910

DO - 10.1109/CIRA.2007.382910

M3 - Conference contribution

AN - SCOPUS:34948826477

SN - 1424407907

SN - 9781424407903

T3 - Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007

SP - 7

EP - 12

BT - Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007

T2 - 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007

Y2 - 20 June 2007 through 23 June 2007

ER -