TY - GEN
T1 - Learning of soccer player agents using a policy gradient method
T2 - 2008 International Joint Conference on Neural Networks, IJCNN 2008
AU - Igarashi, H.
AU - Nakamura, K.
AU - Ishihara, S.
PY - 2008
Y1 - 2008
N2 - The RoboCup Simulation League is recognized as a test bed for research on multi-agent learning. As an example of multi-agent learning in a soccer game, we deal with a learning problem between a kicker and a receiver when a direct free kick is awarded just outside the opponent's penalty area. In such a situation, to which point should the kicker kick the ball? We propose a function that expresses heuristics for evaluating an advantageous target point for safely sending/receiving a pass and scoring. The heuristics include an interaction term between the kicker and the receiver to intensify their coordination. To calculate the interaction term, we give the kicker/receiver agents a receiver/kicker action-decision model for predicting their teammate's action. The evaluation function makes it possible to handle the large state space formed by the positions of the kicker, the receiver, and their opponents. The kicker selects the target point of the free kick by Boltzmann selection with the evaluation function. The parameters of the function are learned by a kind of reinforcement learning called the policy gradient method. The point to which the receiver should run to receive the ball is learned simultaneously in the same manner. The effectiveness of our solution is demonstrated by experiments.
UR - http://www.scopus.com/inward/record.url?scp=56349109992&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56349109992&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2008.4633765
DO - 10.1109/IJCNN.2008.4633765
M3 - Conference contribution
AN - SCOPUS:56349109992
SN - 9781424418213
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 46
EP - 52
BT - 2008 International Joint Conference on Neural Networks, IJCNN 2008
Y2 - 1 June 2008 through 8 June 2008
ER -