Learning Team Strategies: Soccer Case Studies

Rafal Salustowicz, Marco Wiering, Juergen Schmidhuber
IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland
e-mail: {rafal, marco, juergen}@idsia.ch
tel.: +41-91-9919838, fax: +41-91-9919839

Abstract: We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and a policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) that map input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly: they use adaptive probability distributions to synthesize programs that compute action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.

Keywords: Multiagent Reinforcement Learning, Soccer, TD-Q Learning, Evaluation Functions, Probabilistic Incremental Program Evolution, Coevolution
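To make the TD-Q setting described in the abstract concrete, the following is a minimal sketch of a shared linear evaluation function over input/action pairs updated with a TD(0)-style rule. It is an illustration under assumptions, not the authors' implementation: the feature layout, learning rate, discount factor, and greedy action selection are hypothetical choices for the example.

```python
import numpy as np

class LinearTDQ:
    """Sketch of a linear evaluation function Q(x, a) = w_a . x with a TD(0)-style update.
    All parameter values below are illustrative assumptions, not the paper's settings."""

    def __init__(self, n_inputs, n_actions, alpha=0.01, gamma=0.95):
        self.alpha = alpha                      # learning rate (assumed)
        self.gamma = gamma                      # discount factor (assumed)
        # One weight vector per action, i.e. a linear network without hidden units.
        self.w = np.zeros((n_actions, n_inputs))

    def q_values(self, x):
        """Evaluate all actions for a (position-dependent) input vector x."""
        return self.w @ x

    def update(self, x, a, reward, x_next, done):
        """Move Q(x, a) towards reward + gamma * max_a' Q(x_next, a')."""
        target = reward if done else reward + self.gamma * np.max(self.q_values(x_next))
        td_error = target - self.q_values(x)[a]
        self.w[a] += self.alpha * td_error * x

# Usage sketch: all players on a team share the same weights, so a collective
# reward for a goal updates one shared evaluation function.
shared_q = LinearTDQ(n_inputs=8, n_actions=4)
x = np.random.rand(8)                           # hypothetical position-dependent input
a = int(np.argmax(shared_q.q_values(x)))        # greedy selection, for illustration only
shared_q.update(x, a, reward=1.0, x_next=np.random.rand(8), done=False)
```

PIPE and CO-PIPE, by contrast, skip such evaluation functions entirely and adapt probability distributions over programs that map inputs directly to action probabilities; the paper's later sections describe that mechanism.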