On Learning Soccer Strategies


          Rafal Salustowicz  Marco Wiering  Juergen Schmidhuber

             IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland
               e-mail: {rafal, marco, juergen}@idsia.ch
               tel.: +41-91-9919838 fax: +41-91-9919839

                                Abstract:

We use simulated soccer to study multiagent learning.  Each team's
players (agents) share an action set and a policy but may behave
differently due to position-dependent inputs.  All agents making up a
team are rewarded or punished collectively whenever a goal is scored.  We conduct
simulations with varying team sizes, and compare two learning 
algorithms: TD-Q learning with linear neural networks (TD-Q) and
Probabilistic Incremental Program Evolution (PIPE).  TD-Q is based on
evaluation functions (EFs) mapping input/action pairs to expected
reward, while PIPE searches policy space directly. PIPE uses an adaptive
probability distribution to synthesize programs that calculate action 
probabilities from current inputs.  Our results show that TD-Q has
difficulty learning appropriate shared EFs.  PIPE, however, does not
depend on EFs and finds good policies faster and more reliably.
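
To make the notion of a shared linear EF concrete, the following is a
minimal sketch and not the implementation used in the experiments.  It
assumes plain one-step Q-learning with one weight vector per action; the
class name SharedLinearQ, the learning rate alpha, the discount gamma and
the two-element input vectors are illustrative assumptions.  It shows how
agents that share one EF can still choose different actions because
their inputs differ.

```python
class SharedLinearQ:
    """One linear evaluation function per action, shared by a whole team.

    Hypothetical sketch: alpha and gamma are illustrative values only.
    """

    def __init__(self, n_inputs, n_actions, alpha=0.1, gamma=0.9):
        self.w = [[0.0] * n_inputs for _ in range(n_actions)]
        self.alpha = alpha
        self.gamma = gamma

    def q(self, x, a):
        # Linear estimate of expected reward for input x and action a.
        return sum(wi * xi for wi, xi in zip(self.w[a], x))

    def greedy(self, x):
        # Action with the highest estimate (ties go to the lowest index).
        return max(range(len(self.w)), key=lambda a: self.q(x, a))

    def update(self, x, a, reward, x_next, terminal=False):
        # One-step TD target; no bootstrapping after a terminal step.
        if terminal:
            target = reward
        else:
            best_next = max(self.q(x_next, b) for b in range(len(self.w)))
            target = reward + self.gamma * best_next
        delta = target - self.q(x, a)
        for i, xi in enumerate(x):
            self.w[a][i] += self.alpha * delta * xi
        return delta


# Two agents of one team see different (position-dependent) inputs.
ef = SharedLinearQ(n_inputs=2, n_actions=2, alpha=0.5)
near_goal, far_from_goal = [1.0, 0.0], [0.0, 1.0]

# The team scores: every agent's last input/action pair gets reward +1.
ef.update(near_goal, 0, 1.0, near_goal, terminal=True)
ef.update(far_from_goal, 1, 1.0, far_from_goal, terminal=True)

actions = (ef.greedy(near_goal), ef.greedy(far_from_goal))
```

Because every agent updates the same weights, the collective reward of
one goal changes the EF for the whole team; the two agents above still
end up with different greedy actions only because their inputs differ.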