Learning Team Strategies With Multiple Policy-Sharing Agents: A Soccer Case Study

Rafal Salustowicz, Marco Wiering, Juergen Schmidhuber
IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland
e-mail: {rafal, marco, juergen}@idsia.ch
tel.: +41-91-9919838   fax: +41-91-9919839

Abstract: We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and a policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively when goals are scored. We conduct simulations with varying team sizes and compare two learning algorithms: TD-Q learning with linear neural networks (TD-Q) and Probabilistic Incremental Program Evolution (PIPE). TD-Q is based on evaluation functions (EFs) that map input/action pairs to expected reward, while PIPE searches policy space directly. PIPE uses adaptive ``probabilistic prototype trees'' to synthesize programs that compute action probabilities from current inputs. Our results show that TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE, however, does not depend on EFs and finds good policies faster and more reliably. This suggests that in multiagent learning scenarios direct search through policy space can offer advantages over EF-based approaches.

Keywords: Multiagent Reinforcement Learning, Soccer, TD-Q Learning, Probabilistic Incremental Program Evolution
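
The abstract names TD-Q learning with linear networks as the EF-based baseline. As a point of reference only, the sketch below shows a generic Q-learning update with one linear model per action; the class name, learning rate, and discount factor are assumptions for illustration and do not reproduce the authors' exact formulation.

```python
import numpy as np

class LinearTDQ:
    """Illustrative Q-learning with a per-action linear evaluation function."""

    def __init__(self, n_inputs, n_actions, lr=0.01, gamma=0.99):
        self.w = np.zeros((n_actions, n_inputs))  # one weight vector per action
        self.lr = lr        # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def q_values(self, x):
        """EF: expected reward for each action given input vector x."""
        return self.w @ x

    def update(self, x, a, r, x_next, done):
        """One TD-Q step: move Q(x, a) toward r + gamma * max_a' Q(x', a')."""
        target = r if done else r + self.gamma * np.max(self.q_values(x_next))
        td_error = target - self.q_values(x)[a]
        self.w[a] += self.lr * td_error * x  # gradient of the linear Q w.r.t. w[a] is x
```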