Projects: Learning in Multi-Agent Systems



Many real-world problems, such as traffic control, network routing, elevator dispatching, stock supply, transportation problems and pollution detection, can be naturally modeled as multi-agent systems (MASs). We are interested in the properties of such MASs. A MAS consists of (adaptive) agents, each of which uses a decision policy to select actions based on its inputs. The global behavior of a MAS depends on the behavior of all agents and on their interactions. We are interested in using Machine Learning (ML) to optimize the behavior of each agent. An agent's goal is to maximize the rewards it receives from its reward function, which maps changes of the environmental state to scalar reward signals. Since each agent tries to maximize its own rewards, agents act in their own interest. Conflicts between agents may therefore arise, and with them the need to coordinate the agents. Coordination in a MAS can be achieved in different ways: we can allow communication between agents, we can create management agencies, or we can have systems in which reward functions are shared among agents. In our project we identify different types of MAS and study for which kinds of problems each can be used efficiently.
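To make the difference between self-interested and shared reward functions concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions: the state is a hypothetical dictionary of per-agent queue lengths, and the names queue_reward, individual_rewards and shared_rewards are invented for this example and are not taken from the project software.

```python
def queue_reward(agent):
    """Build a reward function for one agent (hypothetical example).

    The reward maps a change of the environmental state to a scalar:
    here, the reduction of the agent's own queue length.
    """
    def reward(old_state, new_state):
        return float(old_state[agent] - new_state[agent])
    return reward


def individual_rewards(reward_fns, old_state, new_state):
    """Each agent is rewarded by its own reward function only."""
    return [fn(old_state, new_state) for fn in reward_fns]


def shared_rewards(reward_fns, old_state, new_state):
    """All agents receive the same team reward: the sum of individual rewards."""
    rewards = individual_rewards(reward_fns, old_state, new_state)
    team = sum(rewards)
    return [team] * len(rewards)


# Assumed toy transition: agent "a" shortens its queue at the expense of "b".
reward_fns = [queue_reward("a"), queue_reward("b")]
old_state = {"a": 5, "b": 5}
new_state = {"a": 2, "b": 9}

selfish = individual_rewards(reward_fns, old_state, new_state)
team = shared_rewards(reward_fns, old_state, new_state)
```

In this toy transition agent "a" is rewarded individually even though the system as a whole is worse off, whereas the shared reward is negative for both agents. This is the kind of conflict that coordination mechanisms are meant to resolve.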



Traffic light control

In one of our projects we study traffic light optimization in cities. Our traffic light controllers use reinforcement learning techniques to track the waiting times at traffic lights of infrastructure users such as cars and buses. The expected waiting time when a light is set to red is of course higher than when the light is set to green. First of all, with a red light the car cannot move closer to the intersection or cross it, but has to wait at the same position in the traffic queue. Furthermore, a car that passes the intersection no longer has to wait for that traffic light, whereas a car facing a red light may have to stay there for a while. Therefore each car gains at least a single time step when its light is set to green. If the current lane almost never has its light set to green, then the advantage of a green light for this car may be very large. By monitoring cars driving through the city and interacting with traffic lights, reinforcement learning can estimate the expected waiting times online. The traffic light controllers use these expected waiting times to compute a gain for each car waiting at the traffic light. A controller then either sets the allowed configuration with maximal gain to green, or it explores. We have developed a user-friendly simulator in which infrastructures can be edited with the mouse. The simulator can track a variety of statistical measures to compare different traffic light controllers. The software of the simulator, Green Light District (GLD), is written in Java 1.4 and can be found here: GLD.
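The gain-based decision described above can be sketched as follows. This is a simplified illustration in Python, not the GLD implementation: it assumes that expected waiting times are estimated as plain running averages per (lane, queue position, light color), that the gain of a configuration is the sum of the gains of all cars it would set to green, and that exploration picks a random allowed configuration with probability epsilon. All names (WaitEstimator, gain, choose_configuration) are hypothetical.

```python
import random
from collections import defaultdict


class WaitEstimator:
    """Running average of observed waiting times per (lane, position, light)."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, lane, position, light, observed_wait):
        """Record one observed waiting time (in time steps) for a car."""
        key = (lane, position, light)
        self.total[key] += observed_wait
        self.count[key] += 1

    def expected_wait(self, lane, position, light):
        """Average of the recorded waits, or 0.0 if nothing was observed yet."""
        key = (lane, position, light)
        if self.count[key] == 0:
            return 0.0
        return self.total[key] / self.count[key]


def gain(estimator, lane, position):
    """Expected waiting time saved by this car if its light is green, not red."""
    return (estimator.expected_wait(lane, position, "red")
            - estimator.expected_wait(lane, position, "green"))


def choose_configuration(estimator, configurations, queues, epsilon=0.0, rng=random):
    """Pick the allowed configuration with maximal summed gain, or explore.

    configurations: list of tuples of lanes that may be green simultaneously.
    queues: dict mapping each lane to the queue positions of its waiting cars.
    """
    if rng.random() < epsilon:
        return rng.choice(configurations)  # exploration step (assumed epsilon-greedy)

    def total_gain(config):
        return sum(gain(estimator, lane, pos)
                   for lane in config
                   for pos in queues.get(lane, []))

    return max(configurations, key=total_gain)


# Assumed toy data: cars on the northern lane lose far more time at red.
estimator = WaitEstimator()
estimator.update("north", 0, "red", 10.0)
estimator.update("north", 0, "green", 2.0)
estimator.update("east", 0, "red", 4.0)
estimator.update("east", 0, "green", 3.0)

configurations = [("north",), ("east",)]
queues = {"north": [0], "east": [0]}
best = choose_configuration(estimator, configurations, queues)
```

With epsilon set to 0.0 the choice is purely greedy, which makes the example deterministic. A positive epsilon would occasionally select a configuration at random, so that waiting times for rarely chosen configurations can still be observed.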