Limited Bidding


The applet above shows an implementation of simulated agents playing the game of limited bidding. These agents are constrained in their ability to model the mental content of others.

Game outline

Limited bidding is a simplified version of a game described in “Edward de Bono’s super mind pack”. At the beginning of the game, both players receive five numbered tokens. The number on a token is its value. Over the course of five rounds, players simultaneously choose one of their own tokens to play. Whoever plays the higher-value token wins the round. If both players play tokens of the same value, the round has no winner. Since each token can only be used once per game, players are forced to play strategically.

Game-theoretically, the optimal way to play limited bidding is by randomizing every choice. That is, under the assumption of common knowledge of rationality, a player should randomly pick one of the tokens still available to them. However, the agents in this applet suspect that their opponent may not be fully rational. Moreover, they are limited in their ability to make decisions themselves. By playing the game repeatedly against the same opponent, agents try to learn to predict what their opponent will do, and change their strategy accordingly.
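The rules and the randomizing strategy described above can be sketched in a few lines of Python. This is a minimal illustration and not the applet's code: the names `play_game` and `random_player`, the chooser signature, and the assumption that tokens are numbered 1 to 5 are all made up for this example.

```python
import random


def play_game(choose_one, choose_two, n_tokens=5):
    """Play one game of limited bidding and return (wins_one, wins_two).

    Each chooser is a function (own_tokens, opponent_tokens) -> token.
    Tokens are assumed to be numbered 1..n_tokens (an assumption of this sketch).
    """
    tokens_one = list(range(1, n_tokens + 1))
    tokens_two = list(range(1, n_tokens + 1))
    wins_one = wins_two = 0
    for _ in range(n_tokens):
        # Both choices are made before either token is revealed or removed.
        a = choose_one(tuple(tokens_one), tuple(tokens_two))
        b = choose_two(tuple(tokens_two), tuple(tokens_one))
        tokens_one.remove(a)  # each token can be used only once per game
        tokens_two.remove(b)
        if a > b:
            wins_one += 1
        elif b > a:
            wins_two += 1
        # equal tokens: nobody wins the round
    return wins_one, wins_two


def random_player(own_tokens, opponent_tokens):
    """Pick uniformly among the tokens that are still available."""
    return random.choice(own_tokens)


def highest_first(own_tokens, opponent_tokens):
    return max(own_tokens)


def lowest_first(own_tokens, opponent_tokens):
    return min(own_tokens)
```

For instance, `play_game(highest_first, lowest_first)` ends in a 2-2 draw: the first player wins with tokens 5 and 4, token 3 ties, and the second player wins the last two rounds with tokens 4 and 5.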

Theory of mind

Theory of mind refers to the individual’s ability to model mental content of others, such as beliefs, desires or intentions. The agents modeled in the applet are constrained in their theory of mind. At the most basic level, a zero-order theory of mind agent tries to model his opponent through patterns of behaviour. For example, a zero-order theory of mind agent might find out that his opponent always plays token 5 at the start of the game, or tends to save token 3 for last. However, he is unable to realize that his opponent might be doing the same. In fact, a zero-order theory of mind agent does not realize that his opponent has goals that are opposite to the ones he has himself. The agent’s zero-order beliefs are represented by red bars in the applet, which indicate how likely the agent believes it to be that his opponent is going to play a certain token.

A first-order theory of mind agent realizes that his opponent might be a zero-order theory of mind agent, and tries to predict what she is going to do by putting himself in her position. He looks at the game from the point of view of his opponent to determine what he would believe if the situation were reversed, and uses this as a prediction for his opponent’s actions. For example, a first-order theory of mind agent might realize that he has started the game by using token 3 a few times in a row, and suspect that his opponent is going to try and take advantage of that. First-order beliefs show how likely the agent thinks his opponent to believe that he is going to play a certain token. In the applet, this is indicated by the height of the green bars.

A second-order theory of mind agent takes this reasoning one step further. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. In the applet, the height of the blue bars indicates the agent’s second-order beliefs.
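The perspective-taking in the first-order and second-order cases can be illustrated with a small sketch. It makes strong simplifying assumptions that are not taken from the applet: an agent responds only to the current round (ignoring the value of saving tokens for later), ties are broken in favour of the cheapest token, and the function names are hypothetical.

```python
def best_response(tokens, beliefs):
    """Cheapest token with the best expected result for a single round.

    beliefs maps an opposing token to the probability that it is played.
    Assumption of this sketch: the agent is myopic and looks one round ahead.
    """
    def expected_value(token):
        # +1 for a win, -1 for a loss, 0 for a tie, weighted by belief
        return sum(p * ((token > t) - (token < t)) for t, p in beliefs.items())

    best = max(expected_value(token) for token in tokens)
    return min(t for t in tokens if expected_value(t) >= best - 1e-12)


def first_order_prediction(opponent_tokens, first_order_beliefs):
    """Predict the opponent by asking: what would I play in her place?

    first_order_beliefs: what I think she believes I am going to play.
    """
    return best_response(opponent_tokens, first_order_beliefs)


def second_order_prediction(own_tokens, opponent_tokens, second_order_beliefs):
    """Predict an opponent who is herself taking my perspective.

    second_order_beliefs: what I think she believes that I believe about her.
    """
    # Step 1: what she expects me to play, if I responded to those beliefs.
    my_expected_move = best_response(own_tokens, second_order_beliefs)
    # Step 2: her response to that expected move.
    return best_response(opponent_tokens, {my_expected_move: 1.0})
```

Under these assumptions, an agent who thinks his opponent expects token 3 from him predicts that she will play token 4, the cheapest token that beats it.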

Based on zero-order, first-order and second-order beliefs, an agent makes different predictions about what his opponent is going to do. The agent must therefore also form beliefs about which of these predictions will yield the best results. An agent’s combined beliefs represent how the different orders of theory of mind are combined into a single prediction of his opponent’s actions.
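One way to picture such a combination is as a weighted mixture, where each higher order is blended in according to how much the agent trusts it. The sketch below is an assumption for illustration only; the confidence parameters `c1` and `c2` and the function names are hypothetical, and the applet may combine its beliefs differently.

```python
def mix(base, prediction, confidence):
    """Blend two belief distributions; confidence in [0, 1] weights `prediction`."""
    tokens = set(base) | set(prediction)
    return {
        t: (1.0 - confidence) * base.get(t, 0.0) + confidence * prediction.get(t, 0.0)
        for t in tokens
    }


def combined_beliefs(zero_order, first_order, second_order, c1, c2):
    """Fold first-order and then second-order predictions into zero-order beliefs.

    c1 and c2 are hypothetical confidence weights for the first-order and
    second-order predictions; 0.0 means that order is ignored entirely.
    """
    return mix(mix(zero_order, first_order, c1), second_order, c2)
```

With `c1 = 0.5` and `c2 = 0.0`, an agent whose zero-order beliefs point to token 5 and whose first-order prediction points to token 4 ends up assigning probability 0.5 to each, and the second-order prediction is disregarded.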

Although the agents in the applet make use of theory of mind, they do not remember the choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time and forget what they saw. As an alternative type of agent, a high memory agent is a zero-order theory of mind agent that remembers what his last choice was. That is, the high memory agent forms beliefs about what his opponent is going to do in reaction to him playing each of the possible tokens. In terms of memory, a high memory agent uses about the same amount of space as a second-order theory of mind agent, although this space is used differently.

Controls

The applet has a number of controls to allow users to judge the effect of using a higher order of theory of mind on the performance of agents in the limited bidding game.

  • Player one, player two: Determines the type of player. Both players can be either zero-order, first-order or second-order theory of mind agents. Additionally, player one may also be a high memory agent, while player two can be controlled by a human user.
  • Learning speed: Determines how strongly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent doesn’t learn at all, while an agent with learning speed 1.0 learns so quickly that he cannot remember anything more than the last game.
  • Reset: Resets the game to the start situation.
  • Play round: Plays one round of the game, and pauses afterwards. This action is not possible if player two is a human player.
  • Play game: Repeatedly plays rounds of the game until no tokens are left. This action is not possible if player two is a human player.
  • Tokens: When player two is a human player, the game only continues once player two has revealed his choice by clicking on one of the tokens that are still available to player two. In other cases, selecting the tokens has no effect.
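The effect of the learning speed setting can be illustrated with a simple smoothing update. The applet's exact update rule is not given here, so the formula below is an assumption chosen to match the description of the two extremes; `update_beliefs` is a hypothetical name.

```python
def update_beliefs(beliefs, observed_token, learning_speed):
    """Shift beliefs toward the token the opponent was just seen to play.

    beliefs maps each token to the believed probability that it is played.
    learning_speed is in [0, 1]: 0.0 leaves beliefs unchanged, 1.0 replaces
    them with the latest observation.
    """
    return {
        token: (1.0 - learning_speed) * p
        + (learning_speed if token == observed_token else 0.0)
        for token, p in beliefs.items()
    }
```

Starting from uniform beliefs over five tokens, observing token 5 with a learning speed of 0.5 raises its believed probability from 0.2 to 0.6 and lowers each of the others to 0.1.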

With the applet, you can see how agents perform better when their theory of mind level increases. The applet also shows that second-order theory of mind agents outperform agents with more memory in limited bidding, even though they don’t do better in simple games like rock-paper-scissors.

The applet can also be downloaded for offline use.
