The Java applet on this page shows the implementation of simulated agents playing the game of rock-paper-scissors. These agents differ in their ability to make use of theory of mind, the human ability that allows us to reason about what other people know and believe. The controls for this applet are explained at the bottom of this page. You can also download the offline version of the applet.
Game outline
Figure 1: In rock-paper-scissors, scissors beats paper
Rock-paper-scissors is a two-player game in which both players simultaneously choose rock, paper or scissors. If the two players make the same choice, the game ends in a tie. If one player chooses scissors while the other chooses paper, the player who chose scissors wins (see Figure 1). In the same way, rock beats scissors, and paper beats rock.
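The rules above can be captured in a few lines of Java. This is only a minimal sketch and not the applet's own code: the class name `RpsRules`, the integer encoding of the moves and the +1/0/-1 result values are assumptions made for illustration.

```java
// Illustrative sketch only; names and encoding are assumptions, not taken from the applet.
class RpsRules {
    // Assumed encoding: each move is beaten by the next one (modulo 3).
    static final int ROCK = 0, PAPER = 1, SCISSORS = 2;

    // Returns +1 if move a beats move b, -1 if b beats a, and 0 for a tie.
    static int outcome(int a, int b) {
        if (a == b) {
            return 0; // same choice: tie
        }
        // Paper (1) beats rock (0), scissors (2) beats paper (1), rock (0) beats scissors (2).
        return (a == (b + 1) % 3) ? 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(outcome(SCISSORS, PAPER)); // prints 1
        System.out.println(outcome(ROCK, PAPER));     // prints -1
        System.out.println(outcome(ROCK, ROCK));      // prints 0
    }
}
```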
According to game theory, the only stable strategy in rock-paper-scissors is to choose randomly among the three moves, each with equal probability. After all, if you play according to some pattern, the other player might learn that pattern over many repeated games, and exploit that knowledge. Playing randomly ensures that the opponent cannot learn any pattern in the way you play the game. Although this strategy cannot be exploited, people are not very good at playing randomly. For example, people usually avoid playing rock when they have just played rock two times in a row, even though this should not matter in truly random play. Also, if some people are not playing randomly, smart players may be able to exploit this and get a higher score than a random player.
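To make the argument about random play concrete, the following sketch computes the expected score of one mixed strategy against another. It assumes a scoring of +1 for a win, -1 for a loss and 0 for a tie, with index 0 for rock, 1 for paper and 2 for scissors; the class and method names are hypothetical and not part of the applet.

```java
// Illustrative sketch only; the scoring scheme and names are assumptions.
class RpsExpectedPayoff {
    // Expected score per game for a player who mixes with probabilities p
    // against an opponent who mixes with probabilities q.
    static double expectedPayoff(double[] p, double[] q) {
        double total = 0.0;
        for (int a = 0; a < 3; a++) {
            for (int b = 0; b < 3; b++) {
                // +1 if a beats b, -1 if b beats a, 0 for a tie.
                int result = (a == b) ? 0 : ((a == (b + 1) % 3) ? 1 : -1);
                total += p[a] * q[b] * result;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[] uniform = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        double[] rockHeavy = {0.6, 0.2, 0.2};
        double[] alwaysPaper = {0.0, 1.0, 0.0};
        // A uniformly random player breaks even against any opponent.
        System.out.println(expectedPayoff(uniform, rockHeavy));
        // A player who favours rock loses on average against paper.
        System.out.println(expectedPayoff(rockHeavy, alwaysPaper));
    }
}
```

Under these assumptions, the uniform strategy scores zero on average whatever the opponent does, while the rock-heavy strategy scores about -0.4 per game against an opponent who always plays paper.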
Theory of mind
In game settings, people often consider what their opponents know and believe, by making use of what is known as theory of mind. The computer agents in the applet on this page also make use of theory of mind to predict what their opponent is going to do. The applet allows the user to restrict agents in their ability to make use of theory of mind. This way, we can determine whether higher orders of theory of mind allow agents to win more often in rock-paper-scissors.
Figure 2: The blue zero-order theory of mind agent tries to learn patterns in the behaviour of his opponent.
The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model their opponent through patterns of behaviour. For example, if the opponent has always played paper before, the zero-order theory of mind agent believes that she is going to play paper again (see Figure 2). In the rock-paper-scissors applet, the red bars indicate the agent’s zero-order beliefs, which show how likely the agent considers it that his opponent will play (R)ock, (P)aper or (S)cissors. This way, when a zero-order theory of mind agent sees that his opponent is playing paper more than average, he will try to take advantage of that by playing scissors more than average.
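A zero-order decision of this kind can be sketched as a best response to a belief distribution. The code below is an illustration under stated assumptions, not the applet's implementation: it assumes beliefs are stored as three probabilities (index 0 for rock, 1 for paper, 2 for scissors), that a win scores +1 and a loss -1, and that the agent deterministically picks the move with the highest expected score. The class and method names are hypothetical.

```java
// Illustrative sketch only; deterministic best response is an assumption.
class ZeroOrderSketch {
    // Expected score of playing `move` if the opponent's move follows `beliefs`.
    static double value(int move, double[] beliefs) {
        int defeated = (move + 2) % 3; // the move that `move` beats
        int defeater = (move + 1) % 3; // the move that beats `move`
        return beliefs[defeated] - beliefs[defeater];
    }

    // Picks the move with the highest expected score (ties go to the lowest index).
    static int bestResponse(double[] beliefs) {
        int best = 0;
        for (int move = 1; move < 3; move++) {
            if (value(move, beliefs) > value(best, beliefs)) {
                best = move;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The agent believes the opponent mostly plays paper.
        double[] zeroOrderBeliefs = {0.2, 0.6, 0.2};
        System.out.println(bestResponse(zeroOrderBeliefs)); // prints 2 (scissors)
    }
}
```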
A zero-order theory of mind agent tries to learn patterns in the behaviour of his opponent, but does not realize that his opponent could be doing the same thing. A first-order theory of mind agent realizes that his opponent may be a zero-order theory of mind agent. He tries to predict what his opponent is going to do by putting himself in her position. He looks at the game from the point of view of his opponent to determine what he would do if the situation were reversed, and uses that as a prediction of his opponent’s action. For example, when the agent realizes he has been playing paper more than average, he believes his opponent may try to take advantage of this by playing scissors more often. If that is true, the agent can take advantage of that by playing rock (see Figure 3). In the applet, the agent’s first-order beliefs are shown as green bars. They show how likely, according to the agent, his opponent thinks it is that he will play each of the three possible moves.
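The first-order reasoning step can be sketched in the same style. The example assumes that the agent fully trusts his first-order prediction and ignores his zero-order beliefs, which is a simplification; how the applet weighs the two is not described on this page. All names are hypothetical, and moves are encoded as 0 for rock, 1 for paper and 2 for scissors.

```java
// Illustrative sketch only; full trust in the first-order prediction is an assumption.
class FirstOrderSketch {
    // Zero-order best response: the move with the highest expected score
    // against the given beliefs (win = +1, loss = -1).
    static int bestResponse(double[] beliefs) {
        int best = 0;
        double bestValue = beliefs[2] - beliefs[1]; // value of rock
        for (int move = 1; move < 3; move++) {
            double v = beliefs[(move + 2) % 3] - beliefs[(move + 1) % 3];
            if (v > bestValue) {
                best = move;
                bestValue = v;
            }
        }
        return best;
    }

    // firstOrderBeliefs: what the agent thinks his opponent believes he will play.
    static int choose(double[] firstOrderBeliefs) {
        // Put yourself in the opponent's position: what would a zero-order agent do?
        int predictedOpponentMove = bestResponse(firstOrderBeliefs);
        // Play the move that beats the predicted move.
        return (predictedOpponentMove + 1) % 3;
    }

    public static void main(String[] args) {
        // The agent has been playing paper more than average.
        double[] firstOrderBeliefs = {0.2, 0.6, 0.2};
        System.out.println(choose(firstOrderBeliefs)); // prints 0 (rock)
    }
}
```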
A second-order theory of mind agent takes his reasoning one step further, and realizes that his opponent may be a first-order theory of mind agent. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. For example, if the agent realizes his opponent is playing paper more than average, he realizes he could take advantage of that by playing scissors more often. A second-order theory of mind agent thinks that his opponent may be expecting this, and therefore that she will play rock to take advantage of the way he behaves. If that is true, the agent should start playing paper more often himself (see Figure 4). In the applet, the blue bars indicate an agent’s second-order beliefs.
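The second-order example above adds one more step to the chain of reasoning. The sketch below follows that chain literally, assuming that the agent fully trusts his second-order prediction; the class name, method names and move encoding (0 for rock, 1 for paper, 2 for scissors) are hypothetical and not taken from the applet.

```java
// Illustrative sketch only; full trust in the second-order prediction is an assumption.
class SecondOrderSketch {
    // The move that defeats `move`.
    static int counterTo(int move) {
        return (move + 1) % 3;
    }

    // Zero-order best response against the given beliefs (win = +1, loss = -1).
    static int bestResponse(double[] beliefs) {
        int best = 0;
        double bestValue = beliefs[2] - beliefs[1]; // value of rock
        for (int move = 1; move < 3; move++) {
            double v = beliefs[(move + 2) % 3] - beliefs[(move + 1) % 3];
            if (v > bestValue) {
                best = move;
                bestValue = v;
            }
        }
        return best;
    }

    // secondOrderBeliefs: what the agent thinks his opponent believes
    // that he believes about her next move.
    static int choose(double[] secondOrderBeliefs) {
        // What she expects him to play if he were a zero-order agent.
        int moveSheExpects = bestResponse(secondOrderBeliefs);
        // What she, as a first-order agent, would play in response.
        int herPredictedMove = counterTo(moveSheExpects);
        // His answer to her predicted move.
        return counterTo(herPredictedMove);
    }

    public static void main(String[] args) {
        // The opponent has been playing paper more than average.
        double[] secondOrderBeliefs = {0.2, 0.6, 0.2};
        System.out.println(choose(secondOrderBeliefs)); // prints 1 (paper)
    }
}
```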
Although the agents in the applet have theory of mind, they do not remember the choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time. After this, they forget what they saw. This means that the agents in our applet can only look at very simple patterns of behaviour. As an alternative type of agent, a high memory agent is a zero-order theory of mind agent that also remembers what was played in the previous round. That is, the high memory agent forms beliefs about what his opponent is going to do in reaction to the outcome of the last game. For example, a high memory agent may believe that his opponent is going to play rock if he just played scissors, but not if he just played paper.
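One way to picture a high memory agent is as a table of beliefs indexed by the outcome of the previous round. The sketch below is an assumption-heavy illustration: it uses simple frequency counts with a starting count of one per move, rather than whatever update rule the applet applies, and the class name, method names and move encoding (0 for rock, 1 for paper, 2 for scissors) are hypothetical.

```java
// Illustrative sketch only; frequency counting is an assumption, not the applet's rule.
class HighMemorySketch {
    // counts[myLast][oppLast][oppNext]: how often the opponent played oppNext
    // right after a round in which the agent played myLast and she played oppLast.
    private final double[][][] counts = new double[3][3][3];

    HighMemorySketch() {
        // Start every count at one, so that all moves are initially equally likely.
        for (double[][] plane : counts) {
            for (double[] row : plane) {
                java.util.Arrays.fill(row, 1.0);
            }
        }
    }

    // Records the opponent's move that followed the given previous round.
    void observe(int myLast, int oppLast, int oppNext) {
        counts[myLast][oppLast][oppNext] += 1.0;
    }

    // Beliefs about the opponent's next move, given the previous round.
    double[] predict(int myLast, int oppLast) {
        double[] row = counts[myLast][oppLast];
        double sum = row[0] + row[1] + row[2];
        return new double[] {row[0] / sum, row[1] / sum, row[2] / sum};
    }
}
```

For example, after `observe(2, 1, 0)` the agent expects rock more strongly following a round in which he played scissors against paper, while his beliefs for every other previous outcome stay uniform.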
Controls
With the applet, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when playing rock-paper-scissors.
- Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to fourth-order. Additionally, the second player can be controlled by a human user.
- Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, and will always do the same thing. An agent with learning speed 1.0, on the other hand, believes that the previous game gives him all the information he needs to predict his opponent’s behaviour. Agents do not try to model the learning speed of their opponent; if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
- Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
- Play round: Play one game of rock-paper-scissors. This can only be done when player two is not user-controlled.
- Rock, paper and scissors: When player two is user-controlled, selecting one of the three possible moves plays one game, with player two’s choice.
- Show mental content: Uncheck the box to hide mental content information from the graphs. When the graphs are visible, a human player can use them to determine what the agent will do next, or what a computer agent would do in the human player’s place. For a human player, the game is more challenging if the graphs are not visible.
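The learning speed control described above is consistent with a belief update that blends the old beliefs with the latest observation. The following sketch assumes a linear blend, which matches the two extremes described for 0.0 and 1.0 but is not confirmed to be the applet's exact rule; the class and method names are hypothetical, with index 0 for rock, 1 for paper and 2 for scissors.

```java
// Illustrative sketch only; the linear blend is an assumed update rule.
class LearningSpeedSketch {
    // Returns new beliefs after seeing the opponent play `observedMove`.
    // learningSpeed = 0.0 keeps the old beliefs unchanged;
    // learningSpeed = 1.0 replaces them with the latest observation.
    static double[] update(double[] beliefs, int observedMove, double learningSpeed) {
        double[] updated = new double[beliefs.length];
        for (int move = 0; move < beliefs.length; move++) {
            double observed = (move == observedMove) ? 1.0 : 0.0;
            updated[move] = (1.0 - learningSpeed) * beliefs[move] + learningSpeed * observed;
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] beliefs = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        // The opponent plays paper; the agent shifts halfway towards "paper".
        double[] next = update(beliefs, 1, 0.5);
        System.out.println(java.util.Arrays.toString(next));
    }
}
```

Because the updated values remain a weighted average of two probability distributions, they still sum to one after every round.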
Although agents tend to perform better as their order of theory of mind increases, in a simple game like rock-paper-scissors theory of mind agents do not outperform high memory agents (see also Limited Bidding).