Theory of mind in competition
The script on this page (open script in separate tab) shows simulated agents playing the game of rock-paper-scissors. These agents differ in their ability to make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Even though rock-paper-scissors is a simple game in which trying to outsmart your opponent seems useless, the script shows how theory of mind can still be effective. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.
Figure 1: In rock-paper-scissors, scissors (orange agent) beats paper (blue agent).
Rock-paper-scissors is a two-player game in which players simultaneously choose to play either rock, paper or scissors. If the two players made the same choice, the game ends in a tie. However, if one player chose scissors, while the other chose paper, the player that chose scissors wins (see Figure 1). In the same way, rock wins from scissors, and paper wins from rock.
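The rules above fit in a few lines of code. The following is a minimal Python sketch, not the code of the script on this page; the names `BEATS` and `outcome` are chosen here for illustration only.

```python
# Which move each move defeats, as stated in the rules above.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def outcome(move_a, move_b):
    """Return 1 if move_a wins, -1 if move_b wins, and 0 for a tie."""
    if move_a == move_b:
        return 0
    return 1 if BEATS[move_a] == move_b else -1


print(outcome("scissors", "paper"))  # 1: scissors beats paper, as in Figure 1
```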
According to game theory, the only stable strategy when playing rock-paper-scissors is to randomly choose one of the possibilities. After all, if you play according to some pattern, the other player might learn that pattern over many repeated games, and exploit that knowledge. Playing randomly makes sure that the opponent cannot learn any patterns in the way you play the game. Although this strategy works well, people are not very good at playing randomly. For example, people usually avoid playing rock when they have just played rock two times in a row, even though this should not matter in truly random play. Also, if there are some people that are not playing randomly, smart players may be able to exploit this and get a higher score than a random player.
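The claim that random play cannot be exploited, while patterned play can, can be checked with a short calculation. The Python sketch below is an illustration with hypothetical names and made-up example distributions, not part of the script on this page. It scores a win as 1, a loss as -1 and a tie as 0, and computes the expected score of one mixed strategy against another.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def payoff(mine, theirs):
    """Score of one round from my point of view: 1 win, -1 loss, 0 tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(my_mix, their_mix):
    """Expected score when both players draw moves from the given distributions."""
    return sum(my_mix[m] * their_mix[t] * payoff(m, t)
               for m in MOVES for t in MOVES)


# Example distributions (assumed for illustration).
uniform = {m: Fraction(1, 3) for m in MOVES}
paper_heavy = {"rock": Fraction(1, 10), "paper": Fraction(8, 10),
               "scissors": Fraction(1, 10)}
always_scissors = {"rock": Fraction(0), "paper": Fraction(0),
                   "scissors": Fraction(1)}

print(expected_payoff(paper_heavy, uniform))          # 0: nothing gained against random play
print(expected_payoff(always_scissors, paper_heavy))  # 7/10: the paper habit is exploited
```

Against the uniform strategy, every strategy scores zero on average, which is why random play is safe. A player who favors paper, however, loses on average to a player who notices this.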
Theory of mind
In game settings, people often consider what their opponents know and believe, by making use of what is known as theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what their opponent is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can determine whether higher orders of theory of mind allow agents to win more often in rock-paper-scissors.
Figure 2: The blue zero-order theory of mind agent tries to learn patterns in the behavior of his opponent.
The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model their opponent through patterns of behavior. For example, suppose that the opponent has always played paper before. In this case, the zero-order theory of mind agent believes that it is very likely that she will be playing paper again (see Figure 2). When a zero-order theory of mind agent sees that his opponent is often playing paper, he will try to take advantage of this by playing scissors. In the rock-paper-scissors script, the red bars indicate the agent’s zero-order tendencies to play (R)ock, (P)aper, or (S)cissors. The higher the bar for rock (R), for example, the more likely it is that a zero-order theory of mind agent will play rock.
The text below the red bars shows to what extent zero-order theory of mind determines the next action of the agent (“Weight”), as well as the average accuracy of the predictions of zero-order theory of mind (“Accuracy”).
Figure 3: If the blue agent has first-order beliefs, he believes that his orange opponent may be trying to learn and exploit patterns in his behavior. By looking at the patterns in his own behavior, the blue agent predicts how the orange opponent will try to exploit these patterns.
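Zero-order reasoning can be sketched as choosing the move with the highest expected score against the agent's beliefs about his opponent. The Python code below is a simplified illustration under that assumption; the function names and the example belief values are hypothetical, and the agents in the script may compute their tendencies differently.

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def payoff(mine, theirs):
    """Score of one round from my point of view: 1 win, -1 loss, 0 tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def move_values(opponent_beliefs):
    """Expected score of each own move, given beliefs about the opponent's next move."""
    return {m: sum(p * payoff(m, t) for t, p in opponent_beliefs.items())
            for m in MOVES}


def zero_order_move(opponent_beliefs):
    """Pick the move with the highest expected score (ties go to the first move listed)."""
    values = move_values(opponent_beliefs)
    return max(MOVES, key=values.get)


# Assumed example: the opponent has mostly played paper so far.
beliefs = {"rock": 0.1, "paper": 0.8, "scissors": 0.1}
print(zero_order_move(beliefs))  # scissors
```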
A zero-order theory of mind agent tries to learn patterns in the behavior of his opponent, but does not realize that his opponent could be doing the same thing. A first-order theory of mind agent realizes that his opponent may be using zero-order theory of mind. He tries to predict what his opponent is going to do by placing himself in her position. He looks at the game from the perspective of his opponent to determine what he would do if the situation were reversed. The first-order theory of mind agent then uses this action as a prediction of his opponent’s behavior. For example, suppose that the first-order theory of mind agent realizes he has been playing paper a lot. He believes that his opponent may be trying to take advantage of this by playing scissors. If that is true, the agent can take advantage of this behavior by playing rock (see Figure 3).
In the game, the green bars indicate how likely a first-order theory of mind agent considers it that he will win the next round by playing (R)ock, (P)aper, or (S)cissors. This suggestion combines the agent’s zero-order and first-order beliefs. Below the graph, the weight indicates to what extent first-order theory of mind influences the decision of the agent. The accuracy indicates the average accuracy of the agent’s first-order beliefs in predicting the behavior of the opponent.
Figure 4: If the blue agent has second-order beliefs, he believes that his orange opponent believes that he himself is trying to learn and exploit patterns in her behavior. This allows him to anticipate how the orange opponent will try to exploit his behavior.
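First-order reasoning and the role of the weight can be sketched as follows. This Python example is an illustration only: the agent predicts his opponent's move by working out what a zero-order player would do against his own pattern, and then mixes that prediction with his zero-order beliefs. The linear mix in `blend` is an assumption made here for simplicity, and all names and numbers are hypothetical rather than taken from the script.

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def payoff(mine, theirs):
    """Score of one round from my point of view: 1 win, -1 loss, 0 tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def best_response(beliefs):
    """Move with the highest expected score against a belief distribution."""
    return max(MOVES, key=lambda m: sum(p * payoff(m, t)
                                        for t, p in beliefs.items()))


def first_order_prediction(own_pattern):
    """Take the opponent's perspective: what would a zero-order player do
    against my own pattern of play?"""
    return best_response(own_pattern)


def blend(zero_order_beliefs, predicted_move, weight):
    """Assumed linear mix of zero-order beliefs and the first-order prediction."""
    return {m: (1 - weight) * zero_order_beliefs[m]
               + weight * (1.0 if m == predicted_move else 0.0)
            for m in MOVES}


# Assumed example: the agent himself has mostly played paper.
own_pattern = {"rock": 0.1, "paper": 0.8, "scissors": 0.1}
zero_order_beliefs = {m: 1 / 3 for m in MOVES}

predicted = first_order_prediction(own_pattern)
combined = blend(zero_order_beliefs, predicted, weight=0.9)
print(predicted)                # scissors: the opponent is expected to exploit the paper habit
print(best_response(combined))  # rock: the agent counters the expected scissors
```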
A second-order theory of mind agent takes his reasoning one step further, and realizes that his opponent may be a first-order theory of mind agent. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. For example, suppose that the second-order theory of mind agent realizes his opponent is playing paper a lot. Zero-order theory of mind makes him realize that he could take advantage of this predictable behavior by playing scissors. A second-order theory of mind agent thinks that his opponent may be expecting him to do so, and therefore that she will play rock to take advantage of the way he behaves. If that is true, the agent should play paper himself (see Figure 4). The agent’s second-order beliefs are indicated by the blue bars.
In the script on this page, agents can continue this stepwise reasoning even further to use third-order and even fourth-order theory of mind. The associated beliefs are represented by orange and gray bars, respectively.
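The stepwise reasoning described above has a recursive structure: an agent of a given order models his opponent as an agent of one order lower, seen from her side of the table. The Python sketch below illustrates this for a "pure" agent that fully trusts his highest order of theory of mind. That is a simplifying assumption, since the agents in the script weigh the different orders against each other. All names and example values are hypothetical.

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}
# For each move, the move that defeats it.
COUNTER = {loser: winner for winner, loser in BEATS.items()}


def payoff(mine, theirs):
    """Score of one round from my point of view: 1 win, -1 loss, 0 tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def best_response(beliefs):
    """Move with the highest expected score against a belief distribution."""
    return max(MOVES, key=lambda m: sum(p * payoff(m, t)
                                        for t, p in beliefs.items()))


def pure_tom_move(order, other_pattern, own_pattern):
    """Move of an agent that relies only on the given order of theory of mind."""
    if order == 0:
        # Zero-order: exploit the pattern seen in the other player's behavior.
        return best_response(other_pattern)
    # Higher order: view the game from the opponent's side (the two patterns
    # swap roles), predict her move one order lower, then counter it.
    predicted = pure_tom_move(order - 1, own_pattern, other_pattern)
    return COUNTER[predicted]


# Assumed example patterns.
paper_heavy = {"rock": 0.1, "paper": 0.8, "scissors": 0.1}
no_pattern = {m: 1 / 3 for m in MOVES}

print(pure_tom_move(0, paper_heavy, no_pattern))  # scissors
print(pure_tom_move(1, no_pattern, paper_heavy))  # rock
print(pure_tom_move(2, paper_heavy, no_pattern))  # paper
```

The three printed moves match the zero-order, first-order and second-order examples in the text: scissors against an opponent who favors paper, rock when the agent himself favors paper, and paper when the opponent is expected to counter the agent's scissors with rock.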
Although the agents in the game use theory of mind to predict the future, they do not remember the past choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time. After this, they immediately forget what they saw. This means that these agents can only look at very simple patterns of behavior. However, higher orders of theory of mind allow the agents to exhibit increasingly complex patterns of behavior. Using the script on this page, you can experiment to see to what extent higher orders of theory of mind are still useful in rock-paper-scissors. In addition, you can also play the game against one of the agents yourself. The mental content of the agent then shows how closely your behavior corresponds to behavior of agents of different orders of theory of mind.
With the script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when playing rock-paper-scissors.
- Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can use any order of theory of mind up to fourth-order. Additionally, the second player can be controlled by a human user.
- Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, but will always repeat the same behavior. An agent with learning speed 1.0, on the other hand, believes that the previous game gives him all the information he needs to predict his opponent’s next action. Agents do not try to model the learning speed of their opponent. As a result, if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
- Reset game: Resets the game to the start situation. The score and accuracy information are reset to zero as well.
- Play round: Play one game of rock-paper-scissors. This can only be done when player two is not user-controlled.
- Rock, paper and scissors: When player two is user-controlled, selecting one of the three possible moves plays one game, with player two’s choice.
- Show mental content: A human player can use the graphs to determine what the agent will do next, or what a computer agent would do next if he were the one to play next. For a human player, the game is more challenging if the graphs are not visible. Uncheck the box to hide mental content information from the graphs.
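The effect of the learning speed setting can be sketched with a simple update rule. The Python code below assumes that beliefs are shifted toward the last observed move in proportion to the learning speed. This matches the two extremes described in the controls above, but it is an illustration with hypothetical names, not necessarily the exact rule used by the script.

```python
MOVES = ("rock", "paper", "scissors")


def update_beliefs(beliefs, observed_move, learning_speed):
    """Shift beliefs toward the move that was just observed.

    learning_speed 0.0 leaves the beliefs unchanged; learning_speed 1.0
    replaces them entirely with the last observation.
    """
    return {m: (1 - learning_speed) * p
               + learning_speed * (1.0 if m == observed_move else 0.0)
            for m, p in beliefs.items()}


# Assumed starting point: no pattern detected yet.
beliefs = {m: 1 / 3 for m in MOVES}

print(update_beliefs(beliefs, "paper", 0.0))  # unchanged beliefs
print(update_beliefs(beliefs, "paper", 1.0))  # paper is now considered certain
print(update_beliefs(beliefs, "paper", 0.5))  # halfway between the two
```

Because only the current beliefs are stored, a rule of this kind needs no record of earlier rounds, in line with the agents forgetting what they saw after each game.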
An older version of the rock-paper-scissors script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to run. The rock-paper-scissors applet can still be downloaded for offline use.