Rock-paper-scissors

Theory of mind in competition

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the game of rock-paper-scissors. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Even though rock-paper-scissors is a simple game in which trying to outsmart your opponent seems useless, the script on this page shows how theory of mind can still be effective. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.



Game outline

Figure 1: In rock-paper-scissors, scissors (orange agent) beats paper (blue agent).

Rock-paper-scissors is a two-player game in which players simultaneously choose to play either rock, paper or scissors. If the two players make the same choice, the game ends in a tie. However, if one player chooses scissors while the other chooses paper, the player who chose scissors wins (see Figure 1). In the same way, rock beats scissors, and paper beats rock.
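The cyclic win relation above fits in a few lines of code. The sketch below is my own illustration and not taken from the script on this page:

```python
# Each move beats exactly one other move: the win relation is cyclic.
BEATS = {"scissors": "paper", "paper": "rock", "rock": "scissors"}

def round_outcome(move_a, move_b):
    """Return 'tie', 'a', or 'b' depending on who wins the round."""
    if move_a == move_b:
        return "tie"
    return "a" if BEATS[move_a] == move_b else "b"
```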

According to game theory, the only stable strategy when playing rock-paper-scissors is to randomly choose one of the possibilities. After all, if you play according to some pattern, the other player might learn that pattern over many repeated games, and exploit that knowledge. Playing randomly makes sure that the opponent cannot learn any patterns in the way you play the game. Although this strategy works well, people are not very good at playing randomly. For example, people usually avoid playing rock when they have just played rock twice in a row, even though this should not matter in truly random play. Also, if some people are not playing randomly, smart players may be able to exploit this and get a higher score than a random player.

Theory of mind

In game settings, people often consider what their opponents know and believe, by making use of what is known as theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what their opponent is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can determine whether higher orders of theory of mind allow agents to win more often in rock-paper-scissors.

Figure 2: The blue zero-order theory of mind agent tries to learn patterns in the behavior of his opponent.

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model their opponent through patterns of behavior. For example, suppose that the opponent has always played paper before. In this case, the zero-order theory of mind agent believes that it is very likely that she will be playing paper again (see Figure 2). When a zero-order theory of mind agent sees that his opponent is often playing paper, he will try to take advantage of this by playing scissors. In the rock-paper-scissors script, the red bars indicate the agent’s zero-order tendencies to play (R)ock, (P)aper, or (S)cissors. The higher the bar for rock (R), for example, the more likely it is that a zero-order theory of mind agent will play rock.

The text below the red bars shows to what extent zero-order theory of mind determines the next action of the agent (“Weight”), as well as the average accuracy of the predictions of zero-order theory of mind (“Accuracy”).
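A zero-order agent of this kind can be sketched as follows. The class and its update rule are my own hypothetical illustration, not the script’s actual code; the learning speed parameter corresponds to the one described under Controls below:

```python
# Hypothetical sketch of a zero-order theory of mind agent: it keeps a
# probability estimate for each move the opponent might play, shifts
# those estimates after every observed round, and best-responds to the
# most likely opponent move.
BEATS = {"scissors": "paper", "paper": "rock", "rock": "scissors"}
COUNTER = {loser: winner for winner, loser in BEATS.items()}

class ZeroOrderAgent:
    def __init__(self, learning_speed=0.5):
        self.learning_speed = learning_speed
        self.beliefs = {"rock": 1/3, "paper": 1/3, "scissors": 1/3}

    def observe(self, opponent_move):
        # Shift beliefs toward the move just observed; with learning
        # speed 1.0 the last round determines the beliefs completely.
        for move in self.beliefs:
            target = 1.0 if move == opponent_move else 0.0
            self.beliefs[move] += self.learning_speed * (target - self.beliefs[move])

    def choose(self):
        # Play the move that beats the opponent's most likely move.
        likely = max(self.beliefs, key=self.beliefs.get)
        return COUNTER[likely]
```

For example, an agent with learning speed 1.0 that has just seen its opponent play paper will believe paper is certain to come next, and will answer with scissors.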

Figure 3: If the blue agent has first-order beliefs, he believes that his orange opponent may be trying to learn and exploit patterns in his behavior. By looking at the patterns in his own behavior, the blue agent predicts how the orange opponent will try to exploit these patterns.

A zero-order theory of mind agent tries to learn patterns in the behavior of his opponent, but does not realize that his opponent could be doing the same thing. A first-order theory of mind agent realizes that his opponent may be using zero-order theory of mind. He tries to predict what his opponent is going to do by placing himself in her position. He looks at the game from the perspective of his opponent to determine what he would do if the situation were reversed. The first-order theory of mind agent then uses this action as a prediction of his opponent’s behavior. For example, suppose that the first-order theory of mind agent realizes he has been playing paper a lot. He believes that his opponent may be trying to take advantage of this by playing scissors. If that is true, the agent can take advantage of this behavior by playing rock (see Figure 3).

In the game, the green bars indicate how likely a first-order theory of mind agent considers it to be that the agent will win the next round by playing (R)ock, (P)aper, or (S)cissors. This suggestion combines the agent’s zero-order and first-order beliefs. Below the graph, the weight indicates to what extent first-order theory of mind influences the decision of the agent. The accuracy indicates the average accuracy of the agent’s first-order beliefs in predicting the behavior of the opponent.
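The perspective-taking step can be sketched as follows (my own illustration, not the script’s code): the agent best-responds to the pattern in its own behavior, treats that best response as the opponent’s predicted move, and then best-responds to that prediction.

```python
# Hypothetical sketch of first-order prediction: the agent inspects the
# pattern of its own past moves, i.e. what a zero-order opponent would
# have learned about it, and reasons one step ahead.
BEATS = {"scissors": "paper", "paper": "rock", "rock": "scissors"}
COUNTER = {loser: winner for winner, loser in BEATS.items()}

def first_order_move(own_move_beliefs):
    """own_move_beliefs: estimated probabilities of the agent's own
    moves, as seen from the opponent's perspective."""
    # The move the agent itself would play against this pattern...
    predicted_opponent_move = COUNTER[max(own_move_beliefs, key=own_move_beliefs.get)]
    # ...becomes the prediction of the opponent's move; beat it.
    return COUNTER[predicted_opponent_move]
```

This reproduces the example above: an agent that has played paper a lot predicts its opponent will play scissors, and therefore plays rock.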

Figure 4: If the blue agent has second-order beliefs, he believes that his orange opponent believes that he himself is trying to learn and exploit patterns in her behavior. This allows him to anticipate how the orange opponent will try to exploit his behavior.

A second-order theory of mind agent takes his reasoning one step further, and realizes that his opponent may be a first-order theory of mind agent. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. For example, suppose that the second-order theory of mind agent realizes his opponent is playing paper a lot. Zero-order theory of mind makes him realize that he could take advantage of this predictable behavior by playing scissors. A second-order theory of mind agent thinks that his opponent may be expecting him to do so, and therefore that she will play rock to take advantage of the way he behaves. If that is true, the agent should continue playing paper himself (see Figure 4). The agent’s second-order beliefs are indicated by the blue bars.

In the script on this page, agents can continue this stepwise reasoning even further to use third-order and even fourth-order theory of mind. The associated beliefs are represented by orange and gray bars, respectively.
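Each step in this chain adds one more application of the same best-response operation. A minimal sketch of the idea (an illustration under my own naming, not the script’s implementation):

```python
# In rock-paper-scissors, one extra order of theory of mind amounts to
# one extra "they know that I know" best-response step.
BEATS = {"scissors": "paper", "paper": "rock", "rock": "scissors"}
COUNTER = {loser: winner for winner, loser in BEATS.items()}

def kth_order_move(base_move, order):
    """base_move: the move the zero-order pattern points at.
    Order 0 simply beats that move; each further order adds one
    best-response step on top."""
    move = base_move
    for _ in range(order + 1):
        move = COUNTER[move]
    return move
```

With the pattern pointing at paper, order 0 suggests scissors, order 1 suggests rock, and order 2 circles back to paper, matching the second-order example above.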

Although the agents in the game use theory of mind to predict the future, they do not remember the past choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time. After this, they immediately forget what they saw. This means that these agents can only pick up very simple patterns of behavior. However, higher orders of theory of mind allow the agents to exhibit increasingly complex patterns of behavior. Using the script on this page, you can experiment to see to what extent higher orders of theory of mind are still useful in rock-paper-scissors. In addition, you can also play the game against one of the agents yourself. The mental content of the agent then shows how closely your behavior corresponds to the behavior of agents of different orders of theory of mind.

Controls

With the script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when playing rock-paper-scissors.

  • Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to fourth-order. Additionally, the second player can be controlled by a human user.
  • Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, and will always repeat the same behavior. An agent with learning speed 1.0, on the other hand, believes that the previous game gives him all the information he needs to predict his opponent’s next action. Agents do not try to model the learning speed of their opponent. Instead, if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
  • Play round: Play one game of rock-paper-scissors. This can only be done when player two is not user-controlled.
  • Rock, paper and scissors: When player two is user-controlled, selecting one of the three possible moves plays one game in which player two makes that choice.
  • Show mental content: A human player can use the graphs to determine what the agent will do next, or what a computer agent would do next if he were the one to play next. For a human player, the game is more challenging if the graphs are not visible. Uncheck the box to hide mental content information from the graphs.

An older version of the rock-paper-scissors script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The rock-paper-scissors applet can still be downloaded for offline use.

Limited Bidding

Theory of mind in competition

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the game of Limited Bidding. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Agents use their theory of mind to try and outsmart their opponent. The script on this page demonstrates the effectiveness of this strategy. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents on this page do not behave the same way as described in the associated paper (also available from the Publications page). Instead of calculating their planned moves for the entire game, agents plan only one action and use a Q-learning approach to learn values for the resulting state.



Game outline

Limited bidding is a simplified version of a game described in “Edward de Bono’s Supermind Pack”. The game is played by two players. At the beginning of the game, each player receives an identical set of five numbered tokens. The number on each token corresponds to the value of the token. Over the course of five rounds, players simultaneously choose one of their own tokens to play. Whoever picks the highest value token wins the round. If both players pick the same value token, there is no winner. However, each token can only be used once per game. This means that players should choose their actions strategically.
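The round structure described above can be sketched as follows. This is my own illustration, and for simplicity it assumes both players commit to an order of play in advance, whereas in the actual game each round’s choice is made on the spot:

```python
# Sketch of limited bidding: each player holds tokens 1-5, each token
# may be used only once, the higher token wins the round, and equal
# tokens give no winner.
def play_game(moves_a, moves_b):
    """moves_a, moves_b: the order in which each player spends tokens
    1-5. Returns the number of rounds won by each player."""
    assert sorted(moves_a) == sorted(moves_b) == [1, 2, 3, 4, 5]
    wins_a = wins_b = 0
    for a, b in zip(moves_a, moves_b):
        if a > b:
            wins_a += 1
        elif b > a:
            wins_b += 1
    return wins_a, wins_b
```

Note that sacrificing one round can pay off: a player who loses the first round to the opponent’s 5 with a 1 keeps higher tokens for the remaining rounds.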

Game-theoretically, the optimal way to play limited bidding is by randomizing every choice. That is, under the assumption of common knowledge of rationality, a player should randomly pick one of the tokens still available to them. However, the theory of mind agents modeled on this page suspect that their opponent may not be fully rational. Moreover, they are limited in their ability to make decisions themselves. By playing the game repeatedly against the same opponent, agents try to learn to predict what their opponent will do, and change their strategy accordingly.

Theory of mind

Theory of mind refers to the individual’s ability to model mental content of others, such as beliefs, desires or intentions. The agents modeled in the script are constrained in their theory of mind. At the most basic level, a zero-order theory of mind agent tries to model his opponent through patterns of behavior. For example, a zero-order theory of mind agent might find out that his opponent always plays token 5 at the start of the game, or tends to save token 3 for last. However, he is unable to realize that his opponent might be doing the same. In fact, a zero-order theory of mind agent does not realize that his opponent has goals that are opposite to the ones he has himself. In the script, the agent’s zero-order beliefs are represented by red bars. The height of each red bar indicates how likely the agent believes it to be that his opponent is going to play a certain token.

A first-order theory of mind agent realizes that his opponent might be a zero-order theory of mind agent, and tries to predict what she is going to do by putting himself in her position. He looks at the game from the point of view of his opponent to determine what he would believe if the situation were reversed, and uses this as a prediction for his opponent’s actions. For example, a first-order theory of mind agent might realize that he has started the game by using token 3 a few times in a row, and suspect that his opponent is going to try and take advantage of that by playing 4 in the first round. The agent’s first-order theory of mind would therefore suggest that the agent plays 5 to win the first round. In the script, the height of the green bars represents the agent’s first-order beliefs concerning his opponent’s next action.

A second-order theory of mind agent takes this theory of mind reasoning one step further. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. In the script, the height of the blue bars indicates the agent’s second-order beliefs.

Based on zero-order, first-order and second-order beliefs, an agent makes different predictions about what his opponent is going to do. The agent must therefore also form beliefs about which of these predictions will yield the best results. An agent’s combined beliefs represent how the different orders of theory of mind are combined into a single prediction of his opponent’s actions. In the script, each bar graph depicting an agent’s theory of mind beliefs also indicates the accuracy of the predictions of that particular order of theory of mind, as well as the weight of these predictions in the agent’s next action. For example, a second-order theory of mind agent that has zero weight for his second-order beliefs will ignore his second-order theory of mind, and act as if he were an agent of a lower order of theory of mind.
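One simple way to form such combined beliefs is to mix the per-order predictions according to their weights. The sketch below is my own illustration of that idea, not the script’s actual combination rule:

```python
# Hypothetical sketch: each order of theory of mind proposes a
# probability distribution over the opponent's next move, and the
# distributions are mixed by the weight the agent assigns to each order.
def combine_beliefs(predictions, weights):
    """predictions: list of dicts mapping moves to probabilities,
    one per order of theory of mind. weights: one weight per order,
    summing to 1. Returns the mixed distribution."""
    combined = {}
    for prediction, weight in zip(predictions, weights):
        for move, prob in prediction.items():
            combined[move] = combined.get(move, 0.0) + weight * prob
    return combined
```

With a weight of 0 on the highest order, that order’s prediction drops out entirely, which is the behavior described above.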

Although the agents in the script make use of theory of mind, they do not remember the choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time and forget what they saw. As an alternative type of agent, a high memory agent is a zero-order theory of mind agent that remembers what his last choice was. That is, the high memory agent forms beliefs about what his opponent is going to do in reaction to him playing each of the possible tokens. In terms of memory, a high memory agent uses about the same amount of space as a second-order theory of mind agent, although this space is used differently. Although the high memory agent is not available in the script on this page, this agent is included in the applet example, which you can download at the bottom of the page.
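The difference in memory layout can be sketched as follows: where a zero-order agent keeps a single belief table, a high memory agent keeps one table per possible own last move. The class below is my own hypothetical illustration, not the applet’s code:

```python
# Hypothetical sketch of a high memory agent: it learns how the
# opponent reacts to each of its own possible choices, by keeping a
# separate belief table conditioned on its own last move.
class HighMemoryAgent:
    def __init__(self, tokens, learning_speed=0.5):
        self.learning_speed = learning_speed
        uniform = {t: 1.0 / len(tokens) for t in tokens}
        # One belief table for every move the agent itself might have played.
        self.beliefs = {t: dict(uniform) for t in tokens}

    def observe(self, own_last_move, opponent_move):
        # Only the table conditioned on the agent's own last move is updated.
        table = self.beliefs[own_last_move]
        for move in table:
            target = 1.0 if move == opponent_move else 0.0
            table[move] += self.learning_speed * (target - table[move])

    def predict(self, own_last_move):
        table = self.beliefs[own_last_move]
        return max(table, key=table.get)
```

Storing one table per own move is what makes this agent comparable in memory use to a second-order theory of mind agent, while spending that memory on conditioning rather than on recursion.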

Controls

The script has a number of controls to allow users to judge the effect of using a higher order of theory of mind on the performance of agents in the limited bidding game.

  • Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to second-order. Additionally, the second player can be controlled by a human user.
  • Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, and will always do the same thing. An agent with learning speed 1.0, on the other hand, believes that the previous game gives him all the information he needs to predict his opponent’s behavior. Agents do not try to model the learning speed of their opponent; if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
  • Start/Stop: Starts or stops the automatic play of limited bidding games. This can only be done when player two is not user-controlled.
  • Play round: Play one game of limited bidding. This can only be done when player two is not user-controlled.
  • Token buttons: When player 2 is user-controlled, selecting one of the available orange numbered tokens performs a move in the game.
  • Show mental content: A human player can use the graphs to determine what the agent believes that the human player will do next, or what a computer agent would believe if he were the one to play next instead of the human player. For a human player, the game is more challenging if the graphs are not visible.

With the applet, you can see how agents perform better when their theory of mind level increases. The applet also shows that second-order theory of mind agents outperform agents with more memory in Limited Bidding, even though they don’t do better in simple games like Rock-paper-scissors.

An older version of the limited bidding script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The Limited Bidding applet can still be downloaded for offline use. As an additional feature, this Java implementation also includes high memory agents.