Tacit Communication Game

Theory of mind in cooperation

The script on this page (open script in separate tab) shows the implementation of simulated agents playing a simplified version of the cooperative-communicative Tacit Communication Game. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Using theory of mind, these agents can reason about the goals and intentions of their partner. The script demonstrates how theory of mind can help agents to set up communication more quickly. The controls for this script are explained at the bottom of this page.
The JavaScript on this page makes use of HTML5.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.

Tacit Communication Game

Figure 1: In the Tacit Communication Game, both players have the same goal. However, only the blue Sender player knows what this goal is.

The Tacit Communication Game is a cooperative communication game played on a 3×3 board of tiles by two players. The game is played in two rounds. At the start of the first round, the blue Sender token is placed in the middle of the board. The Sender player may rotate and/or move this token to an adjacent tile any number of times. Once the Sender is satisfied with the final position and orientation of his token, the second round starts. When this round starts, the orange Receiver token is placed in the middle of the board, and the Receiver player may rotate and/or move this token to an adjacent tile any number of times. Once she is satisfied with the location and orientation of her token, the game ends.
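The movement rules above can be sketched in a few lines. The following is a minimal illustration in Python (the actual script on this page is JavaScript, and all names here are our own, not the script's):

```python
# Minimal sketch of token movement in the simplified Tacit Communication
# Game. The 3x3 board is indexed by (row, column), tokens start at the
# center tile, and a message is the sequence of moves a token makes.

MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
CENTER = (1, 1)

def apply_message(message, start=CENTER):
    """Return the list of tiles visited by a message, including the start."""
    path = [start]
    for move in message:
        dr, dc = MOVES[move]
        r, c = path[-1][0] + dr, path[-1][1] + dc
        if not (0 <= r <= 2 and 0 <= c <= 2):
            raise ValueError("move would leave the board")
        path.append((r, c))
    return path
```

For example, `apply_message(["UP", "LEFT"])` visits the center, the top middle tile, and ends on the top left tile `(0, 0)`.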

The Sender and the Receiver share the same goal: the final location and orientation of their respective tokens should match a certain goal configuration. Importantly, only the Sender knows what this goal configuration is (see also Figure 1). During the first round, the Sender should therefore match his token to the goal configuration, but he also needs to communicate the goal configuration of the orange token to the Receiver using only his movements on the board. The Receiver, for her part, has to find out what the goal configuration of her orange token is based on the movements of the blue Sender token. At the end of each game, the Sender and Receiver hear whether they matched their tokens to the goal configuration or not. However, if they failed to reach the goal, the Receiver does not hear what the correct configuration was.

In the full Tacit Communication Game, Sender and Receiver can have tokens of different shapes. Figure 1 shows an example in which the Sender has a round token, while the Receiver has a triangular token. This makes the game more difficult for the Sender, because he will have to let the Receiver know what the goal orientation of her token is without being able to use the orientation of his own token. On this page, we take a look at a simplified version of the Tacit Communication Game, in which Sender and Receiver both have a round token. This means that the Sender only has to let the Receiver know where her orange token should be placed. But even in this simple game, we can already see some interesting behavior. Here, we focus mostly on the role of theory of mind and predictable behavior.



Theory of mind

In the Tacit Communication Game, the Sender knows what the goal configuration is, but the Receiver does not. It could therefore be beneficial for the players to reason about what the other knows and believes. This reasoning ability is known as theory of mind. The software agents playing the Tacit Communication Game on this page also make use of theory of mind to predict what the other player is going to do. In the game, the theory of mind level of agents can be set to determine how this influences the ability of agents to cooperate.

Figure 2: Zero-order theory of mind agents randomly try actions until they find one that works. In this case, the zero-order theory of mind Sender sends message UP-LEFT, and the Receiver correctly guesses her goal location. This results in both players learning that sending the message UP-LEFT results in the Receiver selecting the bottom right location.

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents can only predict future behavior based on past behavior. This means that a zero-order theory of mind Sender sends random messages to find out how the Receiver reacts to them. Once he has a clear idea of how the Receiver behaves, he sends the message that he believes will cause the Receiver to match her token to the goal configuration. At the same time, a zero-order theory of mind Receiver randomly tries locations until she finds the one that results in success. Figure 2 shows an example where the Sender sends the message UP-LEFT, after which the Receiver correctly guesses that she should put her token on the bottom right location. Because this was a correct guess, both Sender and Receiver now believe that if the Sender were to send UP-LEFT again, the Receiver would respond by moving her token to the bottom right location.

Figure 3: Errors can lead to conflicting beliefs in the Tacit Communication Game. In this case, a zero-order theory of mind Sender sends message UP-LEFT, and the Receiver incorrectly guesses her goal location. The Sender now believes that sending the message UP-LEFT results in the Receiver selecting the bottom right location. But the Receiver believes that if she sees the message UP-LEFT, she should try anything but the bottom right location.

When the Receiver misinterprets a message the Sender has sent, zero-order theory of mind leads to conflicting beliefs. Figure 3 shows an example of this. Here, a zero-order theory of mind Sender sent the message UP-LEFT, and the zero-order theory of mind Receiver responded incorrectly by selecting the bottom right location. The negative feedback causes the Receiver to believe that, if the Sender were to send the message UP-LEFT again, she should not choose the bottom right location again. On the other hand, the Sender believes that if he were to send the message UP-LEFT again, the Receiver will respond by selecting the bottom right location again.
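The asymmetry between the two players' updates can be made concrete. Below is a minimal sketch (our own simplification, not the exact update rule of the agents in the paper) of how zero-order beliefs change after feedback:

```python
# Sketch of zero-order belief updates in the Tacit Communication Game.
# Both players track how a message maps to a response, but only the
# Receiver reacts to negative feedback: the zero-order Sender cannot
# reason about the Receiver's goal, so he expects the same response to
# the same message whether or not it led to success.

def update_sender(sender_beliefs, message, observed_location):
    # The Sender simply records what the Receiver did.
    sender_beliefs[message] = observed_location

def update_receiver(receiver_beliefs, message, chosen_location, success):
    if success:
        # Repeat the response that worked.
        receiver_beliefs[message] = {chosen_location}
    else:
        # Never try that response to this message again.
        options = receiver_beliefs.setdefault(
            message, {(r, c) for r in range(3) for c in range(3)})
        options.discard(chosen_location)
```

After a failed game, the Sender still expects the same response to the same message, while the Receiver has ruled that response out, exactly the conflict shown in Figure 3.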

At first glance, it may seem strange that after this negative feedback, the Sender believes that the Receiver will not change her behavior. After all, the Receiver has also seen this negative feedback, so the Sender should expect that she will change her behavior. However, this would mean that the Sender knows that the Receiver has a goal: to match her token to the goal configuration. But the zero-order theory of mind Sender cannot reason about the beliefs and goals of the Receiver. To the zero-order theory of mind Sender, the Receiver is like a coffee machine with the labels removed. The Sender randomly pushes buttons to try and get the type of coffee he wants. If he has pushed the wrong button, he believes that if he were to press the same button again, the coffee machine would produce the same result. In the same way, the Sender believes that if he sends the same message, the Receiver will produce the same behavior again. For people, however, theory of mind reasoning is so natural that zero-order theory of mind reasoning actually seems counterintuitive.

Figure 4: First-order theory of mind allows a Receiver to look at the game from the perspective of the Sender. This way, a first-order theory of mind Receiver believes that any new message (RIGHT-UP-LEFT-LEFT) is not meant to communicate a goal location that the Sender has already found a good message for.

A first-order theory of mind agent can reason about the goals of others. Such an agent realizes that the two players have the same goal. This especially helps the Receiver when she tries to interpret the messages of the Sender, as shown in Figure 4. When a first-order theory of mind Receiver sees a message, she tries to figure out for what goal configuration she would have decided to send the same message. This helps the Receiver to interpret new messages. Figure 4 shows a situation in which the Sender has previously sent the message UP-LEFT, after which the Receiver correctly guessed that the bottom right tile was her goal location. When the Receiver sees the message RIGHT-UP-LEFT-LEFT, she takes the perspective of the Sender and concludes that her goal location is not the bottom right tile. After all, if the Sender had wanted her to go to the bottom right tile, he would have sent the message UP-LEFT.
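This interpretation step can be sketched as a simple filter (the names below are hypothetical, not taken from the script):

```python
# Sketch of first-order message interpretation: a new message cannot be
# meant for a goal location that the Sender already has a working
# message for -- he would have reused that message instead.

def interpret_new_message(message, known_messages, all_locations):
    """known_messages maps established messages to their goal locations."""
    if message in known_messages:
        return {known_messages[message]}
    return set(all_locations) - set(known_messages.values())
```

With UP-LEFT established as "bottom right", a new message such as RIGHT-UP-LEFT-LEFT leaves all eight other tiles as candidates, but rules out the bottom right one.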

First-order theory of mind agents believe that other players may be zero-order theory of mind agents. However, if both Sender and Receiver are first-order theory of mind agents, both agents are mistaken. For the best results, either the Sender or the Receiver should use zero-order theory of mind.

Figure 5: Second-order theory of mind helps when players are predictable. In this case, Senders want to send short messages that visit the goal location of the Receiver. A second-order theory of mind Sender reasons that sending the message LEFT-DOWN-UP-UP will cause the Receiver to move her token to the bottom left tile, but the message DOWN-LEFT-UP-UP may not.

A second-order theory of mind agent takes this reasoning one step further, and believes that the other player may be a first-order theory of mind agent. This means that the second-order theory of mind agent believes that the other player knows that both players have the same goal. Interestingly, unlike in competition and negotiation, second-order theory of mind does not provide any additional benefits in the standard model. However, second-order theory of mind can be beneficial in cooperative settings such as the Tacit Communication Game when player behavior is more predictable.
For example, Senders may prefer to send different messages for different goal configurations. In the game on this page, Sender preferences can be set to short messages, messages that visit the Receiver’s goal location, or short messages that visit the Receiver’s goal location. This can help agents to play the Tacit Communication Game more effectively.

Figure 5 shows an example in which Senders prefer to send messages that are as short as possible, but also visit the goal location of the Receiver. In this example, the Sender wants to send a message that lets the Receiver know that she should place her token on the bottom left tile. By placing himself in the position of the Receiver, the Sender tries to predict how the Receiver will react to the message LEFT-DOWN-UP-UP. For example, the Receiver may think that the Sender wants her to go to the left tile in the middle row. After all, the Sender’s message visits this location. However, the Receiver knows that there is a shorter message (LEFT-UP) that the Sender could have sent, and that would still have visited the same location. By placing herself in the shoes of the Sender, the Receiver reasons that the Sender would have preferred to send LEFT-UP in this case. As a result, the Receiver believes that the message LEFT-DOWN-UP-UP is not intended to tell her that her goal location is the left tile in the middle row. In fact, the only location that the first-order theory of mind Receiver considers to be her goal location is the bottom left tile.

Through the use of second-order theory of mind, the Sender believes that if he were to send the message LEFT-DOWN-UP-UP, the Receiver would respond by moving to the bottom left tile. Moreover, the use of second-order theory of mind lets the Sender know that LEFT-DOWN-UP-UP is a better message than DOWN-LEFT-UP-UP. Even though the messages have the same length, and the Sender has no preference for either of these messages, the second-order theory of mind Sender believes that the Receiver could misinterpret the message DOWN-LEFT-UP-UP. As Figure 5 suggests, the Receiver may move her token to the middle location in the bottom row. The Sender believes that this would not happen with the message LEFT-DOWN-UP-UP.
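Under the stated preference for short messages that visit the goal, this second-order check can be sketched as follows. The sketch is our own simplification of the agents' reasoning; it assumes a precomputed table `shortest_len` giving, for each tile, the length of the shortest message that visits that tile and still ends where the Sender's token stopped.

```python
# A first-order Receiver rules out any visited tile for which a strictly
# shorter message (ending on the same final tile) exists, because the
# Sender would have preferred that shorter message. A second-order
# Sender keeps a candidate message only if the goal location is the
# sole reading that survives this filter.

def receiver_readings(message_len, visited, shortest_len):
    return {tile for tile in visited if message_len <= shortest_len[tile]}

def unambiguous(message_len, goal, visited, shortest_len):
    return receiver_readings(message_len, visited, shortest_len) == {goal}
```

In the Figure 5 situation, LEFT-DOWN-UP-UP leaves the bottom left tile as the only surviving reading, while DOWN-LEFT-UP-UP also leaves the bottom middle tile as a possible reading, so it stays ambiguous.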

Controls

The script above has a number of controls to show the effect of using higher orders of theory of mind on the performance of agents in the Tacit Communication Game.

  • Sender/Receiver checkboxes: At the top of the script, there are two checkboxes to show and hide the mental content of the Sender and Receiver agents. The mental content shows the agent’s zero-order, first-order and second-order beliefs concerning the behavior of the other player. When a human user is playing the game, this mental content can give a lot of information on what the goal configuration is or how the agents are going to behave. For a more challenging game, remove the check from the appropriate checkbox to hide mental content.
  • Sender/Receiver theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to the second. Additionally, players can be controlled by a human user. This way, you can experience the effect of different outcomes on agent behavior firsthand. When human input is accepted, the arrow keys can be used to move the token.
  • Sender preferences: If a Sender has a choice of multiple messages to send, the Sender’s preferences tell him which message he should pick. Senders can prefer short messages, messages that visit the goal location of the Receiver, or short messages that visit the goal location of the Receiver.
  • Reset turn: Resets the message that will be sent to the Receiver. This is only used by human Senders.
  • Playback: Repeats the Sender’s message on the game board.
  • Play turn: Play one round of the game.
  • Start and Stop: Starts and stops automatic play. When started, this mode continuously plays new games.
  • Skip game: Randomly pick a new goal configuration.
  • Clear beliefs: Resets the agents’ beliefs to forget everything they have learned.

With the game script, you can see how agents perform when their theory of mind level changes. In addition, you can experiment with how Sender preferences influence the effectiveness of theory of mind.

Negotiating with alternating offers

Theory of mind in mixed-motive interactions

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the negotiation game Colored Trails. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. This ability is especially useful in negotiations, where the negotiating parties want to cooperate to reach an agreement, but also compete to get the best possible deal for themselves. The script on this page shows how theory of mind can help. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.

Colored Trails

Figure 1: In this example, the blue player starts in the top left corner and wants to reach the bottom right corner.

Colored Trails is a negotiation game played on a board with tiles of different colors (see also the Colored Trails homepage). There are many different ways to play Colored Trails. The way we describe here is just one possibility. In the setup on this page, two players each receive a set of chips that allows them to move around on the board. Players can move horizontally and vertically to a tile next to their current location, but only if they hand in a chip of the same color as their destination tile. For example, the blue player in Figure 1 can move down by handing in a black chip. However, this means that the blue player will no longer have a black chip to reach his goal in the bottom right. Instead, the player can move right first by handing in a yellow chip. This way, he can still use the black chip to reach his goal location.

Each player receives four chips at the beginning of the game, randomly drawn from the colors on the board. This means that a player may not end up with the chips he needs to reach his goal location. To help players to reach their goals, players can negotiate over ownership of the chips. Negotiation takes the form of alternating offers. The initiator (blue player) always starts by making the first offer. The responder (orange player) can then decide to accept the offer of the initiator, make a new offer, or withdraw from the negotiation. If the responder accepts, the chips are divided as described in the offer and the negotiation ends. Alternatively, if the responder withdraws, each player keeps his own chips and the negotiation ends as well. If the responder decides to make a new offer, the players switch roles and the negotiation continues. Although this process could in principle go on forever, the game we present here has a maximum of 40 offers. This means that once the initiator and the responder have made 20 offers each, the initiator can no longer make a new offer. Instead, he has to accept the offer of the responder, or withdraw from negotiation.
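The protocol itself can be sketched as a small loop. The agents' decision logic is stubbed out below, all names are our own, and in this sketch a player who tries to counter after the 40th offer is treated as withdrawing:

```python
# Skeleton of the alternating-offers protocol. Players must implement
# make_offer() and respond(offer), the latter returning "accept",
# "withdraw", or "counter".

def negotiate(initiator, responder, max_offers=40):
    offer = initiator.make_offer()          # the initiator always opens
    offers_made = 1
    proposer, other = initiator, responder
    while True:
        action = other.respond(offer)
        if action == "accept":
            return ("deal", offer)          # chips divided as offered
        if action == "withdraw" or offers_made == max_offers:
            return ("no deal", None)        # each player keeps his chips
        proposer, other = other, proposer   # the players switch roles
        offer = proposer.make_offer()
        offers_made += 1

# Tiny scripted players, just to exercise the loop:
class Stub:
    def __init__(self, actions, offers):
        self.actions, self.offers = iter(actions), iter(offers)
    def make_offer(self):
        return next(self.offers)
    def respond(self, offer):
        return next(self.actions)
```

For example, if the responder counters the opening offer and the initiator then accepts the counter-offer, the game ends in a deal on the counter-offer.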

Each player is scored based on how closely he ends up to his goal location, indicated by a flag on the board. The scores are listed in the table below.

Situation Change in score
Ending on your goal location +50 points
Ending anywhere but your goal location -10 points per step needed to reach your goal location
Ending with unused chips +5 points per chip

As the table shows, players get the most points for reaching their goal, although every step in the right direction helps. Also, even if you cannot use a chip to reach your goal location, it is worth a few points. After the negotiation, the game automatically gives the players the highest possible scores given their chips.
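The scoring rule in the table translates directly into a small function (a sketch; the helper name and parameters are ours):

```python
# Sketch of the Colored Trails scoring rule described in the table above.

def score(steps_from_goal, unused_chips):
    """steps_from_goal is the number of steps still needed to reach the
    goal location (0 means the player ended on the goal)."""
    points = 5 * unused_chips              # +5 per unused chip
    if steps_from_goal == 0:
        points += 50                       # ended on the goal location
    else:
        points -= 10 * steps_from_goal     # -10 per remaining step
    return points
```

For example, ending on the goal with two unused chips yields 60 points, while ending three steps away with four unused chips yields -10 points.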

At the start of the game, each player is placed at the center of the board and receives a random goal location. When there is at least one computer-controlled player in the game, players only know their own goal location. That is, the initiator does not know the goal location of the responder and vice versa. However, goal locations are always at least three steps away from the center. Also, the initiator and the responder never have the same goal.



Theory of mind

Although the score of a player depends only on his own performance and not on the performance of the other player, whether or not the other player will accept your offer will depend on how it affects his score. It may therefore help to think about the goals of the other player. When people consider what other people want, know, or believe, they are using their theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what the other player is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can find out how higher orders of theory of mind allow agents to negotiate more effectively.

Figure 2: The orange zero-order theory of mind agent believes that the behaviour of the blue player is consistent. If the blue player rejects some offer (e.g. 1 black, 2 white, 1 yellow chip), the orange player believes that the blue player will also reject a smaller offer (e.g. 2 white, 1 yellow chip).

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model others through patterns of behaviour. A zero-order theory of mind agent tries to find out what kind of offers are more likely to be successful, without reasoning about the goals of the other player. Through experience, a zero-order theory of mind agent will find out that asking for more than 6 chips, while leaving 2 or fewer chips for the other player, is very unlikely to be accepted. Instead, the zero-order theory of mind agent learns to make “fair” offers without knowing what “fair” means. To help the zero-order theory of mind agent along, agents are programmed with 1,000 games of experience. That is, when starting a game for the first time, an agent already has 1,000 negotiations worth of experience to let him know what kind of offers are more successful than others.

The zero-order theory of mind agent learns what kind of offers are more successful, believing that the other player has a fixed set of offers that he is willing to accept. The zero-order theory of mind agent makes offers as if he were pushing buttons on a machine, trying to find out which button will make the trading partner do what the zero-order theory of mind agent wants him to do. At the same time, the zero-order theory of mind agent believes that the behaviour of the other player is more or less consistent. For example, if the other player rejects an offer, the zero-order theory of mind agent believes that the other player will also reject an offer that gives fewer chips to the other player. Figure 2 shows an example of this. The orange player believes that if the blue player rejects an offer in which the blue player would get 1 black chip, 2 white chips, and 1 yellow chip, then the blue player will also reject an offer in which the blue player gets only the 2 white chips and 1 yellow chip.
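The consistency assumption of Figure 2 amounts to a simple dominance check. In the sketch below (representation is ours), an offer is a tuple of chip counts given to the other player:

```python
# Sketch of the zero-order consistency belief: if an offer was rejected,
# any offer that gives the other player at most as many chips of every
# color is also expected to be rejected.

def expected_rejected(offer, rejected_offers):
    return any(all(o <= r for o, r in zip(offer, rejected))
               for rejected in rejected_offers)
```

With chip counts ordered as (black, white, yellow), rejecting (1, 2, 1) means (0, 2, 1) is also expected to be rejected, while (2, 2, 1) might still be accepted.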

Figure 3: If the orange player has first-order theory of mind, he tries to find out what the goal location of the blue player is by analyzing the offers he receives. In this example, there is only one possible goal location for which the blue player could get a higher score with the chips he is asking for than with the chips he already has.

A first-order theory of mind agent realizes that the other player has a goal, and that the other player will only accept offers that will help him reach that goal. The first-order theory of mind agent also realizes that the other player will only make offers that increase his score. By looking carefully at the offers of the other player, the first-order theory of mind agent tries to find out what the goal of the other player is. Once the first-order theory of mind agent knows what the goal location of the other player is, he can make offers that lead to a mutually beneficial outcome.

Figure 3 shows a situation in which the blue player offers to trade one of his yellow chips and a black chip against one white chip of the orange player. If the orange player is a first-order theory of mind agent, he tries to find out for what goal locations the offer of the blue player makes sense. That is, for which goal locations would the blue player have a higher score with the chips he is asking for (2 white and 1 yellow) than with his initial set of chips (1 white, 1 black, and 2 yellow). As it turns out, there is only one such location, as shown in the thought balloon of the orange player. For all other possible goal locations, the blue player would have been better off with his initial set of chips.
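This inference can be sketched as filtering the candidate goal locations. The sketch is ours; `best_score` stands in for a hypothetical helper that returns the best achievable score for a given chip set and goal location:

```python
# Sketch of first-order goal inference: keep only the goal locations
# for which the chips the other player is asking for would score higher
# than the chips he already holds. Otherwise the offer makes no sense.

def consistent_goals(requested, initial, candidate_goals, best_score):
    return {goal for goal in candidate_goals
            if best_score(requested, goal) > best_score(initial, goal)}
```

In the Figure 3 situation, only one candidate goal location survives this filter, so the orange player can be fairly confident about the blue player's goal.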

In the game above, you can reveal an agent’s first-order beliefs through the checkbox “Show mental content”. Checking this option shows a grid like the game board, where brighter locations indicate that the agent believes it to be more likely to be the other player’s goal location. This means that once an agent is convinced that the other player has a particular goal location, that location will appear white while the other locations will be black. In addition, the weight of first-order theory of mind shows the degree to which first-order theory of mind determines the agent’s behaviour. If the weight is close to 1, the agent always selects an offer suggested by his first-order theory of mind. If the weight is close to 0, the agent tends to ignore the predictions of first-order theory of mind, and behave as a zero-order theory of mind agent instead. Finally, the accuracy indicates how accurately first-order theory of mind has predicted the behaviour of the other agent. However, note that the accuracy will be very low at the beginning of the game, when the agent does not yet know the goal location of the other player.

Using first-order theory of mind, an agent tries to determine what the goal location of the other player is. This allows a first-order theory of mind agent to get a better idea of what kind of offers the other player is going to accept. But an agent can also use first-order theory of mind to try and manipulate the other player. A first-order theory of mind agent believes that the other player might be a zero-order theory of mind agent, who learns what the first-order theory of mind agent wants through the offers he makes. By strategically selecting his offer, the first-order theory of mind agent can try to change the beliefs of the other player. The first-order theory of mind agent may sometimes make an offer that he knows the other player would never accept because it would reduce his score. The reason for this is to push the other player into making an offer that is better for the first-order theory of mind agent. For example, a first-order theory of mind agent may ask for 3 black chips if he believes that it would convince the other player to offer the agent at least 2 black chips.

Figure 4: If the blue agent has second-order beliefs, he can try to manipulate what the other player believes about the agent’s goal location. In this case, the agent asks for the purple chip to make his trading partner believe that he needs it to reach his goal location, even though he does not need that chip.

A second-order theory of mind agent takes his reasoning one step further, and realizes that the other player may be a first-order theory of mind agent. This means that the second-order theory of mind agent believes that the other player knows that the agent has a goal, and that the other player may be trying to find out what his goal location is. Instead of trying to find out what the goal location of the other player is, a second-order theory of mind agent can make an offer that signals his own goal location to the other player. By telling the other player what his goal location is, the agent gives the other player the opportunity to find a mutually beneficial solution.

Alternatively, a second-order theory of mind agent can select offers that give very little information about his goal location in order to get a higher score. For example, the second-order theory of mind agent can make an offer that suggests that his goal location is further away than it actually is. The blue agent in Figure 4 believes that by asking for enough chips to reach the top left tile (2 white, 1 purple, 1 yellow chip), the other player will believe that that is his goal location, even though his actual goal location is closer to the center.

When an agent’s mental content is shown in the game, it shows both first-order and second-order beliefs about the goal location of the other player. In addition, the weight of second-order theory of mind indicates to what degree second-order theory of mind influences the behaviour of the agent, while the accuracy shows how closely the predictions made by second-order theory of mind match the offers actually made by the other agent.

An important feature of the agents in the game is that although they use theory of mind to predict future behaviour, they have no memory to recall previous behaviour. An agent sees the offer made by the other player, changes his beliefs accordingly, and then forgets he ever saw the offer. One of the behaviours you may see a lot is agents “insisting” on a certain distribution of chips by making the same offer over and over again. In part, this is because the agents do not remember making that offer before.

Controls

The script below has a number of controls to show the effect of using higher orders of theory of mind on the performance of agents in Colored Trails.

  • Initiator/Responder theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to second-order. Additionally, players can be controlled by a human user. When there is a human user in the game, the goal location of the computer player is not revealed until the end of the game. However, if two human users play the game, the goals are not hidden.
  • Show mental content: The mental content shows the agent’s first-order and second-order beliefs concerning the goal location of the other player. When a human user is playing the game, this mental content can give some insight into how the offers are interpreted by agents. However, for a more challenging negotiation, uncheck the option to hide mental content.
  • Accept offer, Make new offer and Withdraw from negotiation: When a human user is playing the game, these buttons allow control over the next move. Use the arrow buttons to select the offer you want to make and press Make new offer. Alternatively, Accept offer accepts the previous offer, while Withdraw from negotiation stops the game without trading any chips.
  • Play round and New game: Play one round of the negotiation game. If the game has ended, pressing this button starts a new game.
  • Start and Stop: Starts and stops automatic play. When started, this mode plays a new round every 0.5 seconds.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.

With the game script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when negotiating.

Rock-paper-scissors

Theory of mind in competition

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the game of rock-paper-scissors. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Even though rock-paper-scissors is a simple game in which trying to outsmart your opponent seems useless, the script on this page shows how theory of mind can still be effective. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.



Game outline

Figure 1: In rock-paper-scissors, scissors (orange agent) beats paper (blue agent).

Rock-paper-scissors is a two-player game in which players simultaneously choose to play either rock, paper or scissors. If the two players make the same choice, the game ends in a tie. However, if one player chooses scissors while the other chooses paper, the player who chose scissors wins (see Figure 1). In the same way, rock beats scissors, and paper beats rock.
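The win relation can be written as a small helper (names are ours, not the script's):

```python
# The rock-paper-scissors win relation: each move beats exactly one other.

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def winner(a, b):
    """Return 0 if move a wins, 1 if move b wins, or None on a tie."""
    if a == b:
        return None
    return 0 if BEATS[a] == b else 1
```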

According to game theory, the only stable strategy when playing rock-paper-scissors is to randomly choose one of the possibilities. After all, if you play according to some pattern, the other player might learn that pattern over many repeated games, and exploit that knowledge. Playing randomly makes sure that the opponent cannot learn any patterns in the way you play the game. Although this strategy works well, people are not very good at playing randomly. For example, people usually avoid playing rock when they have just played rock two times in a row, even though this should not matter in truly random play. Also, if there are some people that are not playing randomly, smart players may be able to exploit this and get a higher score than a random player.

Theory of mind

In game settings, people often consider what their opponents know and believe, by making use of what is known as theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what their opponent is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can determine whether higher orders of theory of mind allow agents to win more often in rock-paper-scissors.

Figure 2: The blue zero-order theory of mind agent tries to learn patterns in the behavior of his opponent.

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model their opponent through patterns of behavior. For example, suppose that the opponent has always played paper before. In this case, the zero-order theory of mind agent believes that it is very likely that she will be playing paper again (see Figure 2). When a zero-order theory of mind agent sees that his opponent is often playing paper, he will try to take advantage of this by playing scissors. In the rock-paper-scissors script, the red bars indicate the agent’s zero-order tendencies to play (R)ock, (P)aper, or (S)cissors. The higher the bar for rock (R), for example, the more likely it is that a zero-order theory of mind agent will play rock.

The text below the red bars shows to what extent zero-order theory of mind determines the next action of the agent (“Weight”), as well as the average accuracy of the predictions of zero-order theory of mind (“Accuracy”).
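The zero-order belief update can be sketched as follows. This is an illustrative sketch, not the script’s actual code: the function names are invented, and the update rule is a standard exponential-smoothing step driven by the learning speed described under Controls.

```javascript
// Zero-order beliefs: probabilities that the opponent plays rock, paper,
// or scissors. After observing a move, beliefs shift toward that move at
// rate `learningSpeed` (a sketch; the script may differ in details).
function updateZeroOrderBeliefs(beliefs, observedMove, learningSpeed) {
  const updated = {};
  for (const move of Object.keys(beliefs)) {
    const target = move === observedMove ? 1 : 0;
    updated[move] = (1 - learningSpeed) * beliefs[move] + learningSpeed * target;
  }
  return updated;
}

// Best response to the most likely predicted move: play what beats it.
const beats = { rock: "paper", paper: "scissors", scissors: "rock" };
function bestResponse(beliefs) {
  const predicted = Object.keys(beliefs)
    .reduce((a, b) => (beliefs[a] >= beliefs[b] ? a : b));
  return beats[predicted];
}

let beliefs = { rock: 1 / 3, paper: 1 / 3, scissors: 1 / 3 };
beliefs = updateZeroOrderBeliefs(beliefs, "paper", 0.5);
// Beliefs now favor paper, so the best response is scissors.
```

With a learning speed of 0.5, a single observation of paper raises the paper belief from 1/3 to 2/3, which is why the agent then plays scissors.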

Figure 3: If the blue agent has first-order beliefs, he believes that his orange opponent may be trying to learn and exploit patterns in his behavior. By looking at the patterns in his own behavior, the blue agent predicts how the orange opponent will try to exploit these patterns.

A zero-order theory of mind agent tries to learn patterns in the behavior of his opponent, but does not realize that his opponent could be doing the same thing. A first-order theory of mind agent realizes that his opponent may be using zero-order theory of mind. He tries to predict what his opponent is going to do by placing himself in her position. He looks at the game from the perspective of his opponent to determine what he would do if the situation were reversed. The first-order theory of mind agent then uses this action as a prediction of his opponent’s behavior. For example, suppose that the first-order theory of mind agent realizes he has been playing paper a lot. He believes that his opponent may be trying to take advantage of this by playing scissors. If that is true, the agent can take advantage of this behavior by playing rock (see Figure 3).

In the game, the green bars indicate how likely a first-order theory of mind agent considers it to be that he will win the next round by playing (R)ock, (P)aper, or (S)cissors. This suggestion combines the agent’s zero-order and first-order beliefs. Below the graph, the weight indicates to what extent first-order theory of mind influences the decision of the agent. The accuracy indicates the average accuracy of the agent’s first-order beliefs in predicting the behavior of the opponent.
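First-order reasoning can be sketched as follows (again an illustrative sketch with invented names, not the script’s actual code): the agent keeps a model of his own past play, assumes the opponent will best-respond to it, and predicts that response.

```javascript
const beats = { rock: "paper", paper: "scissors", scissors: "rock" };

// Pick the most likely move under a belief distribution.
function mostLikely(beliefs) {
  return Object.keys(beliefs).reduce((a, b) => (beliefs[a] >= beliefs[b] ? a : b));
}

// First-order prediction: model your own behavior as the opponent would
// (selfModel), assume she best-responds to it, and predict that response.
function firstOrderPrediction(selfModel) {
  const myLikelyMove = mostLikely(selfModel); // what she thinks I will play
  return beats[myLikelyMove];                 // her best response to that
}

// If I have been playing paper a lot, I predict she will play scissors,
// so my own best response is rock.
const selfModel = { rock: 0.1, paper: 0.7, scissors: 0.2 };
const predictedOpponentMove = firstOrderPrediction(selfModel); // "scissors"
const myMove = beats[predictedOpponentMove];                   // "rock"
```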

Figure 4: If the blue agent has second-order beliefs, he believes that his orange opponent believes that he himself is trying to learn and exploit patterns in her behavior. This allows him to anticipate how the orange opponent will try to exploit his behavior.

A second-order theory of mind agent takes his reasoning one step further, and realizes that his opponent may be a first-order theory of mind agent. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. For example, suppose that the second-order theory of mind agent realizes his opponent is playing paper a lot. Zero-order theory of mind makes him realize that he could take advantage of this predictable behavior by playing scissors. A second-order theory of mind agent thinks that his opponent may be expecting him to do so, and therefore that she will play rock to take advantage of the way he behaves. If that is true, the agent should continue playing paper himself (see Figure 4). The agent’s second-order beliefs are indicated by the blue bars.

In the script on this page, agents can continue this stepwise reasoning even further to use third-order and even fourth-order theory of mind. The associated beliefs are represented by orange and gray bars, respectively.

Although the agents in the game use theory of mind to predict the future, they do not remember the past choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time. After this, they immediately forget what they saw. This means that these agents can only look at very simple patterns of behavior. However, increasingly higher orders of theory of mind allow the agents to exhibit increasingly more complex patterns of behavior. Using the script on this page, you can experiment to see to what extent higher orders of theory of mind are still useful in rock-paper-scissors. In addition, you can also play the game against one of the agents yourself. The mental content of the agent then shows how closely your behavior corresponds to the behavior of agents of different orders of theory of mind.
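The stepwise reasoning can be illustrated with a simplified sketch (not the script’s actual code, which weighs full belief distributions rather than single moves): each additional order of theory of mind shifts the chosen action one more step around the rock-paper-scissors cycle, so that orders repeat with period three.

```javascript
const beats = { rock: "paper", paper: "scissors", scissors: "rock" };

// Simplified view of stepwise reasoning: a zero-order agent best-responds
// directly to the move he expects; each extra order of theory of mind adds
// one more step around the cycle. With three moves, order k and order k+3
// prescribe the same action.
function actionAtOrder(expectedOpponentMove, order) {
  let action = beats[expectedOpponentMove];
  for (let i = 0; i < order; i++) {
    action = beats[action];
  }
  return action;
}
```

For an opponent expected to play paper, this reproduces the examples above: a zero-order agent plays scissors, while a second-order agent keeps playing paper.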

Controls

With the script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when playing rock-paper-scissors.

  • Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to fourth-order. Additionally, the second player can be controlled by a human user.
  • Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, but will always repeat the same behavior. An agent with learning speed 1.0, on the other hand, believes that the previous game gives him all the information he needs to predict his opponent’s next action. Agents do not try to model the learning speed of their opponent. Instead, if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
  • Play round: Play one game of rock-paper-scissors. This can only be done when player two is not user-controlled.
  • Rock, paper and scissors: When player two is user-controlled, selecting one of the three possible moves plays one game, using that move as player two’s choice.
  • Show mental content: A human player can use the graphs to determine what the agent will do next, or what a computer agent would do next if he were the one to play next. For a human player, the game is more challenging if the graphs are not visible. Uncheck the box to hide mental content information from the graphs.

An older version of the rock-paper-scissors script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The rock-paper-scissors applet can still be downloaded for offline use.

Limited Bidding

Theory of mind in competition

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the game of Limited Bidding. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Agents use their theory of mind to try and outsmart their opponent. The script on this page demonstrates the effectiveness of this strategy. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents on this page do not behave the same way as described in the associated paper (also available from the Publications page). Instead of calculating their planned moves for the entire game, agents plan only one action and use a Q-learning approach to learn values for the resulting state.
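The Q-learning step mentioned in the note can be sketched generically as follows. The script’s exact state representation and parameters are not specified here, so the names and values below are illustrative.

```javascript
// Generic one-step Q-learning update (a sketch, not the script's code):
//   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
function qUpdate(Q, state, action, reward, nextState, alpha, gamma) {
  const nextValues = Object.values(Q[nextState] || { none: 0 });
  const maxNext = Math.max(...nextValues);
  Q[state] = Q[state] || {};
  const old = Q[state][action] || 0;
  Q[state][action] = old + alpha * (reward + gamma * maxNext - old);
  return Q[state][action];
}

const Q = {};
// After winning a round (reward 1) from a hypothetical state "start"
// by playing token 5:
qUpdate(Q, "start", "play5", 1, "round2", 0.1, 0.9);
// Q.start.play5 is now 0.1
```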



Game outline

Limited bidding is a simplified version of a game described in “Edward de Bono’s Supermind Pack”. The game is played by two players. At the beginning of the game, each player receives an identical set of five numbered tokens. The number on each token corresponds to the value of the token. Over the course of five rounds, players simultaneously choose one of their own tokens to play. Whoever picks the highest value token wins the round. If both players pick the same value token, there is no winner. However, each token can only be used once per game. This means that players should choose their actions strategically.
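The rules above can be sketched in code (an illustrative sketch; the names are invented):

```javascript
// One round of limited bidding: each player plays one of their remaining
// tokens; the higher value wins the round, a tie has no winner, and played
// tokens are removed so each token is used only once per game.
function playRound(tokens1, tokens2, pick1, pick2) {
  if (!tokens1.includes(pick1) || !tokens2.includes(pick2)) {
    throw new Error("token not available");
  }
  tokens1.splice(tokens1.indexOf(pick1), 1);
  tokens2.splice(tokens2.indexOf(pick2), 1);
  if (pick1 === pick2) return 0;   // tie: no winner
  return pick1 > pick2 ? 1 : 2;    // winner of the round
}

const p1 = [1, 2, 3, 4, 5];
const p2 = [1, 2, 3, 4, 5];
const winner = playRound(p1, p2, 5, 3);  // player 1 wins the round
// Both tokens are spent: p1 can no longer play 5, p2 can no longer play 3.
```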

Game-theoretically, the optimal way to play limited bidding is by randomizing every choice. That is, under the assumption of common knowledge of rationality, a player should randomly pick one of the tokens still available to them. However, the theory of mind agents modeled on this page suspect that their opponent may not be fully rational. Moreover, they are limited in their ability to make decisions themselves. By playing the game repeatedly against the same opponent, agents try to learn to predict what their opponent will do, and change their strategy accordingly.

Theory of mind

Theory of mind refers to the individual’s ability to model the mental content of others, such as beliefs, desires or intentions. The agents modeled in the script are constrained in their theory of mind. At the most basic level, a zero-order theory of mind agent tries to model his opponent through patterns of behavior. For example, a zero-order theory of mind agent might find out that his opponent always plays token 5 at the start of the game, or tends to save token 3 for last. However, he is unable to realize that his opponent might be doing the same. In fact, a zero-order theory of mind agent does not realize that his opponent has goals that are opposite to the ones he has himself. In the script, the agent’s zero-order beliefs are represented by red bars. The height of each red bar indicates how likely the agent believes it to be that his opponent is going to play a certain token.

A first-order theory of mind agent realizes that his opponent might be a zero-order theory of mind agent, and tries to predict what she is going to do by putting himself in her position. He looks at the game from the point of view of his opponent to determine what he would believe if the situation were reversed, and uses this as a prediction for his opponent’s actions. For example, a first-order theory of mind agent might realize that he has started the game by using token 3 a few times in a row, and suspect that his opponent is going to try and take advantage of that by playing 4 in the first round. The agent’s first-order theory of mind would therefore suggest that the agent plays 5 to win the first round. In the script, the height of the green bars represents the agent’s first-order beliefs concerning his opponent’s next action.

A second-order theory of mind agent takes this theory of mind reasoning one step further. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. In the script, the height of the blue bars indicates the agent’s second-order beliefs.

Based on zero-order, first-order and second-order beliefs, an agent makes different predictions about what his opponent is going to do. The agent must therefore also form beliefs about which of these predictions will yield the best results. An agent’s combined beliefs represent how the different orders of theory of mind are combined into a single prediction of his opponent’s actions. In the script, each bar graph depicting an agent’s theory of mind beliefs also indicates the accuracy of the predictions of that particular order of theory of mind, as well as the weight of these predictions in the agent’s next action. For example, a second-order theory of mind agent that assigns zero weight to his second-order beliefs will ignore his second-order theory of mind, and act as if he were an agent of a lower order of theory of mind.
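The combination of beliefs across orders can be sketched as a weighted mixture (an illustrative sketch; the script’s exact weighting scheme may differ):

```javascript
// Combine predictions from different orders of theory of mind into a
// single belief about the opponent's next token. Each order contributes
// its prediction in proportion to its weight.
function combinePredictions(predictions, weights) {
  const combined = {};
  const total = weights.reduce((a, b) => a + b, 0);
  predictions.forEach((prediction, order) => {
    for (const token of Object.keys(prediction)) {
      combined[token] = (combined[token] || 0)
        + (weights[order] / total) * prediction[token];
    }
  });
  return combined;
}

// Zero-order and first-order predictions over tokens 1..3, with the
// first-order prediction weighted twice as heavily:
const combined = combinePredictions(
  [{ 1: 0.6, 2: 0.3, 3: 0.1 }, { 1: 0.0, 2: 0.3, 3: 0.7 }],
  [1, 2]
);
```

In this example the combined belief leans toward token 3, because the more heavily weighted first-order prediction does.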

Although the agents in the script make use of theory of mind, they do not remember the choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time and forget what they saw. As an alternative type of agent, a high memory agent is a zero-order theory of mind agent that remembers what his last choice was. That is, the high memory agent forms beliefs about what his opponent is going to do in reaction to him playing each of the possible tokens. In terms of memory, a high memory agent uses about the same amount of space as a second-order theory of mind agent, although this space is used differently. Although the high memory agent is not available in the script on this page, this agent is included in the applet example, which you can download at the bottom of the page.

Controls

The script has a number of controls to allow users to judge the effect of using a higher order of theory of mind on the performance of agents in the limited bidding game.

  • Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to second-order. Additionally, the second player can be controlled by a human user.
  • Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, and will always do the same thing. An agent with learning speed 1.0 on the other hand believes that the previous game gives him all the information he needs to predict his opponent’s behavior. Agents do not try to model the learning speed of their opponent; if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
  • Start/Stop: Starts or stops the automatic play of limited bidding games. This can only be done when player two is not user-controlled.
  • Play round: Play one game of limited bidding. This can only be done when player two is not user-controlled.
  • Token buttons: When player 2 is user-controlled, selecting one of the available orange numbered tokens performs a move in the game.
  • Show mental content: A human player can use the graphs to determine what the agent believes that the human player will do next, or what a computer agent would believe if he were the one to play next instead of the human player. For a human player, the game is more challenging if the graphs are not visible.

With the applet, you can see how agents perform better when their theory of mind level increases. The applet also shows that second-order theory of mind agents outperform agents with more memory in Limited Bidding, even though they don’t do better in simple games like Rock-paper-scissors.

An older version of the limited bidding script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The Limited Bidding applet can still be downloaded for offline use. As an additional feature, this Java implementation also includes high memory agents.