Tacit Communication Game

Theory of mind in cooperation

The script on this page (open script in separate tab) shows the implementation of simulated agents playing a simplified version of the cooperative-communicative Tacit Communication Game. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Using theory of mind, these agents can reason about the goals and intentions of their partner. The script demonstrates how theory of mind can help agents to set up communication more quickly. The controls for this script are explained at the bottom of this page.
The JavaScript on this page makes use of HTML5.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.

Tacit Communication Game

Figure 1: In the Tacit Communication Game, both players have the same goal. However, only the blue Sender player knows what this goal is.

The Tacit Communication Game is a cooperative communication game played on a 3×3 board of tiles by two players. The game is played in two rounds. At the start of the first round, the blue Sender token is placed in the middle of the board. The Sender player may rotate and/or move this token to an adjacent tile any number of times. Once the Sender is satisfied with the final position and orientation of his token, the second round starts. When this round starts, the orange Receiver token is placed in the middle of the board, and the Receiver player may rotate and/or move this token to an adjacent tile any number of times. Once she is satisfied with the location and orientation of her token, the game ends.

The Sender and the Receiver share the same goal: the final location and orientation of their respective tokens should match a certain goal configuration. Importantly, only the Sender knows what this goal configuration is (see also Figure 1). During the first round, the Sender should therefore match his token to the goal configuration, but he also needs to communicate the goal configuration of the orange token to the Receiver using only his movements on the board. The Receiver, for her part, has to find out what the goal configuration of her orange token is based on the movements of the blue Sender token. At the end of each game, the Sender and Receiver hear whether they matched their tokens to the goal configuration or not. However, if they failed to reach the goal, the Receiver does not hear what the correct configuration was.
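To make the structure of a single game concrete, here is a minimal JavaScript sketch with stub agents. All names and interfaces are illustrative assumptions; they are not the API of the actual script on this page.

```javascript
// One game of the simplified Tacit Communication Game (illustrative).
function playGame(sender, receiver, goal) {
  // Round 1: only the Sender knows the goal; his movements form the message.
  // (In the full game the Sender's own final position must also match the
  // goal configuration; here we assume he always gets his own part right.)
  const message = sender.composeMessage(goal);
  // Round 2: the Receiver places her token based on the message alone.
  const guess = receiver.interpret(message);
  // Feedback: success or failure only; a failing Receiver is never told
  // what the correct configuration was.
  const success = guess === goal.receiverTile;
  sender.learn(message, guess, success);
  receiver.learn(message, guess, success);
  return success;
}

// Stub agents that use one fixed message and one fixed guess:
const sender = { composeMessage: () => "UP-LEFT", learn: () => {} };
const receiver = { interpret: () => "bottom-right", learn: () => {} };
console.log(playGame(sender, receiver, { receiverTile: "bottom-right" })); // true
```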

In the full Tacit Communication Game, Sender and Receiver can have tokens of different shapes. Figure 1 shows an example in which the Sender has a round token, while the Receiver has a triangular token. This makes the game more difficult for the Sender, because he will have to let the Receiver know what the goal orientation of her token is without being able to use the orientation of his own token. On this page, we take a look at a simplified version of the Tacit Communication Game, in which Sender and Receiver both have a round token. This means that the Sender only has to let the Receiver know where her orange token should be placed. But even in this simple game, we can already see some interesting behavior. On this page, we focus mostly on the role of theory of mind and predictable behavior.



Theory of mind

In the Tacit Communication Game, the Sender knows what the goal configuration is, but the Receiver does not. It could therefore be beneficial for the players to reason about what the other knows and believes. This reasoning ability is known as theory of mind. The software agents playing the Tacit Communication Game on this page also make use of theory of mind to predict what the other player is going to do. In the game, the theory of mind level of each agent can be set, which shows how theory of mind influences the agents' ability to cooperate.

Figure 2: Zero-order theory of mind agents randomly try actions until they find one that works. In this case, the zero-order theory of mind Sender sends message UP-LEFT, and the Receiver correctly guesses her goal location. This results in both players learning that sending the message UP-LEFT results in the Receiver selecting the bottom right location.

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents can only predict future behavior based on past behavior. This means that a zero-order theory of mind Sender sends random messages to find out how the Receiver reacts to them. Once he has a clear idea of how the Receiver behaves, he sends the message that he believes will lead the Receiver to match her token to the goal configuration. At the same time, a zero-order theory of mind Receiver randomly tries locations until she finds the one that results in success. Figure 2 shows an example where the Sender sends the message UP-LEFT, after which the Receiver correctly guesses that she should move her token to the bottom right location. Because this was a correct guess, both Sender and Receiver now believe that if the Sender were to send UP-LEFT again, the Receiver would respond by moving her token to the bottom right location.
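A minimal sketch of this zero-order bookkeeping for the Sender is shown below. The Map from messages to locations and the method names are assumptions for illustration; the actual agents maintain probabilistic beliefs rather than a simple lookup table.

```javascript
// Zero-order beliefs of the Sender (illustrative).
class ZeroOrderBeliefs {
  constructor() {
    // Maps a message (e.g. "UP-LEFT") to the location the agent believes
    // the Receiver will select in response to it.
    this.observed = new Map();
  }

  // Both players see at the end of a game whether the guess was correct.
  update(message, chosenLocation, success) {
    if (success) {
      this.observed.set(message, chosenLocation);
    }
  }

  // Reuse a message that is believed to work for this goal location;
  // otherwise, try a random new message.
  chooseMessage(goalLocation, candidateMessages) {
    for (const [message, location] of this.observed) {
      if (location === goalLocation) return message;
    }
    const i = Math.floor(Math.random() * candidateMessages.length);
    return candidateMessages[i];
  }
}

// After one successful game, the Sender reuses UP-LEFT for this goal.
const beliefs = new ZeroOrderBeliefs();
beliefs.update("UP-LEFT", "bottom-right", true);
console.log(beliefs.chooseMessage("bottom-right", ["UP-LEFT", "DOWN-RIGHT"]));
// -> "UP-LEFT"
```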

Figure 3: Errors can lead to conflicting beliefs in the Tacit Communication Game. In this case, a zero-order theory of mind Sender sends message UP-LEFT, and the Receiver incorrectly guesses her goal location. The Sender now believes that sending the message UP-LEFT results in the Receiver selecting the bottom right location. But the Receiver believes that if she sees the message UP-LEFT, she should try anything but the bottom right location.

When the Receiver misinterprets a message the Sender has sent, zero-order theory of mind leads to conflicting beliefs. Figure 3 shows an example of this. Here, a zero-order theory of mind Sender sent the message UP-LEFT, and the zero-order theory of mind Receiver responded incorrectly by selecting the bottom right location. The negative feedback causes the Receiver to believe that, if the Sender were to send the message UP-LEFT again, she should not choose the bottom right location again. On the other hand, the Sender believes that if he were to send the message UP-LEFT again, the Receiver would respond by selecting the bottom right location again.

At first glance, it may seem strange that after this negative feedback, the Sender believes that the Receiver will not change her behavior. After all, the Receiver has also seen this negative feedback, so the Sender should expect that she will change her behavior. However, this would mean that the Sender knows that the Receiver has a goal: to match her token to the goal configuration. But the zero-order theory of mind Sender cannot reason about the beliefs and goals of the Receiver. To the zero-order theory of mind Sender, the Receiver is like a coffee machine with the labels removed. The Sender randomly pushes buttons to try and get the type of coffee he wants. If he has pushed the wrong button, he believes that pressing the same button again would produce the same result. In the same way, the Sender believes that if he sends the same message, the Receiver will produce the same behavior again. For people, however, theory of mind reasoning is so natural that zero-order theory of mind reasoning actually seems counterintuitive.
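The asymmetry between the two updates can be made explicit in code. The sketch below (again with illustrative names and data structures) shows why a failed game leaves the Sender expecting a repeat of the old behavior while the Receiver rules that behavior out.

```javascript
// Why Figure 3's conflict arises: the Sender tracks the Receiver's
// behavior, while the Receiver tracks her own failed guesses.
function updateAfterGame(sender, receiver, message, guess, success) {
  // Zero-order Sender: the Receiver answered `message` with `guess`, so
  // he expects the same response next time -- whether it succeeded or not.
  sender.expectedResponse.set(message, guess);

  if (!success) {
    // Zero-order Receiver: her guess failed, so she will avoid it the
    // next time she sees this message.
    if (!receiver.excluded.has(message)) {
      receiver.excluded.set(message, new Set());
    }
    receiver.excluded.get(message).add(guess);
  }
}

const sender = { expectedResponse: new Map() };
const receiver = { excluded: new Map() };
updateAfterGame(sender, receiver, "UP-LEFT", "bottom-right", false);

console.log(sender.expectedResponse.get("UP-LEFT")); // "bottom-right"
console.log(receiver.excluded.get("UP-LEFT"));       // Set { "bottom-right" }
```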

Figure 4: First-order theory of mind allows a Receiver to look at the game from the perspective of the Sender. This way, a first-order theory of mind Receiver believes that any new message (RIGHT-UP-LEFT-LEFT) is not meant to communicate a goal location that the Sender has already found a good message for.

A first-order theory of mind agent can reason about the goals of others. Such an agent realizes that the two players have the same goal. This especially helps the Receiver when she tries to interpret the messages of the Sender, as is shown in Figure 4. When a first-order theory of mind Receiver sees a message, she tries to figure out for what goal configuration she would have decided to send the same message. This helps the Receiver to interpret new messages. Figure 4 shows a situation in which the Sender has previously sent the message UP-LEFT, after which the Receiver correctly guessed that the bottom right tile was her goal location. When the Receiver sees the message RIGHT-UP-LEFT-LEFT, she takes the perspective of the Sender and concludes that her goal location is not the bottom right tile. After all, if the Sender had wanted her to go to the bottom right tile, he would have sent the message UP-LEFT.
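The sketch below captures this perspective-taking step in simplified form. Here `senderModel` stands for the Receiver's first-order model of which messages the Sender already uses for which goals; the actual agents maintain graded beliefs instead of this hard rule.

```javascript
// First-order interpretation by the Receiver (illustrative interface).
function firstOrderInterpret(message, candidateGoals, senderModel) {
  // senderModel maps goal locations to messages the Sender is already
  // known to use for them (learned from earlier successful games).
  return candidateGoals.filter((goal) => {
    const known = senderModel.get(goal);
    // If the Sender already has a working message for this goal and it
    // is not the message we just saw, this goal is ruled out.
    return !(known !== undefined && known !== message);
  });
}

// Earlier, UP-LEFT was established to mean "bottom-right".
const senderModel = new Map([["bottom-right", "UP-LEFT"]]);
const goals = ["bottom-right", "top-left", "middle-left"];

console.log(firstOrderInterpret("RIGHT-UP-LEFT-LEFT", goals, senderModel));
// -> ["top-left", "middle-left"]  (not bottom-right, as in Figure 4)
```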

First-order theory of mind agents believe that other players may be zero-order theory of mind agents. However, if both Sender and Receiver are first-order theory of mind agents, both agents are mistaken. For the best results, either the Sender or the Receiver should use zero-order theory of mind.

Figure 5: Second-order theory of mind helps when players are predictable. In this case, Senders want to send short messages that visit the goal location of the Receiver. A second-order theory of mind Sender reasons that sending the message LEFT-DOWN-UP-UP will cause the Receiver to move her token to the bottom left tile, but the message DOWN-LEFT-UP-UP may not.

A second-order theory of mind agent takes this reasoning one step further, and believes that the other player may be a first-order theory of mind agent. This means that the second-order theory of mind agent believes that the other player knows that both players have the same goal. Interestingly, unlike in competition and negotiation, second-order theory of mind does not provide any additional benefits in the standard model. However, second-order theory of mind can be beneficial in cooperative settings such as the Tacit Communication Game when player behavior is more predictable.
For example, Senders may prefer to send different messages for different goal configurations. In the game on this page, Sender preferences can be set to short messages, messages that visit the Receiver's goal location, or short messages that visit the Receiver's goal location. This can help agents to play the Tacit Communication Game more effectively.

Figure 5 shows an example in which Senders prefer to send messages that are as short as possible, but also visit the goal location of the Receiver. In this example, the Sender wants to send a message that lets the Receiver know that she should place her token on the bottom left tile. By placing himself in the position of the Receiver, the Sender tries to predict how the Receiver will react to the message LEFT-DOWN-UP-UP. For example, the Receiver may think that the Sender wants her to go to the left tile in the middle row. After all, the Sender’s message visits this location. However, the Receiver knows that there is a shorter message (LEFT-UP) that the Sender could have sent, and that would still have visited the same location. By placing herself in the shoes of the Sender, the Receiver reasons that the Sender would have preferred to send LEFT-UP in this case. As a result, the Receiver believes that the message LEFT-DOWN-UP-UP is not intended to tell her that her goal location is the left tile in the middle row. In fact, the only location that the first-order theory of mind Receiver considers to be her goal location is the bottom left tile.

Through the use of second-order theory of mind, the Sender believes that if he were to send the message LEFT-DOWN-UP-UP, the Receiver would respond by moving to the bottom left tile. Moreover, the use of second-order theory of mind lets the Sender know that LEFT-DOWN-UP-UP is a better message than DOWN-LEFT-UP-UP. Even though the messages have the same length, and the Sender has no preference for either of these messages, the second-order theory of mind Sender believes that the Receiver could misinterpret the message DOWN-LEFT-UP-UP. As Figure 5 suggests, the Receiver may move her token to the middle location in the bottom row. The Sender believes that this would not happen with the message LEFT-DOWN-UP-UP.
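The reasoning of Figure 5 can be reproduced with a small amount of geometry. In the sketch below, a message is a list of moves starting from the center tile, the Sender's final tile is taken to be his own goal, and a first-order Receiver keeps only those visited tiles for which the observed message is among the shortest goal-visiting routes. The coordinates and the hard filtering rule are simplifications of the probabilistic model in the paper.

```javascript
// Reconstruction of the Figure 5 reasoning under the preference "short
// messages that visit the Receiver's goal location". Coordinates are
// (column, row) on the 3x3 board, with (0,0) the top left tile and the
// token starting on the center tile (1,1).
const MOVES = { UP: [0, -1], DOWN: [0, 1], LEFT: [-1, 0], RIGHT: [1, 0] };

// The tiles a message visits, in order, starting from the center.
function path(message) {
  let [x, y] = [1, 1];
  const tiles = [];
  for (const move of message) {
    x += MOVES[move][0];
    y += MOVES[move][1];
    tiles.push([x, y]);
  }
  return tiles;
}

const manhattan = ([ax, ay], [bx, by]) => Math.abs(ax - bx) + Math.abs(ay - by);

// First-order interpretation: a visited tile g can only be my goal if no
// strictly shorter message could both visit g and end on the Sender's
// final tile (which must be the Sender's own goal).
function interpret(message) {
  const tiles = path(message);
  const final = tiles[tiles.length - 1];
  return tiles.filter(
    (g) => manhattan([1, 1], g) + manhattan(g, final) === message.length
  );
}

// LEFT-DOWN-UP-UP singles out the bottom left tile (0,2)...
console.log(interpret(["LEFT", "DOWN", "UP", "UP"])); // [ [0, 2] ]
// ...while DOWN-LEFT-UP-UP is ambiguous between the bottom middle tile
// (1,2) and the bottom left tile (0,2), as Figure 5 suggests.
console.log(interpret(["DOWN", "LEFT", "UP", "UP"])); // [ [1, 2], [0, 2] ]
```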

Controls

The script above has a number of controls to show the effect of using higher orders of theory of mind on the performance of agents in the Tacit Communication Game.

  • Sender/Receiver checkboxes: At the top of the script, there are two checkboxes to show and hide the mental content of the Sender and Receiver agents. The mental content shows the agent’s zero-order, first-order and second-order beliefs concerning the behavior of the other player. When a human user is playing the game, this mental content can give a lot of information on what the goal configuration is or how the agents are going to behave. For a more challenging game, remove the check from the appropriate checkbox to hide mental content.
  • Sender/Receiver theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to the second. Additionally, players can be controlled by a human user. This way, you can experience the effect of different outcomes on agent behavior firsthand. When human input is accepted, the arrow keys can be used to move the token.
  • Sender preferences: If a Sender has a choice of multiple messages to send, the Sender's preferences determine which message he should pick. Senders can prefer short messages, messages that visit the goal location of the Receiver, or short messages that visit the goal location of the Receiver.
  • Reset turn: Resets the message that will be sent to the Receiver. This is only used by human Senders.
  • Playback: Repeats the Sender’s message on the game board.
  • Play turn: Play one round of the game.
  • Start and Stop: Starts and stops automatic play. When started, this mode continuously plays new games.
  • Skip game: Randomly pick a new goal configuration.
  • Clear beliefs: Resets the agents’ beliefs to forget everything they have learned.

With the game script, you can see how agents perform when their theory of mind level changes. In addition, you can experiment with how Sender preferences influence the effectiveness of theory of mind.

Negotiating with alternating offers

Theory of mind in mixed-motive interactions

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the negotiation game Colored Trails. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. This ability is especially useful in negotiations, where the negotiating parties want to cooperate to reach an agreement, but also compete to get the best possible deal for themselves. The script on this page shows how theory of mind can help. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.

Colored Trails

Figure 1: In this example, the blue player starts in the top left corner and wants to reach the bottom right corner.

Colored Trails is a negotiation game played on a board with tiles of different colors (see also the Colored Trails homepage). There are many different ways to play Colored Trails. The way we describe here is just one possibility. In the setup on this page, two players each receive a set of chips that allows them to move around on the board. Players can move horizontally and vertically to a tile next to their current location, but only if they hand in a chip of the same color as their destination tile. For example, the blue player in Figure 1 can move down by handing in a black chip. However, this means that the blue player will no longer have a black chip to reach his goal in the bottom right. Instead, the player can move right first by handing in a yellow chip. This way, he can still use the black chip to reach his goal location.

Each player receives four chips at the beginning of the game, randomly drawn from one of the colors on the board. This means that a player may not end up with the chips he needs to reach his goal location. To help players to reach their goals, players can negotiate over ownership of the chips. Negotiation takes the form of the two players alternating in making offers. The initiator (blue player) always starts by making the first offer. The responder (orange player) can then decide to accept the offer of the initiator, make a new offer, or withdraw from the negotiation. If the responder accepts, the chips are divided as the offer suggests and the negotiation ends. Alternatively, if the responder withdraws, each player keeps his own chips and the negotiation ends as well. If the responder decides to make a new offer, the players switch roles and the negotiation continues. Although this process could in principle go on forever, the game we present here has a maximum of 40 offers. This means that once the initiator and the responder have made 20 offers each, the initiator can no longer make a new offer. Instead, he has to accept the offer of the responder, or withdraw from negotiation.
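A sketch of this alternating-offers loop is shown below. The `makeOffer`/`respond` interface is an assumption for illustration; the actual agents decide among accepting, withdrawing, and countering by comparing the expected values of these options.

```javascript
// Alternating offers with a cap of `maxOffers` offers in total
// (illustrative interface, not the actual script's code).
function negotiate(initiator, responder, maxOffers = 40) {
  let offer = initiator.makeOffer(null); // the initiator always starts
  let proposerIsInitiator = true;

  for (let made = 1; ; made++) {
    const mayCounter = made < maxOffers;
    const other = proposerIsInitiator ? responder : initiator;
    const reply = other.respond(offer, mayCounter);

    if (reply === "accept") return { outcome: "deal", offer };
    // Once the cap is reached, the only remaining options are accepting
    // the offer on the table or withdrawing.
    if (reply === "withdraw" || !mayCounter) return { outcome: "withdrawn" };

    offer = reply; // a counter-offer: the players switch roles
    proposerIsInitiator = !proposerIsInitiator;
  }
}

// Toy run: the initiator proposes a trade and the responder accepts.
const initiator = { makeOffer: () => ({ give: ["yellow"], get: ["black"] }) };
const responder = { respond: () => "accept" };
console.log(negotiate(initiator, responder));
// -> { outcome: "deal", offer: { give: ["yellow"], get: ["black"] } }
```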

Each player is scored based on how closely he ends up to his goal location, indicated by a flag on the board. The scores are listed in the table below.

Situation                                Change in score
Ending on your goal location             +50 points
Ending anywhere but your goal location   −10 points per step needed to reach your goal location
Ending with unused chips                 +5 points per chip

As the table shows, players get the most points for reaching their goal, although every step in the right direction helps. Also, even if you cannot use a chip to reach your goal location, it is worth a few points. After the negotiation, the game automatically gives the players the highest possible scores given their chips.
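The scoring rule from the table translates directly into code. The sketch below assumes the number of remaining steps and unused chips have already been computed; finding the chip use that maximizes this score is what the game does automatically at the end of a negotiation.

```javascript
// Score of one player at the end of a game (illustrative helper).
function score(reachedGoal, stepsFromGoal, unusedChips) {
  let points = 0;
  if (reachedGoal) points += 50;      // ending on your goal location
  else points -= 10 * stepsFromGoal;  // -10 per step still to go
  points += 5 * unusedChips;          // +5 per chip left over
  return points;
}

// Reaching the goal with one spare chip:
console.log(score(true, 0, 1));  // 55
// Ending two steps short with two spare chips:
console.log(score(false, 2, 2)); // -10
```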

At the start of the game, each player is placed at the center of the board and receives a random goal location. When there is at least one computer-controlled player in the game, players only know their own goal location. That is, the initiator does not know the goal location of the responder and vice versa. However, goal locations are always at least three steps away from the center. Also, the initiator and the responder never have the same goal.



Theory of mind

Although the score of a player depends only on his own performance and not on the performance of the other player, whether the other player will accept your offer depends on how the offer affects his score. It may therefore help to think about the goals of the other player. When people consider what other people want, know, or believe, they are using their theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what the other player is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can find out how higher orders of theory of mind allow agents to negotiate more effectively.

Figure 2: The orange zero-order theory of mind agent believes that the behaviour of the blue player is consistent. If the blue player rejects some offer (e.g. 1 black, 2 white, 1 yellow chip), the orange player believes that the blue player will also reject a smaller offer (e.g. 2 white, 1 yellow chip).

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model others through patterns of behaviour. A zero-order theory of mind agent tries to find out what kind of offers are more likely to be successful, without reasoning about the goals of the other player. Through experience, a zero-order theory of mind agent will find out that asking for more than 6 chips, while leaving 2 or fewer chips for the other player, is very unlikely to be accepted. Instead, the zero-order theory of mind agent learns to make “fair” offers without knowing what “fair” means. To help the zero-order theory of mind agent along, agents are programmed with 1,000 games of experience. That is, when starting a game for the first time, an agent already has 1,000 negotiations' worth of experience to let him know what kind of offers are more successful than others.

The zero-order theory of mind agent learns what kind of offers are more successful, because he believes that the other player has a fixed set of offers that he is willing to accept. The zero-order theory of mind agent makes offers as if he were pushing buttons on a machine, trying to find out which button will make the trading partner do what the zero-order theory of mind agent wants him to do. At the same time, the zero-order theory of mind agent believes that the behaviour of the other player is more or less consistent. For example, if the other player rejects an offer, the zero-order theory of mind agent believes that the other player will also reject an offer that gives fewer chips to the other player. Figure 2 shows an example of this. The orange player believes that if the blue player rejects an offer in which the blue player would get 1 black chip, 2 white chips, and 1 yellow chip, then the blue player will also reject an offer in which the blue player gets only the 2 white chips and 1 yellow chip.
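The consistency belief from Figure 2 amounts to a dominance rule over offers. In the sketch below an offer is encoded as the chip counts the other player would receive; the class name and representation are illustrative, and the actual agents track acceptance probabilities rather than hard rejections.

```javascript
// Zero-order consistency from Figure 2 (illustrative).

// Does offer `a` give the other player at most as much as offer `b`?
function atMostAsGenerous(a, b) {
  return Object.keys({ ...a, ...b }).every(
    (color) => (a[color] || 0) <= (b[color] || 0)
  );
}

class ZeroOrderNegotiator {
  constructor() {
    this.rejected = []; // offers the other player has rejected so far
  }

  observeRejection(offer) {
    this.rejected.push(offer);
  }

  // Consistency: an offer that is at most as generous as a rejected
  // offer is believed to be rejected as well.
  believedRejected(offer) {
    return this.rejected.some((r) => atMostAsGenerous(offer, r));
  }
}

const orange = new ZeroOrderNegotiator();
orange.observeRejection({ black: 1, white: 2, yellow: 1 });
console.log(orange.believedRejected({ white: 2, yellow: 1 })); // true
console.log(orange.believedRejected({ black: 2, white: 2 }));  // false
```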

Figure 3: If the orange player has first-order theory of mind, he tries to find out what the goal location of the blue player is by analyzing the offers he receives. In this example, there is only one possible goal location for which the blue player could get a higher score with the chips he is asking for than with the chips he already has.

A first-order theory of mind agent realizes that the other player has a goal, and that the other player will only accept offers that will help him reach that goal. The first-order theory of mind agent also realizes that the other player will only make offers that increase his score. By looking carefully at the offers of the other player, the first-order theory of mind agent tries to find out what the goal of the other player is. Once the first-order theory of mind agent knows what the goal location of the other player is, he can make offers that lead to a mutually beneficial outcome.

Figure 3 shows a situation in which the blue player offers to trade one of his yellow chips and a black chip against one white chip of the orange player. If the orange player is a first-order theory of mind agent, he tries to find out for what goal locations the offer of the blue player makes sense. That is, for which goal locations would the blue player have a higher score with the chips he is asking for (2 white and 1 yellow) than with his initial set of chips (1 white, 1 black, and 2 yellow). As it turns out, there is only one such location, as shown in the thought balloon of the orange player. For all other possible goal locations, the blue player would have been better off with his initial set of chips.
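In code, this inference is a filter over candidate goal locations. The sketch below assumes a helper `bestScore(chips, goal)` that returns the best attainable score for a chip set and goal (as defined by the scoring table above); here a toy lookup stands in for it so the example is self-contained.

```javascript
// First-order goal inference as a filter over candidate goals
// (illustrative, following the reasoning of Figure 3).
function plausibleGoals(offer, currentChips, candidateGoals, bestScore) {
  // Keep only the goals for which the chips the other player asks for
  // are worth more to him than the chips he already holds.
  return candidateGoals.filter(
    (goal) => bestScore(offer.requested, goal) > bestScore(currentChips, goal)
  );
}

// Toy stand-in: scores per goal for the requested vs. the current chips.
const toyScores = {
  "top-right":   { asked: 40, current: 10 },
  "bottom-left": { asked: 0,  current: 20 },
};
const bestScore = (chips, goal) => toyScores[goal][chips];

console.log(
  plausibleGoals({ requested: "asked" }, "current",
                 ["top-right", "bottom-left"], bestScore)
); // -> ["top-right"]: the only goal that makes the offer worthwhile
```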

In the game above, you can reveal an agent's first-order beliefs through the checkbox "Show mental content". Checking this option shows a grid like the game board, where brighter locations indicate that the agent believes that location to be more likely to be the other player's goal location. This means that once an agent is convinced that the other player has a particular goal location, that location will appear white while the other locations will be black. In addition, the weight of first-order theory of mind shows the degree to which first-order theory of mind determines the agent's behaviour. If the weight is close to 1, the agent always selects an offer suggested by his first-order theory of mind. If the weight is close to 0, the agent tends to ignore the predictions of first-order theory of mind, and behaves as a zero-order theory of mind agent instead. Finally, the accuracy indicates how accurately first-order theory of mind has predicted the behaviour of the other agent. Note that the accuracy will be very low at the beginning of the game, when the agent does not yet know the goal location of the other player.
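The role of the weight can be illustrated with a one-line blend of the two orders' valuations. This is a simplification; the actual agents adjust these weights based on how accurate each order's predictions have been.

```javascript
// Blending zero-order and first-order valuations with an order weight
// (simplified illustration).
function blendedValue(offer, zeroOrderValue, firstOrderValue, weight) {
  // weight near 1: follow first-order theory of mind;
  // weight near 0: fall back on zero-order beliefs.
  return weight * firstOrderValue(offer) + (1 - weight) * zeroOrderValue(offer);
}

const zeroOrder = () => 10;  // toy valuations of one candidate offer
const firstOrder = () => 30;
console.log(blendedValue({}, zeroOrder, firstOrder, 0.8)); // 26
```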

Using first-order theory of mind, an agent tries to determine what the goal location of the other player is. This allows a first-order theory of mind agent to get a better idea of what kind of offers the other player is going to accept. But an agent can also use first-order theory of mind to try and manipulate the other player. A first-order theory of mind agent believes that the other player might be a zero-order theory of mind agent, who learns what the first-order theory of mind agent wants through the offers he makes. By strategically selecting his offer, the first-order theory of mind agent can try to change the beliefs of the other player. The first-order theory of mind agent may sometimes make an offer that he knows the other player would never accept because it would reduce the other player's score. The reason for this is to push the other player into making an offer that is better for the first-order theory of mind agent. For example, a first-order theory of mind agent may ask for 3 black chips if he believes that this would convince the other player to offer the agent at least 2 black chips.

Figure 4: If the blue agent has second-order beliefs, he can try to manipulate what the other player believes about the agent's goal location. In this case, the agent asks for the purple chip to make his trading partner believe that he needs it to reach his goal location, even though he does not need that chip.

A second-order theory of mind agent takes his reasoning one step further, and realizes that the other player may be a first-order theory of mind agent. This means that the second-order theory of mind agent believes that the other player knows that the agent has a goal, and that the other player may be trying to find out what his goal location is. Instead of trying to find out what the goal location of the other player is, a second-order theory of mind agent can make an offer that signals his own goal location to the other player. By telling the other player what his goal location is, the agent gives the other player the opportunity to find a mutually beneficial solution.

Alternatively, a second-order theory of mind agent can select offers that give very little information about his goal location in order to get a higher score. For example, the second-order theory of mind agent can make an offer that suggests that his goal location is further away than it actually is. The blue agent in Figure 4 believes that by asking for enough chips to reach the top left tile (2 white, 1 purple, 1 yellow chip), the other player will come to believe that that is his goal location, even though his actual goal location is closer to the center.
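Misdirection of this kind can be sketched as a filter over the agent's acceptable offers. Below, `inferGoals` stands for the agent's second-order model of how a first-order opponent would interpret an offer; both it and the chip encoding are toy assumptions.

```javascript
// Second-order misdirection (Figure 4): among offers the agent finds
// acceptable, prefer one whose first-order interpretation does not
// point at the true goal.
function pickMisleadingOffer(offers, trueGoal, inferGoals) {
  const misleading = offers.filter((o) => !inferGoals(o).includes(trueGoal));
  return misleading.length > 0 ? misleading[0] : offers[0];
}

// Toy model: asking for the purple chip suggests a more distant goal.
const inferGoals = (offer) =>
  offer.requested.includes("purple") ? ["top-left"] : ["center-left"];

const offers = [
  { requested: ["white", "white", "purple", "yellow"] },
  { requested: ["white", "white", "yellow"] },
];
console.log(pickMisleadingOffer(offers, "center-left", inferGoals));
// -> the offer that includes the purple chip
```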

When an agent’s mental content is shown in the game, it shows both first-order and second-order beliefs about the goal location of the other player. In addition, the weight of second-order theory of mind indicates to what degree second-order theory of mind influences the behaviour of the agent, while the accuracy shows how closely the predictions made by second-order theory of mind match the offers actually made by the other agent.

An important feature of the agents in the game is that although they use theory of mind to predict future behaviour, they have no memory to recall previous behaviour. An agent sees the offer made by the other player, changes his beliefs accordingly, and then forgets he ever saw the offer. One of the behaviours you may see a lot is agents “insisting” on a certain distribution of chips by making the same offer over and over again. In part, this is because the agents do not remember making that offer before.

Controls

The script above has a number of controls to show the effect of using higher orders of theory of mind on the performance of agents in Colored Trails.

  • Initiator/Responder theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to second-order. Additionally, players can be controlled by a human user. When there is a human user in the game, the goal location of the computer player is not revealed until the end of the game. However, if two human users play the game, the goals are not hidden.
  • Show mental content: The mental content shows the agent’s first-order and second-order beliefs concerning the goal location of the other player. When a human user is playing the game, this information can give some information on how the offers are interpreted by agents. However, for a more challenging negotiation, uncheck the option to hide mental content.
  • Accept offer, Make new offer and Withdraw from negotiation: When a human user is playing the game, these buttons allow control over the next move. Use the arrow buttons to select the offer you want to make and press Make new offer. Alternatively, Accept offer accepts the previous offer, while Withdraw from negotiation stops the game without trading any chips.
  • Play round and New game: Play one round of the negotiation game. If the game has ended, pressing this button starts a new game.
  • Start and Stop: Starts and stops automatic play. When started, this mode plays a new round every 0.5 seconds.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.

With the game script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when negotiating.