Theory of mind in cooperation
The script on this page (open script in separate tab) shows the implementation of simulated agents playing a simplified version of the cooperative-communicative Tacit Communication Game. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Using theory of mind, these agents can reason about the goals and intentions of their partner. The script demonstrates how theory of mind can help agents to set up communication more quickly. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.
Tacit Communication Game
|Figure 1: In the Tacit Communication Game, both players have the same goal. However, only the blue Sender player knows what this goal is.|
The Tacit Communication Game is a cooperative communication game played on a 3×3 board of tiles by two players. The game is played in two rounds. At the start of the first round, the blue Sender token is placed in the middle of the board. The Sender player may rotate and/or move this token to an adjacent tile any number of times. Once the Sender is satisfied with the final position and orientation of his token, the second round starts. When this round starts, the orange Receiver token is placed in the middle of the board, and the Receiver player may rotate and/or move this token to an adjacent tile any number of times. Once she is satisfied with the location and orientation of her token, the game ends.
The Sender and the Receiver share the same goal: the final location and orientation of their respective tokens should match a certain goal configuration. Importantly, only the Sender knows what this goal configuration is (see also Figure 1). During the first round, the Sender should therefore match his token to the goal configuration, but he also needs to communicate the goal configuration of the orange token to the Receiver using only his movements on the board. The Receiver, for her part, has to find out what the goal configuration of her orange token is based on the movements of the blue Sender token. At the end of each game, the Sender and Receiver hear whether they matched their tokens to the goal configuration or not. However, if they failed to reach the goal, the Receiver does not hear what the correct configuration was.
In the full Tacit Communication Game, Sender and Receiver can have tokens of different shapes. Figure 1 shows an example in which the Sender has a round token, while the Receiver has a triangular token. This makes the game more difficult for the Sender, because he will have to let the Receiver know what the goal orientation of her token is without being able to use the orientation of his own token. On this page, we take a look at a simplified version of the Tacit Communication Game, in which Sender and Receiver both have a round token. This means that the Sender only has to let the Receiver know where her orange token should be placed. But even in this simple game, we can already see some interesting behavior. On this page, we focus mostly on the role of theory of mind and predictable behavior.
Theory of mind
In the Tacit Communication Game, the Sender knows what the goal configuration is, but the Receiver does not. It could therefore be beneficial for the players to reason about what the other knows and believes. This reasoning ability is known as theory of mind. The software agents playing the Tacit Communication Game on this page also make use of theory of mind to predict what the other player is going to do. In the script, the theory of mind level of each agent can be set to see how it influences the agents’ ability to cooperate.
|Figure 2: Zero-order theory of mind agents randomly try actions until they find one that works. In this case, the zero-order theory of mind Sender sends message UP-LEFT, and the Receiver correctly guesses her goal location. This results in both players learning that sending the message UP-LEFT results in the Receiver selecting the bottom right location.|
The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents can only predict future behavior based on past behavior. This means that a zero-order theory of mind Sender sends random messages to find out how the Receiver reacts to them. Once he has a clear idea of how the Receiver behaves, he sends the message that he believes will lead the Receiver to match her token to the goal configuration. At the same time, a zero-order theory of mind Receiver randomly tries locations until she finds the one that results in success. Figure 2 shows an example where the Sender sends the message UP-LEFT, after which the Receiver correctly guesses that she should move her token to the bottom right location. Because this was a correct guess, both Sender and Receiver now believe that if the Sender were to send UP-LEFT again, the Receiver would respond by moving her token to the bottom right location.
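The trial-and-error behavior of a zero-order theory of mind Receiver could be sketched as follows. This is a simplified illustration under stated assumptions, not the model from the paper: the Sender keeps repeating one message for a fixed goal, and the names `receiver_choice` and `play_until_success` are hypothetical.

```python
import random

TILES = [(row, col) for row in range(3) for col in range(3)]


def receiver_choice(message, known, ruled_out, rng):
    """Reuse a response that worked before; otherwise guess among untried tiles."""
    if message in known:
        return known[message]
    untried = [t for t in TILES if t not in ruled_out.get(message, set())]
    return rng.choice(untried)


def play_until_success(message, receiver_goal, rng):
    """Repeat the same game until the Receiver finds her goal tile."""
    known = {}      # message -> tile that earned positive feedback
    ruled_out = {}  # message -> tiles that earned negative feedback
    attempts = 0
    while message not in known:
        attempts += 1
        tile = receiver_choice(message, known, ruled_out, rng)
        if tile == receiver_goal:
            known[message] = tile
        else:
            ruled_out.setdefault(message, set()).add(tile)
    return attempts, known


attempts, known = play_until_success("UP-LEFT", (2, 2), random.Random(0))
print(attempts, known)
```

Because the Receiver never repeats a response that failed, she needs at most nine attempts per message on a 3×3 board, but nothing she learns about one message carries over to another.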
|Figure 3: Errors can lead to conflicting beliefs in the Tacit Communication Game. In this case, a zero-order theory of mind Sender sends message UP-LEFT, and the Receiver incorrectly guesses her goal location. The Sender now believes that sending the message UP-LEFT results in the Receiver selecting the bottom right location. But the Receiver believes that if she sees the message UP-LEFT, she should try anything but the bottom right location.|
When the Receiver misinterprets a message the Sender has sent, zero-order theory of mind leads to conflicting beliefs. Figure 3 shows an example of this. Here, a zero-order theory of mind Sender sent the message UP-LEFT, and the zero-order theory of mind Receiver responded incorrectly by selecting the bottom right location. The negative feedback causes the Receiver to believe that, if the Sender were to send the message UP-LEFT again, she should not choose the bottom right location again. On the other hand, the Sender believes that if he were to send the message UP-LEFT again, the Receiver will respond by selecting the bottom right location again.
At first glance, it may seem strange that after this negative feedback, the Sender believes that the Receiver will not change her behavior. After all, the Receiver has also seen this negative feedback, so the Sender should expect that she will change her behavior. However, this would mean that the Sender knows that the Receiver has a goal: to match her token to the goal configuration. But the zero-order theory of mind Sender cannot reason about the beliefs and goals of the Receiver. To the zero-order theory of mind Sender, the Receiver is like a coffee machine with the labels removed. The Sender randomly pushes buttons to try and get the type of coffee he wants. If he has pushed the wrong button, he believes that if he were to press the same button again, the coffee machine would produce the same result. In the same way, the Sender believes that if he sends the same message, the Receiver would produce the same behavior again. For people, however, theory of mind reasoning is so natural that zero-order theory of mind reasoning actually seems counterintuitive.
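The conflict in Figure 3 comes from two different update rules applied to the same failed game. The sketch below is an assumption-laden illustration: the dictionaries and the function names `sender_update` and `receiver_update` are hypothetical, and only the simplified game with round tokens is considered.

```python
TILES = [(row, col) for row in range(3) for col in range(3)]
BOTTOM_RIGHT = (2, 2)


def sender_update(predictions, message, receiver_tile):
    """Zero-order Sender: record what the Receiver did, as with a coffee machine.

    The feedback is deliberately ignored here, because the Sender does not
    consider that the Receiver has a goal and will react to failure.
    """
    predictions[message] = receiver_tile


def receiver_update(candidates, message, receiver_tile, success):
    """Zero-order Receiver: keep a successful response, drop a failed one."""
    options = candidates.setdefault(message, set(TILES))
    if success:
        candidates[message] = {receiver_tile}
    else:
        options.discard(receiver_tile)


predictions, candidates = {}, {}
sender_update(predictions, "UP-LEFT", BOTTOM_RIGHT)
receiver_update(candidates, "UP-LEFT", BOTTOM_RIGHT, success=False)

# The Sender still expects the bottom right tile; the Receiver has ruled it out.
print(predictions["UP-LEFT"], BOTTOM_RIGHT in candidates["UP-LEFT"])
```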
|Figure 4: First-order theory of mind allows a Receiver to look at the game from the perspective of the Sender. This way, a first-order theory of mind Receiver believes that any new message (RIGHT-UP-LEFT-LEFT) is not meant to communicate a goal location that the Sender has already found a good message for.|
A first-order theory of mind agent can reason about the goals of others. Such an agent realizes that the two players have the same goal. This especially helps the Receiver when she tries to interpret the messages of the Sender, as is shown in Figure 4. When a first-order theory of mind Receiver sees a message, she tries to figure out for what goal configuration she would have decided to send the same message. This helps the Receiver to interpret new messages. Figure 4 shows a situation in which the Sender has previously sent the message UP-LEFT, after which the Receiver correctly guessed that the bottom right tile was her goal location. When the Receiver sees the message RIGHT-UP-LEFT-LEFT, she takes the perspective of the Sender and concludes that her goal location is not the bottom right tile. After all, if the Sender had wanted her to go to the bottom right tile, he would have sent the message UP-LEFT.
First-order theory of mind agents believe that other players may be zero-order theory of mind agents. However, if both Sender and Receiver are first-order theory of mind agents, both agents are mistaken in that belief. For the best results, either the Sender or the Receiver should use zero-order theory of mind.
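The inference of the first-order theory of mind Receiver in Figure 4 can be sketched as a simple filter. This is a minimal illustration, assuming that the Sender's own goal tile is the same in every game (as in Figure 4, where both messages end on the top left tile); the name `first_order_candidates` and the `established` dictionary are hypothetical.

```python
TILES = [(row, col) for row in range(3) for col in range(3)]


def first_order_candidates(message, established):
    """Tiles the Receiver still considers after taking the Sender's perspective.

    established maps each message that has worked before to the Receiver tile
    it stands for.
    """
    if message in established:
        return {established[message]}
    # A new message cannot mean a tile the Sender already has a working message for:
    # if he had wanted that tile, he would have sent the established message.
    already_covered = set(established.values())
    return {tile for tile in TILES if tile not in already_covered}


established = {"UP-LEFT": (2, 2)}
options = first_order_candidates("RIGHT-UP-LEFT-LEFT", established)
print(len(options), (2, 2) in options)  # 8 False
```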
|Figure 5: Second-order theory of mind helps when players are predictable. In this case, Senders want to send short messages that visit the goal location of the Receiver. A second-order theory of mind Sender reasons that sending the message LEFT-DOWN-UP-UP will cause the Receiver to move her token to the bottom left tile, but the message DOWN-LEFT-UP-UP may not.|
A second-order theory of mind agent takes this reasoning one step further, and believes that the other player may be a first-order theory of mind agent. This means that the second-order theory of mind agent believes that the other player knows that both players have the same goal. Interestingly, unlike in competition and negotiation, second-order theory of mind does not provide any additional benefits in the standard model. However, second-order theory of mind can be beneficial in cooperative settings such as the Tacit Communication Game when player behavior is more predictable.
For example, Senders may prefer to send different messages for different goal configurations. In the game on this page, Sender preferences can be set to short messages, messages that visit the Receiver’s goal location, and short messages that visit the Receiver’s goal location. This can help agents to play the Tacit Communication Game more effectively.
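The three Sender preferences can be thought of as filters on the set of messages that end on the Sender's own goal tile. The sketch below is an illustration only: the preference labels `"short"`, `"visit"` and `"short+visit"`, the cutoff of four moves, and all function names are assumptions rather than details of the script.

```python
from itertools import product

STEPS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
START = (1, 1)  # (row, col) of the middle tile


def path_of(moves):
    """Return the visited tiles, or None if the token would leave the board."""
    row, col = START
    path = [START]
    for move in moves:
        d_row, d_col = STEPS[move]
        row, col = row + d_row, col + d_col
        if not (0 <= row <= 2 and 0 <= col <= 2):
            return None
        path.append((row, col))
    return path


def candidate_messages(sender_goal, max_length=4):
    """All messages of up to max_length moves that end on the Sender's goal tile."""
    for length in range(max_length + 1):
        for moves in product(STEPS, repeat=length):
            path = path_of(moves)
            if path is not None and path[-1] == sender_goal:
                yield moves, path


def preferred_messages(sender_goal, receiver_goal, preference):
    """Apply one of the three hypothetical preference labels as a filter."""
    options = list(candidate_messages(sender_goal))
    if preference in ("visit", "short+visit"):
        options = [(m, p) for m, p in options if receiver_goal in p]
    if preference in ("short", "short+visit") and options:
        shortest = min(len(m) for m, _ in options)
        options = [(m, p) for m, p in options if len(m) == shortest]
    return ["-".join(m) for m, _ in options]


# Sender goal: top left tile. Receiver goal: bottom left tile.
print(preferred_messages((0, 0), (2, 0), "short+visit"))
```

For this goal configuration, two equally short messages pass the filter, so the preference alone does not tell the Sender which one to send.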
Figure 5 shows an example in which Senders prefer to send messages that are as short as possible, but also visit the goal location of the Receiver. In this example, the Sender wants to send a message that lets the Receiver know that she should place her token on the bottom left tile. By placing himself in the position of the Receiver, the Sender tries to predict how the Receiver will react to the message LEFT-DOWN-UP-UP. For example, the Receiver may think that the Sender wants her to go to the left tile in the middle row. After all, the Sender’s message visits this location. However, the Receiver knows that there is a shorter message (LEFT-UP) that the Sender could have sent, and that would still have visited the same location. By placing herself in the shoes of the Sender, the Receiver reasons that the Sender would have preferred to send LEFT-UP in this case. As a result, the Receiver believes that the message LEFT-DOWN-UP-UP is not intended to tell her that her goal location is the left tile in the middle row. In fact, the only location that the first-order theory of mind Receiver considers to be her goal location is the bottom left tile.
Through the use of second-order theory of mind, the Sender believes that if he were to send the message LEFT-DOWN-UP-UP, the Receiver would respond by moving to the bottom left tile. Moreover, the use of second-order theory of mind lets the Sender know that LEFT-DOWN-UP-UP is a better message than DOWN-LEFT-UP-UP. Even though the messages have the same length, and the Sender has no preference for either of these messages, the second-order theory of mind Sender believes that the Receiver could misinterpret the message DOWN-LEFT-UP-UP. As Figure 5 suggests, the Receiver may move her token to the middle location in the bottom row. The Sender believes that this would not happen with the message LEFT-DOWN-UP-UP.
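The reasoning in Figure 5 can be reproduced with a small sketch, assuming that Senders prefer the shortest message that visits the Receiver's goal location. This is not the model from the paper: the helper names are hypothetical, and the shortest message length is computed as a Manhattan distance through the Receiver's goal tile, which is an assumption that works on an obstacle-free board.

```python
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
START = (1, 1)  # (row, col) of the middle tile; (2, 0) is the bottom left tile


def trace(message):
    """Tiles visited by a message such as 'LEFT-DOWN-UP-UP' (assumed to stay on the board)."""
    row, col = START
    path = [START]
    for step in message.split("-"):
        d_row, d_col = MOVES[step]
        row, col = row + d_row, col + d_col
        path.append((row, col))
    return path


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def shortest_length(sender_goal, receiver_goal):
    """Length of the shortest message that visits receiver_goal and ends on sender_goal."""
    return manhattan(START, receiver_goal) + manhattan(receiver_goal, sender_goal)


def first_order_reading(message):
    """Goal tiles for which a first-order Receiver thinks the Sender would send this message."""
    path = trace(message)
    sender_tile, length = path[-1], len(path) - 1
    return {tile for tile in path if shortest_length(sender_tile, tile) == length}


def second_order_is_safe(message, receiver_goal):
    """A second-order Sender prefers messages with exactly one first-order reading."""
    return first_order_reading(message) == {receiver_goal}


print(second_order_is_safe("LEFT-DOWN-UP-UP", (2, 0)))  # True
print(second_order_is_safe("DOWN-LEFT-UP-UP", (2, 0)))  # False
```

In this sketch, DOWN-LEFT-UP-UP is also the shortest message that visits the middle tile of the bottom row, which is why the Receiver could read it either way.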
The script above has a number of controls to show the effect of using higher orders of theory of mind on the performance of agents in the Tacit Communication Game.
- Sender/Receiver checkboxes: At the top of the script, there are two checkboxes to show and hide the mental content of the Sender and Receiver agents. The mental content shows the agent’s zero-order, first-order and second-order beliefs concerning the behavior of the other player. When a human user is playing the game, this mental content can give a lot of information on what the goal configuration is or how the agents are going to behave. For a more challenging game, remove the check from the appropriate checkbox to hide mental content.
- Sender/Receiver theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to the second. Additionally, players can be controlled by a human user. This way, you can experience the effect of different outcomes on agent behavior firsthand. When human input is accepted, the arrow keys can be used to move the token.
- Sender preferences: If a Sender has a choice of multiple messages to send, the Sender’s preferences tell him which message he should pick. Senders can either choose to send short messages, messages that visit the goal location of the Receiver, or short messages that visit the goal location of the Receiver.
- Reset turn: Resets the message that will be sent to the Receiver. This is only used by human Senders.
- Playback: Repeats the Sender’s message on the game board.
- Play turn: Plays one round of the game.
- Start and Stop: Starts and stops automatic play. When started, this mode continuously plays new games.
- Skip game: Randomly picks a new goal configuration.
- Clear beliefs: Resets the agents’ beliefs to forget everything they have learned.
With the game script, you can see how agents perform when their theory of mind level changes. In addition, you can experiment with how Sender preferences influence the effectiveness of theory of mind.