Theory of mind in competition
The script on this page (open script in separate tab) shows an implementation of simulated agents playing the game of Limited Bidding. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe, and use it to try to outsmart their opponent. The script demonstrates the effectiveness of this strategy. The controls for the script are explained at the bottom of this page.
Note: The theory of mind agents on this page do not behave the same way as described in the associated paper (also available from the Publications page). Instead of calculating their planned moves for the entire game, agents plan only one action and use a Q-learning approach to learn values for the resulting state.
Limited bidding is a simplified version of a game described in “Edward de Bono’s Supermind Pack”. The game is played by two players. At the beginning of the game, each player receives an identical set of five numbered tokens. The number on each token corresponds to the value of the token. Over the course of five rounds, players simultaneously choose one of their own tokens to play. Whoever picks the highest value token wins the round. If both players pick the same value token, there is no winner. However, each token can only be used once per game. This means that players should choose their actions strategically.
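The rules above can be sketched in a few lines of code. This is a minimal illustration of the game mechanics only, not the page's actual script; the function names and the strategy-function interface are assumptions made for this example.

```python
def play_game(strategy1, strategy2, n_tokens=5):
    """Play one game of Limited Bidding and return [wins1, wins2].

    Each strategy is a function taking (own remaining tokens, opponent's
    remaining tokens) and returning the token to play. This interface is
    illustrative, not taken from the original script.
    """
    tokens1 = list(range(1, n_tokens + 1))
    tokens2 = list(range(1, n_tokens + 1))
    score = [0, 0]
    for _ in range(n_tokens):
        c1 = strategy1(tokens1, tokens2)
        c2 = strategy2(tokens2, tokens1)
        tokens1.remove(c1)  # each token can be used only once per game
        tokens2.remove(c2)
        if c1 > c2:
            score[0] += 1
        elif c2 > c1:
            score[1] += 1
        # equal tokens: no winner this round
    return score
```

For example, a player who always spends her highest token against a player who always spends her lowest wins the first two rounds, ties the third, and loses the last two.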
Game-theoretically, the optimal way to play limited bidding is to randomize every choice. That is, under the assumption of common knowledge of rationality, a player should randomly pick one of the tokens still available to them. However, the theory of mind agents modeled on this page suspect that their opponent may not be fully rational. Moreover, they are limited in their own ability to make decisions. By playing the game repeatedly against the same opponent, agents try to learn to predict what their opponent will do, and change their strategy accordingly.
Theory of mind
Theory of mind refers to the individual’s ability to model the mental content of others, such as beliefs, desires or intentions. The agents modeled in the script are constrained in their theory of mind. At the most basic level, a zero-order theory of mind agent tries to model his opponent through patterns of behavior. For example, a zero-order theory of mind agent might find out that his opponent always plays token 5 at the start of the game, or tends to save token 3 for last. However, he is unable to realize that his opponent might be doing the same. In fact, a zero-order theory of mind agent does not even realize that his opponent has goals opposite to his own. In the script, the agent’s zero-order beliefs are represented by red bars. The height of each red bar indicates how likely the agent believes it to be that his opponent will play a certain token.
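A zero-order agent can turn such beliefs into a choice by weighing each of his own tokens against the believed distribution of the opponent's play. The sketch below is one plausible way to do this; the actual script additionally values the resulting game state through Q-learning, which is omitted here.

```python
def zero_order_choice(my_tokens, beliefs):
    """Best response to zero-order beliefs about the opponent's next token.

    beliefs: dict mapping token value -> believed probability that the
    opponent plays it. Round payoff: +1 for a win, -1 for a loss, 0 for
    a tie. A simplified sketch; not the page's exact decision rule.
    """
    def expected_payoff(mine):
        return sum(p * ((mine > theirs) - (mine < theirs))
                   for theirs, p in beliefs.items())
    return max(my_tokens, key=expected_payoff)
```

If the agent is convinced the opponent will open with token 3, this rule picks a token that beats it (ties are worth nothing, losses cost a point).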
A first-order theory of mind agent realizes that his opponent might be a zero-order theory of mind agent, and tries to predict what she is going to do by putting himself in her position. He looks at the game from the point of view of his opponent to determine what he would believe if the situation were reversed, and uses this as a prediction for his opponent’s actions. For example, a first-order theory of mind agent might realize that he has started the game by using token 3 a few times in a row, and suspect that his opponent is going to try to take advantage of that by playing 4 in the first round. The agent’s first-order theory of mind would therefore suggest that the agent plays 5 to win the first round. In the script, the height of the green bars represents the agent’s first-order beliefs concerning his opponent’s next action.
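This perspective-taking step can be written down as: compute the opponent's best response to the beliefs she presumably holds about me, then beat that predicted token. The following is a hedged sketch of that idea; all names and the fallback rule (sacrifice the lowest token when the prediction cannot be beaten) are assumptions, not the script's exact logic.

```python
def first_order_choice(my_tokens, opp_tokens, beliefs_about_me):
    """First-order reasoning: predict a zero-order opponent's move by
    taking her perspective, then try to beat the predicted token.

    beliefs_about_me: the zero-order beliefs the agent attributes to his
    opponent about his own play (token value -> probability).
    """
    def best_response(tokens, beliefs):
        # Expected round payoff: +1 win, -1 loss, 0 tie.
        return max(tokens, key=lambda mine: sum(
            p * ((mine > theirs) - (mine < theirs))
            for theirs, p in beliefs.items()))

    predicted = best_response(opp_tokens, beliefs_about_me)
    beating = [t for t in my_tokens if t > predicted]
    # Cheapest winning token if one exists; otherwise give up the round
    # as cheaply as possible.
    return min(beating) if beating else min(my_tokens)
```

With the example from the text, an agent who believes his opponent expects him to play 3 predicts that she will answer with 4, and therefore plays 5.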
A second-order theory of mind agent takes this theory of mind reasoning one step further. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. In the script, the height of the blue bars indicates the agent’s second-order beliefs.
Based on zero-order, first-order and second-order beliefs, an agent makes different predictions about what his opponent is going to do. The agent must therefore also form beliefs about which of these predictions will yield the best results. An agent’s combined beliefs represent how the different orders of theory of mind are combined into a single prediction of his opponent’s actions. In the script, each bar graph depicting an agent’s theory of mind beliefs also indicates the accuracy of the predictions of that particular order of theory of mind, as well as the weight of these predictions in the agent’s next action. For example, a second-order theory of mind agent that has zero weight for his second-order beliefs will ignore his second-order theory of mind, and act as if he were an agent of a lower order of theory of mind.
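One simple way to realize such combined beliefs is a weighted mixture of the per-order predictions, with weights reflecting past predictive accuracy. This is an assumption about how the script combines beliefs, offered only as an illustration.

```python
def combined_prediction(predictions, weights):
    """Mix per-order predictions into one distribution over opponent moves.

    predictions: one dict (token value -> probability) per order of theory
    of mind; weights: the influence of each order, e.g. proportional to how
    accurate that order's predictions have been so far.
    """
    total = sum(weights)
    combined = {}
    for pred, weight in zip(predictions, weights):
        for token, p in pred.items():
            combined[token] = combined.get(token, 0.0) + (weight / total) * p
    return combined
```

Note that setting an order's weight to zero removes its influence entirely, which matches the example in the text: a second-order agent with zero weight on second-order beliefs behaves like a lower-order agent.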
Although the agents in the script make use of theory of mind, they do not remember the choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time, and then forget what they saw. As an alternative type of agent, a high memory agent is a zero-order theory of mind agent that also remembers his own last choice. That is, the high memory agent forms beliefs about what his opponent is going to do in reaction to each of the tokens he could play himself. In terms of memory, a high memory agent uses about the same amount of space as a second-order theory of mind agent, although this space is used differently. The high memory agent is not available in the script on this page, but it is included in the applet example, which you can download at the bottom of the page.
The script has a number of controls to allow users to judge the effect of using a higher order of theory of mind on the performance of agents in the limited bidding game.
- Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to second-order. Additionally, the second player can be controlled by a human user.
- Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, and will always do the same thing. An agent with learning speed 1.0 on the other hand believes that the previous game gives him all the information he needs to predict his opponent’s behavior. Agents do not try to model the learning speed of their opponent; if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
- Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
- Start/Stop: Starts or stops the automatic play of limited bidding games. This can only be done when player 2 is not user-controlled.
- Play round: Play one game of limited bidding. This can only be done when player 2 is not user-controlled.
- Token buttons: When player 2 is user-controlled, selecting one of the available orange numbered tokens performs a move in the game.
- Show mental content: A human player can use the graphs to determine what the agent believes that the human player will do next, or what a computer agent would believe if he were the one to play next instead of the human player. For a human player, the game is more challenging if the graphs are not visible.
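The learning speed control described above suggests an exponential-smoothing update: each observation pulls the beliefs toward what was just seen, by an amount set by the learning speed. The update rule below is an assumption consistent with the control's description (0.0 means beliefs never change, 1.0 means the previous game fully determines the prediction), not necessarily the script's exact formula.

```python
def update_beliefs(beliefs, observed, learning_speed):
    """Shift beliefs toward the token the opponent was just seen to play.

    beliefs: dict mapping token value -> believed probability.
    learning_speed: 0.0 keeps beliefs fixed; 1.0 replaces them entirely
    with the latest observation.
    """
    return {token: (1 - learning_speed) * p
                   + (learning_speed if token == observed else 0.0)
            for token, p in beliefs.items()}
```

Because agents do not model each other's learning speed, two agents updating at different rates will drift apart in their models of one another, as the text notes.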
With the applet, you can see how agents perform better when their theory of mind level increases. The applet also shows that second-order theory of mind agents outperform agents with more memory in Limited Bidding, even though they don’t do better in simple games like Rock-paper-scissors.
An older version of the limited bidding script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The Limited Bidding applet can still be downloaded for offline use. As an additional feature, this Java implementation also includes high memory agents.