Repeated simple spatial games

It’s all about who you know

Why do people cooperate with each other? Why do animals work together? How can cooperation survive when there are opportunities to cheat? An important observation in these questions is that it matters who you interact with. If you are likely to interact multiple times with the same people, it may be better to cooperate rather than to cheat (see also Axelrod and Hamilton, 1981). The script on this page (open script in separate tab) shows an implementation of a small number of repeated simple spatial games, both cooperative and competitive, and allows you to experiment with some of the spatial features. The controls for this script are explained at the bottom of this page.
The JavaScript on this page makes use of HTML5.

Repeated simple spatial games

The script on this page simulates agents that repeatedly play simple spatial games. These spatial games are called simple because they are:

  • two-player, since every interaction is between two individuals;
  • symmetric, since the outcome of each interaction depends only on the actions of the players, and not on who these players are; and
  • simultaneous, since both players select their action at the same time so that no player can react to the action selected by the other.

Individuals are organized on a grid, which may or may not include empty spots. Each individual has a colour that identifies what strategy it is playing. In the Prisoner’s Dilemma, for example, red individuals defect while blue individuals cooperate. At every time step, every individual plays the simple game with each neighbour within its interaction neighbourhood. Both the range and the shape of the interaction neighbourhood can be set within the script. The outcome of each game depends only on the strategies used by the two players, which can be seen in the payoff matrix in the bottom right of the script. In the Prisoner’s Dilemma, for example, when a cooperator plays against a neighbouring defector, the cooperator gets a score of -1, while the defector receives a score of 4.

All individuals play the game with each neighbour in their interaction neigbourhood. Once all games have been played, individuals look around to determine the neighbour in their vision neighbourhood that has the highest average score. If that score is higher than the individual’s own score, the individual switches its strategy to match the one of their neighbour. If there happens to be more than one neighbour with the highest average score, the individual will randomly pick one of those neighbours’ strategies. However, if one of the strategies that yields the highest observed score is the agent’s own strategy, the agent will always decide to keep its own strategy. This means that a cooperator will choose to stay a cooperator as long as the highest average score that he can see is obtained by another cooperator, even if the individual himself gets a very low score.

The script allows you to simulate a number of different games. The Prisoner’s dilemma, Hawks and doves, and Hawks, doves, and retaliators games are cooperative dilemmas. In these games, all individuals would benefit from the cooperation of other agents, but experience an individual incentive to defect on cooperation. In these cooperative dilemma’s, you can see how “clusters” of cooperation can grow to overcome defectors. What allows these clusters to survive is that in the center of such a cluster, cooperators only interact with other cooperators. That is, individuals inside a cluster cannot be cheated by faraway defectors.

In contrast, Rock-paper-scissors, Rock-paper-scissors-Spock-lizard, and Elemental RPS are purely competitive settings. In each of these setting, each action is the best response to another action so that there is no single optimal action to play. As long as all strategies continue to exist in the population, agents keep changing their strategy to keep up with other agents changing their strategy. In the simulation, this will show itself as waves moving through the population. This kind of behaviour is also the basis of the reason why theory of mind can be helpful in settings such as Rock-paper-scissors (see also theory of mind in Rock-paper-scissors).

Finally, the Coordination and the Stag hunt are purely cooperative settings. In both of these settings, agents want to cooperate, but either do not know how to cooperate (Coordination game), or are afraid that the other may not cooperate (Stag hunt game). In these games, the simulation tends to quickly settle into an equilibrium, where no agent wants to change its strategy. Whether or not there are multiple actions that survive in this stable solution depends on the number of empty spots, the vision neighbourhood, and the interaction neighbourhood.


  • Individuals on the grid: You can manually switch the strategy of an agent by clicking on it.
  • Strategy frequency graph: After every change in the situation in the grid, the graph updates to show the new frequencies of each strategy among agents.
  • Game selector: Selects the game that will be played by the agents on the grid. Changing the game will also reset the field.
  • Agent count slider: This determines the number of agents on the grid, between 1500 and 2500 individuals. Since the grid is 50 by 50 agents in size, 2500 agents fills up the grid entirely. When you change the number of agents, the strategies of remaining agents are not changed.
  • Interaction slider: This slider determines the size of the interaction neighbourhood of every agent .
  • Interaction neighbourhood shape toggle: By clicking on the interaction neighbourhood shape icon, you can switch between a cross-shaped (Von Neumann) and a square-shaped (Moore) interaction neighbourhood.
  • Vision slider: This slider determines the vision range of every agent.
  • Vision neighbourhood shape toggle: By clicking on the vision neighbourhood shape icon, you can switch between a cross-shaped (Von Neumann) and a square-shaped (Moore) vision neighbourhood.
  • Simulation speed slider: This shows how quickly changes occur in the grid, varying from one change per 2 seconds to up to 50 changes per second.
  • Play one round: Plays a single round of simulation.
  • Start/stop simulation: The simulation will automatically stop when after a round, no agents changed their strategy. Starting the simulation does not reset the board. This means you can pause the simulation to make some manual changes if you want, and continue from the situation you have created.
  • Reset field: Randomly re-assigns empty spots across the board, and strategies across agents.
  • Change payoff matrix: By clicking on the star (*) near the payoff matrix, you can change the payoff matrix as you see fit. Note: the new matrix is not checked for validity. Make sure you enter a valid payoff matrix.


Theory of mind in competition

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the game of rock-paper-scissors. These agents differ make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Even though rock-paper-scissors is a simple game in which trying to outsmart your opponent seems useless, the script on this page shows how theory of mind can still be effective. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.

Game outline

Figure 1: In rock-paper-scissors, scissors (orange agent) beats paper (blue agent).

Rock-paper-scissors is a two-player game in which players simultaneously choose to play either rock, paper or scissors. If the two players made the same choice, the game ends in a tie. However, if one player chose scissors, while the other chose paper, the player that chose scissors wins (see Figure 1). In the same way, rock wins from scissors, and paper wins from rock.

According to game theory, the only stable strategy when playing rock-paper-scissors is to randomly choose one of the possibilities. After all, if you play according to some pattern, the other player might learn that pattern over many repeated games, and exploit that knowledge. Playing randomly makes sure that the opponent cannot learn any patterns in the way you play the game. Although this strategy works well, people are not very good at playing randomly. For example, people usually avoid playing rock when they have just played rock two times in a row, even though this should not matter in truly random play. Also, if there are some people that are not playing randomly, smart players may be able to exploit this and get a higher score than a random player.

Theory of mind

In game settings, people often consider what their opponents know and believe, by making use of what is known as theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what their opponent is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can determine whether higher orders of theory of mind allows agents to win more often in rock-paper-scissors.

Figure 2: The blue zero-order theory of mind agent tries to learn patterns in the behavior of his opponent.

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model their opponent through patterns of behavior. For example, suppose that the opponent has always played paper before. In this case, the zero-order theory of mind agent believes that it is very likely that she will be playing paper again (see Figure 2). When a zero-order theory of mind agent sees that his opponent is often playing paper, he will try to take advantage of this by playing scissors. In the rock-paper-scissors script, the red bars indicate the agent’s zero-order tendencies to play (R)ock, (P)aper, or (S)cissors. The higher the bar for rock (R), for example, the more likely it is that a zero-order theory of mind agent will play rock.

The text below the red bars show to what extent zero-order theory of mind determines the next action of the agent (“Weight”), as well as the average accuracy of the predictions of zero-order theory of mind (“Accuracy”).

Figure 3: If the blue agent has first-order beliefs, he believes that his orange opponent may be trying to learn and exploit patterns in his behavior. By looking at the patterns in his own behavior, the blue agent predicts how the orange opponent will try to exploit these patterns.

A zero-order theory of mind agent tries to learn patterns in the behavior of his opponent, but does not realize that his opponent could be doing the same thing. A first-order theory of mind agent realizes that his opponent may be using zero-order theory of mind. He tries to predict what his opponent is going to do by placing himself in her position. He looks at the game from the perspective of his opponent to determine what he would do if the situation were reversed. The first-order theory of mind agent then uses this action as a prediction of his opponent’s behavior. For example, suppose that the first-order theory of mind agent realizes he has been playing paper a lot. He believes that his opponent may be trying to take advantage of this by playing scissors. If that is true, the agent can take advantage of this behavior by playing rock (see Figure 3).

In the game, the green bars indicates how likely a first-order theory of mind agent considers it to be that the agent will win the next round by playing (R)ock, (P)aper, or (S)cissors. This suggestion combines the agent’s zero-order and first-order beliefs. Below the graph, the weight indicates to what extent first-order theory of mind influences the decision of the agent. The accuracy indicates the average accuracy of the agent’s first-order beliefs in predicting the behavior of the opponent.

Figure 4: If the blue agent has second-order beliefs, he believes that his orange opponent believes that he himself is trying to learn and exploit patterns in her behavior. This allows him to anticipate how the orange opponent will try to exploit his behavior.

A second-order theory of mind agent takes his reasoning one step further, and realizes that his opponent may be a first-order theory of mind agent. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. For example, suppose that the second-order theory of mind agent realizes his opponent is playing paper a lot. Zero-order theory of mind makes him realize that he could take advantage of this predictable behavior by playing scissors. A second-order theory of mind agent thinks that his opponent may be expecting him to do so, and therefore that she will play rock to take advantage of the way he behaves. If that is true, the agent should continue playing paper himself (see Figure 4). The agent’s second-order beliefs are indicated by the blue bars.

In the script on this page, agents can continue this stepwise reasoning even further to use third-order and even fourth-order theory of mind. The associated beliefs are represented by orange and gray bars, respectively.

Although the agents in the game use theory of mind to predict the future, they do not remember the past choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time. After this, they immediately forget what they saw. This means that these agents can only look at very simple patterns of behavior. However, increasingly higher orders of theory of mind allow the agents to exhibit increasingly more complex patterns of behavior. Using the script on this page, you can experiment to see to what extent higher orders of theory of mind are still useful in rock-paper-scissors. In addition, you can also play the game against one of the agents yourself. The mental content of the agent then shows how closely your behavior corresponds to behavior of agents of different orders of theory of mind.


With the script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when playing rock-paper-scissors.

  • Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to fourth-order. Additionally, the second player can be controlled by a human user.
  • Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, but will always repeat the same behavior. An agent with learning speed 1.0, on the other hand, believes that the previous game gives him all the information he needs to predict his opponent’s next action. Agents do not try to model the learning speed of their opponent. Instead, if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
  • Play round: Play one game of rock-paper-scissors. This can only be done when player two is not user-controlled.
  • Rock, paper and scissors: When player two is user-controlled, selecting one of the three possible moves plays one game, with player two’s choice.
  • Show mental content: A human player can use the graphs to determine what the agent will do next, or what a computer agent would do next if he were the one to play next. For a human player, the game is more challenging if the graphs are not visible. Uncheck the box to hide mental content information from the graphs.

An older version of the rock-paper-scissors script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The rock-paper-scissors applet can be still be downloaded for offline use.