Fitch diagram helper

Though this be madness…

Fitch diagrams are a way of constructing formal logic proofs in sentential logic or in predicate logic. The script on this page page (open script in separate tab) allows users to constuct these proofs and check its validity automatically.

The script allows the use of the logical connectives listed in the table below. In addition, the script allows for the use of predicates, functions, variables, and constants. The only restrictions on these is that they must start with a letter and consist only of letters, numbers, and the underscore (_) character. In particular, the typical restrictions that predicates must start with uppercase letters, while functions and variables must start with lowercase letters, is not enforced.

Logical connective Logic symbol Script symbol
Equality = =
Conjunction &, ^
Disjunction |
Negation ¬ ~, !
Absurdity #
Implication >
Biconditional <
Universal quantifier @
Existential quantifier ?

Rules

Each statement in a proof has to be justified by a rule. Below, we describe each of the possible rules in more detail. In some cases, the rule alone justifies the use of a certain statement. In general, however, a rule refers to previous statements and/or subproofs. Single statements are referred to through their line number, while subproofs are referred to by the first and last line of the subproof, separated by a dash (e.g. 4-6). If a rule refers to more than one statement and/of subproof, they are separated by a comma (e.g. 3, 4-5, 7). The script on this page will automatically check whether rules are applied correctly. Below, we look at each of the supported rules in more detail.

Premise

Premises are the statements that are assumed to be true, and from which we want to derive the goal or conclusion. As a result, premise rules are always allowed, except that they must appear as a single block at the start of the proof. That is, you cannot slip in an extra premise halfway through the proof: it must appear at the top.

Assumption

Assumptions, as the name suggests, are assumed to be true. However, unlike premises, assumptions are temporary. In essence, an assumption starts a subproof that has a single premise (the assumption). While a subproof can refer to any statement in proofs that contain it, there are only a few ways in which the main proof can interact with subproofs. We will visit these rules below.

Flag

A flag is like an assumption, except that a flag assumes that some constant exists rather than that some statement is true. Note that this must be a new constant, it may not already exist in some statement before the flag statement. The flag rule is only needed to introduce a universal quantifier (∀).

Tautology

A tautology rule introduces a statement that is logically true (e.g. P ∨ ¬ P). In general, you should avoid introducing tautologies this way, and instead use Fitch to derive the tautologies you need. After all, if you can derive Q from premises P1, …, Pn, then (P1 ∧ … ∧ Pn) → Q is a tautology. Using this tautology to prove Q, however, would miss the point of making a proof.

When you do introduce a tautology this way, the script will check whether the formula you provided actually is a tautology. However, this check may fail for formulas that contain certain combinations of universal and existential quantifiers. If this happens, the script will trust your judgment on whether or not the formula is a tautology,

Reiteration

Reiteration repeats a previously derived statement. You can only reiterate statements that are in the same subproof, or a subproof containing the reiteration. Note that you never need to use reiteration, it is simply there to make the proof more readable.

Equality (=) rules

The equality symbol (=) indicates that the left-hand side is the same as the right-hand side. In terms of Fitch proofs, equality is not the same as mathematical equality. From a statement a=b, it does not follow that b=a.

= Introduction

k a = a = Introduction

Any constant or variable equals itself.

= Elimination

k P(a)
m a = b
n P(b) = Elimination: k, m

An equality a=b allows any occurence of the left-hand side (a) of the equation to be replaced by the right-hand side (b).

Conjunction (∧) rules

The conjunction symbol (∧) indicates that both the left-hand side and the right-hand side of a statement are true.

∧ Introduction

k P
m Q
n P ∧ Q ∧ Introduction: k, m

For any two derived formulas P and Q, their conjunction P ∧ Q can also be derived.

∧ Elimination

k P ∧ Q
m P ∧ Elimination: k

Whenever a conjunction P ∧ Q is derived, both the left-hand side (P) and the right-hand side (Q) can also be derived.

Disjunction (∨) rules

The disjunction symbol (∨) indicates that of the left-hand side and the right-hand side of a statement, at least one of them is true (and possible both). Subproofs are needed to eliminate a disjunction: to show that R follows from P ∨ Q, you must show that R follows from assuming P, and that R also follows from assuming Q.

∨ Introduction

k P
m P ∨ Q ∨ Introduction: k

Any disjunction in which either the left-hand side or the right-hand side is a derived formula can also be derived.

∨ Elimination

g P ∨ Q
h P
j R
k Q
m R
n R ∨ Elimination: g, h-j, k-m

Formula R can be derived from a disjunction P ∨ Q if R follows from assuming the left-hand side (P) of a disjunction, but also from assuming the right-hand side (Q) of the same disjunction.

Negation (¬) rules

The negation symbol (¬) indicates that the underlying statement is not true. Introducing a negation also relies on a subproof: if assuming P results in inconsistencies, then ¬P can be derived.

¬ Introduction

k P
m
n ¬ P ¬ Introduction: k-m

Whenever absurdity can be derived in a subproof, the assumption of that subproof can be derived to be false.

¬ Elimination

k ¬ ¬ P
m P ¬ Elimination: k

A double negation can be eliminated.

Absurdity (⊥) rules

Absurdity (⊥) is the result of two contradicatory statements. From absurdity, any statement can be derived.

⊥ Introduction

k P
m ¬ P
n ⊥ Introduction: k, m

Aburdity follows when both a formula and its negation can be derived.

⊥ Elimination

k
m P ⊥ Elimination: k

Any formula follow froms absurdity.

Implication (→) rules

The implication symbol (→) indicates that whenever the left-hand side can be derived, the right-hand side must also be true.

→ Introduction

k P
m Q
n P → Q → Introduction: k-m

If formula Q can be derived from assuming P, then the corresponding conditional can be derived.

→ Elimination

k P → Q
m P
n Q → Elimination: k, m

If Q follows from P, and P is derived, then Q can be derived.

Biconditional implication (↔) rules

The biconditional implication symbol (↔) indicates an implication that works both ways. That is, a biconditional implication P ↔ Q means that when P is true, Q can be derived, but also that if Q is true, P can be derived.

↔ Introduction

g P
h Q
k Q
m P
n P ↔ Q ↔ Introduction: g-h, k-m

If formula Q follows from assuming P, and P follows from assuming Q, then the corresponding biconditional can be derived.

↔ Elimination

k P ↔ Q
m Q
n P → Elimination: k, m

If P can be derived if and only if Q can be derived, then P follows from Q, and Q follows from P.

Universal quantifier (∀) rules

The universal quantifier (∀) indicates that some statement holds for all possible constants. For example, if we know that ∀x P(x) is true and that c is some constant, then P(c) is true. But be careful: a universal quantifier does not imply that a constant actually exists.

∀ Introduction

k a
m P(a)
n ∀x P(x) ∀ Introduction: k-m

If a formula can be derived for a randomly chosen constant, it holds universally.

∀ Elimination

k ∀x P(x)
m P(a) ∀ Elimination: k

If a formula holds universally, it holds for any (existing) constant.

Existential quantifier (∃) rules

The existential quantifier (∃) indicates that a statement holds for at least one constant, though it gives no information about the identity of that constant. For example, if you know that ∃x P(x), this tells you that there is some constant c so that P(c) holds, but it gives no information at all about whether or not for a given constant b, P(b) holds.

∃ Introduction

k P(a)
m ∃x P(x) ∃ Introduction: k

If a formula holds for some constant (a), there exists a constant for which the formula holds.

∃ Elimination

k ∃x P(x)
m P(a) ∃ Introduction: k

If a formula holds for some constant, give that constant a name that has not previously been used.

Search algorithms

Seek, and ye shall find

To find a solution, you usually need to search for it. For artificial agents, this is no different. On this page, we explore several ways in which agents can search, and list some of the advantages and disadvantages. The JavaScript on this page makes use of HTML5.

We represent a search problem as a graph, where each node is a point in the solution. For example, this could be a road map, where different nodes are different cities. But nodes can also represent different actions in a plan. For example, an agent that wants to make a cup of tea can search through his repertoire of actions and find that it should pass through the nodes of “putting on the kettle” and “pouring the hot water” to get to the goal of “enjoying a nice cup of tea”.

In our representation, agents always try to find a path from the start S to the goal G. Every arrow represents a path from one node to another, and can only be travelled in one direction. The number next to each edge shows the cost of taking that path (e.g. the time, effort, or energy consumed by taking the path). When there is no edge connecting two nodes, it is not possible to go from one node to the other directly. However, it may still be possible to go there indirectly.

Below, we describe a number of search algorithms, and show an example of the way this algorithm works in practice. This list of algorithms includes

Each example consists of the example network and the stack. The stack shows what options the agent can still explore. Whenever a node is put onto the stack (the agent recognizes the possibility of going to that node), the corresponding node in the network turns green. When the node is removed from the stack (the agent has finished considering the option), it turns blue. As soon as the goal node G turns blue, the agent has found a solution and stops. In some cases, it may happen that an edge turns red. This indicates that the agent has chosen not to investigate that path at all.

To compare these search algorithms more directly, you can also view the search algorithm script page, which provides a simultaneous view of the network, the stack, the search tree, and the algorithm.

Depth first search

An agent that uses depth first search is always moving forward. Every time the agent reaches a node, it looks at all the options available from that point and chooses to continue searching in the last direction that the agent has observed. Backtracking to a previous point only happens when the agent reaches a dead end, and is therefore forces to turn back.

One of the main advantages of depth first search is that it requires very little memory. But because of that, depth first search may lead agents to go around in circles. Fortunately, there are no circular paths in the example above. In general, to prevent going around in circles, agents can skip visited nodes. Whenever they see a node for the second time, they ignore the possibility of going there.

Depth limited search

As mentioned above, the disadvantage of depth first search is that you may get stuck in a circular path. One way to overcome this is stop searching after a given number of steps. For example, the agent may restrict itself for paths that visit no more than three nodes. This way, the agent is sure not to get stuck in a circular path for all eternity. The downside, of course, is that if there the path from start to goal visits at least four nodes, the agent cannot find a solution even though it exists.

Iterative deepening search

The disadvantage of depth first search is that you may get stuck in a circular path. Depth limited search solves this partially by putting a limit on the length of the path. Unfortunately, this may lead the agent to be unable to find the solution, because it takes too many steps.

To solve this problem, the agent could simply increase its search limit and try again. This is exactly what iterative deepening search does. Whenever the agent is unable to find a path from start to goal, the agent increases its search depth limit and restarts the search. But the demonstration shows that this solution has its own problems: the agent may be doing the same search over and over again, forgetting what it has learned every time.

Breadth first search

If depth first search is about always moving forward, breadth first search is about not straying too far. An agent that follows breadth first search explores its options in the order it observed these options. Rather than staying on one chosen path, the agent will jump back and forth between options until it finds the solution.

Breadth first search has a huge advantage over depth first search. Because breadth first search means a meticulous search for a path to the goal, it is guaranteed to find such a path if it exists. It will even find the path that takes the fewest number of steps, though this may not be the path with the lowest cost. This advantage comes at a cost of the memory needed to remember all paths that are not yet explored.

Uniform cost search

Uniform cost search is a refinement of breadth first search. Whereas breadth first search always considers the path with the fewest steps first, uniform cost search prefers to consider paths with lower costs. An agent that follows uniform cost search orders its stack of paths to consider based on the length of the path. In the stack view of the example above, the length of the path is listed below the node. Since the agent always looks at the path with the shortest length first, as soon as the agent considers the goal node G, it has found the shortest path from start to goal.

Greedy best first search

Unlike the previous search algorithms, which were all uninformed search, greedy best first search is an informed search algorithm. That is, greedy best first search makes use of information that is not part of the problem description in the form of a heuristic. This heuristic gives an estimate for the cost of reaching the goal from any node. For example, a heuristic on a road map could be the straight-line distance between two cities. This is easier to calculate than the road distance, but still gives the agent some indication of whether it is looking in the right direction. This is the main idea of the heuristic: it should be easy to calculate, but still give information that is relevant to the search.

An agent that follows greedy best first search orders its stack based on the heuristic value, and therefore tries nodes with a low heuristic value first. That is, it first tries those nodes that are expected to be very close to the goal node already, and ignore those paths that go into the wrong direction. However, like depth first search, this can cause the agent to go around in circles. Another disadvantage of greedy best first search is that it needs a heuristic, which may not be easy to get.

A* search

Like greedy best first search, A* search is an informed search algorithm that makes use of a heuristic in addition to the problem description. But instead of relying only on the heuristic, A* search also looks at how long the path already is (similar to uniform cost search. That is, A* search makes an estimate of the total length of the path rather than the remaining distance, and orders the stack based on this estimate.

Like uniform cost search, A* search will find the shortest length path, but only if the heuristic follows a simple rule: the heuristic cannot overestimate the cost of actually reaching the goal. That is, the heuristic value for a node cannot be more than the actual cost of reaching the goal from that node. It can, however, be less. In fact, for the special case that the heuristic value is zero for all nodes, A* search is the same as uniform cost search. But for a more informative heuristic, A* search could be a lot more efficient in finding the path with the lowest cost.

Boids: Flocking made simple

Birds of a feather flock together

In 1986, Craig Reynolds developed a model of flocking, in which agents group together through the interaction of a few simple rules (see also Reynolds, 1987). In this model, agents form groups even though they do not have a group identity, or even a concept of what a group is. The behaviour of a group as a whole is determined entirely by the interaction of individual agent choices based on the local situation. The script on this page (open script in separate tab) allows you to experiment with different interaction rules for boids. The controls for this script are described at the bottom of the page.
The JavaScript on this page makes use of HTML5.

Boids: Bird-like objects

Boids, or bird-like objects, are simple agents that react to their local environment based on a few simple rules. In its simplest form, boids’ flocking behaviour is the result of the three rules avoid, align, and approach. In addition to these basic flocking rules, the script on this page implements two more rules, which cause boids to flee from predators and return to the center of the screen.

Figure 1: Boids exhibit flocking behaviour through a combination of three rules. Boids avoid collision with others, align their direction with nearby flockmates, and approach to distant boids.

The avoid rule is meant to prevent boids from colliding with their flockmates. Every flockmate within the avoidance range forces the boid to move away from that flockmate. Figure 1 shows a situation with a boid that has one flockmate in its avoidance range, indicated by a red circle. In general, the avoid rule is the rule with the shortest range and the highest impact on the boid’s behaviour. In the script on this page, the avoidance force is also the only force that varies with the distance to the flockmate. The closer the boid is to a flockmate, the stronger the avoidance force felt by the boid. In addition, a boid that feels an avoidance force will ignore align or approach forces.

The align rule causes boids that are part of the same flock to have the same general direction. For every flockmate within the alignment range, a boid will feel a force to match its heading to that of the flockmate. If there are multiple flockmates in the alignment range, the boid tries to move towards the average direction of those flockmates. The situation in Figure 1 shows four flockmates in the alignment range (indicated by a blue circle) that are not in the avoidance range. The boid in the center wants to align itself with each of these flockmates.

The aproach rule makes boids move towards the center of the group of flockmates that they can see. Each boids feels a gravitational force towards the center of all flockmates in its approach range. This rule makes sure that boids will not drift out of the group. Among the flocking rules, the approach rule is typically the one with the highest range, which means that any boid outside the approach range is usually ignored entirely. However, a group may be much larger than the approach range, and the actions of a boid may have effects that

By changing the strength and range of the flocking rules, it is possible to vary flocking behaviour. For example, boids will group together even without any alignment forces. Such groups have much less coordinated behaviour than groups that do experience alignment forces, however.

Predators

Figure 2: Boids flee from predators that they detect in their fleeing range.

In addition to flocking, the script on this page implements two more behavioural rules for boids. The first of these is the flee rule, which causes boids to flee from predators. In the script, predators are larger boids that only experience one type of force: the predation force. Predators always move towards the nearest boid, unhindered by flocking behaviour. In addition, boids also consider the mouse cursor to be a predator.

The flee rule is similar to the avoid rule, except that the fleeing force is not dependent on the distance to the predator. As soon as a boid detects a predator in its fleeing range (depicted as a magenta coloured circle in the script), it will try to move in the opposite direction. The fleeing force is applied after the flocking forces, which also means that fleeing behaviour overrides flocking.

Although boids only flee from predators that appear in their fleeing range, boids can experience contagious fleeing. When a boid changes its heading due to fleeing, flockmates in its alignment range experience a force to head into the same direction. Because of this behaviour, it can seem as if boids flee from predators that they never saw.

Home range

Figure 3: Boids experience a force to return to the center of the screen.

Finally, boids can also experience an attractive force from their home range, which causes them to return to the center of the screen. In the script, the home range of the boids is indicated by a cyan area. Unlike other behavioural rules, which have a maximum range, the return rule has a minimum range. Boids only experience a force to return to the center of the screen when they leave the home range.

Returning to the center of the screen is applied before flocking and fleeing. This means that although boids prefer to stay within the home range, they may leave the home range because of the presence of flockmates or predators.

Controls

  • Zoom slider: Determines the magnification of the script area.
  • Boids sliders: Determines the number and movement speed of boids in the simulation.
  • Predators sliders: Determines the number and movement speed of predators in the simulation.
  • Lightbulb slider: This determines the number of lightbulbs in the arena. For performance reasons, the number of lightbulbs has been limited to four.
  • Behaviour sliders: For each behavioural rule avoid, align, approach, flee, and return, these sliders control the range of the rule as well as the strength of the rule.
  • Behaviour checkboxes: In this script, one boid is selected at random to visualize the range of behavioural rules. When the checkboxes of the behavioural rules avoid, align, approach, or flee are checked, the corresponding range is shown around this boid as a coloured ring. For the return rule, this checkbox determines whether the home range is indicated as a solid coloured circle.

Braitenberg vehicles

Simple rules can appear to be smart

Braitenberg vehicles are simple vehicles that show how the interaction between simple rules can lead to complex behaviour. Although these vehicles have no ability to make decisions or even to remember the past, a Braitenberg vehicle may show flexible behaviour and appear to be goal-directed. The script on this page (open script in separate tab) allows you to experiment with Braitenberg vehicles and see how these vehicles act and interact in a simple environment. The controls for this script are described at the bottom of the page.
The JavaScript on this page makes use of HTML5.

Braitenberg vehicles

Braitenberg vehicles are simple autonomous agents that move around based on sensor input. On this page, we consider vehicles that have two wheels, each controlled by its own motor. Sensors that measure the light intensity can affect the output of these motors, either positively or negatively. Depending on how where the sensors are located and how these sensors are connected to the motors, the behaviour displayed by the vehicle can differ greatly.


Fear Aggression Love Exploration
Figure 1: Four simple Braitenberg vehicles.

Figure 1 shows an example of four simple Braitenberg vehicles. These four vehicles are also preprogrammed in the script. Although these vehicles are quite similar and have the same light-sensitive sensors at the same locations, the way these sensors are attached to the motors causes strong differences in their behaviour.

The first vehicle, called Fear, connects the leftmost sensor to the left wheel motor and the rightmost sensor to the right wheel motor. In both cases, the forward speed of the motor is increased when the sensor detects more light. This means that when this vehicle detects light to the left, the left motor increases speed and the vehicle veers to the right. That is, Fear tries to move away from the light.

The second vehicle, Aggression, is similar to Fear, except that the sensors are wired to motor at the opposite end of the vehicle. That is, the leftmost sensor is connected to the right wheel motor while the rightmost sensor connects to the left wheel motor. Like Fear, the sensor connections of Aggression are positive. This means that if the vehicle detects light to the left, the right motor increases speed and the vehicle veers to the left. That is, Aggression moves towards the light. The closer the vehicle gets to the light, the stronger the increase in speed of the motors, until the vehicle speeds through the light.

Love is a vehicle that is similar to Fear, except that the sensors decrease the forward speed of the motor to which they are connected. This means that when the vehicle detects light to the left, the left motor decreases speed or even reverses, and the vehicle moves to the left. Like Aggression, Love moves towards the light. Unlike Aggression, however, the closer Love gets to the light, the slower it moves. As a result, Love moves towards the light until it reaches the perfect distance.

The final example is Exploration. Exploration has the same crossed wiring of Aggression, except that the sensors decrease the speed of the motors to which they are attached. This means that if the vehicle detects light to the left, the right motor decreases speed or even reverses, and the vehicle veers to the right. Like Fear, Exploration avoids the light. However, Exploration slows down in light areas, almost as if it is cautiously exploring the light.

In the script above, you can test out the different basic Braitenberg vehicles by loading them into the selected vehicle. By varying the number of lightbulbs and dragging and dropping them to different locations, you can experiment with the way they react to different environments.

More complex vehicles

In addition to the basic vehicles, the script above allows you to construct your own vehicle design. Each vehicle has a base speed for each motor, which means that vehicles can move forward, backward, or in circles when left in the dark. In addition to the lightbulbs in the environment, each Braitenberg vehicle is also equipped with a lightbulb of its own. This addition of light onto vehicles themselves can result in surprising new behaviour.

The lower part of the script shows a larger Braitenberg vehicle in a black box, which allows you to customize a vehicle. Each vehicle has up to eight sensors, which are depicted as coloured dots. By dragging and dropping these sensors, you can place them anywhere you want along the exterior of the vehicle. Sensors are colour coded to indicate that they are connected to the left wheel (white sensors), the right wheel (black sensors), or the lightbulb (yellow sensors). Note that a vehicle’s sensors are never affected by its own lightbulb.

Even though Braitenberg vehicles are purely mechanical, you may notice that it is easier to describe these vehicles as if they had intentions and goals. This shows that in some cases, it is easier to understand these vehicles through theory of mind, by pretending that they have unobservable mental content.

Controls

  • Arena: In the arena, vehicles and lightbulbs can be moved by dragging and dropping them in their desired location.
  • Lightbulb slider: This determines the number of lightbulbs in the arena. For performance reasons, the number of lightbulbs has been limited to four.
  • Vehicle slider: This determines the number of Braitenberg vehicles in the arena. For performance reasons, the number of vehicles has been limited to four.
  • Load a basic vehicle select box and button: These controls allow a user to load one of the four basic Braitenberg vehicles shown in Figure 1 into the vehicle selected from the arena. In addition, it also allows a user to load a custom Braitenberg vehicle. These controls can also be used to save a Braitenberg vehicle design for later use.
  • Default speed of left/right motor slider: This slider determines the speed of the left and right wheels for a vehicle in the dark.
  • Default illumination slider: This slider determines the brightness of a lightbulb of the vehicle when left in the dark.
  • Braitenberg vehicle bay: This control shows the selected Braitenberg vehicle, and allows the user to move around sensors through dragging and dropping.
  • Number of sensors on vehicle slider: This slider determines the number of sensors on the vehicle.
  • Selected sensor slider and select box: This determines what the selected sensor is connected to (left motor, right motor, or lightbulb), and the strength of this connection.
  • Pause/Continue simulation button: Allows a user to pause and continue execution of the simulation.
  • Save simulation setup button: Downloads the current Braitenberg arena state as a text file. This can be loaded into the current arena state with the Load simulation setup button.
  • Load simulation setup button: Allows a user to change the code of the Braitenberg arena. This can be used to load a previously saved arena state.

Repeated simple spatial games

It’s all about who you know

Why do people cooperate with each other? Why do animals work together? How can cooperation survive when there are opportunities to cheat? An important observation in these questions is that it matters who you interact with. If you are likely to interact multiple times with the same people, it may be better to cooperate rather than to cheat (see also Axelrod and Hamilton, 1981). The script on this page (open script in separate tab) shows an implementation of a small number of repeated simple spatial games, both cooperative and competitive, and allows you to experiment with some of the spatial features. The controls for this script are explained at the bottom of this page.
The JavaScript on this page makes use of HTML5.

Repeated simple spatial games

The script on this page simulates agents that repeatedly play simple spatial games. These spatial games are called simple because they are:

  • two-player, since every interaction is between two individuals;
  • symmetric, since the outcome of each interaction depends only on the actions of the players, and not on who these players are; and
  • simultaneous, since both players select their action at the same time so that no player can react to the action selected by the other.

Individuals are organized on a grid, which may or may not include empty spots. Each individual has a colour that identifies what strategy it is playing. In the Prisoner’s Dilemma, for example, red individuals defect while blue individuals cooperate. At every time step, every individual plays the simple game with each neighbour within its interaction neighbourhood. Both the range and the shape of the interaction neighbourhood can be set within the script. The outcome of each game depends only on the strategies used by the two players, which can be seen in the payoff matrix in the bottom right of the script. In the Prisoner’s Dilemma, for example, when a cooperator plays against a neighbouring defector, the cooperator gets a score of -1, while the defector receives a score of 4.

All individuals play the game with each neighbour in their interaction neigbourhood. Once all games have been played, individuals look around to determine the neighbour in their vision neighbourhood that has the highest average score. If that score is higher than the individual’s own score, the individual switches its strategy to match the one of their neighbour. If there happens to be more than one neighbour with the highest average score, the individual will randomly pick one of those neighbours’ strategies. However, if one of the strategies that yields the highest observed score is the agent’s own strategy, the agent will always decide to keep its own strategy. This means that a cooperator will choose to stay a cooperator as long as the highest average score that he can see is obtained by another cooperator, even if the individual himself gets a very low score.

The script allows you to simulate a number of different games. The Prisoner’s dilemma, Hawks and doves, and Hawks, doves, and retaliators games are cooperative dilemmas. In these games, all individuals would benefit from the cooperation of other agents, but experience an individual incentive to defect on cooperation. In these cooperative dilemma’s, you can see how “clusters” of cooperation can grow to overcome defectors. What allows these clusters to survive is that in the center of such a cluster, cooperators only interact with other cooperators. That is, individuals inside a cluster cannot be cheated by faraway defectors.

In contrast, Rock-paper-scissors, Rock-paper-scissors-Spock-lizard, and Elemental RPS are purely competitive settings. In each of these setting, each action is the best response to another action so that there is no single optimal action to play. As long as all strategies continue to exist in the population, agents keep changing their strategy to keep up with other agents changing their strategy. In the simulation, this will show itself as waves moving through the population. This kind of behaviour is also the basis of the reason why theory of mind can be helpful in settings such as Rock-paper-scissors (see also theory of mind in Rock-paper-scissors).

Finally, the Coordination and the Stag hunt are purely cooperative settings. In both of these settings, agents want to cooperate, but either do not know how to cooperate (Coordination game), or are afraid that the other may not cooperate (Stag hunt game). In these games, the simulation tends to quickly settle into an equilibrium, where no agent wants to change its strategy. Whether or not there are multiple actions that survive in this stable solution depends on the number of empty spots, the vision neighbourhood, and the interaction neighbourhood.

Controls

  • Individuals on the grid: You can manually switch the strategy of an agent by clicking on it.
  • Strategy frequency graph: After every change in the situation in the grid, the graph updates to show the new frequencies of each strategy among agents.
  • Game selector: Selects the game that will be played by the agents on the grid. Changing the game will also reset the field.
  • Agent count slider: This determines the number of agents on the grid, between 1500 and 2500 individuals. Since the grid is 50 by 50 agents in size, 2500 agents fills up the grid entirely. When you change the number of agents, the strategies of remaining agents are not changed.
  • Interaction slider: This slider determines the size of the interaction neighbourhood of every agent .
  • Interaction neighbourhood shape toggle: By clicking on the interaction neighbourhood shape icon, you can switch between a cross-shaped (Von Neumann) and a square-shaped (Moore) interaction neighbourhood.
  • Vision slider: This slider determines the vision range of every agent.
  • Vision neighbourhood shape toggle: By clicking on the vision neighbourhood shape icon, you can switch between a cross-shaped (Von Neumann) and a square-shaped (Moore) vision neighbourhood.
  • Simulation speed slider: This shows how quickly changes occur in the grid, varying from one change per 2 seconds to up to 50 changes per second.
  • Play one round: Plays a single round of simulation.
  • Start/stop simulation: The simulation will automatically stop when after a round, no agents changed their strategy. Starting the simulation does not reset the board. This means you can pause the simulation to make some manual changes if you want, and continue from the situation you have created.
  • Reset field: Randomly re-assigns empty spots across the board, and strategies across agents.
  • Change payoff matrix: By clicking on the star (*) near the payoff matrix, you can change the payoff matrix as you see fit. Note: the new matrix is not checked for validity. Make sure you enter a valid payoff matrix.

Phantom jams

Heading for a world-wide traffic jam

Some traffic jams have clear causes, such as an accident, road works, or an upcoming on-ramp. However, some traffic jams seem to occur for reason at all. Such phantom jams can even occur on completely visible roads at low speeds (see also the paper by Yuki Sugiyama et al., 2008 and their video). The script on this page (open script in separate tab) simulates traffic flow on a six-lane circular road. Cars are exactly the same in the way they accelerate and decelerate, but differ in the speed at which they want to travel. The script allows you to experiment to determine what causes phantom jams and how they can be dissolved. The controls for this script are explained at the bottom of the page.
The JavaScript on this page makes use of HTML5.

Going around in circles

The script on this pages simulates cars that drive along a circular road while keeping a safe distance from other cars. Each simulated car has a preferred top speed that it will try to achieve. However, since some cars have a higher preferred speed than others, the fast driving cars will occasionally get too close to a car ahead, and will have to hit the brakes. In the simulation, cars always brake with the same intensity. This means that cars will sometimes brake more intensely than necessary, and drop its speed below the speed of the car ahead. Together with the small variation in the preferred top speed of cars, these effects can cause a ripple in the traffic flow that cause a phantom jam.

In addition to variations in speed and braking too intensely, the simulation has a number of other features that can make phantom jams worse. Firstly, cars that have stopped moving will not pull up immediately when the car ahead of them starts moving. Instead, cars wait a little while until the distance is large enough. Additionally, when cars want to accelerate but cannot because of a car ahead of them, they may decide to change lanes. Although a car can help to clear up a phantom jam in a lane by changing to another lane, changing lanes can cause another jam in the lane they are entering.

The script above has a graph that allows you to keep track of phantom jams by plotting the percentage of jammed cars, which are cars that have come to a complete halt, and the average speed of cars. The average speed of cars is plotted as a percentage of the speed at which these cars prefer to drive. By running the script, you will find that phantom jams can occur for a lot of different settings of the sliders. But are there ways to prevent phantom jams, or to dissolve them after they have already happened? For example, does it help to lower the speed limit, or to force cars to stay in their lanes? And what can drivers do themselves to avoid and solve phantom jams? Are the solutions that avoid phantom jams the same as those that avoid these jams?

Controls

  • Summary graph: After every change in the situation on the road, the graph updates to show the new percentage of jammed cars, which are currently standing still, and the average speed of cars, as a percentage of their top speed.
  • Car count slider: This determines the number of cars on the road, with a range of 75 to 175 cars.
  • Average top speed slider: This slider determines the preferred driving speed or top speed of cars. The value represented by this slider is the average top speed across all cars. That is, individual cars differ in their actual top speed.
  • Variation top speed slider: This slider determines the range of individual variation in the top speed of cars. More variation means that faster cars more often have to brake due to slower cars ahead.
  • Acceleration slider: This slider determines how quickly cars accelerate to their top speed.
  • Preferred deceleration slider: This slider determines at what speed cars prefer to brake. A lower value of the preferred deceleration means that cars that encounter a slower car ahead will be faster to brake and bring their own speed down more gradually. A higher value means that cars tend to brake more abruptly.
  • Lane changing slider: This slider determines the likelihood that a car will change lanes. A car will only consider changing lanes when it is prevented from accelerating because of a car ahead.
  • Zoom slider: This slider controls the magnification of the road.
  • Simulation speed slider: This shows how quickly time passes on the road.
  • Reset field: Resets the cars by spreading them out approximately equally across the tarmac.
  • Road: By double-clicking the road, you can switch representations of the road. In the more schematic representation with white cars, you can additionally select a car to follow its behaviour more easily.

The sprites for this script have been adapted from the sprites of the 1999 computer game Grand Theft Auto 2 by Rockstar Games.

Schelling’s model of segregation

A little bit of a preference goes a long way

As the saying goes: “birds of a feather flock together”. People with similar backgrounds tend to end up in the same social group. This effect can be so extreme that it eventually leads to the kind of segregation in which the only people of the same background interact with each other. But how much of such a “preference for similarity” is needed to get a segregated society? In 1971, Thomas Schelling tried to answer this question using multi-agent simulations (Schelling, 1971). The script on this page (open script in separate tab) shows an implementation of Schelling’s model of segregation. The controls for this script are outlined at the bottom of the page.
The JavaScript on this page makes use of HTML5.

AgentVille supporters

The script on this page simulates how segregation occurs on the bleachers of AgentVille. The fictional town of AgentVille is known for its annual sports event, which draws many supporters of both Team Orange and Team Blue. When a supporter arrives at the stadium, he randomly decides where to sit and watch the event. But that does not mean that AgentVille supporters have no preferences on where they want to sit. Each supporter wants to sit close to someone that supports the same team. In fact, each supporter agent wants that among neighboring seats, there is some minimum percentage of people that supports the same team as he does. The way this works is demonstrated in Figure 1. As you can see by moving the agents around these nine spaces, each supporter agent has up to eight neighboring seats. An agent only shows a wide grin of happiness when at least 50% of the people in these neighboring seats supports his team.

Figure 1: Agents want to have a minimum percentage (50% in this case) of people in neighboring seats to support the same team as they do. In this interactive figure, you can drag agents to a new position to see how this works. Only happy supporters show a wide grin.

The pursuit of happiness

If a supporter is unhappy with his seat, he will eventually get up and move to one of the empty seats. It is not hard to see that if every agent demands that 100% of the people in neighboring seats supports the same team as he does, this process of switching seats will only stop when the supporters for both teams are completely separated. But does a lower similarity preference also cause segregation on the bleachers? Is the number of empty seats important? If each supporter wants 50% of his neighbours to support their team, will the average similarity be 50% when all supporters are happy, or would it be much higher?

The script on this page allows you to play around with different situations on the bleachers and see what will happen to the way supporters seat themselves. The graph keeps track of the number of happy supporters (green line) and the average percentage of people in neighboring seats that support the same team (red line). One thing you may want to keep an eye on, is the difference between the preferences of individual supporters (as shown by the desired similarity slider) and what the average similarity of neighboring supporters ends up being.

Controls

  • Support agents on the bleachers: You can manually drag agents to empty seats. An agent that is happy with his current situation will show a broad grin. Agents without such a grin will eventually want to move to another seat.
  • Happiness / similarity graph: After every change in the situation on the bleachers, the graph shows the current percentage of agents that is happy (green) and the average percentage of agents in neighboring seats that support the same team (red).
  • Number of agents slider: This determines the number of agents trying to find a spot on the bleachers. There are 200 seats available, and the number of agents can vary between 1 and If the number of agents is even, there will be as many Blue Team supporters as Orange Team supporters. If the number of agents is odd, there will be one more supporter for Orange Team.
  • Desired similarity slider: This slider determines for each agent what percentage of neighoring agents that support the same team will make him happy. All agents have exactly the same desired similarity.
  • Simulation speed slider: This shows how quickly changes occur on the bleachers, varying from one change per second to up to 2500 changes per second.
  • Start/stop simulation: The simulation will automatically stop when 100% of agents is happy. Starting the simulation does not reset the board. This means you can pause the simulation to make some manual changes if you want, and continue from the situation you have created.
  • Reset field: Removes all agents from the board and lets each of them choose a new seat.

Tacit Communication Game

Theory of mind in cooperation

The script on this page (open script in separate tab) shows the implementation of simulated agents playing a simplified version of the cooperative-communicative Tacit Communication Game. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Using theory of mind, these agents can reason about the goals and intentions of their partner. The script demonstrates how theory of mind can help agents to set up communication more quickly. The controls for this script are explained at the bottom of this page.
The JavaScript on this page makes use of HTML5.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.

Tacit Communication Game

Figure 1: In the Tacit Communication Game, both players have the same goal. However, only the blue Sender player knows what this goal is.

The Tacit Communication Game is a cooperative communication game played on a 3×3 board of tiles by two players. The game is played in two round. At the start of the first round, the blue Sender token is placed in the middle of the board. The Sender player may rotate and/or move this token to an adjacent tile any number of times. Once the Sender is satisfied with the final position and orientation of his token, the second round starts. When this round starts, the orange Receiver token is placed in the middle of the board, and the Receiver player may rotate and/or move this token to an adjacent tile any number of times. Once she is satisfied with the location and orientation of her token, the game ends.

The Sender and the Receiver share the same goal: the final location and orientation of their respective tokens should match a certain goal configuration. Importantly, only the Sender knows what this goal configuration is (see also Figure 1). During the first round, the Sender should therefore match his token to the goal configuration, but he also needs to communicate the goal configuration of the orange token to the Receiver using only his movements on the board. The Receiver, for her part, has to find out what the goal configuration of her orange token is based on the movements of the blue Sender token. At the end of each game, the Sender and Receiver hear whether they matched their tokens to the goal configuration or not. However, if they failed to reach the goal, the Receiver does not hear what the correct configuration was.

In the full Tacit Communication Game, Sender and Receiver can have tokens of different shapes. Figure 1 shows an example in which the Sender has a round token, while the Receiver has a triangular token. This makes the game more difficult for the Sender, because he will have to let the Receiver know what the goal orientation of her token is without being able to use the orientation of his own token. On this page, we take a look at a simplified version of the Tacit Communication Game, in which Sender and Receiver both have a round token. This means that the Sender only has to let the Receiver know where her orange token should be placed. But even in this simple game, we can already see some interesting behavior. On this page, we focus mostly on the role of theory of mind and predictable behavior.



Theory of mind

In the Tacit Communication Game, the Sender knows what the goal configuration is, but the Receiver does not. It could therefore be beneficial for the players to reason about what the other knows and believes. This reasoning ability is known as theory of mind. The software agents playing the Tacit Communication Game on this page also make use of theory of mind to predict what the other player is going to do. In the game, the theory of mind level of agents can be set to determine how this influences the ability of agents to cooperate.

Figure 2: Zero-order theory of mind agents randomly try actions until they find one that works. In this case, the zero-order theory of mind Sender sends message UP-LEFT, and the Receiver correctly guesses her goal location. This results in both players learning that sending the message UP-LEFT results in the Receiver selecting bottom right location.

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents can only predict future behavior based on past behavior. This means that a zero-order theory of mind Sender sends random messages to find out how the Receiver reacts to them. Once he has a clear idea of how the Receiver behaves, he sends the message for which he believes the Receiver to match her token to the goal configuration. At the same time, a zero-order theory of mind Receiver randomly tries locations until she finds the one that results in success. Figure 2 shows an example where the Sender sends the message UP-LEFT, after which the Receiver correctly guesses that she should put her token to the bottom right location. Because this was a correct guess, both Sender and Receiver now believe that if the Sender were to send UP-LEFT again, the Receiver would respond by moving her token to the bottom right location.

Figure 3: Errors can lead to conflicting beliefs in the Tacit Communication Game. In this case, a zero-order theory of mind Sender sends message UP-LEFT, and the Receiver incorrectly guesses her goal location. The Sender now believes that sending the message UP-LEFT results in the Receiver selecting the bottom right location. But the Receiver believes that if she sees the message UP-LEFT, she should try anything but the bottom right location.

When the Receiver misinterprets a message the Sender has sent, zero-order theory of mind leads to conflicting beliefs. Figure 3 shows an example of this. Here, a zero-order theory of mind Sender sent the message UP-LEFT, and the zero-order theory of mind Receiver responded incorrectly by selecting the bottom right location. The negative feedback causes the Receiver to believe that, if the Sender were to send the message UP-LEFT again, she should not choose the bottom right location again. On the other hand, the Sender believes that if he were to send the message UP-LEFT again, the Receiver will respond by selecting the bottom right location again.

On first glance, it may seem strange that after this negative feedback, the Sender believes that the Receiver will not change her behavior. After all, the Receiver has also seen this negative feedback, so the Sender should expect that she will change her behavior. However, this would mean that the Sender knows that the Receiver has a goal: to match her token to the goal configuration. But the zero-order theory of mind Sender cannot reason about the beliefs and goals of the Receiver. To the zero-order theory of mind Sender, the Receiver is like a coffee machine with the labels remove. The Sender randomly pushes buttons to try and get the type of coffee he wants. If he has pushed the wrong button, he believes that if he were to press the same button again, the coffee machine would produce the same result. In the same way, the Sender believes that if he sends the same message, the Receiver would produce the same behavior again. For people, however, theory of mind reasoning is so natural that zero-order theory of mind reasoning actually seems counterintuitive.

Figure 4: First-order theory of mind allows a Receiver to look at the game from the perspective of the Sender. This way, a first-order theory of mind Receiver believes that any new message (RIGHT-UP-LEFT-LEFT) is not meant to communicate a goal location that the Sender has already found a good message for.

A first-order theory of mind agent can reason about the goals of others. Such an agent realizes that the two players have the same goal. This especially helps the Receiver she she tries to interpret the messages of the Sender, as is shown in Figure 4. When a first-order theory of mind Receiver sees a message, she tries to figure out for what goal configuration she would have decided to send the same message. This helps the Receiver to interpret new messages. Figure 4 shows a situation in which the Sender has previously sent the message UP-LEFT, after which the Receiver correctly guessed that the bottom right tile was her goal location. When the Receiver sees the message RIGHT-UP-LEFT-LEFT, she takes the perspective of the Sender and concludes that her goal location is not the bottom right tile. After all, if the Sender had wanted her to go to the bottom right tile, he would have sent the message UP-LEFT.

First-order theory of mind agents believe that other players may be zero-order theory of mind agents. However, if both Sender and Receiver are first-order theory of mind agents, both agents are mistaken. For the best results, either the Sender or the Receiver should use zero-order theory of mind.

Figure 5: Second-order theory of mind helps when players are predictable. In this case, Senders want to send short messages that visit the goal location of the Receiver. A second-order theory of mind Sender reasons that sending the message LEFT-DOWN-UP-UP will cause the Receiver to move her token to the bottom left tile, but the message DOWN-LEFT-UP-UP may not.

A second-order theory of mind agent takes this reasoning one step further, and believes that the other player may be a first-order theory of mind agent. This means that the second-order theory of mind agent believes that the other player knows that both players have the same goal. Interestingly, unlike in competition and negotiation, second-order theory of mind does not provide any additional benefits in the standard model. However, second-order theory of mind can be beneficial in cooperative settings such as the Tacit Communication Game when player behavior more predictable.
For example, Senders may prefer to send different messages for different goal configurations. In the game on this page, Sender preferences can be set to short messages, messages that visit the Receiver’s goal location, and short messages that visit the the Receiver’s goal location. This can help agents to play the Tacit Communication Game more effectively.

Figure 5 shows an example in which Senders prefer to send messages that are as short as possible, but also visit the goal location of the Receiver. In this example, the Sender wants to send a message that lets the Receiver know that she should place her token on the bottom left tile. By placing himself in the position of the Receiver, the Sender tries to predict how the Receiver will react to the message LEFT-DOWN-UP-UP. For example, the Receiver may think that the Sender wants her to go to the left tile in the middle row. After all, the Sender’s message visits this location. However, the Receiver knows that there is a shorter message (LEFT-UP) that the Sender could have sent, and that would still have visited the same location. By placing herself in the shoes of the Sender, the Receiver reasons that the Sender would have preferred to send LEFT-UP in this case. As a result, the Receiver believes that the message LEFT-DOWN-UP-UP is not intended to tell her that her goal location is the left tile in the middle row. In fact, the only location that the first-order theory of mind Receiver considers to be her goal location is the bottom left tile.

Through the use of second-order theory of mind, the Sender believes that if he were to send the message LEFT-DOWN-UP-UP, the Receiver would respond by moving to the bottom left tile. Moreover, the use of second-order theory of mind lets the Sender know that LEFT-DOWN-UP-UP is a better message than DOWN-LEFT-UP-UP. Even though the messages have the same length, and the Sender has no preference for either of these messages, the second-order theory of mind Sender believes that the Receiver could misinterpret the message DOWN-LEFT-UP-UP. As Figure 5 suggests, the Receiver may move her token to the middle location in the bottom row. The Sender believes that this would not happen with the message LEFT-DOWN-UP-UP.

Controls

The script above has a number of controls to show the effect of using a higher orders of theory of mind on the performance of agents in the Tacit Communication Game.

  • Sender/Receiver checkboxes: At the top of the script, there are two checkboxes to show and hide the mental content of the Sender and Receiver agents. The mental content shows the agent’s zero-order, first-order and second-order beliefs concerning the behavior of the other player. When a human user is playing the game, this mental content can give a lot of information on what the goal configuration is or how the agents are going to behave. For a more challenging game, remove the check from the appropriate checkbox to hide mental content.
  • Sender/Receiver theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to the second. Additionally, players can be controlled by a human user. This way, you can experience the effect of different outcomes on agent behavior firsthand. When human input is accepted, the arrow keys can be used to move the token.
  • Sender preferences: If a Sender has a choice of multiple messages to send, the Sender’s preferences tell him what he message he should pick. Senders can either choose to send short messages, messages that visit the goal location of the Receiver, or short messages that visit the goal location of the Receiver.
  • Reset turn: Resets the message that will be sent to the Receiver. This is only used by human Senders.
  • Playback: Repeats the Sender’s message on the game board.
  • Play turn: Play one round of the game.
  • Start and Stop: Starts and stops automatic play. When started, this mode continuously plays new games.
  • Skip game: Randomly pick a new goal configuration.
  • Clear beliefs: Resets the agents’ beliefs to forget everything they have learned.

With the game script, you can see how agents perform when their theory of mind level changes. In addition, you can experiment with how Sender preferences influence the effectiveness of theory of mind.

Negotiating with alternating offers

Theory of mind in mixed-motive interactions

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the negotiation game Colored Trails. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. This ability is especially useful in negotiations, where the negotiating parties want to cooperate to reach an agreement, but also compete to get the best possible deal for themselves. The script on this page shows how theory of mind can help. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.

Colored Trails

Figure 1: In this example, the blue player starts in the top left corner and wants to reach the bottom right corner.

Colored Trails is a negotiation game played on a board with tiles of different colors (see also the Colored Trails homepage). There are many different ways to play Colored Trails. The way we describe here is just one possibility. In the setup on this page, two players each receive a set of chips that allows them to move around on the board. Players can move horizontally and vertically to a tile next to their current location, but only if they hand in a chip of the same color as their destination tile. For example, the blue player in Figure 1 can move down by handing in a black chip. However, this means that the blue player will no longer have a black chip to reach his goal in the bottom right. Instead, the player can move right first by handing in a yellow chip. This way, he can still use the black chip to reach his goal location.

Each player receives four chips at the beginning of the game, randomly drawn from one of the colors on the board. This means that a player may not end up with the chips he needs to reach his goal location. To help players to reach their goals, players can negotiate over ownership of the chips. Negotiation takes the form of alternating in making an offer. The initiator (blue player) always starts by making the first offer. The responder (orange player) can then decide to accept the offer of the initiator, make a new offer, or withdraw from the negotiation. If the responder accepts, the chips are divided as suggested by the responder and the negotiation ends. Alternatively, if the responder withdraws, each player keeps his own chips and the negotiation ends as well. If the responder decides to make a new offer, the players switch roles and the negotiation continues. Although this process could in principle go on forever, the game we present here has a maximum of 40 offers. This means that once the initiator and the responder have made 20 offers each, the initiator can no longer make a new offer. Instead, he has to accept the offer of the responder, or withdraw from negotiation.

Each player is scored based on how closely he ends up to his goal location, indicated by a flag on the board. The scores are listed in the table below.

Situation Change in score
Ending on your goal location +50 points
Ending anywhere but your goal location -10 points per step towards your goal location
Ending with unused chips +5 points per chip

As the table shows, players get the most points for reaching their goal, although every step in the right direction helps. Also, even if you cannot use a chip to reach your goal location, it is worth a few points. After the negotiation, the game automatically gives the players the highest possible scores given their chips.

At the start of the game, each player is placed at the center of the board and receives a random goal location. When there is at least one computer-controlled player in the game, players only know their own goal location. That is, the initiator does not know the goal location of the responder and vice versa. However, goal locations are always at least three steps away from the center. Also, the initiator and the responder never have the same goal.



Theory of mind

Although the score of a player depends only on his own performance and not on the performance of the other player, whether or not the other player will accept your offer will depend on how it affects his score. It may therefore help to think about the goals of the other player. When people consider what other people want, know, or believe, they are using their theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what the other player is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can find out how higher orders of theory of mind allow agents to negotiate more effectively.

Figure 2: The orange zero-order theory of mind agent believes that the behaviour of the blue player is consistent. If the blue player rejects some offer (eg. 1 black, 2 white, 1 yellow chip), the orange player believes that the blue player will also reject a smaller offer (eg. 2 white, 1 yellow chip).

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model others through patterns of behaviour. A zero-order theory of mind agent tries to find out what kind of offers are more likely to be successful, without reasoning about the goals of the other player. Through experience, a zero-order theory of mind agent will find out that asking for more than 6 chips, while leaving 2 or fewer chips for the other player, is very unlikely to be accepted. Instead, the zero-order theory of mind agent learns to make “fair” offers without knowing what “fair” means. To help the zero-order theory of mind agent along, agents are programmed with 1,000 games of experience. That is, when starting a game for the first time, an agent already has 1,000 negotiations worth of experience to let him know what kind of offers are more successful than others.

The zero-order theory of mind agent learns what kind of offers are more successful, but he believes that the other player has a set of offers that he is willing to accept. The zero-order theory of mind agent makes offers as if he were pushing buttons on a machine, trying to find out what button will make the trading partner do what the zero-order theory of mind agents wants him to do. On the other hand, the zero-order theory of mind agent believes that the behaviour of the other player is more or less consistent. For example, if the other player rejects an offer, the zero-order theory of mind agent believes that the other player will also reject an offer that gives fewer chips to the other player. Figure 2 shows an example of this. The orange player believes that if the blue player rejects an offer in which the blue player would get 1 black chip, 2 white chips, and 1 yellow chip, then the blue player will also reject an offer in which the blue player gets only the 2 white chips and 1 yellow chip.

Figure 3: If the orange player has first-order theory of mind, he tries to find out what the goal location of the blue player is by analyzing the offers he receives. In this example, there is only one possible goal location for which the blue player could get a higher score with the chips he is asking for than with the chips he already has.

A first-order theory of mind agent realizes that the other player has a goal, and that the other player will only accept offers that will help him reach that goal. The first-order theory of mind agent also realizes that the other player will only make offers that increase his score. By looking carefully at the offers of the other player, the first-order theory of mind agent tries to find out what the goal of the other player is. Once the first-order theory of mind agent knows what the goal location of the other player is, he can make offers that lead to a mutually beneficial outcome.

Figure 3 shows a situation in which the blue player offers to trade one of his yellow chips and a black chip against one white chip of the orange player. If the orange player is a first-order theory of mind agent, he tries to find out for what goal locations the offer of the blue player makes sense. That is, for which goal locations would the blue player have a higher score with the chips he is asking for (2 white and 1 yellow) than with his initial set of chips (1 white, 1 black, and 2 yellow). As it turns out, there is only one such location, as shown in the thought balloon of the orange player. For all other possible goal locations, the blue player would have been better off with his initial set of chips.

In the game above, you can reveal an agent’s first-order belief though the checkbox “Show mental content“. Checking this option shows a grid like the game board, where brighter locations indicate that the agent believes it to be more likely to be the other player’s goal location. This means that once an agent is convinced that the other player has a particular goal location, that location will appear white while the other locations will be black. In addition, the weight of first-order theory of mind shows the degree to which first-order theory of mind determines the agent’s behaviour. If the weight is close to 1, the agent always selects an offer suggested by his first-order theory of mind. If the weight is close to 0, the agent tends to ignore the predictions of first-order theory of mind, and behave as a zero-order theory of mind agent instead. Finally, the accuracy indicates how accurately first-order theory of mind has predicted the behaviour of the other agent. However, note that the accuracy will be very low in the beginning of the game, while the agent does not know the goal location of the other player.

Using first-order theory of mind, an agent tries to determine what the goal location of the other player is. This allows a first-order theory of mind agent to get a better idea of what kind of offers the other player is going to accept. But an agent can also use first-order theory of mind to try and manipulate the other player. A first-order theory of mind agent believes that the other player might be a zero-order theory of mind agent, who learns what the first-order theory of mind agent wants through the offers he makes. By strategically selecting his offer, the first-order theory of mind agent can try to change the beliefs of the other player. The first-order theory of mind agent may sometimes make an offer that he knows the other player would never accept because it would reduce his score. The reason for this is to push the other player into making an offer that is better for the first-order theory of mind agent. For example, a first-order theory of mind agent may ask for 3 black chips if he believes that it would convince the the other player to offer the agent at least 2 black chips.

Figure 4: If the blue agent has second-order beliefs, he can try to manipulate what the other player believes about the agent’s goal location. In this case, the agent believes that if he could for the purple chip to make his trading partner believe he needs it to reach his goal location, even though he does not need that chip.

A second-order theory of mind agent takes his reasoning one step further, and realizes that the other player may be a first-order theory of mind agent. This means that the second-order theory of mind agent believes that the other player knows that the agent has a goal, and that the other player may be trying to find out what his goal location is. Instead of trying to find out what the goal location of the other player is, a second-order theory of mind agent can make an offer that signals his own goal location to the other player. By telling the other player what his goal location is, the agent gives the other player the opportunity to find a mutually beneficial solution.

Alternatively, a second-order theory of mind agent can select offers that give very little information about his goal location to get a higher score. For example, the second-order theory of mind agent can make an offer that suggests that his goal location is further away than it actually is. For example, the blue agent in Figure 4 believes that by asking for enough chips to reach the top left tile (2 white, 1 purple, 1 yellow chip), the other player will believe that that is his goal location, even though is actual goal location is closer to the center.

When an agent’s mental content is shown in the game, it shows both first-order and second-order beliefs about the goal location of the other player. In addition, the weight of second-order theory of mind indicate to what degree second-order theory of mind influences the behaviour of the agent, while the accuracy shows how close the predictions made by second-order theory of mind match the offers actually made by the other agent.

An important feature of the agents in the game is that although they use theory of mind to predict future behaviour, they have no memory to recall previous behaviour. An agent sees the offer made by the other player, changes his beliefs accordingly, and then forgets he ever saw the offer. One of the behaviours you may see a lot is agents “insisting” on a certain distribution of chips by making the same offer over and over again. In part, this is because the agents do not remember making that offer before.

Controls

The script below has a number of controls to show the effect of using a higher orders of theory of mind on the performance of agents in rock-paper-scissors.

  • Initiator/Responder theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to second-order. Additionally, players can be controlled by a human user. When there is a human user in the game, the goal location of the computer player is not revealed until the end of the game. However, if two human users play the game, the goal are not hidden.
  • Show mental content: The mental content shows the agent’s first-order and second-order beliefs concerning the goal location of the other player. When a human user is playing the game, this information can give some information on how the offers are interpreted by agents. However, for a more challenging negotiation, uncheck the option to hide mental content.
  • Accept offer, Make new offer and Withdraw from negotiation: When a human user is playing the game, these buttons allows control over the next move. Use the arrow buttons to select the offer you want to make and press Make new offer. Alternatively, Accept offer accepts the previous offer, while Withdraw from negotiation stops the game without trading any chips.
  • Play round and New game: Play one round of the negotiation game. If the game has ended, pressing this button starts a new game.
  • Start and Stop: Starts and stops automatic play. When started, this mode plays a new round every 0.5 seconds.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.

With the game script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when negotiating.

Rock-paper-scissors

Theory of mind in competition

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the game of rock-paper-scissors. These agents differ make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Even though rock-paper-scissors is a simple game in which trying to outsmart your opponent seems useless, the script on this page shows how theory of mind can still be effective. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents behave as described in the associated paper (also available from the Publications page), with some minor changes.



Game outline

Figure 1: In rock-paper-scissors, scissors (orange agent) beats paper (blue agent).

Rock-paper-scissors is a two-player game in which players simultaneously choose to play either rock, paper or scissors. If the two players made the same choice, the game ends in a tie. However, if one player chose scissors, while the other chose paper, the player that chose scissors wins (see Figure 1). In the same way, rock wins from scissors, and paper wins from rock.

According to game theory, the only stable strategy when playing rock-paper-scissors is to randomly choose one of the possibilities. After all, if you play according to some pattern, the other player might learn that pattern over many repeated games, and exploit that knowledge. Playing randomly makes sure that the opponent cannot learn any patterns in the way you play the game. Although this strategy works well, people are not very good at playing randomly. For example, people usually avoid playing rock when they have just played rock two times in a row, even though this should not matter in truly random play. Also, if there are some people that are not playing randomly, smart players may be able to exploit this and get a higher score than a random player.

Theory of mind

In game settings, people often consider what their opponents know and believe, by making use of what is known as theory of mind. The computer agents in the game on this page also make use of theory of mind to predict what their opponent is going to do. The game allows the user to restrict agents in their ability to make use of theory of mind. This way, we can determine whether higher orders of theory of mind allows agents to win more often in rock-paper-scissors.

Figure 2: The blue zero-order theory of mind agent tries to learn patterns in the behavior of his opponent.

The lowest possible order of theory of mind is zero-order theory of mind. Zero-order theory of mind agents try to model their opponent through patterns of behavior. For example, suppose that the opponent has always played paper before. In this case, the zero-order theory of mind agent believes that it is very likely that she will be playing paper again (see Figure 2). When a zero-order theory of mind agent sees that his opponent is often playing paper, he will try to take advantage of this by playing scissors. In the rock-paper-scissors script, the red bars indicate the agent’s zero-order tendencies to play (R)ock, (P)aper, or (S)cissors. The higher the bar for rock (R), for example, the more likely it is that a zero-order theory of mind agent will play rock.

The text below the red bars show to what extent zero-order theory of mind determines the next action of the agent (“Weight”), as well as the average accuracy of the predictions of zero-order theory of mind (“Accuracy”).

Figure 3: If the blue agent has first-order beliefs, he believes that his orange opponent may be trying to learn and exploit patterns in his behavior. By looking at the patterns in his own behavior, the blue agent predicts how the orange opponent will try to exploit these patterns.

A zero-order theory of mind agent tries to learn patterns in the behavior of his opponent, but does not realize that his opponent could be doing the same thing. A first-order theory of mind agent realizes that his opponent may be using zero-order theory of mind. He tries to predict what his opponent is going to do by placing himself in her position. He looks at the game from the perspective of his opponent to determine what he would do if the situation were reversed. The first-order theory of mind agent then uses this action as a prediction of his opponent’s behavior. For example, suppose that the first-order theory of mind agent realizes he has been playing paper a lot. He believes that his opponent may be trying to take advantage of this by playing scissors. If that is true, the agent can take advantage of this behavior by playing rock (see Figure 3).

In the game, the green bars indicates how likely a first-order theory of mind agent considers it to be that the agent will win the next round by playing (R)ock, (P)aper, or (S)cissors. This suggestion combines the agent’s zero-order and first-order beliefs. Below the graph, the weight indicates to what extent first-order theory of mind influences the decision of the agent. The accuracy indicates the average accuracy of the agent’s first-order beliefs in predicting the behavior of the opponent.

Figure 4: If the blue agent has second-order beliefs, he believes that his orange opponent believes that he himself is trying to learn and exploit patterns in her behavior. This allows him to anticipate how the orange opponent will try to exploit his behavior.

A second-order theory of mind agent takes his reasoning one step further, and realizes that his opponent may be a first-order theory of mind agent. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. For example, suppose that the second-order theory of mind agent realizes his opponent is playing paper a lot. Zero-order theory of mind makes him realize that he could take advantage of this predictable behavior by playing scissors. A second-order theory of mind agent thinks that his opponent may be expecting him to do so, and therefore that she will play rock to take advantage of the way he behaves. If that is true, the agent should continue playing paper himself (see Figure 4). The agent’s second-order beliefs are indicated by the blue bars.

In the script on this page, agents can continue this stepwise reasoning even further to use third-order and even fourth-order theory of mind. The associated beliefs are represented by orange and gray bars, respectively.

Although the agents in the game use theory of mind to predict the future, they do not remember the past choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time. After this, they immediately forget what they saw. This means that these agents can only look at very simple patterns of behavior. However, increasingly higher orders of theory of mind allow the agents to exhibit increasingly more complex patterns of behavior. Using the script on this page, you can experiment to see to what extent higher orders of theory of mind are still useful in rock-paper-scissors. In addition, you can also play the game against one of the agents yourself. The mental content of the agent then shows how closely your behavior corresponds to behavior of agents of different orders of theory of mind.

Controls

With the script, you can see how agents perform better when their theory of mind level increases. In addition, you can test your ability against computer agents, and see what agents believe you are doing when playing rock-paper-scissors.

  • Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to fourth-order. Additionally, the second player can be controlled by a human user.
  • Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, but will always repeat the same behavior. An agent with learning speed 1.0, on the other hand, believes that the previous game gives him all the information he needs to predict his opponent’s next action. Agents do not try to model the learning speed of their opponent. Instead, if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
  • Play round: Play one game of rock-paper-scissors. This can only be done when player two is not user-controlled.
  • Rock, paper and scissors: When player two is user-controlled, selecting one of the three possible moves plays one game, with player two’s choice.
  • Show mental content: A human player can use the graphs to determine what the agent will do next, or what a computer agent would do next if he were the one to play next. For a human player, the game is more challenging if the graphs are not visible. Uncheck the box to hide mental content information from the graphs.

An older version of the rock-paper-scissors script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The rock-paper-scissors applet can be still be downloaded for offline use.

Limited Bidding

Theory of mind in competition

The script on this page (open script in separate tab) shows the implementation of simulated agents playing the game of Limited Bidding. These agents make use of theory of mind, the human ability that allows us to reason about what other people know and believe. Agents use their theory of mind to try and outsmart their opponent. The script on this page demonstrates the effectiveness of this strategy. The controls for this script are explained at the bottom of this page.
Note: The theory of mind agents on this page do not behave the same way as described in the associated paper (also available from the Publications page). Instead of calculating their planned moves for the entire game, agents plan only one action and use a Q-learning approach to learn values for the resulting state.



Game outline

Limited bidding is a simplified version of a game described in “Edward de Bono’s Supermind Pack”. The game is played by two players. At the beginning of the game, each player receives an identical set of five numbered tokens. The number on each token corresponds to the value of the token. Over the course of five rounds, players simultaneously choose one of their own tokens to play. Whoever picks the highest value token wins the round. If both players pick the same value token, there is no winner. However, each token can only be used once per game. This means that players should choose their actions strategically.

Game-theoretically, the optimal way to play limited bidding is by randomizing every choice. That is, under the assumption of common knowledge of rationality, a players should randomly pick one of the tokens still available to them. However, the theory of mind agents modeled on this page suspect that their opponent may not be fully rational. Moreover, they are limited in their ability to make decisions themselves. By playing the game repeatedly against the same opponent, agents try to learn to predict what their opponent will do, and change their strategy accordingly.

Theory of mind

Theory of mind refers to the individual’s ability to model mental content of others, such as beliefs, desires or intentions. The agents modeled in the script are constrained in their theory of mind. At the most basic level, a zero-order theory of mind agent tries to model his opponent through patterns of behavior. For example, a zero-order theory of mind agent might find out that his opponent always plays token 5 at the start of the game, or tends to save token 3 for last. However, he is unable to realize that his opponent might be doing the same. In fact, a zero-order theory of mind agent does not realize that his opponent has goals that are opposite to the ones he has himself. In the script, the agent’s zero-order beliefs are represented by red bars. The height of each red bar indicates how likely the agent believes it to be that his opponent is going to play a certain token.

A first-order theory of mind agent realizes that his opponent might be a zero-order theory of mind agent, and tries to predict what she is going to do by putting himself in her position. He looks at the game from the point of view of his opponent to determine what he would believe if the situation were reversed, and uses this as a prediction for his opponent’s actions. For example, a first-order theory of mind agent might realize that he has started the game by using token 3 a few times in a row, and suspect that his opponent is going to try and take advantage of that by playing 4 in the first round. The agent’s first-order theory of mind would therefore suggest that the agent plays 5 to win the first round. In the script, the height of the green bars represent the agent’s first-order beliefs concerning his opponent’s next action.

A second-order theory of mind agent takes this theory of mind reasoning one step further. He puts himself into the position of his opponent, but also believes that she might be putting herself into his position. In the script, the height of the blue bars indicate the agent’s second-order beliefs.

Based on zero-order, first-order and second-order beliefs, an agent makes different predictions about what his opponent is going to do. The agent must therefore also form beliefs about which of these predictions will yield the best results. An agent’s combined beliefs represent how the different order of theory of mind are combined into a single prediction of his opponent’s actions. In the script, each bar graph depicting an agent’s theory of mind beliefs also indicates the accuracy of the predictions of that particular order of theory of mind, as well as the weight of these predictions in the agent’s next action. For example, a second-order theory of mind agent that has zero weight for his second-order beliefs will ignore his second-order theory of mind, and act as if he were an agent of a lower order of theory of mind.

Although the agents in the script make use of theory of mind, they do not remember the choices of their opponent. Instead, when they see the outcome of a game, they form beliefs about what the opponent is going to do next time and forget what they saw. As an alternative type of agent, a high memory agent is a zero-order theory of mind agent that remembers what his last choice was. That is, the high memory agent forms beliefs about what his opponent is going to do in reaction to him playing each of the possible tokens. In terms of memory, a high memory agent uses about the same amount of space as a second-order theory of mind agent, although this space is used differently. Although the high memory agent is not available in the script on this page, this agent is included in the applet example, which you can download at the bottom of the page.

Controls

The script has a number of controls to allow users to judge the effect of using a higher order of theory of mind on the performance of agents in the limited bidding game.

  • Player 1/2 theory of mind: The radio buttons determine the order of theory of mind of the two players. Players can be any order of theory of mind up to second-order. Additionally, the second player can be controlled by a human user.
  • Learning speed: Determines how quickly an agent changes his beliefs based on new information. A learning speed of 0.0 means that an agent does not learn at all, and will always do the same thing. An agent with learning speed 1.0 on the other hand believes that the previous game gives him all the information he needs to predict his opponent’s behavior. Agents do not try to model the learning speed of their opponent; if the two agents have different learning speeds, they will not be able to correctly model the beliefs of their opponent.
  • Reset game: Resets the game to the start situation. The score and accuracy information is reset to zero as well.
  • Start/Stop: Starts or stops the automatic play of limited bidding games. This can only be done when player two is not user-controlled.
  • Play round: Play one game of limited bidding. This can only be done when player two is not user-controlled.
  • Token buttons: When player 2 is user-controlled, selecting one of the available orange numbered tokens performs a move in the game.
  • Show mental content: A human player can use the graphs to determine what the agent believes that the human player will do next, or what a computer agent would believe if he were the one to play next instead of the human player. For a human player, the game is more challenging if the graphs are not visible.

With the applet, you can see how agents perform better when their theory of mind level increases. The applet also shows that second-order theory of mind agents outperform agents with more memory in Limited Bidding, even though they don’t do better in simple games like Rock-paper-scissors.

An older version of the limited bidding script is available as a Java applet. However, for security reasons, many browsers no longer allow Java applets to be run from a web browser. The Limited Bidding applet can be still be downloaded for offline use. As an additional feature, this Java implementation also includes high memory agents.