FACTOID # 86: Around 80% of all livejournal users are from the United States of America.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

Encyclopedia > Prisoners Dilemma
Will the two prisoners cooperate to minimise total loss of liberty or will one of them, trusting the other to cooperate, betray him so as to go free?
Will the two prisoners cooperate to minimise total loss of liberty or will one of them, trusting the other to cooperate, betray him so as to go free?

The prisoner's dilemma is a type of non-zero-sum game. In this game theory problem, as in many others, it is assumed that each individual player is trying to maximise his own advantage, without concern for the well-being of the other player. This Nash equilibrium does not lead to a jointly optimum solution in the prisoner's dilemma; in the equilibrium, each prisoner chooses to defect even though the joint payoff of the players would be higher by cooperating. Unfortunately (for the prisoners), each player has an individual incentive to cheat even after promising to cooperate. This is the heart of the dilemma.


In the iterated prisoner's dilemma cooperation may arise as an equilibrium outcome. Here the game is played repeatedly. Since the game is repeated, each player is afforded an opportunity to punish the other player for previous non-cooperative play. Thus, the incentive to cheat may be overcome by the threat of punishment, leading to a superior, cooperative outcome.

Contents

The classical prisoner's dilemma

The classical prisoner's dilemma (PD) is as follows:

Two suspects, you and another person, are arrested by the police. The police have insufficient evidence for a conviction, and having separated the both of you, visit each of you and offer the same deal: if you confess and your accomplice remains silent, he gets the full 10-year sentence and you go free. If he confesses and you remain silent, you get the full 10-year sentence and he goes free. If you both stay silent, all they can do is give you both 6 months for a minor charge. If you both confess, you each get 6 years.

It can be summarized thus:

You Deny You Confess
He Denies Both serve six months He serves ten years; you go free
He Confesses He goes free; you serve ten years Both serve six years

Let's assume both prisoners are completely selfish and their only goal is to minimize their own jail terms. As a prisoner you have two options: to cooperate with your accomplice and stay quiet, or to betray your accomplice and confess. The outcome of each choice depends on the choice of your accomplice; unfortunately, however, you don't know the choice of your accomplice. Even if you were able to talk to him, you couldn't be sure whether to trust him.


If you expect your accomplice will choose to cooperate and stay quiet, the optimal choice for you would be to confess, as this means you get to go free immediately, while your accomplice lingers in jail for 10 years. If you expect your accomplice will choose to confess, your best choice is to confess as well, since then at least you can be spared the full 10 years serving time and have to sit out 6 years, while your accomplice does the same. If however you both decide to cooperate and stay quiet, you would both be able to get out in 6 months.


Confessing is a dominant strategy for both players. No matter what the other player's choice is, you can always reduce your sentence by confessing. Unfortunately for the prisoners, this leads to a poor outcome where both confess and both get heavy jail sentences. This is the core of the dilemma.


If reasoned from the perspective of the optimal interest of the group (of two prisoners), the correct outcome would be for both prisoners to cooperate with each other, as this would reduce the total jail time served by the group to one year total. Any other decision would be worse for the two prisoners considered together. However by each following their selfish interests, the two prisoners each receive a lengthy sentence.


If you had an opportunity to punish the other player for confessing, then a cooperative outcome could be sustained. The iterated form of this game (discussed below) presents an opportunity for such punishment. In that game, if your accomplice cheats by confessing this time, you can punish him by cheating next time yourself. Thus, the iterated game builds in an opportunity for punishment absent in the classic one-period game.


A similar but different game

The cognitive scientist Douglas Hofstadter (see References, below) once suggested that people often find problems such as the PD problem easier to understand when it is illustrated in the form of a simple game, or trade-off. One of several examples he used was two people meeting and exchanging closed bags, with the understanding that one of them contains money, and the other contains an item being bought. Either player can choose to honor the deal by putting into his bag what he agreed, or he can defect by handing over an empty bag. In this exchange game defection is always the best course.


The PD payoff matrix

In the same article, Hofstadter also observed that the PD payoff matrix can, in fact, be written in a variety of ways, as long as it conforms to the following principle:

T > R > P > S

where T is the temptation to defect (ie, what you get when you defect and the other player cooperates); R is the reward for mutual cooperation; P is the punishment for mutual defection; and S is the sucker's payoff (ie, what you get when you cooperate and the other player defects).


(It is also usually the case that (T + S)/2 < R, and this is required in the iterated case.)


The above formula, then, ensures that, whatever the precise numbers in each part of the payoff matrix, it is always 'better' for each player to defect regardless of what the other does.


Following this principle, and simplifying the PD to the above 'bag switching' scenario (or an Axelrod-type two player game, see below), we get the following 'canonical' PD payoff matrix — that is, the one that is normally shown in literature on the subject:

Cooperate Defect
Cooperate 3, 3 0, 5
Defect 5, 0 1, 1

In "win-win" terminology the table would look like this:

Cooperate Defect
Cooperate win-win lose much-win much
Defect win much-lose much lose-lose

Real-life examples

These particular examples, involving prisoners and bag switching and so forth, may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature which have the same payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics, politics and sociology, as well as to the biological sciences such as ethology and evolutionary biology.


In political science, for instance, the PD scenario is often used to illustrate the problem of two states engaged in an arms race. Both will reason that they have two options, either to increase military expenditure or to make an agreement to reduce weapons. Neither state can be certain that the other one will keep to such an agreement; therefore, they both incline towards military expansion. The irony is that both states seem to act rationally, but the result is completely irrational.


Another interesting example concerns a well-known concept in cycling races, for instance in the Tour de France. Consider two cyclists halfway in a race, with the peloton (larger group) at great distance. The two cyclists often work together (mutual cooperation) by sharing the tough load of the front position, where there is no shelter from the wind. If neither of the cyclists makes an effort to stay ahead, the peloton will soon catch up (mutual defection). An often-seen scenario is one cyclist doing the hard work alone (cooperating), keeping the two ahead of the peloton. In the end, this will likely lead to a victory for the second cyclist (defecting) who has an easy ride in the first rider's slipstream.


William Poundstone, in a book about the PD (see References below), describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for someone to take a paper without paying (defecting) but very few do, recognizing the resultant harm if everybody stole newspapers (mutual defection).


Lastly, the theoretical conclusion of PD is one reason why, in many countries, plea bargaining is forbidden. Often, precisely the PD scenario applies: it is in the interest of both suspects to confess and testify against the other prisoner/suspect, even if each is innocent of the alleged crime. Arguably, the worst case is when only one party is guilty — here, the innocent one is unlikely to confess, while the guilty one is likely to confess and testify against the innocent.


Many real-life dilemmas involve multiple players. Although metaphorical, Hardin's tragedy of the commons may be viewed as an example of a multi-player generalisation of the PD: Each villager makes a choice for personal gain or restraint. The collective reward for unanimous (or even frequent) defection is very low payoffs (representing the destruction of the "commons").


The iterated prisoner's dilemma

In his book The Evolution of Cooperation (1984), Robert Axelrod explored an extension to the classical PD scenario, which he called the iterated prisoner's dilemma (IPD). In this, participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity; initial hostility; capacity for forgiveness; and so forth.


Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, "greedy" strategies tended to do very poorly in the long run while more "altruistic" strategies did better, as judged purely by self-interest. He used this to show a possible mechanism to explain what had previously been a difficult hole in Darwinian theory: how can seemingly altruistic behavior evolve from the purely selfish mechanisms of natural selection?


The best deterministic strategy was found to be "Tit for Tat", which Anatol Rapoport developed and entered into the tournament. It was the simplest of any program entered, containing only four lines of BASIC, and won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, do what your opponent did on the previous move. A slightly better strategy is "Tit for Tat with forgiveness". When your opponent defects, on the next move you sometimes cooperate anyway with small probability (around 1%-5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the lineup of opponents. "Tit for Tat with forgiveness" is best when miscommunication is introduced to the game. That means that sometimes your move is incorrectly reported to your opponent: you cooperate but your opponent hears that you defected.


Tit for Tat was successful, Axelrod argued, for two main reasons. Firstly, it is 'nice': that is, it starts off cooperating and only defects in response to another player's defection, so it is never responsible for initiating a cycle of mutual defections. Secondly, it is provocable, always responding to what the other player does; it punishes another player immediately when they defect, but equally immediately it responds in kind if they start to cooperate again. Such clear, straightforward behaviour means that the other player can easily understand the logic behind Tit for Tat's actions, and can therefore figure out how to work alongside it productively. It is no coincidence, incidentally, that most of the worst-performing strategies in Axelrod's tournament were ones that were not designed to be responsive to the other player's choices; against such a player, the best strategy is simply to defect every time, because you can never be sure of establishing reliable mutual cooperation.


For the iterated PD, it is not always correct to say that any given strategy is the best. For example, consider a population where everyone defects every time, except for a single individual following the Tit-for-Tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being Tit-for-Tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game. Simulations of populations have been done, where individuals with low scores die off, and those with high scores reproduce. The mix of algorithms in the final population generally depends on the mix in the initial population.


Although Tit-for-Tat was long considered to be the most solid basic strategy, a team from Southampton University in England introduced a new strategy at the 20th-anniversary Iterated Prisoner's Dilemma competition, which proved to be more successful than Tit-for-Tat. This strategy relied on cooperation between programs to achieve the highest number of points for a single program. The University submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start. Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result (http://www.prisoners-dilemma.com/results/cec04/ipd_cec04_full_run.html), this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom. Although this strategy is notable in that it proved more effective than Tit-for-Tat, it takes advantage of the fact that multiple entries were allowed in this particular competition. In a competition where one has control of only a single player, Tit-for-Tat is almost certainly a better strategy.


If an iterated PD is going to be iterated exactly N times, for some known constant N, then there is another interesting fact. The Nash equilibrium is to defect every time. That is easily proved by induction. You might as well defect on the last turn, since your opponent will not have a chance to punish you. Therefore, you will both defect on the last turn. Then, you might as well defect on the second-to-last turn, since your opponent will defect on the last no matter what you do. And so on. For cooperation to remain appealing, then, the future must be indeterminate for both players. One solution is to make the total number of turns N random.


Another odd case is "play forever" prisoner's dilemma. The game is repeated infinitely many times, and your score is the average (suitably computed).


The prisoner's dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that transactions between two people requiring trust can be modelled by the PD, cooperative behavior in populations may be modelled by a multi-player, iterated, version of the game. It has, consequently, fascinated many, many scholars over the years. As of 1975, Grofman and Pool estimate the count of scholarly articles devoted to it at over 2000.


Variants

There are also some variants of the game, with subtle but important differences in the payoff matrices, which are listed below.


Chicken

Another important non zero-sum game type is called "Chicken", named after the car racing game. Two cars drive towards each other for an apparent head-on collision - the first to swerve out of the way is "chicken". Both players can swerve to avoid the crash (cooperate) or keep going (defect). In Chicken if your opponent cooperates, you are better off to defect - this is your best possible outcome. If your opponent defects, you are better off to cooperate. Mutual defection is the worst possible outcome (hence an unstable equilibrium), but in the Prisoner's Dilemma the worst possible outcome is cooperating while the other person defects (so both defecting is a stable equilibrium). In both games, "both cooperate" is an unstable equilibrium.


A typical payoff matrix would read:

  • If both players cooperate, each gets +5.
  • If one cooperates and the other defects, the first gets +1 and the other gets +10.
  • If both defect, each gets -20.

Another example often given is that of two farmers who use the same irrigation system for their fields. The system can be adequately maintained by one person, but both farmers gain equal benefit from it. If one farmer does not do his share of maintenance, it is still in the other farmer's interests to do so, because he will be benefiting whatever the other one does. Therefore, if one farmer can establish himself as the dominant defector - ie, if the habit becomes ingrained that the other one does all the maintenance work - he will be likely to continue to do so.


Yet another example appears among animals battling for mates or territory; read "fight" for "defect", and "submit" for "cooperate". As before, the penalty for mutual defection (mutual injury) is worse than the "sucker's payoff" for showing submission, and the result is the same: a stable dominance hierarchy. However, a new feature of this example is that the game matrix isn't symmetrical: if one animal is slightly stronger than another, that one has less to lose from defecting/fighting. In the resulting dominance hierarchy, the stronger will almost certainly end up consistently defecting, and the weaker consistently cooperating, until the perception of relative strength changes and the lower-ranked animal is ready to risk a fight. Dominant animals may occasionally pick a fight to "remind" the submissive ones of their strength and ward off such challenges, but without doing (or risking) serious damage.


Assurance game

An Assurance Game has a similar structure to the prisoner's dilemma, except that the rewards for mutual co-operation are higher than those for defection. A typical pay-off matrix would read:

  • If both players cooperate, each gets +10.
  • If you cooperate and the other player defects, you get +1 and the other player gets +5.
  • If both defect, each gets +3.

The Assurance Game is potentially very stable because it always gives the highest rewards to players who establish a habit of mutual co-operation. However, there is still the problem that the players might not realise that it is in their interests to co-operate. They might, for example, mistakenly believe that they are playing a Prisoner's Dilemma or Chicken game, and arrange their strategies accordingly.


Friend or foe

Friend or Foe is a game show airing currently on the Game Show Network. It is an example of the prisoner's dilemma game tested by real people, but in an artificial setting. On the game show, three pairs of people compete. As each pair is eliminated, they play a game of Prisoner's Dilemma to determine how their winnings are split. If they both cooperate ("Friend"), they share the winnings 50-50. If one cooperates and the other defects ("Foe"), the defector gets all the winnings and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the payoff matrix is slightly different from the standard one given above, as the payouts for the "both defect" and the "I cooperate and opponent defects" cases are identical. This makes the "both defect" a neutral equilibrium, compared with being a stable equilibrium in standard prisoner's dilemma. If you know your opponent is going to vote "Foe", then your choice does not affect your winnings. In a certain sense, "Friend or Foe" is between "Prisoner's Dilemma" and "Chicken".


The payoff matrix is

  • If both players cooperate, each gets +1.
  • If both defect, each gets 0.
  • If you cooperate and the other person defects, you get +0 and he gets +2.

Friend or Foe would be useful for someone who wanted to do a real-life analysis of prisoner's dilemma. Notice that you only get to play once, so all the issues involving repeated playing are not present and a "tit for tat" strategy cannot develop.


In Friend or Foe, each player is allowed to make a statement to convince the other of his friend-ishness before both make the secret decision to cooperate or defect. One possible way to 'beat the system' would be for a player to tell his rival, "I am going to choose foe. If you trust me to split the winnings with you later, choose friend. Otherwise, if you choose foe, we both walk away with nothing." A greedier version of this would be "I am going to choose foe. I am going to give you X%, and I'll take (100-X)% of the total prize package. So, take it or leave it, we both get something or we both get nothing." Now, the trick is to minimize X such that the other contestant will still choose friend. Basically, you have to know the threshold at which the utility he gets from watching you get nothing exceeds the utility he gets from the money he stands to win if he just went along.


This approach has not yet been tried in the game; it's possible that the judges might not allow it.


Business Decisions

An occurrence of the prisoner’s dilemma in real life can be found in business. Two competing firms must decide how many resources to devote to advertisement. The effectiveness of Firm A’s advertising is partially determined by the advertising conducted by Firm B. Likewise, the profit derived from advertising for Firm B is affected by the advertising conducted by Firm A. If both Firm A and Firm B choose to advertise during a given period the advertising cancels out, receipts remain constant, and expenses increase due to the cost of advertising. Both firms would benefit from a reduction in advertising. However, should Firm B choose not to advertise, Firm A could benefit greatly by advertising.


References

  • Axelrod, Robert and Hamilton, William D. (1981). "The Evolution of Cooperation". Science, 211:1390–1396.
  • Axelrod, Robert (1984). The Evolution of Cooperation.
  • Axelrod, Robert (1997). The Complexity of Cooperation. Princeton University Press. ISBN 0691015678.
  • Grofman and Pool (1975). "Bayesian Models for Iterated Prisoner's Dilemma Games". General Systems 20:185–94.
  • Hofstadter, Douglas R. (1985) The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation Ch.29 from Metamagical Themas: questing for the essence of mind and pattern. ISBN 0465045669.
  • Poundstone, William (1992). Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb. Doubleday. ISBN 0385415672. A wide-ranging popular introduction, as the title indicates.
  • Rapoport, Anatol and Chammah, Albert M. (1965). Prisoner's Dilemma. University of Michigan Press. An account of many experiments in which the psychological game Prisoner's Dilemma was played.
  • Verhoeff, Tom (1998). "The Trader's Dilemma: A Continuous Version of the Prisoner's Dilemma". Computing Science Notes 93/02, Faculty of Mathematics and Computing Science, Technische Universiteit Eindhoven, The Netherlands.
  • New Tack Wins Prisoner's Dilemma (http://www.wired.com/news/culture/0,1284,65317,00.html) (from Wired.com)

See also

External links

  • A good introduction (http://www.cdam.lse.ac.uk/Reports/Files/cdam-2001-09.pdf) to game theory with a terse and accurate treatment of the prisoner's dilemma complete with a glossary of defined terms.
  • Play the iterated prisoner's dilemma online (http://www.gametheory.net/Web/PDilemma/)
  • View research centered on the prisoner's dilemma online (http://www.ic.sunysb.edu/Stu/wbraynen/ptft/)
  • See this critic (http://www.mises.org/fullstory.aspx?Id=1404) on economists, including those of the "contractarian" school, who have used certain game theoretic results (e.g. the Prisoner's Dilemma) to justify state intervention to "improve" upon the outcome of autonomous individuals. After all, if individuals can't manage to cooperate on their own, they may need an outside agent to compel an outcome best for everyone.
  • William Thomas (http://www.objectivistcenter.org/objectivism/q-and-a-answer.asp?QuestionID=82) argues that the "prisoner's dilemma" is not the right game to model real life interaction on most issues, but that the iterated prisoner's dilemma is more commonly encountered and realistic.
Topics in game theory
Evolutionarily stable strategy - Mechanism design - No-win - Winner's curse - Zero-sum
Games: Prisoner's dilemma - Chicken - Assurance games - Ultimatum game - Matching pennies ...
Related topics: Mathematics - Economics - Behavioral economics - Evolutionary biology - Evolutionary game theory - Population genetics - Behavioral ecology
[ edit ]

  Results from FactBites:
 
Hobbes and the Prisoner's Dilemma (2490 words)
The situation in Hobbes' state of nature is like the Prisoners' Dilemma: By acting solely on self-interest, you and the other people end up in a situation that is worse for everyone than an alternative possible outcome that you could have reached.
What makes this a prisoner's dilemma is that no matter what B does, A is better off preparing; and whatever A does, B is better off preparing.
In the original prisoners' case in which the prisoners are allowed to confer and make promises, if we imagine that this situation happens again and again, getting busted on, say, drug-possession charges, where the penalties are in days rather than years, criminals will get reputations for keeping promises or violating them.
  More results at FactBites »

 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your location
Your comments
Please enter the 5-letter protection code


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.