Submitted by Eric Peters, CIO of One River Asset Management
Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain. Each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The offer is:
If A and B each betray the other, each serves 2yrs in prison. If A betrays B but B remains silent, A is set free and B serves 3yrs in prison (and vice versa). If A and B both remain silent, both serve only 1yr in prison (on the lesser charge).

If two players play the prisoner’s dilemma more than once in succession, remember their opponent’s previous actions, and adjust their strategy accordingly, the game is called the iterated prisoner’s dilemma. The iterated prisoner’s dilemma is fundamental to some theories of human cooperation and trust. On the assumption that the game can model transactions between two people that require trust, cooperative behavior in populations may be modeled by a multi-player, iterated version of the game. I will explore this in today’s Anecdote (below).
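The prosecutors’ offer can be sketched as a small payoff table. This is only an illustration of the sentences described above; the names `PAYOFFS` and `sentences` are my own, not anything from the original game-theory literature.

```python
# Payoff structure of the one-shot prisoner's dilemma described above.
# Each entry maps (A's move, B's move) to (A's sentence, B's sentence) in years.
PAYOFFS = {
    ("betray", "betray"): (2, 2),   # mutual betrayal: 2yrs each
    ("betray", "silent"): (0, 3),   # A goes free, B serves 3yrs
    ("silent", "betray"): (3, 0),   # and vice versa
    ("silent", "silent"): (1, 1),   # mutual silence: 1yr each on the lesser charge
}

def sentences(move_a: str, move_b: str) -> tuple:
    """Return (years for A, years for B) for one round."""
    return PAYOFFS[(move_a, move_b)]
```

Note the dilemma in the numbers: whatever B does, A serves less time by betraying, yet mutual betrayal (2yrs each) is worse for both than mutual silence (1yr each).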
In the iterated version, the classic game is played repeatedly between the same prisoners, who continuously have the opportunity to penalize each other for previous decisions. If the number of rounds is known to the players, then (by backward induction) two classically rational players will betray each other repeatedly, for the same reasons as in the single-shot variant. But in a game of infinite or unknown length there is no fixed optimal strategy, and prisoner’s dilemma tournaments have been held to pit algorithms against one another in such cases, in an attempt to determine the best strategies.
In such tournaments, when these encounters are repeated over a long period of time with many players, each with different strategies, greedy strategies tend to do very poorly in the long run while more altruistic strategies do better, as judged purely by self-interest. The winning strategy in these tournaments was “tit for tat”: cooperate on the first iteration of the game; after that, do whatever the opponent did on the previous move. Depending on the situation, a slightly better strategy can be “tit for tat with forgiveness”: when the opponent defects, the player sometimes cooperates anyway on the next move. This allows for occasional recovery from getting trapped in a cycle of mutual defections.
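The strategies above can be sketched in a few lines. This is a minimal illustration, not tournament-grade code: the function names, the 0.1 forgiveness probability, and the use of the conventional tournament payoffs (5 for tempting defection, 3 for mutual cooperation, 1 for mutual defection, 0 for the sucker) are all my own assumptions, not taken from the original text.

```python
import random

def tit_for_tat(history_self, history_opp):
    # Cooperate first; thereafter copy the opponent's last move.
    return "C" if not history_opp else history_opp[-1]

def tit_for_tat_forgiving(history_self, history_opp):
    # Like tit for tat, but after an opponent defection,
    # cooperate anyway 10% of the time (hypothetical rate).
    if not history_opp:
        return "C"
    if history_opp[-1] == "D" and random.random() < 0.1:
        return "C"
    return history_opp[-1]

def always_defect(history_self, history_opp):
    # A greedy strategy, for comparison.
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    """Iterate the game, returning each player's total score (higher is better)."""
    # Conventional tournament payoffs: (my points, opponent's points).
    payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    ha, hb = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(ha, hb)
        b = strategy_b(hb, ha)
        pa, pb = payoff[(a, b)]
        score_a += pa
        score_b += pb
        ha.append(a)
        hb.append(b)
    return score_a, score_b
```

Over ten rounds, two tit-for-tat players cooperate throughout and score 30 each, while a pure defector facing tit for tat “wins” head-to-head (14 to 9) but falls far short of the cooperative total — which is exactly why greedy strategies fare poorly when scores are summed across a round-robin tournament.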
Almost all top-scoring strategies are “optimistic” (they do not defect before their opponent does); therefore, even a purely selfish strategy will not “cheat” on its opponent, for purely self-interested reasons. However, the successful strategy must not be a...