homunculus: Why selfishness still doesn’t pay

Here’s my latest news story for Nature.

____________________________________________________________________

A recent finding that undermines conventional thinking on the evolution of cooperation doesn’t, after all, prevent altruistic behaviour from emerging.

For the past several decades, one of the central results of game theory has seemed to be that self-interest can drive social cooperation, because in the long term selfish behaviour hurts you as much as your competitors. Last May, two leading physicists, William Press of the University of Texas and Freeman Dyson of the Institute for Advanced Study in Princeton, argued otherwise. They showed how, in the classic ‘game’ from which cooperation seems to evolve, called the Prisoner’s Dilemma, it’s possible to be successfully selfish [1].

This apparently revolutionary idea has now been challenged by two evolutionary biologists at Michigan State University in East Lansing. In a preprint [2], Christoph Adami and Arend Hintze say that the strategy proposed by Press and Dyson is “evolutionarily unstable”. In a population of agents all seeking the best way to play the Prisoner’s Dilemma, those using the new selfish strategy will eventually be bested by more generous players.

The Prisoner’s Dilemma is a simple ‘game’ that captures the fundamental problem faced by a population of organisms competing for limited resources: the temptation to cheat or freeload. You might do better acting together (cooperating) than alone, but the temptation is to let others put in the effort or face the risks while still sharing in the rewards yourself.

In the Prisoner’s Dilemma as it was formulated by researchers in the 1950s, two prisoners accused of a crime are questioned. If one helps convict the other by testifying against him, he is offered a lighter sentence. But if each testifies against the other, their sentences will be heavier than if both refuse to do so.

In a single ‘round’ of this game, it always makes sense to ‘defect’ – to shop the other guy. That way you’re better off whatever your opponent does. But if the game is played again and again – if you have repeated opportunities to cheat on the other player – you both do better to cooperate. This so-called iterated Prisoner’s Dilemma has been used to show how cooperation could arise in selfish populations: those genetically disposed to cooperate will be more successful than those predisposed to defect.
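The logic of the single round can be checked with a small sketch. The point values below (5, 3, 1, 0) are an assumption: they are the numbers conventionally used in the game-theory literature, not figures given in this story, and higher is better.

```python
# Illustrative payoffs in "points" (higher is better).
# These particular numbers are an assumption, not taken from the article.
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker's payoff

# PAYOFF[(my_move, their_move)] -> my payoff; "C" = cooperate, "D" = defect
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def best_reply(their_move):
    """Return the move that maximises my one-shot payoff against their_move."""
    return max("CD", key=lambda mine: PAYOFF[(mine, their_move)])


for their_move in "CD":
    print(f"opponent plays {their_move}: best reply is {best_reply(their_move)}")

# The dilemma: defecting is always the best reply, yet mutual
# cooperation pays each player more than mutual defection.
print("both cooperate:", PAYOFF[("C", "C")], "each")
print("both defect:   ", PAYOFF[("D", "D")], "each")
```

With these values, defection is the best reply to either move, even though both players would do better if both cooperated.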

But what’s the best way to play the iterated game in a population of individuals using many different strategies? In the 1980s, political scientist Robert Axelrod tried to answer that question by staging computerized tournaments, inviting anyone to submit a strategy and then pitting them all against one another in many one-to-one bouts.

The winner was a very simple strategy called Tit-for-Tat (TfT), which merely copies its opponent’s behaviour from the last round. If the opponent defected in the last round, TfT does so in the current one. Against cooperators, TfT always cooperates; against defectors, it always defects. It is, in effect, ‘tough but fair’. The moral message seemed reassuring: it pays to be nice, but nastiness should be punished.
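A minimal sketch of how TfT behaves in a repeated game is given here. It assumes the conventional point values (5, 3, 1, 0), which are not stated in this story, and it assumes that TfT opens by cooperating, as it did in Axelrod’s tournaments. The function names are purely illustrative.

```python
# (payoff to A, payoff to B) for each pair of moves; values are assumed.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    return "D"


def always_cooperate(opponent_history):
    return "C"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return the total scores (a, b)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only what the other player has done so far.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print("TfT vs always-defect:   ", play(tit_for_tat, always_defect))
print("TfT vs always-cooperate:", play(tit_for_tat, always_cooperate))
print("TfT vs TfT:             ", play(tit_for_tat, tit_for_tat))
```

In this sketch TfT is exploited only in the opening round against a habitual defector, and it settles into steady mutual cooperation with any player willing to cooperate.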

However, in further studies it became clear that TfT might not always dominate in evolutionary games where the most successful strategies are propagated from generation to generation. Slightly more forgiving strategies, which don’t get trapped in cycles of mutual recrimination after a single mistaken defection, can do better in the long run. In fact, there is no single best way to play the game – it depends on your opponents. Nonetheless, the iterated Prisoner’s Dilemma seemed to explain how cooperation between unrelated individuals might evolve: why some animals hunt in packs and why we have altruistic instincts.

Press and Dyson seemed to shatter this cosy picture. They showed that there exists a class of strategies, which for technical reasons they call zero-determinant (ZD) strategies, in which one player can force the other to accept a less-than-equal share of the ‘payoff’ in the Prisoner’s Dilemma. In effect, the victim has to either grit his teeth and accept this unfair division, or punish the other player at a greater cost to himself. This turns the game into a different one, known as an Ultimatum game, in which one player is presented with the choice of either accepting an unequal distribution of the payoff or, if he refuses, both players losing out.

It turns out that TfT is just a special case of these ZD strategies in which the payoffs happen to be equal. Like a TfT player, a ZD player bases his next choice of cooperate/defect on what happened in the last round: it is said to be a ‘memory-one’ strategy. But instead of being rigidly deterministic – this previous outcome dictates that choice – it is probabilistic: the choice to cooperate or defect is made with a certain probability for each of the four possible outcomes of the last round. A judicious choice of these probabilities enables one player to control the payoff that the other receives.
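To make the idea of a probabilistic memory-one strategy concrete, here is a sketch under stated assumptions. The point values (3, 0, 5, 1) and the particular set of probabilities (11/13, 1/2, 7/26, 0) are not given in this story: to the best of my understanding they correspond to an example in Press and Dyson’s paper, in which the ZD player’s surplus over the mutual-defection payoff is held at three times the opponent’s. The long-run scores are estimated by repeatedly stepping the four-state chain of outcomes, which assumes that the chain settles down.

```python
R, S, T, P = 3, 0, 5, 1  # conventional payoff values (an assumption)


def long_run_scores(p, q, steps=5000):
    """Average per-round payoffs (s_x, s_y) when memory-one strategy p meets q.

    p[i] and q[i] are each player's probability of cooperating after the
    outcomes CC, CD, DC, DD, written from that player's own point of view
    (own move first).
    """
    # Y sees X's "CD" as "DC" and vice versa, so swap q's middle entries.
    q_seen_by_x = [q[0], q[2], q[1], q[3]]
    # Transition probabilities between the states CC, CD, DC, DD (X first).
    rows = [
        [px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
        for px, py in zip(p, q_seen_by_x)
    ]
    # Push a starting distribution through the chain until it settles.
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * rows[i][j] for i in range(4)) for j in range(4)]
    s_x = sum(vi * pay for vi, pay in zip(v, (R, S, T, P)))
    s_y = sum(vi * pay for vi, pay in zip(v, (R, T, S, P)))
    return s_x, s_y


# Assumed example of an extortionate ZD strategy (factor of three).
EXTORTIONER = (11 / 13, 1 / 2, 7 / 26, 0.0)

for q in [(1, 1, 1, 1), (0.9, 0.5, 0.3, 0.2), (0.6, 0.1, 0.8, 0.4)]:
    s_x, s_y = long_run_scores(EXTORTIONER, q)
    # The two printed numbers should agree whatever the opponent does.
    print(q, round(s_x - P, 4), round(3 * (s_y - P), 4))
```

Whatever probabilities the opponent chooses in this sketch, the ZD player’s surplus should come out at three times the opponent’s. Replacing `EXTORTIONER` with `(1, 0, 1, 0)`, which is TfT written in the same notation, should make the two long-run scores equal.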

According to William Poundstone, author of the 1992 book Prisoner’s Dilemma, “The Press-Dyson finding directly challenges the two notions at the heart of the Prisoner’s Dilemma – that you can’t fool evolution, and that the most successful strategies are fair strategies.” Nonetheless, Press says that “Freeman’s and my paper has been warmly received by Prisoner’s Dilemma experts. More than one has expressed regret at not having discovered the ZD strategies previously.”

“The paper did indeed cause quite a stir, because the main result appeared to be completely new, despite intense research in this area for the last 30 years”, says Adami. It wasn’t totally new, however – in 1997 game theorists Karl Sigmund of the University of Vienna and Martin Nowak of Harvard University discovered strategies that similarly allow one player to fix the other’s payoff at a specified level [3]. But they admit that “we didn’t know about the vast and fascinating realm of zero-determinant strategies.” The work of Press and Dyson “opens a new facet in the study of trigger strategies and folk theorems for iterated games, and offers a highly stimulating approach for moral philosophers,” they say.

The ZD strategies are not as dispiriting as perhaps they sound, says Press, because they allow a new balance to be found if both players understand the principles. “Once both players understand ZD, then each has the power to set the other’s score, independent of the other’s actions. This allows them to make an enforceable treaty, not possible using simpler strategies.”

In other words, the ZD strategy forces players to reflect, to think ahead, to consider the opponent’s point of view, and not just try to get the highest possible score. It “allows for a whole range of careful, deliberative negotiations”, Press says. “This is a world in which diplomacy trumps conflict.”

But now Adami and Hintze say that this world might not exist – or not for long. They find that, in an evolutionary iterated Prisoner’s Dilemma game in which the prevalence of particular strategies depends on their success, ZD players are soon out-competed by others using more common strategies, and so they will evolve to become non-ZD players themselves. That’s because ZD players suffer from the same problem as habitual defectors: they do badly against their own kind.
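The point about ZD players doing badly against their own kind can be illustrated with a rough sketch. This is not Adami and Hintze’s simulation. It assumes the conventional point values (3, 0, 5, 1), an extortionate ZD strategy with probabilities (11/13, 1/2, 7/26, 0), and, as a stand-in for a ‘more common’ strategy, the well-known win-stay, lose-shift rule (often called Pavlov). All of these are illustrative choices rather than details from this story.

```python
R, S, T, P = 3, 0, 5, 1  # conventional payoff values (an assumption)


def score(p, q, steps=5000):
    """Long-run average payoff to a player using p against a player using q.

    p and q give the probability of cooperating after CC, CD, DC, DD,
    each from that player's own point of view (own move first).
    """
    q_seen = [q[0], q[2], q[1], q[3]]  # the opponent sees CD and DC swapped
    rows = [
        [a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
        for a, b in zip(p, q_seen)
    ]
    v = [0.25] * 4  # start spread evenly over CC, CD, DC, DD
    for _ in range(steps):
        v = [sum(v[i] * rows[i][j] for i in range(4)) for j in range(4)]
    return sum(vi * pay for vi, pay in zip(v, (R, S, T, P)))


ZD = (11 / 13, 1 / 2, 7 / 26, 0.0)  # assumed extortionate example
PAVLOV = (1.0, 0.0, 0.0, 1.0)       # win-stay, lose-shift (illustrative)


def fitness(zd_share):
    """Average payoffs (zd, pavlov) in a well-mixed population."""
    zd = zd_share * score(ZD, ZD) + (1 - zd_share) * score(ZD, PAVLOV)
    pavlov = zd_share * score(PAVLOV, ZD) + (1 - zd_share) * score(PAVLOV, PAVLOV)
    return zd, pavlov


print("ZD vs ZD:        ", round(score(ZD, ZD), 3))
print("ZD vs Pavlov:    ", round(score(ZD, PAVLOV), 3))
print("Pavlov vs ZD:    ", round(score(PAVLOV, ZD), 3))
print("Pavlov vs Pavlov:", round(score(PAVLOV, PAVLOV), 3))
for share in (0.1, 0.5, 0.9):
    print("ZD share", share, "-> fitness (ZD, Pavlov):", fitness(share))
```

In this toy setting the ZD player wins each head-to-head encounter with Pavlov, but two ZD players drag each other down to mutual defection while two Pavlov players cooperate. Averaged over a mixed population, that is enough to leave the ZD players worse off at each of the mixes tried here.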

There’s one exception: ZD players can persist if they can figure out whether they are playing another ZD player or not. Then they can exploit the advantages of ZD strategies against non-ZD players, but will switch to a more advantageous non-ZD strategy when faced with their own kind.

That in turn means, however, that non-ZD players could gain the upper hand by using strategies that look like ZD but are not, thus fooling ZD players into abandoning their extortionate strategy. This could lead to the same kind of ‘arms race’ seen in some kinds of biological mimicry, where a harmless species evolves to look like a harmful one, while the harmful one tries to evolve away from its imitator.

Sigmund and Nowak, with colleague Christian Hilbe, have also shown in work not yet published that the ZD strategy is evolutionarily unstable, but can pave the way for the emergence of cooperators from a more selfish community. “ZD strategies do not establish a strong foothold in the population”, says Sigmund.

Game theorist and economist Samuel Bowles at the Santa Fe Institute in New Mexico feels that these results diminish the interest of the ZD strategies. “The question of their evolutionary stability is critical, and the paper makes their limitations clear. Because they are not evolutionarily stable, I’d call them merely a curiosity of little interest to evolutionary biology or any of the other biological sciences.”

Adami is not so sure that they won’t be found in the wild. “We don’t usually have nearly enough information about animal decisions”, he says. “But in my experience, anything that is imaginable has probably evolved somewhere, sometime. To gather conclusive evidence about it is a whole different matter.”

Could they be found in human society too? “It’s not inconceivable”, says Adami, “but we have to keep in mind that humans very rarely make decisions based only on their and their opponent's last move. It is much more likely that this type of strategy is in use in automated trading programs, such as those involved in high-frequency trading of stocks and commodities. However, because these programs are usually secret, we wouldn't know about it.”

References

1. Press, W. H. & Dyson, F. J. Proc. Natl Acad. Sci. USA 10.1073/pnas.1206569109 (2012).
2. Adami, C. & Hintze, A. preprint http://www.arxiv.org/abs/1208.2666 (2012).
3. Nowak, M. A., Boerlijst, M. C. & Sigmund, K. Am. Math. Monthly 104, 303-307 (1997).

Friday, August 24, 2012
