Thursday, January 08, 2015

Computer becomes "unbeatable" at poker

Here's a longer version of my story on Nature News on the new poker-playing computer program.

_________________________________________________________________

A computer algorithm has perfected the art of playing a popular version of the gambling card game

A new computer algorithm can play poker, in one of its most popular variants, essentially perfectly. Its creators say that it is virtually “incapable of losing against any opponent in a fair game.”

This is a step beyond a computer program that can beat top human players, as Deep Blue famously did against Garry Kasparov in chess in 1997. The poker program devised by Michael Bowling and colleagues at the University of Alberta in Edmonton, Canada, along with Finnish software developer Oskari Tammelin, plays perfectly, to all intents and purposes. This means that this particular variant of poker, called Heads-up Limit Hold’em (HULHE), can be considered “solved”. The algorithm is described in a paper in Science [1].

The strategy they have computed, says computer poker researcher Eric Jackson in Menlo Park, California, is so close to perfect “as to render pointless further work on this game.”

“I think that it will come as a surprise to experts that a game this big has been solved”, says Jackson. “I follow the work on computer poker closely and I did not expect that heads-up limit was going to be solved this soon.”

A few other popular games have been solved before, including checkers, which a team from the same computer science department at Alberta (including Neil Burch, coauthor of the new study) cracked in 2007 [2].

But poker is harder to solve than checkers. Chess and checkers are examples of perfect-information games, where players can have perfect knowledge of all past events in a game. In poker, there are some things a player doesn’t know: most crucially, which cards the other player has been dealt.

Devising a strategy to deal perfectly with that uncertainty is very hard. This hidden information is what gives a poker player the opportunity to bluff: to face the other player down with a relatively weak hand.

But although bluffing looks like a very human, psychological element of the game, it’s not. You can calculate how to bluff optimally. “Bluffing falls out of the mathematics of the game”, says Bowling. If you’re dealt a jack, say, it is possible to figure out how often you should ideally bluff with it. Some of the early pioneers of game theory, such as John von Neumann, aimed to develop mathematical strategies for bluffing.

The real challenge for a poker algorithm is dealing with the immense number of possible ways the game can be played. Bowling and colleagues have looked at one of the most popular forms, called Texas Hold’em, in which the dealer deals two cards to each player (face down) and also a set of face-up “community cards”. Players can bet, raise or fold after each deal. With just two players, the game becomes Heads-up, and it is a “limit” game when it has fixed bet sizes and a fixed number of raises. There are 3.16×10^17 states that HULHE can reach, and 3.19×10^14 possible points where a player must make a decision.

The new algorithm involves calculating all possible decisions in advance, so that they can just be looked up as a game proceeds. This is done in a learning process: the strategy begins by making decisions randomly, and is then updated by experience as the algorithm attaches a “regret” value to each decision depending on how poorly it fared. It takes a little more than 1500 training rounds to make the program essentially invincible.

This “counterfactual regret minimization” (CFR) procedure has been widely adopted in the Annual Computer Poker Competition, which has run since 2006. But Bowling and colleagues have now improved the procedure by allowing it to re-evaluate decisions considered to be poor in earlier training rounds.

The other crucial innovation was the handling of the vast amounts of information needing to be stored to develop and use the strategy – of the order of 262 terabytes. This volume of data demands disk storage, which is slow to access. The researchers figured out a data-compression method that reduces the volume to a more manageable 11 TB, and which adds only 5% to the computation time from the use of disk storage.

“I think the counterfactual regret algorithm is the major advance”, says computer scientist Jonathan Shapiro of the University of Manchester. “But they have done several other very clever things to make this problem computational feasible.”

Although the new algorithm plays “perfectly”, it is not necessarily unbeatable, since there is a large chance element in poker: it depends on the hand you’re dealt. The algorithm “may in fact lose after any finite number of hands, if it were unlucky”, says Bowling. But it always wins in the long run.

What’s more, there are two other limitations. It only “weakly” solves HULHE, meaning that it plays perfectly only if the strategy has been played throughout, and not if the computer player is suddenly dropped into the middle of a game (a situation that is in fact not really clearly defined for poker, as it is say for checkers).

And it is only what the researchers call “essentially solved”, meaning that it is not strictly unbeatable: there is an extremely small margin by which, in theory, it might be beaten by skill rather than chance. But this margin is negligible in practice. “Even if a human could identify the perfect counter-strategy to exploit our solution”, says Bowling, “and even if they could play this counter-strategy without error, and even if they spent 70 years only playing poker (over 60 million hands), they still couldn’t be statistically confident that they are winning [by superior play rather than chance].”

There has been heated debate about the use of computers (poker bots) for online poker games. “For quite a while now there has been a struggle between people who write bots and try to get them to play surreptitiously on online sites, and the sites who try to detect them and ban them”, says Jackson. But Bowling says that their algorithm will have little direct effect on this, because the popularity of online HULHE has declined as human players got better. “While it may dry up the last vestige of online HULHE play just due to the perception that humans can query a perfect strategy, this impact is going to minimal”, he says. But it may respark a philosophical discussion about bots in online poker.”

“I definitely think the ideas in this paper will be fruitfully applied to other forms of poker, such as no-limit”, says Jackson. “And more generally to other games, whether with dice or cards or whatever, that have imperfect information.”

But the stakes might be higher still for imperfect-information “games” beyond mere play and gambling. The stock market seems an obvious candidate, but Bowling explains that there the unknowns are too great. “The deck of cards in the stock market, which define the distribution at chance events, is not common knowledge and not even really knowable.”

He says that it might be useful, however, for portfolio management. “We’ve been investigating robust decision-making where the goal is to optimize a particular risk measure (such as value-at-risk)”, he says. “Such robust decision-making scenarios can often be cast as a game having an almost identical form to some of our poker games, with our solution techniques be immediately applicable.” The team’s current explorations beyond poker are, however, focused on supporting medical decision-making, in collaboration with diabetes specialists.

1. Bowling, M., Burch, N., Johanson, M. & Tammelin, O. Science 347, 145-149 (2015).
2. Schaeffer, J. et al., Science 317, 1518-1522 (2007).

229 comments:

«Oldest   ‹Older   401 – 229 of 229
«Oldest ‹Older   401 – 229 of 229   Newer› Newest»