One of my all-time favorite games is Hanabi, a cooperative game where you hold your cards facing outwards, so other players can see them but you can’t. Hanabi is a great game for two people to play over and over again and try to master. My wife and I have played for years, and after every round we discuss our play and try to improve our strategy. It was only after playing many times, and getting deep into the game, that we noticed a central ambiguity in the rules. Your goal in the game is to play out cards, in sequence, onto 5 stacks. When the game ends, your score is the value of the top card of each stack. If you have a perfect game, and successfully play out all the cards, your score is 25 points.
Seems pretty straightforward, right? But one night, while discussing our strategy, we realized there was a big difference between playing in such a way as to maximize our score, and playing in a way to maximize our chances of getting a perfect score of 25, and that we had to pick between them.
Sometimes you are faced with a choice like this:
Move 1: results in a 75% chance of scoring 24 and a 25% chance of scoring 25.
Move 2: results in a 75% chance of scoring 25 and a 25% chance of scoring 20.
In terms of expected value, that means move 1 is worth 24.25 points, and move 2 is worth 23.75. So we should go with move 1, right? But, wait, is it obvious that the goal of the game is to maximize points? Getting 25 is a perfect score. Isn’t that what we should be trying to do? Almost by definition?
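The trade-off can be made concrete in a few lines of Python, using the hypothetical probabilities from the example above:

```python
# Compare the two moves under two different objectives:
# expected score vs. probability of a perfect 25.
# The probabilities are the illustrative ones from the example above.

moves = {
    "Move 1": [(0.75, 24), (0.25, 25)],
    "Move 2": [(0.75, 25), (0.25, 20)],
}

for name, outcomes in moves.items():
    expected = sum(p * score for p, score in outcomes)
    p_perfect = sum(p for p, score in outcomes if score == 25)
    print(f"{name}: expected score = {expected}, P(perfect) = {p_perfect}")

# Move 1: expected score = 24.25, P(perfect) = 0.25
# Move 2: expected score = 23.75, P(perfect) = 0.75
```

Each objective picks a different move, and nothing inside the numbers tells you which objective is the right one.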
One of the ways ambiguities like these are often resolved is in metagame structures like tournaments. You can imagine a Hanabi tournament whose structure made it perfectly clear which move to make, by giving (or not giving) a special value to perfect scores. The clarity of the tournament’s rules would function as a protective barrier, sealing up the tiny leak this small ambiguity had revealed in the game’s core ruleset. (Of course we know tournaments themselves can sometimes suffer the same kind of problem.)
But here there was no tournament, just the two of us, sitting around the dining room table, trying to figure it out. It wasn’t clear which of these two ways of thinking about the game’s ultimate goal was intended by Antoine Bauza, the game’s designer. Both led to interesting choices, neither was clearly “broken”. Eventually, we decided based on what seemed most fun to us.
There’s something ironic about encountering this kind of ambiguity in a game. Because, typically, games work by creating tiny worlds that, unlike the real one, are precisely defined and clearly delineated. The real world is filled with situations where it’s very hard to know what to do. It’s hard to know what to do in Chess, too, but this second kind of hardness is completely different. Unlike the wicked problems of the real world, the problem of Chess is specially designed so that we can get our hands around it: it’s colossally big, but well-defined and neatly tractable.
Of course games are full of ambiguities in terms of how we think about them, how we interpret them, all of our messy and convoluted interactions with their representational, symbolic, social, and cultural meanings. What’s surprising is encountering ambiguity in a game’s abstract, formal system, like the part where it says what you’re actually trying to do. To look closely at what seemed to be a simple, clearly-defined outline and see it dissolve, on close inspection, into a permeable blur.
The concept of Donkeyspace is a way of talking about this kind of ambiguity, when it’s not totally clear, in a game, which problem you’re actually trying to solve. Donkeyspace occurs when players intentionally choose moves that they know are sub-optimal according to one way of interpreting the game’s goal, but optimal according to another. In Poker, where the term originates, it highlights the difference between game-theoretically optimal (GTO) strategy and a different strategy we might call profit maximization. A GTO strategy is one that no other strategy outperforms. No matter what strategy you go up against, if you are playing GTO, no one can have an edge against you. This is called being “unexploitable”. Intuitively, you could think of GTO as the solution to Poker: once you’ve found it, you know there’s no strategy that can beat it, so you’re done, right?
Except, what if you find yourself playing against Push-Bot, a simplistic strategy that goes all-in every hand? Yes, GTO will have an edge against Push-Bot, it will make money, but it won’t make as much money as it could. To make the most money while playing against Push-Bot, you should deviate from GTO in order to capitalize on Push-Bot’s mistakes. In addition to making more money than GTO (if done correctly), this “pluperfect” strategy is also vastly more challenging to execute. Not only do you need to know GTO perfect play, you also need to be able to identify exploitable opponents when you encounter them, and deviate from GTO in just the right direction to exploit them. And you have to do this while navigating a population of strategies that ranges from fully exploitable to deviously deceptive: counter-predators who appear to be exploitable but are secretly trying to trick you into shifting too far from GTO, then punish your over-adjustment before you’ve had a chance to retreat back to safety. This is the jungle of Donkeyspace.
Another example from Poker is the question of game selection. Seeking out games where you have a big edge against weaker players is one of the most important aspects of maximizing your profit. On the other hand, the best players often seek out the toughest opponents they can find. For them, gaining an edge against the best players in the world is the ultimate goal of the game.
So what, then, is the actual goal of Poker? Is it perfect play? Profit maximization? Maximizing your rate of learning? Beating the best competition you can find? Winning bracelets? This question can’t be resolved by reference to the game’s rules, or to any book on strategy. There is a sense in which it cannot be resolved at all.
Something similar happened during the famous 2016 match between AlphaGo and Lee Sedol. While AlphaGo played in a superhuman way that the professional commentators recognized as sublime, it also made several moves that seemed oddly mediocre, passing up alternatives that looked clearly better. Why was AlphaGo making these apparently “slack” moves? Eventually, this mystery was (sort of) resolved. The overlooked moves seemed obviously better to the human commentators because they would have increased AlphaGo’s score. But AlphaGo didn’t care about absolute score; it only cared about its chance of winning. For us, it’s only natural to use the margin between our score and our opponent’s score as a kind of proxy for our chance of winning. It’s better to be winning by 50 points than by 1 point because the bigger margin protects us against the variance in our noisy, imperfect predictions about how the game will unfold.
But AlphaGo didn’t evolve to think the way we do. It was programmed to maximize the chance of winning, not win by the largest margin. And these so-called “slack” moves were moves that did just that.
Was AlphaGo right? Is this the “correct” way to play Go? I think the answer is simply yes. The goal of Go is clearly to win the game, not to “win by a lot”. We’ve just been using the “win by a lot” heuristic because it’s so hard for us to calculate the actual effects of our moves to the degree of precision that AlphaGo can.
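The divergence between the two objectives can be sketched with a toy calculation. The margins and probabilities here are invented for illustration, not taken from any actual Go position:

```python
# A toy illustration of "maximize win probability" vs. "maximize margin".
# Each move leads to a hypothetical distribution over final score margins,
# where a positive margin means a win. All numbers are made up.

safe_move   = [(0.99, 1.5), (0.01, -0.5)]   # nearly certain win, tiny margin
greedy_move = [(0.80, 20.0), (0.20, -5.0)]  # big margin, but riskier

def expected_margin(dist):
    return sum(p * m for p, m in dist)

def win_probability(dist):
    return sum(p for p, m in dist if m > 0)

print(expected_margin(safe_move), win_probability(safe_move))     # 1.48  0.99
print(expected_margin(greedy_move), win_probability(greedy_move)) # 15.0  0.8
```

A margin-maximizer takes the greedy move; an AlphaGo-style win-probability maximizer takes the safe one, and its play looks “slack” to anyone watching the score.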
And how long have we been making this same mistake in real life? For how long have we hungered to crush our enemies, to see them driven before us, to hear the lamentations of their women? Has this desire for domination, for the total subjection of our opponents, been, all this time, a crude heuristic? A way of compensating for our inability to correctly calculate the truly optimal moves, moves which sometimes lead to victories whose margins are smaller, but with greater certainty? In other words, are we overlooking the best way of maximizing the chance of getting the thing we want, because we have mistaken this barbaric proxy for the actual thing we want, like idiots?
I believe this was the first great lesson we could have learned from AI. And, for the most part, it has gone unnoticed. It’s not that AlphaGo is wise; AlphaGo is stupid. It’s just that, somehow, miraculously, we are stupider. At least, unlike AlphaGo, we know when we’ve made a mistake. And we can decide which game we want to play.
Always a fascinating topic! I recently was having a very similar discussion with my wife about Wordle. Let's say I've got it narrowed down to three words. For my next guess I can try one of those, but if it's not right it doesn't help narrow it down at all. Or I can guess a word I know is wrong, but guarantees that my next guess will be right. Which do you go with? And if you've only got two guesses left does that change your answer?
I think this feeds into much of the same 'what do you consider victory' question.
Another example is semi-cooperative games, where 'all players lose' is a possible outcome. Some players will rate the outcomes (from best to worst) as:
> I win
> Someone else wins
> Everyone loses
But others will rank them as:
> I win
> Everyone loses
> Someone else wins
If the players don't all agree on these rankings, the game will break down. As a designer, you can hint at which ranking you intend through theme or setting, or just say so explicitly. But there are no guarantees.
Thank you for this. It reminds me of the Ursula K. Le Guin line, "How you play is what you win."