So, Sam Bankman-Fried was found guilty on all counts. In addition to wrapping up that particular subplot in the unfolding weirdness of the present moment, this verdict seems like it should serve as some kind of object lesson. But about what? What are we supposed to learn here?
I think the answer to that question has something to do with utilitarianism, the ethical framework first worked out thoroughly in the late 18th century, whose fundamental principle is that the morality of actions should be judged by the outcomes that follow from them - that we should choose the actions that lead to the greatest total good. And, as usual, I think games are involved.
Utilitarianism is foundational to the thinking of Peter Singer, the philosopher whose work was highly influential in the creation of the effective altruism movement. As we all know, EA played an important role in inspiring and, in his own mind at least, justifying SBF’s crime spree. It was an early encounter with EA founder Will MacAskill that sent SBF down the path of earning to give. And it was, plausibly, a genuine belief that his actions were optimizing for the greatest good that made SBF ignore convention, common sense, common decency, and the rule of law in pursuit of risky value-maximizing behavior that was almost guaranteed to end in disaster.
One common take on SBF is that he was only pretending to care about this stuff, that he was a sociopath who used effective altruism as a cover to do crimes for regular reasons, but I’m not so sure. I think it’s likely he was honestly doing what he thought would maximize the overall good for everyone, that, if crypto hadn’t tanked, and FTX had survived his shady manipulations and gone on to be worth trillions, he would have given most of his fortune away. It seems to me that, ironically, SBF would almost certainly have caused less harm if he had been a regular old greedy capitalist, just trying to amass a personal fortune, rather than a starry-eyed true believer, on a mission to save the world.
Perhaps there is something fundamentally wrong with utilitarianism? Well, there is certainly something puzzling about it. On the one hand it seems obviously, almost tautologically, correct. After all, any of the ethical frameworks you might suggest in its place seem, ultimately, to come back to some version of utilitarianism as their fundamental justification. If, for example, you prefer some kind of virtue-based, rules-based, or rights-based approach, presumably you do so because you think we would be better off using such a system. It’s hard to imagine someone arguing for a system that they think would make the world worse off overall. As a result, all of these other systems seem more like different flavors of utilitarianism rather than true alternatives.
On the other hand, utilitarianism seems to lead to all kinds of confusing paradoxes - utility monsters, repugnant conclusions, Pascal’s muggings, St. Petersburg games - some of which are directly related to SBF’s destructive behavior. Here, for example, is an excerpt from an interview SBF did with Tyler Cowen in 2022, in which he eagerly bites the St. Petersburg bullet of maximizing expected utility, no matter the risk of ruin:
TC: Should a Benthamite be risk-neutral with regard to social welfare?
SBF: Yes, that I feel very strongly about.
TC: Ok, but let’s say there’s a game: 51% [chance] you double the earth out somewhere else, 49% it all disappears. And would you keep on playing that game, double or nothing?
SBF: Yeah…take the pure hypothetical… yeah.
TC: So then you keep on playing the game. What’s the chance we’re left with anything? Don’t I just St. Petersburg Paradox you into non-existence?
SBF: No, not necessarily – maybe [we’re] St. Petersburg paradox-ed into an enormously valuable existence. That’s the other option.
The way that something like the St. Petersburg game works is that an enormous amount of value is packed into an infinitesimal slice of the probability distribution. In the vast majority of cases, you end up with nothing, but once in a blue moon, you end up with an unfathomably large payoff. As long as the math works out, the principle of maximizing expected value tells you to keep doubling down, no matter how unlikely you are to win. Typically, there are pragmatic considerations that offset this dynamic. For example, if the payoff is in dollars, the numbers are capped by the finite number of dollars that can realistically exist. And, regardless of the units in which the payoff is measured, for most people there are diminishing returns - once they’ve accumulated a certain amount, each additional unit of value is worth less, in pure utility, than the previous one. SBF, however, thought these considerations didn’t apply in his case. After all, he wasn’t going to use the money to buy champagne and yachts, he was going to save humanity, and the entire future human light cone can absorb a lot of value.
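To make the dynamic concrete, here is a rough sketch in code (my own back-of-the-envelope numbers, not anything from the interview): each 51/49 double-or-nothing round has an expected value of 1.02 times your stake, so a pure expected-value maximizer always wants to play again, even as the probability of surviving the sequence collapses toward zero.

```python
# Sketch of the 51/49 double-or-nothing game from the Cowen interview.
# Expected value per round: 0.51 * 2 + 0.49 * 0 = 1.02, so an
# expected-value maximizer always wants to play one more round.

def expected_value(rounds: int) -> float:
    """Expected multiple of the starting stake after n rounds."""
    return 1.02 ** rounds

def survival_probability(rounds: int) -> float:
    """Probability that anything at all is left after n rounds."""
    return 0.51 ** rounds

for n in (1, 10, 50, 100):
    print(f"after {n:3d} rounds: EV = {expected_value(n):8.2f}x, "
          f"P(not ruined) = {survival_probability(n):.2e}")

# After 100 rounds the expected value is about 7.2x the starting stake,
# while the chance of having anything left is roughly 6 in 10^30.
```

The expected value keeps growing; the thing it is the expected value of almost certainly no longer exists.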
I think it’s important, when considering problems like this, not to rush to a conclusion. It feels intuitively like SBF is making an obvious mistake. But, when you stop and think, it’s harder than expected to pinpoint exactly what the error is. This is true of effective altruism in general. Who can doubt that paying careful attention to the positive benefits of your charitable contributions is better than just YOLO-ing money at things that sound good, feel nice, are established traditions, or have some other shallow features that make them seem appealing? And yet, with its current focus on long-termism and the existential risk of future AI, for many people, EA looks like a runaway train that blew through Crazy Town three stops ago.
Again, when we disagree with some EA position, it feels like there’s a good reason that would be easy to articulate if we had to. They haven’t discounted sufficiently to account for the uncertainty of long-term predictions, they overestimate the accuracy of their models, they’ve become fanatics, they’ve been infiltrated by bad actors, they are blind to the political context of their projects, etc…
Except, if you spend any time reading and listening to the EA leadership and community, it’s clear that they are smart, thoughtful, earnest people who are aware of all of these criticisms (and plenty more we haven’t thought of), take them seriously, and do their best to rigorously and sensibly address them. The fact that, despite this, they hitched their wagon to the crypto Bernie Madoff and are leading the crusade to implement a Butlerian Jihad is evidence of the scope and complexity of the problem, not proof of some obvious flaw in their reasoning. There’s no simple, glaring, conceptual mistake that’s easy to point to.
Perhaps the trouble with utilitarianism is not its fundamental principle, which, after all, seems eminently reasonable, but in the way the overall framework operates as a complete system. And maybe there’s a problem with big, overall systemic frameworks in general.
While examples of the “do that which will lead to the best outcome” principle can be found in various forms throughout the history of philosophy, utilitarianism as a complete system is an artifact of the Enlightenment, and is deeply connected to Enlightenment principles of rationality - to the idea that one shouldn’t just blindly accept tradition, convention, or the pronouncements of authority, but should attempt to work things out using logic, good sense, and, when possible, math.
(It’s hard to overstate my enthusiasm for these principles, by the way. While I fully recognize the complicated and problematic aspects of the Enlightenment as a historical project, it is a project I believe in and am committed to. Its problems are my problems and I want to do what I can to work on them.)
Rationality, as a complete systemic framework, and as an identity, is subject to the same kinds of dilemmas as those outlined above for utilitarianism - occupying some strange superposition of obviously true and obviously broken.
I’m a big fan of David Chapman’s writings on this topic. He sees the problem, in a nutshell, like this - rationality is a miraculously powerful, world-changing conceptual tool. But it isn’t, and can never be, an eternal, global, universal framework. Treating it as such prevents us from understanding how it actually works and gets us into all kinds of trouble. “Doing” rationality on a problem requires us to prepare the problem in a specific way, to divide it up and organize it into the kind of measurable, quantitative units upon which the operations of mathematics and logic can be applied. But the world doesn’t naturally exist in such a state. Deciding when, and how, to go into this mode is a hard problem and one that can’t itself be solved by rationality, for obvious reasons.
There is a sense in which rationality has dismantled, or is in the process of dismantling, the eternal, global, universal frameworks on which we used to depend - superstition, myth, tradition, religion. It’s only natural to expect rationality to step into the resulting vacuum, pick up the fallen crown, and assume the throne. But what if it doesn’t? What if it can’t? This is the muddy pond into which utilitarians, rationalists, and effective altruists have stumbled. I’d like to rescue them but I’m wearing these expensive shoes.
Common sense and common decency were in short supply at the FTX Bahamas HQ. Arguably, they are in short supply everywhere. “Common” in this context doesn’t mean ubiquitous. It means non-fancy, rough-and-ready, constructed, ad-hoc, from local materials. It also means collective, distributed, shared, as in the technical, philosophical use of the term “common knowledge”, which is a necessary ingredient for things like stop signs to work. It’s not enough that everyone knows what they mean, everyone needs to know that everyone knows. How could a stupid metal sign produce the optimal strategy for directing traffic? It can’t. But norms, even arbitrary ones, are what allow us to safely co-exist, and you can see why people might zealously defend them, even the inefficient ones, even the bad ones.
What does this have to do with games? Well, allegedly, SBF played a lot of them, and always had one going during meetings and interviews to occupy his hyperactive attention. Games appeal to problem-solvers, to optimizers, and to independent thinkers who are skeptical of received wisdom, who want to figure things out for themselves. The gamer mindset, like the hacker mindset, can be stubbornly literal, seeing through social conventions, abstractions, and shared illusions to the underlying mechanisms in their brute material reality, mechanisms which can be manipulated for instrumental purposes.
And games - toy worlds in which premises, goals, and outcomes are precisely and explicitly defined - are ideally suited for exactly this kind of instrumental manipulation. I first encountered the concept of expected value in the game of poker, where it functions perfectly well and leads to no known paradoxes. The roots of probability theory itself are in the study of games of this kind.
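Here is a minimal sketch of what that looks like inside a game (the hand and the numbers are hypothetical, purely for illustration): because the pot, the price of a call, and the odds of hitting your draw are all fixed by the rules, expected value reduces to simple arithmetic.

```python
# Hypothetical pot-odds calculation for a flush draw facing a bet,
# ignoring future betting for simplicity.

def call_ev(win_probability: float, pot: float, cost_to_call: float) -> float:
    """Expected value of calling: win the pot with probability p,
    lose the cost of the call with probability (1 - p)."""
    return win_probability * pot - (1 - win_probability) * cost_to_call

# Roughly 9 outs from 46 unseen cards on the river, about a 0.2 chance.
ev = call_ev(win_probability=0.2, pot=100.0, cost_to_call=20.0)
print(f"EV of calling: {ev:+.2f}")  # positive, so the call is "correct"
```

The calculation is clean precisely because everything in it - the outs, the pot, the price - is defined by the rules of the game.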
In his book The Black Swan, Nassim Taleb introduced the concept of the ludic fallacy, the illusion that we can accurately model the unruly and infinitely surprising behavior of the real world using the logic that governs the well-defined formal systems of games. We fall prey to this fallacy when we treat our deep uncertainty about the universe as if it were equivalent to the tame and tidy uncertainty generated by dice and cards. It is useful for those of us who love games, and who find them to be a valuable source of ideas and insights, to keep this warning in mind. Concepts like expected value, which work so brilliantly inside of games, are of limited and sporadic use beyond them. Games are different from the world. But this is not a flaw, it is the primary source of their power.
Jeremy Bentham, one of the founders of utilitarianism, thought of moral value in simple terms of pleasure and pain. Famously, Bentham used the trivial kids’ game of push-pin to illustrate this belief: “Prejudice apart, the game of push-pin is of equal value with the arts and sciences of music and poetry. If the game of push-pin furnish more pleasure, it is more valuable than either.”
150 years later, the anthropologist Clifford Geertz used a different Bentham quote as the centerpiece of his brilliant essay Deep Play: Notes on the Balinese Cockfight. Bentham had introduced the concept of deep play in a footnote about the irrationality of gambling with a large fraction of your bankroll. If you bet everything you’ve got, the pleasure you experience if you win is less than the pain you will suffer if you lose. Therefore, for Bentham, deep play, this kind of high-stakes gambling, is irrational, and should be discouraged, or even forbidden.
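Bentham’s footnote amounts to an argument about diminishing returns, and you can restate it as a toy calculation (the square-root utility function here is just an illustrative stand-in for any diminishing-returns curve, not anything Bentham wrote down):

```python
import math

# Toy restatement of Bentham's deep-play argument with a concave
# (diminishing-returns) utility function. sqrt() is an arbitrary stand-in.

def utility(wealth: float) -> float:
    return math.sqrt(wealth)

bankroll = 100.0
bet = bankroll  # "deep play": staking everything on an even-money gamble

expected_utility = 0.5 * utility(bankroll + bet) + 0.5 * utility(bankroll - bet)
print(expected_utility)   # ~7.07
print(utility(bankroll))  # 10.0 - keeping your bankroll is "better"

# The pleasure of doubling up (sqrt(200) - sqrt(100), about 4.14) is smaller
# than the pain of being wiped out (sqrt(100) - sqrt(0) = 10), so the
# even-money bet has lower expected utility than not playing at all.
```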
But for Geertz, observing exactly this kind of reckless, high-stakes gambling among the Balinese cockfighters, Bentham’s mathematical analysis misses something crucial:
And though to a Benthamite this might seem merely to increase the irrationality of the enterprise that much further, to the Balinese what it mainly increases is the meaningfulness of it all. And as (to follow Weber rather than Bentham) the imposition of meaning on life is the major end and primary condition of human existence, that access of significance more than compensates for the economic costs involved.
The trick of games is that they offer one set of meanings that is crystal clear - the precisely articulated rules that establish their explicit goals and lay out exactly what is and isn’t allowed in pursuit of them - and another set of meanings that is ambiguous, endlessly contested, and occasionally profound. SBF got lost somewhere between the two, and now he’s going to jail. The rest of us are out here too, somewhere between the crisp, artificial clarity of our systems and the angry explosion of blood and feathers that is the world, all in, screaming.
Anyway, this is sort of what my book is about.
I used to work as an engineer in Silicon Valley, where utilitarianism was the name of the game for everyone around me. But I think this engendered an ultimately useless moral philosophy I call "utilitarian nihilism". It goes like this:
Person gets job at FAANG because they want to have a positive impact on as many people as possible. But then it's pointed out (especially at Facebook, where I was) that it seems like the impact we're having on people is actually bad. "Well I'm really just a cog in this enormous machine and it's not really up to me whether we produce bad stuff or good stuff."
And so people would just shed themselves of all agency and not attempt to change anything. If utilitarianism means the greatest good for the greatest number, but I can't impact a great number of people, then I guess it doesn't really matter what I do one way or the other -- hence the nihilism. I think the problem here is that the universality of utilitarianism encourages you to keep thinking bigger and bigger even as your ability to impact events at that scale gets smaller.
There's a similar utilitarian hand-washing that comes with eating meat too that goes like: whether or not I eat this bacon, this pig is already dead. And the next one is too because the spreadsheets they're using to decide how to confine and slaughter pigs at scale would just tell them to charge less for bacon rather than confine and kill fewer pigs. So since my actions won't make a difference, I might as well enjoy this bacon.
But if your moral philosophy can only be used to rationalize selfish decisions instead of good ones then it's clearly a bad moral philosophy -- I don't need any philosophy at all if I just want to do whatever I feel like!
For the curious, the moral philosophy I have developed for myself turns utilitarianism inside-out: I don't want to benefit from harm done to others. Regardless of whether my action prevents that harm, I don't want to gain from it. So I quit working at Facebook even though someone else took my place. And I don't eat meat even though animals still spend their lives in cages. I still benefit from a lot of harm done around the world, but at least it's a work in progress, and I have regained agency in a system that discourages me from thinking that I have any.
My current short version of "why utilitarianism doesn't work": expected value (or utils) is always a bad model of the world. In reality, non-transitivity is everywhere: maximizing Rock will fail against maximizing Paper, which will fail against maximizing Scissors.
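Here's a minimal sketch of that intransitivity (just the standard rock-paper-scissors relation, nothing fancier): every pure "maximize X" strategy is beaten by some other strategy, so there is no single best thing to maximize.

```python
# Rock-paper-scissors: a minimal non-transitive "beats" relation.
BEATS = {
    "rock": "scissors",
    "paper": "rock",
    "scissors": "paper",
}

def best_response(strategy: str) -> str:
    """Return the strategy that beats the given one."""
    return next(s for s, loser in BEATS.items() if loser == strategy)

for s in BEATS:
    print(f"always playing {s!r} loses to {best_response(s)!r}")

# Every candidate "maximum" is dominated by something else, so there's
# no single strategy that maximizes the outcome against all comers.
```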
It's impossible to define a universal metric: you can either have a well-behaved utility function that only some people will agree with, or a pathological utility function that produces values that are incomparable and do not have a "maximum".
(This argument extends to singularitarian AI too. There's an assumption that an AI can be "aligned" with itself, but I think that might actually be impossible. Humans often fail at being aligned with themselves. Corporations fail worse.)