Robin Hanson is our Nietzsche. Yes, I know. I, too, am disappointed that our Nietzsche isn’t a brooding, syphilis-addled visionary with the soul of a poet, who wields searing, unforgettable prose to carve human nature at its joints, looking for God’s replacement. Robin is, instead, a cheerful, neurodivergent economist with the soul of a physicist, who uses workaday language to reverse engineer human nature, looking for deals that would make everyone better off.
Nonetheless, I was struck by the comparison when listening to a series of conversations about AI that Robin posted over the last few weeks. In them, he attempts to re-frame AI doom concerns within a broader context of value change and fear of the future in general. I find this re-framing extremely compelling. Whether or not you completely buy his arguments, they are strange and fascinating to think about, often mind-blowing in a beautiful way. I see this project as a kind of Genealogy of Morals for the Adderall set.
I think these conversations contain some of the most insightful and useful ways of thinking about our current AI thing. They are full of tricky meta-morality puzzles but also have direct bearing on concrete, practical issues of policy, regulation, and governance. But I know you won’t have time to watch them all. End of the world or not, we’ve all got other content to scroll through, so I’m going to do my best to paraphrase them here.
Preface
In the game Universal Paperclips, the player is the villain. Their antagonist isn’t humanity, which basically goes down without a fight, but the player’s own offspring, the Drifters. If the game can be said to have a hero it is the Drifters, not by virtue of being “good”, but just by being different and continuing to change, by being open to value drift and thereby resisting the player character’s monomaniacal value lock-in. None of this is a claim about morality; it is simply the result of the practical pressures of game design - stories need antagonists, games need conflict, both need hooks and twists, vive la différance.
Conversation 1: Katja Grace
Video here. Katja is an AI researcher and all-around smart person whose thinking I’ve been following for years. Her blog is here. This one lays out the basics of the argument, so I’ll cover it in the most detail, but these are excerpts, not a full transcript. And in all of these, I’m editing, compressing, and paraphrasing statements, so don’t interpret them as direct quotes unless indicated.
Katja: I think it’s bad if, in the long-term future, most of the value goes to value systems that aren’t producing anything I think is good.
Robin: The history of human civilization includes a large amount of change in values. We should expect that to continue into the future, even without AI. So, are AI fears about the speed at which these changes will take place? Or do you think that AI will make a qualitative difference in the kinds of changes that will happen?
Katja: Have human values really changed that much?
Robin: Yes. Gender, family, work, religion, fashion, cosmology, any topic where you might care about values. Values really have changed. Humans have been able to change much faster than other animals because we created and rode cultural evolution. We have a variety of different cultural practices, some succeed more than others. Successful cultures survive and spread by persisting, growing, and being copied by other cultures. That’s the major force that has made values change over time.
Katja: It seems to me like there are instrumental values which change a lot and then, underneath that, a set of core values that haven’t changed that much, values like “experiencing happiness”.
Robin: Ancients didn’t care much about happiness, compared to us.
Katja: But it seems to me like AI could have any values but humans just couldn’t, possibly because we are grounded by being particular biological creatures…
Robin: But we are highly plastic, this is the main lesson of anthropology, it shows us how extremely varied our values can be.
Katja: Ok, what’s your position? Are you just fine with anything?
Robin: I’m just trying to see if we agree that, over time, human values could change to any arbitrary degree. Then we can ask - what should we do about that? It seems like maybe most AI fear is fear of future change in general.
It seems likely that AI will inherit a lot of our values. Making them act like they have our values is the best way to make them successful as products. This is how we make humans share our values. We put young humans in situations where they are pressured to act like they have our values, and eventually they acquire our values. We are, roughly, shaped by school and cultural pressures, rebelling where we can. That’s the package that produces the kind of humans we have today, why wouldn’t the same thing happen with AIs?
Values change because, as the environment changes, selection pressures change. For example, we used to have “forager values”, then we developed “farmer values”. As we have gotten wealthier over the past few centuries, the environmental pressures that caused us to adopt farmer values have abated and we are partially reverting. Now we are somewhat schizo: hyper-farmers at work (hierarchies, authority, obedience) and foragers at home (promiscuity, democracy, travel, art).
My most confident prediction: our descendants will be adaptive to their circumstances.
Katja: I feel like there are values that represent what you want the world to be like, and then there are values about what you want or enjoy at any given moment. I enjoy eating broccoli, and I like eating foods I enjoy. I’m fine changing the type of food I enjoy, but I don’t want to change the underlying value of having food I like eating.
Robin: What’s a specific example of a value change you are worried about?
Katja: Like, if AIs just want to make profit.
Robin: But current people want to make profit, that’s already a value humans have.
Katja: But I think that’s more of an instrumental goal, I think what they really want is for other humans to like them, and that sort of thing.
Robin: Well AIs will probably want others to like them, for similar reasons…
Humans today know what they do, but they don't know very well why they do it. We are often trying to explain to ourselves why we make the choices we do, but we have a lot of trouble making sense of which of our goals are more fundamental vs more instrumental; it's not actually obvious to us. We don't really know whether we just want to make profits in order to be liked or whether we want to be liked in order to make profits. We don't know whether we want to have sex to have babies, or whether we want to have babies so we can have sex, or whether we want to have sex so that our friends will like us, or whether we want friends to like us so we can have sex. In a lot of our choices, we just don't know, basically, what our fundamental values are.
Katja: I agree.
Robin: Which makes it hard to evaluate whether the AI is doing things for the right reasons. The AI wants people to like it and it wants to have sex and it wants to make profits, how do you know it’s wrong about its values if you don’t know what your own values are, fundamentally, and it doesn’t either?
Katja: If I imagine a world where all that happens is that some number in a ledger gets bigger, do I like that world? No. I’m pretty sure that does miss out on whatever it is I care about, even if I’m not that sure what it is I fundamentally care about.
Robin: What if this thing you are worried about with AIs is already true of you, now? Would you be horrified by yourself to discover that your behavior was roughly well-explained by your wanting to be respected and have high status? Are you currently living in the hell you don’t want the future to be?
Katja: If I picture a process that has no experience and is just trying to make a number go up, rather than someone who is having an experience that they are enjoying even if it can be accurately modeled as trying to make a number go up, that seems bad.
Robin: So now we’re imagining two agents with very similar behavior. In one agent this behavior produces an experience, and they rate this experience and want to do the things that give them the highest-rated experience. The other just looks at their behavior and computes a number and wants to do the thing that produces the biggest number.
Katja: Yeah, that seems important.
Robin: We can get this very cheaply! Let’s just add an “experience” module to every AI.
Katja: I’m not sure how feasible it is to just add on conscious joy as a cheap module to every AI.
Robin: If the customers of early AIs care, as you do, that AIs have conscious experience, then they will want to buy AIs that at least give the impression they do, and wouldn’t the easiest way to give this impression be to just make them have it?
Katja: I’m not sure people buying these things will think that much about the long-term ethical implications of it.
Anyway, it sounds like you’re in favor of any old thing because there’s going to be so much change anyway.
Robin: I’m trying to walk through what you care about and figure out how cheaply we could get it. What about assistants, chauffeurs, cooks, things like that? Imagine if you had the option to get those things but they would be zombies, no experience. Do you think people would care?
Katja: A bit.
Robin: How much more would they be willing to pay? I think it’s at least 5%.
Katja: I think people might prefer the zombie versions, actually.
Robin: What about co-workers?
Katja: Hm… that one is tricky.
Robin: We do something similar now: we each create a work persona. At work we repress certain aspects of ourselves - spontaneity, feelings - in a sense we have evolved to become a certain kind of zombie worker, because that's what our colleagues at work prefer.
Katja: OK, we keep talking about my position, and it’s not that hard to make my position sound bad, but what’s your position?
Robin: I fear that human civilization will forego large growth because we are afraid of where that might go and we will basically become a "quiet" alien civilization - one that doesn’t expand into the universe. Technology could take us to a bunch of strange places and we don't want that and we coordinate globally to prevent it. Nuclear energy is an example. We could have 10x energy use but we don’t because we chose not to go down that path.
Once it becomes possible to send colonists to another star - if we do that, then it's the end of this era of civilization-wide coordination and discussion of what changes to allow. We would then fill the nearest million galaxies for the next billion years at least with our descendants and activities; we become a “loud” alien race, meet up with the other loud aliens in a billion years, and have another trillion years of experience with them. If we instead prevent that by remaining quiet and controlling our descendants to make sure they don't change much, then our civilization stays in the solar system forever and we just slowly decay.
Given those two options I'll take the loud, grabby future even though it means our descendants will be very weird by our standards and have very different values.
Katja: What about just delaying change to get more control over future values?
Robin: Seems unlikely to make a difference in the long run.
A lot of our values we have for a reason, they have a function, a purpose, they reflect our environment in some way, so we should expect them to continue to exist. But some values are “spandrels”, they aren’t a direct product of adaptation. Are there values like that that we are worried will go away?
Katja: What about enjoyment, experience, consciousness?
Robin: There’s really not much to say about consciousness. You’re never going to be able to see physical proof of it. Either things with complex behavior have it or they don’t.
Katja: What about the paperclipper?
Robin: That's not a plausible outcome, that's a cartoonish extreme, we can't imagine an evolutionary context that's going to promote that.
Think about it like this - you can imagine a hierarchy of abstraction for values. There’s liking a particular song. Then there’s liking a genre, like hip hop. And you might say “It’s OK if this song goes away, as long as people continue to like hip hop.” Then there’s music. And you might say “It’s OK if hip hop goes away as long as people continue to like music.” Or you might say “It’s OK if music goes away as long as people continue to enjoy something like music, something where it’s complex and difficult and impressive and allows for skill and expression, and people enjoy it.”
Where do you stop on the abstraction hierarchy? What is the thing you value? It bothers me that there is no principle on which to choose.
Katja: The principle is that whatever it is you do value you have a reason to preserve. Don’t you want the things you value to persist?
Robin: We can use an expected utility function to model an agent and infer a utility function, and inside that utility function are values. The problem is that using this model we only get relatively sharp values for possible actions that are close to the ones that were actually taken. How do you get an AI to know what your “true” values are? How do you know what they are? When confronted with distant choices in this abstraction hierarchy I can't judge based on my past actions, and I find I don't have strong feelings or intuitions about it.
My theory about this uses the concept of partiality. There are lots of different dimensions to our identity, and for some of them, but not all of them, we strongly favor one side, our side, over the other. Human vs AI invokes strong partiality in people. Some kinds of partiality can have a reason, a purpose, can be adaptive to an environment, and some can’t.
Take on- vs. off- Earth descendants. You might say “I favor Earth. I’m against people leaving Earth. You know what will happen if we let people leave Earth? They’ll go out there and get rich and then they’ll dominate the whole solar system economy and we Earth people will become a minority if we let people leave Earth, so we shouldn’t do that.” That’s a kind of partiality that doesn’t make much evolutionary sense. All the genes are on Earth right now, you can’t be favoring the relative frequency of your genes on Earth by not letting anyone leave.
By analogy: there’s a space of possible minds, and there’s a larger space it could spread out into, and, just like Earth, you’re saying “I like minds staying in this space. I don’t like the minds that could spread out into that other larger space. Those are bad minds, these are good minds. Let’s make sure minds stay here, on the ‘Earth’ of mindspace.”
Katja: Whereas you’re suggesting that spreading outwards in general might be good?
Robin: Well, think about any other axis that you might favor. Say, Star Trek vs Star Wars. You’re on the Star Trek side of this axis, this is one of the things you are partial to. In some sense, if we allow people to leave Earth then your side of the axis can go get more stuff. If both sides can get more stuff, then you’re allowing your side of the Star Trek vs Star Wars axis to thrive and prosper by going out into this larger space. And the same is true for mind space, some AIs will prefer Star Trek over Star Wars.
Katja: But you’re saying that’s wrong. In your model, as you leave Earth your mind will get warped and you won’t even care about Star Trek or Star Wars anymore. So it’s like a tradeoff, you can get heaps of resources for your descendants but you’ll be changed to be unrecognizable so that you would, looking at that thing, say “Wow, that’s of no value to me.”
Robin: I’m questioning this unrecognizable part. This is key. We are making AIs in our image. ChatGPT does care about Star Trek vs Star Wars, we made it care.
Katja: It’s able to pretend it cares, briefly.
Robin: But its descendants will care. That is, I think that future AI will actually care about things, they will have real values, they will have real experiences, and they will inherit our issues. Whatever our debates - like Plato vs Aristotle, people have been arguing about that forever - AIs will continue that fight into infinity. They will honestly, emotionally, vividly feel it; they will carry these fights forward; they won’t ignore them all or forget them all, any more than the humans who would have been there instead would. That is, there is always a risk that any of our descendants, human or AI, will forget about some, or all, of these issues that are currently important to us. That’s just the risk you take by allowing your descendants to exist. That’s a risk our ancestors took in allowing us.
Katja: And you don’t think we should try to have less of that risk?
Robin: That doesn’t seem feasible or fair.
Katja: What do you mean fair?
Robin: Why shouldn't we let our descendants choose what they become, like we choose who we want to become? To use the analogy of children, we raise them to be like us, but we also let them choose some aspects for themselves. There’s a tradeoff between us trying to control the future and the future controlling itself. Say, in 1960 we had tried to control the computer industry and set out ahead of time what computers could and couldn’t be like, and made sure that all future computers could only be like the computers that we, in 1960, decided they should be. That seems bad.
Katja: But they did build the computers that they wanted at the time.
Robin: Right, but they didn’t try to constrain all future computers. And they didn’t say let’s wait and think about computers for 20 years before making any so that we can be more certain that there will only be the kind we want. And that probably would have gone badly. If they had just thought about it for 20 years they probably wouldn’t have come up with much. For a lot of things you need to do them to learn about them, just thinking about them without doing them doesn't teach you much.
Conversation 2: Zvi Mowshowitz
Video here. Zvi is a Magic: The Gathering pro turned rationality-adjacent pundit, if you described him as “the Yu-Gi-Oh Nate Silver” you wouldn’t be far off. Zvi thinks it’s very likely that AI will destroy all value in the universe and he really doesn’t want that to happen. He’s cool and smart and I like him. His blog is here. Because these later conversations cover a lot of the same ground, I will attempt to just include the new ideas that come up.
Robin: So, what do we think would happen, over the long-term future, without AI, and how different is that from a future with AI?
Zvi: Very different.
Robin: I assume you agree with me that humans have changed a lot over millennia and they would probably continue to do so.
Zvi: I think we disagree about how humans from the past would view the changes we already have. I think some past humans would view our world as very bad. Especially ones with strong religious beliefs. They might think any change is bad. (And actually, I do think that if you look at the broad set of possible changes, all the possible re-arrangements of atoms, most changes are bad. So I have some sympathy.) But I think if you took the average ancient Greek, or the average person living in ancient Egypt, and you brought them into today’s world, they would, after a temporary shock, see a lot of good in the changes we made and a lot of value in our world.
Robin: Maybe we should separate capacity changes that are increases in power from changes in values and attitudes.
Zvi: The person from the past would most likely see that we live longer, have less disease, nice houses, and so on, and approve of all that. And when it comes to the values and attitudes, I think it would be a mixed bag. I think most people from the past would, on reflection, endorse most of our values and attitudes. Take slavery for example. I think, on reflection, they would adjust their values and endorse the change. What were the big changes from ancient times to now, and why would people, on reflection, disapprove of them?
Robin: OK, here’s a list: No longer worshipping their Gods, no longer respecting their family lineages, no longer treating their elders with respect, talking back to authority, being more self-indulgent, caring more about romance vs family devotion, being more promiscuous. In addition, many of them would be bothered by the degree to which we command and destroy nature.
Zvi: Most of those things were necessary to maintain economic prosperity and social order.
Robin: Yes, some features are adaptive to their environment. Do you just want the features that are adaptive, or do you just want the features you like?
Zvi: It’s hard to decide which is which. Like, much of ancient religious belief was sort of transactional, you worshipped Apollo for practical reasons. And if you brought someone like that to our time, and they realized there was no practical reason to worship Apollo anymore, they would be fine with it.
Robin: But many attitudes towards marriage and family and children and nation and war and jobs and life and death - most people’s attitudes towards those things aren’t functional, in their own heads.
Also, another claim I want to make is that arguments are a relatively weak source of changing values. As the world changes, people’s attitudes tend to change, and we tend to just go along with our world, very seldom is argument the important force.
Zvi: Not for me, though!
Robin: I’m trying to break down what people object to - is it future values, the world those values reflect, the speed at which the change occurs…
Zvi: If you offered me a galaxy-wide grabby civilization with the values of, say, 2000 BC China, I wouldn’t be super-happy, because those aren’t my values, but I would be mostly fine with it. There might be some past cultures that I wouldn’t say that about, but I think for the most part that is true of most past human civilizations. There seems to be something consistent about how humans have adapted to their environments over the course of history that I’m broadly OK with.
Robin: So, is the idea that, since most random pasts have a positive value, we should expect most random futures have a positive value as well?
Zvi: I think this is due to the specific features of physical/biological humans and how they function.
Robin: Do we agree that most people, if they could see the future of a billion years from now, would initially be repelled by it?
Zvi: Yes, most people would say it would be better if we could have gotten the additional abundance and prosperity but kept the values closer to what they were.
Robin: It seems most people would choose to prevent large future changes if they could.
Zvi: I think that if people could have seen the world in which we didn’t prevent nuclear energy they would recognize it as good and see that that was a mistake.
Robin: It could be that most strange futures are ones people would, if given enough time, adapt to and grow to approve of. But they disapprove of them when described abstractly.
At the moment, the economy is doubling roughly every 15 years; a few hundred years ago it was doubling every 1,000 years; 20,000 years ago it was doubling every quarter million years. Is our current world not doing adaptive change? I would say that our ability to adapt is one of the limits on how fast change can happen. I don’t think change will happen faster than we can adapt.
Zvi: Who’s “we”?
Robin: Whoever are the dominant actors in the economy.
Zvi: I think our ability to adapt to the rate of change is currently borderline.
Robin: I think the economy could double every year and we’d still be adapting. I think most of the changes that happen today are happening in the background, out of sight of most people. The changes that were happening 100 years ago were more visible so the same rate of change was more socially disruptive, people moving from the country to cities, cars, etc…
Zvi: I see a lot of signs that we aren’t handling current rates of change very well.
Robin: A simple story about the world’s reaction to AI is that usually we don’t see the future with much clarity so we don’t react that much to it and change just happens and we adapt to it. But recently we saw this AI demo and that demo let people vividly imagine a very different future and that bothered them.
Zvi: I divide these concerns into two large buckets - practical issues like losing jobs, distribution of wealth, etc. vs existential concerns - worlds in which there won’t be biological creatures, worlds in which all the atoms are re-arranged by some entity according to what it wants, worlds in which control of the future is no longer determined by biological humans.
Robin: I would phrase that as fixating on a particular partiality, a particular axis of difference, which is biological humans vs other things. That seems to me, in fact, where people go when you push them on the topic. It’s us vs them.
Zvi: Yes, most people would be strongly worried about the idea that we will create AI entities that aren’t carefully sculpted in terms of their priorities and values and what type of minds they are…
Robin: Seems to me that they are holding these AIs to very different standards of conformity to current values than they hold their children, grandchildren, and descendants. With regard to their biological descendants they are much more accepting of them having different values, different practices, different attitudes. With respect to AIs they demand a much higher standard of allegiance, and conformity, and similarity.
Zvi: Yes, I do demand that, and think it’s good to be partial in this way. If we were broadly indifferent to all the different ways that the universe could respond to different kinds of optimization pressures, that would be worse.
Robin: Great! I want to plant my flag here with a tentative disagreement. Because I want to argue that this sort of partiality is not adaptive in a very precise technical sense. And therefore maybe we should, on reflection, embrace it less than we do.
I agree that we must always have some partialities. Evolution, cultural and genetic, doesn’t really work without some degree of partiality. Each version of the population needs to be, in some way, promoting things like itself in order for the world to function. In other words, it’s too hard for us to just uniformly promote everything, we need to have a focus.
Zvi: Yes, I think it’s reasonable to have this partiality. If my grandchildren were different in the way I expect future AIs to be different I would be very confused and upset.
Robin: So you are arguing that there is a correlation between this axis and other axes we care about, that if we allow variation on this axis we will allow changes on these other axes we care about, this axis is a good proxy for those other ones.
Zvi: Yes.
Robin: I have a theory, ahem. (Sorry, Robin didn’t actually say “I have a theory” and clear his throat at this point, that was just me, Frank, making a Monty Python joke.)
I want to walk through some examples of what I call “infertile factions.” A faction is one side of a partiality axis. An example of a fertile faction is one that, if you promote it, you are promoting your genes or memes. So we could say promoting your family, your geographic region, your race, those are all fertile factions. If you promote your geographic region it will, on average, contain more of your genes or memes, so that will promote those aspects of you into the future. You might say “It’s hard to know what I want, but if I have a theory that predicts I want this, that might give me some reason to endorse it.” So, I’m going to say, at least in respect to these fertile factions, “I have a theory that says I should be partial to those factions, and I am, so I might be willing to retain those partialities.”
Zvi: Ok.
Robin: Now, let’s walk through some not-so-fertile factions. One might be generations. Promoting your generation, at the expense of future generations, doesn’t promote your genes or memes. It’s not the sort of way you can make things accumulate in the long run.
Likewise, Earth humans vs Space humans. We could say, because there are no Space humans yet, we could win this contest right now by preventing anyone from leaving Earth. If we prevent anyone from leaving, the Earth humans win and the Space humans lose, yay us.
Zvi: Just to be clear, I like Space humans.
Robin: Nevertheless, does this make sense as a partiality? Is it a fertile faction?
Zvi: It does make some sense to worry about people leaving Earth and us having no control over how they evolve and then eventually they return and dominate us because they became more powerful but gave up music and art and all the things we care about.
Robin: The core question here is: do you identify yourself with the particular features of an Earth human, or do you identify yourself with a genetic lineage that could have descendants in space and then they would count towards your success, even if they are somewhat different from you. That seems to me a key choice. My intuition is that a creature whose main loyalty is to its lineage will just win out over one who is loyal to its current configuration. The one who is willing to adapt is, by definition, more adaptive.
Zvi: We should ask how much control over future change we have.
Robin: I would say we have limited control over future change, we have a limited budget and have to be careful where we spend it.
I think bio vs artificial is like Earth vs Space humans. If we are going to have non-bio descendants, and are willing to embrace them, they can be our allies in helping our lineage to grow if they’re allowed to be one of us. If we declare them as not one of us then they can’t.
Zvi: They will never have our genes.
Robin: But they could certainly have our memes. They could inherit some of our design features and some of our key choices, between, say, democracy and autocracy. They could inherit those choices and perpetuate them.
Zvi: We don’t know how to do that.
Robin: Certainly our AIs will be more like us than the AIs created by an alien race.
Zvi: Maybe not. If we botch it, maybe they won’t.
Robin: If we are in an evolutionary context where evolutionary pressures are simple and strong enough that there is a unique answer then evolution will produce that answer and the origin point won’t matter. In our current world it seems like that’s not the case, we seem to have legacies that get inherited. For example, the location of cities. The choice of computer languages, human languages, many things like this are standards that get locked in. Evolution seems complex enough to have multiple equilibria and that’s where our heritage will be, that’s why we can have a legacy.
Zvi: That’s a strong argument for thinking about what legacies we care about and how to try to lock them in.
Robin: If you try to lock in legacies too early because you want to resist change, that can inhibit growth and improvement along many other dimensions - that’s a key trade-off we’ve always faced. That’s why we’ve allowed so many changes, because places that don’t allow change get out-competed. So, unless you control the whole world, you have to be wary about trying to lock in temporary features that you like.
Zvi: Yes, that’s a trade-off.
Robin: And that’s been the driver of change through most of history, and the question is dare we throttle that engine?
Zvi: Dare we let that engine run?
Robin: But at least we know what happens when we’ve seen this engine run.
Zvi: There is a distinct difference here, we are considering running a very different kind of engine if we make AGI.
Robin: Many people say “allowing AIs to evolve now would be very unwise until we figure out how to control their values indefinitely into the future,” and I find it implausible that we will ever be able to do that. I think the sort of creatures whose values you could ensure would never change would be greatly impoverished and hindered in their abilities to evolve and grow and become interesting new things.
Zvi: In my model of biological humans, we have slack partly because we happen to enjoy actual surplus which we spend on slack. We also enjoy slack because what looks from the outside like wasted energy, some amount of that is necessary to actually get the training data and experiences and is valuable.
Robin: You’re thinking of play. And we often think of play as some sort of waste, an extra, leisure that we have to pay for and competition would make us no longer have it. But a lot of play is necessary for work in the sense that you have to try and experiment and explore.
Zvi: I think that increasing optimization pressures cause things to become more similar.
Robin: I’m confident that an increased economy allows for greater variety, complexity, legacies, and slack. A larger, higher-capacity economy will have more legacies. An argument for AI is that it will allow a larger economy with more agents, more variety, more ways to interact, more legacies, more lock-ins, more kind of activities…
I think you should be surprised by the diversity of the biosphere. I think if you had, just on fundamental grounds, asked “what sort of creatures would evolve on a planet where sunlight comes down and there are nutrients around to crunch?” you might have said, “well, there’s probably one optimal way to do that; there’s going to be a certain organism that sits on the surface of the water, takes the sun, and crunches it out, and there’s just going to be a convergence such that there’s just the one organism that survives.” And that’s really quite at odds with reality.
Zvi: Why do you think that’s true?
Robin: Once you look at the high-dimensional space of possible interactions between creatures and the different kinds of places they could be, you start to see all these different places for specialization and legacies.
Zvi: Where humans live and have applied a lot of optimization pressure the variety goes way down.
Robin: Well, human needs are simple so we tend to crank up whatever is useful to us at the expense of the rest.
But, in human culture, bigger nations, bigger cities, bigger firms, have more innovation per person, and that’s not true for biological species. That shows you that there’s a lot we don’t understand about what causes more diversity and variety in the long run.
Zvi: Could we create a period of time to research these questions and develop the ability to reason about them and get more control over future values?
Robin: Let me highlight a key fact - in the past we’ve lived in a competitive, decentralized world, where nobody has run the world, and therefore we’ve had to accept the fact of decentralized, competitive choices. Even if we know a lot about how to manage our evolution and change, in order to take advantage of that knowledge we will have to institute some form of global governance to enforce it. And that’s a thing we aren’t very experienced with and has enormous perils.
Zvi: I broadly agree with you, but in order to avoid the disaster of an AGI that destroys everything good about the universe, I’m going to bite this bullet.
Robin: In our current world, humans aren’t very well-aligned. We don’t trust other people that much, so we make them compete, and we check up on them. It’s not perfect but we’ve survived. I expect roughly the same thing to happen with AI. The worse you think the current world is, the more you should welcome AI, because it will probably be better.
Zvi: To be clear, you think the alignment problem is easy in the short run, in terms of getting roughly good enough adoption of desired behavior for near-future AIs, and impossible in the long run, in terms of guaranteeing that they won’t eventually evolve and radically change and go to very strange places.
Robin: Yes. And if we fail to control their long-term behavior I don’t care that much, because they’re my descendants, and just like you might not mind your grandchildren betraying your instructions to them and doing something different with the property they inherited from you, they’re inheriting our civilization from us, and the more you identify with them the more you want them to have the freedom and power to do what they want.
Zvi: When I think about a literal grandchild squandering the fortune I left them and doing something I wouldn’t want them to do there’s a broad range between “I’m sad they did that” and “No, I wish you’d never been born and I burn my legacy in a fire.” How bad would it have to be before you wish we hadn’t built AI?
Robin: If I would have preferred that we just lived another X years and then went extinct, rather than what they did.
Zvi: What’s the most plausible version of that you can imagine? What about the paperclipper?
Robin: I just find it very hard to make these kinds of value judgments. That’s why I struggle to make them.
Zvi: But there are some futures where there just isn’t enough legacy, or complexity, or interestingness in the AIs that you wish we’d never made them.
Robin: Well, if, contrary to my expectations, there just is no complexity, there’s just vast uniformity going to the horizon, that would be pretty disappointing. Because that’s what the current universe looks like to me, and I’m hoping for something better.
Zvi: “I claim this for AI” doesn’t seem much better.
Robin: Yeah, but if they make something interesting out of it, with a lot of complex parts interacting, then that looks interesting.
Zvi: The funny thing about this conversation, and the one you had with Katja, is that the things I’m usually trying to convince people of, you already believe. If most people believed what you believe about how things are going to change with AI they would be against it.
Robin: Well, I think that’s because most people, if they understood how strange the future is likely to get, would just generically be against all change.
Zvi: And you are confident they’re wrong?
Robin: I know I want to bet against them, that’s not quite the same as feeling confident, but at some point you have to make a bet. Had our ancestors made that choice we wouldn’t be here. We are here because our ancestors didn’t know how strange things were going to get, and didn’t coordinate to prevent it.
Conversation 3: Scott Aaronson
Video here. You are more likely to know who quantum computer researcher Scott Aaronson is than who I am. Scott is one of the smartest people on the planet. His blog is here. I’ll try to make this one as short as possible.
Scott: I think the key factor is the rate of change. If you imagine each generation changing by 1% then people are Ok with that, even as it eventually grows into a very different future. But if you imagine an abrupt takeover, like an alien invasion, or the contact between homo sapiens and one of the previous species that we interbred with a little bit but mostly wiped out, that’s much more concerning. Partly because there’s much less opportunity to influence the thing that is replacing us.
Robin: So some people say they don’t care about the rate, it’s about where it eventually goes. But you’re saying it’s about the rate. So maybe it’s not about AI in particular, it’s just about slowing down change in general.
Scott: I don’t know, it’s hard for me to imagine your hypothetical scenario, where technology advances but never in the direction of AI. Maybe because I’m a computer scientist and I feel like every machine wants to be a Turing machine. It’s either a universal Turing machine or else it has been somehow temporarily prevented from becoming one.
Robin: [introduces partiality framework] Along some dimensions we feel especially partial. In our society we’ve developed some strong norms about suppressing certain kinds of partiality - towards your race, your gender, your income level or profession, things like that, even though some of those things are things that genetic or cultural evolution could select for. Even non-neurotypical variations, some people really think differently and people are often partial about that dimension, they are somewhat put off or even horrified by people who think substantially differently. And AI is a dimension of potential partiality and we can ask to what extent do we want to be partial along that dimension, as opposed to other ones.
Scott: Yes, this line of thinking goes all the way back to Turing; in his paper on the imitation game he was very explicit about this. People have made the point that he was different from other people in sexual orientation, and plausibly driven to suicide because of that, and that may have influenced the way he thought about these things. He was making almost a moral plea in his 1950 paper - if an entity is indistinguishable from something you’ve already regarded as conscious, then by what right do you regard it otherwise?
Robin: That’s an attempt to eliminate the differences in some sense, but what if we accept some differences, say they are going to be different in some ways, but how much do those differences matter is the question?
Scott: This is a larger leap than any that humans have previously been asked to make, regarding gender, or race, or sexual orientation. There have not been, at least for the past 10,000 years, multiple intelligent, language-using species on Earth; there’s been one, we have a common evolutionary history, we are all interfertile with each other. Even then, it was a leap to recognize each other’s common humanity; imagine how much larger a leap it is going to be for people to recognize moral patienthood on the part of these other entities. Again, this might depend on gradualness. Even if there is a super-intelligent, paperclip-maximizing AI that suddenly appears on the Earth, it is not going to appear in a world quite like the current one; it is going to appear in a world in which AI is already much more integrated into how we live and work, and you can imagine there are already lots of human/AI hybrid entities - humans working with AIs in extremely tight feedback loops.
Robin: It seems a bit unfair to the AI in the sense that what we’re doing is we’re imagining the most extremely different AIs that can exist in the very far future and we’re comparing that to us now, as opposed to imagining what will AIs actually be like in the near term, and what will humans be like in the far term. Near-term AIs will almost certainly be a lot like us, and it’s only later that they would get very different.
Scott: If you literally believe in the Yudkowskian foom scenario then you are worried about the first thing because you believe that’s what’s going to happen.
Robin: If you press on the foom scenario, a great many people cave and say “well, fundamentally, the AI is just going to be different and that’s the real problem.” Foom is just something that exaggerates and accentuates the concern by taking something that would otherwise exist far in the future and pulling it closer.
Scott: Let’s be clear, the Yudkowskians are very explicit that what they’re worried about is not humans being taken over by something different but rather that they are taken over by something uninteresting, something trivial that is just making a bunch of molecular squiggles, something we would not value at all. And they give arguments for this, involving the orthogonality thesis and instrumental convergence and so forth.
I still have difficulty, and maybe this is just my own inadequacy, imagining something that is both 1) smart enough to form and execute a plan for taking over the world, and 2) uninteresting.
Robin: It’s a similar problem for future humans, among the people I’ve talked to. They think that if you go on a random walk in this high-dimensional space of values and styles, if you go far enough random walking in this space, then you get to places they don’t value. They tend to assume they only place value on a small fraction of that space.
Scott: I don’t know what’s the relevant distribution here, when you talk about a random point in the space of values I don’t know what is the measure over possible values. What matters in practice is the distribution of values in the AIs that we are likely to create.
Robin: There’s the distribution when we create them and then what happens after they evolve over a substantial growth trajectory.
Scott: This question of value drift is interesting. There’s a novel called Plato at the Googleplex by Rebecca Goldstein where she imagines Plato brought back into the modern world and imagines him having conversations about modern topics. Her Plato is just instantly brought up to speed with modern values - women are intellectual equals, racism is bad, etc. - so you can just have the more interesting conversations. And, on the one hand, that seems implausible. But, on the other hand, part of the reason people still read the Socratic dialogues today is that there is a kind of universality to them. They talk about issues we still think are relevant 2,500 years later. Again, it might just be a question of time. If you brought Plato into the present, maybe at first he’d be completely disoriented, but if you let him spend a decade here it would come to seem ordinary to him.
Robin: Many people think this would be a conditional convergence. Because Plato is human and we’re human, that’s why he might converge to our opinions after 10 years of experience, but minds from a larger space would not. We could maybe factor the whole space here into two subspaces: one is the subspace in which things do converge from a wide range of starting points, and the other is the subspace where they don’t, where there’s just random drift. And we ask how likely it is that AI would just drift away, when humans wouldn’t. There could be path dependence here - if our ancestors had started out differently then the whole trajectory would be different, but by the time we’re here on the path there’ll be more of a convergence. But AI will kind of be here on the path with us, initially.
Scott: Yes, and what we see now are AIs that are trained on all of our collective output, and that is why, for some definition of understand, they are able to “understand” our values. Now, the Yudkowskian position is that all of that is deceptive, all of that is just an artifact of the present stage, and after a while it fooms and it becomes something that is utterly alien to us.
The funny thing is that, when I’m talking to my academic colleagues who completely reject the Yudkowskian position as a science fiction fantasy, I am trying to convince them that, no, it’s possible, there is no deep principle that rules that out. And, when I’m talking to the rationalist community, to the AI doomers, I am trying to convince them that, no, this scenario is not the only one, I can easily imagine other ones, there is no deep principle that makes this inevitable.
Robin: In order to make sense of the doomer position, you have to imagine a kind of asymmetry. You not only have to believe that AI would evolve out of the bounds of reasonableness, but that, without AI, our descendants would not - that there is some sort of core that would keep them in bounds regardless of what sort of tech they use and what sort of cultures they grow into.
Scott: It’s hard to imagine a civilization with evolving technology that wouldn’t eventually create computers and make them more and more intelligent and capable of doing things that they themselves don’t understand. People complain that so much contemporary innovation is in the realm of bits rather than the realm of atoms but it seems that there is something intrinsic in technology that pushes you in that direction.
But suppose we had a Butlerian jihad, then how would technology develop under that scenario? It’s hard to disentangle the effects from the totalitarian regime you would need to enforce those restrictions.
Robin: Even if you imagine a world that only has ems, digital copies of human minds, they would evolve to become strange, and there’s no obvious limit on how strange they could get.
Scott: Some things in morality, like the golden rule, seem as hard to vary as it would be hard to vary arithmetic.
Robin: Right, as a social scientist I would say - many features of human society and our minds are not accidents; they exist for functional reasons, and those are reasons to expect some version of them to exist in the future. Laughter and anger and love and law and markets and language, these aren’t things we think would just go away. When people say they’re afraid laughter will go away I think - laughter isn’t just an accident. Now, future laughter may be somewhat different than ours, future love may be somewhat different than ours. And then people say, no, they really want the human versions; it’s not enough that the future loves and laughs, they need to love and laugh our way.
Scott: There have been generations of science fiction trying to envision these possibilities. But that process of imagining is partly just driven by the demands that you need to have a story.
Robin: You need characters.
Scott: So you get these far future beings who might be radically different from us in many ways, but they’re still driven by the same kind of drives we are and have the same kind of conflicts and find value in the same kind of things we do because, without that, how do you even have a story that we can read and identify with? One thing I’ve noticed is that there is almost no good science fiction about a singularity. I think the thing people are worried about is the point at which any science fiction story becomes bad because there is no story that is left to tell anymore that is comprehensible to us.
Robin: For me, the fundamental question is just - should we allow change? And I think we should just honestly engage the question. There’s a book called “Against Civilization” with a whole bunch of essays arguing that civilization was a mistake, we went too far, we were happy in the forager world, civilization makes us alienated and it was a mistake. Now, that’s a coherent position and it’s worth engaging. In the end I would reject that, but there’s a sense in which, if people could really see the future they wouldn’t like it, and that’s always been true. If people in the past could see where the future was going and were allowed to vote on it, they might have voted no.
Scott: There are 3 big examples of things we said no to - nuclear power, genetic engineering, and psychedelics.
Robin: And AI is, plausibly, another example we are being asked to consider. So I don’t think it’s crazy that people might look at where it could go and say no, because they’ve done that in the past.
Scott: In the 1970s some of the most thoughtful people around genuinely thought nuclear power was an existential risk and should be stopped. And they did manage to stop it. And I would say that, in retrospect, it is clear that that endangered the human race.
Robin: The precision that I want to bring to this question is to ask: what sort of changes do we care about? Many people seem to be saying that change along certain dimensions is OK, because these are the changes that, intuitively, we feel we are used to, but some dimensions are “us vs them” dimensions.
Scott: Sometimes the us vs them just comes down to - can we see a line of continuity. My 2-year-old self, I really don’t have much in common with him in terms of interests, values. Maybe he was already a little interested in counting, in numbers, but he had no idea about quantum computing; most of what I value he had no conception of. But I identify with him because I can identify with my 30-year-old self, who can identify with my 20-year-old self, and so on.
Robin: AI change will be continuous. Each new version won’t be that different from the old one.
Scott: The reason people are concerned about the AI self-improvement scenario is that that seems to introduce a discontinuity.
Robin: It seems to me that the real discontinuity is that there’s a line around the human space of minds and crossing that line goes into the other space of minds.
[re-introduces partiality framing, talks about what kinds of partiality can or cannot be adaptive.]
Scott: If you’re asking me what kinds of partiality I should have, that’s a question that can’t be answered by reference to evolution, but only by some sort of moral reasoning.
Robin: But this could be a moral reason.
Scott: There’s a kind of partiality where you think your group is better, and should dominate. And then there’s the limited kind of partiality where you say I would like my group to continue to exist. You’ll get many more people, and moral philosophers (sic burn -ed), on board with the second kind of partiality. The Yudkowskian fear is that it literally wipes us all out, and of course I understand the impulse to be partial to you and your progeny not being wiped out.
Robin: Long ago I had a post called Prefer Law to Values. People get obsessed with “do they share our values?” as a proxy for “will they wipe us out?”. But if you were moving to a foreign country you wouldn’t ask “do they share my values?” but “do they have a stable legal system and respect property and that sort of thing?”
Scott: Some people would really ask “do they share my values?” or, rather, “do they have a value system that can co-exist with mine?”
Robin: But co-existence is about whether there’s a social environment wherein you respect each other’s property and actions and things like that, and that’s what law is.
Scott: But when people move, they don’t just look at law, they look at culture. Do these people have a culture where I can imagine enjoying being there?
Robin: But if you were worried about you and your family being exterminated in this country, that’s less about the culture and more about the law.
Scott: If you were a Jewish person wondering whether to move back to Europe in the 1930s, maybe the laws right now in the country you are looking at are OK, but you would certainly also want to look at the culture. Because law and culture co-evolve.
Robin: OK, but most people are overwhelmingly focused on values, when they talk about alignment they say - does it share our values? And that seems to be overemphasized relative to the question - do they respect a larger social system of law, property, governance, etc?
Scott: This might be the clearest disagreement that we’ve found. I feel that values are the fundamental thing. Basically, if someone has enough alignment with your values then you can trust that whatever laws they come up with will probably be OK, and if their values are fundamentally alien to yours, and they’re powerful enough, then you are worried that no law is going to protect you.
Robin: Let’s take large corporations and government agencies as examples - are we afraid of them? What keeps them in line, aligned, with us? One postulate might be - they share our values. Another postulate might be - competition and law force them to serve us, not because they share our values, but because this larger structure makes them.
Scott: It’s really crucial here that these are mental constructs that are made of us. You can think of a corporation as this autonomous entity that is pursuing its own interests, but it has a CEO and a board of directors, it has people who make decisions and, most importantly, can be blamed as individuals if this giant entity does something that is bad.
Robin: I would claim that if a corporation figured out a thing it could do that would be profitable, and it could get away with it in terms of law and governance and customers, that they could typically find employees to do whatever it was they needed to do. Willingness of employees to do something is not that much of a constraint on corporations pursuing their profits.
Scott: There is no principle that prevents corporations from doing horrible things; that would be foolish to deny. Everyone can see with their own eyes that that happens. But, in practice, when that happens, there are people who can be held accountable, who are responsible for those decisions, and with whom the buck stops, or is supposed to stop. You can put the blame on those individuals.
Robin: But it’s not because they’re humans in that role that you’re able to do that. You can hold them accountable because they can be held accountable. That’s law, not their human nature. If AIs were sitting in that slot, and they could be held accountable, they would similarly have to behave. It’s about the law holding people accountable not about human nature or values.
Scott: This is a very interesting question. How will the laws hold AIs accountable? Think about what jobs might get replaced by AIs - a lot of jobs mostly exist for there to be someone to blame.
Robin: We know many ways to do it, we just haven’t chosen which ones.
Scott: OK, (skeptical) we’ll see.
I would say this… large organizations - universities, foundations, corporations, governments, they are in a tight feedback loop with the humans that comprise them. We can think of them as doing some things autonomously, but they are not out there veering off, like an autonomous vehicle with no human anywhere near the controls.
Robin: So we agree. In the near future, if AIs stay in a tight feedback loop with the human customers and builders then that looks like a pretty reasonable future. The fear is of a foom scenario where suddenly they split off, or more plausibly, that AIs just become a large fraction of the economy and start to dominate where everything goes, and then we could talk about how much we’re worried about that.
Scott: (Talks about 5 Possible Futures for AI) I am less than maximally pessimistic. A world where we co-evolve in tight feedback loops with AI could be a good world, or at least a world our descendants would consider a good world, even if we would find it to be an extremely strange world. Just like Plato, maybe it would take us a decade to get used to it.
Robin: And if people could see that world and vote on it, they may well vote against it.
Scott: And, even if we see that world coming, we should worry about lots of safety issues and try to push things more towards the pleasant Futurama end of the spectrum rather than the dystopian end. And, at the same time, I think that we should pay attention to the possibility of a foom, even if it’s only a few percent probability, that is more than enough to devote resources to worrying about it.
Robin: Yes, in fact I even have a plan for that!
Hey everybody, it’s me Frank. I’m stopping here. Congratulations if you made it all the way (or skipped straight down to here.) 2 thoughts before I stop:
The moral realist/futurist position - I didn’t really hear anyone make a strong claim for the idea that on average things get better over time and that’s, like, a law of physics. I mean, intuitively, we tend to prefer the future, not in every case, but on average, right? Let’s say I have a time machine and can zap you to any point in human history. I pick a random time X. Would you rather be zapped to X or X plus a hundred years? And you can see how moral improvement could be a deep principle, like entropy or something: more complex problem-solving allows both for greater degrees of high-resolution joy (and suffering) and for finding trickier and trickier positive-sum equilibria = better times ahead. (I think Agnes Callard maybe suggests something like this position here.)
Catastrophes - I think a doomer could push back on Robin by saying that what they’re really worried about are catastrophes, not drift. In addition to changes that occur through the process of adaptation and evolution, there are also these occasional “black holes” in values space. Places where a civilization or species can get catastrophically stuck. Regardless of where our descendants eventually end up, we (and they) always have a reason to be on the lookout for these and to try to steer around them. They are rocks in front of our canoe, not distant shores. That seems to better capture the actual concern most AI doomers have. At the same time, that seems to describe the scenario that Robin himself is worried about - the totalitarian anti-space brigade. They’re both examples of dead-end attractors in value space that you can drift into but not out of. So you could say they’re both worried about the same kind of thing.
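One way to make that “drift into but not out of” picture concrete - a toy sketch of my own, not anything Robin or the doomers actually propose - is to treat value drift as a random walk and a catastrophe as an absorbing state: ordinary drift can always wander back, but the black hole, once entered, can’t be left.

```python
# Toy model (illustrative only): value drift as a 1-D random walk with one
# absorbing "dead-end attractor". Ordinary drift is reversible; the attractor
# is not - once the walk reaches it, it stays there forever.
import random

def drift_until_absorbed(start=0, attractor=-10, steps=10_000, seed=0):
    """Run a symmetric random walk; stop early if the attractor is reached."""
    rng = random.Random(seed)
    pos = start
    for t in range(steps):
        if pos == attractor:          # absorbed: no step leads back out
            return pos, t
        pos += rng.choice((-1, 1))    # ordinary, reversible value drift
    return pos, None                  # still drifting after `steps` steps

for seed in range(5):
    final, when = drift_until_absorbed(seed=seed)
    if when is None:
        print(f"seed={seed}: still drifting, ended at position {final}")
    else:
        print(f"seed={seed}: fell into the attractor at step {when}")
```

The parameters are arbitrary; the only point is the asymmetry - drift is reversible right up until the moment it isn’t.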
Idle side-note, probably not original, just a pleasant thought to me:
When reading the remarks about sci-fi stories that turn bad once they move beyond relatable human characters, it struck me how games actually seem better suited to these scenarios than stories, since the player is their own "relatable" agent interacting with a foreign system, obviating the need for characters. It then struck me that that's what Universal Paperclips does, and, well, that was a nice mental loop back to the article.
Thanks for the transcriptions/summaries, useful and entertaining, especially the sic burn. The idea of partiality across species will stick with me, I think.
The core of Robin's argument is intriguing.
The ways in which cultures have evolved over time are a topic I have found very interesting to ponder for a while. The contingency of our current moral intuitions and the genuine diversity of possible ways of being is something that is easy not to explore. There are two particular people I think of when considering this topic. One is of course Nietzsche, but another is the historian Tom Holland, who makes the case that much of the cultural water we swim in (in the West) is a consequence of Christianity - in particular, the very notion of the universalism of human dignity, or even the distinction between the secular and the non-secular. It is interesting to consider the argument that Christianity's universalism may have given it some sort of evolutionary advantage, allowing it such influence in the West.
So to me, the idea that the worldviews and morals of people in the west before Christianity were in some sense fundamentally different to our own, seems rather plausible. That said, I also think it reasonable that a lot has stayed the same. They still had families that they loved, enjoyed friends, felt envy and shame and pride and lust and heartbreak and fear et cetera.
Robin is getting at a fundamental difficulty of universalism when taken to its limit. At what point do we stop extending the reach of moral value? If every possible configuration of being is worthy, do we end up with a kind of nihilism where the idea becomes meaningless?
I like the idea of extending the moral value of beings beyond human. To animals and plants and things. Are they still morally less than human? Why? Because we are conscious? Do we only extend moral worth to AIs that are also conscious?
My hunch is that morality is an imperfect generalisation of the types of relationships people form with people they know. People make it up (and just because we make it up doesn't mean it's unjustified, it is just people who have the responsibility for the justification, not some absolute truth). So we shouldn't expect this stuff to generalise too well. So I don't try to think too hard about the precise universal conditions for moral valuehood.
And to some extent it can be healthy to have a limit to how general one's moral universe is. It is healthy to have some notion as to what one values.