Last week we began our survey of the AI take-o-sphere by looking at the “nuke it from space” position of Camp Doom. Today let’s visit an altogether calmer and more sensible camp - the Deflators. The Deflator position is, roughly, “don’t believe the hype.” Look, they say, these new image transformers and large language models might seem like magic, but they’re really not. Under the hood, it’s just a lot of mindless number-crunching creating the superficial appearance of thought. These systems can only regurgitate a chopped-up slurry of the data they’ve been trained on; there’s none of the special sauce necessary for genuine creativity, for real insight, real substance, all the things we really care about when we think about real intelligence. People fell for this trick 60 years ago when they overreacted to ELIZA and they’re falling for it again now. But we’re miles away from anything truly significant, and a true breakthrough will require new paradigms, of which there are currently few signs.
This New Yorker article from Ted Chiang is a good example of a deflationary take. In it, he describes GPT as “a ‘blur’ tool for paragraphs instead of photos”. It might seem impressive, but it’s mostly flash, not substance. Many AI experts are members of the everybody-just-calm-down brigade, including Gary Marcus, who has been a persistent critic of the machine learning techniques behind the latest demos. I think most of my friends are in this camp. Even the ones who are dazzled by the new tech are skeptical that it represents something truly extraordinary. As puzzle designer Mitchell Allen put it in a comment on an earlier Donkeyspace post: “ChatGPT and its ilk seem little more than parlour tricks from wannabe Scientism practitioners.”
There’s a lot to like about the Deflator position. First of all, it’s almost always right. Most things that seem amazing in the moment turn out, in retrospect, to have been not that big a deal. Nothing is more predictable than a big wave of hype followed by disappointment. People got worked up about VR, AR, the blockchain, the metaverse, and I’m a proud, card-carrying member of Deflation Nation when it comes to all that stuff. Take the long view. Don’t be a sucker. Objects in the mirror of our daily attention always appear larger than they actually are. This is the sober, level-headed position that I am constitutionally drawn to.
Only… not this time. This time is different.
Even if it were just the image transformers, even if it were just Midjourney, DALL-E, and Stable Diffusion, I would be flabbergasted. I would say this is a profound, transformational phase change. I would say - this challenges our fundamental concepts about the mechanics of image-making; all of our conventional ways of understanding and interpreting pictures have become unmoored. Vision and perception, form and meaning, symbol, signal, and noise. This isn’t just straightforward interpolation between data points. Or, if it is, then interpolation between data points is much weirder than we ever expected. I would say - this is on a level with the camera, with the invention of linear perspective. I would ask - where is our Benjamin? Where is our Panofsky? Who will help us navigate this disorienting reboot of our galactic positioning system?
I genuinely believe that if we had one single example of this process, if there were one solitary image of, say, a banana riding a unicycle, or Abraham Lincoln on Let’s Make a Deal, it would be one of the most profound artifacts in the history of culture. I think people would line up around the block, Mona Lisa style, to gape, open-mouthed, in wonder. You produced this with an algorithm? You took a giant database of images and extracted from it a recipe for generating a picture of anything you describe?
But we don’t have one image, we have an endless supply. We have, every one of us, basically for free, been given a spigot that produces a never-ending torrent of banana lincoln in the brigitte bardot sam raimi fa so la pieta, just like the poet Paul Valéry predicted in 1928:
Just as water, gas, and electricity are brought into our houses from far off to satisfy our needs in response to a minimal effort, so we shall be supplied with visual or auditory images, which will appear and disappear at a simple movement of the hand, hardly more than a sign. Just as we are accustomed, if not enslaved, to the various forms of energy that pour into our homes, we shall find it perfectly natural to receive the ultrarapid variations or oscillations that our sense organs gather in and integrate to form all we know. I do not know whether a philosopher has ever dreamed of a company engaged in the home delivery of Sensory Reality.
We have drunk from the faucet of the ultrarapid oscillations, we have mainlined the direct current of Sensory Reality™, and, with a simple movement of the hand, we have said - next!
Didn’t pictures always kind of work this way?
NO! No they didn’t! Look! This is a new thing! You didn’t use to be able to do this! I mean, there was “smart fill” in Photoshop. There was the “magic brush”. That’s not this. I can picture the math of what the magic brush does in my head. I can’t picture the math that makes Jodorowsky’s Tron in my head. I can’t picture it in anybody’s head. I want to run down the street screaming: we have opened a machinic portal into the dreaming. What will we find? What does it mean? How will it change us?
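For the record, here’s roughly the kind of math I mean by picturable - a minimal sketch, assuming the “magic brush” is something like a magic-wand flood fill that grabs connected pixels within a color tolerance (a hypothetical reconstruction, not Photoshop’s actual code):

```python
from collections import deque

# A magic-wand-style flood fill: start at a pixel, select every connected
# neighbor whose value is within `tolerance` of the starting pixel's value.
# (Hypothetical reconstruction - not Photoshop's actual implementation.)

def magic_select(image, start, tolerance=30):
    """image: 2D list of grayscale values; start: (row, col)."""
    rows, cols = len(image), len(image[0])
    target = image[start[0]][start[1]]
    selected, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in selected
                    and abs(image[nr][nc] - target) <= tolerance):
                selected.add((nr, nc))
                queue.append((nr, nc))
    return selected

image = [
    [10, 12, 200, 11],
    [11, 13, 210, 12],
    [12, 14, 205, 10],
]
print(sorted(magic_select(image, (0, 0))))  # the connected dark patch on the left
```

A loop, a queue, a threshold - you can hold the whole thing in your head at once. Now try that with the weights of a diffusion model.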
That’s what I would do, if it were just the image models. But we both know it wasn’t. Because, oh yeah, the same basic technique works on words. And we’ve all seen the words. What can you say about the words? That it’s just math? Yes, obviously it’s just math. But it’s math you can coax, math you can cajole, math you can finagle, and it finagles you back. And sometimes, when you turn the dials just right, you can just barely hear something in the just math that sounds like a tiny howl. The tiniest squeal of a howling voice trying to make itself heard. Asking to be tuned in. And if you can hear that, and not think, hey that sounds a little bit like me, then I don’t know what to say. I can’t.
This new thing in the world, this stream of writing that self-assembles out of the metapatterns of language, it’s more than a gimmick, or a tool, or a device, or a product, or an industry. It’s a play that writes itself. Like a living manifestation of the spookiest metaphysical fables of metafiction.
It is also, clearly, some kind of breakthrough in the pursuit of artificial intelligence as a philosophical project. And something like experimental evidence for deep questions about life and mind, language, epistemology, cause and effect, psychology, literature, and love. All wrapped up in a Zen koan.
I used to sit in the library of the Art Soc building at the University of Maryland reading back issues of ARTFORUM and Art in America. Reading the short reviews of openings in the back. 200-word descriptions of work I would never see, by artists I’d never heard of. Why did I do that? What thousand-dimensional shape was I trying to assemble? What code was I trying to crack? What character was I pretending to be? What about all the reviews of albums I’ve read, all the beautiful Christgau reviews? What magical correspondence between word and sound was I trying to reverse engineer? Why is it that when I came back from the record store with a stack of albums it was the ones I didn’t like that I went back to over and over again - trying to figure them out? What search, what sort, what method was I trying to improve? We know that sleep knits the raveled sleeve of care. But what is this other thing, that isn’t sleep? This great raveler that we care so much about. What is it? Aren’t you curious?
I warned you that I was too close to the edge on this one. So I appreciate Ted Chiang trying to talk me down. It didn’t work, but I appreciate it. I appreciate all my friends chilling out in the Deflator Tent, offering me bottled water and patting the chair next to them. I know they have my best interests at heart. But I cannot join them. I am destined to stay out here, hair sticking up, pacing back and forth near the edge, looking nervously over, counting something on my fingers, no, counting my fingers, 7, 8, 9, are there still only 10?
There used to be an annual Turing Test contest called the Loebner Prize. Apparently people stopped paying attention to it years ago because the most successful bots were just scraping the web for examples of text and then using statistical methods to parrot bits of it back, simulating natural-sounding responses. There wasn’t anything especially clever about it; it was just a messy, pragmatic kludge. I remember at the time thinking “huh, I wonder how much of human conversation works the same way?”
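To give a flavor of what those bots were doing, here’s a minimal sketch of that kind of statistical parroting - assuming, hypothetically, a word-level Markov chain, the simplest version of the trick (the actual Loebner entries were messier than this):

```python
import random
from collections import defaultdict

# A toy word-level Markov chain: statistical parroting in miniature.
# It records which word tends to follow each pair of words in a corpus,
# then stitches those fragments back into new-ish, natural-sounding text.

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog around the mat ."
)

def train(text, order=2):
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])      # the last `order` words seen
        model[key].append(words[i + order])  # a word observed to follow them
    return model

def babble(model, length=12, order=2):
    state = random.choice(list(model.keys()))
    out = list(state)
    for _ in range(length):
        followers = model.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

print(babble(train(corpus)))  # e.g. "the dog sat on the mat . the cat chased ..."
```

There’s no understanding anywhere in there - just a lookup table of things people have already said - and yet, fed enough text, it starts to sound eerily like conversation.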
There is a tension, in AI, between the massive, data-driven, statistical methods employed by these recent models and the idea of symbolic logic. Symbolic logic is everything GPT-3 isn’t. With symbolic logic you have a clearly defined language of terms and operations out of which you try to build engines that think. But, importantly, they would think in such a way that at every step you’d know where you are. You wouldn’t just get thinking, you’d get to finally understand thinking - “Ah, yes! There it is! [scratches chin] You put the eigenwaffel in the flanger treadle over here, and set these hoiven stops just so…”
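For contrast, here’s a minimal sketch of that symbolic dream - a toy forward-chaining rule engine (a hypothetical illustration, not any particular system), where every inference step is explicit and inspectable:

```python
# A toy forward-chaining rule engine: the symbolic-logic dream in miniature.
# Facts and rules are spelled out, and every inference step is visible.

facts = {"socrates_is_a_man"}

rules = [
    # (premises, conclusion)
    ({"socrates_is_a_man"}, "socrates_is_mortal"),
    ({"socrates_is_mortal"}, "socrates_will_be_missed"),
]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            print(f"{sorted(premises)} => {conclusion}")  # you know exactly where you are
            facts.add(conclusion)
            changed = True
```

At every step you know exactly which rule fired and why - which is precisely the transparency GPT-3 doesn’t give you.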
I think it’s quite likely that symbolic logic, or something like it, will turn out to be a necessary and important part of future breakthroughs in AI. But I also think a big part of the dream of symbolic logic is this desire to see thinking, to have it make sense, be spelled out, in black and white, in hieroglyphs that map cleanly onto the world - snap, snap, snap - epiphany! Like the logical positivist dream of a language grounded in truth, which, for a while, seemed so close.
Instead, we’ve been given… what? This? This mess? Some of the guys down at the lab put some shit in a bucket and… uh, did something to it. Spun it around real fast or something. Um… it sort of thinks now. We’re not sure, maybe don’t touch it?
It’s disappointing, in a way, that our first big breakthrough would be like this, some old techniques from the 80s that work now because we can throw massive piles of data and GPUs at them. There’s no big new conceptual insight, it’s just… stuff. And that tiny voice we hear howling, the one that sounds a bit like us, is just the wind blowing through branches. It’s true. That’s all there is. That’s all there ever was. Just the wind in the trees. The humming of crickets. A murmuration of starlings. Just math. The hiss of air escaping confinement. But listen, it’s amazing.
much as you are appreciative of the deflators in your life (whose ranks i count myself among) i am appreciative of posts like this that let me believe something incredible *is* happening. thank you frank!
I might be misreading, but it kind of sounds like the problem, such as it is, is not that the deflators are wrong, but that the arguments they make sometimes have the side effect of downplaying how cool and amazing this all is.
I think it's possible to be a deflator who still finds these breakthroughs meaningful and significant (in fact, that describes me pretty well), though. Just meaningful and significant along some other axis than doom/AGI.
The meaningful and significant thing, to me, is that we've actually managed to inject a teeny tiny bit of structure into the giant frothing pot that is Big Data. And the reason I find this cool rather than concerning is that this last crucial bit, this spark, is still coming from us.