Thanks to everyone for all the kind words and support on my first week. It means a lot.
In other news, I wrote a piece for The Hill yesterday arguing for federally funded AI research facilities (data centers). This is an area where I think a small amount of government spending can yield large returns.
Today’s post is a little more philosophical than what I usually expect to write. Still, though, the questions explored below guide my overall thinking on AI and policy, so I think it’s useful to start here.
Though the heat has dissipated a bit, AI risk is still front of mind for many policymakers, AI researchers, journalists, and attentive citizens. Eliezer Yudkowsky, a wide-ranging autodidact, rationalist, and longtime AI commentator, has said that “the most likely result” of continuing AI research is that “literally every human on Earth will die.” Senator Mitt Romney said in 2023 that “I’m in the camp of being more terrified about AI than I am of the camp of those thinking this is going to make everything better for the world.” Sam Altman, CEO of OpenAI, has said that the worst case for AI is “lights out for all of us,” though it’s important to note that he’s also said such a scenario is highly unlikely.
I can’t predict the future, so I won’t say these eminent figures are flat-out wrong. But I do think the AI safety discourse can benefit from some grounding. Thought experiments are tempting in this field, as are nonsensical extrapolations and metaphors stretched past the point of usefulness. Because we don’t robustly understand the nature of intelligence itself, discussions can quickly become nebulous or abstract. (For instance, many AI worriers describe systems with qualities such as infinite or superhuman persuasion—does this really exist?) Details matter, and it’s important to stay grounded in the facts as we understand them. Let’s dive in.
There are several categories of risk, often broadly dichotomized as “existential risk” (X-risk) and “near-term risk.” The near-term risks are important and require close attention. Indeed, I do not want to suggest that there is no danger with AI—quite the contrary. But I’ll start with the elephant in the room: whether AI is going to doom humanity.
The scary stuff
First, let’s cover the scary stuff. The stated goal of the top AI labs is to make systems that are “superintelligent.” This is envisioned as a step change over even the smartest humans, similar to the leap that humans made over chimpanzees and other hominids. When Homo sapiens made this leap, we quickly became the apex predator everywhere we went, leading to mass death of other species. The birth of superintelligence, many reason, could result in a similar evolutionary outcome for humans. So, how’s that superintelligence project coming along?
Deep learning, the broad approach underlying virtually all successful AI models today, has improved rapidly in the past decade, and we don’t really understand how it works. For simplicity’s sake I’ll focus here on large language models (LLMs). In broad strokes, an LLM like GPT-4 (which underlies OpenAI’s ChatGPT) uses matrices and linear algebra to build a statistical “map,” encoded in hundreds of billions of learned parameters, of most of the text ever written by humanity. We know a great deal about how to improve model efficiency and performance, improve datasets, and enhance capabilities through prompt engineering. But we lack a grand theory of what makes it all work. It’s similar to how we harnessed steam as a source of energy long before we understood much about the science of thermodynamics.
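To make that concrete, here is a deliberately toy sketch, in Python, of what “predicting the next word from a statistical map of text” means. It bears no resemblance to GPT-4’s actual architecture; the little counting table below stands in for what a real LLM encodes in hundreds of billions of learned parameters.

```python
# A toy illustration (nothing like how GPT-4 actually works) of the core idea
# behind language modeling: learn a statistical map from context to next token,
# then generate text by sampling from that map.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each one-word context.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    words, weights = zip(*counts[prev].items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation, one predicted word at a time.
word, output = "the", ["the"]
for _ in range(8):
    word = sample_next(word)
    output.append(word)
print(" ".join(output))
```

The toy version above is fully inspectable; the trouble is that the real thing is not.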
Ambitious efforts have been made to further our understanding of these systems (a field known as interpretability), and important advances have been made in just the past few months. But those advances have lagged the capabilities of frontier AI models, and we are nowhere close to understanding the inner workings of something like GPT-4. That means we don’t understand why it “lies,” why it sometimes memorizes the text it is trained on (the basis of the New York Times’ recent lawsuit against OpenAI), or how to ensure it is robust against attacks.
Crucially, this also means that it is challenging to predict the specific capabilities that next-generation models will have. Large neural networks are, in humans and in silicon, what the economist Friedrich von Hayek called spontaneous orders—systems which have genuine order but which no one consciously designed.
We know that large language models exhibit emergent capabilities. GPT-4 can draw rudimentary figures based on its understanding of the world derived from text alone. LLMs seem to have genuine world models, in the sense that they build an inner map of the world and the relationships within it. An early predecessor to ChatGPT, trained only to predict the next character of Amazon product reviews, ended up creating—for itself—a then state-of-the-art sentiment analysis system. Though no one told it to do so or imagined that it would, understanding the emotional content of the product reviews turned out to be useful for the model.
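For readers curious how researchers check for this kind of internal representation, here is a hedged sketch of “linear probing,” the style of analysis behind results like the sentiment example. The arrays are random stand-ins of my own invention; in a real probe, the hidden states would be extracted from the language model and paired with human labels.

```python
# A sketch of linear probing: if a simple linear classifier can read a property
# (here, review sentiment) straight out of a model's hidden activations, the
# model has learned to represent that property, even though it was only ever
# trained to predict the next character.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_reviews, hidden_dim = 1000, 512

# Random stand-ins; a real probe would use activations from the model itself.
hidden_states = rng.normal(size=(n_reviews, hidden_dim))
labels = rng.integers(0, 2, size=n_reviews)  # 1 = positive review

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out probe accuracy:", probe.score(X_test, y_test))
# Roughly 0.5 (chance) on these random stand-ins; accuracy well above chance
# on real activations would indicate sentiment is encoded internally.
```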
So, we know that LLMs can develop capabilities the creators didn’t anticipate and that we don’t understand why or how that happens. We know that LLMs represent their training data in fantastically complicated ways that we do not know how to unravel. We also know that an LLM’s internal activities can be isomorphic to (having the same structure as) human language processing (h/t the wonderful Sam Hammond).
There is some speculation that LLMs may hit a wall—it’s happened many times before in AI. Google’s latest frontier model, Gemini, was reportedly trained on five times the computing power of GPT-4 yet is only marginally better on standard performance benchmarks. Perhaps this will lead to a natural pause, where the frenetic race toward ever-bigger models slows down and the only way to improve LLM reliability and usability is to make advances in understanding exactly what’s going on under the hood.
But probably not. New architectures, such as state space models (SSMs), are now coming into view. Without going into too much technical detail (check out Nathan Labenz’s fantastic podcast episode about them for a deep dive), these models may give LLMs something like a long-term memory. Right now, every interaction you have with a language model, as far as the model knows, is the first time it has spoken with anyone. SSMs promise longer-term memory by maintaining a persistent internal state that the model updates as it goes, in effect a running summary of everything it has processed so far. Where this may take us, no one knows—but it starts to sound a lot like consciousness (another concept we struggle to fully grasp).
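For the technically curious, the core mechanism can be sketched in a few lines. This is a minimal illustration of the linear recurrence at the heart of state space models, with tiny random matrices standing in for what real SSMs such as Mamba learn (and with all of their additional machinery omitted). The point is the persistent state, which carries a compressed summary of everything the model has seen so far.

```python
# Minimal sketch of a linear state space recurrence. The matrices here are
# random stand-ins; real SSMs learn them and add further machinery on top.
import numpy as np

rng = np.random.default_rng(0)
d_state, d_input = 8, 4

A = rng.normal(scale=0.3, size=(d_state, d_state))  # state transition
B = rng.normal(size=(d_state, d_input))             # input projection
C = rng.normal(size=(d_input, d_state))             # output projection

def ssm_scan(inputs):
    """Run the recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(d_state)
    outputs = []
    for x in inputs:
        h = A @ h + B @ x      # the state persists and accumulates context
        outputs.append(C @ h)  # emit an output at each step
    return np.stack(outputs)

sequence = rng.normal(size=(100, d_input))  # a stand-in input sequence
print(ssm_scan(sequence).shape)             # (100, 4)
```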
Other approaches seek to model the space of all possible actions given a task like “grow my Substack audience” or “design a supervirus.” They employ a search algorithm that can find the best actions to take like a lightning bolt finds the spire of a skyscraper (this may be what OpenAI’s rumored “Q*” breakthrough is driving at). And just last month, DeepMind showed that LLMs, with some other technical infrastructure, can successfully search through vast problem spaces to discover new mathematics. Such a system could also be used to, say, automate the process of AI research, resulting in recursively self-improving systems. Now imagine some gnarly combination of all these things in one system. Yikes.
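To give a flavor of what searching a space of possible actions looks like, here is a highly simplified beam-search sketch. The action list and scoring function are toy stand-ins of my own invention; in the systems described above, the evaluator would be a learned model rather than a hand-written heuristic, and the action space would be vastly larger.

```python
# A simplified beam search over candidate plans: expand plans step by step
# and keep only the highest-scoring ones at each round.
import heapq

ACTIONS = ["write post", "share on X", "email newsletter", "collaborate"]

def score(plan):
    """Toy stand-in for a learned evaluator: prefer varied, longer plans."""
    return len(set(plan)) + 0.1 * len(plan)

def beam_search(depth=3, beam_width=2):
    beam = [[]]  # start with the empty plan
    for _ in range(depth):
        candidates = [plan + [a] for plan in beam for a in ACTIONS]
        # Keep only the top-scoring candidate plans for the next round.
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return beam[0]

print(beam_search())
```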
Nonetheless, I do not lose sleep over the idea that such a thing would want to, or be able to, “doom” humanity.
The good news
First, and most fundamentally, I believe the AI worriers overrate the concept of intelligence. Many of them are quite smart themselves, so it’s an understandable bias. There’s no reason to think that humans are nature’s upper bound on any capability we have: We have already created machines that can move faster than us, are stronger than us, and indeed, are smarter than us in some ways. I don’t dispute the notion that AI systems that are superior, or at least comparable, to humans in yet more dimensions are coming soon.
One of the biggest areas of focus within the top AI labs right now is planning. LLMs may be able to describe plans, but don’t be fooled: they can’t actually develop detailed, feasible plans and execute them successfully. The most successful planning AI system is probably still DeepMind’s 2017 AlphaZero, which exhibited superhuman performance at complex board games like Go. Go was considered a particular challenge because of the vast problem space: there are, famously, more possible configurations of a Go board than there are atoms in the observable universe.
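The arithmetic behind that claim is simple enough to check on the back of an envelope (or in a few lines of Python):

```python
# Each of Go's 361 points can be empty, black, or white, giving an upper bound
# of 3**361 raw configurations (the number of *legal* positions is smaller,
# roughly 2 x 10**170, but the comparison still holds).
from math import log10

configurations = 3 ** 361
atoms_in_observable_universe = 10 ** 80  # common order-of-magnitude estimate

print(f"~10^{log10(configurations):.0f} configurations vs ~10^80 atoms")
```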
One way to think about general-purpose AI superintelligence would be: what if you could have an AlphaZero for everything? But there are obvious differences between Go and life. First, however many configurations the board can take, the state of the game is perfectly known at every moment of play. Second, the number of potential moves is bounded by clear and unchanging rules.
Neither of these things is true in the real world. In the real world, the state of the “board” is never fully known, and if there are rules at all, they’re certainly subject to change. For any complex set of actions affecting many people, the space of possible outcomes at every step becomes nearly infinite, and there are few hard rules to guide the search for the best action.
It’s not just us pesky humans who cause this problem. Nature is fundamentally probabilistic, and the arrow of time always points in one direction: forward. This predicament is the foundation of the human condition and, I suspect, of the universe itself. I am skeptical that the answer is simply “make a better algorithm” or “add more compute.” I am skeptical that there is any answer at all.
Put somewhat more prosaically, Hayek’s knowledge problem will persist. People who think it won’t—who believe that AGI or superintelligence will simply solve the knowledge problem—haven’t thought enough about the knowledge problem. Many of the AI worriers never took the knowledge problem seriously to begin with—after all, the seeds of this intellectual movement are in the online rationalist community.
Many AI worriers will respond to this argument by saying that I am simply dismissing the concept of superintelligence. I am not. I am simply saying a) that I do not believe humans can create God (for reasons mathematical, thermodynamical, and theological), and b) that superintelligent AI will encounter the same problems that flummox all humans, including the geniuses: highly imperfect information, uncertainty about the future, and scarce resources.
The world isn’t like a video game, where the entity with the greatest “intelligence” simply wins by brute force. Nor is it like Go, with clear, unchanging rules. The world isn’t even like it was in prehistoric times, when humans conquered the earth. Because of human ingenuity and intelligence, we have built vast industrial, legal, social, and economic apparatuses that govern the resources we care about most. To achieve a feat like conquering the world, a superintelligent AI would have to contend with all of that. Physically harming humans requires command of physical tools—drones, explosives, guns, viruses, bacteria, and the resources required to make those tools. At some point in the process of gaining such control, humans would notice. Smart as they may be, AI systems run on computers—vast networks of computers in centralized data centers. We could destroy those data centers, if need be, in ways not much more sophisticated than those we used to wipe out our hominid cousins. And we would probably be more sophisticated than that, because we would have our own AI to fight this hypothetical superintelligent AI.
But I don’t think it will come to that, or anywhere particularly close. Humans are prone to violence because we evolved under do-or-die evolutionary pressure. AI, on the other hand, will be subject to an altogether different kind of evolutionary pressure: the pressure of the marketplace. Humans will not choose to use unreliable or unsafe AI in situations where reputations and money are on the line. Behind all the hype, businesses are struggling to use current LLMs because of their propensity to hallucinate and otherwise act in unexpected ways. These systems are only valuable to the business world insofar as they are both powerful and predictable.
The entire premise that an ultra-high-intellect AI will necessarily want to dominate the world is faulty. Indeed, even within humans, I’m not sure that intelligence predicts power-seeking or violent behavior. As Yann LeCun, the Chief AI Scientist at Meta, has pointed out, we don’t see the smartest people greedily amassing power and resources at scale, and the world isn’t run uniformly by the very smartest people.
Then there is the simple fact that extraordinary claims require extraordinary evidence. The AI doom scenario has nothing of the sort. It is not based on any empirical observations, nor on any robust theories for how AI would go awry in the ways doomers posit. If any such theories do exist, I would encourage readers to send them my way. If convincing evidence is put forward, I’ll gladly change my views on AI—as would almost everyone. But it would be unwise, to say the very least, to halt development of one of our most promising technologies based purely on speculation.
I don’t dismiss the AI doom arguments completely; I just see it as extremely unlikely. With that said, the facts I outlined above are what they are: We don’t fully understand how large AI models work, and we may be on the cusp of giving AI the ability to recursively improve itself. They call it “frontier AI” for a reason—this is new turf. For that reason, I wrote papers over the summer advocating for the federal government to ensure it knows the locations, personnel, and broad activities of the very largest AI data centers. I was happy to see a similar provision in the Biden administration’s October executive order (though I hope the Commerce Department will raise the compute threshold over time). Though the risk is minuscule, we want the federal government to be able to act quickly if something truly disastrous happens.
Along these same lines, many AI doomers advocate for a domestic or global “kill switch” that would allow the government to shut off AI. Beyond the obvious implementation questions (Which governments? In the US, which branch of government? Is that constitutional? How would such a thing technically work?), I oppose these proposals. For one, a simple kill switch would be subject to political or diplomatic pressure. If the government really wants to take such a drastic step, it should be willing to exercise its monopoly on violence to do so. Furthermore, even if it were possible, such an extreme level of centralized control would be worrisome on its own merits. Any central capability possessed by a government can be hacked. Centralization, as the 21st century has shown us many times, is often a source of weakness, even though it feels like strength.
Indeed, the only remotely plausible AI doom scenario I can imagine is one where AI research is cloistered inside a centralized institution (and thus not subject to market pressures) and powerful AI is not broadly dispersed (so humans have no way to fight back). Ironically, this is the exact regime many in the AI doom camp advocate. Ambient, vague anxiety about the future, and the actions we take in the present to combat that imagined future, have a way of bringing about the exact outcomes we seek to avoid. A recent paper found that it’s possible to get better performance out of a model by telling it to “take a deep breath and think about this problem step-by-step.” We would be wise to do the same.
Thanks for writing this. Sometimes it's a little unclear which doomers exactly you're responding to, but I suspect that you actually don't disagree with at least many of the most reasonable people worried about AI x-risk (e.g. Bengio, Russell, etc.).
This kind of level-headed engagement and dialogue is really valuable. If you were interested in writing more on this it could be helpful to see you respond to a particular piece. For example you write "many AI doomers advocate for a domestic or global “kill switch”" but without a link to a source or proposal I'm not sure you're actually arguing against a proposal anyone serious has really put forward (I could be wrong here, but I don't recognize this as a proposal I've heard my colleagues discuss).
Instead I'd love to read your thoughts on the governance measures Bengio et al. write about in their latest Science piece (https://www.science.org/doi/10.1126/science.adn0117). These include whistleblower protection, incident reporting, model registration, external audits, liability, "if-then" and responsible scaling policies - along with flexibility to strengthen or loosen regulations as AI advances.
There are a few other points where I feel like, by responding to an undefined doomer case, you end up taking down a strawman. For example: "The entire premise that an ultra-high-intellect AI will necessarily want to dominate the world is faulty." The premise isn't that intelligence will *lead to* a takeover urge. The premise is instead that an AI could have faulty or harmful goals *and* be extremely intelligent, allowing it to potentially escape human control and do massive societal damage. Stuart Armstrong has a short article on this, the Orthogonality Thesis, which might be of interest. Though I'll also say I don't think many "modern" AI risk arguments rely on this much, at least not in a very strict sense.
Finally, you write: "I don’t dismiss the AI doom arguments completely; I just see it as extremely unlikely". Again many (though certainly not all) AI risk researchers would agree with this, depending on what you mean by extremely. It seems to me, though, that once one assigns any credence to AI doom arguments, it's difficult to assign extremely low credence to them (say, less than 1 in 10,000 or something). Very small probabilities imply very high confidence, and as you say here, we just don't have enough data or certainty to be that confident here. But this leads pretty naturally to a strong argument for lots of caution and scrupulousness in future AI developments. There's lots to gain, to be sure, but to me it seems fine to do lots of risk assessment and testing as we advance to ensure we realise those gains without losing everything. Realising the benefits of advanced AI slightly later, after we develop robust technical and legal safety frameworks, in order to avoid a 1 in 10,000 (or something) chance of losing control of our future seems sensible to me.
Interesting piece thank you.
I tend to agree. There are so many wild assumptions baked into doomsday scenarios. One of the major ones is that AI would be motivated to kill us for some reason. Seems to be an assumption it would go sentient and be subject to the same biological/evolutionary pressures and resulting emotions/motivations as biological creatures.
It is a really strange assumption. We just do not know what creates sentience and consciousness and don’t have any good leads to find out. Could be the first transistor calculator in 1954 was conscious in some way. But that still doesn’t give it the motivation (or ability) to destroy all humans.
And the ability bit is really key imo. If we give control of the nuclear button to AI, it is obviously a threat to all humanity if it goes wrong for any reason. So maybe don’t do that. And the AI drone army — maybe don’t do that either. For AI to defeat humanity, it needs a mechanism. And that is where I think restrictive controls might be mandated.