29 Comments
Thomas Larsen

I find this post poorly argued.

I think this whole picture comes from not thinking about what the world looks like after superintelligence. It doesn't sound like you are imagining:

- billions or trillions of robots doing all physical labour

- all economically relevant cognitive tasks performed by humans are now performed by AIs

- all of this is happening so quickly that humans are not able to keep up with the pace of discovery

You argue for limits on intelligence, which might be correct. But the specific limits you talk about -- like "Predicting the locations of all the lakes and rivers on earth in 50 years" -- are much, much higher bars than the capabilities discussed above. And you obviously don't need an insane level of prescience of the type you describe to take over the world.

For example, the takeover story in AI 2027 didn't rely on any vastly superhuman capabilities; it only relied on basically normal politicking, industrial buildouts, and bioweapons. More generally, AI takeover / extinction arguments don't rely on anything crazy like that.

So the real claim you are (implicitly) arguing for is that AI capabilities will hit a wall before being able to automate the whole economy. I agree that AI takeover/extinction seems unlikely if the AIs cannot do all economically relevant tasks (including, e.g. extremely complicated economically relevant tasks like building new EUV machines), largely because AIs won't be able to be self-sufficient afterwards. But humans somehow manage to build new EUV machines (and operate the rest of the economy). So this is a vastly lower capability bar than the examples you argue against in the post.

I think you should more directly try to think about exactly how far AI capabilities will go. Start with the automation of all economically relevant tasks, and try to make specific claims about what tasks you think AIs won't be able to do. Then, once AIs cross the supposed walls, I hope you'll update towards my position.

Kevin Thuot

I appreciate the thinking behind this post, but I agree that Dean seems to be straw-manning the opposing view.

In addition to what Thomas points out above, I don’t think it will be that hard for an ASI to pay or coerce humans to do the tasks that it can’t yet do for itself.

How many people would sign up for team AI if they were being offered $1 million or $10 million a year?

Karl von Wendt

Some thoughts:

On "Techno-Optimism": Like you, I believe technology has been strongly net-positive for humanity. At the same time, (almost) every new technology can cause damage that clearly outweighs its potential benefits: A knife can cut bread or kill a person. The reason why knives (and technology in general) are still net-positive is that we KNOW and CARE about their risks: we keep them out of the hands of children, don't allow them on board of planes, etc. With powerful AI, I think most decision makers either don't know or don't care sufficiently about the risks.

On the limits of intelligence: To become uncontrollable and destroy our future, an AI would have to be neither all-knowing nor almighty. All it needs to be is a superhuman manipulator that is able to take over power the same way a human dictator would, only 100x more effective. It could then make sure that we'll not get in the way of whatever goal it pursues, which is very likely not a goal we would want it to pursue. It will likely not destroy us on purpose, just as we don't destroy rhinos on purpose. We just change the world in a way that suits our goals, not theirs.

On alignment: Regardless of how easy or hard problems 2 and 3 may be to solve, as long as problem 1 is not solved, we must not risk creating something that could get out of control. This does not mean we can't continue developing powerful AI - there are many safe ways to use narrow superhuman tool-AI or sub-human general-purpose AI that, in combination, I believe can help us achieve almost anything a superintelligent AGI could do for us.

The Ancient Geek

> To become uncontrollable and destroy our future, an AI would have to be neither all-knowing nor almighty. All it needs to be is a superhuman manipulator that is able to take over power the same way a human dictator would, only 100x more effective

Where does the motivation come from?

Karl von Wendt

Seeking power is an "instrumental" subgoal: whatever goal an AI pursues, it can't reach it if it is turned off, and it can reach it more easily if it has more power, which also helps it prevent being turned off. Therefore, any goal-pursuing AI will seek power. In tests, current AIs already show this kind of power-seeking behavior. As an analogy, almost all humans want more money because money makes it easier to reach whatever goals they pursue in life - money is power.

Scott S

This is a perfectly nice essay on not being a Yudkowskian “FOOMer”. But a worldview defined in opposition to the holes in that scenario has completely failed both at imagining other futures and at keeping abreast of the actual contours of risk as the shape of this technology has started to materialize. Relegating those to a throwaway sentence about how it “might very well be catastrophic” is willful blindness that completely misses the point.

If a hard takeoff happens, it doesn’t matter if it happened over 10 hours or 10 years. It doesn’t matter if intelligence isn’t the ultimate bottleneck, if we’re still dwarfed by machine capabilities. It doesn’t matter that the exact course of a river is computationally irreducible, to us who are living in a flood plain.

I agree that alignment is likely to ultimately be a philosophical and political problem rather than a technical one. I fail to see how that’s any justification for continuing down this path whatsoever.

Simon Lermen

I strongly disagree with your view, and I think you make a couple of invalid leaps to arrive there. You repeatedly imply that AI must be omnipotent or omniscient to wipe us out, and then explain why that won't be the case. That is, however, not actually required; it just needs to be smarter and better than us. Importantly, it doesn't actually take superintelligence to wipe out or disempower humanity. For me to imagine this, I simply need to think forward to the not-so-distant future. Imagine you get a tiger cub, think forward to what the tiger will look like in a year, and ask yourself: could it kill me in a year? Now do this with AI: imagine the future with a billion robots, AI running the military, AI doing basically all jobs with perhaps some level of human oversight. AI running the media, biolabs, political and military decisions, critical infrastructure. That metaphorical tiger could kill us. If your response is "but there will be many AIs and there will be human monitoring, so we'll be safe," then you have shifted to a different (and very false) argument. The point is that clearly AI will be able to take over in the future if we haven't aligned it well by then. In reality it probably won't take that long to largely automate all jobs and tasks, since it's enough for the AI to get some combination of: securing power, being able to act in the physical world, and getting rid of or sidelining humans.

Felix Choussat

"Once I realized this, the stakes of regulation were set in stark relief. Of course government cannot assume control over the development of this technology or over the firms that develop it. Of course government cannot be the ones who, in any substantive fashion, determine what constitutes “alignment” and what does not. Indeed, given how essential I expect AI systems to become to the lives and even self-expression of all humans, it is hard for me to imagine anything less American."

I find this perspective difficult to square with extremely capable AIs, such that (a) the models threaten the government's monopoly on violence/the basis of civil society through their ability to cheaply produce weapons of mass destruction, or (b) the models are extremely charismatic and persuasive, such that their values strongly shape the people they interact with.

Mind the Gap

The third alignment problem is where this gets genuinely hard, and I think you've framed it better than most. If alignment is an expressive act, and government cannot be the one to determine what "correct" alignment looks like, then the question becomes: what architecture prevents capture by a dominant lab, a dominant ideology, or a dominant national body setting international definitions?

The urgency of that question deserves more weight than it usually gets, including perhaps here. Malicious use, labor disruption, even catastrophic misuse are serious, but those feel reconcilable within existing technocratic frameworks. What's harder to reconcile is alignment's role in the higher risk tier: authoritarian drift, geopolitical asymmetry and terrorism capacity-building that doesn't require superintelligence to be catastrophic. These are coordination failures that make "align to whom" not just a philosophical question but a structural one with a closing window.

I've been working on this from the industry-coordination side, specifically how you build a neutral convener structure that enables the private sector to better answer "align to whom" before someone else does. Standards processes are where pluralism actually gets operationalized or destroyed: a standard that specifies structure leaves values questions open; a standard that specifies outcomes answers your third problem by fiat. One lesson learned leading industry coalitions through CMS-sponsored healthcare reform: competitors will coordinate around standards only when the alternative is a pending government mandate. The same dynamic is available in AI governance, but only if someone builds the right architecture before the window closes.

Looking forward to seeing what you're building. The people thinking about the political dimension of alignment are still a small group.

Amy Studdart

As ever, another excellent piece, most of which I agree with. Listing everything I agree with wouldn't make for a useful comment, however, so here's my quibble.

I think you're missing a core concept from your third alignment description: legitimacy.

Good politics results in a number of different good outcomes beyond the ones you list: accountability, checks and balances, less of a boom and bust trajectory for progress. It also results in the consent of the majority, which is not only a good in and of itself but essential to the stability that results in all of the other things that humans need in order to live fulfilling lives.

My biggest concern about the moment we're in right now is that we don't have good politics. There is no path to a political contest that results in a government with a mandate and the legitimacy to start building the classically liberal future you describe (and which I want desperately for myself and my children). And so, to me, the most urgent question in 2026 is how do we create it?

(A smaller quibble: in my worst case scenarios, AI doesn't have to take control of the physical world directly or even with intent in order to have a catastrophic impact on the world/humanity... if the impact of social media is anything to go by, all it needs to do is change the information environment in such a way that a small group of humans discover the motivation, skills, and knowledge to destroy the world themselves).

Hollis Robbins

This is terrific. I've been asked to write a job description for an "AI Czar" at a flagship university (not my own) and your piece gets at why most job descriptions fail: not distinguishing between intelligence and knowledge, not addressing the distributed knowledge problem, and not seeing that alignment is a philosophical and social issue. So, thank you, I'll be citing you and sending this around.

Hollis Robbins

Here it is -- I link to you at "Making execution cheaper does not necessarily improve knowledge production." https://hollisrobbinsanecdotal.substack.com/p/what-should-a-university-ai-czar

Kevin Thuot

I really appreciate Dean's thinking on this topic; there's not enough discussion in this vein.

However, one other weak point in the piece's argument that I want to press on is the analogy of dropping a 500 IQ baby into ancient Greece. The piece says that, as an adult, that baby would have a negligible impact on history/progress, implying that ASI would be similar.

I think this is a strawman argument, because it doesn't appreciate the inherent cognitive advantages ASI would have.

The better analogy might be: drop 10,000,000 500 IQ babies into ancient Greece and let them work together for the equivalent of 100,000 years. Would that have had a large impact on history?

The key disconnect is that Dean seems to be assuming that there will be a firewall between what the ASIs can think up in the cloud and what can happen in the real world. If that supposed firewall doesn't hold, it seems to me that whole line of reasoning collapses.

Stellan72

Also, even a single 500 IQ baby in ancient Greece would imo have at least a solid shot at reshaping history, especially if they encountered the philosophical schools pushing proto-empiricism!

Odds of ultimate irrelevance do seem nontrivial if born into a social position along the lines of serf or chattel slave. "Sometimes you just lose," no matter how clever you are, if your starting situation is grim enough.

Ben Schulz

Paul Christiano's prediction of gradual disempowerment seems much more likely. The debate between Yudkowsky and him was interesting. Computational irreducibility doesn't mean an AI couldn't build the future it wants. The best way to predict the future is by molding it.

Substack Joe

Great article. That point about irreducibility is important and well taken. My mind goes to Korzybski on that point (“the map is not the territory”), but I have also heard good arguments from economics and psychology about tacit knowledge. If the point emerges from different disciplines, I tend to think there is something solid there.

Would love if you had a mailbag mechanism, but will offer here given it is in the news: any reflections in light of the Meta verdict?

I’ve taken your position to be that the legal system is the way industry norms should emerge and harms will be addressed in the AI world. Obviously not a 1-1 with AI, but any signal there in your mind?

Lane Boland

There is a lot in this post that I agree with, but you lost me with both the baby and the 50-year planetary prediction analogies. The baby analogy doesn't scale... we don't even remotely know the order of magnitude of how many of those babies there would be or how quickly they work... if 10,000,000 of those babies are doing 1,000 years of work every year, I suspect they would change the world quite a bit and may very well "invent all modern science" even if they did so very inefficiently.
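To put a rough number on that scale (a back-of-envelope sketch in Python; the ~8 billion world-population figure used for comparison is my own assumption, not something from the post):

```python
# Hypothetical scale of the thought experiment: 10,000,000 babies,
# each doing 1,000 years of cognitive work per calendar year.
babies = 10_000_000
years_of_work_each_per_year = 1_000
hypothetical_output = babies * years_of_work_each_per_year  # person-years of thought per year

# For comparison: roughly 8 billion humans alive today contribute
# at most ~8 billion person-years of thought per calendar year.
world_population = 8_000_000_000

print(f"{hypothetical_output:,} person-years per year")                      # 10,000,000,000
print(f"{hypothetical_output / world_population:.2f}x today's whole population")  # ~1.25x
```

Even at that crude level, the hypothetical swarm out-thinks the entire present-day human population every single year.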

The 50-year planetary development prediction is a straw man argument. Everything humans have built to this point has been done without great insight into exactly how events in the future will develop. Planning involves assumptions that are then tested, validated, and used to revise the plan. Perfect knowledge is not required or achievable, but adaptability is. What the weather will be on my birthday next year may be computationally irreducible. But I don't need to know whether it will rain on my birthday to know that if it might, I'll bring an umbrella.

hwold

> I would submit that there is no computational process which can arrive at the end of this natural process faster than nature itself. In other words, there is no pattern or abstraction you can create that allows you to speed ahead to the end of the process, and thus there is no amount of intelligence that gets you to the correct solution faster than nature on its own. You just have to wait the 50 years to find out. This is what the scientist Stephen Wolfram describes as “computational irreducibility.”

And Stephen Wolfram also says: "every irreducible computational process has some high-level regularities". He also believes this is an unavoidable feature of computational irreducibility (see his interview on Curt Jaimungal's channel).

In other words: the economy is chaotic. You can't predict tomorrow's stock market prices. There are still things like the Law of Supply and Demand, not-completely-useless macroeconomic theories (okay, that one is debatable), and useful concepts like NPV. You could have predicted: "if we go to war with Iran and they retaliate by closing the Strait of Hormuz, it would be a bloodbath in the stock markets".

"I don't see what will happen in detail" is true, and is not incompatible with "I have a good idea of what will happen, in the big picture". You yourself agree with that. You say "I can't draw a map of what the mountains and seas and rivers will be after a few geological epochs". But you are still pretty sure there will be mountains and seas and rivers. You don't know what the stock market prices will be tomorrow. You know that, barring some kind of apocalypse, the S&P 500 10 years from now is overwhelmingly likely to be significantly higher than today. You can even extrapolate long-run trends and land on a not-completely-crazy estimate of where the price will be.

And that's the whole sum of what Yudkowsky is saying. He does not pretend to have a crystal ball. He looks at the "macro" law: "the future is decided by the most intelligent kind of entity around, as we can see with us vs wolves and bears". And he makes a "macro" prediction: "right now that is us; if we build a more intelligent thing, it will be that thing; if that thing does not care about us, then we have no place in the future, aka dead. Like polar bears soon, if we continue to decide we don't care about climate change". This is absolutely no different in nature from "if we go to war with Iran and they retaliate by closing the Strait of Hormuz, it would be a bloodbath in the stock markets" or "if supply goes down, price goes up". It also seems, to me, a perfectly obvious "natural law" and a hard-to-avoid inference. I don't understand how one can disagree with that, even after reading your piece.
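A minimal sketch of that "irreducible in the details, regular in the aggregate" point, using the logistic map as a stand-in for a chaotic process (my choice of toy example, not one the post or Wolfram uses):

```python
# Logistic map x_{n+1} = 4 * x * (1 - x): fully chaotic, so there is no
# shortcut -- to know step n you effectively have to run all n steps.
def trajectory(x0, steps):
    xs = []
    x = x0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
        xs.append(x)
    return xs

a = trajectory(0.200000, 100_000)
b = trajectory(0.200001, 100_000)  # almost identical starting point

# The detail is unpredictable: the two trajectories decorrelate completely.
print(abs(a[60] - b[60]))  # typically order 0.1-1, despite a 1e-6 difference at the start

# But a high-level regularity survives: long-run averages agree, because both
# trajectories sample the same invariant distribution (mean ~0.5).
print(sum(a) / len(a), sum(b) / len(b))  # both ~0.5
```

The detailed path is computationally irreducible, yet the aggregate statistic is predictable -- which is the shape of the "macro law" argument above.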

> The way we build better models of the world does not usually resemble “thinking about the problem really hard.” Generally it involves testing ideas and seeing if they work in the real world.

I don’t understand why you keep hammering this like it is a serious point of disagreement.

I agree with this.

Yudkowsky agrees with this.

Everyone agrees with this.

Except for a few ultra-Cartesians that I'm not sure exist, I don't think anyone has, ever, claimed that you can have a model of the world without looking at the world. Yudkowsky has *explicitly written the opposite of this*. He put it prominently in his "manifesto" (https://www.yudkowsky.net/rational/virtues): "For you cannot make a true map of a city by sitting in your bedroom with your eyes shut and drawing lines upon paper according to impulse. You must walk through the city and draw lines on paper that correspond to what you see."

Frankly, I think it is disingenuous and insulting to go around pretending that "this is what doomers think". This is a pure straw man. We all know you are better. Please do better.

Allow me to be cheeky. That "macro-law" -- "the future is decided by the most intelligent kind of entity around, as we can see with us vs wolves and bears" -- comes from looking at the world and making an empirical observation. By thinking very hard and without looking at the real world, you could convince yourself that "the world is too chaotic for a species to achieve ecological dominance from just increased intelligence", which seems to be the main intuition you got from thinking very hard. Please stop "thinking really hard" and start looking at the world. The experiment has been done (a few hundred thousand years ago if you think human intelligence is mostly biological, 10,000 years ago if you think it's mostly cultural). The results are out. An intelligence differential is enough for ecological dominance.

> There is information that exists within a firm like Taiwan Semiconductor Manufacturing Corporation that is, first of all, not only unavailable on the internet but literally against Taiwanese law to make public

That does not make that knowledge impossible to rediscover, on multiple levels (from public knowledge, by developing alternate approaches, by bribing the right person).

> The implicit, and sometimes even explicit, argument of “the doomers” is that intelligence is the sole bottleneck on capability (because any other bottlenecks can be resolved with more intelligence)

You had a nice definition of intelligence, earlier, that says exactly the same thing. Now it's "the doomers" who say it, and they are all wrong.

Being more sample efficient means that you can alleviate the data bottleneck. That is what it means to be sample efficient: to go further with fewer samples. I can't believe I have to write this down.

"Being able to alleviate a lack of samples with more intelligence" is what "doomers" say *and* what you say with your definition of intelligence. Somewhere along the way you decided to Russell-conjugate that? "I believe in intelligence / They disregard bottlenecks."

> What all of this means is that I am doubtful about the ability of an AI system—no matter how smart—to eradicate or enslave humanity in the ways imagined by the doomers.

There is no "canonical way to be eradicated by AI", as Yudkowsky repeatedly explains. There's one particular story he likes to repeat *as an illustration of the more general point*. If you don't like that story, fine (for the record: I dislike it too). But don't go pretending that the illustrative story Yudkowsky loves to repeat to illustrate the point IS the whole point.

> Note that this is not a claim about alignment or any other technical safeguard, even if a “misaligned” AI system wanted to take over the world and had no developer- or government-imposed, AI-specific safeguards to hinder it, I contend it would still fail

I contend that it would obviously succeed. If only because, 10-20 years ago, people on LessWrong thought really hard, did some thought experiments like "boxing the AI", and debated very hard whether it would be possible to contain a misaligned, superintelligent AI, with a wide range of disagreements. Your position back then was one commonly held, reasonably so.

And then, since 2022, we've been doing the experiment YOLO-style, and the experimental result is that we're giving all the power we possibly can to AI, as fast as we possibly can. Yes, there's a lot of institutional and organizational inertia that makes that not very fast on an absolute scale. AI is still the fastest-adopted technology ever. A misaligned AI doesn't have to "try hard to take over the world". It would just have to ask nicely with a good investor pitch, and even the assumption that the pitch has to be good is a wild bout of optimism.

> The above argument counters Yudkowskian and similar “doom” scenarios

I am sorry to say: it does not even do a good job of exposing it. Unsurprisingly, it does a pretty bad job of countering it.

I'm summarizing your piece as: "I'm not worried about a misaligned superintelligence being created. Even if it were, it wouldn't be able to do much, because of knowledge/experimental bottlenecks and a messy political world."

This, on a purely abstract level, is a somewhat reasonable take. At least I hope so; that's what I believed before AI took off. We have empirical data now. We can stop speculating.

In the world of abstract imagination, Superintelligence Obviously Cannot Create Nanotech, and it's dubious that superintelligence per se is sufficient to create an artificial ecosystem independent from the human economy. Back here in the real world, normal, barely-aided humans are well on track to cracking robotics and automated factories building robots building factories. Yes, it's not there yet, but in the same way that self-driving cars were not yet there a few years back: lots of money financing a rapidly progressing field.

In the world of abstract imagination, we would immediately and readily recognize what qualifies as a "Misaligned Superintelligence", and take responsible steps to ensure it does not take over. In the real world, we're just cranking up the "Intelligence" dial on LLMs as hard and fast as we can, then releasing and "seeing what happens" after a few weeks of "vibe testing" (now that the evals are insufficient to "automatically rule out substantial risks").

You go from "superintelligence obviously can't do literally everything" (falsely attributed, in straw-man fashion, to "doomers") to "superintelligence obviously can't do anything that may threaten humanity" without any kind of justification whatsoever. This is the "bounded therefore harmless" fallacy that Yudkowsky described in 2017. Maybe you should read him more if you want to "debunk" him?

If those are your best reasons not to be a doomer, all I have to say is "welcome", because you are one; you just haven't realized it yet. Yes, I'm saying they are terrible reasons.

mikko

Without going into the specifics of the argument, this is plausibly just a story of motivated reasoning.

Ball's reading of AI risk does not look ideologically neutral. It looks motivated, at least in part, by a prior commitment to technological exceptionalism, suspicion of regulation and control, and a strong preference for market-driven incrementalism. That does not by itself make the position wrong! But the reader should ask whether the author is reasoning from the evidence, or reasoning toward a conclusion that fits his existing values. The essay shows multiple examples of reasoning from existing political principles, and once you see one you start to see the others.

The conclusion of the essay is fitting for a narrative of motivated reasoning: the author's relief is palpable now that the realization has set in — and conveniently, it matches his existing conservative/classical-liberalism worldview.

Eli (reading account)

> What all of this means is that I am doubtful about the ability of an AI system—no matter how smart—to eradicate or enslave humanity in the ways imagined by the doomers. Note that this is not a claim about alignment or any other technical safeguard, even if a “misaligned” AI system wanted to take over the world and had no developer- or government-imposed, AI-specific safeguards to hinder it, I contend it would still fail. “Taking over the world” involves too many steps that require capital, interfacing with hard-to-predict complex systems (yes, hard to predict even for a superintelligence), ascertaining esoteric and deliberately hidden knowledge (knowledge that cannot be deduced from first principles), and running into too many other systems and procedures with in-built human oversight. It is not any one of these things, but the combination of them, that gives me high confidence that AI existential risk is highly unlikely and thus not worth extreme policy mitigations such as bans on AI development enforced by threats to bomb civilian infrastructure like data centers. “If anyone builds it, everyone dies” is false.

I don't get it.

Some people are world-historically skilled at managing capital, interfacing with hard-to-predict systems, organizing groups to accomplish goals, etc.

Notable examples include Napoleon, who almost conquered Europe (and did succeed in transforming it in various ways), and Elon Musk, who is currently the richest person in the world, briefly had enormous political influence, and has approximately single-handedly outcompeted a whole industry of geopolitical relevance (I'm thinking of the graphs of US space launches compared to other countries, with and without SpaceX). Also John D. Rockefeller, Bismarck, and Augustus.

These extraordinary individuals are dramatically more capable of accumulating power and steering the world than most mere mortals. Surely they all got enormously lucky, but they are also obviously extremely skilled. Almost no one could have accomplished what they did.

Obviously, those extraordinary individuals had to contend with the real world of computational irreducibility. But that didn't negate their advantages. Their skill was in dealing with that real world, including all its complexity and unpredictability.

Why shouldn't I think that machine superintelligences could be skilled in the same ways, but enormously more so?

Despite the impressiveness of these exemplar individuals, they're certainly not anywhere near the fundamental limits of how capable a being can be at strategy, management, leadership, engineering, organization, propaganda, etc.

I find it hard to believe that in 100 years it will be possible for a human to be a Fortune 500 CEO, because AI corporations (CEOs with better judgement than Jeff Bezos or Bill Gates, able to process every information stream into the company and make every decision with the full benefit of all that information, all at computer speed) will completely trounce any human who tries to run a company the old-fashioned way. If the AIs have judgement as good as Jeff Bezos's, it won't even be remotely close. They have too many other advantages.

Maybe the transition doesn't happen instantly, but why would we expect human political structures to remain in place and in power when there are one or more super-Napoleons operating on earth? The governments of Europe could barely contain actual Napoleon.

I.M.J. McInnis

I second what a lot of folks have said below. The "not a doomer" and "computational limits of intelligence" parts are extremely poorly argued. Sure, you've argued against foom and insta-death. Intelligence is not arbitrarily powerful.

But once we get to the scenario that you yourself acknowledge, where we have "Imagine, for example, that Claude 10, in addition to being better than most humans at most cognitive labor that can be done on a computer, is also embedded into much of the critical infrastructure and large organizations in America, such that it is challenging to imagine what life would be like if Claude 'turned off'" -- how on earth can you be confident that this is a safe situation to be in? It doesn't need godlike powers; it controls plenty of "the real world" already. And you admit that alignment is a harder question there, particularly once we start building Claude 11. Like: how do you keep running the tape forward through history and have humans stay in charge?