On Recursive Self-Improvement (Part II)
What is the policymaker to do?
Continued from Part I last week.
Introduction
On the same day I published Part I of this series, OpenAI released GPT-5.3-Codex, a new model that the company claims helped to engineer itself:
The recent rapid Codex improvements build on the fruit of research projects spanning months or years across all of OpenAI. These research projects are being accelerated by Codex, with many researchers and engineers at OpenAI describing their job today as being fundamentally different from what it was just two months ago. Even early versions of GPT‑5.3-Codex demonstrated exceptional capabilities, allowing our team to work with those earlier versions to improve training and support the deployment of later versions.
Codex is useful for a very broad range of tasks, making it difficult to fully enumerate the ways in which it helps our teams. As some examples, the research team used Codex to monitor and debug the training run for this release. It accelerated research beyond debugging infrastructure problems: it helped track patterns throughout the course of training, provided a deep analysis on interaction quality, proposed fixes and built rich applications for human researchers to precisely understand how the model’s behavior differed compared to prior models.
These are the early stages. I expect the scale of automation to have expanded considerably within the coming year.
The upshot of last week’s analysis is that automated AI research and engineering is already happening to some extent (as OpenAI has demonstrated), but that we don’t quite know what this will mean. The bearish case (yes, bearish) about the effect of automated AI research is that it will yield a step-change acceleration in AI capabilities progress similar to the one that followed the discovery of the reasoning paradigm. Before that discovery, new models came every 6-9 months; after it, they came every 3-4 months. A similar leap in progress may occur, with noticeably better models coming every 1-2 months—though for marketing reasons labs may choose not to increment model version numbers that rapidly.
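To make those cadences concrete, here is a back-of-the-envelope sketch in Python. It uses only the month ranges quoted above (the era labels are my own shorthand) and simply converts each release cadence into an approximate number of model generations per year; it is an illustration of the arithmetic, not a forecast.

```python
# Convert the release cadences quoted above into rough "generations per year."
# The month ranges come from the paragraph above; nothing else is assumed.

cadences_months = {
    "pre-reasoning paradigm": (6, 9),        # a new model every 6-9 months
    "post-reasoning paradigm": (3, 4),       # every 3-4 months
    "hypothesized post-automation": (1, 2),  # every 1-2 months
}

for era, (fastest, slowest) in cadences_months.items():
    per_year_low = 12 / slowest    # slowest cadence -> fewest releases per year
    per_year_high = 12 / fastest   # fastest cadence -> most releases per year
    print(f"{era}: roughly {per_year_low:.1f}-{per_year_high:.1f} model generations per year")
```

Run as written, this prints roughly 1.3-2 generations per year before the reasoning paradigm, 3-4 after it, and 6-12 under the hypothesized post-automation cadence: each step roughly a doubling or tripling of the pace.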
The most bullish case is that it will result in an intelligence explosion, with new research paradigms (such as the much-discussed “continual learning”) suddenly being solved, a rapid rise in reliability on long-horizon tasks, and a Cambrian explosion of model form factors, all scaling together rapidly to what we might credibly describe as “superintelligence” within a few months to at most a couple of years from when automated AI research begins happening in earnest.
Both of these extreme scenarios strike me as live possibilities, though of course an outcome somewhere in between the two seems likeliest. Even in the most bearish scenario, the public policy implications are significant, but the most salient fact for policymakers is the uncertainty itself.
The current capabilities of AI already have significant geopolitical, economic, and national-security implications. Any development whose conservative case is a step-change acceleration of this already rapidly evolving field, and whose bullish case is the rapid development of fundamentally new, meaningfully smarter-than-human AI, has clear salience for policymakers. But what, exactly, should policymakers do?
The Deficiencies of the Status Quo
Right now, we rely predominantly on faith in the frontier labs to ensure that every aspect of AI automation goes well. There are no safety or security standards for frontier models; no cybersecurity rules for frontier labs or data centers; no requirements for explainability or testing of AI systems that were themselves engineered by other AI systems; and no specific legal constraints on what frontier labs can do with the AI systems that result from recursive self-improvement.
To be clear, I do not support the imposition of such standards at this time, not because they seem unimportant but because I am skeptical that policymakers could design any one of them effectively. It is also extremely likely that the existence of advanced AI itself will change both what is possible for such standards (because our technical capabilities will be much stronger) and what is desirable (because our understanding of the technology and its uses will improve so much, as will our apprehension of the stakes at play). Simply put: I do not believe that bureaucrats sitting around a table could design and execute the implementation of a set of standards that would improve status-quo AI development practices, and I think the odds are high that any such effort would worsen safety and security practices.
Thus, the current state of affairs—where we trust the labs to handle all these extremely important details—is the best option on the table. But that does not mean our trust should be blind. While labs such as Google DeepMind, OpenAI, and Anthropic have been relatively transparent about their work on many of these issues, that transparency has largely been voluntary and on terms set more or less entirely by the labs themselves.
In recent months, this has begun to change with the passage of SB 53 in California and the very similar RAISE Act in New York. These bills require large AI developers to document their assessment of catastrophic risk potential from their most powerful models, as well as what measures, if any, they employ to mitigate those risks. Importantly, both bills are scoped to include large-scale risks “resulting from internal use of [the developer’s] frontier models” (emphasis added; quote from SB 53). Both bills also reference risks stemming from the “loss of control” over, among other things, internal deployments of frontier models, a vague but nonetheless clear nod to one broad category of plausible risks posed by AI research automation.
Some critics of SB 53 and RAISE point out two key limitations: first, that they are primarily non-prescriptive, and thus create no substantive requirements for safeguards, security practices, and the like. The bills delegate the task of determining these details to the frontier labs themselves. Second, the laws have no mechanism for proactively verifying that labs comply with their safety and security frameworks as stated.
The first critique is perhaps the most obvious, but for the reasons of epistemic humility I describe above, it is only arguably a weakness. We do not know what the optimal standards and safeguards are, and in all likelihood, it will ultimately be technologists rather than technocrats who lead the way in codifying these standards. Thus, while this lack of prescriptiveness is a limitation of the law in one light, it is a strength in another.
Given this tradeoff, however, the second limitation becomes even more salient: there is no mechanism for verifying that frontier labs are in compliance with their own plans.
A Better Way
Imagine that there were a law requiring publicly traded companies to disclose their financial statements, but no institution of auditing. Walmart could fulfill its legal requirement by reporting its income, but the public could not verify that the number was accurate. In practice, transparency alone would not do much to assure investors, employees, and others with an interest in Walmart’s financial health.
Of course, in the real world, we do have auditing, and it is for this reason that we collectively (for the most part) trust the numbers Walmart is required to disclose in its financial statements. It is not so much the legal disclosure requirement that creates trust, but rather institutions—private institutions with public oversight, in the case of auditing—that sustain the common sense of trust undergirding financial markets worldwide.
It is worth pausing for a moment to reflect on this. We do not assess the health of company finances by having government regulators probe every firm’s books and operations. Instead, we have private, usually for-profit corporations that provide audits as a service. An audit is in part a verification that a company adheres to Generally Accepted Accounting Principles, which are themselves standards written by a private non-profit (the Financial Accounting Standards Board) that is overseen by a federal regulator (the Securities and Exchange Commission). And this whole apparatus, in addition to ensuring trust, is relatively cheap: though no one loves an audit (having run a non-profit that received annual audits, I can attest), the fees auditors charge come to under 0.10% of a public firm’s revenue on average (source: Codex, analyzing SEC data, plus a perusal of Google search results as a sanity check on the Codex analysis).
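To make that figure concrete, here is a minimal sketch using invented, order-of-magnitude numbers (the firm names, revenues, and fees below are hypothetical, not drawn from any actual filing); it simply shows what “audit fees under 0.10% of revenue” looks like in practice.

```python
# Illustrative only: hypothetical firms with invented revenue and audit-fee figures,
# used to show what "audit fees under 0.10% of revenue" means in practice.

hypothetical_firms = {
    # name: (annual revenue in USD, annual audit fee in USD) -- all numbers invented
    "BigBoxRetailCo": (600e9, 40e6),
    "MidCapSoftwareCo": (5e9, 3e6),
    "SmallCapManufacturerCo": (400e6, 0.35e6),
}

for name, (revenue, audit_fee) in hypothetical_firms.items():
    share = audit_fee / revenue
    print(f"{name}: audit fee is {share:.3%} of revenue")
```

On these invented figures the share ranges from well under 0.01% for the largest firm to just under 0.10% for the smallest; the point is simply that verification at this scale is a rounding error relative to the firms being verified.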
Audits are boring. They are not fun. Yet they are a civilizational accomplishment that enables many things we cherish. We should be proud of audits, auditing, and auditors; we should be proud that over centuries we invented a mechanism to establish trust where none naturally existed, and indeed where the incentives often push actors toward deception and against mutual trust.
Unfortunately, in today’s AI industry, the reality is closer to my fictitious example of the unaudited Walmart financial statements. Companies now have to disclose their safety policies, but there is no common trust that they are being followed. Worse yet, because these are policies about a rapidly evolving set of scientific, engineering, and technological frontiers, there are inevitably going to be ambiguities. How will we resolve such ambiguities in the absence of an architecture of trust?
The answer is unfortunately obvious: by arguing about them on the internet. This is precisely what happened after OpenAI released GPT-5.3-Codex, the first OpenAI model release after SB 53’s transparency provisions went into effect. Because OpenAI had already been publishing its safety policy as a voluntary commitment for over two years, nothing about OpenAI’s policies actually changed. What did change is that OpenAI is now legally obligated to follow those policies—policies that necessarily contain ambiguities, with no trusted institution to resolve them.
And so predictably, within 24 hours of GPT-5.3-Codex’s release, an AI safety organization called The Midas Project wrote a breathless thread on X alleging that OpenAI had “just broke[n]” SB 53 and “could owe millions in fines.” I am going to avoid weighing in specifically on the merits of these claims because I am advising an organization that is drafting a report on the implementation of SB 53. Conveniently, the merits are not the important thing for the purposes of this essay. The fact that this argument is even happening in such a disorderly fashion proves the point: We are trying to do the technocratic governance of high-trust societies in a low-trust environment.
What is needed in the governance of frontier AI catastrophic risk, then, is a similar sense of trust. That need not mean auditing in the precise way it is conducted in accounting—indeed, it almost certainly does not mean that, even if that discipline has lessons for AI.
This is not an original idea: early public drafts of both SB 53 and RAISE contained provisions mandating audits for precisely this reason. But those provisions were only thin sketches. Key questions—who would perform the audits, what qualifications auditors would have to demonstrate, who would assess the auditors and by what criteria, how the auditors’ financial independence would be assured, and many others—were left unanswered or deferred to later administrative rulemaking. A successful policy regime must be more than an afterthought. In the end, the auditing provisions were struck from these bills, and this was probably for the better.
But a premature idea is not a bad idea. And I suspect that over the coming year or two, the time will have come for independent verification of frontier lab claims by expert third parties. These would be non-governmental bodies that could, first and foremost, verify that frontier labs live up to their own public claims about safety and security and report their findings publicly and privately. In so doing, these private bodies could assist in the codification of private-sector-led technical standards related to agent security and similar issues.
In addition, such organizations could provide tailored reports to the government (for either public or private release) on the implications of automated AI research for things like the labor economy (after all, this will be the first truly large-scale deployment of plausibly job-replacing agents within firms), organizational economics, competitive dynamics within AI, national security, and geostrategy. Because of the clear nationwide relevance of all these questions, it would be best for these private organizations to be overseen by federal agencies rather than by the states.
Much of the work of organizations like this could likely be automated, or at least AI-assisted. Indeed, it is probably the case that no organization could fulfill a mission of this kind without the creative and extensive application of AI. Furthermore, the ideal version of such an organization would provide high-quality analytic services cheaply enough that new entrants to the field would not find them burdensome.
Given the struggles that governments have with operational efficiency, technology procurement and use, and expert recruitment, it is far more logical for the organizations that perform these verification services to be private entities rather than offices of government agencies.
The kind of organization I have described would not necessarily have to be created by a new law. It could simply be a non-profit or corporation that contracts with a frontier lab, perhaps as a condition of an insurance policy held by the lab. The insurer would agree to underwrite certain categories of legal liability for the frontier lab only on the condition that the lab undergo and pass verification.
Of course, one could imagine a legislative implementation of this idea as well. There are numerous paths one could take here, as well as a variety of political, legal, constitutional, and political-economic questions to be answered in any such proposal.
There are numerous failure modes for organizations such as this, whether implemented in law or not. One I have already mentioned: the cost and organizational complexity of working with a verification organization could make it difficult or impossible for new AI companies to enter the field. Others include industry capture, such that the verification organizations become less rigorous than is ideal, or come to lobby alongside the AI industry for regulations that discourage new entrants. Finally, there is the risk that the entire enterprise becomes a box-checking exercise with little substantive benefit. Any proposal for operationalizing verification organizations, especially a legislative one, must address these challenges head-on.
Conclusion
Organizations of this kind fit within the broader work I have done on “private governance,” which in turn builds on the work of the scholar Gillian Hadfield. My interest in institutions like this is longstanding, and it explains why I continue to be affiliated with organizations like Fathom (which is among the leading voices on such issues in the country), why I hold an unpaid role as an advisor to the AI Verification and Evaluation Research Institute, and why I am an advisor with a small equity position in Artificial Intelligence Underwriting Company, a startup focused on AI insurance.
It is among my top priorities for the coming year to figure out the precise scope of organizations such as this (should they be confined to studying AI research automation, or should they examine other domains as well?) and the optimal implementation (via public policy, or purely via private markets?). Another key priority is doing my part to help this ecosystem mature; to some extent it has already evolved organically, including Transluce, METR, the organizations I mention above, and others.
I hope others who find these ideas compelling will help advance the research agenda and organization-building goals I have described. While I will make specific bets about both how to do this (for example, a policy proposal) and who should do it, progress in the broad direction I have described matters much more to me than any particular approach or organization emerging victorious. Indeed, that is why I have chosen to convey the broad idea, as well as my own central motivation for pursuing it, before conveying my specific proposal. My goal today is only to convince you that this is the general direction frontier AI policy should take. I expect to share more specific ideas about the path I propose soon.
AI policy has now firmly entered its ‘science fiction’ era, where I suspect it will remain for many years to come. Legitimately strange things are happening, and stranger things yet will happen soon. There are two broad categories of societal reaction to these events: one is extreme panic, especially declarations that “it’s so over” and that machine takeover of human institutions is imminent. The other emerges in reaction to the first group and seeks to erase all strangeness from these events in service of hard-nosed skepticism.
Both postures are unbalanced. We must acknowledge, indeed we must embrace, the strangeness of this moment. Yet we must avoid panic and hyperbole as well. We must face our predicaments—strange and futuristic though they may sometimes be—through new iterations of preexisting tools of public policy. The ideas I have discussed are essentially variations on existing kinds of institutions designed to solve structurally similar problems in the past. This, rather than back-of-the-envelope improvisation of new institutions, is the wise path, for it is through progressive adaptation to novel contexts that old institutions become new again.


Great post. I will help amplify this. Very glad you are doing this work.
> AI policy has now firmly entered its ‘science fiction’ era, where I suspect it will remain for many years to come.
I suspect it will remain there permanently.
Good post, I broadly agree. I want to clarify something about the intelligence explosion, for the benefit of readers (I think this won't be news to you).
You say:
"The most bullish case is that it will result in an intelligence explosion, with new research paradigms (such as the much-discussed “continual learning”) suddenly being solved, a rapid rise in reliability on long-horizon tasks, and a Cambrian explosion of model form factors, all scaling together rapidly to what we might credibly describe as “superintelligence” within a few months to at most a couple of years from when automated AI research begins happening in earnest."
It's important to emphasize 'when automated AI research begins happening in earnest'; otherwise people might think you mean 'now.' Speaking as a believer in superintelligence and the intelligence explosion, I am NOT claiming that it's going to happen now.
My view, and the view of my colleagues at the AI Futures Project, is that the overall R&D speedup from today's AI systems is modest (e.g. <50% overall) because we are still firmly in the 'centaur' era, where AIs can do some parts of the research but not all of it, not even close. We predict that, roughly around the time of *full* automation of AI R&D (which is probably still some years away, though admittedly could happen later this year for all we know), the pace of AI R&D will speed up dramatically and there will be an intelligence explosion. If you want to know more about our views, they can be found here: https://www.aifuturesmodel.com/