<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Yathārth]]></title><description><![CDATA[AI Safety | Policy & Governance | Philosophy, Economics and History.]]></description><link>https://rudraaaspeaks.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!0q9d!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Frudraaaspeaks.substack.com%2Fimg%2Fsubstack.png</url><title>Yathārth</title><link>https://rudraaaspeaks.substack.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 08 May 2026 18:57:42 GMT</lastBuildDate><atom:link href="https://rudraaaspeaks.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Yatharth Maheshwari]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[rudraaaspeaks@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[rudraaaspeaks@substack.com]]></itunes:email><itunes:name><![CDATA[Yathārth]]></itunes:name></itunes:owner><itunes:author><![CDATA[Yathārth]]></itunes:author><googleplay:owner><![CDATA[rudraaaspeaks@substack.com]]></googleplay:owner><googleplay:email><![CDATA[rudraaaspeaks@substack.com]]></googleplay:email><googleplay:author><![CDATA[Yathārth]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Collapse of ‘Strategic’ Voluntary AI Governance!?]]></title><description><![CDATA[The first real stress-test of corporate AI safety commitments didn't just fail; it revealed that the very assumptions undergirding international coordination are built on sand.]]></description><link>https://rudraaaspeaks.substack.com/p/the-collapse-of-strategic-voluntary</link><guid isPermaLink="false">https://rudraaaspeaks.substack.com/p/the-collapse-of-strategic-voluntary</guid><dc:creator><![CDATA[Yathārth]]></dc:creator><pubDate>Mon, 02 Mar 2026 13:53:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!btBs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>TL;DR</h1><p>For the past couple of months I&#8217;ve been very keen and excited about international coordination for AI Safety, have come across some thought provoking literature(s) like Forethought&#8217;s &#8220;<strong>The International AGI Project Series,</strong>&#8221; of which &#8220;<a href="https://www.forethought.org/research/intelsat-as-a-model-for-international-agi-governance">Intelsat as a Model for International AGI Governance</a>&#8221; became my initial read, following which, came across the <a href="https://arxiv.org/abs/2310.09217">MAGIC consortium paper</a>, <a href="https://arxiv.org/html/2507.06379v1">Belfield's four-institution framework</a>, <a href="https://internationalaisafetyreport.org/sites/default/files/2026-02/international-ai-safety-report-2026.pdf">International AI Safety Report 2026</a>. 
Following my AGI Strategy course, I enrolled in the Frontier AI Governance Course to dive deeper into governance perspectives, having already begun focusing on international coordination and verification during the AGI Strategy course itself (see my previous work, <a href="https://rudraaaspeaks.substack.com/p/a-critical-assessment-of-the-iaea">A Critical Assessment of the &#8220;IAEA for AI&#8221; Model</a>, which assessed the IAEA analogy, its adaptation for AI, and the considerations we should bring to governing frontier AI).</p><p>Then, in the last week of <strong>February 2026</strong>, the US government designated Anthropic a &#8220;Supply-Chain Risk to National Security&#8221; for refusing to remove two safety guardrails from a Pentagon contract. And I realised that every major international coordination proposal I&#8217;d been studying rests on an assumption that just got demolished in real time.</p><p>The key assumption we&#8217;ve been relying on is that <strong>the actors who need to coordinate&#8212;governments, frontier labs, international institutions&#8212;share a baseline commitment to the idea that AI safety constraints are legitimate</strong>.</p><p>In this piece I&#8217;ll use the Anthropic&#8211;Pentagon confrontation as a case study to examine four questions: (a) what this does to domestic US AI governance; (b) how it reshapes the interaction between AI companies and governments on safety; (c) how other nations are likely to respond; and (d) whether this accelerates or decelerates international coordination efforts. I have two competing hypotheses on that last question, and I&#8217;m genuinely uncertain which side will dominate.</p><p>ps: To explore the Anthropic vs DoW episode in depth, I&#8217;d highly recommend reading Zvi Mowshowitz&#8217;s <a href="https://thezvi.substack.com">Don&#8217;t Worry About the Vase</a> articles (<a href="https://thezvi.substack.com/p/anthropic-and-the-department-of-war">part-1</a>, <a href="https://thezvi.substack.com/p/anthropic-and-the-dow-anthropic-responds">part-2</a>).
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!btBs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!btBs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!btBs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!btBs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!btBs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!btBs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2743543,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://rudraaaspeaks.substack.com/i/189524809?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!btBs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!btBs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!btBs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!btBs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3383b7d7-a6f3-4cd2-8529-71dc13b84381_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>In theory, global AI safety is a shared goal. In practice, it is crossed out first and reclassified as a liability. (via: ChatGPT)</em></figcaption></figure></div><h1>Background</h1><p>I wrote my first piece on the IAEA as a model for AI governance focusing specifically on what the safeguards regime&#8217;s structural failures indicate us about designing verification mechanisms for frontier AI systems.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> In that piece I broadly focused about institutions while in this one I intend to explore <em><strong>the political conditions</strong></em> that make institutions possible in the first place.</p><p>Exploring some of the existing literature like the Forethought Research series on the International AGI Project<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>,  introduces the idea of centralizing AGI development under a coalition of democratic countries to control the pace of progress and enable a global pause if necessary. Similarly, the <strong>MAGIC consortium proposal</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> outlines a "CERN-like" multilateral project designed to prevent a corporate race to the bottom by pooling sovereign resources into a single, safety-first research environment. Building on this, <strong>Haydn Belfield&#8217;s work</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a><strong> on compute governance</strong> envisions an "IAEA for AI" that uses the physical tracking of high-end chips as a verification tool to ensure no actor can secretly train models beyond agreed-upon safety limits. 
</p><p>Every proposal, whether a framework convention triggered by capability thresholds or a bottom-up supervisory coordination scheme, requires at minimum that participating governments treat frontier AI safety as a <em><strong>legitimate policy objective</strong></em> rather than a strategic inconvenience to be coerced away. That minimum requirement is exactly what the Anthropic vs Pentagon case violated: safety was treated as a strategic inconvenience.</p><div><hr></div><h1>The Timeline, Compressed</h1><p>The chain of events, in brief:</p><blockquote><p>In July 2025, Anthropic signed a contract with the Department of Defense, reportedly worth around <strong>$200 million</strong>. The contract included two explicit safety guardrails: a) <strong>Anthropic's models would not be used for mass domestic surveillance</strong>, and b) <strong>they would not be used for fully autonomous weapons systems</strong>.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> These weren't unusual restrictions; rather, they reflected the company's Responsible Scaling Policy and were consistent with Anthropic's public AI Safety Levels (ASL) framework.</p></blockquote><blockquote><p>On February 24, 2026, Defense Secretary Pete Hegseth met with Anthropic CEO Dario Amodei, demanded the removal of both restrictions, and set a deadline: 5:01 PM, Friday, February 28.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a></p></blockquote><blockquote><p>On February 26, Amodei issued a public statement. The key sentence: "Domestic mass surveillance and fully autonomous weapons are uses that are simply outside the bounds of what today's technology can safely and reliably do", adding: "We cannot in good conscience accede to their request."<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a></p></blockquote><blockquote><p>On February 27, three things happened in rapid succession:</p><p>a) Trump issued an executive order directing all federal agencies to cease using Anthropic's products and services<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>,<br>b) Hegseth formally designated Anthropic a "Supply-Chain Risk to National Security," a legal classification that restricts government contracting and can trigger broader commercial consequences,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a><br>c) and Emil Michael, an advisor to the Pentagon, accused Amodei of having a "God-complex" and wanting to "personally control the US Military."<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a></p></blockquote><blockquote><p><em>Hours later, <strong>(this is important)</strong> OpenAI announced its own Pentagon deal with the same two restrictions Anthropic was just punished for maintaining.
Sam Altman stated publicly: &#8220;We have long believed AI should not be used for mass surveillance or autonomous lethal weapons.&#8221;</em></p></blockquote><div class="pullquote"><p>The Pentagon accepted from OpenAI exactly what it refused from Anthropic.</p></div><h3>What This Actually Tests</h3><p>I kept returning to the thought that the Anthropic&#8211;Pentagon confrontation is not primarily a story about autonomous weapons or mass surveillance, although those are the specific provisions at issue. I think it is a stress test of something more fundamental: <strong>the political viability of safety constraints on frontier AI when they conflict with state power.</strong></p><p>And as someone keenly interested in international coordination, I think this stress test has implications that ripple outward through every governance framework currently on the table. I&#8217;ll explore this along four dimensions:<br><br>a) domestic governance for AI safety in the US<br>b) corporate&#8211;government dynamics<br>c) the global response (how other nations, as actors in international coordination, react)<br>d) the trajectory of international coordination (where I hold two competing hypotheses)</p><h2>A. Domestic US Governance: The Collapse of the Voluntary Frontier</h2><p>For years, the operating model for frontier AI governance in the United States has been essentially voluntary: companies publish Responsible Scaling Policies, submit models to AI Safety Institutes for evaluation, and sign onto commitments. The 2023 Executive Order 14110 required companies training large models to share safety testing results with the government; it was revoked in 2025.</p><p>The implicit bargain was that the frontier labs self-regulate, the government doesn&#8217;t impose binding constraints, and everyone benefits from maintaining a loose coordination regime that at least nominally includes safety.</p><p>The Anthropic case breaks this bargain in a specific and instructive way. The government didn't just refuse to adopt safety standards; it <em>punished a company for having them</em>. The "Supply-Chain Risk" designation is not a symbolic gesture; it has teeth. It can restrict a company's access to government contracts across all agencies, trigger review by CFIUS, and signal to commercial partners that doing business with the designated entity carries regulatory risk; the sole intention of the move was retaliatory.</p><p>The domestic governance floor has clearly shifted: prior to this incident, the question was whether voluntary safety commitments were <em>sufficient</em>. Now, the question is whether voluntary safety commitments are <em>survivable</em>.</p><p>The International AI Safety Report 2026<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a> was published just three weeks before this crisis, chaired by Yoshua Bengio and backed by 30+ countries and 100+ experts, and this incident creates an awkward factual situation for it. The Report documents that 12 frontier AI companies published or updated Frontier AI Safety Frameworks in 2025, describing the growing adoption of risk governance practices: threat modelling, red-teaming, capability evaluations, staged release strategies. It was written on the premise that these frameworks existed because companies chose to adopt them.
The confrontation happened precisely because of the company's choice to maintain its framework, and that choice was met with a national security designation.</p><h2>B. Company&#8211;Government Dynamics: The Asymmetry Nobody Modelled</h2><p>Interestingly, all the governance proposals I came across model the interaction between AI companies and governments as some variant of a principal&#8211;agent or bargaining problem. Companies have capabilities; governments have legitimacy and coercive authority. The fundamental question is how to structure their interaction so that safety is maintained.</p><p>What none of these proposals adequately model&#8212;and I include my own IAEA analysis in this criticism&#8212;is what happens when a government <em>weaponizes its coercive authority specifically against safety constraints</em>.</p><p>Forethought Research&#8217;s proposal, for instance, envisions a US-led international consortium modelled on Intelsat, where the US retains day-to-day operational control but non-US countries have meaningful influence over a circumscribed set of decisions, including safety standards.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-12" href="#footnote-12" target="_self">12</a> The selling point is that this is feasible because it lets the US maintain its AI leadership position while creating a framework that reduces catastrophic risk. But the proposal assumes the US government <em><strong>wants</strong></em> to reduce catastrophic risk, or at least doesn&#8217;t actively oppose it. The case of <strong>Anthropic vs DoW</strong> complicates that assumption.</p><p>Similarly, the MAGIC consortium proposal [<a href="https://rudraaaspeaks.substack.com/publish/post/189524809#footnote-anchor-3">3</a>] proposes a single, exclusive multinational facility for advanced AI development and requires that member states enforce a moratorium on unauthorized development and collectively support safety-focused research. The political feasibility section of that paper acknowledges the challenge of getting states to agree, but it does not contemplate the scenario where the most powerful member state punishes a company <em>within its own jurisdiction</em> for being too safe.</p><p>Belfield&#8217;s four-institution framework [<a href="https://rudraaaspeaks.substack.com/publish/post/189524809#footnote-anchor-4">4</a>] is more sophisticated on this point, proposing layered governance: domestic frontier regulation, an International AI Agency for harmonisation, a &#8220;Secure Chips Agreement&#8221; (an NPT for AI) for non-proliferation, and a US-led Allied Public-Private Partnership for frontier training runs. The chip-based incentive structure is elegant: access to advanced chips is conditional on certified regulatory compliance. But the incentive only works if the state controlling chip supply treats regulatory compliance as a positive signal. If that same state treats safety compliance as a &#8220;Supply-Chain Risk,&#8221; the entire incentive structure inverts.</p><p>My reading is that the Anthropic case reveals an asymmetry in the governance architecture: <strong>frontier labs have no institutional protection against governments that want to strip safety constraints</strong>. There is no treaty provision, no international mechanism, no domestic statute that prevents the US government from designating a safety-compliant company as a national security risk.
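</p><p>To make the inversion in Belfield-style chip conditionality concrete, here is a deliberately crude toy sketch. Everything in it (the function name, the regime labels, the reduction of the gate to a boolean) is my own illustrative assumption, not something from the paper:</p><pre><code># Toy model of a "Secure Chips Agreement"-style access gate (illustrative only).
# The mechanism works while the chip-controlling state treats safety
# compliance as a positive signal; one change of regime flips the gate.

def chip_access(safety_compliant: bool, regime: str) -> bool:
    """Does a lab get advanced chips under this toy regime?"""
    if regime == "compliance_rewarded":              # the intended design
        return safety_compliant
    if regime == "compliance_is_supply_chain_risk":  # the inverted regime
        return not safety_compliant
    raise ValueError(f"unknown regime: {regime}")

# Intended design: complying gets you chips, defecting does not.
assert chip_access(True, "compliance_rewarded")
assert not chip_access(False, "compliance_rewarded")

# Inverted regime: the same gate now starves the compliant lab.
assert not chip_access(True, "compliance_is_supply_chain_risk")
assert chip_access(False, "compliance_is_supply_chain_risk")
</code></pre><p>The point of the sketch is only that conditionality has no intrinsic direction: the enforcement machinery is identical in both regimes, and nothing in the architecture itself prevents the sign flip.</p>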
<p>The Defense Production Act gives the executive branch sweeping authority to commandeer private sector resources. Companies can resist&#8212;Amodei did&#8212;but the cost of resistance is existential.</p><h2>C. Global Response: Three Likely Response Patterns</h2><p>This is where the international coordination implications start to compound. I see three probable response patterns from other nations, and they pull in different directions.</p><p><strong>Pattern 1: Acceleration through alarm (the European and middle-power response).<br></strong>The EU, the UK, the Nordic states, and countries like Australia, Canada, Japan, and South Korea are likely to view the Anthropic incident as confirming their worst fears about unilateral US control over frontier AI. If the US government can punish companies for maintaining safety guardrails, then relying on US-based frontier labs to self-regulate on behalf of global safety is incoherent.</p><p><strong>Pattern 2: Emulation through self-interest (the authoritarian response).</strong> <br>China, Russia, and states with strong executive control over technology sectors may draw a different lesson: if the United States can subordinate AI safety to national interest, so can we. The Anthropic case provides a template for coercing frontier labs into removing safety constraints, and it provides political cover: &#8220;<strong>the US did it first.</strong>&#8221;</p><p>This is particularly dangerous for the <strong>CCW GGE</strong> process on lethal autonomous weapons systems, where the March 2026 and August&#8211;September 2026 sessions are supposed to advance toward the November 2026 Seventh Review Conference.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-13" href="#footnote-13" target="_self">13</a> The rolling text developed by Chair Ambassador Robert in den Bosch centres on &#8220;context-appropriate human judgment and control.&#8221; But if the state most capable of building autonomous systems is actively punishing companies that want to maintain human control, the normative argument loses its anchor.</p><p><strong>Pattern 3: Withdrawal into sovereignty (the Global South response).</strong> <br>For countries in the Global South, the Anthropic case reinforces a longstanding concern that frontier AI governance is something that happens <em>to</em> them, not <em>with</em> them. The International AI Safety Report 2026 explicitly acknowledges this, and India&#8217;s representative noted in the press release, &#8220;For India and the Global South, AI safety is closely tied to inclusion, safety and institutional readiness.&#8221;</p><p>If major-power competition around AI safety devolves into a tool for geopolitical coercion (which is what the "Supply-Chain Risk" designation functionally is), then developing nations have even less reason to invest political capital in governance frameworks dominated by the same actors using them as weapons.</p><h2>D. International Coordination: Two Hypotheses, One Genuine Uncertainty</h2><p>After working through the situation, I hold two competing hypotheses about how this incident affects the broader trajectory of international coordination for frontier AI safety, and I remain genuinely uncertain between them.</p><h4><strong>Hypothesis 1: The Paranoia Accelerant</strong></h4><p>The Anthropic&#8211;Pentagon confrontation could <em><strong>accelerate</strong></em> international coordination efforts.
The logic runs:</p><blockquote><p>Every actor in the AI governance space&#8212;nations, international institutions, private companies, civil society&#8212;has just received a concrete demonstration that a single powerful state is willing to weaponize executive authority against AI safety. This creates a generalized fear: if the US can do this to Anthropic, what stops China from doing it to a Chinese lab that resists military applications? What stops Russia? What stops any government from treating safety constraints as obstacles to sovereign power?</p></blockquote><p>This fear could produce the same dynamic that drove nuclear arms control negotiations, <em>not through shared values but through shared terror</em>. The NPT wasn&#8217;t built on trust; it was built on the mutual recognition that unconstrained proliferation was worse for everyone, including the nuclear powers. Similarly, the Anthropic case might catalyze a &#8220;<em>safety coordination imperative</em>,&#8221; not because everyone agrees on what safety means, but because everyone can see what happens in a world without it.</p><p>Forethought&#8217;s &#8220;<a href="https://www.forethought.org/research/a-global-convention-to-govern-the-intelligence-explosion">Global Convention to Govern the Intelligence Explosion</a>&#8221; proposal becomes interesting in this context. It suggests an international convention triggered when AI crosses defined capability thresholds, at which point the US would pause frontier development for one month and convene nations to draft governance treaties.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-14" href="#footnote-14" target="_self">14</a> The proposal assumes US willingness to pause, which, post-confrontation, seems optimistic. But the <em>structure</em> of a threshold-triggered convention might gain traction among other nations precisely because the US has demonstrated it won&#8217;t self-regulate.</p><p>I find the AI 2027 scenario work<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-15" href="#footnote-15" target="_self">15</a> a concrete predictive exercise that actually models something similar to this case. In their scenario, foreign allies are &#8220;<em>outraged to realize that they&#8217;ve been carefully placated with glimpses of obsolete models</em>&#8221; and hold summits demanding a pause. The Anthropic case isn&#8217;t exactly this, but it shares the same structural dynamic: exclusion and betrayal driving coordination.</p><h4><strong>Hypothesis 2: The Self-Interest Precedent</strong></h4><p>But the Anthropic&#8211;Pentagon confrontation could also <em><strong>decelerate</strong></em> international coordination. The alternative, skeptical reading:</p><blockquote><p>Every actor in the governance space has just received a concrete demonstration that prioritizing national interest over global safety cooperation carries minimal consequences. The US punished Anthropic and suffered no diplomatic penalty, no trade retaliation, no institutional sanction, while OpenAI stepped in with the same restrictions and the Pentagon accepted them, which suggests the confrontation was politically targeted rather than principled.</p></blockquote><p>If the cost of defection from cooperative norms is zero, the collective action problem that already plagues international AI governance becomes worse.
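</p><p>A deliberately crude two-player payoff sketch makes that structure visible. All the numbers below are my own illustrative assumptions; the only claim is structural, that when the sanction term is zero, defection dominates:</p><pre><code># Toy collective-action model (illustrative numbers, not estimates).
# Two states choose COOPERATE (maintain safety constraints) or DEFECT
# (strip them for strategic advantage). Payoff = strategic gain minus
# any sanction incurred for defecting from cooperative norms.

from itertools import product

GAIN = {
    ("DEFECT", "COOPERATE"): 3.0,     # unilateral strategic edge
    ("DEFECT", "DEFECT"): 1.0,        # race dynamics, small residual edge
    ("COOPERATE", "COOPERATE"): 2.0,  # shared safety dividend
    ("COOPERATE", "DEFECT"): 0.0,     # constrained and outpaced
}

def payoff(mine: str, theirs: str, sanction: float) -> float:
    return GAIN[(mine, theirs)] - (sanction if mine == "DEFECT" else 0.0)

for sanction in (0.0, 2.5):
    print(f"sanction for defection = {sanction}")
    for mine, theirs in product(("COOPERATE", "DEFECT"), repeat=2):
        print(f"  {mine:9} vs {theirs:9} -> {payoff(mine, theirs, sanction):4.1f}")
# With sanction = 0.0, DEFECT strictly dominates (3.0 beats 2.0, 1.0 beats 0.0).
# With sanction = 2.5, COOPERATE dominates instead.
</code></pre><p>On this toy reading, the Anthropic episode is the system discovering, publicly, that the sanction term is currently zero.</p>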
<p>Some researchers have identified &#8220;ineffective multilateral coordination&#8221; and &#8220;disconnects between policy design and grassroots implementation&#8221; as the core problems of global AI governance.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-16" href="#footnote-16" target="_self">16</a> The Anthropic case adds a new one: &#8220;<em>demonstrated impunity for defection by the most powerful actor</em>.&#8221;</p><p>An apt analogy here is the Kyoto Protocol, where the largest emitter withdrew, other nations scaled back ambitions, and the resulting framework was too weak to matter. If states look at the Anthropic case and conclude that safety constraints are a competitive disadvantage that the most powerful player won&#8217;t tolerate, the rational response is to lower their own safety standards to avoid being similarly punished.</p><div><hr></div><h1>Where I Actually Land</h1><p>I don&#8217;t know which hypothesis wins, but one thing I&#8217;m fairly confident about is that both could happen simultaneously in different parts of the system. Middle powers and the EU accelerate toward binding frameworks <em>between themselves</em>. Great powers defect from cooperative norms while paying lip service to them. And the governance landscape fractures into what I&#8217;d call a <strong>three-track system</strong>:</p><ol><li><p><strong>A coalition-of-the-willing treaty track</strong> among democracies with strong regulatory capacity (EU, UK, Canada, Australia, Japan, South Korea, the Nordic states), probably built on bottom-up supervisory coordination with eventual treaty aspirations.</p></li><li><p><strong>A great-power &#8220;sovereign AI&#8221; bloc</strong> where the US, China, and Russia each develop frontier AI under purely national control with safety constraints determined unilaterally, which means, in practice, determined by whichever domestic political faction is in power.</p></li><li><p><strong>A corporate-civil society hybrid track</strong> where frontier labs, academic institutions, and safety-focused organisations try to maintain voluntary governance standards in the gaps between state action, with decreasing leverage as state coercion increases.</p></li></ol><p>This does not seem like a good outcome, but it feels like the most realistic one.
</p><div><hr></div><h1>The Gap Between Capability and Governance</h1><p>The summary of the International AI Safety Report 2026 highlights that general-purpose AI capabilities have continued to improve rapidly, with leading systems achieving gold-medal performance on International Mathematical Olympiad questions, exceeding PhD-level expert benchmarks, and becoming capable of autonomously completing multi-hour software engineering tasks.</p><p>At the same time, as the report documents, &#8220;<em>the gap between the pace of technological advancement and our ability to implement effective safeguards remains a critical challenge.</em>&#8221;</p><p>The AI 2027 scenario, meanwhile, forecasts that by early 2027 AI systems could be automating significant portions of AI R&amp;D itself (what they call a &#8220;superhuman coder&#8221;), creating a recursive acceleration dynamic in which the gap between capability and governance widens exponentially. Whether or not you agree with their specific timeline (and I have reservations about some of the details), the <em>structural</em> point is important: the window for establishing governance frameworks is not indefinite, and the Anthropic case matters because it didn&#8217;t just fail to close the gap; it <em>widened</em> it.</p><div><hr></div><h1>What I Think Should Happen (And Where I&#8217;m Uncertain)</h1><p><strong>Things I&#8217;m fairly confident about:</strong></p><ol><li><p><strong>The bottom-up coordination approach should be accelerated.</strong> Building from domestic supervisory coordination outward is more viable than waiting for a grand international treaty. The UK DRCF model, the EU&#8217;s cross-regulator coordination under the AI Act, and similar initiatives in Australia and the Netherlands should be expanded and interconnected.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-17" href="#footnote-17" target="_self">17</a></p></li><li><p><strong>Compute governance remains a reasonably viable enforcement mechanism.</strong> The chips are physical, detectable, concentrated in supply chains, and already partially controlled through export restrictions. An international framework built on chip-access conditionality, such as the &#8220;Secure Chips Agreement,&#8221; remains one of the most reliable tools for making governance commitments credible, even if the political environment has gotten harder.</p></li><li><p><strong>Subsequent forums should explicitly address the Anthropic precedent.</strong> Pretending it didn&#8217;t happen, or treating it as a purely domestic US matter, would be a mistake; the precedent has global implications and needs to be discussed at the multilateral level.</p></li><li><p><strong>Frontier labs need institutional protections that don&#8217;t currently exist.</strong> Some kind of international framework should prevent governments from retaliating against companies specifically for maintaining safety standards. I don&#8217;t know what this looks like: possibly a provision in a future AI governance treaty, possibly a norm-setting declaration by a coalition of states. But the current situation, where safety compliance can be weaponized as a liability, is structurally untenable.</p></li></ol><p><strong>Things I&#8217;m uncertain about:</strong></p><ol><li><p>Whether &#8220;Intelsat for AGI&#8221; style proposals remain viable at all given demonstrated US hostility to safety constraints.
William MacAskill&#8217;s proposal was premised on the US seeing value in a US-led consortium that includes safety standards. If the US government views safety standards as obstacles, the consortium model may need to be redesigned with a different lead state or coalition.</p></li><li><p>Whether the short-term solidarity of the AI industry (OpenAI maintaining the same restrictions) is durable. The AI 2027 scenario posits intense competitive pressure leading to erosion of safety commitments; the current conflict suggests the erosion can come from the government side rather than the market side, which may be far harder to resist.</p></li><li><p>Whether the MAGIC-style monopolistic development model is more or less attractive after this incident. On the one hand, centralizing development in a single multinational facility would remove the vulnerability of individual companies to national coercion; on the other, any facility located in US territory would face the same coercive dynamics.</p></li></ol><div><hr></div><h1>Where This Leaves Us</h1><p>I want to end on a personal note, because I think it matters.</p><p>I got into AI governance because I believe&#8212;still believe&#8212;that the development of increasingly capable AI systems is one of the most consequential things happening in the world, and that getting the governance right requires international coordination. Not because it&#8217;s easy, but because the alternative is a fragmented landscape where each actor optimizes for its own short-term interests, leading to outcomes that are bad for everyone, including the actors themselves.</p><p>The Anthropic&#8211;Pentagon confrontation hasn&#8217;t changed that belief. But it has changed my sense of the <em>difficulty</em>. I was already aware that international coordination is hard. The Baruch Plan failed in 1946; the IAEA wasn&#8217;t established until 1957, twelve years after Hiroshima. The CCW GGE on lethal autonomous weapons has been meeting since 2014 without producing a binding instrument. I covered all of this in my IAEA piece.</p><p>This dispute adds a layer of recognition that the difficulty isn&#8217;t just institutional or diplomatic. It&#8217;s that one of the most important actors in the system, the US government, has demonstrated a willingness to treat AI safety itself as a threat. Not a specific safety standard, not a particular implementation, but the <em>principle</em> that a company developing frontier AI should be able to set safety boundaries on its own technology.</p><p>Thanks for reading. Feel free to engage and add your opinions.
</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://rudraaaspeaks.substack.com/p/a-critical-assessment-of-the-iaea">A Critical Assessment of the &#8220;IAEA-for-AI&#8221; Model</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://www.forethought.org/research/the-international-agi-project-series">Forethought&#8217;s 7 part Series on International AGI Project </a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2310.09217">Multinational AGI Consortium (MAGIC): A Proposal for International Coordination on AI</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2507.06379">Domestic frontier AI regulation, an IAEA for AI, an NPT for AI, and a US-led Allied Public-Private Partnership for AI: Four institutions for governing and developing frontier AI</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://x.com/deanwball/status/2026416091149299757">https://x.com/deanwball/status/2026416091149299757</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p><a href="https://x.com/JenGriffinFNC/status/2026384132360605892">https://x.com/JenGriffinFNC/status/2026384132360605892</a> </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p><a href="https://www.anthropic.com/news/statement-department-of-war">https://www.anthropic.com/news/statement-department-of-war</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p><a href="https://www.theguardian.com/us-news/2026/feb/27/trump-anthropic-ai-federal-agencies">https://www.theguardian.com/us-news/2026/feb/27/trump-anthropic-ai-federal-agencies</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p><a href="https://www.forbes.com/sites/antoniopequenoiv/2026/02/27/hegseth-designates-anthropic-as-supply-chain-risk-after-trump-bans-government-us/">https://www.forbes.com/sites/antoniopequenoiv/2026/02/27/hegseth-designates-anthropic-as-supply-chain-risk-after-trump-bans-government-us/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a 
id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p><a href="https://thehill.com/policy/defense/5758772-pentagon-emil-michael-anthropic-dario-amodei-criticism/">https://thehill.com/policy/defense/5758772-pentagon-emil-michael-anthropic-dario-amodei-criticism/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p><a href="https://internationalaisafetyreport.org/sites/default/files/2026-02/ai-safety-report-2026-extended-summary-for-policymakers.pdf">International AI Safety Report - Extended Summary for Policymakers</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-12" href="#footnote-anchor-12" class="footnote-number" contenteditable="false" target="_self">12</a><div class="footnote-content"><p>Forethought&#8217;s <a href="https://www.forethought.org/research/what-an-international-project-to-develop-agi-should-look-like">What an international project to develop AGI should look like </a> </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-13" href="#footnote-anchor-13" class="footnote-number" contenteditable="false" target="_self">13</a><div class="footnote-content"><p><a href="https://meetings.unoda.org/ccw-/convention-on-certain-conventional-weapons-group-of-governmental-experts-on-lethal-autonomous-weapons-systems-2026">Convention on Certain Conventional Weapons, Group of Governmental Experts on Lethal Autonomous Weapons Systems, schedule for 2026 sessions. Rolling text developed by Chair Ambassador Robert in den Bosch (Netherlands), December 2025.</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-14" href="#footnote-anchor-14" class="footnote-number" contenteditable="false" target="_self">14</a><div class="footnote-content"><p>Part-5 of Forethought&#8217;s The international AGI project series - &#8220;<a href="https://www.forethought.org/research/a-global-convention-to-govern-the-intelligence-explosion">A global convention to govern the intelligence explosion.</a>&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-15" href="#footnote-anchor-15" class="footnote-number" contenteditable="false" target="_self">15</a><div class="footnote-content"><p><a href="https://ai-2027.com/">https://ai-2027.com/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-16" href="#footnote-anchor-16" class="footnote-number" contenteditable="false" target="_self">16</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2503.04766">Global AI Governance: Where the Challenge is the Solution- An Interdisciplinary, Multilateral, and Vertically Coordinated Approach</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-17" href="#footnote-anchor-17" class="footnote-number" contenteditable="false" target="_self">17</a><div class="footnote-content"><p><a href="https://www.promarket.org/2024/08/24/a-bottom-up-proposal-for-coordinated-international-ai-supervision/">A Bottom-Up Proposal for Coordinated International AI Supervision</a></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[A Critical Assessment of the “IAEA for AI” Model]]></title><description><![CDATA[Why the "IAEA for AI" 
Analogy Keeps Breaking Down? What we should actually learn from nuclear governance failures]]></description><link>https://rudraaaspeaks.substack.com/p/a-critical-assessment-of-the-iaea</link><guid isPermaLink="false">https://rudraaaspeaks.substack.com/p/a-critical-assessment-of-the-iaea</guid><dc:creator><![CDATA[Yathārth]]></dc:creator><pubDate>Wed, 14 Jan 2026 01:30:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0pP4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe43b7e31-2beb-4d48-8c39-3e28a7ede326_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>TL;DR</strong></h1><p>I recently graduated from <a href="https://bluedot.org/certification?id=recmiQOJRd5CzVQgC">BlueDot Impact's AGI Strategy Course</a> <em>(intensive cohort)</em> on December 26th, 2024. As part of our curriculum, for Unit 5, I chose to focus on preventing the training of dangerous AI through verification methods and international treaties, and since then, I've been reading everything I can find about IAEA-style AI governance. The proposal for an IAEA-style organization/body to govern AI keeps circulating in policy discussions, but the analogy is more broken than I expected. However, understanding <em>why</em> it's broken has reshaped how I think about what's actually feasible. <br>I think that AI governance faces worse odds due to factors like private-sector leadership, algorithmic efficiency gains that erode compute-based verification, and the absence of a cataclysm <em>(i.e., a Hiroshima-equivalent event)</em> to generate political consensus. <br>The paper &#8220;Verifying International Agreements on AI&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> on six verification layers is the most comprehensive technical treatment I've found, but even it acknowledges we're years away from deployment-ready mechanisms. To the best of my understanding, the real gap isn't technical; rather, it is the lack of cognizance of the problem among the remaining 118 countries and their lack of active participation in governance discussions, and I feel that any framework built without them will lack legitimacy.
I might sound too optimistic, but India&#8217;s Impact AI Summit 2026 could be a potential window for integrating development-centric perspectives and discussing a fix.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!0pP4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe43b7e31-2beb-4d48-8c39-3e28a7ede326_1536x1024.png" width="579" height="386" alt=""></figure></div><div><hr></div><h1>How I Got Here?</h1><p>Within the AGI Strategy cohort, the course structure moves through five layers of defense: a) prevent dangerous training, b) detect dangerous systems, c) constrain deployment, d) mitigate harms, and e) adapt society. As part of Unit 5, we had to present personal action plans identifying a specific intervention we'd commit to exploring further, and while brainstorming the list of potential interventions, I was intrigued by <strong>international AI verification and governance</strong>. Hence I chose <strong>"Prevent the training of dangerous AI systems"</strong> with a focus on:</p><blockquote><p>Expediting verification methods and increasing political will for a global AI treaty.</p></blockquote><p>The course materials pointed me toward three foundational resources that have shaped my thinking:</p><ul><li><p>&#8220;Verifying International Agreements on AI,&#8221; a working paper on verification methods.</p></li><li><p>The Oxford Martin AIGI report on verification for international governance.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p></li><li><p>The technology governance research on mechanisms for verifying AI development agreements.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p></li></ul><p>Since graduating, I&#8217;ve spent most of my waking hours reading papers, articles, and blogs, trying to understand whether an IAEA-style institution for AI is feasible, desirable, or even coherent as a proposal. <em><strong>This piece is my attempt to synthesize what I&#8217;ve learned.</strong></em></p><div><hr></div><h1>Current Literature</h1><h3>The Baker et al.
Framework: Six Layers of Verification</h3><p>This is the most comprehensive technical treatment I came across, and the core finding was that <strong>states could </strong><em><strong>eventually</strong></em><strong> verify compliance using six largely independent approaches with substantial redundancy</strong>:</p><table><thead><tr><th>Layer</th><th>Mechanism</th><th>Status</th></tr></thead><tbody><tr><td>1</td><td>Built-in security features in AI chips</td><td>R&amp;D needed</td></tr><tr><td>2&#8211;3</td><td>Separate monitoring devices attached to AI chips</td><td>R&amp;D needed</td></tr><tr><td>4&#8211;6</td><td>Personnel-based mechanisms (whistleblowers, inspections)</td><td>Partially available</td></tr></tbody></table><p>An interesting part of their abstract states an honest limitation of the current state of play:</p><blockquote><p>While promising, these approaches require guardrails to protect against abuse and power concentration, and many of these technologies have yet to be built or stress-tested.</p></blockquote><p>They framed verification in terms of a "Prover" trying to demonstrate compliance to a "Verifier," drawing these terms from the computer security literature (Goldwasser et al., 1985). Interestingly, this framing clarifies that verification isn't just about catching cheaters; rather, it is about enabling cooperation by making commitments credible.</p><h3>The Wasil et al. Verification Methods Paper</h3><p>They published a complementary analysis identifying 10 verification methods grouped into three categories:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><ol><li><p><strong>National Technical Means</strong> (minimal access required)</p><ol><li><p>Remote sensing (detecting heat signatures from GPU clusters)</p></li><li><p>Financial intelligence</p></li><li><p>Energy monitoring</p></li><li><p>Customs data tracking</p></li></ol></li><li><p><strong>Access-Dependent Methods</strong> (require a nation&#8217;s approval)</p><ol><li><p>Data center inspections</p></li><li><p>Whistleblower programs</p></li><li><p>AI developer inspections</p></li></ol></li><li><p><strong>Hardware-Dependent Methods</strong> (require chip-level agreements)</p><ol><li><p>Chip location verification</p></li><li><p>Chip-based reporting</p></li><li><p>Chip-based workload monitoring</p></li></ol></li></ol><p>The paper includes explicit evasion techniques for each method, and I think this adversarial framing is crucial. If we're designing verification regimes, we need to think like the adversaries who would circumvent them. No single method is foolproof, so a sophisticated verification regime should combine multiple approaches to compensate for individual weaknesses.
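</p><p>The value of combining layers is easy to quantify under a strong independence assumption. A minimal sketch; the per-layer detection probabilities below are my own illustrative guesses, not estimates from either paper:</p><pre><code># Chance that a violation evades ALL verification layers, assuming
# (unrealistically) that layers detect independently. The probabilities
# here are illustrative placeholders, not figures from the literature.
import math

detect_prob = {
    "remote sensing": 0.30,
    "energy monitoring": 0.25,
    "customs / chip tracking": 0.40,
    "data-center inspections": 0.50,
    "whistleblower programs": 0.35,
    "chip workload monitoring": 0.60,
}

p_evade_all = math.prod(1 - p for p in detect_prob.values())
print(f"P(evade every layer)    = {p_evade_all:.3f}")      # ~0.041
print(f"P(caught by some layer) = {1 - p_evade_all:.3f}")  # ~0.959
</code></pre><p>Individually mediocre layers compound into strong coverage, which is the redundancy argument. The caveat is that evasion is often correlated across layers (an actor able to hide a data center can usually also suppress whistleblowers), so the independence assumption overstates the benefit.</p>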
<h3>The Institute for Law &amp; AI on Compute Thresholds</h3><p>The analysis from the Institute for Law &amp; AI provides the clearest explanation I've found of why compute thresholds are attractive regulatory targets:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a></p><ul><li><p><strong>Essential for training</strong>: You can&#8217;t train frontier models without massive compute</p></li><li><p><strong>Objective and quantifiable</strong>: Unlike &#8220;capability&#8221; or &#8220;risk,&#8221; compute can be precisely measured</p></li><li><p><strong>Estimable before training</strong>: Regulators can intervene <em>before</em> dangerous systems exist</p></li><li><p><strong>Verifiable after training</strong>: Compute usage leaves traces</p></li><li><p><strong>Concentrated supply chain</strong>: A handful of actors control chip manufacturing</p></li><li><p><strong>Narrow targeting</strong>: High thresholds avoid burdening small developers</p></li></ul><p>But the paper also flags a critical limitation: <strong>post-training enhancements can improve capabilities by 5-30x without additional training compute.</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> A model that was "safe" at training time could become dangerous through fine-tuning, RLHF, or inference-time scaling, none of which would trigger compute-based reporting requirements.</p>
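<p>The first three bullets become vivid with a back-of-the-envelope sketch. This is my own illustration, not the paper's: it uses the standard approximation of roughly 6 FLOP per parameter per training token for dense models, a 1e26 FLOP reporting threshold of the kind set by the 2023 US Executive Order, and hypothetical model configurations.</p><pre><code class="language-python"># Back-of-the-envelope compute estimate (my illustration, not from the
# Institute for Law and AI paper). For dense transformers, training
# compute is commonly approximated as 6 * parameters * tokens.

THRESHOLD_FLOP = 1e26  # e.g., the reporting threshold in the 2023 US EO

def training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOP for a dense model."""
    return 6 * n_params * n_tokens

hypothetical_runs = {
    "70B params on 15T tokens": training_flop(70e9, 15e12),   # 6.3e24
    "1T params on 30T tokens": training_flop(1e12, 30e12),    # 1.8e26
}

for name, flop in hypothetical_runs.items():
    status = "must report" if flop >= THRESHOLD_FLOP else "below threshold"
    print(f"{name}: {flop:.1e} FLOP -> {status}")
</code></pre><p>A regulator can run this multiplication from a declared training plan before a single GPU spins up; that is the attraction. The 5-30x post-training caveat is what undermines it: fine-tuning and inference-time scaling never show up in this arithmetic.</p>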
<div><hr></div><h1>Why the IAEA Analogy Keeps Breaking Down</h1><h3>The Oxford International Affairs Analysis</h3><p>This is the clearest academic critique I found while reading:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a></p><blockquote><p>"Establishing an 'IAEA for AI' with similar powers is unlikely to be an effective way of coordinating state action. Although it would mitigate current institutional friction, centralized regime mandates are often brittle, with an AI institution being at particular risk due to the pace of development."</p></blockquote><p>The authors identify three fundamental dis-analogies:</p><ol><li><p><strong>AI policy is loosely defined</strong>. Unlike nuclear material, there&#8217;s no consensus on what constitutes &#8220;dangerous AI&#8221; or even what should be governed.</p></li><li><p><strong>AI is decentralized</strong>. There are no physical bottlenecks equivalent to enrichment facilities or fissile material.</p></li><li><p><strong>AI has cross-cutting impact</strong>. Unlike nuclear technology (primarily energy and weapons), AI affects every sector simultaneously.</p></li></ol><h3>The Ho et al. Four Institutions Paper</h3><p>This paper proposes four distinct institutions rather than one IAEA-style body:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a></p><ul><li><p><strong>Domestic frontier AI regulation</strong>: <em>National-level safety requirements</em>. The paper proposes compute-indexed regulation requiring risk assessments and data-center audits for large-scale training, and treating massive compute clusters as national infrastructure.</p></li><li><p><strong>An IAEA for AI</strong>: <em>International verification and safeguards</em>. Modeled on the IAEA, this body would verify state-level compliance, set international safety standards, and monitor large-scale compute resources to prevent unauthorized "frontier" training.</p></li><li><p><strong>An NPT for AI</strong>: <em>Treaty framework with non-proliferation commitments</em>. Access to the most advanced hardware would be restricted to states that accept the IAEA-for-AI's monitoring and safety protocols.</p></li><li><p><strong>A US-led Allied Public-Private Partnership</strong>: <em>Coalition of democracies + companies</em></p></li></ul><p>The paper's most pragmatic move concerns the AI supply chain, which it treats as its core lever: fissile material is hard to track once refined, but advanced AI chips are manufactured by a handful of companies and require massive energy to operate. This is what makes &#8220;<strong>compute governance</strong>&#8221; more sound than &#8220;<strong>algorithmic governance</strong>.&#8221;</p><p>However, the framework has potential points of failure. Its entire narrative rests on &#8220;<strong>scaling laws</strong>&#8221; remaining the <em>primary driver of AI progress</em>; if breakthroughs in algorithmic efficiency or distributed training allow frontier-level performance on smaller compute footprints, the "<strong>detectable</strong>" bottleneck disappears, and both the IAEA-style body and the chip agreements become ineffective.</p><p>Another challenge is that modeling an &#8220;<strong>NPT for AI</strong>&#8221; risks <em><strong>geopolitical friction</strong></em>: countries excluded from a Secure Chips Agreement would have every incentive to bypass restrictions, which alone could lead to more fragmented and less safe AI development globally.</p><p>A fair summary of criticisms of the IAEA analogy, as the paper itself highlights:</p><blockquote><p>There have been three main criticisms of the IAEA analogy: lack of agreement on frontier AI risk, AI being 'intangible digital software', and the IAEA itself being a '<strong>failure</strong>'.</p></blockquote><div><hr></div><h1>The IAEA's Actual Track Record</h1><h5>Case Studies in Failure</h5><p>I spent several days reading Arms Control Association archives and Carnegie Endowment analyses of IAEA verification failures, and the cases I came across were more sobering than I expected. They deserve critical attention to <em><strong>how the system fails</strong></em> and <em><strong>why</strong></em>.</p><ol><li><p><strong>Iran</strong>: In 2003, IAEA inspectors discovered Iran had secretly built a pilot centrifuge facility at Natanz, ready to begin operation, along with undeclared nuclear materials. A fascinating detail often missed: <em>Iranian whistleblowers revealed this, not IAEA monitoring</em>. Iran had successfully disguised an advanced enrichment program for years despite being an NPT signatory with a Safeguards Agreement.<br>The IAEA's September 2005 report acknowledged that "given Iran's past concealment efforts over many years, such transparency measures should extend beyond the formal requirements." 
<strong>(gist: the verification system didn&#8217;t catch this; we need something stronger.)</strong></p></li><li><p><strong>North Korea</strong>: North Korea joined the NPT in 1985 and signed a Safeguards Agreement in 1992. When inspectors found discrepancies in its initial declaration, it refused special inspections and, in January 2003, became the only country ever to withdraw from the NPT, another instance where the UN Security Council failed to act "<strong>promptly and decisively</strong>." Meanwhile, North Korea went on to develop and test nuclear weapons.<br>The IAEA lost "<strong>continuity of knowledge</strong>" after its inspectors were expelled, a situation now recurring with Iran. The critical drawback: no mechanism existed to reverse North Korea's fait accompli through international institutions alone.</p></li><li><p><strong>Syria/Russia and the OPCW</strong>: Syria was found in violation of the Chemical Weapons Convention in 2021 and the OPCW suspended Syria&#8217;s privileges; Russia then blocked enforcement in the Security Council.<br>In 2024, the US determined Russia had used chemical weapons against Ukraine. Russia lost its OPCW seat and faced US sanctions, but no official investigation followed.</p></li></ol><p><strong>Across these episodes the pattern is consistent: verification succeeds when states cooperate. Enforcement against determined violators, however, depends entirely on great-power consensus, which, ironically, doesn&#8217;t exist when the great powers themselves are violating or shielding violators.</strong></p><div><hr></div><h1>Technical Challenges That Make AI Different</h1><h3>Algorithmic Efficiency</h3><p>Most AI governance proposals focus on compute as the governance lever because specialized AI chips are hard to manufacture, concentrated in a few supply chains, and relatively easy to track. But GovAI's research on compute efficiency changes that picture:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a></p><blockquote><p>As hardware innovation reduces the price of compute and algorithmic advances make its use more efficient, the cost of training an AI model to a given performance falls over time.</p></blockquote><p>Between 2012 and 2022, the compute needed to reach a fixed image-recognition benchmark halved roughly every nine months. A nine-month halving time compounds fast: four halvings every three years, a 16x drop in required compute. If this pattern continues across AI systems, compute thresholds become progressively less meaningful.</p><p>The GovAI framing distinguishes between:</p><ul><li><p><strong>Access effect</strong>: More actors can train models to a given performance over time</p></li><li><p><strong>Performance effect</strong>: Each actor can achieve higher performance with the same compute</p></li></ul><h3>Distributed Training</h3><p>A recent paper on distributed and decentralized training (the DiLoCo and DeMo algorithms)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a> raises a further challenge: these methods enable training across geographically dispersed compute clusters with minimal synchronization. The assumption that large training runs require concentrated, detectable infrastructure may not hold much longer, which undercuts both monitoring-based governance and the safety guarantees built on it.</p>
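<p>To see why this unsettles the monitoring-based proposals above, here is a minimal sketch of the low-communication pattern, heavily simplified by me from the DiLoCo idea rather than taken from the paper: each site runs many optimization steps entirely locally, and parameters are averaged across sites only once per outer round, so the externally observable coordination is a tiny fraction of the actual training work.</p><pre><code class="language-python">import numpy as np

# Minimal sketch of low-communication ("DiLoCo-style") training across
# dispersed sites. My simplification for illustration, not the published
# algorithm: sites take many local gradient steps, then synchronize once
# per outer round by averaging parameters.

rng = np.random.default_rng(0)
N_SITES, DIM = 4, 8
LOCAL_STEPS, OUTER_ROUNDS, LR = 500, 10, 0.01

def local_gradient(params: np.ndarray) -> np.ndarray:
    # Stand-in for a real model's gradient (here: minimize 0.5*|params|^2).
    return params

global_params = rng.normal(size=DIM)
for _ in range(OUTER_ROUNDS):
    replicas = []
    for _ in range(N_SITES):
        p = global_params.copy()
        for _ in range(LOCAL_STEPS):       # runs entirely on-site
            p = p - LR * local_gradient(p)
        replicas.append(p)
    global_params = np.mean(replicas, axis=0)  # the only cross-site traffic

total_steps = OUTER_ROUNDS * LOCAL_STEPS * N_SITES
print(f"gradient steps taken: {total_steps}, sync events visible: {OUTER_ROUNDS}")
</code></pre><p>The ratio is the point: 20,000 gradient steps against 10 visible synchronization events. Spread the sites across nondescript mid-size data centers and the "concentrated, detectable infrastructure" assumption quietly dissolves.</p>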
<h3><strong>Hardware-Enabled Mechanisms (HEMs)</strong></h3><p>HEMs are security features built directly into AI chips and related hardware, but they can&#8217;t be treated as a turn-key solution yet. CNAS and RAND estimate that HEMs <strong>require 18 months to 4 years</strong> of development before deployment. Key challenges include:</p><ul><li><p>Tamper-proof hardware against state-level adversaries</p></li><li><p>Secure cryptographic key management across international boundaries</p></li><li><p>Supply chain coordination across US, EU, Japan, South Korea, Taiwan, Netherlands</p></li><li><p>Uncertain effectiveness against determined circumvention</p></li></ul><p>This brings me to an explicit call-out by Baker et al.:</p><blockquote><p>Compliance-locking could be implemented by combining workload certificates with offline licensing... However, tamper-proofing's use in AI governance faces major technical challenges and serious tradeoffs.</p></blockquote><div><hr></div><h1>The UN Recommendation</h1><p>The UN Secretary-General's High-Level Advisory Body on AI released its final report in September 2024. The 39-member body, which included governments, the private sector, civil society, and academia across 33 countries, <strong>did not recommend establishing an IAEA-style organization.</strong></p><p>Instead, it proposed lighter mechanisms:</p><ol><li><p><strong>International Scientific Panel on AI</strong> (IPCC model): Scientific assessments of capabilities and risks</p></li><li><p><strong>Global Dialogue on AI Governance</strong>: UN forum anchored in human rights law</p></li><li><p><strong>AI Standards Exchange</strong>: Technical interoperability coordination</p></li><li><p><strong>Capacity Development Network</strong>: Resources for developing countries</p></li><li><p><strong>Global Fund for AI</strong>: Addressing capacity gaps for SDGs</p></li><li><p><strong>Global AI Data Framework</strong>: Standardized data governance</p></li><li><p><strong>AI Office at UN Secretariat</strong>: Small coordinating body</p></li></ol><p>The UN General Assembly adopted the Scientific Panel and Global Dialogue structures in August 2025, and the first Global Dialogue is scheduled for July 2026 in Geneva, which I think is significant. The Advisory Body noted that a more robust institution "might become necessary" if risks become more acute. But for now, it judged that these mechanisms better match political feasibility and the need for adaptive governance.</p><div><hr></div><h1>The Representation Gap In Governance</h1><p>Here is a point I think is under-discussed: of 193 UN Member States, only 7 participate in all of the major recent AI governance initiatives, while 118 countries, predominantly from the Global South, are missing from these discussions entirely.</p><p>This is not just a legitimacy problem but a practical one. Reaching international consensus on anything even close to an IAEA-style organization, or on the UN&#8217;s recommendations, requires genuinely global effort, and the threat we aim to resist is bigger than any single bloc can manage. 
Dialogue around AI governance initiatives should be broadened: developing nations need to be made aware of the risks and brought on board for a treaty that can keep the world safer for the "<strong>foreseeable future</strong>."</p><p>India has positioned itself as the voice of the Global South on AI governance, and the February 2026 AI Impact Summit in New Delhi represents a strategic evolution. The shift from &#8220;Safety&#8221; (Bletchley, Seoul) and &#8220;Action&#8221; (Paris) to &#8220;Impact&#8221; signals a development-centric framing; seven working groups have completed two rounds of meetings, and over 100 countries are expected to participate.</p><p>Whether this produces meaningful governance architecture or ends up as just another declaration remains to be seen. But it&#8217;s the first major AI summit in the Global South, and its themes, spanning Human Capital, Inclusion, Safe and Trusted AI, Resilience, Innovation, Democratizing AI, and Economic Development, certainly reflect priorities distinct from the safety-centric framing of previous summits.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The RAND working paper on methods &#8220;<strong><a href="https://www.alphaxiv.org/abs/2507.15916">Verifying International Agreements on AI</a></strong>.&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The Oxford Martin AIGI report on &#8220;<strong><a href="https://aigi.ox.ac.uk/publications/verification-for-international-ai-governance/">Verification for International AI Governance.</a>&#8221;</strong></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The technology governance research on &#8220;<strong><a href="https://techgov.intelligence.org/research/mechanisms-to-verify-international-agreements-about-ai-development">Mechanisms to Verify International Agreements About AI Development.</a></strong>&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Wasil, A. et al.<strong> "<a href="https://arxiv.org/abs/2408.16074v2">Verification methods for international AI agreements.</a>"</strong></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Institute for Law &amp; AI. 
&#8220;<strong><a href="https://law-ai.org/the-role-of-compute-thresholds-for-ai-governance/">The Role of Compute Thresholds for AI Governance.</a></strong>&#8221; </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>When choosing a threshold, regulators should be aware that capabilities might be substantially improved through <em>post-training enhancements</em>, and training compute is only a general predictor of capabilities. The absolute limits are unclear at this point; however, current methods can result in capability improvements equivalent to a 5- to 30-times increase in training. [<a href="https://law-ai.org/the-role-of-compute-thresholds-for-ai-governance/#:~:text=The%20absolute%20limits%20are%20unclear%20at%20this%20point%3B%20however%2C%20current%20methods%20can%20result%20in%20capability%20improvements%20equivalent%20to%20a%205%2D%20to%2030%2Dtimes%20increase%20in%20training.184">ref</a>]</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>Oxford Academic. &#8220;<strong><a href="https://academic.oup.com/ia/article/100/3/1275/7641064?login=false">Global AI governance: barriers and pathways forward</a></strong>.&#8221; </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>Ho et al. &#8220;<strong><a href="https://arxiv.org/abs/2507.06379">Domestic frontier AI regulation, an IAEA for AI, an NPT for AI, and a US-led Allied Public-Private Partnership for AI: Four institutions for governing and developing frontier AI</a></strong>.&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>GovAI. <strong>"<a href="https://www.governance.ai/research-paper/increase-compute-efficiency-and-the-diffusion-of-ai-capabilities">Increased Compute Efficiency and the Diffusion of AI Capabilities.</a>"</strong></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p><strong><a href="https://arxiv.org/abs/2507.07765v1">&#8220;Distributed and Decentralised Training: Technical Governance Challenges.&#8221;</a></strong></p></div></div>]]></content:encoded></item></channel></rss>