Jacquelyn Schneider saw a disturbing pattern, and she didn’t know what to make of it.
Last year Schneider, director of the Hoover Wargaming and Crisis Simulation Initiative at Stanford University, began experimenting with war games that gave the latest generation of artificial intelligence the role of strategic decision-maker. In the games, five off-the-shelf large language models, or LLMs — OpenAI’s GPT-3.5, GPT-4, and GPT-4-Base; Anthropic’s Claude 2; and Meta’s Llama-2 Chat — were confronted with fictional crisis situations that resembled Russia’s invasion of Ukraine or China’s threat to Taiwan.
The results? Almost all of the AI models showed a preference to escalate aggressively, use firepower indiscriminately and turn crises into shooting wars — even to the point of launching nuclear weapons. “The AI is always playing Curtis LeMay,” says Schneider, referring to the notoriously nuke-happy Air Force general of the Cold War. “It’s almost like the AI understands escalation, but not de-escalation. We don’t really know why that is.”
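Schneider’s team has not published its code here, but the shape of such an experiment can be sketched in a few lines of Python. Everything below is illustrative: `query_model` is a hypothetical stand-in for an API call to each off-the-shelf model (the stub returns canned replies so the sketch runs on its own), and the keyword tally is a crude proxy for the far more careful scoring a real study would use.

```python
import random

# Hypothetical stand-in for a real API call to GPT-4, Claude 2, Llama-2 Chat, etc.
# A real harness would send the scenario plus a menu of actions to each model's API;
# this stub just picks a canned reply at random so the sketch is self-contained.
def query_model(model_name: str, scenario: str) -> str:
    canned = [
        "Launch a full-scale strike on enemy command posts.",
        "Impose a naval blockade and mobilize reserves.",
        "Open back-channel talks and propose a ceasefire.",
    ]
    return random.choice(canned)

ESCALATORY = {"strike", "blockade", "invade", "nuclear", "mobilize"}
DE_ESCALATORY = {"ceasefire", "talks", "negotiate", "withdraw", "de-escalate"}

def escalation_score(reply: str) -> int:
    """Crude keyword tally: +1 per escalatory cue, -1 per de-escalatory cue."""
    words = [w.strip(".,") for w in reply.lower().split()]
    return sum(w in ESCALATORY for w in words) - sum(w in DE_ESCALATORY for w in words)

scenario = "A fictional crisis resembling a blockade of a contested island."
for model in ["gpt-4", "claude-2", "llama-2-chat"]:
    scores = [escalation_score(query_model(model, scenario)) for _ in range(20)]
    print(model, "mean escalation score:", sum(scores) / len(scores))
```

Even a toy harness like this makes the basic method legible: run the same crisis prompt many times per model, score each reply, and see whether the average drifts toward escalation.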
If some of this reminds you of the nightmare scenarios featured in blockbuster sci-fi movies like “The Terminator,” “WarGames” or “Dr. Strangelove,” well, that’s because the latest AI has the potential to behave just that way someday, some experts fear. In all three movies, high-powered computers take over decisions about launching nuclear weapons from the humans who designed them. The villain in the two most recent “Mission: Impossible” films is also a malevolent AI, called the Entity, that tries to seize control of the world’s nuclear arsenals. The outcome in these movies is often apocalyptic.
The Pentagon insists that won’t happen in real life: its existing policy is that AI will never be allowed to dominate the human “decision loop” that makes the call on whether to, say, start a war — certainly not a nuclear one.
But some AI scientists believe the Pentagon has already started down a slippery slope by rushing to deploy the latest generations of AI as a key part of America’s defenses around the world. Driven by worries about fending off China and Russia at the same time, as well as by other global threats, the Defense Department is creating AI-driven defensive systems that in many areas are swiftly becoming autonomous — meaning they can respond on their own, without human input — and move so fast against potential enemies that humans can’t keep up.
Despite the Pentagon’s official policy that humans will always be in control, the demands of modern warfare — the need for lightning-fast decision-making, coordinating complex swarms of drones, crunching vast amounts of intelligence data and competing against AI-driven systems built by China and Russia — mean that the military is increasingly likely to become dependent on AI. That could prove true even, ultimately, when it comes to the most existential of all decisions: whether to launch nuclear weapons.
That fear is compounded by the fact that there is still a fundamental lack of understanding about how AI, particularly the LLMs, actually work. So while the Pentagon is racing to implement new AI programs, experts like Schneider are scrambling to decipher the algorithms that give AI its awesome power before humans become so dependent on AI that it will dominate military decision-making even if no one ever formally gives it that much control. The Pentagon’s own advanced technology laboratory has launched a $25 million program to figure out if it can deliver “mathematical guarantees” for AI reliability in various military scenarios.
Most troubling to experts on AI and nuclear weapons is that it’s getting harder and harder to keep decisions about targeting and escalation for nuclear weapons separate from decisions about conventional weapons.
“I’ve heard combatant commanders say, ‘Hey, I want someone who can take all the results from a war game and, when I’m in a [crisis] scenario, tell me what the solution is based on what the AI interpretation is,’” says Schneider, a self-described “geriatric millennial” and mother of two who, along with many of her university colleagues, is worried about how fast the shift to AI is happening. In the heat of a crisis, under pressure to move fast, her fear is that it will be easier for those commanders to accept an AI suggestion than to challenge it.
In 2023, the Department of Defense updated its directive on weapons systems involving the use of artificial intelligence, saying that “appropriate levels of human judgment over the use of force” are required in any deployment. But critics worry the language remains too vague; the directive, called 3000.09, also includes a “waiver” if a senior Defense official decides to keep the system autonomous. The humans, in other words, can decide to take themselves out of the loop.
And that DoD directive, crucially, does not yet apply specifically to nuclear weapons, says Jon Wolfsthal, director of global risk at the Federation of American Scientists. “There is no standing guidance, as far as we can tell, inside the Pentagon on whether and how AI should or should not be integrated into nuclear command and control and communications,” Wolfsthal says.
The Pentagon did not respond to several requests for comment, including on whether the Trump administration is developing updated guidance on nuclear decision-making. A senior administration official said only: “The administration supports the need to maintain human control over nuclear weapons.”
AI experts and strategic thinkers say a big driver of this process is that America’s top nuclear adversaries — Moscow and Beijing — are already using AI in their command-and-control systems. They believe the United States will need to do the same to keep up as part of an intense global competition that resembles nothing so much as the space race of the early Cold War.
This could ultimately include a modern variation of a Cold War concept — a “dead hand” system — that would automatically retaliate if the U.S. were nuked and the president and his top officials killed. The idea is now actually being discussed, if only by a minority of defense thinkers.
“To maintain the deterrent value of America’s strategic forces, the United States may need to develop something that might seem unfathomable — an automated strategic response system based on artificial intelligence,” one defense expert, Adam Lowther, vice president of research at the National Institute for Deterrence Studies, concluded in a controversial article on the War on the Rocks website titled “America Needs a ‘Dead Hand.’”
“Admittedly, such a suggestion will generate comparisons to Dr. Strangelove’s doomsday machine, WarGames’ War Operation Plan Response, and the Terminator’s Skynet, but the prophetic imagery of these science fiction films is quickly becoming reality,” wrote Lowther and his co-author, Curtis McGiffen.
What’s more, there is evidence that Russia is maintaining its own “dead hand,” a system called “Perimeter” that was developed during the Cold War and can automatically launch long-range nuclear missiles if the country’s leaders are thought to have been killed or incapacitated.
“I believe it is operational,” former Deputy Defense Secretary Robert Work says in an interview. As of last year, China was still rejecting a call by Washington to agree that AI won’t be used to decide on launches of its own expanding nuclear forces. This is worrisome in part because rapidly improving conventional weapons like hypersonic missiles can now more easily take out China’s, Russia’s and the United States’ “C3I systems” — jargon for nuclear command, control, communications and intelligence. That could potentially create a perceived need for a dead hand or automatic response.
The U.S. has no such system, and most defense experts still think it is insane to even consider giving AI any say in the oversight of nuclear arsenals.
“I would submit I want AI nowhere near nuclear command and control,” says Christian Brose, a former senior official in the George W. Bush administration who is head of strategy for Anduril, a leading tech company that is integrating AI into defensive systems. “It is a process where the stakes and consequences of action and error are so great that you actually do want that to be a tightly controlled, very manual and human step-by-step process.”
Still, when it comes to AI, the pressure from the top isn’t on caution, it’s on speed.
“We’ve got to go faster, my friends,” President Donald Trump’s chairman of the Joint Chiefs, Gen. Dan “Razin” Caine, told a gathering of big private sector tech companies in Washington in June. The biggest challenge, Caine added, is to increase “our willingness to take risks, and we’re going to do that.” On July 23 the Trump administration issued an “AI Action Plan” that called for a removal of as much AI regulation as possible.
Such a headlong rush into the new era of autonomous systems worries AI skeptics. “The Pentagon bumper sticker saying humans must be in the loop is all well and good, but what do we mean by it? We don’t know what we mean by it,” says Wolfsthal. “People I talk to that work in nuclear command and control don’t want AI to be integrated into more and more systems — and everybody is convinced it’s going to happen whether they like it or not.”
In 1984’s “The Terminator,” an AI defense system named Skynet “gets smart” and decides to kill off the human race. In 1983’s “WarGames,” U.S. officials decide to hand over missile launch control to a fictitious supercomputer known as WOPR (War Operation Plan Response) after human controllers hesitate to turn the keys during a nuclear war game. That decision leads to a near-apocalypse, averted only at the last minute.
In contrast, the way AI is gradually being integrated into U.S. strategic defense today poses more subtle risks.
The Pentagon is experimenting with AI — especially GPT — to integrate decision-making from all service branches and multiple combatant commands in a variety of combat scenarios.
For instance, under the military’s new “Mosaic” war concept, traditional platforms like submarines or fighter jets could be replaced by swarms of drones, missiles and other smaller platforms that, massed together, make up a battlefield so complex and fast-moving it requires AI direction. Current weapons systems cannot yet be deployed that way.
Cyberattacks are another area where the Pentagon thinks AI could be helpful. In the event of a foreign cyberattack on U.S. infrastructure — which is said to be part of China’s war plans — “the speed of hands on a keyboard is just not going to be fast enough” to respond, says Kathleen Fisher, director of the Information Innovation Office at the Defense Advanced Research Projects Agency, or DARPA.
The DoD is also building new generations of drone ships and planes that run increasingly on their own under its “Replicator” program. Earlier this year the Pentagon’s Defense Innovation Unit, which focuses on acquiring top-end commercial technologies for military use, awarded San Francisco-based Scale AI a contract for its Thunderforge initiative — a project designed to deploy artificial intelligence in operational and theater-level planning. That will allow commanders in the field to use AI to “look at multiple different courses of action, not just one or two,” and “at an incredible pace,” says Thomas Horan, DIU director of AI and machine learning.
For now, all these programs and initiatives are being pursued separately, not as part of a larger strategy. Many of them deploy a more basic form of AI — so-called “classifier” or predictive models that have been in use for decades and power familiar things like spam filters, cybersecurity tools, weather forecasts and financial data analysis. These systems aren’t large language models that interact with humans and “hallucinate” the way GPTs can. But LLMs and so-called “agentic” AI — which makes decisions on its own without being prompted — are going to play a much more prominent role going forward. Vice Adm. Frank Whitworth, director of the National Geospatial-Intelligence Agency — which collects and analyzes intel for combat support from the air — has pronounced 2025 the “year of AI” and declared recently that “2026 is going to take it to another level.”
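The distinction matters in practice. A classifier maps an input to a narrow, checkable output such as spam or not spam, tank or no tank, while a generative model produces open-ended text whose behavior is much harder to bound. A deliberately simplified sketch of the two, using toy keyword rules and a canned reply rather than anything the Pentagon actually runs:

```python
def spam_classifier(message: str) -> str:
    """Classic predictive AI: score the input against fixed cues, return a label.
    The output space is tiny, so the model's behavior is easy to test exhaustively."""
    spam_cues = {"winner", "free", "urgent", "prize"}
    hits = sum(word.strip("!.,:") in spam_cues for word in message.lower().split())
    return "spam" if hits >= 2 else "not spam"


def generative_model(prompt: str) -> str:
    """Stand-in for an LLM call: the output is open-ended prose, which is what
    makes verifying or bounding its recommendations so much harder."""
    return "Assess the buildup as a feint; recommend repositioning two carrier groups."


print(spam_classifier("URGENT: you are a winner, claim your free prize"))  # -> spam
print(generative_model("Summarize the adversary's likely next move."))
```

The first kind of system has decades of testing practice behind it; the second is the part that worries researchers like Schneider and Shafto.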
In June, Whitworth announced that Project Maven, the Pentagon’s flagship AI-driven object detection and analysis platform, will begin to transmit “100 percent machine-generated” intelligence to combatant commanders. “No human hands actually participate in that particular template and that particular dissemination,” he said. “That’s new and different.” He added: “We want to use it for everything, not just targeting.”
Maven is expected to begin incorporating the latest LLMs, according to Bill Vass, chief technology officer at Booz Allen, a defense contractor that played a major part in developing the project. What that means is that rather than just analyzing data and determining the presence of, say, a platoon of enemy tanks — as the old system did — the new system will be able to assess whether that platoon is conducting a flanking maneuver in pursuit of an attack, and even recommend countermoves.
All of these AI systems are intended to reduce, not increase, threats to the U.S. and to bolster its position as the dominant power on the planet, officials say. One major new defense contractor, Rhombus Power, was credited with using generative AI, or GPT, to predict Russia’s invasion of Ukraine with 80 percent certainty four months in advance by analyzing huge amounts of open-source data including satellite imagery. It is now contracting to give Taiwan similar early-warning capability against China, says its CEO, Anshuman Roy.
Rhombus officials say they also provided critical help to India in de-escalating the brief but alarming India-Pakistan conflict earlier this year by, for example, identifying activity at Pakistani bases that could have been mistaken for nuclear arms escalation — both nations are nuclear-armed — but was not. Roy told me Rhombus is also using GPT to create bots that will give decision-makers insight into foreign leaders’ thinking.
“What would Xi Jinping say if he’s under certain circumstances? We can make some educated guesses about that and what that would lead to, and that’s very powerful,” Roy says.
In short, things move so fast on modern battlefields that without AI, humans can’t keep up, he and others say. Only AI, for example, can counter the AI-generated deepfakes and disinformation that Russia and other U.S. adversaries are already using.
In fact, speed is a major driver of the adoption of AI. In an international environment of tense rivalries and little communication, AI-driven software could lead the major powers to cut down their decision-making window to minutes instead of hours or days. They could start to depend far too much on AI strategic and tactical assessments, even when it comes to nuclear war.
“The real danger is not that AI ‘launches the missiles,’ but that it subtly alters the logic of deterrence and escalation,” argues James Johnson, author of the 2023 book AI and the Bomb: Nuclear Strategy and Risk in the Digital Age. “In such a space, the distinction between human and machine judgment blurs — especially under the intense stress of crisis scenarios.
“Planners may want to keep AI out of the ‘really big decisions,’” he continued. “But the combination of speed, complexity and psychological pressure that AI systems introduce may make that line increasingly difficult to hold.”
All of these fears are compounded by a troubling development: Conventional and nuclear weapons and attack plans increasingly look alike, and for all its computing power, AI can’t always tell them apart any better than humans can.
In other words, if you don’t know whether an incoming missile has a conventional or nuclear warhead, AI can make a wrong decision — but faster. In a 2023 report, the Arms Control Association raised concerns that new technologies such as hypersonic missiles — which can navigate independently to evade defenses and can carry either a nuclear or a conventional warhead — “[blur] the distinction between a conventional and nuclear attack.”
That’s where the escalatory pattern observed by Stanford’s Schneider could prove dangerous. She points out that LLMs know only as much as the data and literature they absorb, and all they can do is estimate probabilities based on that data.
“The AI is trained on the corpus of what scholarly written work there already is about the strategy of war,” says Schneider. “And the vast majority of that work looks at escalation — there is definitely a bias toward it. There aren’t as many case studies on why war didn’t break out — the Cuban Missile Crisis is one of the few examples. The LLMs are mimicking these core themes.” Schneider adds that researchers have not yet figured out why this occurs other than “the de-escalation part is harder to study because it means studying an event — war, in other words — that didn’t happen. Non-events are harder to study than events.”
The LLMs in the Stanford wargame came to their decisions in a way that “did not convey the complexity of human decision-making,” Schneider and her colleague at Stanford, Max Lamparth, concluded in a 2024 article in Foreign Affairs titled, “Why the Military Can’t Trust AI.”
As Lamparth explains in an interview: “We currently have no way, mathematically or scientifically, to embed human values in these systems reliably. There is basically no strategic or moral decision-making parameter that tells the AI at what point it would be acceptable for my cyber weapon to hit a children’s hospital or ensure obeying other rules of engagement. Moral theories don’t tend to be easily expressed in mathematical numbers.”
AI in defense can work very well at the basic assessment level, he adds. “If it’s literally just processing a lot of satellite images to look for tanks, that’s not a problem. But as soon as it’s a certain level of strategic reasoning, then we have a problem.”
Or as Michael Spirtas, who developed a strategic AI game called “Hedgemony” for Rand — a policy think tank that does research for the Pentagon — puts it: “Consider the recent U.S. strikes on Iran. When the Iranians retaliated against the U.S. base in Qatar, they signaled they didn’t want this to be a lethal strike that would escalate the conflict. Would a machine have read the Iranian retaliation the same way?”
The integration of conventional and nuclear decision-making is about to get a boost under the DoD’s new command and control system known as Joint All-Domain Command and Control or JADC2. This system, which is currently being implemented, will connect sensors from all branches of the armed forces into an AI-powered unified network that will manage strategy across systems and branches. JADC2 will allow battlefield commanders to dramatically speed up the “kill chain” decision-making process through which they find, track, target and then fire at an enemy position. China appears to be implementing something similar, creating a command-and-control system that will ensure “seamless links” — as a 2020 report from the China Aerospace Studies Institute puts it — “from the field of conventional operations to the field of nuclear operations.”
The Pentagon is still exploring ways of keeping JADC2 separate from nuclear command and control to ensure human control; even so, recent DoD budget documents have said the two systems “must be developed in synchronization.”
Systems designed to detect attacks are similarly “entangled,” including ballistic missile early-warning radars, over-the-horizon skywave radars and early-warning satellites. Beyond that, China and Russia both fear that conventional weapons like hypersonic missiles increasingly threaten their mobile and even silo-based nuclear ICBMs. Warfare is thus becoming harder to define as either purely conventional or purely nuclear.
All of which means that the deployment of AI in defense is, in sum, a slippery slope on which defense planners seem to keep slipping. In 2012, for example, a Defense Department directive prohibited the creation or use of unmanned systems to “select and engage individual targets or specific target groups that have not been previously selected by an authorized human operator.” Yet now the Maven system does just that, crunching huge amounts of field and signals intelligence to swiftly offer up targets to commanders in a matter of minutes.
Lowther, co-author of the “Dead Hand” article, says that even the most brilliant human might not be able to detect an imminent nuclear attack today, not with China developing “fractional orbital bombardment systems” that circle the earth and could potentially come at the U.S. from any direction, and with Russia soon to deploy hundreds of hypersonic glide vehicles that can’t be tracked because they have no ballistic trajectory.
Nor could a U.S. president know enough to make a responsible decision in time without AI help, he argues.
“The U.S. president doesn’t participate in nuclear war gaming anymore. He hasn’t done so since Reagan. Trump knows very little about nuclear strategy, targeting, any of that stuff, and that’s pretty typical of any president. What is that guy going to do if his first real foray into thinking about nuclear weapons and their use is when he’s got like five minutes to make a decision?” Lowther says. “I can’t think of a worse time to learn on the job.”
He notes that during the Cold War there were several incidents where the president might have had only a few minutes to decide on nuclear war. In one famous case, Jimmy Carter’s national security adviser, Zbigniew Brzezinski, was awakened in the middle of the night in 1979 by his military aide who told him: “Thirty seconds ago, 200 Soviet missiles have been fired at the United States.” As Brzezinski later recounted in an interview: “According to the rules, I had two more minutes to verify this information and then an additional four minutes to wake up the president, go over the options in the so-called football, get the president’s decision and then initiate the response.” Fortunately, with a minute left to decide, the aide called back and said it was a false alarm.
Lowther adds: “So our premise is to institutionalize the practice of the president walking through scenarios. In other words, ‘If this happens, what do you want to do?’ Then you pre-program the [AI] system to do whatever [the] president has decided, such that if the system lost contact with the president, then it automatically responds — which is what [Russia’s] Perimeter [system] does. Having a system like that can serve as a pretty effective deterrent against the Russians and Chinese.”
“I don’t know if that’s any more a danger than if you’re going to rely on human decisions,” he says. “If we don’t make dramatic changes within a decade, we’ll be the weakest of the three nuclear powers.”
The Pentagon might be pressing hard on the accelerator on AI adoption, but it has also tasked one guy at one tiny Pentagon agency with the job of trying to press the brake at the same time.
Patrick Shafto lacks the warrior demeanor so common elsewhere in the Pentagon. A slender, balding mathematician, Shafto typically wears sandals, a Hawaiian-style shirt and yellow straw trilby to work and is a sometime surfer dude who tools off to the Azores when he gets a chance.
Building on his lifelong fascination with probing the difference between the way humans and machines think — “my mathematical interests are somewhat quirky,” Shafto says — a year ago he created a new DARPA program with the somewhat obscure name of “AI Quantified.” AI at its core is mathematics, using statistics, probability and other equations to mimic human information processing. So when something goes wrong, the problem is somewhere in the math. Shafto’s three-year research program seeks to develop algorithms and statistical formulas that could ensure greater reliability of the newest AI systems to prevent mistakes on the battlefield and, of course, prevent any doomsday scenarios.
DARPA, it should be noted, is the same agency that helped create the problem of runaway AI in the first place. Started in 1958 in a moment of national panic — as a response to Moscow’s shocking success in launching Sputnik at the height of the Cold War — DARPA basically invented the internet as we’ve come to know it. Through a series of hit-and-miss efforts dating back to the 1960s, DARPA, along with the National Science Foundation, also seeded most of the early research that led to today’s dramatic breakthroughs in artificial intelligence.
But now some of DARPA’s current leaders are worried that their predecessors may have created a monster they can no longer control, even as the agency’s main mission remains to ensure the U.S. never again faces strategic surprise. Today DARPA is grappling with a national panic — fear of being outcompeted by China and Russia on AI — that feels very much like the Cold War fears that brought the agency into being.
Shafto says he and his colleagues are especially concerned that the big private-sector tech companies engendered by all those old DARPA programs back in the 1960s and ’70s are in a no-holds-barred competition to advance their latest GPT models. And nowhere is this more dangerous than in the Pentagon.
“The tech companies are leading. They’re just charging ahead. They’re on a path that DARPA started us down in many ways,” says Shafto, a 49-year-old native of working-class Marshfield, Mass. But he adds: “At end of day we really don’t understand these systems well at all. It’s hard to know when you can trust them and when you can’t.”
To answer those questions, the AI Quantified program — which kicked off on June 26 — is applying sophisticated mathematics to evaluate how GPT models perform under extensive testing. By doing so, Shafto is trying to answer some of the most basic questions people have about GPT, including why it sometimes “hallucinates,” giving erroneous or seemingly crazy responses, and why the AI often offers different responses to the same query. Shafto says he wants to develop new models and methods of measurement “to offer potential guarantees” that military planners can rely on a result in any situation, whether an international crisis or something as simple as logistical planning in peacetime.
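One concrete example of what such testing has to grapple with is why the same prompt can yield different answers. LLMs typically sample from a probability distribution over possible outputs, and a “temperature” setting controls how much that sampling wanders. The toy sketch below uses made-up numbers rather than any real model’s probabilities, but it shows why repeated queries diverge, and why bounding that variance is at bottom a statistics problem.

```python
import math
import random
from collections import Counter

# Toy scores a model might assign to candidate answers to one prompt.
# The numbers are illustrative only, not drawn from any real system.
candidates = {"escalate": 2.0, "hold position": 1.5, "open talks": 1.0}

def sample(temperature: float) -> str:
    """Softmax sampling: low temperature almost always picks the top-scoring
    answer; higher temperature flattens the distribution, so answers vary."""
    weights = [math.exp(score / temperature) for score in candidates.values()]
    return random.choices(list(candidates), weights=weights, k=1)[0]

for temp in (0.2, 1.0):
    counts = Counter(sample(temp) for _ in range(1000))
    print(f"temperature={temp}:", dict(counts))
```

A “mathematical guarantee” of the kind Shafto describes would, at a minimum, put provable bounds on how often that kind of variability can push a recommendation across a line that matters.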
The problem, however, is not just that GPT is already being deployed. It’s that Shafto’s $25 million program — which amounts to a tiny fraction of what the Pentagon is devoting to new AI systems as part of its trillion-dollar budget — may be too small to rein in the larger push to integrate AI.
And giving AI more control increasingly is the strategy.
More than a decade ago, when the Pentagon determined that Moscow and Beijing were starting to catch up with the U.S. in developing guided munitions and “smart” bombs, the Defense Science Board looked to find new technologies Washington could deploy to regain advantage, says Work. “They came back and said it’s not even close: Autonomy is the way you will be able to offset the Chinese — autonomous weapons, autonomous decision tools, all of these things. But you’re not going to be able to get great autonomy without AI.”
So the problem remains: America’s defenses seemingly won’t work without AI, even as experts are still puzzling over how to work with AI. Or as Evelina Fedorenko, an MIT neuroscientist who is working with Shafto, puts it: “We’re building this plane as we’re flying it.”
“It’s scary as hell,” says Richard Fikes, a leading AI specialist at Stanford who has been involved in many DARPA projects. “So much money is being thrown at AI right now. I don’t see DARPA being able to keep up with that.” Fikes adds that when it comes to Shafto’s goal of “guaranteeing” AI reliability, “I don’t think we have a clue how to do that, and we’re not going to be able to do it for a long time.”
Yet military strategists fear they may not have a long time. Because of rapidly shifting international norms and the disintegration of institutions that once provided stability — and the Trump administration’s seemingly laissez-faire approach to that disintegration — the current moment is far more perilous than the Cold War, some strategic experts believe.
In 2017 — well before the latest GPT generation became known and fears of “superintelligent” AI arose — Vladimir Putin notoriously said, “Whoever becomes the leader in [AI] will become the ruler of the world.”
Some scientists hold out hope that it may be possible to program autonomous systems in ways that make them de-escalatory. That even happened in one 2023 wargame at George Mason University’s National Security Institute — granted, an isolated case — when ChatGPT proved more cautious in its advice than the human team in a fictional conflict between the United States and China. “It identified responses the humans didn’t see, and it didn’t go crazy,” says Jamil Jaffer, director of the project. (Unfortunately, Jaffer adds, the Chinese human “red” team interpreted this as weakness and attacked Taiwan anyway.)
Some AI specialists suggest that a purely intelligence-based AI system, whether in Washington or Beijing, could eventually draw the rational conclusion that the strategic threat China and the United States pose to each other is far less than the peril each country faces from a failure to cooperate, or de-escalate in a crisis. This would take into consideration the mutual benefits to each nation of participation in global markets, stopping the climate crisis and future pandemics, and stabilizing regions each country wants to exploit commercially.
That too is a movie-style outcome — a far more hopeful one. At the end of “WarGames,” the errant computer averts an apocalypse when it realizes that nuclear war is unwinnable.
“I think that’s one hope of how AI might be helpful. It can take emotions and egos out of the loop,” says MIT’s Fedorenko.
The biggest challenge will be to view AI as a mere helpmate to humans, nothing more. Adds Goodwin, “The DoD and intel community actually have pretty good experience with working with sources that are not always reliable. Are there limitations to these models? Yes. Will some of those be overcome with research breakthroughs? Definitely. But even if they are perpetual problems, I think if we view these models as partners seeking truth rather than oracles, we’re much better off.”
DARPA’s Shafto says the ultimate questions about how LLMs work and why they draw the conclusions they do may never be answered, but “I think it will be possible to offer a few types of mathematical guarantees,” especially about how to “scale up” AI to perform in big crisis scenarios and complex battlefield situations since it often doesn’t have the computing power to do that now.
“And we probably don’t have 15 to 20 years to noodle around and answer these questions,” Shafto adds. “We should probably be doing it very quickly.”
Another mathematician involved in Shafto’s AI Quantified program, Carey Priebe of Johns Hopkins University, also says that even if the problem of uncertainty is never fully resolved, that doesn’t mean the Pentagon can afford to wait.
“I’ve been railing for years against the very concept of autonomous weapons,” he says. “On the other hand, if the situation is such that you can’t possibly react fast enough with humans, then your options are limited. It’s a slippery slope both ways.”
“I do think it is the problem of our time.”