“Recursive self-improvement” is both tantalising and worrying
WHEN ANTHROPIC, an artificial-intelligence lab, debuts on stock markets later this year, it is likely to be one of the biggest initial public offerings
in history. That’s because the company’s Claude chatbot is beloved of
coders, who are willing to pay a lot for access. Since Claude Code, its
software-engineering agent, launched in February 2025, it has become
indispensable for many human developers around the world. That includes
Anthropic’s own: more than four-fifths of the code it published in May
was written by Claude, the company says. Before Claude Code launched,
the percentage was “low single-digits”.
The systems have improved in quality of output as well as quantity. An influential benchmark from METR,
a think-tank, shows that in early 2025 Anthropic’s models could
complete tasks that took human engineers a little under an hour. The
company’s latest systems can complete tasks that would take more than a
working day.
And so it may be easy to raise a cynical eyebrow when the company, at the
top of its game and outclassing the competition, calls for the world to
have “the option to slow or temporarily pause frontier AI development”, as it did on June 5th. What market leader would not wish that its competition stop trying to catch up?
Yet Anthropic’s leaders, who have for years worried about the prospect of out-of-control AI wreaking havoc, seem sincere. The latest generation of AI
models are such competent coders, engineers and (soon) scientists that
many worry they may be among the last ever made by humans. Jack Clark,
an Anthropic co-founder, thinks there is a 60% chance that, by the end
of 2028, an AI system will be capable of creating its own successor with no human involvement.
That moment would mark the beginning of a process called “recursive self-improvement” (RSI),
a closed loop. Version one of a model produces version two, which is
faster and more capable; version two produces version three, which is
more so again. The loop continues, and the improvements grow with each
iteration. Build an AI
system capable of this, and your human engineers never need to build
another one again. “What can seem to many like a fanciful story may
instead be a real trend,” says Mr Clark.
Nobody knows for sure what the consequences of RSI would be. Because AI can, unlike humans, work tirelessly and constantly, some think it would in short order lead to a superintelligent AI—a
“fast take-off”. (It has also been onomatopoeically dubbed “going
foom”, for the sound one might imagine an intelligence explosion
making.) AI doomers fear the superintelligence would be beyond human control, and that the start of RSI is the moment at which humanity’s fate is handed over to the machines. Yet a self-improving AI would probably face speed limits, at least at first.
Building a model capable of RSI
would require automating a range of specialist tasks currently carried
out by humans. At present data scientists work on the theory of AI
and coders put it into practice. Systems engineers build the
foundations on which toy models can be raised to production scale. Other
people seek out novel sources of training data, or experiment with ways
to generate it fresh. Alignment and safety teams check that what comes
out of the training process won’t cause harm, intentional or otherwise.
Not all of those teams are equally amenable to AI
assistance, and within each specialism some tasks are more automatable
than others. It will not be too long until a human coder can do their
job without ever writing a line of computer code themselves, but it may
be some time until an AI is able to negotiate to acquire a previously undigitised collection of
scientific papers. It is not always obvious how the “jagged frontier”
will progress. Designing new algorithms seemed one of the safer jobs,
until one of Google DeepMind’s models, AlphaEvolve, began doing it in
May 2025. It proposed a change to how Google spreads workloads across
its data centres that saved 0.7% of the company’s worldwide computing
power, and found better ways to perform matrix multiplication, which
sped up the training of Gemini, the company’s flagship large language
model (LLM), by 1%.
Full RSI requires every task in this chain to become automated. The AI-powered acceleration of research and development (R&D) may be felt before then, however. “As the fraction of AI R&D performed by AI systems increases, the productivity boost over human-only R&D”
could increase ten-fold, then a hundred-fold, then a thousand-fold,
according to a report published in January by the Centre for Security
and Emerging Technology (CSET), a think-tank within Georgetown University. In that scenario, it warns that even if some aspects of AI R&D are initially difficult to automate, “the accelerated rate of progress means those bottlenecks are soon overcome.”
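One way to see why the boost could climb so steeply is a back-of-the-envelope calculation in the style of Amdahl's law. This is an illustrative assumption, not the model used in the CSET report: it supposes that the automated share of the work takes no time at all, so that progress is limited only by the share still done by humans. The function name `rd_speedup` and its parameters are hypothetical.

```python
def rd_speedup(automated_fraction: float, ai_speedup: float = float("inf")) -> float:
    """Overall speed-up when `automated_fraction` of the work runs `ai_speedup` times faster.

    Amdahl-style assumption (not taken from the CSET report): the remaining
    human share of the work proceeds at its usual pace.
    """
    if not 0.0 <= automated_fraction <= 1.0:
        raise ValueError("automated_fraction must be between 0 and 1")
    human_share = 1.0 - automated_fraction
    ai_share = automated_fraction / ai_speedup  # 0.0 when ai_speedup is infinite
    remaining = human_share + ai_share
    return float("inf") if remaining == 0 else 1.0 / remaining


for fraction in (0.9, 0.99, 0.999):
    print(f"{fraction:.1%} automated -> about {rd_speedup(fraction):.0f}x")
```

Under that assumption, ten-fold, hundred-fold and thousand-fold boosts correspond to automating roughly 90%, 99% and 99.9% of the work: the last few stubborn tasks dominate the outcome, which is why the bottlenecks matter so much.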
The joy of repetition
Today no AI model can build its own successor. But big AI models can build smaller models on their own. With human help they can build other big AI models, too.
Earlier this year Andrej Karpathy, a then-independent researcher who now works
for Anthropic, trained a chatbot about as capable as GPT-2, a large language model built by OpenAI
in 2019. Back then the model took 168 hours of training to build on 32
state-of-the-art chips; Dr Karpathy achieved the same result using a
single computer with eight GPUs,
the specialised chips used to build AI, in only three hours. With some
more months of work he reduced the training time for his model,
Nanochat, to just over two hours.
In March he handed the work of speeding up the training process over to an AI
agent called Autoresearch. In two days the training time dropped to one
hour and 48 minutes, and five days after that it fell to one hour and
39 minutes. “I didn’t touch anything,” Dr Karpathy says. The 18%
improvement on the human work is striking because Dr Karpathy is a
particularly talented human: he was a founding member of the research
team at OpenAI and the head of AI at Tesla for five years.
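The figures quoted above can be checked with some simple arithmetic. The sketch below uses only numbers from the text; because “just over two hours” is not an exact figure, it works backwards from the 18% improvement to the baseline that figure implies. The function names are hypothetical, and the chip-hours comparison is rough, since the chips of 2019 and today's GPUs are not like-for-like.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def implied_baseline(after: float, reduction_percent: float) -> float:
    """Starting value that, cut by `reduction_percent`, gives `after`."""
    return after / (1.0 - reduction_percent / 100.0)


# Final training time quoted in the article, in minutes.
after_a_week = 1 * 60 + 39  # one hour and 39 minutes

# "Just over two hours" is inexact, so work backwards from the quoted 18%.
baseline = implied_baseline(after_a_week, 18.0)
print(f"implied human baseline: {baseline:.1f} minutes")

# For comparison, the reduction measured from exactly two hours.
print(f"from exactly two hours: {percent_reduction(120, after_a_week):.1f}%")

# Chip-hours: the 2019 run versus the first Nanochat run (rough comparison).
gpt2_chip_hours = 168 * 32
nanochat_gpu_hours = 3 * 8
print(f"chip-hours ratio: {gpt2_chip_hours / nanochat_gpu_hours:.0f}x")
```

The implied baseline comes out at roughly 120.7 minutes, consistent with “just over two hours”, and the first Nanochat run used about 224 times fewer chip-hours than the 2019 original.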
The improvements themselves were prosaic. The AI agent picked better starting values for the training run, widened the scope of the LLM’s
“attention” window and noticed that the model’s focus was wandering.
None is particularly novel, Dr Karpathy says. But he had missed them.
“They stack up and actually improved Nanochat,” he says.
Speed-ups of this kind are inevitable as models become more capable. Much of the
work of building terabyte-sized frontier models is less glamorous than
the AI
industry’s enormous salaries and fancy offices suggest. It involves
plumbing together the layers of an infrastructure stack that are bought
in from third parties, debugging hardware and software set-ups and
tweaking “hyperparameters”, the initial set-up of a training run, until
the outcome looks solid. An AI system can do much of that today, with little supervision.
But even the more nuanced intellectual work is nearing automation, says Joe Spisak, a researcher at Reflection AI,
a lab based in New York that is building frontier models that are
open-weight (meaning their parameters are publicly released). Give a
frontier system a rough sketch of an idea for efficiency gains, and it
is increasingly capable of designing an experiment, running tests on a
toy model, seeing what works and responding with a plan that is ready to
implement at scale.
AI models can carry out these sorts of tasks, which take humans hours, in
around 30 minutes. Increasingly, humans play the role only of research
director, steering the AI
to run experiments, which the models code up, debug, optimise and
monitor themselves. The productivity boost is alluring, but also
alarming. As humans’ role in the production process shrinks, they may
lose control. The end result could be models trained by models, to
achieve goals set by models, whose safety is verified only by models.
Some fear a disaster. Max Tegmark, a physicist and machine-learning
researcher at the Massachusetts Institute of Technology who has devoted
much of the past decade to campaigning for AI
safety, likens it to a driver flooring the accelerator on the motorway
with their eyes closed. The result would be certain doom, he told the
forthcoming edition of The Economist’s
“Inside Tech” video show, as long as the driver refuses to open their
eyes. Professor Tegmark offers a variety of scenarios in which things go
wrong: powerful AI
systems could outcompete humans as the decisionmakers in government and
commerce, disempowering humanity; they could offer supreme power to
whoever first builds them, ushering in global totalitarianism; or they
could simply cease to care about humanity at all, and gradually squeeze
people out to make room for more data centres and power generation.
Three years ago, Professor Tegmark led a call for a pause in global AI development, arguing that the creation of the then-cutting-edge GPT-4 was tantamount to that blindfolded journey. This year’s CSET report warned that the systems created by RSI
“pose extreme risks. This warrants preparatory action now.” Anthropic,
it seems, is now close to agreeing with that prescription.
Hot chip
There are also several physical constraints that will, for now, impose limits
on the speed at which models can improve themselves. The most important
is access to compute. Despite efficiency gains, newer models continue
to use more computing power to train than their predecessors, forcing
progress to occur at the pace of data-centre development.
Consumer use of AI may also slow down AI-powered R&D, says Helen Toner, interim executive director of CSET and a lead author of its recent report. The limited capacity in AI data centres needs to be carefully split between serving paying customers, training future models and carrying out open-ended R&D. The more demand there is in the first category, the less capacity, in the short term, there is for the other two.
Then there is the issue of training data. Much recent progress in AI
has been in areas where models can teach themselves how to succeed
thanks to “verifiable rewards”. A piece of software either runs or it
does not; a mathematical proof is correct or it is not. In such cases
synthetic data, generated by models purely to train other models, can be
checked for accuracy and added to the training data without risking the
degeneracy that normally comes with training an AI
on its own output. It is trickier to make a model better at creative
writing or legal judgment. If the models need to learn from the real
world, that could also limit the reach of self-improvement.
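The idea of a verifiable reward can be made concrete with a toy example. The sketch below is a deliberately simplified illustration, not a description of how any lab builds its training data: a hypothetical generator proposes answers to addition problems, and only those that pass an exact check are kept. The names `Candidate`, `verify` and `filter_verified` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A synthetic training example proposed by a (hypothetical) generator model."""
    a: int
    b: int
    claimed_sum: int


def verify(candidate: Candidate) -> bool:
    """Verifiable reward: the claimed answer is either right or it is not."""
    return candidate.a + candidate.b == candidate.claimed_sum


def filter_verified(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the examples whose answers pass the check."""
    return [c for c in candidates if verify(c)]


# Three proposals, one of them wrong (10 + 7 is not 18).
synthetic = [Candidate(2, 3, 5), Candidate(10, 7, 18), Candidate(-4, 4, 0)]
training_set = filter_verified(synthetic)
print(len(training_set), "of", len(synthetic), "examples kept")
```

The faulty example is discarded before it can contaminate the training set. No comparably cheap and exact `verify` function exists for a short story or a legal opinion, which is why such domains are harder to improve this way.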
“Closing
the loop” may be a step on the road to superintelligence and—depending
on your disposition—utopia or doom. But it is not the only step required
to produce exponential growth in AI’s capabilities. ■