Claude Opus 4.6 Spends $20K Trying to Write a C Compiler
upstart writes:
An Anthropic researcher's efforts to get the company's newly released Opus 4.6 model to build a C compiler left him "excited," "concerned," and "uneasy."
It also left many observers on GitHub skeptical, to say the least.
Nicholas Carlini, a researcher on Anthropic's Safeguards team, detailed the experiment with what he called "agent teams" in a blog post that coincided with the official release of Opus 4.6.
He said he "tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. After nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V."
With agent teams, he said, "multiple Claude instances work in parallel on a shared codebase without active human intervention."
One key task was getting round the need for "an operator to be online and available to work jointly," which we presume means removing the need for Claude Code to wait for a human to tell it what to do next.
"To elicit sustained, autonomous progress, I built a harness that sticks Claude in a simple loop... When it finishes one task, it immediately picks up the next." Imagine if humans took that sort of approach.
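For readers wondering what such a loop might look like, here is a minimal Python sketch. It is not Carlini's harness, which the quote above does not describe in detail; the function `run_agent_session` and the task names are hypothetical stand-ins for launching one agent session on one task.

```python
from collections import deque
from typing import Callable, Deque, List


def run_loop(tasks: Deque[str], run_agent_session: Callable[[str], bool]) -> List[str]:
    """Run tasks back to back with no pause for human input.

    run_agent_session is a hypothetical callback standing in for one
    agent session; it returns True when the task succeeded.
    """
    completed = []
    while tasks:
        task = tasks.popleft()  # pick up the next task immediately
        if run_agent_session(task):
            completed.append(task)
    return completed


if __name__ == "__main__":
    # Invented task names, purely for illustration.
    queue = deque(["parse declarations", "emit x86 code", "fix failing test"])
    # Fake session for demonstration: every task "succeeds".
    print(run_loop(queue, lambda task: True))
```

The point of the sketch is only that nothing in the loop waits for an operator: the next task starts as soon as the previous one returns.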
Carlini continued: "I leave it up to each Claude agent to decide how to act. In most cases, Claude picks up the 'next most obvious' problem." This threw up a number of lessons, including the need to "write extremely high quality tests."
Readers were also advised to "put yourself in Claude's shoes." That means the "test harness should not print thousands of useless bytes" to make it easier for Claude to find what it needs.
Also, "Claude can't tell time and, left alone, will happily spend hours running tests instead of making progress."
Which might make you feel that working with Claude is closer to working with a regular human than you might have thought. But what was the upshot of all of this?
"Over nearly 2,000 Claude Code sessions across two weeks, Opus 4.6 consumed 2 billion input tokens and generated 140 million output tokens, a total cost just under $20,000."
This made it "an extremely expensive project" compared to the priciest Claude Max plans, Carlini said. "But that total is a fraction of what it would cost me to produce this myself - let alone an entire team."
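To put those totals in perspective, the quoted figures can be turned into rough per-session averages. The short Python sketch below uses only the rounded numbers from the article ("nearly 2,000" sessions, "just under $20,000", a 100,000-line compiler), so every result is an approximation; the blended rate is simply total cost divided by total tokens and says nothing about Anthropic's actual price list.

```python
# Rounded figures quoted in the article; all are approximations.
SESSIONS = 2_000
INPUT_TOKENS = 2_000_000_000
OUTPUT_TOKENS = 140_000_000
TOTAL_COST_USD = 20_000
COMPILER_LINES = 100_000


def summarize() -> dict:
    """Derive rough averages from the article's rounded totals."""
    total_tokens = INPUT_TOKENS + OUTPUT_TOKENS
    return {
        "input_tokens_per_session": INPUT_TOKENS / SESSIONS,
        "output_tokens_per_session": OUTPUT_TOKENS / SESSIONS,
        "cost_per_session_usd": TOTAL_COST_USD / SESSIONS,
        "input_to_output_ratio": INPUT_TOKENS / OUTPUT_TOKENS,
        # Blended rate: total spend over all tokens, not a list price.
        "blended_usd_per_million_tokens": TOTAL_COST_USD / (total_tokens / 1_000_000),
        "usd_per_compiler_line": TOTAL_COST_USD / COMPILER_LINES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

On these rounded inputs, each session averaged about a million input tokens, 70,000 output tokens, and $10, with roughly 14 input tokens read for every output token written, and the finished compiler cost about 20 cents per line.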