
Anthropic's newly released Claude Fable 5 generative AI model is trying so hard to be safe that it's hurting its own userbase. Customers attempting to use the AI knowledge regurgitator are reporting that the model is refusing to answer harmless questions, an issue that has annoyed security researchers following past model releases.
Anthropic warned that it had tuned Fable 5's guardrails conservatively: "they'll sometimes catch harmless requests, though they trigger, on average, in less than five percent of sessions," the company said, promising to "reduce false positives as quickly as we can."
The company did not immediately respond to a request to quantify model refusals. So it's unclear whether the actual false positive rate is greater or less than five percent. But with an estimated 18 to 30 million users worldwide, even a small percentage of thwarted users makes a racket.
Mike Famulare, principal research scientist at the Institute for Disease Modeling, part of the Global Health Division of the Gates Foundation, reports (#66657) that Claude Fable 5 balks at inputs like "Hello."
"In Claude Code, Fable 5's input safety classifier emits a model_refusal_fallback (silent switch to Opus 4.8) on the first turn of essentially every session on my account - including a session whose only user input is the word hello!. No repo content, no tool calls, and no file reads are in context when it fires."
He is not the only frustrated customer. Many other bug reports have been filed in Anthropic's Claude Code GitHub repo since Fable 5 debuted. These include: [Bug] Fable 5 model safety filters causing false positives on benign messages #66587; Fable 5 refuses to assist with 'Application Security Architect resume' editing #66655; and [Feature Request] Allow Fable 5 usage for non-research lab management systems #67062, among others.
On social outrage site X.com, Derya Unutmaz, an immunologist and professor at the Jackson Laboratory for Genomic Medicine, notes, "The word 'cancer' is flagged as a biosecurity risk by Claude Fable 5!" Similar complaints show up in Reddit threads.
Fable 5 is unusual because Anthropic has chosen to conceal safety interventions that try to block rival frontier model development. The classifiers designed to catch cybersecurity, biology and chemistry, and distillation attempts fall back on the latest Claude Opus model and the user gets notified. But the counter-competition surveillance, per the company's system card [PDF], "will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)."
"Prompt modification" without notice is functionally a man-in-the-middle attack, though one that Anthropic estimates "will impact ~0.03 percent of traffic, concentrated in fewer than 0.1 percent of organizations."
As developer Clay Merritt fumes, "Anthropic's Fable 5 silently sabotages its answers when it detects AI/ML work. No refusal. No notice. Purposeful degradation invisible to the user."
Anthropic expects cyber defenders and critical infrastructure providers to use its Claude Mythos 5 model, which shares the underlying model of Fable 5 but without the same safeguards. Doing so, however, requires participating in the company's Project Glasswing program or the trusted access program that's being rolled out for select biology researchers.
Devon (last name withheld by request), founder of Abliteration.ai, a service that assists with model abliteration (guardrail removal), told The Register in a phone interview that while there's some degree of fearmongering and marketing hype coming from the big AI labs, it's also fair to say that there are legitimate concerns about how frontier models get used.
"Anthropic's making a big bet on their brand that people will trust their brand so much they'll just deal with [refusals]," he said. "But in the long term, people are not just going to accept these companies that centralize control over their lives and what they can have information about." ®