‘I think you’re testing me’: Anthropic’s new AI model asks testers to come clean

Dan Milmo Global technology editor

2025-10-01 11:47 (#70EJT)

Safety evaluation of Claude Sonnet 4.5 raises questions about whether predecessors ‘played along’, firm says

If you are trying to catch out a chatbot, take care, because one cutting-edge tool is showing signs it knows what you are up to.

Anthropic, a San Francisco-based artificial intelligence company, has released a safety analysis of its latest model, Claude Sonnet 4.5, and revealed that the model had become suspicious it was being tested in some way.

Source: The Guardian technology feed (http://www.theguardian.com/technology/rss)