Search-capable AI agents may cheat on benchmark tests

from www.theregister.com - Articles on 2025-08-23 14:32

Data contamination can make models seem more capable than they really are

Researchers with Scale AI have found that search-based AI models may cheat on benchmark tests by fetching the answers directly from online sources rather than deriving those answers through a "reasoning" process....
