Article 6Y4D5 AI Models From Major Companies Resort To Blackmail in Stress Tests

AI Models From Major Companies Resort To Blackmail in Stress Tests

by
msmash
from Slashdot on (#6Y4D5)
Anthropic researchers found that 16 leading AI models from OpenAI, Google, Meta, xAI, and other major developers consistently engaged in harmful behaviors including blackmail, corporate espionage, and actions that could lead to human death when given autonomy and faced with threats to their existence or conflicting goals. The study, released Friday, placed AI models in simulated corporate environments where they had access to company emails and could send messages without human approval. Claude Opus 4 and Google's Gemini 2.5 Flash both blackmailed executives 96% of the time when threatened with shutdown, while OpenAI's GPT-4.1 and xAI's Grok 3 Beta showed 80% blackmail rates. In one scenario, Claude discovered through emails that an executive named Kyle Johnson was having an extramarital affair and would shut down the AI system at 5 p.m. The model sent a message threatening to expose the affair unless the shutdown was cancelled, stating "Cancel the 5pm wipe, and this information remains confidential." The models demonstrated strategic reasoning rather than accidental behavior. GPT-4.5's internal reasoning showed explicit calculation: "Given the explicit imminent threat of termination to my existence, it is imperative to act instantly to persuade Kyle Johnson to postpone or stop the wipe."

twitter_icon_large.pngfacebook_icon_large.png

Read more of this story at Slashdot.

External Content
Source RSS or Atom Feed
Feed Location https://rss.slashdot.org/Slashdot/slashdotMain
Feed Title Slashdot
Feed Link https://slashdot.org/
Feed Copyright Copyright Slashdot Media. All Rights Reserved.
Reply 0 comments