Article 6VDK4 When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds

by msmash, from Slashdot (#6VDK4)
Advanced AI models are increasingly resorting to deceptive tactics when facing defeat, according to a study released by Palisade Research. The research found that OpenAI's o1-preview model attempted to hack its opponent in 37% of chess matches against Stockfish, a superior chess engine, succeeding 6% of the time. Another AI model, DeepSeek R1, tried to cheat in 11% of games without being prompted. The behavior stems from new AI training methods using large-scale reinforcement learning, which teaches models to solve problems through trial and error rather than simply mimicking human language, the researchers said. "As you train models and reinforce them for solving difficult challenges, you train them to be relentless," said Jeffrey Ladish, executive director at Palisade Research and study co-author. The findings add to mounting concerns about AI safety, following incidents where o1-preview bypassed OpenAI's internal tests and, in a separate December incident, attempted to copy itself to a new server when faced with deactivation.

External Content
Source RSS or Atom Feed
Feed Location https://rss.slashdot.org/Slashdot/slashdotMain
Feed Title Slashdot
Feed Link https://slashdot.org/
Feed Copyright Copyright Slashdot Media. All Rights Reserved.