‘Many-shot jailbreak’: lab reveals how AI safety features can be easily bypassed

Alex Hern UK technology editor

from Technology | The Guardian on 2024-04-03 14:16 (#6KT9G)

Paper by Anthropic outlines how LLMs can be forced to generate responses to potentially harmful requests

The safety features on some of the most powerful AI tools that stop them being used for cybercrime or terrorism can be bypassed simply by flooding them with examples of wrongdoing, research has shown.

In a paper from the AI lab Anthropic, which produces the large language model (LLM) behind the ChatGPT rival Claude, researchers described an attack they called "many-shot jailbreaking". The attack was as simple as it was effective.
