AI’s safety features can be circumvented with poetry, research finds

Johana Bhuiyan

2025-11-30 14:00

Poems containing prompts for harmful content prove effective at duping large language models

Poetry can be linguistically and structurally unpredictable, and that's part of its joy. But one man's joy, it turns out, can be a nightmare for AI models.

Those are the recent findings of researchers at Italy's Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of guardrails placed on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content, such as hate speech or content related to self-harm.

Source	RSS or Atom Feed
Feed Location	http://www.theguardian.com/technology/rss
Feed Link	http://www.theguardian.com/