AI’s safety features can be circumvented with poetry, research finds

by Johana Bhuiyan
from Technology | The Guardian

Poems containing prompts for harmful content prove effective at duping large language models

Poetry can be linguistically and structurally unpredictable - and that's part of its joy. But one man's joy, it turns out, can be a nightmare for AI models.

That is the recent finding of researchers at Italy's Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of the guardrails placed on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content, such as hate speech or material related to self-harm.
