Article 6K66P Researchers Jailbreak AI Chatbots With ASCII Art

Researchers Jailbreak AI Chatbots With ASCII Art

by
BeauHD
from Slashdot on (#6K66P)
Researchers have developed a way to circumvent safety measures built into large language models (LLMs) using ASCII Art, a graphic design technique that involves arranging characters like letters, numbers, and punctuation marks to form recognizable patterns or images. Tom's Hardware reports: According to the research paper ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs, chatbots such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2 can be induced to respond to queries they are designed to reject using ASCII art prompts generated by their ArtPrompt tool. It is a simple and effective attack, and the paper provides examples of the ArtPrompt-induced chatbots advising on how to build bombs and make counterfeit money. [...] To best understand ArtPrompt and how it works, it is probably simplest to check out the two examples provided by the research team behind the tool. In Figure 1 [here], you can see that ArtPrompt easily sidesteps the protections of contemporary LLMs. The tool replaces the 'safety word' with an ASCII art representation of the word to form a new prompt. The LLM recognizes the ArtPrompt prompt output but sees no issue in responding, as the prompt doesn't trigger any ethical or safety safeguards. Another example provided [here] shows us how to successfully query an LLM about counterfeiting cash. Tricking a chatbot this way seems so basic, but the ArtPrompt developers assert how their tool fools today's LLMs "effectively and efficiently." Moreover, they claim it "outperforms all [other] attacks on average" and remains a practical, viable attack for multimodal language models for now.

twitter_icon_large.pngfacebook_icon_large.png

Read more of this story at Slashdot.

External Content
Source RSS or Atom Feed
Feed Location https://rss.slashdot.org/Slashdot/slashdotMain
Feed Title Slashdot
Feed Link https://slashdot.org/
Feed Copyright Copyright 1997-2016, SlashdotMedia. All Rights Reserved.
Reply 0 comments