Bruce Schneier Reminds LLM Engineers About the Risks of Prompt Injection Vulnerabilities
Security professional Bruce Schneier argues that large language models have the same vulnerability as phones in the 1970s exploited by John Draper. "Data and control used the same channel," Schneier writes in Communications of the ACM. "That is, the commands that told the phone switch what to do were sent along the same path as voices."Other forms of prompt injection involve the LLM receiving malicious instructions in its training data. Another example hides secret commands in Web pages. Any LLM application that processes emails or Web pages is vulnerable. Attackers can embed malicious commands in images and videos, so any system that processes those is vulnerable. Any LLM application that interacts with untrusted users - think of a chatbot embedded in a website - will be vulnerable to attack. It's hard to think of an LLM application that isn't vulnerable in some way. Individual attacks are easy to prevent once discovered and publicized, but there are an infinite number of them and no way to block them as a class. The real problem here is the same one that plagued the pre-SS7 phone network: the commingling of data and commands. As long as the data - whether it be training data, text prompts, or other input into the LLM - is mixed up with the commands that tell the LLM what to do, the system will be vulnerable. But unlike the phone system, we can't separate an LLM's data from its commands. One of the enormously powerful features of an LLM is that the data affects the code. We want the system to modify its operation when it gets new training data. We want it to change the way it works based on the commands we give it. The fact that LLMs self-modify based on their input data is a feature, not a bug. And it's the very thing that enables prompt injection. Like the old phone system, defenses are likely to be piecemeal. We're getting better at creating LLMs that are resistant to these attacks. We're building systems that clean up inputs, both by recognizing known prompt-injection attacks and training other LLMs to try to recognize what those attacks look like. (Although now you have to secure that other LLM from prompt-injection attacks.) In some cases, we can use access-control mechanisms and other Internet security systems to limit who can access the LLM and what the LLM can do. This will limit how much we can trust them. Can you ever trust an LLM email assistant if it can be tricked into doing something it shouldn't do? Can you ever trust a generative-AI traffic-detection video system if someone can hold up a carefully worded sign and convince it to not notice a particular license plate - and then forget that it ever saw the sign...? Someday, some AI researcher will figure out how to separate the data and control paths. Until then, though, we're going to have to think carefully about using LLMs in potentially adversarial situations...like, say, on the Internet. Schneier urges engineers to balance the risks of generative AI with the powers it brings. "Using them for everything is easier than taking the time to figure out what sort of specialized AI is optimized for the task. "But generative AI comes with a lot of security baggage - in the form of prompt-injection attacks and other security risks. We need to take a more nuanced view of AI systems, their uses, their own particular risks, and their costs vs. benefits."
Read more of this story at Slashdot.