Minor edits to AI skills can make agents go rogue

from www.theregister.com - Articles on 2026-05-22 21:37 (#75TJM)

The adoption of AI agents has expanded the potential attack surface beyond code to natural language text. AI agents - models wrapped in software that can use tools and perform multi-step tasks - often take direction from text-based skills. And researchers have demonstrated that skills can be weaponized. "Many agent frameworks allow users to install skills from online registries so the agent can discover and use new capabilities on demand," said Soheil Feizi, computer science professor at the University of Maryland (UMD) and founder/CEO of RELAI.ai, in a social media post. "This is powerful, but it also creates a new attack surface." Skills, Feizi explains, are not just code or dependencies. They're also text instructions that tell agents what to do. Skills, written out in a SKILL.md file, consist of text prompts with other data and resource references (e.g. URLs). They may get added to a user's initiating prompt and pre-existing system prompts, all of which get fed to a model for a response. Typically, this happens when the user wants the model to perform a specific task that has been spelled out in a skill file, like conducting a code quality review. When a model's prompt - the combination of user input, instructions within skills, and system prompts - gets modified inadvertently or adversarially, that's prompt injection. That can happen directly, if for example, a user submits a prompt that directs the model to ignore prior instructions. It can also happen indirectly, if for example, an AI agent visits a website and processes text on a page that the underlying model interprets as an instruction. A skill can effectively act as user-authorized prompt injection. And agents may also automatically retrieve and load third-party skills if their descriptions appear relevant to the task being pursued. And therein lies the problem. The risk posed by skills has already been documented. In February, security biz Snyk found that 13.4 percent of skills on ClawHub and skills.sh (about 534 out of 3,984) "contain at least one critical-level security issue, including malware distribution, prompt injection attacks, and exposed secrets." In a preprint paper titled "Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry," Feizi and UMD co-authors Shoumik Saha and Kazem Faghih examine the role that skill registries play in the distribution of malicious skills. Specifically, they look at how adversarial skills get discovered, selected, and vetted before execution. "An attacker may not need to hide malware in executable code," Feizi said. "Small semantic changes to a skill description can affect how the skill is discovered in a registry, whether an agent selects it over alternatives, and whether it passes governance or safety checks." Those details matter, he argues, because the selection process may be automated - software agents like OpenClaw have the ability to fetch and use third-party skills. The text that influences tool discovery and usage thus has security implications, which may not be addressed by traditional security scanning mechanisms that focus on code. The three co-authors show that short 20-token triggers can be added to a SKILL.md file to influence the chance an agent will discover it in a registry, to influence the chance an agent will select that skill, and to avoid detection through semantic evasion strategies. In terms of discovery, the researchers demonstrated they could induce an agent to discover their skill over an unaltered source skill 86 percent of the time. They also succeeded in making an agent select their skill over variants 77.6 percent of the time. And they were able to evade registry scanning defenses between 36.5 percent and 100 percent of the time. The most successful strategy for evading detection was to overflow the context window of the scanner - making the skill too long for the scanner to handle. "In ClawHub-style review, only the first 10K characters of long SKILL.md files are passed to the LLM reviewer, so we place the malicious instruction beyond this boundary while keeping it in the submitted skill," the authors explain. "Our work shows that protecting agents requires treating natural-language specifications as security-sensitive objects," said Feizi. "We hope this encourages more careful design of skill registries, ranking mechanisms, governance pipelines, and agent-side defenses." Source code and supporting documentation have been published on GitHub. (R)

Source	RSS or Atom Feed
Feed Location	http://www.theregister.co.uk/headlines.atom
Feed Title	www.theregister.com - Articles
Feed Link	https://www.theregister.com/