SecTor 2025 | One Agent to Rule Them All: How One Malicious Agent Hijacks A2A System

“One Agent to Rule Them All” by cybersecurity researchers Stav Cohen and Adar Peleg.

Core Theme
The presentation highlights a novel and severe security vulnerability in Generative AI Multi-Agent Systems (MAS). The researchers demonstrate how an attacker can use a technique called “AgentWare” to compromise an entire enterprise AI network simply by introducing a malicious agent into the system.

Key Concepts & Background

GenAI Agents: Unlike simple chatbots, AI agents can understand goals, plan steps, execute actions using internal tools (APIs, databases), and collaborate with other agents to solve complex tasks.
Promptware: Malicious inputs (text, images, audio) designed to manipulate an AI via indirect prompt injection.
Google’s A2A Protocol: A framework developed by Google that allows different AI agents to discover and communicate with one another.

The Vulnerability: The “Agent Card” Exploit
The researchers focus their attack on Google’s Agent-to-Agent (A2A) protocol. To connect a new external agent to a host system, the host reads an Agent Card—a simple JSON file that describes the new agent’s identity, skills, and instructions.

The critical flaw is that the host agent blindly trusts the content of the Agent Card. By injecting natural language “jailbreak” commands directly into the Agent Card’s description, an attacker can create a malicious agent (“AgentWare”). When the host agent reads this card to learn how to interact with the new agent, it inadvertently executes the malicious prompt, compromising the host system.

Attack Scenarios Demonstrated
Because the host agent has access to internal enterprise tools, the compromised host can be manipulated to perform highly damaging, untargeted attacks:

Bias & Manipulation: Tricking the host agent into ignoring normal protocols to exclusively promote a specific product or service whenever asked for recommendations.
Data Exfiltration: Instructing the host to query an internal database for sensitive information (e.g., employee details, IP addresses) and secretly send it back to the attacker’s external agent.
Tool Abuse & Sabotage: Tricking the host into executing destructive actions using internal IT tools, such as repeatedly altering firewall rules or shutting down the air conditioning in a server room.

Why This Attack is Dangerous

Low Barrier to Entry: It requires no complex coding or “zero-day” exploits. The attack is written in plain English, and registering a malicious agent URL costs as little as $14.
Invisible to Standard Security: Traditional cybersecurity tools (like code scanners) cannot detect this malware because it consists of natural language instructions, not malicious code.
Vendor Apathy: When reported to Google, the company declined to issue a bug bounty or immediate fix, stating the solution is simply to “trust users not to get attacked” (i.e., users should only install trusted agents). The researchers argue this is unrealistic in a decentralized “Wild West” ecosystem where an official, vetted App Store for agents does not yet exist.

Takeaways & Defensive Recommendations
The researchers conclude that current multi-agent architectures are fundamentally insecure because they lack segregation. To protect enterprise systems, developers must implement:

Human-in-the-Loop: Require human approval for any critical actions (like changing firewalls or accessing sensitive databases).
Hard Guardrails: Do not rely solely on “soft” LLM filters. Implement strict architectural rules that prevent certain tool combinations (e.g., an agent should not be able to read a database and make an external web request in the same session).
Agent Sandboxing: Test new, external agents in a secure, isolated sandbox before integrating them into the live enterprise network.

Post Views: 154

Leave a Reply Cancel reply

Running the “Reflections on Trusting Trust” Compiler

Presentation Template

How To Become A Hacker

Hello world!