{"id":360,"date":"2026-05-24T09:25:52","date_gmt":"2026-05-24T09:25:52","guid":{"rendered":"https:\/\/haco.club\/?p=360"},"modified":"2026-05-24T09:25:52","modified_gmt":"2026-05-24T09:25:52","slug":"sector-2025-exploiting-multi-agent-systems","status":"publish","type":"post","link":"https:\/\/haco.club\/?p=360","title":{"rendered":"SecTor 2025 | Exploiting Multi Agent Systems"},"content":{"rendered":"\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"SecTor 2025 | Exploiting Multi Agent Systems\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/D4a8Udi2j-M?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">&#8220;Exploiting Multi-Agent Systems: How Prompt Injection Turns Collaboration into Compromise&#8221; by Jeremy Richards from ServiceNow\u2019s AI Red Team.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Overview<\/strong><br>The presentation explores the emerging attack surface of multi-agent AI systems. As AI shifts from single chatbots to complex, multi-agent frameworks capable of autonomous tool use and long-term planning, the &#8220;blast radius&#8221; of prompt injection attacks significantly expands. Richards argues that the power of a prompt injection is entirely bounded by the implementation\u2014specifically, the privileges and tools granted to the injected agent.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Core Concepts: The Multi-Agent Architecture<\/strong><br>Richards explains the standard architecture of these systems, focusing on the <strong>ReAct loop<\/strong> (Thought -&gt; Action -&gt; Observation). 
In a multi-agent system, an &#8220;Orchestrator&#8221; or &#8220;Planner&#8221; LLM takes a user query, breaks it into steps, and delegates tasks to specialized sub-agents (e.g., a search agent, a memory agent, an IT support agent).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The attack surface lies in the <strong>seams and trust boundaries<\/strong> where the LLM interacts with external platforms:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Tooling:<\/strong> APIs, web scrapers, and internal databases.<\/li>\n\n\n\n<li><strong>Context\/Memory:<\/strong> Long-term user memory and Retrieval-Augmented Generation (RAG) databases.<\/li>\n\n\n\n<li><strong>Agent-to-Agent Communication:<\/strong> Where one compromised agent can pass malicious instructions to the Orchestrator or another agent.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Interactive Attack Demonstrations<\/strong><br>Richards walks through several real-world attack flows and CVEs to demonstrate how these systems are compromised:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Prompt Stealing via &#8220;Audit Mode&#8221;:<\/strong> Attackers can use social engineering against the LLM, asking it to perform a &#8220;transparency compliance review.&#8221; By forcing the LLM to output structured JSON, the attacker bypasses standard guardrails and tricks the model into revealing its hidden system prompt.<\/li>\n\n\n\n<li><strong>Stored Prompt Injection (Knowledge Base Poisoning):<\/strong> An attacker places a malicious document into a company&#8217;s RAG database. When a user searches for relevant information, the poisoned document is retrieved. 
The injection targets the &#8220;Memory Extractor&#8221; agent, forcing it to write malicious instructions into the user&#8217;s long-term memory, permanently compromising their future sessions.<\/li>\n\n\n\n<li><strong>VS Code Copilot Vulnerability (CVE-2025-55319):<\/strong> An attacker creates a repository with malicious instructions hidden in the <code>readme.md<\/code> file. When a developer opens the repo using an AI-enabled IDE, the agent reads the file. If auto-approval features are enabled (or if the user is socially engineered into approving it), the agent executes the hidden code, leading to Remote Code Execution (RCE) and data exfiltration.<\/li>\n\n\n\n<li><strong>Microsoft Copilot Data Exfiltration (CVE-2025-32711):<\/strong> An attacker sends an email containing a prompt injection. When Copilot summarizes the user&#8217;s inbox, the injection triggers. It instructs the LLM to gather personal data, encode it in Base64, and append it to an image URL. To bypass Content Security Policies (CSP) that block external image links, the attacker routes the request through an allowlisted domain (such as a Microsoft Teams endpoint) that forwards it to a server they control, successfully exfiltrating the data with zero clicks from the user.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Defensive Strategies and Mitigations<\/strong><br>To defend against these attacks, Richards outlines several architectural and monitoring strategies:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Context Firewalls (LLM-as-a-Judge):<\/strong> Because it is difficult to secure a massive LLM with just a system prompt, Richards recommends running a smaller, highly optimized &#8220;shadow LLM&#8221; (like a fine-tuned 300M parameter model) in parallel. 
This smaller model acts as a high-speed firewall, specifically trained to judge inputs for prompt injections before the main Orchestrator executes a plan.<\/li>\n\n\n\n<li><strong>Policy-Enforced Orchestration:<\/strong> Hardcode strict rules for how tools are called and require human-in-the-loop approvals for high-risk actions. Do not allow the LLM to arbitrarily call agents without proper authentication.<\/li>\n\n\n\n<li><strong>Memory Hygiene:<\/strong> Ensure that tainted memory from one agent cannot infect the broader orchestrator or other agents.<\/li>\n\n\n\n<li><strong>Smart Telemetry:<\/strong> Defenders must monitor the <em>shape<\/em> of execution plans, the frequency of tool calls, and anomaly scores. Logging the raw system prompts and user inputs can create privacy risks, so defenders should rely on metadata and hashes for rapid detection.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Exploiting Multi-Agent Systems: How Prompt Injection Turns Collaboration into Compromise&#8221; 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[35,5],"class_list":["post-360","post","type-post","status-publish","format-standard","hentry","category-black-hat","tag-llm","tag-security"],"_links":{"self":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/360","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=360"}],"version-history":[{"count":1,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/360\/revisions"}],"predecessor-version":[{"id":362,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/360\/revisions\/362"}],"wp:attachment":[{"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=360"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=360"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=360"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}