Black Hat USA 2025 | Reinventing Agentic AI Security With Architectural Controls

“When Guardrails Aren’t Enough: Reinventing Agentic AI Security With Architectural Controls,” delivered by David Brauchler III from NCC Group.

The Core Thesis

The central argument of the presentation is that guardrails are not security boundaries. Much like Web Application Firewalls (WAFs) in the early days of the internet, AI guardrails are merely statistical heuristics. They reduce risk but do not provide “hard” security guarantees and can always be bypassed by a determined attacker. As AI evolves into “agentic” systems—where models can execute tool calls, read databases, and take actions—relying solely on guardrails is a recipe for disaster.

The Root Cause of AI Vulnerabilities

In traditional software, trust is clearly defined (e.g., admins have high trust, standard users have low trust). However, Large Language Models (LLMs) inherently lack these boundaries.

  • Trust Inheritance: An LLM is only as trustworthy as the least trusted input it receives. If an LLM ingests data from a third-party website, an untrusted user prompt, or a poisoned database via Retrieval-Augmented Generation (RAG), the model itself becomes untrusted.
  • Pollution Flows Downstream: Attackers can inject malicious instructions (prompt injection) into data that an LLM will eventually read. The LLM then executes the attacker’s payload, leading to severe vulnerabilities.

Real-World Exploits Demonstrated:
To prove this, Brauchler showcased two severe attacks:

  1. Remote Code Execution (RCE): By feeding malicious context to an AI developer assistant, attackers gained access to a Kubernetes cluster, Azure storage secrets, and internal source code.
  2. Data Exfiltration via RAG (Cross-User Prompt Injection): An attacker poisoned their own user profile. When an admin asked their AI assistant to summarize the attacker’s profile, the poisoned data hijacked the admin’s LLM, instructing it to quietly read the admin’s sensitive files (containing passwords) and exfiltrate them to an external server.

Architectural Solutions (Mitigation Strategies)

Because guardrails fail, organizations must build “zero-trust” architectural controls directly into the application layer surrounding the LLM. Brauchler outlined four key strategies:

  1. Trust Binding (Pinning):
    Do not allow the LLM to independently dictate its own permissions. Instead, the backend application should “pin” the user’s authentication token (e.g., a JWT) to any tool call the LLM makes. This ensures the LLM can never perform an action that the human prompting it doesn’t already have permission to do.
  2. I/O Synchronization (Preventing Operator Evasion):
    A hijacked LLM can lie to a “human-in-the-loop.” It might ask the user to approve “buying a raincoat” but secretly send a backend tool call to “purchase 100 attacker books.” Applications must enforce strict synchronization, ensuring the backend only executes the exact parameters the user visually approved.
  3. Trust Splitting:
    Instead of using one LLM for everything, split tasks between multiple models. Route highly sensitive actions (e.g., deleting accounts, transferring funds) to a “High-Trust LLM” that is never exposed to untrusted data. Route untrusted data processing (e.g., summarizing public web pages) to a “Low-Trust LLM” that has no access to critical backend tools.
  4. Trust Isolation (Data Masking):
    If a high-trust LLM must process a prompt that contains untrusted data, the application layer should mask that data before it hits the LLM’s context window. Replace the untrusted data with a static placeholder, allowing the LLM to plan the action safely without being exposed to a potential prompt injection payload.

Rethinking AI Threat Modeling

To secure AI applications, security teams need to update their threat modeling approaches:

  • Trust Flow Tracking: Map exactly where untrusted data enters the system and track its flow to see if it ever touches an LLM that has access to sensitive data sinks or high-privilege tools.
  • Models as Threat Actors (MATA): When building data flow diagrams, security teams should literally replace the LLM icon with a “Threat Actor” icon. If that threat actor could compromise the system using the tools connected to it, the architecture is flawed and needs stricter controls.

Conclusion

As AI systems are granted more agency to take actions on behalf of users, the attack surface expands exponentially. Security professionals must stop relying on prompt-engineering and AI guardrails for protection, and instead enforce strict, traditional application security boundaries—segmentation, least privilege, and zero-trust—around the AI models.


PS: Zero-Trust is a cybersecurity framework based on one core principle: “Never trust, always verify.”

Unlike traditional security models that assume everything already inside a network is safe (the “castle-and-moat” approach), zero-trust assumes that threats are everywhere—both outside and inside the network. No user, device, or application is trusted by default.

Leave a Reply

Your email address will not be published. Required fields are marked *