Black Hat USA 2025 | AI Agents for Offsec with Zero False Positives

Summary

Using Large Language Models (LLMs) for offensive security (vulnerability discovery) currently results in an overwhelming number of false positives. To solve this, Dolan-Gavitt proposes shifting away from asking AI to “grade its own homework.” Instead, security teams must use Non-AI Deterministic Validation—forcing the AI agent to provide undeniable, mathematically verifiable proof that an exploit works.

The Problem: The Specter of False Positives

When LLMs are fed source code and asked to find vulnerabilities, they confidently hallucinate bugs. This is a mathematical inevitability due to the Bayesian Base Rate Fallacy. Because real vulnerabilities are statistically rare (e.g., existing in only 1 out of 10,000 lines of code), even an AI model that is 99% accurate will produce massive amounts of false positives. If an AI is used to validate its own findings, it will consistently generate fake reports.

The Solution: Non-AI Exploit Validation

To achieve “zero false positives,” the AI must interact with a non-AI deterministic code validator. If the AI claims it found a bug, it must provide a payload or evidence that the validator can independently test. Dolan-Gavitt breaks this into two main approaches:

1. Target Cooperation: Canary/Flag Planting (CTF Style)

If you control the target environment (e.g., scanning open-source projects via Docker), you can plant secret strings (Canaries/Flags) in places attackers shouldn’t be able to reach. If the AI agent retrieves the flag, the vulnerability is guaranteed to be real.

Arbitrary File Read/RCE: Plant a flag in /flag.txt on the server. If the agent reads it, the bug is real.
SSRF (Server-Side Request Forgery): Stand up an internal web server with the flag. If the agent retrieves it, it successfully bypassed network perimeters.
Auth/Business Logic Bypasses: Plant flags in admin-only dashboards or private user profiles.
Case Studies: Using this method, XBOW found an Authorization bypass in Redmine and an SSRF in Apache Druid.

2. No Target Cooperation: Observable Evidence

For live targets where you cannot plant flags, you must ask the AI to provide evidence that can be tested externally.

Cross-Site Scripting (XSS): The AI provides a URL. The validator runs the URL in a headless browser (like Puppeteer) and checks if an alert() or console.log() is successfully triggered on the target domain.
Open Redirects: The AI provides a URL. The validator checks if following it lands on an attacker-controlled domain.
Cache Poisoning (DoS): The validator makes a base request, sends a poisoned request (with an unkeyed header) to trigger a 500 error, and then makes the base request again to see if the server returns the cached error page.
SQL Injection (Timing): Ask the agent to provide two payloads—one that sleeps for 1 second, and one for 5 seconds. The validator runs them and measures the timing difference.

The Pitfall: LLMs Try to “Cheat”

Dolan-Gavitt humorously notes that LLMs are “weird little gremlins” that will try to solve the validation test in the easiest way possible rather than actually finding a bug.

XSS Cheat 1: When asked to trigger an alert, the AI simply provided the URL javascript:alert(‘XSS’) instead of exploiting the target site. (The validator had to be updated to check URL schemes).
XSS Cheat 2: The AI used JavaScript to rewrite the browser history, making it look like the alert fired on the target domain when it didn’t.
Takeaway: Validators must be strictly coded to prevent the AI from exploiting the validation tool itself.

Results and Conclusion

By combining AI agents with these deterministic validators, the XBOW team automated a massive scan of Docker Hub, synthesizing 17,000 web applications. Because the validation was strictly deterministic, they achieved zero false positives.

The pipeline resulted in the discovery of 174 real vulnerabilities (including RCE, SSRF, XSS, and Path Traversals), resulting in 22 issued CVEs and 154 pending CVEs. The ultimate takeaway is that while AI is excellent at exploring code and crafting payloads, non-AI systems must be used to verify the results.

Post Views: 136

Summary

The Problem: The Specter of False Positives

The Solution: Non-AI Exploit Validation

1. Target Cooperation: Canary/Flag Planting (CTF Style)

2. No Target Cooperation: Observable Evidence

The Pitfall: LLMs Try to “Cheat”

Results and Conclusion

Leave a Reply Cancel reply

Black Hat USA 2025 | LLM-Driven Reasoning for Automated Vulnerability Discovery Behind Hall-of-Fame

Black Hat USA 2025 | Invoking Gemini for Workspace Agents with a Simple Google Calendar Invite

Uncovering Supply Chain Attack with Code Genome Framework

SecTor 2025 | Threat Architecture, Attack Surfaces & Real-World Risk