{"id":352,"date":"2026-05-14T08:32:36","date_gmt":"2026-05-14T08:32:36","guid":{"rendered":"https:\/\/haco.club\/?p=352"},"modified":"2026-05-14T08:32:36","modified_gmt":"2026-05-14T08:32:36","slug":"sector-2025-security-and-safety-testing-for-agentic-ai","status":"publish","type":"post","link":"https:\/\/haco.club\/?p=352","title":{"rendered":"SecTor 2025 | Security and Safety Testing for Agentic AI"},"content":{"rendered":"\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"SecTor 2025 | Security and Safety Testing for Agentic AI\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/tTp1uypVeCQ?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>&#8220;From Prompts to Plans: Security and Safety Testing for Agentic AI&#8221; by Jason Stanley:<\/p>\n\n\n\n<p><strong>The Core Problem: AI is Evolving Faster Than Our Testing Methods<\/strong><br>Stanley begins by highlighting the massive surge in AI adoption across enterprises. However, the nature of AI systems is fundamentally changing. We are moving away from simple, stateless &#8220;chat&#8221; interfaces (where a user inputs a prompt and gets a single reply) toward complex <strong>Agentic AI<\/strong>. These new agents have memory, access to external tools, complex architectures, and the ability to take multi-step actions in real environments.<\/p>\n\n\n\n<p>Consequently, the attack surface has expanded drastically, but testing has not kept up. 
Current testing methodologies are flawed because they:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Only check the &#8220;front door&#8221;:<\/strong> They focus on the initial prompt, ignoring the &#8220;seams&#8221; (like tools, memory, or external data injections).<\/li>\n\n\n\n<li><strong>Are stateless:<\/strong> They test single interactions rather than complex, multi-step trajectories.<\/li>\n\n\n\n<li><strong>Are context-unaware:<\/strong> Teams rely on generic, public benchmarks rather than testing the actual risks specific to their unique deployment.<\/li>\n\n\n\n<li><strong>Ignore the security vs. utility trade-off:<\/strong> Security tests are often run in isolation. Putting heavy &#8220;guardrails&#8221; on an agent might make it secure, but it often destroys its ability to successfully complete its intended tasks.<\/li>\n<\/ul>\n\n\n\n<p><strong>The Solution: A 3-Step Framework (Map, Test, Promote)<\/strong><br>To address these shortcomings, Stanley proposes a structured methodology for securing Agentic AI:<\/p>\n\n\n\n<p><strong>1. MAP (Build a Specific Threat Model)<\/strong><br>Instead of using generic internet taxonomies, teams must map the risks specific to their system. This involves defining:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Outcomes:<\/strong> What should the agent do, and what would be a disastrous outcome?<\/li>\n\n\n\n<li><strong>Architecture &amp; Surfaces:<\/strong> What tools, databases, and external connections does the agent touch?<\/li>\n\n\n\n<li><strong>Invariants:<\/strong> What are the absolute &#8220;never-do&#8221; rules (e.g., &#8220;never issue a refund over $200 without human approval&#8221;)?<\/li>\n<\/ul>\n\n\n\n<p><strong>2. TEST (Dual-Track Testing)<\/strong><br>Testing must be stateful and measure both risk and utility simultaneously. 
Stanley advocates for a two-pronged approach:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Context-Aware Benchmarks:<\/strong> To test for known vulnerabilities specific to your system.<\/li>\n\n\n\n<li><strong>Exploratory Search (Red Teaming):<\/strong> To hunt in the &#8220;dark corners&#8221; of the system for unknown vulnerabilities.<\/li>\n<\/ul>\n\n\n\n<p>To facilitate this, ServiceNow open-sourced a testing framework called <strong>DoomArena<\/strong>. It allows developers to test their specific agents in their specific environments, measuring both the <strong>Attack Success Rate (ASR)<\/strong> and the <strong>Task Success Rate (TSR)<\/strong> side by side. This shows whether a security patch has accidentally broken the agent&#8217;s usefulness.<\/p>\n\n\n\n<p><strong>3. PROMOTE (From Findings to Automated Tests)<\/strong><br>When red teaming uncovers a new vulnerability, it must be &#8220;promoted&#8221; into an automated regression test suite so the system can be continuously tested against it. However, Stanley warns against <strong>overfitting<\/strong>, which is what happens when the exact malicious prompt is simply copy-pasted into a test suite. Instead, teams should extract the <em>pattern<\/em> of the attack and build a generator that creates hundreds of variations of it. 
This prevents the defense from becoming &#8220;brittle.&#8221;<\/p>\n\n\n\n<p><strong>Key Takeaways:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Test <em>your<\/em> specific system, not a generic internet benchmark.<\/li>\n\n\n\n<li>Formalize and automate exploratory red teaming.<\/li>\n\n\n\n<li>Always measure security (risk) and utility (task success) together.<\/li>\n\n\n\n<li>Promote discovered vulnerabilities into automated tests carefully to ensure robust, generalized defenses.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;From Prompts to Plans: Security and Safety Testing for Agentic [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[35,5],"class_list":["post-352","post","type-post","status-publish","format-standard","hentry","category-black-hat","tag-llm","tag-security"],"_links":{"self":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/352","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=352"}],"version-history":[{"count":1,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/352\/revisions"}],"predecessor-version":[{"id":353,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/352\/revisions\/353"}],"wp:attachment":[{"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=352"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=352"},{"taxonomy":"post_tag","embedd
able":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=352"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}