{"id":325,"date":"2026-03-04T15:45:32","date_gmt":"2026-03-04T15:45:32","guid":{"rendered":"https:\/\/haco.club\/?p=325"},"modified":"2026-03-04T15:45:32","modified_gmt":"2026-03-04T15:45:32","slug":"black-hat-usa-2025-training-specialist-models-automating-malware-development","status":"publish","type":"post","link":"https:\/\/haco.club\/?p=325","title":{"rendered":"Black Hat USA 2025 | Training Specialist Models: Automating Malware Development"},"content":{"rendered":"\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Black Hat USA 2025 | Training Specialist Models: Automating Malware Development\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/WKmEzRJZ6H4?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><strong>&#8220;Training Specialist Models: Automating Malware Development&#8221;<\/strong>\u00a0explores how small, specialized Large Language Models (LLMs) can be trained to outperform massive generalist models in specific, highly technical tasks\u2014specifically, the creation of evasive malware.<\/p>\n\n\n\n<p>Here is a summary of the key points:<\/p>\n\n\n\n<p><strong>The Problem with Current Models<\/strong><br>Avery identifies a gap in the current AI landscape for offensive security professionals:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Large Generalists (OpenAI, Anthropic):<\/strong>\u00a0These models are highly capable but come with privacy concerns, high costs, and strict safety filters (refusals) that make them difficult to automate for red teaming.<\/li>\n\n\n\n<li><strong>Small Local Models (Llama, Qwen):<\/strong>\u00a0These are private and cheap but generally 
lack the reasoning capabilities required for complex tasks like malware development.<\/li>\n<\/ul>\n\n\n\n<p><strong>The Solution: Reinforcement Learning with Verifiable Rewards (RLVR)<\/strong><br>Avery proposes using&nbsp;<strong>RLVR<\/strong>\u2014the same training technique behind reasoning models like OpenAI\u2019s o1 and DeepSeek\u2019s R1\u2014to bridge this gap.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Unlike traditional\u00a0<strong>RLHF<\/strong>\u00a0(Reinforcement Learning from Human Feedback), which relies on slow and expensive human grading,\u00a0<strong>RLVR<\/strong>\u00a0uses a programmatic &#8220;verifier&#8221; to instantly and objectively score the model&#8217;s output.<\/li>\n\n\n\n<li><strong>Verifier\u2019s Law:<\/strong>\u00a0For a task to be suitable for RLVR, it needs an objective ground truth, fast automated verification, scalability, and a continuous reward signal.<\/li>\n<\/ul>\n\n\n\n<p><strong>Case Study: The &#8220;Dante&#8221; Model<\/strong><br>Avery applied this methodology to automate the creation of evasive shellcode loaders.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Verifier:<\/strong>\u00a0He built an automated pipeline that takes the model&#8217;s code, compiles it, executes it in a sandbox to check for functionality (callbacks), and tests it against a live instance of\u00a0<strong>Microsoft Defender for Endpoint (MDE)<\/strong>.<\/li>\n\n\n\n<li><strong>The Reward System:<\/strong>\u00a0The model received higher rewards for code that compiled successfully, executed properly, and generated the fewest alerts in MDE.<\/li>\n\n\n\n<li><strong>Training:<\/strong>\u00a0He used\u00a0<strong>Qwen 2.5 Coder (7B)<\/strong>\u00a0as the base model.\n<ul class=\"wp-block-list\">\n<li><strong>Stage 1 (SFT):<\/strong>\u00a0Supervised Fine-Tuning on coding problems and malware templates to teach the model the required output format.<\/li>\n\n\n\n<li><strong>Stage 2 (RLVR):<\/strong>\u00a0Trial-and-error training 
where the model generated thousands of loaders, learning from the automated verifier which techniques successfully evaded detection.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>Results and Key Takeaways<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cost Efficiency:<\/strong>\u00a0The entire training process cost approximately\u00a0<strong>$1,350<\/strong>\u00a0using rented GPUs, making the approach accessible to most organizations.<\/li>\n\n\n\n<li><strong>Performance:<\/strong>\u00a0The resulting model,\u00a0<strong>Dante (7B)<\/strong>, significantly outperformed massive models like DeepSeek R1 (671B) and Gemini 2.5.\n<ul class=\"wp-block-list\">\n<li>While DeepSeek mostly failed due to safety refusals or formatting errors, Dante achieved a\u00a0<strong>&gt;30% success rate<\/strong>\u00a0in generating fully evasive, functional malware with zero alerts.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Trial and Error:<\/strong>\u00a0The model was not explicitly taught\u00a0<em>how<\/em>\u00a0to evade AV; it learned successful reasoning patterns on its own through the feedback loop provided by the verifier.<\/li>\n<\/ul>\n\n\n\n<p><strong>Conclusion<\/strong><br>The presentation demonstrates that training small, specialist models using automated verification systems is a viable, low-cost strategy for solving complex domain-specific problems, potentially rendering large, general-purpose models unnecessary for specialized tasks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Training Specialist Models: Automating Malware Development&#8221;\u00a0explores how small, specialized Large 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[35,55,5],"class_list":["post-325","post","type-post","status-publish","format-standard","hentry","category-black-hat","tag-llm","tag-malware","tag-security"],"_links":{"self":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=325"}],"version-history":[{"count":1,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/325\/revisions"}],"predecessor-version":[{"id":326,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/325\/revisions\/326"}],"wp:attachment":[{"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}