Black Hat USA 2025 | Training Specialist Models: Automating Malware Development

"Training Specialist Models: Automating Malware Development" explores how small, specialized Large Language Models (LLMs) can be trained to outperform massive generalist models at specific, highly technical tasks, in this case the creation of evasive malware. Here is a summary of the key points:

The Problem with Current Models

Avery identifies a gap in the current AI landscape for offensive security professionals:

Large Generalists (OpenAI, Anthropic): These models are highly capable but come with privacy concerns, high costs, and strict safety filters (refusals) that make them difficult to automate for red teaming.
Small Local Models…

Black Hat USA 2025 | Watching the Watchers: Exploring and Testing Defenses of Anti-Cheat Systems

Introduction to the Anti-Cheat Ecosystem

The World of Game Cheats: The speakers explore the fast-paced, high-stakes battleground between cheat developers (attackers) and anti-cheat systems (defenders) in modern competitive shooter games.

The Cheat Economy: Cheating is a massive industry. Cheats are often sold via subscription models by well-run, sometimes legally registered companies, with some cheats costing upwards of $200 a month. Because the market is so lucrative, the attack-defense cycle is incredibly rapid.

The Shift to the Kernel: Historically, cheats operated in user mode. As anti-cheats adapted, the…

How to Think About TPUs

Part 2 of the series. This section is all about how TPUs work, how they're networked together to enable multi-chip training and inference, and how this affects the performance of our favorite algorithms. There's even some good stuff for GPU users too!

What Is a TPU?

A TPU is basically a compute core that specializes in matrix multiplication (called a TensorCore) attached to a stack of fast memory (called high-bandwidth memory, or HBM) [1]. Here's a diagram:

[Figure: the basic components of a TPU chip. The TensorCore is the gray left-hand box,…]

Why Executables Export Symbols

There are actually several critical scenarios where an executable must export symbols. The confusion usually lies in the direction of the linking. You are right that Executable A rarely links dynamically to Executable B to call functions inside B. However, the reverse happens frequently: dynamic libraries (plugins) loaded by Executable A often need to call functions inside Executable A. Here are the specific reasons why an executable needs to keep exported symbols:

1. The "Host-Plugin" Architecture (Most Common)

This is the primary reason. If your executable supports plugins…

Tail Calls in AArch64

In AArch64 (ARM64), for a tail call to work, the current function must tear down its own stack frame before branching to the next function. If it didn't, the stack would grow with every tail call, eventually causing a stack overflow. Here is exactly how the "reuse" works at the assembly level, step by step.

1. The Standard Mechanism

In a normal return, a function ends with an epilogue that restores registers and the stack pointer, followed by a ret instruction. In a tail call, the compiler generates a special…

AFL_SKIP_BIN_CHECK

export AFL_SKIP_BIN_CHECK=1 is an environment variable setting that tells AFL++ to stop complaining that your target program doesn't look like it was compiled with AFL. By default, AFL++ checks your target binary for specific "instrumentation" markers before it starts. If it doesn't find them, it assumes you made a mistake (like compiling with gcc instead of afl-cc) and refuses to run, to save you from wasting time.

When should you use this?

You generally should not use this unless you know exactly why. However, here are the valid…
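One situation where the check legitimately gets in the way is fuzzing a script wrapper around an instrumented binary: AFL++ inspects the wrapper file, finds no markers, and refuses to start. A hedged sketch of that invocation (wrapper.sh, input_corpus, and findings are placeholder names):

```shell
# ./wrapper.sh execs the real instrumented binary, so the file
# afl-fuzz inspects carries no instrumentation markers.
export AFL_SKIP_BIN_CHECK=1
afl-fuzz -i input_corpus -o findings -- ./wrapper.sh @@
```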

LDR vs. LDUR in AArch64

In AArch64 (ARMv8-A), the main difference between LDR and LDUR is how they handle the immediate offset from the base address.

LDR (Load Register): Uses a scaled positive immediate offset. It is the standard instruction for loading data from properly aligned structures and arrays.
LDUR (Load Register Unscaled): Uses an unscaled signed immediate offset. It is used for accessing data at negative offsets or at unaligned addresses that LDR cannot reach.

Here is a detailed breakdown of the differences:

1. Offset Scaling

LDR (Scaled): The immediate value you provide…

GDB Usage

Check memory layout

To check the memory layout of a binary in GDB, you can use different commands depending on whether the program is currently running or you are just inspecting the static binary file.

1. If the Program is Running

The best command to see the virtual memory mappings (including the heap, stack, and loaded libraries) is:

info proc mappings

What it shows:
Start/End Addr: The virtual address range.
Size: The size of the mapped region.
Offset: Offset into the file (if file-backed).
Objfile: The specific…

The Difference Between Overflow and Underflow

In computer science, and specifically in fuzzing and exploitation, the terms Overflow and Underflow mean different things depending on whether you are talking about Numbers (Arithmetic) or Memory (Buffers). Here is the breakdown of the differences.

1. Arithmetic (Integer) Context

This refers to the value of a number going beyond what the variable type can hold.

Integer Overflow (Too Big): Occurs when you try to store a value larger than the maximum limit. The value "wraps around" to the minimum.
Analogy: A car odometer at 999,999 rolling over to 000,000.…

Memory Layout (Global Data, Code, Stack, Heap, etc.) with TLS

On AArch64 (ARM64), the memory layout for Thread Local Storage (TLS) follows TLS Variant 1. This is distinct from x86_64 (which uses Variant 2). The key difference is the location of the TLS data relative to the thread pointer.

1. The High-Level View (Process Memory)

For a standard Linux process on AArch64, the memory is laid out as follows (shown from High Address down to Low Address):

+----------------------+ <-- High Address (e.g., 0x0000ffff...)
|        Stack         |     (Main Thread Stack, grows DOWN)
+----------------------+
|         ...          |
|    Memory Mapping    | <--…