Black Hat USA 2025 | Watching the Watchers: Exploring and Testing Defenses of Anti-Cheat Systems

Introduction to the Anti-Cheat Ecosystem
The World of Game Cheats: The speakers explore the fast-paced, high-stakes battleground between cheat developers (attackers) and anti-cheat systems (defenders) in modern competitive shooter games.
The Cheat Economy: Cheating is a massive industry. Cheats are often sold via subscription models by well-run, sometimes legally registered companies, with some cheats costing upwards of $200 a month. Because it is so lucrative, the attack-defense cycle is incredibly rapid.
The Shift to the Kernel: Historically, cheats operated in user mode. As anti-cheats adapted, the…

How to Think About TPUs

Part 2 of a series. This section is all about how TPUs work, how they're networked together to enable multi-chip training and inference, and how this affects the performance of our favorite algorithms. There's even some good stuff for GPU users too!
What Is a TPU?
A TPU is basically a compute core that specializes in matrix multiplication (called a TensorCore) attached to a stack of fast memory (called high-bandwidth memory or HBM) [1]. Here’s a diagram:
Figure: the basic components of a TPU chip. The TensorCore is the gray left-hand box,…

Executable Exports Symbols

There are several critical scenarios where an executable must export symbols. The confusion usually lies in the direction of the linking. You are right that Executable A rarely links dynamically to Executable B to call functions inside B. However, the reverse happens frequently: dynamic libraries (plugins) loaded by Executable A often need to call functions inside Executable A.
Here are the specific reasons why an executable needs to keep exported symbols:
1. The "Host-Plugin" Architecture (Most Common)
This is the primary reason. If your executable supports plugins…

Tailcall in AArch64

In AArch64 (ARM64), for a tail call to work, the current function must tear down its own stack frame before branching to the next function. If it didn't, the stack would grow with every tail call, and a long enough chain of calls would overflow it.
Here is exactly how the "reuse" works at the assembly level, step by step.
1. The Standard Mechanism
In a normal return, a function ends with an epilogue that restores registers and the stack pointer, followed by a ret instruction. In a tail call, the compiler generates a special…

AFL_SKIP_BIN_CHECK

export AFL_SKIP_BIN_CHECK=1 is an environment variable setting that tells AFL++ to stop complaining that your target program doesn't look like it was compiled with AFL. By default, AFL++ checks your target binary for specific "instrumentation" markers before it starts. If it doesn't find them, it assumes you made a mistake (like compiling with gcc instead of afl-cc) and refuses to run, to save you from wasting time.
When should you use this?
You generally should not use this unless you know exactly why. However, here are the valid…

LDR vs. LDUR in AArch64

In AArch64 (ARMv8-A), the main difference between LDR and LDUR is how they handle the immediate offset from the base address.
LDR (Load Register): In its unsigned-offset form, uses a scaled, non-negative 12-bit immediate offset. It is the standard instruction for loading data from naturally aligned structures and arrays.
LDUR (Load Register Unscaled): Uses an unscaled, signed 9-bit immediate offset (-256 to 255 bytes). It is used for accessing data at negative offsets, or at offsets that are not a multiple of the access size, which the scaled form of LDR cannot encode.
Here is a detailed breakdown of the differences:
1. Offset Scaling
LDR (Scaled): The immediate value you provide…
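The encoding rule described above can be sketched as a small range check. This is only an illustration, assuming the unsigned-offset form of LDR (a 12-bit immediate scaled by the access size) and the 9-bit signed byte offset of LDUR; the function names are hypothetical and not part of any assembler API.

```python
def ldr_unsigned_offset_ok(offset: int, access_size: int) -> bool:
    """LDR (unsigned offset): offset must be a non-negative multiple of the
    access size, and offset / access_size must fit in 12 bits."""
    return (
        offset >= 0
        and offset % access_size == 0
        and offset // access_size <= 0xFFF
    )


def ldur_offset_ok(offset: int) -> bool:
    """LDUR: any byte offset that fits in a signed 9-bit immediate."""
    return -256 <= offset <= 255


def pick_load(offset: int, access_size: int = 8) -> str:
    """Report which instruction could encode [base, #offset] directly.

    access_size is in bytes (8 for an X register, 4 for a W register).
    """
    if ldr_unsigned_offset_ok(offset, access_size):
        return "LDR"
    if ldur_offset_ok(offset):
        return "LDUR"
    return "needs a separate address computation"


if __name__ == "__main__":
    for off in (16, -8, 3, 32760, 32768):
        print(off, pick_load(off))
```

In practice assemblers commonly make this choice for you when you write `ldr` with an offset that only the unscaled form can encode, so the check is mainly useful for reading disassembly.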

GDB Usage

Check memory layout
To check the memory layout of a binary in GDB, you can use different commands depending on whether the program is currently running or you are just inspecting the static binary file.
1. If the Program is Running
The best command to see the virtual memory mappings (including the heap, stack, and loaded libraries) is:
info proc mappings
What it shows:
Start/End Addr: The virtual address range.
Size: The size of the mapped region.
Offset: Offset into the file (if file-backed).
Objfile: The specific…

The Difference Between Overflow and Underflow

In computer science, and specifically in fuzzing and exploitation, the terms Overflow and Underflow mean different things depending on whether you are talking about Numbers (Arithmetic) or Memory (Buffers). Here is the breakdown of the differences.
1. Arithmetic (Integer) Context
This refers to the value of a number going beyond what the variable type can hold.
Integer Overflow (Too Big)
Occurs when you try to store a value larger than the maximum limit. The value "wraps around" to the minimum.
Analogy: A car odometer at 999,999 rolling over to 000,000.…
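The wraparound behaviour can be modelled in a few lines. Python integers never overflow on their own, so this sketch simulates a fixed-width integer by masking; the helper names and the 8-bit default width are assumptions chosen for illustration. It models two's-complement wrapping only; in C, signed overflow is undefined behaviour rather than a guaranteed wrap.

```python
def wrap_unsigned(value: int, bits: int = 8) -> int:
    """Keep only the low `bits` bits, as an unsigned fixed-width type would."""
    return value & ((1 << bits) - 1)


def wrap_signed(value: int, bits: int = 8) -> int:
    """Interpret the low `bits` bits as a two's-complement signed value."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def odometer(value: int, digits: int = 6) -> int:
    """Decimal analogue: a mechanical odometer with a fixed number of digits."""
    return value % (10 ** digits)


if __name__ == "__main__":
    print(wrap_unsigned(255 + 1))   # one past the 8-bit unsigned maximum
    print(wrap_unsigned(0 - 1))     # one below the 8-bit unsigned minimum
    print(wrap_signed(127 + 1))     # one past the 8-bit signed maximum
    print(odometer(999_999 + 1))    # the odometer analogy
```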

Memory Layout (global data, code, stack, heap, etc.) with TLS

On AArch64 (ARM64), the memory layout for Thread Local Storage (TLS) follows TLS Variant 1. This is distinct from x86_64 (which uses Variant 2). The key difference is the location of the TLS data relative to the thread pointer.
1. The High-Level View (Process Memory)
For a standard Linux process on AArch64, the memory is laid out as follows (drawn with the high address at the top and the low address at the bottom):
+----------------------+ <-- High Address (e.g., 0x0000ffff...)
|        Stack         | (Main Thread Stack, grows DOWN)
+----------------------+
|         ...          |
|    Memory Mapping    | <--…

AFL Coverage Instrumentation Callback

0000000000000bc0 <bbCallback>:
 bc0: 90000102  adrp x2, 20000 <_exit@GLIBC_2.17>
 bc4: f9404c43  ldr  x3, [x2, #152]
 bc8: b4000263  cbz  x3, c14 <bbCallback+0x54>
 bcc: d53bd042  mrs  x2, tpidr_el0
 bd0: a9bf7bfd  stp  x29, x30, [sp, #-16]!
 bd4: 12003c01  and  w1, w0, #0xffff
 bd8: 910003fd  mov  x29, sp
 bdc: 90000100  adrp x0, 20000 <_exit@GLIBC_2.17>
 be0: f9403404  ldr  x4, [x0, #104]
 be4: 9101a000  add  x0, x0, #0x68
 be8: d63f0080  blr  x4
 bec: 78606844  ldrh w4, [x2, x0]
 bf0: 53017c25  lsr  w5, w1, #1
 bf4: 78206845  strh w5, [x2, x0]
 bf8: 4a040021  eor  w1, w1,…
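The visible instructions mask the argument to 16 bits, load a 16-bit thread-local value, store the masked argument shifted right by one back to that slot, and begin an XOR. That resembles the classic AFL edge-coverage update. The sketch below models that scheme in Python. It is an assumption, not a decompilation: the listing is truncated before the map update, so the counter increment, the map size, and all names here (MAP_SIZE, CoverageMap, bb_callback) are hypothetical.

```python
MAP_SIZE = 1 << 16  # assumed: matches the 16-bit mask seen in the listing


class CoverageMap:
    """Toy model of an AFL-style edge-coverage bitmap."""

    def __init__(self) -> None:
        self.hits = bytearray(MAP_SIZE)  # one 8-bit hit counter per edge slot
        self.prev_loc = 0                # per-thread in the real callback (TLS)

    def bb_callback(self, block_id: int) -> int:
        """Record the edge from the previous block to this one.

        Returns the map index that was updated.
        """
        cur_loc = block_id & 0xFFFF              # and  w1, w0, #0xffff
        index = cur_loc ^ self.prev_loc          # eor  w1, w1, <old prev_loc>
        # Assumed step (not visible in the truncated listing): bump the counter.
        self.hits[index] = (self.hits[index] + 1) & 0xFF
        # Shifting keeps A->B distinct from B->A and A->A distinct from B->B.
        self.prev_loc = cur_loc >> 1             # lsr w5, w1, #1 ; strh w5, ...
        return index


if __name__ == "__main__":
    cov = CoverageMap()
    for block in (0x1234, 0x0010, 0x1234):
        print(hex(block), "->", hex(cov.bb_callback(block)))
```

The shift before storing prev_loc is the design choice worth noting: without it, an edge and its reverse would XOR to the same index, and every self-loop would land in slot 0.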