{"id":288,"date":"2025-12-22T14:28:33","date_gmt":"2025-12-22T14:28:33","guid":{"rendered":"https:\/\/haco.club\/?p=288"},"modified":"2025-12-22T14:31:16","modified_gmt":"2025-12-22T14:31:16","slug":"afl-coverage-instrumentation-callback","status":"publish","type":"post","link":"https:\/\/haco.club\/?p=288","title":{"rendered":"AFL Coverage Instrumentation Callback"},"content":{"rendered":"\n<pre class=\"wp-block-code\"><code>0000000000000bc0 &lt;bbCallback&gt;:\nbc0:\t90000102 \tadrp\tx2, 20000 &lt;_exit@GLIBC_2.17&gt;\nbc4:\tf9404c43 \tldr\tx3, &#91;x2, #152]\nbc8:\tb4000263 \tcbz\tx3, c14 &lt;bbCallback+0x54&gt;\nbcc:\td53bd042 \tmrs\tx2, tpidr_el0\nbd0:\ta9bf7bfd \tstp\tx29, x30, &#91;sp, #-16]!\nbd4:\t12003c01 \tand\tw1, w0, #0xffff\nbd8:\t910003fd \tmov\tx29, sp\nbdc:\t90000100 \tadrp\tx0, 20000 &lt;_exit@GLIBC_2.17&gt;\nbe0:\tf9403404 \tldr\tx4, &#91;x0, #104]\nbe4:\t9101a000 \tadd\tx0, x0, #0x68\nbe8:\td63f0080 \tblr\tx4\nbec:\t78606844 \tldrh\tw4, &#91;x2, x0]\nbf0:\t53017c25 \tlsr\tw5, w1, #1\nbf4:\t78206845 \tstrh\tw5, &#91;x2, x0]\nbf8:\t4a040021 \teor\tw1, w1, w4\nbfc:\t92403c21 \tand\tx1, x1, #0xffff\nc00:\t38616860 \tldrb\tw0, &#91;x3, x1]\nc04:\t11000400 \tadd\tw0, w0, #0x1\nc08:\t38216860 \tstrb\tw0, &#91;x3, x1]\nc0c:\ta8c17bfd \tldp\tx29, x30, &#91;sp], #16\nc10:\td65f03c0 \tret\nc14:\td65f03c0 \tret\nc18:\td503201f \tnop\nc1c:\td503201f \tnop<\/code><\/pre>\n\n\n\n<p>This code snippet is an implementation of an <strong>AFL-style (American Fuzzy Lop) coverage instrumentation callback<\/strong> for the <strong>AArch64 (ARM64)<\/strong> architecture.<\/p>\n\n\n\n<p>Specifically, it is a function named <code>bbCallback<\/code> (per the symbol in the disassembly) that implements AFL&#8217;s edge-coverage logic. A call to it is presumably inserted at the start of each basic block, with that block&#8217;s ID passed in <code>w0<\/code>, to track code coverage during fuzzing. Stock AFL wires this up differently: <code>afl-gcc<\/code> injects a trampoline that calls the assembly routine <code>__afl_maybe_log<\/code>, while <code>afl-clang-fast<\/code> inlines the equivalent loads and stores into each block. The global names used below (<code>__afl_area_ptr<\/code>, <code>__afl_prev_loc<\/code>) follow AFL&#8217;s naming conventions; the disassembly itself does not show them.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">High-Level Logic<\/h3>\n\n\n\n<p>The function performs the following standard AFL 
logic:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Check Initialization:<\/strong> Checks if the shared memory bitmap (the &#8220;Map&#8221;) is initialized.<\/li>\n\n\n\n<li><strong>Thread Local Storage (TLS):<\/strong> Retrieves a thread-local variable that stores the <code>previous_location<\/code> (ID of the last block visited).<\/li>\n\n\n\n<li><strong>Edge Calculation:<\/strong> Computes a hash for the transition (edge) between the previous block and the current block using <code>index = current_location ^ previous_location<\/code>. Different edges can collide on the same index.<\/li>\n\n\n\n<li><strong>Coverage Recording:<\/strong> Increments the counter in the Map at that <code>index<\/code>.<\/li>\n\n\n\n<li><strong>Update State:<\/strong> Updates <code>previous_location<\/code> to <code>current_location &gt;&gt; 1<\/code> for the next block.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step-by-Step Instruction Breakdown<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">1. Check if the Map exists<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>bc0: 90000102   adrp x2, 20000        ; Load page address of global variables\nbc4: f9404c43   ldr  x3, &#91;x2, #152]   ; Load the global pointer `__afl_area_ptr` (the Map) into x3\nbc8: b4000263   cbz  x3, c14          ; If x3 is NULL (0), jump to c14 (return immediately)<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The code loads the pointer to the shared memory coverage map. If the fuzzer hasn&#8217;t initialized this yet, it simply returns to avoid a crash.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">2. TLS Setup &amp; Context Saving<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>bcc: d53bd042   mrs  x2, tpidr_el0    ; Move Thread Pointer ID Register (TLS base) into x2\nbd0: a9bf7bfd   stp  x29, x30, &#91;sp, #-16]! 
; Push Frame Pointer (FP) and Link Register (LR) to stack\nbd4: 12003c01   and  w1, w0, #0xffff  ; Mask the input (Current Block ID) to 16 bits. Keep in w1.\nbd8: 910003fd   mov  x29, sp          ; Set up the frame pointer<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>w0<\/code> contains the <strong>Current Block ID<\/strong> (passed by the caller).<\/li>\n\n\n\n<li><code>tpidr_el0<\/code> is used to access thread-local variables. This matters when the fuzzed target is multi-threaded: each thread gets its own copy of the <code>previous_location<\/code> variable.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">3. Resolve TLS Offset for <code>__afl_prev_loc<\/code><\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>bdc: 90000100   adrp x0, 20000        ; Page holding the TLS descriptor (in the GOT)\nbe0: f9403404   ldr  x4, &#91;x0, #104]   ; Descriptor's first word: address of the TLS helper\nbe4: 9101a000   add  x0, x0, #0x68    ; x0 = descriptor address (argument for the helper)\nbe8: d63f0080   blr  x4               ; Call helper function<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This block calls a helper function to determine the offset of the thread-local variable <code>__afl_prev_loc<\/code>.<\/li>\n\n\n\n<li>After <code>blr<\/code>, <code>x0<\/code> contains the <strong>offset<\/strong> of <code>__afl_prev_loc<\/code> relative to the thread pointer (<code>x2<\/code>).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">4. 
Load Previous Location &amp; Calculate Edge<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>bec: 78606844   ldrh w4, &#91;x2, x0]     ; Load `prev_loc` (16-bit) from &#91;TLS_Base + Offset]\nbf0: 53017c25   lsr  w5, w1, #1       ; Calculate `next_prev_loc` = (Current_ID &gt;&gt; 1)\nbf4: 78206845   strh w5, &#91;x2, x0]     ; Store `next_prev_loc` back to TLS for the next time\nbf8: 4a040021   eor  w1, w1, w4       ; Calculate Edge Index: (Current_ID ^ prev_loc)\nbfc: 92403c21   and  x1, x1, #0xffff  ; Ensure Index fits in 64KB (16 bits)<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AFL Algorithm Core:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Current Edge = <code>Current_ID ^ Prev_ID<\/code>.<\/li>\n\n\n\n<li>Future Prev_ID = <code>Current_ID &gt;&gt; 1<\/code>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>The shift (<code>lsr<\/code>) preserves the direction of the edge (A -&gt; B hashes differently from B -&gt; A) and keeps tight loops apart (without it, A -&gt; A and B -&gt; B would both hash to <code>0<\/code>).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">5. Update Coverage Map<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>c00: 38616860   ldrb w0, &#91;x3, x1]     ; Load byte from Map&#91;Index]\nc04: 11000400   add  w0, w0, #0x1     ; Increment counter\nc08: 38216860   strb w0, &#91;x3, x1]     ; Store byte back to Map&#91;Index]<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This increments the &#8220;hit count&#8221; for this specific code path transition. The counter is a single byte, so it wraps from 255 back to 0, and the load\/add\/store sequence is not atomic.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">6. 
Return<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>c0c: a8c17bfd   ldp  x29, x30, &#91;sp], #16 ; Pop FP and LR\nc10: d65f03c0   ret                   ; Return to the instrumented function\nc14: d65f03c0   ret                   ; Early return path (if Map was NULL)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>The assembly corresponds roughly to the following C. The declarations are assumptions based on AFL&#8217;s naming, since the disassembly does not show the symbol names:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;stdint.h&gt;\n\n\/\/ Assumed declarations (AFL-style names; not visible in the disassembly)\nextern uint8_t *__afl_area_ptr;          \/\/ 64 KiB coverage map\nextern __thread uint16_t __afl_prev_loc; \/\/ per-thread previous ID (ldrh\/strh)\n\n\/\/ cur_loc (passed in w0) is the compile-time random ID of the current block\nvoid bbCallback(uint32_t cur_loc) {\n  uint8_t *map = __afl_area_ptr;\n  if (!map) return;                          \/\/ map not set up yet\n\n  cur_loc &amp;= 0xFFFF;                         \/\/ and w1, w0, #0xffff\n  uint16_t prev_loc = __afl_prev_loc;        \/\/ TLS read: tpidr_el0 + offset\n  __afl_prev_loc = (uint16_t)(cur_loc &gt;&gt; 1); \/\/ stored before the map update, as in the asm\n\n  map&#91;(cur_loc ^ prev_loc) &amp; 0xFFFF]++;    \/\/ bump the 8-bit hit counter for this edge\n}<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Here is an explanation of these three concepts (Thread Local Storage, thread-local variables, and <code>tpidr_el0<\/code>), moving from the high-level software concept down to the specific hardware implementation in ARM64.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Thread Local Storage (TLS) \u2014 The System<\/h3>\n\n\n\n<p><strong>Thread Local Storage<\/strong> is a computer programming method that uses static or global memory local to a thread.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Problem:<\/strong> Normally, global variables are shared by <strong>all<\/strong> threads in a program. If Thread A changes a global variable, Thread B sees the change. 
When threads need private state, this sharing creates race conditions, and fixing them with locking (mutexes) adds overhead.<\/li>\n\n\n\n<li><strong>The Solution (TLS):<\/strong> TLS allows you to define a &#8220;global&#8221; variable where <strong>each thread gets its own unique copy<\/strong>.<\/li>\n\n\n\n<li><strong>Analogy:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Global Memory:<\/strong> A whiteboard in the middle of an office. Everyone shares it. If you write on it, I see it.<\/li>\n\n\n\n<li><strong>TLS:<\/strong> A notebook given to every employee. Everyone has a notebook called &#8220;Notes,&#8221; but what I write in mine is completely private from what you write in yours.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2. Thread-Local Variable \u2014 The Programming Object<\/h3>\n\n\n\n<p>A <strong>thread-local variable<\/strong> is the specific variable that lives inside the TLS.<\/p>\n\n\n\n<p>In C or C++, you declare them using keywords like <code>__thread<\/code> (a GCC\/Clang extension), <code>_Thread_local<\/code> (C11), or <code>thread_local<\/code> (C++11).<\/p>\n\n\n\n<p><strong>Classic Example: <code>errno<\/code><\/strong><br>In standard C programming, <code>errno<\/code> holds the error code set by the most recent failing system call or library function.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If <strong>Thread A<\/strong> tries to open a missing file, <code>errno<\/code> becomes 2 (<code>ENOENT<\/code> on Linux).<\/li>\n\n\n\n<li>If <strong>Thread B<\/strong> is running successfully at the same time, its <code>errno<\/code> is unaffected.<\/li>\n\n\n\n<li>Therefore, <code>errno<\/code> is a <strong>thread-local variable<\/strong>. If it were a standard global variable, Thread B might incorrectly think it encountered an error because Thread A failed.<\/li>\n<\/ul>\n\n\n\n<p>In the context of your previous AFL code snippet, the variable <code>__afl_prev_loc<\/code> is a thread-local variable. 
This ensures that if the target being fuzzed runs multiple threads, Thread A&#8217;s execution history doesn&#8217;t mix with Thread B&#8217;s history.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. <code>tpidr_el0<\/code> \u2014 The Hardware Register<\/h3>\n\n\n\n<p>This is the <strong>ARM64 (AArch64)<\/strong> system register that operating systems such as Linux use as the thread pointer to implement TLS efficiently.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Name Breakdown:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>TPID:<\/strong> Thread ID, often read as &#8220;Thread Pointer ID&#8221; (the Arm architecture manual calls it the &#8220;EL0 Read\/Write Software Thread ID Register&#8221;)<\/li>\n\n\n\n<li><strong>R:<\/strong> Register<\/li>\n\n\n\n<li><strong>EL0:<\/strong> Exception Level 0 (User Mode); user-space code can both read and write it<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Purpose:<\/strong> It holds the <strong>base address<\/strong> (the thread pointer) of the memory region allocated for the <em>currently running thread<\/em>.<\/li>\n<\/ul>\n\n\n\n<p>When the Operating System switches threads (context switch), it saves the outgoing thread&#8217;s <code>tpidr_el0<\/code> value and restores the incoming thread&#8217;s, so the register always points to the running thread&#8217;s private data area.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How they work together (The Flow)<\/h3>\n\n\n\n<p>When your code accesses a thread-local variable, the compiled instructions perform the following steps (simplified, for a variable whose offset is known at link time):<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Get the Base:<\/strong> The code reads <code>tpidr_el0<\/code> to find out &#8220;Where does the current thread&#8217;s private memory start?&#8221;<\/li>\n\n\n\n<li><strong>Get the Offset:<\/strong> The compiler (or linker) knows that Variable X is located, say, 16 bytes from the start of that memory block.<\/li>\n\n\n\n<li><strong>Calculate Address:<\/strong> Target Address = <code>tpidr_el0<\/code> + 16.<\/li>\n\n\n\n<li><strong>Access:<\/strong> Read\/Write to that address.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Applying it to your code snippet<\/h3>\n\n\n\n<p>Let&#8217;s look at lines <code>bcc<\/code> to <code>bec<\/code> from your snippet again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bcc: 
d53bd042   mrs  x2, tpidr_el0    ; 1. READ the Thread Pointer base address into x2\n...\nbe8: d63f0080   blr  x4               ; 2. CALL a helper to get the variable's OFFSET (puts it in x0)\nbec: 78606844   ldrh w4, &#91;x2, x0]     ; 3. ACCESS memory at &#91;Base (x2) + Offset (x0)]<\/code><\/pre>\n\n\n\n<p><strong>Why was this necessary in the code?<\/strong><br>AFL updates <code>prev_loc<\/code> (previous location) on every instrumented block to track edges. If the target program is multi-threaded and <code>prev_loc<\/code> were a standard global variable, two threads running instrumented code at the same time would overwrite each other&#8217;s <code>prev_loc<\/code>, and the map would record edges that no thread actually took. By using <code>tpidr_el0<\/code> to access a thread-local copy, each thread tracks its own path independently.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>&#8220;CALL a helper to get the variable&#8217;s OFFSET&#8221;<\/p>\n\n\n\n<p>The helper function is necessary because of <strong>Dynamic Linking<\/strong>.<\/p>\n\n\n\n<p>When you compile this code (most likely as position-independent code for a shared library, i.e. with <code>-fPIC<\/code>), the compiler <strong>does not know<\/strong> where the variable <code>__afl_prev_loc<\/code> will be located in memory relative to the thread pointer.<\/p>\n\n\n\n<p>Here is the breakdown of why the helper is needed and what it does.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. The Core Problem: &#8220;I don&#8217;t know where I am yet&#8221;<\/h3>\n\n\n\n<p>If you are building a standard executable (the main program binary), the compiler can calculate exactly where every thread-local variable defined in it lives. It can say &#8220;Variable X is at offset 16.&#8221;<\/p>\n\n\n\n<p>However, this code is likely compiled as <strong>Position Independent Code (PIC)<\/strong>. 
This means:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This code might be inside a shared library (<code>libafl.so<\/code>).<\/li>\n\n\n\n<li>That library could be loaded into memory at <em>any<\/em> address.<\/li>\n\n\n\n<li>The library might be loaded <em>after<\/em> the program starts (using <code>dlopen<\/code>).<\/li>\n<\/ul>\n\n\n\n<p>Because of this, the offset of <code>__afl_prev_loc<\/code> is <strong>unknown at compile time<\/strong>. The compiler cannot simply write <code>add x0, x2, #16<\/code>. It needs to ask the &#8220;runtime linker&#8221; where the variable ended up.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. The Solution: TLS Descriptors (TLSDESC)<\/h3>\n\n\n\n<p>The specific instruction sequence you see (<code>adrp<\/code>, <code>ldr<\/code>, <code>add<\/code>, <code>blr<\/code>) is the signature of the <strong>TLSDESC (TLS Descriptors)<\/strong> model, which is the default for AArch64\/ARM64.<\/p>\n\n\n\n<p>It works like asking a question at runtime:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Preparation:<\/strong> <code>adrp<\/code> and <code>add<\/code> compute the address of a &#8220;descriptor&#8221; in the Global Offset Table (GOT) and leave it in <code>x0<\/code> as the argument; <code>ldr<\/code> loads the helper&#8217;s address from the descriptor&#8217;s first word.<\/li>\n\n\n\n<li><strong>The Call (<code>blr<\/code>):<\/strong> You call the helper function stored in that descriptor.<\/li>\n\n\n\n<li><strong>The Answer:<\/strong> The helper calculates just the <strong>offset<\/strong> for you and returns it in <code>x0<\/code>.<\/li>\n\n\n\n<li><strong>The Access:<\/strong> You add that offset to your Thread Pointer (<code>tpidr_el0<\/code>) to find the variable.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Why is it a &#8220;Helper&#8221; and not just a standard function?<\/h3>\n\n\n\n<p>It is a special, highly optimized function provided by the dynamic linker (such as <code>ld-linux-aarch64.so.1<\/code> on glibc systems).<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>It preserves registers:<\/strong> Notice that the code didn&#8217;t save <code>x1<\/code>, <code>x2<\/code>, or <code>x3<\/code> before calling <code>blr x4<\/code>. These are caller-saved under the standard calling convention, so an ordinary call would force the caller to spill them. The TLS descriptor convention guarantees the helper won&#8217;t touch any general-purpose register except <code>x0<\/code>, which makes the call very cheap. (The link register <code>x30<\/code> is still overwritten by <code>blr<\/code> itself, which is why the function saved <code>x29<\/code>\/<code>x30<\/code> on entry.)<\/li>\n\n\n\n<li><strong>It can change:<\/strong> The dynamic linker picks the helper when it processes the descriptor. If the library&#8217;s TLS sits in the static TLS area (typical for libraries loaded at startup), the helper simply returns a precomputed offset (fast path). For a library added later with <code>dlopen<\/code>, it may instead point to a slower routine that finds the variable through the DTV (described below).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>You need the helper because <strong>the variable&#8217;s location is dynamic<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Static Build:<\/strong> &#8220;Go to offset 16.&#8221; (Fast, no helper needed)<\/li>\n\n\n\n<li><strong>Dynamic Build (Your Code):<\/strong> &#8220;Call this helper; it will figure out where the variable lives right now, then tell me the offset.&#8221;<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>If this code were part of the main executable (the program binary itself rather than a shared library), you would be absolutely correct: the offset would be static\/constant. This is called the <strong>Local Exec (LE)<\/strong> model.<\/p>\n\n\n\n<p>However, the assembly you provided uses the <strong>TLSDESC<\/strong> form of the <strong>General Dynamic (GD)<\/strong> model. This happens because the compiler is being conservative. 
It assumes this code might end up in a <strong>Shared Library (<code>.so<\/code>)<\/strong>.<\/p>\n\n\n\n<p>Here is why the offset <strong>cannot<\/strong> be static in a Shared Library.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. The &#8220;Train&#8221; Analogy<\/h3>\n\n\n\n<p>Imagine the Thread Local Storage (TLS) memory area as a long train attached to the Thread Pointer (TP).<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Engine:<\/strong> The Thread Pointer (<code>tpidr_el0<\/code>).<\/li>\n\n\n\n<li><strong>Car 1:<\/strong> The variables for the <strong>Main Executable<\/strong>.<\/li>\n\n\n\n<li><strong>Car 2:<\/strong> The variables for <strong>Library A<\/strong>.<\/li>\n\n\n\n<li><strong>Car 3:<\/strong> The variables for <strong>Library B<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">The Problem for the Compiler<\/h4>\n\n\n\n<p>When you are compiling <strong>Library B<\/strong>, the compiler has no idea:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>How big <strong>Car 1<\/strong> (the main app) will be.<\/li>\n\n\n\n<li>If <strong>Library A<\/strong> will be loaded before or after Library B.<\/li>\n\n\n\n<li>If Library B will be loaded at startup or loaded later (via <code>dlopen<\/code>).<\/li>\n<\/ol>\n\n\n\n<p>Because the compiler doesn&#8217;t know &#8220;how many cars are in front of it,&#8221; it cannot hardcode the distance (offset) from the Engine (TP) to its own variables.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. The Dynamic Thread Vector (DTV)<\/h3>\n\n\n\n<p>In complex scenarios (like <code>dlopen<\/code> on Linux), the memory layout isn&#8217;t even a contiguous block (a single train). 
It looks more like this:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><code>tpidr_el0<\/code> points to a <strong>Thread Control Block<\/strong>.<\/li>\n\n\n\n<li>Inside that block, there is a pointer to an array called the <strong>DTV (Dynamic Thread Vector)<\/strong>.<\/li>\n\n\n\n<li>The DTV is a list of pointers:\n<ul class=\"wp-block-list\">\n<li>Slot 0 -&gt; points to Main App TLS data<\/li>\n\n\n\n<li>Slot 1 -&gt; points to Lib A TLS data<\/li>\n\n\n\n<li>Slot 2 -&gt; points to Lib B TLS data<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>To find a variable in Library B, the code has to:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Read <code>tpidr_el0<\/code>.<\/li>\n\n\n\n<li>Find the DTV.<\/li>\n\n\n\n<li>Read the pointer at Slot 2.<\/li>\n\n\n\n<li>Add the offset inside that specific block.<\/li>\n<\/ol>\n\n\n\n<p>Since the dynamic linker assigns &#8220;Slot 2&#8221; at runtime, when it loads the library, the compiler <strong>cannot<\/strong> know the final offset relative to <code>tpidr_el0<\/code> at compile time. In this case the helper function looks up the DTV at runtime and returns the variable&#8217;s address minus the thread pointer, so the caller can still use the same <code>&#91;x2, x0]<\/code> addressing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. &#8220;Linker Relaxation&#8221; (Why you might see static offsets anyway)<\/h3>\n\n\n\n<p>You mentioned that in your opinion it <em>should<\/em> be static. You are right in the sense that <strong>performance-wise, we want it to be static.<\/strong><\/p>\n\n\n\n<p>If you compile this code with <code>-fPIC<\/code> (as for a shared object, <code>.so<\/code>), the compiler generates the slow <code>blr<\/code> helper code (General Dynamic) because it has to be safe.<\/p>\n\n\n\n<p><strong>However<\/strong>, if this object file ends up in the <strong>main executable<\/strong> during the linking phase, the Linker is smart enough to rewrite the code! 
This is called <strong>Linker Relaxation<\/strong>.<\/p>\n\n\n\n<p>The Linker sees:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&#8220;Hey, this code asked for a dynamic offset lookup\u2026&#8221;<\/li>\n\n\n\n<li>&#8220;\u2026but I know this is the main executable.&#8221;<\/li>\n\n\n\n<li>&#8220;\u2026so I know the offset is definitely 16.&#8221;<\/li>\n<\/ul>\n\n\n\n<p>It will physically <strong>overwrite<\/strong> the assembly instructions. For the AArch64 TLSDESC sequence, GNU ld, for example, does roughly this:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It replaces the <code>blr<\/code> (call) with <code>nop<\/code> (do nothing).<\/li>\n\n\n\n<li>It replaces the <code>adrp<\/code>\/<code>ldr<\/code> setup with a <code>movz<\/code>\/<code>movk<\/code> pair that loads the now-constant offset into <code>x0<\/code>, and turns the <code>add<\/code> into another <code>nop<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compiler&#8217;s View:<\/strong> &#8220;I don&#8217;t know where this code will live (Exe vs Lib), so I will generate the <strong>Helper Function<\/strong> code just to be safe.&#8221;<\/li>\n\n\n\n<li><strong>Linker&#8217;s View:<\/strong> &#8220;I am building the final binary now. If I can prove the offset is static, I will optimize the assembly to remove the helper.&#8221;<\/li>\n<\/ul>\n\n\n\n<p>The snippet you pasted contains the un-optimized, &#8220;safe&#8221; version (TLSDESC), implying it was compiled with <code>-fPIC<\/code> and not relaxed to the Local Exec model. The most likely reason is that it was linked into a shared library: relaxation to Local Exec is only possible when linking an executable, because only then is the offset fixed.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>AFL++ is not multi-threaded by default.<\/strong><\/p>\n\n\n\n<p>By design, a single instance of <code>afl-fuzz<\/code> is a <strong>single-threaded process<\/strong> that occupies exactly <strong>one CPU core<\/strong>.<\/p>\n\n\n\n<p>If you run <code>afl-fuzz<\/code> once on a 64-core machine, it will use 1 core and leave the other 63 idle. 
To utilize your full hardware, you must manually launch multiple separate instances of the fuzzer (processes), not threads.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Why isn&#8217;t it multi-threaded?<\/h3>\n\n\n\n<p>AFL++ (and the original AFL) is built around a single fuzzing loop with global state, and around <strong>process isolation<\/strong> for the target.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It uses a &#8220;Forkserver&#8221; mechanism: the instrumented target stops just before <code>main()<\/code> and, each time <code>afl-fuzz<\/code> asks, <code>fork()<\/code>s a fresh child process to run one input.<\/li>\n\n\n\n<li>Managing this via threads would be incredibly complex because the &#8220;target&#8221; (the program you are fuzzing) might crash, hang, or corrupt memory. If the fuzzer were just a thread in the same process, a crash in the target would crash the fuzzer too.<\/li>\n\n\n\n<li>Process isolation ensures that when the target crashes (which is the goal!), the fuzzer survives to record the crash.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2. 
How to run it in &#8220;Parallel Mode&#8221;<\/h3>\n\n\n\n<p>Since it doesn&#8217;t spawn threads automatically, you use <strong>Distributed\/Parallel Fuzzing<\/strong> by running multiple processes that share a folder to sync their findings.<\/p>\n\n\n\n<p>You typically run one &#8220;Main&#8221; instance (deterministic) and many &#8220;Secondary&#8221; instances (randomized).<\/p>\n\n\n\n<p><strong>Example for a 4-core system:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Core 1 (Main):<\/strong> <code>.\/afl-fuzz -i inputs -o syncdir -M fuzzer01 .\/target_app<\/code> <em>(The <code>-M<\/code> flag tells it to be the Master\/Main instance).<\/em><\/li>\n\n\n\n<li><strong>Core 2 (Secondary):<\/strong> <code>.\/afl-fuzz -i inputs -o syncdir -S fuzzer02 .\/target_app<\/code> <em>(The <code>-S<\/code> flag tells it to be a Secondary instance).<\/em><\/li>\n\n\n\n<li><strong>Cores 3 &amp; 4 (More Secondaries):<\/strong><br><code>.\/afl-fuzz -i inputs -o syncdir -S fuzzer03 .\/target_app<\/code><br><code>.\/afl-fuzz -i inputs -o syncdir -S fuzzer04 .\/target_app<\/code><\/li>\n<\/ul>\n\n\n\n<p>All instances watch the <code>syncdir<\/code> directory. If <code>fuzzer02<\/code> finds a new interesting path, <code>fuzzer01<\/code> will see it in the folder and add it to its own queue.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Connection to your previous question (TLS)<\/h3>\n\n\n\n<p>This explains why the TLS instrumentation (<code>tpidr_el0<\/code>) you asked about earlier is so important.<\/p>\n\n\n\n<p>Even though <strong>AFL++ itself<\/strong> is single-threaded, the <strong>Target Program<\/strong> you are fuzzing might be multi-threaded.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If your target app spawns 4 threads, they will all try to write to the coverage map (the &#8220;Map&#8221; in your assembly code).<\/li>\n\n\n\n<li>Without TLS (Thread Local Storage), Thread A and Thread B in the <em>target<\/em> would overwrite each other&#8217;s <code>prev_loc<\/code> variable, corrupting the coverage data.<\/li>\n\n\n\n<li>The <code>tpidr_el0<\/code> instrumentation keeps each thread&#8217;s <code>prev_loc<\/code> separate, so edges are computed from each thread&#8217;s own path even in a multi-threaded target. The map itself is still shared, though, and its non-atomic byte increments can occasionally lose a count when two threads hit the same edge at the same moment.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This code snippet is an implementation of the AFL (American [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[49,30],"class_list":["post-288","post","type-post","status-publish","format-standard","hentry","category-knowledge-base","tag-afl","tag-binary"],"_links":{"self":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/288","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=288"}],"version-history":[{"count":2,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/288\/revisions"}],"predecessor-version":[{"id":2
91,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/288\/revisions\/291"}],"wp:attachment":[{"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=288"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=288"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=288"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}