On AArch64 (ARM64), the memory layout for Thread Local Storage (TLS) follows TLS Variant 1.
This is distinct from x86_64 (which uses Variant 2). The key difference is the location of the TLS data relative to the thread pointer.
1. The High-Level View (Process Memory)
For a standard Linux process on AArch64, the memory is laid out as follows (drawn with the High Address at the top and the Low Address at the bottom):
+----------------------+  <-- High Address (e.g., 0x0000ffff...)
|        Stack         |  (Main Thread Stack, grows DOWN)
+----------------------+
|         ...          |
|    Memory Mapping    |  <-- Shared Libraries, Mapped Files
|    (mmap region)     |      AND Secondary Thread Stacks/TLS
|         ...          |
+----------------------+
|         Heap         |  (Grows UP)
+----------------------+
|         BSS          |  (Uninitialized Global Data)
+----------------------+
|         Data         |  (Initialized Global Data)
+----------------------+
|         Text         |  (Code / Instructions)
+----------------------+  <-- Low Address (e.g., 0x00000000...)
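If you want to check this picture against a live process, Linux exposes the real mappings in /proc/<pid>/maps. The following is a minimal Python sketch of a parser for that format. The function names (parse_maps, find_region) and the SAMPLE text are made up for illustration; the addresses in SAMPLE are not taken from a real run.

```python
import os

def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, perms, name) tuples."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        name = fields[5].strip() if len(fields) > 5 else ""
        regions.append((int(start_s, 16), int(end_s, 16), fields[1], name))
    return regions

def find_region(regions, name):
    """Return the first region whose name matches exactly, or None."""
    for region in regions:
        if region[3] == name:
            return region
    return None

# Hypothetical sample in the maps format, used when /proc is not available.
SAMPLE = """\
aaaac0000000-aaaac0001000 r-xp 00000000 fd:00 1234 /usr/bin/demo
aaaac0010000-aaaac0011000 rw-p 00000000 fd:00 1234 /usr/bin/demo
aaaae0000000-aaaae0021000 rw-p 00000000 00:00 0 [heap]
ffff90000000-ffff90180000 r-xp 00000000 fd:00 5678 /usr/lib/libc.so.6
ffffd0000000-ffffd0021000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    path = "/proc/self/maps"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    # Print from the highest address down, matching the diagram above.
    for start, end, perms, name in sorted(parse_maps(text), reverse=True):
        print(f"{start:#018x}-{end:#018x} {perms} {name}")
```

Run against a real process, the entry named [stack] should appear at the highest addresses and [heap] just above the executable's own mappings, with shared libraries and anonymous mmap regions in between.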
2. The Detailed TLS Layout (Variant 1)
In Variant 1 (used by ARM/AArch64), the Thread Pointer points to the Thread Control Block (TCB), and the actual TLS variables are located at positive offsets (higher addresses) after the TCB.
tpidr_el0 points exactly here: [ TCB Start ]
+---------------------------+
|    TLS for Shared Libs    |
|    (Loaded at startup)    |
+---------------------------+
|    TLS for Executable     |  <-- Your "__afl_prev_loc" is here
|    (The "Static TLS")     |
+---------------------------+  <--- Offset 16 (Start of TLS Data)
|            TCB            |
|  (Thread Control Block)   |  <--- 16 bytes reserved
+---------------------------+  <--- tpidr_el0 points HERE
- TCB (0 to 16 bytes): This small header contains internal linker data.
  - Offset 0: Pointer to DTV (Dynamic Thread Vector).
  - Offset 8: Reserved (implementation specific).
- TLS Data (16+ bytes): The actual variables (like __afl_prev_loc) start immediately after the TCB, at offset 16 rounded up to the alignment of the TLS segment.
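The offset arithmetic above can be sketched in a few lines of Python. This is a simplified model, not the linker's actual code: it assumes the executable's TLS segment starts at an address that is itself a multiple of its alignment, and the names (TCB_SIZE, variant1_tp_offset, variable_address) are illustrative.

```python
TCB_SIZE = 16  # two 8-byte words: the DTV pointer and one reserved word

def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def variant1_tp_offset(offset_in_segment, tls_align=8):
    """Offset from the thread pointer to a variable in the executable's TLS.

    Assumption: the TLS segment's start address is aligned to tls_align,
    so the data begins at TCB_SIZE rounded up to that alignment.
    """
    return align_up(TCB_SIZE, tls_align) + offset_in_segment

def variable_address(thread_pointer, offset_in_segment, tls_align=8):
    """Absolute address of the variable for a thread with this thread pointer."""
    return thread_pointer + variant1_tp_offset(offset_in_segment, tls_align)

if __name__ == "__main__":
    # A variable 8 bytes into an 8-byte-aligned TLS segment.
    print(variant1_tp_offset(8))
    # The same variable when the segment demands 64-byte alignment.
    print(variant1_tp_offset(8, tls_align=64))
    # Hypothetical thread pointer value, for illustration only.
    print(hex(variable_address(0xFFFF80001000, 8)))
```

With an 8-byte-aligned segment, a variable 8 bytes into the segment lands at thread pointer + 24. A larger segment alignment pushes the start of the data further from the TCB.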
3. Main Thread vs. Secondary Threads
The physical placement of this “TLS Block” differs depending on which thread it is.
A. Secondary Threads (created via pthread_create)
When you create a thread, glibc allocates a single large chunk of memory (via mmap) to hold both the stack and the TLS.
High Address
    ^   +-------------------------+
    |   |      Gap / Padding      |
    |   +-------------------------+
    |   |        TLS Data         |
    |   +-------------------------+
    |   |     TCB (16 bytes)      |  <--- tpidr_el0 points here
    |   +-------------------------+
    |   |                         |
    |   |      Thread Stack       |
    |   |      (Grows Down)       |
    |   |           |             |
    |   |           v             |
    |   +-------------------------+
    |   |       Guard Page        |  (Protected to catch overflows)
    +   +-------------------------+
Low Address
- Key Detail: The Stack is located below the TLS. If you exhaust the stack (for example through deep recursion), you hit the Guard Page, so you typically do not corrupt your own TLS that way. A linear buffer overflow is different: it writes toward higher addresses, so an overflow from a buffer near the top of the stack can reach the TCB and TLS above it.
B. The Main Thread
- Stack: The main stack is allocated by the kernel at the very top of user memory.
- TLS: The Main Thread’s TLS is allocated by the dynamic linker (ld.so) early during startup. It is usually located near the heap or loaded libraries, separate from the main stack.
Why this matters for your AFL code
In the assembly snippet you provided:
- adrp ... _exit: It loads a global pointer to find the TLS offset.
- blr (Helper): The helper calculates the offset.
- add/ldr: It adds that offset to tpidr_el0.
Because AArch64 is Variant 1, the offset returned by the helper will be a positive number (e.g., +24). The code is effectively saying:
“Take the address in tpidr_el0 and go forward by the returned offset (which already includes the 16-byte header) to find my variable.”
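The three steps can be modelled in Python to show why the same code finds a different copy of the variable in each thread. This is only a sketch of the control flow, assuming the snippet is a TLS-descriptor style sequence as the steps suggest. The descriptor dictionary, static_tls_helper, and the thread pointer values are all hypothetical stand-ins.

```python
def static_tls_helper(descriptor):
    """Stand-in for the helper reached via blr.

    Assumption: for a variable in static TLS the helper has nothing to
    compute and simply returns the offset stored in the descriptor.
    """
    return descriptor["offset"]

def tls_address(thread_pointer, descriptor):
    """Model of the three steps: find descriptor, call helper, add to tpidr_el0."""
    helper = descriptor["helper"]            # step 1: locate the descriptor and its helper
    offset = helper(descriptor)              # step 2: the helper returns the offset
    return thread_pointer + offset           # step 3: add the offset to tpidr_el0

# Hypothetical descriptor for a variable 24 bytes above the thread pointer.
DESCRIPTOR = {"helper": static_tls_helper, "offset": 24}

if __name__ == "__main__":
    # Two made-up thread pointer values, one per thread.
    for tp in (0xFFFF80001000, 0xFFFF7F7FF720):
        print(hex(tp), "->", hex(tls_address(tp, DESCRIPTOR)))
```

The offset is identical for every thread; only the value in tpidr_el0 changes, which is what makes the variable thread-local.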
On AArch64 (ARM64) Linux (using glibc), the memory layout differs significantly between the Main Thread and Secondary Threads (created via pthread_create), even though they both use the same internal TCB structure.
Here is the detailed breakdown.
1. The Structure: struct pthread (The TCB)
Regardless of which thread it is, tpidr_el0 always points to the Thread Control Block (TCB). On AArch64 the TCB itself is only a 16-byte header. glibc's much larger per-thread structure, struct pthread, is not a continuation of that header (as it is on x86_64). Instead it sits immediately below the thread pointer, at negative offsets.
For AArch64 (TLS Variant 1), the layout around the pointer address is:
Memory Address: Low ----------------------------------------> High
Pointer:                       [ tpidr_el0 ]
Contents:  [ struct pthread ]  [ TCB Header ]  [ Static TLS Data (App) ]  [ Padding ]
Offsets:    negative            +0              +16                        +...
Key Fields inside the TCB header:
- offset 0x00 (dtv): Pointer to the Dynamic Thread Vector. It holds one entry per TLS module and is how TLS variables for libraries loaded dynamically (via dlopen) are found.
- offset 0x08 (private): Reserved (used for implementation-specific data).
Key Fields inside struct pthread (below tpidr_el0; exact offsets vary by glibc version):
- tid (Thread ID), pid, cleanup_jmp_buf, joinid (for pthread_join), and scheduling priority.
Not in the AArch64 TCB by default:
- stack_guard: The “Stack Canary” value that the compiler puts on the stack to detect buffer overflows. Offset 0x28 is where x86_64 keeps it in its TCB. On AArch64 the compiler reads it from the global variable __stack_chk_guard by default.
- pointer_guard: Used to XOR function pointers (like in setjmp/longjmp) for security. Offset 0x30 is the x86_64 location. On AArch64 glibc keeps it in a global variable as well.
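The two-word header at the thread pointer is small enough to model directly. Here is a minimal sketch using Python's ctypes, with field names taken from the list above. The class name TcbHead is made up, and the sizes printed depend on the pointer width of the machine running the script.

```python
import ctypes

class TcbHead(ctypes.Structure):
    """Model of the header that tpidr_el0 points at: two pointer-sized words."""
    _fields_ = [
        ("dtv", ctypes.c_void_p),      # first word: pointer to the Dynamic Thread Vector
        ("private", ctypes.c_void_p),  # second word: reserved for the implementation
    ]

if __name__ == "__main__":
    print("dtv offset:    ", TcbHead.dtv.offset)
    print("private offset:", TcbHead.private.offset)
    print("header size:   ", ctypes.sizeof(TcbHead))
```

On any 64-bit platform the header size comes out as two 8-byte pointers, which is where the "+16" for the start of the static TLS data comes from.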
2. Main Thread Layout
The Main Thread is special because the Kernel creates its stack, but the Dynamic Linker (ld.so) creates its TLS/TCB. Therefore, they are usually in completely different memory regions.
- Stack: Located at the very top of the user address space (growing down).
- TCB/TLS: Usually located near the loaded libraries or the Heap. It is a fixed-size block allocated once at startup, so unlike the stack or heap it does not grow.
[ High Address (e.g., 0x0000ffff...) ]
+-------------------------+
|       Main Stack        |  <-- Kernel allocates this
|      (Grows Down)       |
+-------------------------+
|           ...           |
|   (Gigabytes of Gap)    |
|           ...           |
+-------------------------+
|       Linked Libs       |
+-------------------------+
|    Main Thread TLS      |  <-- ld.so allocates this
|   [ Application TLS ]   |
|   [ TCB Header ]        |  <-- tpidr_el0 points here
+-------------------------+
[ Low Address ]
3. Secondary Thread Layout (pthreads)
When you call pthread_create, glibc allocates one contiguous block of memory (via mmap) to hold everything for that thread: the stack, the TCB, and the TLS.
This creates a “sandwich” layout where the TCB is effectively at the top of the stack space.
[ High Address ]
+-------------------------+  <--- End of mmap'd block
|   Padding / Alignment   |
+-------------------------+
|     Static TLS Data     |  <-- "Global" variables for this thread
+-------------------------+
|  TCB Header (16 bytes)  |  <-- tpidr_el0 points here
+-------------------------+
|     struct pthread      |  <-- glibc's thread descriptor, just below tpidr_el0
+-------------------------+
|                         |
|      Thread Stack       |  <-- Stack starts here and grows DOWN
|                         |
|            |            |
|            v            |
|                         |
+-------------------------+
|       Guard Page        |  <-- Protected page (SIGSEGV if stack overflows)
+-------------------------+  <--- Start of mmap'd block
[ Low Address ]
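To make the sandwich concrete, here is a simplified Python model that carves one block into the regions in the diagram. It is a sketch under stated assumptions, not glibc's allocation code: the real calculation handles alignment and guard placement differently, and every size used here (including descriptor_size, the space assumed to be reserved for glibc's thread descriptor just below the thread pointer) is made up.

```python
def align_down(value, alignment):
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)

def thread_block_layout(base, size, guard_size, static_tls_size,
                        descriptor_size, tcb_size=16):
    """Return (name, start, end) regions from low to high address, end exclusive."""
    end = base + size
    # Place the thread pointer so the TCB and static TLS fit below the block end.
    tp = align_down(end - tcb_size - static_tls_size, 16)
    # Assumption: the stack starts below the library's own thread descriptor.
    stack_top = tp - descriptor_size
    return [
        ("guard page", base, base + guard_size),
        ("thread stack", base + guard_size, stack_top),
        ("thread descriptor", stack_top, tp),
        ("TCB header", tp, tp + tcb_size),
        ("static TLS", tp + tcb_size, tp + tcb_size + static_tls_size),
        ("padding", tp + tcb_size + static_tls_size, end),
    ]

if __name__ == "__main__":
    # Illustrative numbers only: an 8 MiB stack plus one 4 KiB guard page.
    layout = thread_block_layout(base=0xFFFF80000000, size=0x801000,
                                 guard_size=0x1000, static_tls_size=200,
                                 descriptor_size=0x700)
    # Print high addresses first so the output reads like the diagram.
    for name, start, end in reversed(layout):
        print(f"{start:#x}-{end:#x}  {name}")
```

The point of the model is the ordering: everything the thread needs sits in one mapping, with the thread pointer only a short distance above the first stack frame.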
Why does this matter for Fuzzing/Exploitation?
- Stack Overflow:
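  - Main Thread: If you exhaust the main stack (for example through runaway recursion), you run into unmapped memory below it and crash. A linear buffer overflow goes the other way, upward through the callers' frames toward the argument and environment strings at the top of the stack. Either way you generally cannot overwrite the TCB/TLS, because it lives in a separate mapping far away.
  - Secondary Threads: If you exhaust a secondary thread’s stack (going down), you hit the Guard Page. However, a linear Buffer Overflow or Over-read (moving upward from a buffer on the stack) is perilously close to struct pthread, the TCB, and the static TLS.
- Targeting the TCB: If an attacker can write past the top of the stack in a secondary thread, they can overwrite:
  - dtv: Hijacking TLS variable lookups.
  - stack_guard: Bypassing stack canaries. This applies where the canary lives in the TCB (as on x86_64); on AArch64 it is a global variable by default, out of reach of this overflow.
  - pointer_guard: Bypassing pointer encryption, with the same caveat as stack_guard.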
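A small Python sketch can show how far an upward write has to travel before it leaves the stack. The region table and the buffer address are invented for illustration, and the "thread descriptor" region reflects the assumption that glibc keeps its per-thread structure directly below the thread pointer on AArch64.

```python
def regions_touched(regions, write_start, write_len):
    """Names of regions overlapped by the bytes [write_start, write_start + write_len)."""
    write_end = write_start + write_len
    return [name for name, start, end in regions
            if start < write_end and write_start < end]

# Hypothetical secondary-thread block, listed from low to high address.
REGIONS = [
    ("guard page", 0x10000, 0x11000),
    ("thread stack", 0x11000, 0x1F800),
    ("thread descriptor", 0x1F800, 0x1FF20),
    ("TCB header", 0x1FF20, 0x1FF30),
    ("static TLS", 0x1FF30, 0x1FFF8),
]

# A made-up 64-byte buffer that ends exactly at the top of the stack.
BUFFER = 0x1F7C0

if __name__ == "__main__":
    for length in (64, 80, 0x768):
        print(f"{length:5d} bytes ->", regions_touched(REGIONS, BUFFER, length))
    # Stack exhaustion goes the other way and lands in the guard page.
    print("write at stack bottom ->", regions_touched(REGIONS, 0x10FF8, 8))
```

In this model a write that stays within the buffer touches only the stack, a slightly longer one spills into the thread descriptor, and a longer one still reaches the TCB header where dtv lives.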