{"id":290,"date":"2025-12-22T14:32:11","date_gmt":"2025-12-22T14:32:11","guid":{"rendered":"https:\/\/haco.club\/?p=290"},"modified":"2025-12-22T14:32:11","modified_gmt":"2025-12-22T14:32:11","slug":"memory-layoutglobal-data-code-stack-heap-etc-with-tls","status":"publish","type":"post","link":"https:\/\/haco.club\/?p=290","title":{"rendered":"Memory Layout (global data, code, stack, heap, etc.) with TLS"},"content":{"rendered":"\n<p>On AArch64 (ARM64), the memory layout for Thread Local Storage (TLS) follows <strong>TLS Variant 1<\/strong>.<\/p>\n\n\n\n<p>This is distinct from x86_64 (which uses Variant 2). The key difference is the location of the TLS data relative to the thread pointer: in Variant 1 the data sits at higher addresses than the thread pointer, in Variant 2 at lower addresses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. The High-Level View (Process Memory)<\/h3>\n\n\n\n<p>For a standard Linux process on AArch64, the memory is laid out as follows (drawn with the High Address at the top and the Low Address at the bottom):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      +----------------------+  &lt;-- High Address (e.g., 0x0000ffff...)\n      |        Stack         |  (Main Thread Stack, grows DOWN)\n      +----------------------+\n      |          ...         |\n      |   Memory Mapping     |  &lt;-- Shared Libraries, Mapped Files\n      |      (mmap region)   |      AND Secondary Thread Stacks\/TLS\n      |          ...         |\n      +----------------------+\n      |         Heap         |  (Grows UP)\n      +----------------------+\n      |         BSS          |  (Uninitialized Global Data)\n      +----------------------+\n      |         Data         |  (Initialized Global Data)\n      +----------------------+\n      |         Text         |  (Code \/ Instructions)\n      +----------------------+  &lt;-- Low Address (e.g., 0x00000000...)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">2. 
The Detailed TLS Layout (Variant 1)<\/h3>\n\n\n\n<p>In <strong>Variant 1<\/strong> (used by ARM\/AArch64), the Thread Pointer points to the <strong>Thread Control Block (TCB)<\/strong>, and the actual TLS variables are located at <strong>positive offsets<\/strong> (higher addresses) after the TCB.<\/p>\n\n\n\n<p><strong><code>tpidr_el0<\/code><\/strong> points exactly here: <code>[ TCB Start ]<\/code><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      +---------------------------+\n      |  TLS for Shared Libs      |\n      |  (Loaded at startup)      |\n      +---------------------------+\n      |  TLS for Executable       |  &lt;-- Your \"__afl_prev_loc\" is here\n      |  (The \"Static TLS\")       |\n      +---------------------------+ &lt;--- Offset 16 (Start of TLS Data)\n      |         TCB               |\n      |  (Thread Control Block)   |  &lt;--- 16 bytes reserved\n      +---------------------------+ &lt;--- tpidr_el0 points HERE<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>TCB (0 to 16 bytes):<\/strong> This small header contains internal linker data.\n<ul class=\"wp-block-list\">\n<li><strong>Offset 0:<\/strong> Pointer to <strong>DTV<\/strong> (Dynamic Thread Vector).<\/li>\n\n\n\n<li><strong>Offset 8:<\/strong> Reserved (implementation specific).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>TLS Data (16+ bytes):<\/strong> The actual variables (like <code>__afl_prev_loc<\/code>) start immediately after the TCB.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3. Main Thread vs. Secondary Threads<\/h3>\n\n\n\n<p>The physical placement of this &#8220;TLS Block&#8221; differs depending on which thread it is.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">A. 
Secondary Threads (created via <code>pthread_create<\/code>)<\/h4>\n\n\n\n<p>When you create a thread, <code>glibc<\/code> allocates a single large chunk of memory (via <code>mmap<\/code>) to hold <strong>both<\/strong> the stack and the TLS.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>High Address\n   ^  +-------------------------+\n   |  |   Gap \/ Padding         |\n   |  +-------------------------+\n   |  |   TLS Data              |\n   |  +-------------------------+\n   |  |   TCB (16 bytes)        | &lt;--- tpidr_el0 points here\n   |  +-------------------------+\n   |  |                         |\n   |  |      Thread Stack       |\n   |  |      (Grows Down)       |\n   |  |           |             |\n   |  |           v             |\n   |  +-------------------------+\n   |  |      Guard Page         | (Protected to catch overflows)\n   +  +-------------------------+\nLow Address<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Key Detail:<\/strong> The Stack is located <strong>below<\/strong> the TLS. If you overflow the stack (write too much), you hit the Guard Page. You typically do <em>not<\/em> corrupt your own TLS (which is above you), though buffer overflows <em>on<\/em> the stack could theoretically read\/write upwards.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">B. The Main Thread<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Stack:<\/strong> The main stack is allocated by the kernel at the very top of user memory.<\/li>\n\n\n\n<li><strong>TLS:<\/strong> The Main Thread&#8217;s TLS is allocated by the dynamic linker (ld.so) early during startup. It is usually located near the heap or loaded libraries, <strong>separate<\/strong> from the main stack.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Why this matters for your AFL code<\/h3>\n\n\n\n<p>In the assembly snippet you provided:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong><code>adrp ... 
_exit<\/code><\/strong>: It loads a global pointer to find the TLS offset.<\/li>\n\n\n\n<li><strong><code>blr<\/code> (Helper)<\/strong>: The helper calculates the offset.<\/li>\n\n\n\n<li><strong><code>add<\/code> \/ <code>ldr<\/code><\/strong>: It adds that offset to <strong><code>tpidr_el0<\/code><\/strong>.<\/li>\n<\/ol>\n\n\n\n<p>Because AArch64 is <strong>Variant 1<\/strong>, the offset returned by the helper will be a <strong>positive number<\/strong> (e.g., <code>+24<\/code>). The code is effectively saying:<br><em>&#8220;Take the address in <code>tpidr_el0<\/code>, skip the 16-byte header, and go forward X bytes to find my variable.&#8221;<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>On AArch64 (ARM64) Linux (using <code>glibc<\/code>), the memory layout differs significantly between the <strong>Main Thread<\/strong> and <strong>Secondary Threads<\/strong> (created via <code>pthread_create<\/code>), even though they both use the same internal TCB structure.<\/p>\n\n\n\n<p>Here is the detailed breakdown.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. The Structure: <code>struct pthread<\/code> (The TCB)<\/h3>\n\n\n\n<p>Regardless of which thread it is, <code>tpidr_el0<\/code> always points to the <strong>Thread Control Block (TCB)<\/strong>. 
In <code>glibc<\/code> on AArch64, this TCB is only a 16-byte header (<code>tcbhead_t<\/code>); the much larger <code>struct pthread<\/code> sits immediately <em>below<\/em> it, at negative offsets from <code>tpidr_el0<\/code>.<\/p>\n\n\n\n<p>For AArch64 (TLS Variant 1), the layout around the pointer address is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Memory Address: Low  -----------------------------------------------------------&gt; High\nPointer:                           &#91; tpidr_el0 ]\nContents:       &#91; struct pthread ] &#91; TCB Header ] &#91; Static TLS Data (App) ] &#91; Padding ]\nOffsets:        (negative)         +0           +16                       +...<\/code><\/pre>\n\n\n\n<p><strong>Key Fields in the TCB header and in <code>struct pthread<\/code>:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>offset 0x00 (<code>dtv<\/code>):<\/strong> Pointer to the <strong>Dynamic Thread Vector<\/strong>. This indexes the TLS block of every loaded module, including libraries loaded dynamically (via <code>dlopen<\/code>).<\/li>\n\n\n\n<li><strong>offset 0x08 (<code>private<\/code>):<\/strong> Reserved for implementation-specific data.<\/li>\n\n\n\n<li><strong><code>stack_guard<\/code>:<\/strong> The &#8220;Stack Canary&#8221; value. The compiler reads this value and puts it on the stack to detect buffer overflows. The well-known offset 0x28 is the x86_64 location (<code>%fs:0x28<\/code>); on AArch64, GCC and <code>glibc<\/code> by default keep the canary in the global variable <code>__stack_chk_guard<\/code> rather than in the TCB.<\/li>\n\n\n\n<li><strong><code>pointer_guard<\/code>:<\/strong> Used to XOR function pointers (as in <code>setjmp<\/code>\/<code>longjmp<\/code>) for security. Offset 0x30 is likewise the x86_64 location; on AArch64, <code>glibc<\/code> keeps this value in a global variable (<code>__pointer_chk_guard<\/code>).<\/li>\n\n\n\n<li><strong>Other fields (in <code>struct pthread<\/code>, below the thread pointer):<\/strong> <code>tid<\/code> (Thread ID), <code>pid<\/code>, <code>cleanup_jmp_buf<\/code>, <code>joinid<\/code> (for <code>pthread_join<\/code>), and scheduling priority.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2. Main Thread Layout<\/h3>\n\n\n\n<p>The Main Thread is special because the Kernel creates its stack, but the Dynamic Linker (<code>ld.so<\/code>) creates its TLS\/TCB. 
Therefore, they are usually in <strong>completely different memory regions<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Stack:<\/strong> Located at the very top of the user address space (growing down).<\/li>\n\n\n\n<li><strong>TCB\/TLS:<\/strong> Located in a separate mapping, usually near the heap or the loaded libraries. Unlike the stack and the heap, it has a fixed size and does not grow.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>      &#91; High Address (e.g., 0x0000ffff...) ]\n      +-------------------------+\n      |       Main Stack        |  &lt;-- Kernel allocates this\n      |      (Grows Down)       |\n      +-------------------------+\n      |          ...            |\n      |   (Gigabytes of Gap)    |\n      |          ...            |\n      +-------------------------+\n      |      Linked Libs        |\n      +-------------------------+\n      |     Main Thread TLS     |  &lt;-- ld.so allocates this\n      |   &#91;  Application TLS  ] |\n      |   &#91;    TCB Header     ] |  &lt;-- tpidr_el0 points here\n      +-------------------------+\n      &#91; Low Address ]<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Secondary Thread Layout (pthreads)<\/h3>\n\n\n\n<p>When you call <code>pthread_create<\/code>, <code>glibc<\/code> allocates <strong>one contiguous block<\/strong> of memory (via <code>mmap<\/code>) to hold <em>everything<\/em> for that thread: the stack, the TCB, and the TLS.<\/p>\n\n\n\n<p>This creates a &#8220;sandwich&#8221; layout where the TCB is effectively at the <strong>top<\/strong> of the stack space.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      &#91; High Address ]\n      +-------------------------+ &lt;--- End of mmap'd block\n      |    Padding \/ Alignment  |\n      +-------------------------+\n      |    Static TLS Data      |  &lt;-- \"Global\" variables for this thread\n      +-------------------------+\n      |       TCB Header        |  &lt;-- tpidr_el0 points here\n      |    (struct pthread)     |\n      +-------------------------+\n      |                         |\n      |      Thread Stack       |  &lt;-- Stack starts here and grows DOWN\n      |                         |\n      |           |             |\n      |           v             |\n      |                         |\n      +-------------------------+\n      |       Guard Page        |  &lt;-- Protected page (SIGSEGV if stack overflows)\n      +-------------------------+ &lt;--- Start of mmap'd block\n      &#91; Low Address ]<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Why does this matter for Fuzzing\/Exploitation?<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Stack Overflow:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Main Thread:<\/strong> If you exhaust the main stack (growing down), you hit unmapped memory and crash. If you write upwards past a stack buffer, you run into the program arguments and environment variables stored at the top of the stack. Either way, you generally <strong>cannot<\/strong> overwrite the TCB\/TLS because it lives in a separate mapping far from the stack.<\/li>\n\n\n\n<li><strong>Secondary Threads:<\/strong> If you overflow a secondary thread&#8217;s stack (going down), you hit the Guard Page. 
However, if you have a <strong>Buffer Over-read<\/strong> or a linear <strong>Buffer Overflow<\/strong> (reading or writing upwards from a buffer on the stack), you are perilously close to the TCB.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Targeting the TCB:<\/strong> If an attacker can write far enough <em>above<\/em> a stack buffer in a secondary thread, they can overwrite:\n<ul class=\"wp-block-list\">\n<li><strong><code>stack_guard<\/code>:<\/strong> Bypassing stack canaries on targets that keep the canary in the TCB (as x86_64 does at <code>%fs:0x28<\/code>). On AArch64 the canary is by default the global <code>__stack_chk_guard<\/code>, so it is not reachable this way.<\/li>\n\n\n\n<li><strong><code>dtv<\/code>:<\/strong> Hijacking dynamic TLS variable lookups.<\/li>\n\n\n\n<li><strong><code>pointer_guard<\/code>:<\/strong> Bypassing pointer mangling, with the same caveat: AArch64 <code>glibc<\/code> keeps this value in a global variable rather than in the TCB.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>On AArch64 (ARM64), the memory layout for Thread Local Storage [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[30],"class_list":["post-290","post","type-post","status-publish","format-standard","hentry","category-knowledge-base","tag-binary"],"_links":{"self":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/290","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=290"}],"version-history":[{"count":1,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/290\/revisions"}],"predecessor-version":[{"id":292,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/290\/revisions\/292"}],"wp:attachment":[{"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=290"}],"wp:term":[{"taxonomy":"c
ategory","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=290"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=290"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}