{"id":269,"date":"2025-11-25T07:32:00","date_gmt":"2025-11-25T07:32:00","guid":{"rendered":"https:\/\/haco.club\/?p=269"},"modified":"2025-11-25T07:53:09","modified_gmt":"2025-11-25T07:53:09","slug":"aarch64-pre-post-indexing","status":"publish","type":"post","link":"https:\/\/haco.club\/?p=269","title":{"rendered":"AArch64 Pre\/Post Indexing"},"content":{"rendered":"\n<p>In AArch64 (ARMv8-A 64-bit architecture), <strong>Pre-indexing<\/strong> and <strong>Post-indexing<\/strong> are memory addressing modes used with Load (<code>LDR<\/code>) and Store (<code>STR<\/code>) instructions.<\/p>\n\n\n\n<p>Their primary purpose is to perform <strong>Writeback<\/strong>: they automatically update the base register (the pointer) with a new address as part of the instruction execution. This is extremely efficient for iterating through arrays or managing stacks because it eliminates the need for a separate <code>ADD<\/code> or <code>SUB<\/code> instruction to move the pointer.<\/p>\n\n\n\n<p>Here is the breakdown of how they work.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">1. Pre-Indexed Addressing<\/h3>\n\n\n\n<p><strong>Syntax:<\/strong> <code>[base, #offset]!<\/code><br><strong>Key Symbol:<\/strong> The exclamation mark <code>!<\/code> at the end.<\/p>\n\n\n\n<p>In pre-indexing, the offset is added to the base register <strong>before<\/strong> the memory access occurs. 
The base register is then <strong>permanently updated<\/strong> with this new address.<\/p>\n\n\n\n<p><strong>Sequence of Operations:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Calculate:<\/strong> Add the <code>#offset<\/code> to the value in the <code>base<\/code> register.<\/li>\n\n\n\n<li><strong>Update:<\/strong> Write this new value back into the <code>base<\/code> register.<\/li>\n\n\n\n<li><strong>Access:<\/strong> Access memory using the <strong>new<\/strong> value.<\/li>\n<\/ol>\n\n\n\n<p><strong>Example:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>MOV X1, #0x1000     ; Set Base pointer (X1) to address 0x1000\nLDR X0, &#91;X1, #8]!   ; Pre-indexed Load<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Before:<\/strong> <code>X1<\/code> = <code>0x1000<\/code><\/li>\n\n\n\n<li><strong>Action:<\/strong> <code>X1<\/code> becomes <code>0x1008<\/code> (0x1000 + 8). Memory is read from address <code>0x1008<\/code>.<\/li>\n\n\n\n<li><strong>After:<\/strong> <code>X0<\/code> holds the data from <code>0x1008<\/code>. <code>X1<\/code> is now <code>0x1008<\/code>.<\/li>\n<\/ul>\n\n\n\n<p><strong>C Language Analogy:<\/strong><br>Similar to: <code>data = *(ptr += 1);<\/code> (Increment pointer, then access).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2. Post-Indexed Addressing<\/h3>\n\n\n\n<p><strong>Syntax:<\/strong> <code>[base], #offset<\/code><br><strong>Key Visual:<\/strong> The offset is <strong>outside<\/strong> the square brackets.<\/p>\n\n\n\n<p>In post-indexing, the memory access uses the <strong>current<\/strong> value of the base register. 
<strong>After<\/strong> the access is complete, the offset is added to the base register and the register is updated.<\/p>\n\n\n\n<p><strong>Sequence of Operations:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Access:<\/strong> Access memory using the <strong>current<\/strong> value in the <code>base<\/code> register.<\/li>\n\n\n\n<li><strong>Calculate:<\/strong> Add the <code>#offset<\/code> to the <code>base<\/code> register.<\/li>\n\n\n\n<li><strong>Update:<\/strong> Write this new value back into the <code>base<\/code> register.<\/li>\n<\/ol>\n\n\n\n<p><strong>Example:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>MOV X1, #0x1000     ; Set Base pointer (X1) to address 0x1000\nLDR X0, &#91;X1], #8    ; Post-indexed Load<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Before:<\/strong> <code>X1<\/code> = <code>0x1000<\/code><\/li>\n\n\n\n<li><strong>Action:<\/strong> Memory is read from address <code>0x1000<\/code>. Then <code>X1<\/code> becomes <code>0x1008<\/code>.<\/li>\n\n\n\n<li><strong>After:<\/strong> <code>X0<\/code> holds the data from <code>0x1000<\/code>. 
<code>X1<\/code> is now <code>0x1008<\/code>.<\/li>\n<\/ul>\n\n\n\n<p><strong>C Language Analogy:<\/strong><br>Similar to: <code>data = *ptr++;<\/code> (Access current pointer, then increment).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Summary Comparison<\/h3>\n\n\n\n<p>To visualize the difference, assume <code>X1<\/code> starts at <code>0x1000<\/code>:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Mode<\/th><th>Syntax<\/th><th>Effective Address Used<\/th><th>Final Value of X1<\/th><\/tr><\/thead><tbody><tr><td><strong>Offset (Normal)<\/strong><\/td><td><code>[X1, #8]<\/code><\/td><td><code>0x1008<\/code><\/td><td><code>0x1000<\/code> (No change)<\/td><\/tr><tr><td><strong>Pre-Index<\/strong><\/td><td><code>[X1, #8]!<\/code><\/td><td><code>0x1008<\/code><\/td><td><code>0x1008<\/code><\/td><\/tr><tr><td><strong>Post-Index<\/strong><\/td><td><code>[X1], #8<\/code><\/td><td><code>0x1000<\/code><\/td><td><code>0x1008<\/code><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">When to use which?<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Post-Indexing (<code>[Xn], #offset<\/code>):<\/strong>\n<ul class=\"wp-block-list\">\n<li>Most common for iterating over arrays or buffers. 
You process the item at the current pointer and immediately get ready for the next loop iteration by moving the pointer forward.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Pre-Indexing (<code>[Xn, #offset]!<\/code>):<\/strong>\n<ul class=\"wp-block-list\">\n<li>Commonly used for <strong>stack pushes<\/strong>.<\/li>\n\n\n\n<li>Example (Pushing to a Full Descending stack): You must decrement the stack pointer <em>before<\/em> you store data, so you don&#8217;t overwrite the current top of the stack.<\/li>\n\n\n\n<li><code>STR X0, [SP, #-16]!<\/code> (Move SP down 16 bytes, then store X0 there).<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>If you are looking at compiler-generated code (GCC, Clang, MSVC) or standard handwritten assembly examples, you are likely seeing <strong>Normal Offset<\/strong> addressing (e.g., <code>str x0, [sp, #8]<\/code>) instead of <strong>Pre-indexed<\/strong> addressing (e.g., <code>str x0, [sp, #-16]!<\/code>) for one major reason: <strong>Optimization via Fixed Stack Frames.<\/strong><\/p>\n\n\n\n<p>Here is the detailed breakdown of why modern AArch64 code prefers normal offsets over pre-indexing (push-style behavior).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">AArch64 Stack Operations Explained<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. Breaking Dependency Chains (Performance)<\/h3>\n\n\n\n<p>Modern CPUs are <strong>superscalar<\/strong> and <strong>out-of-order<\/strong>. They want to execute multiple instructions at the same time.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pre-indexing (The &#8220;Push&#8221; approach):<\/strong><br>If you use pre-indexing for every variable, you are modifying the Stack Pointer (<code>sp<\/code>) in every instruction. 
<pre class=\"wp-block-code\"><code>str x0, &#91;sp, #-16]!   ; Modifies SP\nstr x1, &#91;sp, #-16]!   ; Depends on new SP value\nstr x2, &#91;sp, #-16]!   ; Depends on new SP value<\/code><\/pre> Each store needs the <code>sp<\/code> value written back by the one before it, so the second instruction cannot compute its address until the first has updated <code>sp<\/code>. This creates a <strong>dependency chain<\/strong>, forcing the CPU to resolve these instructions serially (one by one).<\/li>\n\n\n\n<li><strong>Normal Offset (The &#8220;Frame&#8221; approach):<\/strong><br>The compiler allocates the entire stack frame at once using a single subtraction.<pre class=\"wp-block-code\"><code>sub sp, sp, #48       ; Allocate space for 3 variables (aligned)\nstr x0, &#91;sp, #32]     ; Base address is SP, no update\nstr x1, &#91;sp, #16]     ; Base address is SP, no update\nstr x2, &#91;sp, #0]      ; Base address is SP, no update<\/code><\/pre>Since the <code>str<\/code> instructions do not modify <code>sp<\/code>, they are independent of each other. The CPU is free to issue all three stores in parallel, limited only by its store bandwidth.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2. Random Access for Local Variables<\/h3>\n\n\n\n<p>In high-level languages (C, C++, Rust), the stack is not just a LIFO (Last In, First Out) structure; it is a storage area for local variables.<\/p>\n\n\n\n<p>You often need to read and write variables located in the middle of the stack frame multiple times throughout a function.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you used <code>push<\/code>\/<code>pop<\/code> logic (pre\/post-indexing), the relative offset to your variables would change constantly as <code>sp<\/code> moves.<\/li>\n\n\n\n<li>By allocating the frame once (<code>sub sp, sp, #size<\/code>), <code>sp<\/code> remains constant throughout the function body. This makes accessing variables via static offsets (<code>[sp, #offset]<\/code>) simple and efficient.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Addressing Range Limits<\/h3>\n\n\n\n<p>AArch64 instructions have limits on how large the immediate value (the offset) can be.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pre-indexed (<code>!<\/code>) limit:<\/strong> For a single-register <code>LDR<\/code>\/<code>STR<\/code>, the writeback offset is a signed 9-bit immediate, so it must lie between -256 and +255 bytes. For a 64-bit <code>LDP<\/code>\/<code>STP<\/code>, it is a signed 7-bit immediate scaled by 8, giving -512 to +504 bytes.<\/li>\n\n\n\n<li><strong>Normal Offset limit:<\/strong> The unsigned-offset form uses a 12-bit immediate scaled by the access size, so a 64-bit <code>LDR<\/code>\/<code>STR<\/code> can reach 0 to 32760 bytes above the base.<\/li>\n<\/ul>\n\n\n\n<p>If a function has a large stack frame (e.g., a char array of 2KB), you cannot allocate it using a single pre-indexed store because the offset <code>#-2048<\/code> does not fit in the pre-indexed encoding. The standard approach is to <code>sub<\/code> from <code>sp<\/code> separately.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. The 16-Byte Alignment Requirement<\/h3>\n\n\n\n<p>The AArch64 hardware and ABI (Application Binary Interface) require the stack pointer (<code>sp<\/code>) to be <strong>16-byte aligned<\/strong> whenever it is used to access memory.<\/p>\n\n\n\n<p>If you were to push single 64-bit registers (8 bytes) one by one using <code>str x0, [sp, #-8]!<\/code>, you would misalign the stack.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To fix this, you would have to use <code>stp<\/code> (Store Pair) to push two registers at once (<code>stp x0, x1, [sp, #-16]!<\/code>).<\/li>\n\n\n\n<li>However, if you have an odd number of registers to save, managing alignment via individual pushes becomes messy.<\/li>\n\n\n\n<li>Allocating the whole frame at once (<code>sub sp, sp, #aligned_size<\/code>) guarantees alignment is handled in one instruction.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When WILL you see Pre-indexing?<\/h3>\n\n\n\n<p>You will see pre-indexing (and post-indexing) mainly in the <strong>Function Prologue<\/strong> and <strong>Epilogue<\/strong> for very small functions, specifically for saving the Frame Pointer (<code>x29<\/code>) and Link 
Register (<code>x30<\/code>).<\/p>\n\n\n\n<p><strong>Common Prologue:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>stp x29, x30, &#91;sp, #-16]!  ; Pre-index: Allocate 16 bytes and save FP\/LR\nmov x29, sp                ; Set up frame pointer<\/code><\/pre>\n\n\n\n<p><strong>Common Epilogue:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ldp x29, x30, &#91;sp], #16    ; Post-index: Restore FP\/LR and deallocate 16 bytes\nret<\/code><\/pre>\n\n\n\n<p><strong>But for the rest of the function code?<\/strong> It will almost exclusively be Normal Offsets.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In AArch64 (ARMv8-A 64-bit architecture), Pre-indexing and Post-indexing are memory [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[13,30],"class_list":["post-269","post","type-post","status-publish","format-standard","hentry","category-knowledge-base","tag-aarch64","tag-binary"],"_links":{"self":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/269","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=269"}],"version-history":[{"count":3,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/269\/revisions"}],"predecessor-version":[{"id":272,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/269\/revisions\/272"}],"wp:attachment":[{"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=269"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haco
.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=269"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=269"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}