In AArch64 (ARMv8-A 64-bit architecture), Pre-indexing and Post-indexing are memory addressing modes used with Load (LDR) and Store (STR) instructions.
Their primary purpose is to perform Writeback: they automatically update the base register (the pointer) with a new address as part of the instruction execution. This is extremely efficient for iterating through arrays or managing stacks because it eliminates the need for a separate ADD or SUB instruction to move the pointer.
Here is the breakdown of how they work.
1. Pre-Indexed Addressing
Syntax: [base, #offset]!
Key Symbol: The exclamation mark ! at the end.
In pre-indexing, the offset is added to the base register before the memory access occurs. The base register is then permanently updated with this new address.
Sequence of Operations:
- Calculate: Add the #offset to the value in the base register.
- Update: Write this new value back into the base register.
- Access: Access memory using the new value.
Example:
MOV X1, #0x1000 ; Set Base pointer (X1) to address 0x1000
LDR X0, [X1, #8]! ; Pre-indexed Load
- Before: X1 = 0x1000
- Action: X1 becomes 0x1008 (0x1000 + 8). Memory is read from address 0x1008.
- After: X0 holds the data from 0x1008. X1 is now 0x1008.
C Language Analogy:
Similar to: data = *++ptr; (Increment pointer, then access). This assumes ptr points to 8-byte elements, so one step matches the #8 offset.
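The calculate/update/access sequence can also be modelled as a small C function. This is only a sketch of the semantics, not a description of how the hardware is built: the name ldr_pre_index is hypothetical, the base register is modelled as a byte pointer, and the offset is in bytes, as in the assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of: LDR Xt, [Xn, #offset]!
 * `base` stands in for Xn; `offset` is a byte offset. */
uint64_t ldr_pre_index(const uint8_t **base, int64_t offset)
{
    uint64_t value;
    assert(base != NULL && *base != NULL);
    *base += offset;                      /* calculate + update: write back first */
    memcpy(&value, *base, sizeof value);  /* access: load from the NEW address */
    return value;
}
```

Called with a pointer to the first of two adjacent 64-bit values and an offset of 8, this returns the second value and leaves the pointer on it, which mirrors X1 ending at 0x1008 in the example.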
2. Post-Indexed Addressing
Syntax: [base], #offset
Key Visual: The offset is outside the square brackets.
In post-indexing, the memory access uses the current value of the base register. After the access is complete, the offset is added to the base register and the register is updated.
Sequence of Operations:
- Access: Access memory using the current value in the base register.
- Calculate: Add the #offset to the base register.
- Update: Write this new value back into the base register.
Example:
MOV X1, #0x1000 ; Set Base pointer (X1) to address 0x1000
LDR X0, [X1], #8 ; Post-indexed Load
- Before: X1 = 0x1000
- Action: Memory is read from address 0x1000. Then X1 becomes 0x1008.
- After: X0 holds the data from 0x1000. X1 is now 0x1008.
C Language Analogy:
Similar to: data = *ptr++; (Access current pointer, then increment).
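For comparison, here is a minimal C model of the post-indexed sequence. As before, this is a sketch of the semantics only: ldr_post_index is a hypothetical name, the base register is modelled as a byte pointer, and the offset is in bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of: LDR Xt, [Xn], #offset
 * `base` stands in for Xn; `offset` is a byte offset. */
uint64_t ldr_post_index(const uint8_t **base, int64_t offset)
{
    uint64_t value;
    assert(base != NULL && *base != NULL);
    memcpy(&value, *base, sizeof value);  /* access: load from the CURRENT address */
    *base += offset;                      /* calculate + update: write back afterwards */
    return value;
}
```

With the same two adjacent 64-bit values and an offset of 8, this returns the first value, and the pointer ends on the second one, ready for the next load.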
Summary Comparison
To visualize the difference, assume X1 starts at 0x1000:
| Mode | Syntax | Effective Address Used | Final Value of X1 |
|---|---|---|---|
| Offset (Normal) | [X1, #8] | 0x1008 | 0x1000 (No change) |
| Pre-Index | [X1, #8]! | 0x1008 | 0x1008 |
| Post-Index | [X1], #8 | 0x1000 | 0x1008 |
When to use which?
- Post-Indexing ([Xn], #offset):
  - Most common for iterating over arrays or buffers. You process the item at the current pointer and immediately get ready for the next loop iteration by moving the pointer forward.
- Pre-Indexing ([Xn, #offset]!):
  - Commonly used for stack pushes.
  - Example (Pushing to a Full Descending stack): You must decrement the stack pointer before you store data, so you don’t overwrite the current top of the stack.
    STR X0, [SP, #-16]! (Move SP down 16 bytes, then store X0 there).
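Both use cases have familiar C shapes. The sketch below is illustrative: sum_buffer and push16 are hypothetical names, and whether a compiler actually emits post-indexed or pre-indexed instructions for this code depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Post-increment walk over a buffer: `*p++` reads the current element and
 * then advances, the same order as LDR Xt, [Xn], #8. */
uint64_t sum_buffer(const uint64_t *p, size_t count)
{
    uint64_t total = 0;
    while (count-- > 0) {
        total += *p++;
    }
    return total;
}

/* Pre-decrement push on a full descending stack, mirroring
 * STR X0, [SP, #-16]!: move the pointer down first, then store.
 * A 16-byte slot is modelled as two 64-bit words; only the first is written. */
uint64_t *push16(uint64_t *sp, uint64_t value)
{
    assert(sp != NULL);
    sp -= 2;         /* move down 16 bytes */
    sp[0] = value;   /* store at the new top of stack */
    return sp;
}
```

In push16 the old top of the stack is never touched, because the pointer moves before the store.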
If you are looking at compiler-generated code (GCC, Clang, MSVC) or standard handwritten assembly examples, you are likely seeing Normal Offset addressing (e.g., str x0, [sp, #8]) instead of Pre-indexed addressing (e.g., str x0, [sp, #-16]!) for one major reason: Optimization via Fixed Stack Frames.
Here is the detailed breakdown of why modern AArch64 code prefers normal offsets over pre-indexing (push-style behavior).
AArch64 Stack Operations Explained
1. Breaking Dependency Chains (Performance)
Modern CPUs are superscalar and out-of-order. They want to execute multiple instructions at the same time.
- Pre-indexing (The “Push” approach):
  If you use pre-indexing for every variable, you are modifying the Stack Pointer (sp) in every instruction.

    str x0, [sp, #-16]! ; Modifies SP
    str x1, [sp, #-16]! ; Depends on new SP value
    str x2, [sp, #-16]! ; Depends on new SP value

  The CPU cannot compute the address for the second store until the first one has produced the updated sp. This creates a dependency chain, so the address calculations happen serially (one by one).
- Normal Offset (The “Frame” approach):
  The compiler allocates the entire stack frame at once using a single subtraction.

    sub sp, sp, #48 ; Allocate space for 3 variables (aligned)
    str x0, [sp, #32] ; Base address is SP, no update
    str x1, [sp, #16] ; Base address is SP, no update
    str x2, [sp, #0] ; Base address is SP, no update

  Since the str instructions do not modify sp, they are independent of each other. The CPU can issue all three stores without one waiting on another, limited only by its available store resources.
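The two approaches leave the same data at the same addresses; they differ only in how sp gets there. The C sketch below checks that under stated assumptions: the stack is modelled as a byte array, sp as an index into it, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64, SLOT = 16 };

/* "Push" style: three pre-indexed stores, each one moving the modelled sp. */
size_t store_with_pushes(uint8_t *stack, size_t sp, const uint64_t regs[3])
{
    assert(sp >= (size_t)(3 * SLOT));
    for (int i = 0; i < 3; i++) {
        sp -= SLOT;                                    /* writeback of [sp, #-16]! */
        memcpy(stack + sp, &regs[i], sizeof regs[i]);  /* store at the new sp */
    }
    return sp;
}

/* "Frame" style: one subtraction, then stores at fixed offsets 32, 16, 0. */
size_t store_with_frame(uint8_t *stack, size_t sp, const uint64_t regs[3])
{
    assert(sp >= (size_t)(3 * SLOT));
    sp -= 3 * SLOT;                                    /* sub sp, sp, #48 */
    memcpy(stack + sp + 32, &regs[0], sizeof regs[0]);
    memcpy(stack + sp + 16, &regs[1], sizeof regs[1]);
    memcpy(stack + sp + 0,  &regs[2], sizeof regs[2]);
    return sp;
}
```

Running both on zeroed 64-byte buffers with the same three values gives identical buffer contents and the same final sp. In the push version each iteration needs the sp from the previous one, while the three frame-style stores all use the same sp.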
2. Random Access for Local Variables
In high-level languages (C, C++, Rust), the stack is not just a LIFO (Last In, First Out) structure; it is a storage area for local variables.
You often need to read and write variables located in the middle of the stack frame multiple times throughout a function.
- If you used push/pop logic (pre/post-indexing), the relative offset to your variables would change constantly as sp moves.
- By allocating the frame once (sub sp, sp, #size), sp remains constant throughout the function body. This makes accessing variables via static offsets ([sp, #offset]) simple and efficient.
3. Addressing Range Limits
AArch64 instructions have limits on how large the immediate value (the offset) can be.
- Pre-indexed (!) limit: The offset must fit a narrow signed range. For a 64-bit LDR/STR it is a 9-bit byte offset (-256 to 255); for a 64-bit LDP/STP it is a 7-bit offset scaled by 8 (-512 to 504).
- Normal Offset limit: The unsigned-offset form of LDR/STR has a larger range because its 12-bit immediate is scaled by the access size (0 to 32760 in multiples of 8 for 64-bit registers).
If a function has a large stack frame (e.g., a char array of 2KB), you cannot allocate it using a single pre-indexed store because the offset #-2048 does not fit in the pre-indexed encoding. The frame has to be allocated by adjusting sp with a separate sub.
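To make the limits concrete, here is a small C sketch of the range checks for 64-bit register accesses. The function names are hypothetical, and the ranges are the A64 immediate encodings as I understand them: signed 9-bit for LDR/STR with writeback, unsigned 12-bit scaled by 8 for the plain offset form, and signed 7-bit scaled by 8 for LDP/STP. Check the Arm Architecture Reference Manual before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LDR/STR Xt with writeback ([Xn, #imm]! or [Xn], #imm): signed 9-bit byte offset. */
bool fits_ldr_str_writeback(int64_t imm)
{
    return imm >= -256 && imm <= 255;
}

/* LDR/STR Xt, [Xn, #imm] without writeback: unsigned 12-bit field scaled by 8. */
bool fits_ldr_str_unsigned_offset(int64_t imm)
{
    return imm >= 0 && imm <= 32760 && imm % 8 == 0;
}

/* LDP/STP Xt1, Xt2 in any indexing form: signed 7-bit field scaled by 8. */
bool fits_ldp_stp(int64_t imm)
{
    return imm >= -512 && imm <= 504 && imm % 8 == 0;
}

/* True when a 16-byte-aligned frame is too large to be allocated by a
 * pre-indexed STP of x29/x30 and needs a separate SUB on sp. */
bool frame_needs_separate_sub(int64_t frame_bytes)
{
    assert(frame_bytes > 0 && frame_bytes % 16 == 0);
    return !fits_ldp_stp(-frame_bytes);
}
```

Under these checks a 16-byte frame can be allocated by the pre-indexed store itself, whereas a 2048-byte frame cannot, even though #2048 is a valid offset for a plain str x0, [sp, #2048].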
4. The 16-Byte Alignment Requirement
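The AArch64 ABI (AAPCS64) requires the stack pointer (sp) to be 16-byte aligned at public interfaces, and the hardware can enforce 16-byte alignment whenever sp is used as the base address of a memory access. Operating systems normally enable this check, in which case a misaligned sp causes a fault.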
If you were to push single 64-bit registers (8 bytes) one by one using str x0, [sp, #-8]!, you would misalign the stack.
- To fix this, you would have to use stp (Store Pair) to push two registers at once (stp x0, x1, [sp, #-16]!).
- However, if you have an odd number of registers to save, managing alignment via individual pushes becomes messy.
- Allocating the whole frame at once (sub sp, sp, #aligned_size) guarantees alignment is handled in one instruction.
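The aligned size is simply the raw frame size rounded up to the next multiple of 16. Here is a minimal C sketch of that rounding, with the hypothetical name align_frame_size; real compilers compute this as part of frame layout.

```c
#include <assert.h>
#include <stddef.h>

/* Round a frame size in bytes up to the next multiple of 16. */
size_t align_frame_size(size_t bytes)
{
    const size_t alignment = 16;
    assert(bytes <= (size_t)-1 - (alignment - 1));  /* guard against overflow */
    return (bytes + alignment - 1) & ~(alignment - 1);
}
```

For example, three 8-byte registers need 24 bytes, which rounds up to a 32-byte frame. An odd number of saved registers therefore costs one unused 8-byte slot and sp stays aligned.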
When WILL you see Pre-indexing?
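Pre-indexing and post-indexing show up mainly in the Function Prologue and Epilogue, where saving or restoring the Frame Pointer (x29) and Link Register (x30) is combined with allocating or releasing the stack frame. This only works when the frame size fits the stp/ldp immediate, which is why it is most visible in small functions.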
Common Prologue:
stp x29, x30, [sp, #-16]! ; Pre-index: Allocate 16 bytes and save FP/LR
mov x29, sp ; Set up frame pointer
Common Epilogue:
ldp x29, x30, [sp], #16 ; Post-index: Restore FP/LR and deallocate 16 bytes
ret
But for stack accesses in the rest of the function body? Those will almost exclusively use Normal Offsets.