AArch64 Pre/Post Indexing – My Personal Garden

In AArch64 (the 64-bit execution state of the Armv8-A architecture), pre-indexing and post-indexing are memory addressing modes used with load and store instructions such as LDR, STR, LDP, and STP.

Their primary purpose is to perform Writeback: they automatically update the base register (the pointer) with a new address as part of the instruction execution. This is convenient for iterating through arrays or managing stacks because it removes the need for a separate ADD or SUB instruction to move the pointer.

Here is the breakdown of how they work.

1. Pre-Indexed Addressing

Syntax: [base, #offset]!
Key Symbol: The exclamation mark ! at the end.

In pre-indexing, the offset is added to the base register before the memory access occurs. The base register is then permanently updated with this new address.

Sequence of Operations:

Calculate: Add the #offset to the value in the base register.
Update: Write this new value back into the base register.
Access: Access memory using the new value.

Example:

MOV X1, #0x1000     ; Set Base pointer (X1) to address 0x1000
LDR X0, [X1, #8]!   ; Pre-indexed Load

Before: X1 = 0x1000
Action: X1 becomes 0x1008 (0x1000 + 8). Memory is read from address 0x1008.
After: X0 holds the data from 0x1008. X1 is now 0x1008.

C Language Analogy:
Similar to: data = *(ptr += 1); or equivalently data = *++ptr; where ptr is a uint64_t * so that one element is 8 bytes (Increment pointer, then access).
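The three steps of a pre-indexed load can also be modelled at byte granularity in C. This is only a sketch: ldr_pre_index is a hypothetical helper name, the base register is represented by a byte pointer, and memcpy stands in for the 64-bit memory read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "LDR Xt, [Xn, #offset]!".
 * base   : points at the modelled base register (holds a byte address)
 * offset : signed byte offset taken from the instruction
 * Returns the 64-bit value that would land in Xt. */
uint64_t ldr_pre_index(const uint8_t **base, int64_t offset)
{
    uint64_t value;

    assert(base != NULL && *base != NULL);
    *base += offset;                      /* calculate and write back */
    memcpy(&value, *base, sizeof value);  /* access at the NEW address */
    return value;
}
```

After the call, the caller's pointer has moved by offset bytes, which mirrors X1 ending up at 0x1008 in the example above.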

2. Post-Indexed Addressing

Syntax: [base], #offset
Key Visual: The offset is outside the square brackets.

In post-indexing, the memory access uses the current value of the base register. After the access is complete, the offset is added to the base register and the register is updated.

Sequence of Operations:

Access: Access memory using the current value in the base register.
Calculate: Add the #offset to the base register.
Update: Write this new value back into the base register.

Example:

MOV X1, #0x1000     ; Set Base pointer (X1) to address 0x1000
LDR X0, [X1], #8    ; Post-indexed Load

Before: X1 = 0x1000
Action: Memory is read from address 0x1000. Then X1 becomes 0x1008.
After: X0 holds the data from 0x1000. X1 is now 0x1008.

C Language Analogy:
Similar to: data = *ptr++; (Access current pointer, then increment).
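To see the analogy in a loop, here is a small C sketch that sums an array with *p++. The function name sum_u64 is made up for illustration. A compiler targeting AArch64 may turn the loop body into a post-indexed load such as LDR Xt, [Xn], #8, but it is free to choose other forms (for example, vectorized loads), so treat the mapping as typical rather than guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums `count` 64-bit elements starting at `p`. */
uint64_t sum_u64(const uint64_t *p, size_t count)
{
    uint64_t sum = 0;

    assert(p != NULL || count == 0);
    while (count-- > 0) {
        sum += *p++;   /* read at the current address, then advance 8 bytes */
    }
    return sum;
}
```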

Summary Comparison

To visualize the difference, assume X1 starts at 0x1000:

Mode	Syntax	Effective Address Used	Final Value of X1
Offset (Normal)	`[X1, #8]`	`0x1008`	`0x1000` (No change)
Pre-Index	`[X1, #8]!`	`0x1008`	`0x1008`
Post-Index	`[X1], #8`	`0x1000`	`0x1008`
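The table can be reproduced with a few lines of C. This is a sketch with invented names (addr_mode, access, resolve); it computes only the effective address and the final base register value and performs no memory access.

```c
#include <assert.h>
#include <stdint.h>

enum addr_mode { MODE_OFFSET, MODE_PRE_INDEX, MODE_POST_INDEX };

struct access {
    uint64_t effective;   /* address used for the load or store */
    uint64_t final_base;  /* base register value after the instruction */
};

/* Hypothetical model of the three addressing modes in the table. */
struct access resolve(enum addr_mode mode, uint64_t base, int64_t offset)
{
    struct access r;
    uint64_t sum = base + (uint64_t)offset;  /* wraps like a 64-bit register */

    assert(mode == MODE_OFFSET || mode == MODE_PRE_INDEX ||
           mode == MODE_POST_INDEX);
    r.effective  = (mode == MODE_POST_INDEX) ? base : sum;
    r.final_base = (mode == MODE_OFFSET) ? base : sum;
    return r;
}
```

Calling resolve(MODE_PRE_INDEX, 0x1000, 8) yields 0x1008 for both fields, matching the Pre-Index row.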

When to use which?

Post-Indexing ([Xn], #offset):
- Most common for iterating over arrays or buffers. You process the item at the current pointer and immediately get ready for the next loop iteration by moving the pointer forward.
Pre-Indexing ([Xn, #offset]!):
- Commonly used for stack pushes.
- Example (Pushing to a Full Descending stack): You must decrement the stack pointer before you store data, so you don’t overwrite the current top of the stack.
- STR X0, [SP, #-16]! (Move SP down 16 bytes, then store X0 there).
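The push and its matching pop can be sketched in C as a Full Descending stack. Everything here (fd_stack, fd_push, fd_pop, the 64-slot size) is invented for illustration; the stack pointer is modelled as an index into an array of 8-byte slots, and each push moves it by two slots to mimic the 16-byte step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64   /* arbitrary size chosen for this sketch */

struct fd_stack {
    uint64_t slots[STACK_SLOTS];
    size_t   sp;   /* index of the current top item; STACK_SLOTS when empty */
};

void fd_init(struct fd_stack *s) { s->sp = STACK_SLOTS; }

/* Push, like "STR X0, [SP, #-16]!": move down first, then store. */
void fd_push(struct fd_stack *s, uint64_t value)
{
    assert(s->sp >= 2);            /* room for one more 16-byte step */
    s->sp -= 2;                    /* two 8-byte slots = 16 bytes */
    s->slots[s->sp] = value;
}

/* Pop, like "LDR X0, [SP], #16": load first, then move up. */
uint64_t fd_pop(struct fd_stack *s)
{
    assert(s->sp + 2 <= STACK_SLOTS);  /* stack must not be empty */
    uint64_t value = s->slots[s->sp];
    s->sp += 2;
    return value;
}
```

Note that the push is the pre-indexed half and the pop is the post-indexed half; together they keep sp pointing at the last item stored.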

If you are looking at compiler-generated code (GCC, Clang, MSVC) or standard handwritten assembly examples, you are likely seeing Normal Offset addressing (e.g., str x0, [sp, #8]) instead of Pre-indexed addressing (e.g., str x0, [sp, #-16]!). The main reason is that compilers allocate a fixed stack frame once per function rather than pushing values one at a time.

Here is the detailed breakdown of why modern AArch64 code prefers normal offsets over pre-indexing (push-style behavior).

AArch64 Stack Operations Explained

1. Breaking Dependency Chains (Performance)

Modern CPUs are superscalar and out-of-order. They want to execute multiple instructions at the same time.

Pre-indexing (The “Push” approach):
If you use pre-indexing for every variable, you are modifying the Stack Pointer (sp) in every instruction.

str x0, [sp, #-16]!   ; Modifies SP
str x1, [sp, #-16]!   ; Depends on new SP value
str x2, [sp, #-16]!   ; Depends on new SP value

Each store needs the sp value produced by the previous one before it can compute its own address. This creates a dependency chain through sp, which limits how much of the sequence the CPU can overlap.

Normal Offset (The “Frame” approach):
The compiler allocates the entire stack frame at once using a single subtraction.

sub sp, sp, #48       ; Allocate space for 3 variables (aligned)
str x0, [sp, #32]     ; Base address is SP, no update
str x1, [sp, #16]     ; Base address is SP, no update
str x2, [sp, #0]      ; Base address is SP, no update

Since the str instructions do not modify sp, they are independent of each other. The CPU is free to issue them in parallel, up to the number of stores its pipeline can handle per cycle.

2. Random Access for Local Variables

In high-level languages (C, C++, Rust), the stack is not just a LIFO (Last In, First Out) structure; it is a storage area for local variables.

You often need to read and write variables located in the middle of the stack frame multiple times throughout a function.

If you used push/pop logic (pre/post-indexing), the relative offset to your variables would change constantly as sp moves.
By allocating the frame once (sub sp, sp, #size), sp remains constant throughout the function body. This makes accessing variables via static offsets ([sp, #offset]) simple and efficient.
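A short worked example makes the shifting offsets concrete. The helper below (sp_relative_offset, a made-up name) computes the offset you would have to encode to reach a variable from the current sp; the addresses are arbitrary values picked for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the sp-relative byte offset needed to reach a
 * local variable. On a descending stack, locals sit at or above sp. */
int64_t sp_relative_offset(uint64_t var_addr, uint64_t sp)
{
    assert(var_addr >= sp);
    return (int64_t)(var_addr - sp);
}
```

With a variable at 0x7ff8 and sp at 0x7ff0, the offset is 8. After one 16-byte push, sp is 0x7fe0 and the same variable is now at offset 24, so every later access would need a different immediate. With a fixed frame, sp stays put and the offset stays 8 for the whole function body.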

3. Addressing Range Limits

AArch64 instructions have limits on how large the immediate value (the offset) can be.

Pre-indexed (!) limit: For LDR/STR, the pre- and post-indexed forms take a signed 9-bit byte offset, so it must lie between -256 and 255. For LDP/STP of 64-bit registers, the offset is a signed 7-bit value scaled by 8, which gives a range of -512 to 504.
Normal Offset limit: The unsigned-offset form of LDR/STR takes a 12-bit immediate scaled by the access size, so a 64-bit access can reach offsets from 0 to 32760 in multiples of 8.

If a function has a large stack frame (e.g., a char array of 2KB), you cannot allocate it using a single pre-indexed store because the offset #-2048 does not fit in the pre-indexed encoding. The compiler adjusts sp with a separate sub instead.
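The encoding limits can be written down as simple range checks. The function names below are invented for this sketch, and it covers only the immediate forms of LDR/STR and the 64-bit register forms of LDP/STP; other instructions and register widths have their own ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LDR/STR pre- and post-indexed forms: signed 9-bit byte offset. */
bool fits_ldr_str_writeback(int64_t offset)
{
    return offset >= -256 && offset <= 255;
}

/* LDR/STR unsigned-offset form: 12-bit immediate scaled by the access
 * size in bytes (8 for an X register). */
bool fits_ldr_str_unsigned(int64_t offset, int64_t access_size)
{
    assert(access_size > 0);
    return offset >= 0 && offset % access_size == 0 &&
           offset / access_size <= 4095;
}

/* LDP/STP of two X registers: signed 7-bit immediate scaled by 8. */
bool fits_ldp_stp_x(int64_t offset)
{
    return offset >= -512 && offset <= 504 && offset % 8 == 0;
}
```

For instance, fits_ldr_str_writeback(-2048) is false, which is the 2KB case described above, while fits_ldp_stp_x(-16) is true, which is why a 16-byte pre-indexed stp is encodable.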

4. The 16-Byte Alignment Requirement

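The AArch64 ABI (Application Binary Interface) requires the stack pointer (sp) to be 16-byte aligned at public interfaces such as function calls. In addition, the hardware can be configured to fault whenever sp is used as the base address of a memory access while it is not 16-byte aligned, and operating systems typically enable this check.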

If you were to push single 64-bit registers (8 bytes) one by one using str x0, [sp, #-8]!, you would misalign the stack.

To fix this, you would have to use stp (Store Pair) to push two registers at once (stp x0, x1, [sp, #-16]!).
However, if you have an odd number of registers to save, managing alignment via individual pushes becomes messy.
Allocating the whole frame at once (sub sp, sp, #aligned_size) guarantees alignment is handled in one instruction.
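The rounding that produces the aligned size is a one-line bit trick. The sketch below uses hypothetical helper names (align_up, save_area_size) and assumes each saved register takes 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Round `bytes` up to a multiple of `align`, which must be a power of two. */
uint64_t align_up(uint64_t bytes, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (bytes + align - 1) & ~(align - 1);
}

/* Bytes to reserve for `reg_count` saved 8-byte registers, padded so
 * that sp stays 16-byte aligned. */
uint64_t save_area_size(unsigned reg_count)
{
    return align_up((uint64_t)reg_count * 8u, 16u);
}
```

Three registers need 24 bytes of data, so save_area_size(3) reserves 32; the extra 8 bytes are padding that keeps sp aligned.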

When WILL you see Pre-indexing?

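You will see pre-indexing (and post-indexing) mainly in the Function Prologue and Epilogue, where a single instruction both adjusts sp and saves or restores the Frame Pointer (x29) and Link Register (x30). This is most common in functions with small frames, where the adjustment fits in the instruction's immediate field.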

Common Prologue:

stp x29, x30, [sp, #-16]!  ; Pre-index: Allocate 16 bytes and save FP/LR
mov x29, sp                ; Set up frame pointer

Common Epilogue:

ldp x29, x30, [sp], #16    ; Post-index: Restore FP/LR and deallocate 16 bytes
ret

But for stack accesses in the rest of the function body? They will almost exclusively use Normal Offsets.
