Tailcall in AArch64

In AArch64 (ARM64), for a tail call to work, the current function must tear down its own stack frame before branching to the next function.

If it didn’t, every tail call would leave a dead frame behind, and a long enough chain of tail calls (deep tail recursion, for example) would eventually overflow the stack.
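
To make the stack-growth concern concrete, here is a minimal C sketch of a tail-recursive function. The name sum_to and its accumulator parameter are made up for illustration. Whether a compiler actually turns the recursive call into a branch depends on the compiler and the optimization level, so treat this as an example of code that is eligible for the optimization, not a guarantee that it happens.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: sums 1..n using an accumulator.
   The recursive call is the last thing the function does, so it is a
   tail call and a compiler is free to emit b instead of bl. */
uint64_t sum_to(uint64_t n, uint64_t acc)
{
    if (n == 0) {
        return acc;
    }
    return sum_to(n - 1, acc + n);
}

/* Usage check. The depth is kept small so that it also passes in an
   unoptimized build, where every call still gets its own frame. */
void check_sum_to(void)
{
    assert(sum_to(10, 0) == 55);
    assert(sum_to(1000, 0) == 500500);
}
```

Without the optimization, a call such as sum_to(n, 0) uses one frame per level of recursion; with it, the same frame is torn down and rebuilt in place, so stack usage stays constant.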

Here is exactly how the “reuse” works at the assembly level, step-by-step.

1. The Standard Mechanism

In a normal return, a function ends with an epilogue that restores registers and the stack pointer, followed by a ret instruction. In a tail call, the compiler generates the same cleanup but replaces ret with a plain branch: b when the target is known, or br when the target address is held in a register.

The Sequence:

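  1. Restore Callee-Saved Registers: The function reloads the saved Frame Pointer (x29) and Link Register (x30) from its stack frame, along with any other callee-saved registers (x19 to x28) it spilled. After this, x30 again holds the return address into the function’s own caller.
  2. Pop the Stack (Deallocation): The function adds to the Stack Pointer (sp) to release the space it used. Nothing is marked or cleared; the memory below sp is simply available again. This is the sense in which the stack is “reused”: the next function builds its frame in the same place.
  3. Jump (Branch): Instead of executing ret (which jumps to the address in x30), the function executes a plain branch to the start of the next function: b target_func for a direct call, or br with a register for an indirect one. Unlike bl, neither instruction writes x30.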
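
At the source level, the sequence above applies only to calls in tail position. The following C sketch contrasts three cases; all function names are hypothetical, and the instructions mentioned in the comments are what an optimizing AArch64 compiler may emit, not something the C language guarantees (a compiler may also simply inline such small functions).

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

int function_c(int x)
{
    return x * 2;
}

/* Direct tail call: the target is known, so the compiler may end this
   function with "b function_c" after tearing down the frame. */
int function_b_direct(int x)
{
    return function_c(x + 1);
}

/* Indirect tail call: the target lives in a register, so the branch
   would be "br xN" instead of "b". */
int function_b_indirect(handler_fn target, int x)
{
    assert(target != NULL);
    return target(x + 1);
}

/* Not a tail call: the addition runs after function_c returns, so this
   function still needs its frame and must use bl followed by ret. */
int function_b_not_tail(int x)
{
    return function_c(x) + 1;
}
```

For example, function_b_direct(3) and function_b_indirect(function_c, 3) both evaluate to 8, while function_b_not_tail(3) evaluates to 7.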

2. Concrete Assembly Example

Imagine Function A calls Function B, and Function B tail-calls Function C.

Function B (The Tail Caller):

_FunctionB:
    // --- Prologue ---
    stp     x29, x30, [sp, #-16]!   // Push FP and LR
    mov     x29, sp                 // Set up Frame Pointer

    // ... do some work ...

    // --- Tail Call Preparation ---
    // 1. Restore the caller's context (Function A's context)
    ldp     x29, x30, [sp], #16     // Pop FP and LR, and increment SP (deallocate frame)

    // 2. The stack is now exactly as it was when A called B.
    //    SP points to A's frame. LR contains the return address back to A.

    // 3. Jump to C
    b       _FunctionC

Function C (The Target):
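When Function C starts, sp has the same value it had when A called B, so C builds its own frame where B’s used to be. When Function C finishes and executes ret, it branches to the address in x30 (LR). Function B restored x30 before jumping, and the b instruction did not change it (if C saves x30 in its own prologue, it restores the same value before ret), so Function C returns directly to Function A.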
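
The bookkeeping can be checked with a toy model in C. This is only a sketch: the toy_cpu struct, the 64-byte stack, and the helper names are assumptions made for illustration, and the model tracks nothing beyond sp, x29, x30, and a small block of memory. It replays Function B’s prologue and tail-call epilogue and reports whether sp and x30 are back to their entry values at the point of the branch.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 64 };

/* Toy model, not real hardware state. */
typedef struct {
    uint64_t mem[STACK_BYTES / 8]; /* toy stack memory */
    uint64_t sp;                   /* byte offset into mem, grows downward */
    uint64_t fp;                   /* x29 */
    uint64_t lr;                   /* x30 */
} toy_cpu;

/* Models: stp x29, x30, [sp, #-16]! */
static void push_frame_record(toy_cpu *cpu)
{
    assert(cpu->sp >= 16);
    cpu->sp -= 16;
    cpu->mem[cpu->sp / 8] = cpu->fp;
    cpu->mem[cpu->sp / 8 + 1] = cpu->lr;
}

/* Models: ldp x29, x30, [sp], #16 */
static void pop_frame_record(toy_cpu *cpu)
{
    cpu->fp = cpu->mem[cpu->sp / 8];
    cpu->lr = cpu->mem[cpu->sp / 8 + 1];
    cpu->sp += 16;
}

/* Returns 1 if, at the branch to C, sp and x30 match their values at
   B's entry; returns 0 otherwise. */
int tail_call_restores_entry_state(uint64_t entry_sp, uint64_t return_to_a)
{
    assert(entry_sp >= 16 && entry_sp <= STACK_BYTES && entry_sp % 16 == 0);

    toy_cpu cpu = { .sp = entry_sp, .fp = 0, .lr = return_to_a };

    push_frame_record(&cpu); /* B's prologue */
    cpu.fp = cpu.sp;         /* mov x29, sp */
    cpu.lr = 0xDEAD;         /* B makes an ordinary call (bl), overwriting x30 */
    pop_frame_record(&cpu);  /* B's tail-call epilogue */

    return cpu.sp == entry_sp && cpu.lr == return_to_a;
}
```

Calling tail_call_restores_entry_state(64, 0x1000) returns 1: from Function C’s point of view, it looks as if A had called it directly.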

3. The “Stack Argument” Limitation

There is one major exception where this simple “pop before jump” strategy fails.

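If Function C (the target) needs more stack space for its arguments than Function B received for its own, the tail call optimization is usually impossible (or very difficult). Under the standard AArch64 calling convention (AAPCS64), the first eight integer or pointer arguments travel in x0 to x7, so this only comes up for functions with many or large arguments.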

  • Why? Function B’s incoming stack arguments sit just above its frame, in an area that Function A allocated and sized for B’s arguments. To pass a larger set to C, B would have to write beyond that area, into the rest of Function A’s frame, corrupting it.
  • Result: In this specific case, compilers typically give up on Tail Call Optimization and emit a standard call (bl), which keeps B’s frame alive until C returns.
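
The following C sketch shows both sides of the limitation, assuming the AAPCS64 rule that the first eight integer arguments are passed in x0 to x7 and the rest on the stack. The function names are hypothetical, and the comments describe what a compiler is expected to do in each case, not a guarantee.

```c
#include <assert.h>

/* Ten integer parameters: the first eight fit in x0-x7, so i and j
   would be passed on the stack. */
long function_c(long a, long b, long c, long d, long e,
                long f, long g, long h, long i, long j)
{
    return a + b + c + d + e + f + g + h + i + j;
}

/* B receives no stack arguments, but C needs stack space for two.
   The call is in tail position, yet there is no incoming argument area
   to reuse, so an ordinary call (bl) is the expected result. */
long function_b(long x, long y)
{
    return function_c(x, y, 0, 0, 0, 0, 0, 0, 1, 2);
}

/* Here B receives the same stack-argument footprint that C needs, so B
   can overwrite its own incoming slots and the call may still be
   eligible for a tail call. */
long function_b_same_shape(long a, long b, long c, long d, long e,
                           long f, long g, long h, long i, long j)
{
    return function_c(j, i, h, g, f, e, d, c, b, a);
}

/* Usage check: the results are the same whether or not the tail call
   optimization is applied. */
void check_stack_args(void)
{
    assert(function_b(3, 4) == 10);
    assert(function_b_same_shape(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) == 55);
}
```

Comparing the generated assembly for function_b and function_b_same_shape at a high optimization level is one way to see which of the two the compiler ends with a branch.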
