In AArch64 (ARM64), for a tail call to work, the current function must tear down its own stack frame before branching to the next function.
If it did not, every tail call would leave another frame on the stack, and a long enough chain of tail calls (such as a tail-recursive loop) would eventually overflow the stack.
Here is exactly how the “reuse” works at the assembly level, step by step.
1. The Standard Mechanism
In a normal return, a function ends with an epilogue that restores the saved registers and the stack pointer, followed by a ret instruction. For a tail call, the compiler emits the same cleanup but replaces ret with a branch instruction: b for a direct jump to a known function, or br for an indirect jump through a register.
The Sequence:
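- Restore Saved Registers: The function loads the saved frame pointer (x29) and link register (x30), along with any callee-saved registers it used (such as x19 to x28), from its stack frame back into the registers.
- Pop the Stack (Deallocation): The function adds to the stack pointer (sp) to release the space it used. This is where the stack gets “reused”: nothing in memory is marked or cleared, but everything below the new sp is now free for the next function to claim. (A single post-indexed ldp often performs this step and the previous one together.)
- Jump (Branch): Instead of executing ret (which jumps to the address in x30), the processor executes a direct branch (b target_func) to the start of the next function. x30 still holds the original caller’s return address, so the next function’s ret goes straight back there.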
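To make the three steps concrete, here is a toy C simulation of Function B’s prologue and tail-call epilogue from the example in section 2. It is a sketch, not real register access: the `Machine` struct, its 64-word stack, and the helper names are all hypothetical, chosen only to mirror what stp/ldp do to sp, x29 and x30.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64  /* size of the toy stack, in 8-byte words */

/* Hypothetical toy machine: only the registers the epilogue touches,
 * plus a small stack. sp is a byte address into mem. */
typedef struct {
    uint64_t sp, x29, x30;
    uint64_t mem[STACK_WORDS];
} Machine;

static uint64_t load64(const Machine *m, uint64_t addr) {
    assert(addr % 8 == 0 && addr / 8 < STACK_WORDS);
    return m->mem[addr / 8];
}

static void store64(Machine *m, uint64_t addr, uint64_t value) {
    assert(addr % 8 == 0 && addr / 8 < STACK_WORDS);
    m->mem[addr / 8] = value;
}

/* stp x29, x30, [sp, #-16]!   then   mov x29, sp */
static void prologue(Machine *m) {
    m->sp -= 16;                    /* pre-index: move sp down first */
    store64(m, m->sp, m->x29);      /* saved FP at [sp]     */
    store64(m, m->sp + 8, m->x30);  /* saved LR at [sp + 8] */
    m->x29 = m->sp;                 /* new frame pointer    */
}

/* ldp x29, x30, [sp], #16   (post-index: load both, then pop 16 bytes).
 * The real code then executes "b _FunctionC" without touching x30. */
static void tail_call_epilogue(Machine *m) {
    m->x29 = load64(m, m->sp);
    m->x30 = load64(m, m->sp + 8);
    m->sp += 16;
}

/* Runs Function B up to its "b _FunctionC" and returns 1 if sp, x29 and
 * x30 are back to the values B was entered with, i.e. what C will see. */
int b_restores_entry_state(Machine *m) {
    const uint64_t entry_sp = m->sp, entry_fp = m->x29, entry_lr = m->x30;
    prologue(m);
    m->x30 = 0xdeadbeef;  /* "... do some work ...": an inner bl clobbers LR */
    tail_call_epilogue(m);
    return m->sp == entry_sp && m->x29 == entry_fp && m->x30 == entry_lr;
}
```

Starting from, say, sp = 512, x29 = 0x1000 and x30 = 0x4000, the function returns 1 and sp is back at 512: the 16 bytes B borrowed are free again for C’s own prologue, and x30 once more points into A.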
2. Concrete Assembly Example
Imagine Function A calls Function B, and Function B tail-calls Function C.
Function B (The Tail Caller):
_FunctionB:
// --- Prologue ---
stp x29, x30, [sp, #-16]! // Push FP and LR
mov x29, sp // Set up Frame Pointer
// ... do some work ...
// --- Tail Call Preparation ---
// 1. Restore the caller's context (Function A's context)
ldp x29, x30, [sp], #16 // Pop FP and LR, and increment SP (deallocate frame)
// 2. The stack is now exactly as it was when A called B.
// SP points to A's frame. LR contains the return address back to A.
// 3. Jump to C
b _FunctionC
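A listing like the one above comes from source where the call sits in tail position. The C below is a hypothetical sketch: the function names are made up, and `__attribute__((noinline))` is a GCC/Clang extension used only so the helpers are not inlined away. At -O2, GCC and Clang for AArch64 will typically end `direct_tail` with `b add_one` and `indirect_tail` with a `br` on the register holding `fn`, but the exact output depends on compiler, version and flags, so check it with `-S`. A function as small as `direct_tail` usually needs no frame at all, so the stp/ldp pair disappears and only the b remains; the frame in the listing matters once B makes other calls or spills registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers. noinline (GCC/Clang extension) keeps the calls
 * visible so they can become branches instead of being inlined. */
__attribute__((noinline)) int add_one(int x) { return x + 1; }
__attribute__((noinline)) int times_two(int x) { return x * 2; }

/* Direct tail call: the call is the last action and its result is returned
 * unchanged, so the compiler may emit "b add_one" instead of
 * "bl add_one" followed by "ret". */
int direct_tail(int x) {
    return add_one(x + 10);
}

/* Indirect tail call: the target is only known at run time and is held
 * in a register, so the branch would be "br xN". */
int indirect_tail(int (*fn)(int), int x) {
    assert(fn != NULL);
    return fn(x);
}

/* Not a tail call: the result is used after add_one returns, so this
 * function needs a normal "bl" and must keep its own return address. */
int not_tail(int x) {
    return add_one(x) * 3;
}
```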
Function C (The Target):
When Function C starts, sp points exactly where it did when A called B (at Function A’s frame), and x30 (LR) still holds the return address into A, because Function B restored it before branching. When Function C finishes and executes ret, it jumps to that address, so it returns directly to Function A; Function B’s frame is already gone.
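The payoff shows up over a long chain of calls. This hypothetical C model counts bytes rather than running real code, and it assumes every frame is just the 16-byte x29/x30 pair from Function B. It compares the peak stack use of a chain of normal calls (bl) with a chain of tail calls (b).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed frame size: only the saved x29/x30 pair, as in Function B. */
#define FRAME_BYTES 16
static_assert(FRAME_BYTES % 16 == 0, "AArch64 keeps sp 16-byte aligned");

/* Hypothetical model of a chain f1 -> f2 -> ... -> f_depth, where each
 * function calls the next as its last action. Returns the peak number of
 * bytes the chain occupies below the original caller's sp. */
size_t peak_stack_bytes(size_t depth, int use_tail_calls) {
    size_t in_use = 0, peak = 0;
    for (size_t i = 0; i < depth; i++) {
        in_use += FRAME_BYTES;       /* prologue: stp x29, x30, [sp, #-16]! */
        if (in_use > peak)
            peak = in_use;
        if (use_tail_calls)
            in_use -= FRAME_BYTES;   /* ldp x29, x30, [sp], #16 before "b next" */
        /* with a normal "bl next", the frame stays live until next returns */
    }
    return peak;
}
```

For 1000 chained calls the model gives 16000 bytes with bl versus 16 bytes with b: the tail-call chain never holds more than one frame at a time, which is why the last function can return straight to the original caller.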
3. The “Stack Argument” Limitation
There is one major exception where this simple “pop before jump” strategy fails.
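If Function C (the target) needs more stack space for its arguments than Function B received from its own caller, the tail call is usually impossible (or very difficult). Under AAPCS64 the first eight integer or pointer arguments are passed in x0 to x7 (and floating-point ones in v0 to v7), so this only comes up once arguments spill onto the stack.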
- Why? Function B would need to write C’s extra arguments into stack space beyond its own incoming arguments, space that belongs to Function A (its caller), corrupting A’s frame.
- Result: In this case, the compiler typically gives up on the tail call and emits a normal call (bl) instead: B keeps its frame, C builds a new one below it, and B returns to A with its usual epilogue and ret.
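The condition can be sketched as a size comparison. This is a simplified, hypothetical model that covers integer and pointer arguments only, assuming the standard AAPCS64 rules: the first eight go in x0 to x7 and each further one takes an 8-byte stack slot. It ignores floating-point and SIMD arguments, structs, and Apple’s arm64 variant (which packs stack arguments more tightly), and real compilers apply more checks than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GPR_ARG_REGS 8  /* x0-x7 carry the first eight integer/pointer args */
#define STACK_SLOT   8  /* each further integer/pointer arg uses an 8-byte slot */

/* Bytes of stack argument space needed to pass `n` integer/pointer args. */
size_t stack_arg_bytes(int n) {
    assert(n >= 0);
    return n > GPR_ARG_REGS ? (size_t)(n - GPR_ARG_REGS) * STACK_SLOT : 0;
}

/* Hypothetical check: B (with caller_args parameters) may tail-call C
 * (with callee_args parameters) only if C's stack arguments fit in the
 * incoming area that A set up for B. Anything larger would spill into
 * A's frame. */
bool stack_args_allow_tail_call(int caller_args, int callee_args) {
    return stack_arg_bytes(callee_args) <= stack_arg_bytes(caller_args);
}
```

Under this model, a two-argument B can tail-call an eight-argument C (everything fits in registers), but not a ten-argument C, which needs 16 bytes of stack arguments that B was never given.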