Enabling PAC and BTI on AArch64 for Linux

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/enabling-pac-and-bti-on-aarch64

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/p2-enabling-pac-and-bti-on-aarch64

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/p3-enabling-pac-and-bti-on-aarch64

Source code for the examples can be found at https://gitlab.arm.com/pac-and-bti-blog/blog-example; each example is preceded by a "Tag" line naming the corresponding tag in that repository.

Certain Arm 64-bit processors have features that help provide control-flow integrity and reduce gadget space, making software more robust in the face of attack. Pointer Authentication Codes (PAC) work by signing code pointers, such as return addresses, and verifying them before they are used as branch targets. Branch Target Identification (BTI) works by marking all valid targets of indirect branches. Together, these technologies harden control flow by ensuring that tampering with control-flow values is detected cryptographically and that control can only be transferred to valid locations. Details on how this works can be found in another Arm blog post on BTI and PAC.

This post skips the underlying implementation details and focuses on A-profile processors and the Linux ecosystem of C/C++ code, ELF, exception handling, and toolchains. The goal is to provide a pragmatic guide to enabling PAC and BTI throughout that ecosystem. It is aimed specifically at C and C++ projects that may contain intermixed assembly, because assembly code must be modified by hand to enable support. Other languages may or may not support these technologies at this time and are not discussed. All of these examples were executed on a Linux machine with support for PAC and BTI. To test whether your machine supports PAC and BTI, run the following command:

cat /proc/cpuinfo | grep -E -o "bti|pac" | sort | uniq
bti
pac
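The grep above matches substrings: on current kernels the PAC capabilities are reported as the separate flags paca and pacg, which is why pac still shows up in its output. If you would rather check from a program, the sketch below (helper names are hypothetical) looks for whole tokens in a cpuinfo Features line and, on AArch64 Linux, also asks the kernel through getauxval(). The HWCAP_PACA and HWCAP2_BTI constants are assumed to come from glibc's <sys/auxv.h> on AArch64, so that check is guarded and simply returns false elsewhere.

```c
#include <stdbool.h>
#include <string.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h> /* getauxval(); on glibc also HWCAP_PACA, HWCAP2_BTI */
#endif

/* True if `name` appears as a whole token in `features`, for example the
 * text after "Features :" in /proc/cpuinfo. Unlike grep -o "pac", the
 * token "pac" does not match inside "paca" or "pacg". */
bool has_cpu_feature(const char *features, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;

    for (const char *p = strstr(features, name); p != NULL;
         p = strstr(p + len, name)) {
        char before = (p == features) ? ' ' : p[-1];
        char after = p[len];
        bool starts = before == ' ' || before == '\t' || before == ':';
        bool ends = after == '\0' || after == ' ' || after == '\t' || after == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}

/* Asks the kernel via the auxiliary vector. Returns false when the
 * constants are unavailable (non-AArch64 targets or older C libraries). */
bool kernel_reports_pac_and_bti(void)
{
#if defined(__linux__) && defined(__aarch64__) && \
    defined(HWCAP_PACA) && defined(HWCAP2_BTI)
    return (getauxval(AT_HWCAP) & HWCAP_PACA) != 0 &&
           (getauxval(AT_HWCAP2) & HWCAP2_BTI) != 0;
#else
    return false;
#endif
}
```

For example, has_cpu_feature(line, "bti") is true for a Features line ending in "paca pacg bti", while has_cpu_feature(line, "pac") is false.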

Enabling C/C++ Code

Contemporary versions of the gcc and clang compiler suites, their runtimes, and binutils support PAC and BTI. Enabling a C or C++ project is as simple as passing the compiler option -mbranch-protection=standard, which enables the standard set of PAC and BTI features. To help verify that the project is built with BTI, you can optionally pass the linker options -zforce-bti,--fatal-warnings.

You can pass linker flags through gcc or clang by specifying -Wl, for example: -Wl,-zforce-bti,--fatal-warnings.

These linker flags make the linker fail with an error and report which object files do not support BTI.

Most build systems will respect the environment variables CFLAGS, CXXFLAGS and LDFLAGS. For example: CFLAGS='-mbranch-protection=standard' make.

Additionally, you can check the produced ELF binary for support using readelf -n <binary>. For example, we will create an empty C file, compile it to an object file, and check the resulting object file for the relevant property flags:

touch empty.c
gcc -mbranch-protection=standard -c -o empty.o empty.c
readelf -n empty.o

Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC

The “Properties” line indicates PAC and/or BTI support. The main obstacle to supporting PAC and BTI is standalone assembly: projects that contain it must instrument that assembly by hand to provide this support.

Enabling Assembly

The simplest way to enable assembly is to rewrite it using a combination of C/C++, intrinsics, and inline assembly where needed. Modern compilers are very capable of generating optimized code that is often better than hand-written assembly. However, certain use cases may dictate otherwise, and existing or new assembly will then need modification in three specific areas:

  • BTI: instrumenting call and jump points in assembly
  • PAC: instrumenting routines to sign and verify the link register
  • ELF: instrumenting the GNU Notes section to indicate PAC and BTI support

Example Program

We will use the following example program, which contains both C and assembly sources. The C code calls an assembly routine, call_function, which takes a function pointer as its first argument and jumps to it. Create the files shown below. The source code is also available at https://gitlab.arm.com/pac-and-bti-blog/blog-example; the repository is annotated with tags, and the tag for each example is given after the “Tag” keyword.

Tag: Example-1

main.c:

// Declaration of the assembly routines
extern void call_function(void (*func)());
extern void my_jump();

int main() {
    // Call the assembly routine **indirectly** using a function pointer
    // and pass the jump location as well.
    void (*fn)(void (*func)()) = call_function;
    fn(my_jump);
    return 0;
}

call_function.S:

.section .rodata
.align 3
.Lstring:
    .string "Hello From My Jump!"

.section .text
.global my_jump
.global call_function

my_jump:
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    // bl puts overwrites x30 (lr), which is why the stp above saved
    // the frame pointer and return address to the stack.
    adrp    x0, .Lstring        // Get the page the string is within
    add x0, x0, :lo12:.Lstring  // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl      puts                // puts prints the string in x0

    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp
     
    // x0 holds the first argument, the function to jump to.
    // Point lr at return_loc so that the target's ret resumes
    // execution there, then branch to x0 without linking.
    adr lr, return_loc
    br x0  //intentionally avoiding a branch and link, you'll see why later.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16

    // Return from the function
    ret

Makefile:

ASFLAGS ?= $(CFLAGS)

OBJS := main.o \
	call_function.o

main: $(OBJS)
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^

.PHONY: clean
clean:
	@printf "Cleaning...\n" && rm -rf $(OBJS) main

Only modified files are shown from this point forward; any file that is not shown is unchanged from its previous state.

Compiling

To compile the example code execute:

make
./main
Hello From My Jump!

BTI

The first step in enabling BTI is to turn it on in the compiler. Here we use -mbranch-protection=bti so that we only get the BTI instructions and not PAC. We also add the linker flags that force an error if any ELF object file lacks BTI support. The Makefile is used to build all the examples with differing sets of CFLAGS and LDFLAGS.

Ensure that you run make clean between examples.

Run the following to compile the code with BTI support:

CFLAGS="-mbranch-protection=bti" LDFLAGS='-Wl,-zforce-bti,--fatal-warnings' make
cc -mbranch-protection=bti   -c -o main.o main.c
cc -mbranch-protection=bti   -c -o call_function.o call_function.S
cc -mbranch-protection=bti -Wl,-zforce-bti,--fatal-warnings -o main main.o call_function.o
/usr/bin/ld: call_function.o: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section.
collect2: error: ld returned 1 exit status
make: *** [Makefile:7: main] Error 1

As designed, the linker failed and reported that BTI is not enabled in the assembly object file, call_function.o. The next step is to enable BTI in the assembly, and a convenient way to do so is with the C preprocessor: because the file has a .S extension, it is preprocessed before it is assembled, which allows #include and conditional compilation. This lets us include BTI support only when it is requested. BTI instructions could instead be included unconditionally: the linker drops the BTI property from the GNU Notes section whenever any object file being linked does not declare it, and the BTI instructions then execute as NOPs. However, you would still pay the small cost of executing those NOPs. With that in mind, let's create a header file and include it in our assembly so that the BTI instructions are emitted only when the code is compiled, assembled, and linked with BTI support explicitly enabled. The feature test macros used below are documented in Arm's Developer documentation on Feature Test Macros.
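The feature test macros the header relies on are ordinary predefined macros, so C code can inspect them too. As a minimal sketch (the function name and bit constants are illustrative, mirroring the GNU property bits used in the header), this reports which protections the compiler was asked to generate:

```c
#include <stdint.h>

/* Bit values of GNU_PROPERTY_AARCH64_FEATURE_1_AND, as used in aarch64.h. */
#define FEATURE_1_BTI UINT32_C(1)
#define FEATURE_1_PAC UINT32_C(2)

/* Hypothetical helper: reports which protections the compiler was asked to
 * generate, based on the ACLE feature test macros. Returns 0 on other
 * architectures or when -mbranch-protection is not used. */
uint32_t requested_branch_protection(void)
{
    uint32_t bits = 0;
#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
    bits |= FEATURE_1_BTI;
#endif
#if defined(__ARM_FEATURE_PAC_DEFAULT)
    bits |= FEATURE_1_PAC;
#endif
    return bits;
}
```

Built with -mbranch-protection=standard by an AArch64 compiler, this should return 3 (BTI and PAC); without the option, or on another architecture, it returns 0.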

Tag: Example-2

aarch64.h:

#ifndef _AARCH_64_H_
#define _AARCH_64_H_

/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
  #define BTI_J bti j /* for jumps, i.e. br instructions */
  #define BTI_C bti c /* for calls, i.e. blr instructions */
  #define GNU_PROPERTY_AARCH64_BTI 1 /* bit 0 GNU Notes is for BTI support */
#else
  #define BTI_J
  #define BTI_C
  #define GNU_PROPERTY_AARCH64_BTI 0
#endif

/* Add BTI support to the GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0
    .pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
    .balign 8; /* align to an 8-byte boundary */
    .long 4; /* size of "GNU\0" */
    .long 0x10; /* size of descriptor */
    .long 0x5; /* NT_GNU_PROPERTY_TYPE_0 */
    .asciz "GNU";
    .long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
    .long 4; /* Four bytes of data */
    .long GNU_PROPERTY_AARCH64_BTI; /* BTI is enabled */
    .long 0; /* padding for 8 byte alignment */
    .popsection; /* end the section */
#endif

#endif
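To make the note layout concrete, here is a small C sketch that decodes the 32-byte blob the directives above emit: a 12-byte note header (namesz, descsz, type), the "GNU\0" name, then one property with its type, data size, feature bits, and padding. It assumes little-endian data and a note containing exactly one property, as generated by this header; a real parser such as readelf walks every property in the descriptor. The function and macro names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_TYPE_GNU_PROPERTY 5u              /* NT_GNU_PROPERTY_TYPE_0 */
#define PROP_AARCH64_FEATURE_1_AND 0xc0000000u /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */

/* Reads a 32-bit little-endian value (AArch64 Linux is little-endian). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decodes a note laid out exactly as aarch64.h emits it and stores the
 * feature bits (bit 0 = BTI, bit 1 = PAC) in *features.
 * Returns false if the blob does not have that shape. */
bool parse_feature_note(const unsigned char *note, size_t size, uint32_t *features)
{
    if (size < 32)
        return false;
    if (read_le32(note) != 4 ||                  /* namesz: "GNU\0" */
        read_le32(note + 4) != 16 ||             /* descsz */
        read_le32(note + 8) != NOTE_TYPE_GNU_PROPERTY ||
        memcmp(note + 12, "GNU", 4) != 0)        /* compares the NUL too */
        return false;

    const unsigned char *prop = note + 16;       /* start of the descriptor */
    if (read_le32(prop) != PROP_AARCH64_FEATURE_1_AND ||
        read_le32(prop + 4) != 4)                /* pr_datasz */
        return false;

    *features = read_le32(prop + 8);             /* last 4 bytes are padding */
    return true;
}
```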

Now that the header file is in place, let’s augment the assembly file.

call_function.S:

#include "aarch64.h"

.section .rodata
.align 3
.Lstring:
    .string "Hello From My Jump!"

.section .text
.global my_jump
.global call_function

my_jump:
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    // bl puts overwrites x30 (lr), which is why the stp above saved
    // the frame pointer and return address to the stack.
    adrp    x0, .Lstring        // Get the page the string is within
    add x0, x0, :lo12:.Lstring  // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl      puts                // puts prints the string in x0

    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    BTI_C
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp

    // x0 holds the first argument, the function to jump to.
    // Point lr at return_loc so that the target's ret resumes
    // execution there, then branch to x0 without linking.
    adr lr, return_loc
    br x0  //intentionally avoiding a branch and link, you'll see why later.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16

    // Return from the function
    ret

As a reminder, main.c and the Makefile require no modifications, so they are not shown. We do, however, need to clean and rebuild:

make clean
LDFLAGS='-Wl,-zforce-bti,--fatal-warnings' CFLAGS="-mbranch-protection=bti" make
cc -mbranch-protection=bti -c -o main.o main.c
cc -mbranch-protection=bti -c -o call_function.o call_function.S
cc -mbranch-protection=bti -Wl,-zforce-bti,--fatal-warnings -o main main.o call_function.o

Notice that the linker no longer warns that BTI is missing from the NOTE section. We can now check that the BTI property is set in the linked executable:

readelf -n main

Displaying notes found in: .note.gnu.property
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI

We can also execute the program:

./main
Illegal instruction (core dumped)

Wait, it did not work. Why? 

The example intentionally omitted the bti j landing pad for the jump to my_jump. Because the GNU Notes section declares BTI support, the loader (the kernel or the dynamic linker) mapped the executable's code pages with PROT_BTI, so the br x0 into my_jump, whose first instruction is not a valid landing pad, triggered an exception, as designed. Now, let's add the landing pad to my_jump.
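PROT_BTI is an ordinary protection flag, so the effect the loader applies can also be requested by hand, for example by a program that generates code at run time. The sketch below is illustrative only (enable_bti_guard and map_scratch_page are hypothetical helpers); it assumes PROT_BTI is provided by <sys/mman.h> on AArch64 Linux and otherwise fails with ENOTSUP. Code placed in such a region must start every indirect-branch target with a suitable landing pad, exactly as with my_jump.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one anonymous read/write page, e.g. to be filled with generated
 * code before calling enable_bti_guard(). Returns MAP_FAILED on error. */
void *map_scratch_page(size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    *len_out = page > 0 ? (size_t)page : 4096;
    return mmap(NULL, *len_out, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Re-protects a code region as read+execute with BTI enforcement.
 * Returns 0 on success, or -1 with errno set. */
int enable_bti_guard(void *addr, size_t len)
{
#if defined(__aarch64__) && defined(PROT_BTI)
    return mprotect(addr, len, PROT_READ | PROT_EXEC | PROT_BTI);
#else
    (void)addr;
    (void)len;
    errno = ENOTSUP; /* PROT_BTI not available on this target */
    return -1;
#endif
}
```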

Tag: Example-3

call_function.S:

#include "aarch64.h"

.section .rodata
.align 3
.Lstring:
    .string "Hello From My Jump!"

.section .text
.global my_jump
.global call_function

my_jump:
    BTI_J
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    // bl puts overwrites x30 (lr), which is why the stp above saved
    // the frame pointer and return address to the stack.
    adrp    x0, .Lstring        // Get the page the string is within
    add x0, x0, :lo12:.Lstring  // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl      puts                // puts prints the string in x0

    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    BTI_C
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp

    // x0 holds the first argument, the function to jump to.
    // Point lr at return_loc so that the target's ret resumes
    // execution there, then branch to x0 without linking.
    adr lr, return_loc
    br x0  //Later has arrived, it's to highlight use of bti j.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16

    // Return from the function
    ret

Then we can re-build the code and run the executable as follows:

make clean
CFLAGS="-mbranch-protection=bti" make
./main
Hello From My Jump!

One thing to note is when to use bti j versus bti c. Generally speaking, C/C++ functions that are called through a function pointer are reached with a blr instruction and need a bti c, whereas assembly needs to be audited to understand how each entry point is reached. It is still useful to audit the C/C++ code, either with objdump -d or by having gcc output assembly. Let's audit the generated assembly and verify how main calls call_function and passes it my_jump.

gcc -mbranch-protection=bti -S -o main.s main.c

Then let's review the generated assembly:

    .arch armv8-a
    .file   "main.c"
    .text
    .align  2
    .global main
    .type   main, %function
main:
.LFB0:
    .cfi_startproc
    hint    34 // bti c
    stp x29, x30, [sp, -32]!
    .cfi_def_cfa_offset 32
    .cfi_offset 29, -32
    .cfi_offset 30, -24
    mov x29, sp
    adrp    x0, call_function
    add x0, x0, :lo12:call_function
    str x0, [sp, 24]
    ldr x1, [sp, 24]
    adrp    x0, my_jump
    add x0, x0, :lo12:my_jump
    blr x1
    mov w0, 0
    ldp x29, x30, [sp], 32
    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 14.2.1 20240801 (Red Hat 14.2.1-1)"
    .section    .note.GNU-stack,"",@progbits
    .section    .note.gnu.property,"a"
    .align  3
    .word   4
    .word   16
    .word   5
    .string "GNU"
    .word   3221225472
    .word   4
    .word   1
    .align  3

Notice that the control flow transfer to call_function is through blr, so call_function needs a bti c. The transfer to my_jump, however, is through the br x0 in call_function and thus needs a bti j. To summarize: an indirect branch with link (blr) is classified as a call, while a plain indirect branch (br) is classified as a jump. It is also important to note that ret does not need a bti landing pad, even though it is conceptually an indirect branch through the link register, which is why return_loc has none. If an indirect branch such as br were used to return to return_loc, it would need a bti j landing pad. However, that approach would add more landing pads, increasing the number of entry points in the code that could be called or jumped to, and thus the gadget space available to a potential attacker.

The instruction bti jc would be valid in both locations, but it is best to limit each landing pad to how the program actually uses it, as this limits an attacker's options. If an entry point serves as both a jump and a call target, then marking it with bti jc is appropriate.
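The rules above can be summarized as a small decision table. The C sketch below is a simplified model of the check made when execution enters a PROT_BTI page, not the full architectural definition: it ignores the special case of br through x16/x17 and the system-register controls that adjust how paciasp/pacibsp are treated. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* How control reaches an instruction. */
enum transfer {
    DIRECT_BRANCH, /* b / bl: target fixed in the encoding */
    INDIRECT_CALL, /* blr */
    INDIRECT_JUMP, /* br through a register other than x16/x17 */
    RETURN         /* ret */
};

/* What the target instruction is. */
enum landing_pad {
    PAD_NONE,      /* any ordinary instruction, e.g. stp */
    PAD_BTI,       /* bti with no operand: accepts no indirect branch */
    PAD_BTI_C,
    PAD_BTI_J,
    PAD_BTI_JC,
    PAD_PACIXSP    /* paciasp or pacibsp */
};

/* Simplified model: does a transfer into a PROT_BTI page land legally? */
bool landing_allowed(enum transfer t, enum landing_pad pad)
{
    switch (t) {
    case DIRECT_BRANCH:
    case RETURN:
        return true; /* not checked by BTI */
    case INDIRECT_CALL:
        return pad == PAD_BTI_C || pad == PAD_BTI_JC || pad == PAD_PACIXSP;
    case INDIRECT_JUMP:
        return pad == PAD_BTI_J || pad == PAD_BTI_JC;
    }
    return false;
}
```

In these terms, Example-2 failed because an INDIRECT_JUMP landed on PAD_NONE, and Example-3 fixed it with PAD_BTI_J.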

BTI, linkers, loaders and GNU notes

It is important to note that the GNU Notes section is mandatory for BTI support, even if the BTI instructions are present. As shown above, when we linked call_function.o, the linker reported that this object file did not declare the required BTI support, because it lacked the GNU Notes property. It does not matter whether the toolchain is linking an executable or a shared object: every input object file must declare BTI support, or the support is dropped from the GNU Notes of the linked binary.

The implications of this behavior are important to understand. When the loader maps a binary into memory, it checks the GNU Notes section for the BTI bit to decide which memory protections to apply. The protection is PROT_BTI, an mmap/mprotect flag that enables BTI enforcement for that memory mapping. If the GNU Notes section does not declare BTI, then BTI protection is not enabled for that binary's code.

Consider a program that uses multiple shared libraries. BTI-aware libraries can coexist with non-BTI libraries, and some protection is still afforded: when control flow is transferred into PROT_BTI memory, landing pads are enforced, and when control is transferred into non-BTI memory, any BTI instructions present execute as NOPs and are not enforced. Because the decision is made per linked binary, a single object file without the note disables BTI for the entire executable or shared library it is linked into; for a statically linked program, that means the whole program.
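The all-or-nothing behavior follows from the property's name, GNU_PROPERTY_AARCH64_FEATURE_1_AND: the linker ANDs the feature bits of all inputs, and an input without the note counts as zero. A minimal C model of that rule (the struct and function are hypothetical, not the linker's actual code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct input_object {
    const char *name;       /* e.g. "call_function.o" */
    bool has_property_note; /* .note.gnu.property present? */
    uint32_t feature_1_and; /* bit 0 = BTI, bit 1 = PAC */
};

/* AND-merge of the feature bits: an input without the note contributes 0,
 * which clears every bit in the linked output. */
uint32_t merge_feature_1_and(const struct input_object *objs, size_t n)
{
    if (n == 0)
        return 0;
    uint32_t merged = UINT32_MAX;
    for (size_t i = 0; i < n; i++)
        merged &= objs[i].has_property_note ? objs[i].feature_1_and : 0u;
    return merged;
}
```

With main.o declaring BTI and PAC (3) and call_function.o lacking the note, the result is 0, which is the condition -z force-bti reported earlier.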

PAC

Enabling PAC follows the same logical steps as BTI. However, the GNU Notes property for PAC is optional, although it is useful for auditing and we recommend adding it. It can be omitted because, unlike BTI, PAC on Linux is currently purely a callee-side ABI with no change to memory permissions: each callee signs the link register on entry and verifies it before returning, within its own function context. So, starting from the most recent assembly sources, let's modify them to support PAC.

Tag: Example-4

aarch64.h:

#ifndef _AARCH_64_H_
#define _AARCH_64_H_
 
/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */
 
#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
  #define BTI_J bti j /* for jumps, i.e. br instructions */
  #define BTI_C bti c /* for calls, i.e. blr instructions */
  #define GNU_PROPERTY_AARCH64_BTI 1 /* bit 0 GNU Notes is for BTI support */
#else
  #define BTI_J
  #define BTI_C
  #define GNU_PROPERTY_AARCH64_BTI 0
#endif
 
#if defined(__ARM_FEATURE_PAC_DEFAULT)
  #if __ARM_FEATURE_PAC_DEFAULT & 1
    #define SIGN_LR paciasp /* sign with the A key */
    #define VERIFY_LR autiasp /* verify with the A key */
  #elif __ARM_FEATURE_PAC_DEFAULT & 2
    #define SIGN_LR pacibsp /* sign with the B key */
    #define VERIFY_LR autibsp /* verify with the B key */
  #endif
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 /* bit 1 GNU Notes is for PAC support */
#else
  #define SIGN_LR
  #define VERIFY_LR
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0
#endif
 
/* Add BTI and/or PAC support to the GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0
    .pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
    .balign 8; /* align to an 8-byte boundary */
    .long 4; /* size of "GNU\0" */
    .long 0x10; /* size of descriptor */
    .long 0x5; /* NT_GNU_PROPERTY_TYPE_0 */
    .asciz "GNU";
    .long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
    .long 4; /* Four bytes of data */
    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); /* BTI or PAC is enabled */
    .long 0; /* padding for 8 byte alignment */
    .popsection; /* end the section */
#endif
 
#endif

call_function.S:

#include "aarch64.h"
 
.section .rodata
.align 3
.Lstring:
    .string "Hello From My Jump!"
 
.section .text
.global my_jump
.global call_function
 
my_jump:
    BTI_J
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    // bl puts overwrites x30 (lr), which is why the stp above saved
    // the frame pointer and return address to the stack.
    adrp    x0, .Lstring        // Get the page the string is within
    add x0, x0, :lo12:.Lstring  // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl      puts                // puts prints the string in x0
 
    ldp x29, x30, [sp], #16
    ret
 
// Function prototype
// void call_function(void (*func)())
call_function:
    BTI_C
    SIGN_LR
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp
 
    // x0 holds the first argument, the function to jump to.
    // Point lr at return_loc so that the target's ret resumes
    // execution there, then branch to x0 without linking.
    adr lr, return_loc
    br x0  //Later has arrived, it's to highlight use of bti j.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16
 
    // Return from the function
    VERIFY_LR
    ret

Then compile the code with -mbranch-protection=pac-ret, which enables return-address signing (PAC) only.

make clean
CFLAGS="-mbranch-protection=pac-ret" make
./main
Hello From My Jump!

Since the PAC support bit was added to the GNU Notes section, readelf should indicate PAC support:

readelf -n main

Displaying notes found in: .note.gnu.property
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: PAC



PAC and BTI Together

PAC and BTI function independently of each other, but like chocolate and peanut butter, they go better together. With -mbranch-protection=standard we can enable both. At the time of writing, the standard argument to the -mbranch-protection= option is equivalent to pac-ret+bti.

make clean
CFLAGS="-mbranch-protection=standard" make

And readelf will indicate both PAC and BTI are supported:

readelf -n main
 
Displaying notes found in: .note.gnu.property
  Owner                Data size    Description
  GNU                  0x00000010   NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI, PAC

And the program will execute as expected:

./main
Hello From My Jump!

Optimizing

When both PAC and BTI are enabled, function prologues, the common boilerplate at the beginning of a function, start with two extra instructions (bti c followed by paciasp), which is less than ideal. However, certain PAC instructions also act as BTI landing pads; specifically, paciasp and its B-key variant pacibsp can replace a bti c instruction. So, let's modify the aarch64.h and call_function.S files to take advantage of this:

Tag: Example-5

aarch64.h:

#ifndef _AARCH_64_H_
#define _AARCH_64_H_

/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
  #define BTI_J bti j /* for jumps, i.e. br instructions */
  #define BTI_C bti c /* for calls, i.e. blr instructions */
  #define GNU_PROPERTY_AARCH64_BTI 1 /* bit 0 GNU Notes is for BTI support */
#else
  #define BTI_J
  #define BTI_C
  #define GNU_PROPERTY_AARCH64_BTI 0
#endif

#if defined(__ARM_FEATURE_PAC_DEFAULT)
  #if __ARM_FEATURE_PAC_DEFAULT & 1
    #define SIGN_LR paciasp /* sign with the A key */
    #define VERIFY_LR autiasp /* verify with the A key */
  #elif __ARM_FEATURE_PAC_DEFAULT & 2
    #define SIGN_LR pacibsp /* sign with the B key */
    #define VERIFY_LR autibsp /* verify with the B key */
  #endif
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 /* bit 1 GNU Notes is for PAC support */
#else
  #define SIGN_LR BTI_C
  #define VERIFY_LR
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0
#endif

/* Add BTI and/or PAC support to the GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0
    .pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
    .balign 8; /* align to an 8-byte boundary */
    .long 4; /* size of "GNU\0" */
    .long 0x10; /* size of descriptor */
    .long 0x5; /* NT_GNU_PROPERTY_TYPE_0 */
    .asciz "GNU";
    .long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
    .long 4; /* Four bytes of data */
    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); /* BTI or PAC is enabled */
    .long 0; /* padding for 8 byte alignment */
    .popsection; /* end the section */
#endif

#endif
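To see what each build configuration produces for call_function's first instruction, the header's selection logic can be restated as an ordinary C function. This is only a model of the preprocessor conditions above (sign_lr_expansion is a hypothetical name), with the two arguments standing in for the values of __ARM_FEATURE_BTI_DEFAULT and __ARM_FEATURE_PAC_DEFAULT (0 when undefined).

```c
#include <stddef.h>
#include <string.h> /* strcmp() for callers comparing the result */

/* Models the Example-5 aarch64.h choice of SIGN_LR, the first instruction
 * of call_function. Returns NULL where the header would leave SIGN_LR
 * undefined, which would make the assembly fail to build. */
const char *sign_lr_expansion(int bti_default, int pac_default)
{
    if (pac_default != 0) {
        if (pac_default & 1)
            return "paciasp"; /* A key; also acts as a bti c landing pad */
        if (pac_default & 2)
            return "pacibsp"; /* B key; also acts as a bti c landing pad */
        return NULL;          /* neither key bit set */
    }
    return bti_default == 1 ? "bti c" : ""; /* SIGN_LR falls back to BTI_C */
}
```

Every configuration that enables BTI yields a valid landing pad, which is the point of falling back to BTI_C when PAC is off.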

call_function.S:

#include "aarch64.h"

.section .rodata
.align 3
.Lstring:
    .string "Hello From My Jump!"

.section .text
.global my_jump
.global call_function

my_jump:
    BTI_J
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    // bl puts overwrites x30 (lr), which is why the stp above saved
    // the frame pointer and return address to the stack.
    adrp    x0, .Lstring        // Get the page the string is within
    add x0, x0, :lo12:.Lstring  // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl      puts                // puts prints the string in x0

    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    SIGN_LR
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp

    // x0 holds the first argument, the function to jump to.
    // Point lr at return_loc so that the target's ret resumes
    // execution there, then branch to x0 without linking.
    adr lr, return_loc
    br x0  //Later has arrived, it's to highlight use of bti j.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16

    // Return from the function
    VERIFY_LR
    ret

Then build and run the example:

make clean
CFLAGS="-mbranch-protection=standard" make
./main
Hello From My Jump!

Examining the prologue of call_function shows a single paciasp instruction acting as the BTI landing pad:

objdump -d main
<snip/>
0000000000410240 <call_function>:
  410240:       d503233f        paciasp
  410244:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
<snip/>

Backwards Compatibility

Throughout this tutorial, we have used the PAC and BTI instruction mnemonics directly. This is a problem for older toolchains that do not recognize those instructions. Fortunately, Arm's engineers foresaw this problem and placed these instructions in the hint space of the architecture. Instructions in the hint space execute as NOPs on processors that do not implement them and work as intended on processors that do. Existing toolchains already understand the generic hint instruction, so older toolchains will happily assemble these new uses of it. Note that each PAC or BTI mnemonic has exactly the same encoding as the corresponding hint instruction, so the choice only matters to the toolchain; the hardware sees no difference. Armed with this knowledge, let's modify the header file to use hint instructions so that older toolchains can assemble our code.
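The equivalence is easy to check: every hint instruction is encoded as 0xd503201f with its 7-bit immediate in bits [11:5], and hint 0 is nop. The C helper below (a hypothetical name) computes that encoding; the test values match the objdump output shown earlier (paciasp is d503233f) and the hint numbers of the other PAC and BTI instructions, including autibsp, which is hint 31.

```c
#include <stdint.h>

/* Encoding of "hint #imm": the NOP encoding with the 7-bit immediate
 * (CRm:op2) placed in bits [11:5]. */
uint32_t hint_encoding(unsigned imm)
{
    return UINT32_C(0xd503201f) | ((uint32_t)(imm & 0x7fu) << 5);
}
```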

Tag: Example-6

aarch64.h:

#ifndef _AARCH_64_H_
#define _AARCH_64_H_

/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
  #define BTI_J hint 36 /* bti j: for jumps, i.e. br instructions */
  #define BTI_C hint 34 /* bti c: for calls, i.e. blr instructions */
  #define GNU_PROPERTY_AARCH64_BTI 1 /* bit 0 GNU Notes is for BTI support */
#else
  #define BTI_J
  #define BTI_C
  #define GNU_PROPERTY_AARCH64_BTI 0
#endif

#if defined(__ARM_FEATURE_PAC_DEFAULT)
  #if __ARM_FEATURE_PAC_DEFAULT & 1
    #define SIGN_LR hint 25 /* paciasp: sign with the A key */
    #define VERIFY_LR hint 29 /* autiasp: verify with the A key */
  #elif __ARM_FEATURE_PAC_DEFAULT & 2
    #define SIGN_LR hint 27 /* pacibsp: sign with the B key */
    #define VERIFY_LR hint 31 /* autibsp: verify with the B key */
  #endif
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 /* bit 1 GNU Notes is for PAC support */
#else
  #define SIGN_LR BTI_C
  #define VERIFY_LR
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0
#endif

/* Add BTI and/or PAC support to the GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0
    .pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
    .balign 8; /* align to an 8-byte boundary */
    .long 4; /* size of "GNU\0" */
    .long 0x10; /* size of descriptor */
    .long 0x5; /* NT_GNU_PROPERTY_TYPE_0 */
    .asciz "GNU";
    .long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
    .long 4; /* Four bytes of data */
    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); /* BTI or PAC is enabled */
    .long 0; /* padding for 8 byte alignment */
    .popsection; /* end the section */
#endif

#endif

As always, clean, rebuild, and run the example:

make clean
CFLAGS="-mbranch-protection=standard" make
./main
Hello From My Jump!



Exception Handling: DWARF and CFI

If you want C++ exceptions to propagate through assembly routines, you must add CFI directives to them. CFI, or Call Frame Information, is a set of assembler directives that generate the DWARF data needed to unwind call frames and the stack when a C++ exception is thrown. DWARF itself defines a Turing-complete, stack-based virtual machine, and the CFI directives can be thought of as programming that virtual machine. At run time, the unwinder executes this DWARF code to recover each caller's registers and return address. Let's modify our program to throw an exception and ensure it gets handled.

Tag: Example-7

Makefile:

ASFLAGS ?= $(CXXFLAGS)

OBJS := main.o \
	call_function.o

main: $(OBJS)
	$(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $^

.PHONY: clean
clean:
	@printf "Cleaning...\n" && rm -rf $(OBJS) main

call_function.S:

#include "aarch64.h"

.section .text
.global call_function

// Function prototype
// void call_function(void (*func)())
call_function:
    .cfi_startproc
    SIGN_LR
    CFI_WINDOW_SAVE
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    .cfi_def_cfa_offset 16
    .cfi_offset 29, -16
    .cfi_offset 30, -8
    mov x29, sp

    // x0 holds the first argument, the function to call.
    // blr branches to x0 and stores the return address
    // (return_loc) in x30 (lr).
    blr x0
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16

    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0

    // Return from the function
    VERIFY_LR
    ret
    .cfi_endproc

main.cpp:

#include <iostream>

// Declaration of the assembly routines
extern "C" {
void call_function(void (*func)());
};
static void my_exception() {
    std::cout << "Throwing exception..." << std::endl;
    throw 42;
}

int main() {
    try {
        // Call the assembly routine **indirectly** using a function pointer
        // and pass it the function to call.
        void (*fn)(void (*func)()) = call_function;
        fn(my_exception);
    } catch (int e) {
        std::cout << "Caught exception: " << e << std::endl;
    }
    return 0;
}

Now we need to compile and run the C++ example:

make clean
CXXFLAGS="-mbranch-protection=standard" make
./main
Throwing exception...
Caught exception: 42

The major differences from the previous examples are as follows. main.c is replaced by main.cpp so that we can use C++ exceptions; main.c is no longer needed and should be removed, since make's built-in rules would otherwise build main.o from main.c. call_function now calls the C++ routine that throws the exception using blr instead of br, so my_jump is no longer needed. Finally, the code was augmented with the required CFI directives. Note that clang and gcc emit CFI directives in the assembly they generate from C/C++ code with the -S option, which makes a useful reference. We can now examine how an exception propagates through the assembly layer so the various parts of the runtime can make use of it.
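The call_function.S listing uses a CFI_WINDOW_SAVE macro that the aarch64.h listings shown earlier do not define, so the header needs an addition along the following lines. This is a hedged sketch, not necessarily what the Example-7 tag contains: on AArch64, .cfi_negate_ra_state tells the unwinder that the return address in x30 has just been signed (older assemblers accept the equivalent .cfi_window_save, which emits the same DWARF opcode on AArch64). It is only emitted when PAC signing is enabled, matching SIGN_LR.

```c
/* Hypothetical addition to aarch64.h, placed after the PAC block. */
#if defined(__ARM_FEATURE_PAC_DEFAULT)
  /* Toggle the return-address sign state: x30 is now signed. */
  #define CFI_WINDOW_SAVE .cfi_negate_ra_state
#else
  /* No signing, so there is nothing to tell the unwinder. */
  #define CFI_WINDOW_SAVE
#endif
```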

An important part of using CFI directives is understanding the meaning of the “CFA”. The CFA, or Canonical Frame Address, is what the DWARF system, and ultimately the unwinder, uses to unwind the call stack. Debuggers also make use of this additional DWARF data. In practice, each function gets its own FDE, or Frame Description Entry, and each FDE is related to a CIE, or Common Information Entry, which, as the name implies, holds information common to a set of FDEs.

By default, the CIE states that the sp is the CFA, so any time the sp is modified we need to let DWARF know through CFI directives. That is what .cfi_def_cfa_offset does: after the stp instruction moves sp down by 16 bytes, it tells DWARF that the CFA is the current sp plus an offset of 16 bytes. The next thing DWARF needs to know is where to find the saved lr and fp relative to the CFA. This is what .cfi_offset does: it informs DWARF that the value of the fp, or x29 (they are the same register), can be found at the CFA with an offset of -16 bytes. The same is done for x30, the lr, at an offset of -8 bytes. In the epilogue, .cfi_restore resets the rule for a register to the state it had when .cfi_startproc was issued, .cfi_def_cfa_offset 0 indicates that the CFA is once again equal to sp, and finally .cfi_endproc ends the FDE.

All of this instruments the DWARF system, which in turn is used by debuggers, runtimes and the unwinder. All of these consumers need to know that the address in the pushed lr is signed, and they may need to verify the pointer and demangle the address before using it. The unwinder uses the autia1716 or autib1716 instruction to authenticate the return address. Both are within the hint space, as hint 12 and hint 14 respectively. These instructions authenticate the pointer in x17 using x16 as the modifier, and the unwinder supplies the CFA as the modifier, because paciasp or pacibsp signed the lr at function entry using sp, whose value at that point is the CFA. The pointer must be demangled because signing modifies it to include the PAC signature; removing the signature restores it to a valid pointer.

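Depending on the implementation, an authentication failure in autia1716 or autib1716 either produces an invalid pointer, which faults when it is later used, or, on cores that implement FEAT_FPAC, raises an exception immediately, which Linux reports to the process as SIGILL (an illegal instruction).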

Our header files and discussion thus far have indicated that PAC supports two keys for signing code pointers: the A and B keys. The key is selected at build time through a compiler option, for example by specifying -mbranch-protection=pac-ret+b-key. Let’s modify our latest C++ example, namely call_function.S and aarch64.h, to support the B key, including the required DWARF information:

Tag: Example-8

aarch64.h:

#ifndef _AARCH_64_H_
#define _AARCH_64_H_

/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
  #define BTI_J hint 36 /* bti j: for jumps, i.e. br instructions */
  #define BTI_C hint 34 /* bti c: for calls, i.e. blr instructions */
  #define GNU_PROPERTY_AARCH64_BTI 1 /* bit 0 GNU Notes is for BTI support */
#else
  #define BTI_J
  #define BTI_C
  #define GNU_PROPERTY_AARCH64_BTI 0
#endif

#if defined(__ARM_FEATURE_PAC_DEFAULT)
  #if __ARM_FEATURE_PAC_DEFAULT & 1
    #define SIGN_LR hint 25 /* paciasp: sign with the A key */
    #define VERIFY_LR hint 29 /* autiasp: verify with the A key */
    #define CFI_B_KEY_FRAME /* empty: the A key needs no B-key frame */
  #elif __ARM_FEATURE_PAC_DEFAULT & 2
    #define SIGN_LR hint 27 /* pacibsp: sign with the B key */
    #define VERIFY_LR hint 31 /* autibsp: verify with the B key */
    #define CFI_B_KEY_FRAME .cfi_b_key_frame
  #endif
  #define CFI_WINDOW_SAVE .cfi_window_save
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 /* bit 1 GNU Notes is for PAC support */
#else
  #define SIGN_LR BTI_C
  #define VERIFY_LR
  #define CFI_WINDOW_SAVE
  #define CFI_B_KEY_FRAME
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0
#endif

/* Add the BTI support to GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0
    .pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
    .balign 8; /* align it on an 8-byte boundary */
    .long 4; /* size of "GNU\0" */
    .long 0x10; /* size of descriptor */
    .long 0x5; /* NT_GNU_PROPERTY_TYPE_0 */
    .asciz "GNU";
    .long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
    .long 4; /* Four bytes of data */
    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); /* BTI or PAC is enabled */
    .long 0; /* padding for 8 byte alignment */
    .popsection; /* end the section */
#endif

#endif

call_function.S:

#include "aarch64.h"

.section .text
.global call_function

// Function prototype
// void call_function(void (*func)())
call_function:
    .cfi_startproc
    SIGN_LR
    CFI_WINDOW_SAVE
    CFI_B_KEY_FRAME
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    .cfi_def_cfa_offset 16
    .cfi_offset 29, -16
    .cfi_offset 30, -8
    mov x29, sp

    // x0 is the caller's first argument, so jump
    // to the "function" pointed by x0 and save
    // the return address to the stack
    blr x0
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16

    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0

    // Return from the function
    VERIFY_LR
    ret
    .cfi_endproc

Compile and run the program:

make clean
CXXFLAGS="-mbranch-protection=pac-ret+b-key+bti" make
./main
Throwing exception...
Caught exception: 42

Debugging DWARF

As previously mentioned, DWARF call frame information is bytecode for a simple virtual machine. It is embedded in sections of the generated ELF files, such as .eh_frame, for consumers like the unwinder and debuggers. These DWARF instructions can be dumped in disassembled form, which is very helpful for debugging. Note that we add -g to produce debug info for the upcoming addr2line example.

make clean
CXXFLAGS="-mbranch-protection=pac-ret+b-key+bti -g" make
readelf --debug-dump=frames call_function.o
Contents of the .eh_frame section:


00000000 0000000000000010 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 4
  Data alignment factor: -8
  Return address column: 30
  Augmentation data:     1b
  DW_CFA_def_cfa: r31 (sp) ofs 0

00000014 0000000000000020 00000018 FDE cie=00000000 pc=0000000000000000..0000000000000014
  DW_CFA_advance_loc: 4 to 0000000000000004
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r29 (x29) at cfa-16
  DW_CFA_offset: r30 (x30) at cfa-8
  DW_CFA_advance_loc: 12 to 0000000000000010
  DW_CFA_restore: r30 (x30)
  DW_CFA_restore: r29 (x29)
  DW_CFA_def_cfa_offset: 0
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

The first noteworthy element to look for is a “B” in the CIE’s Augmentation string. Because it is part of the CIE, it is inherited by every FDE that uses that CIE. A “B” indicates that the PAC B signing key is used; if “B” is not present, the A key is in use. Unwinders use this, for example, to choose the right instruction, either autib1716 or autia1716, when authenticating PAC-signed return addresses. The other important item is DW_CFA_AARCH64_negate_ra_state, which is the output of the .cfi_window_save CFI directive. This DWARF opcode toggles the signing state of the return address, indicating that from that point in the function the lr is signed and anything interpreting it needs to demangle it. If neither appears in the dump, the object file was not assembled with PAC enabled, that is, __ARM_FEATURE_PAC_DEFAULT was not defined when aarch64.h was processed.

Each FDE refers to its corresponding CIE, shown by cie=, and there can be multiple CIEs. Each FDE also has an associated pc range for which it is valid.

It is possible to associate an FDE with a function using addr2line. Note that it needs -g in the compilation flags, or you will see ?? in the addr2line output:

addr2line -f -e call_function.o 0
call_function
/home/bill/workspace/blog-example/call_function.S:10

Jumping to Functions

When an indirect transfer of control flow occurs, BTI-enabled hardware, together with a software stack built with BTI enabled, ensures that the transfer lands on a landing pad. Put another way, direct control flow changes are not checked, because their target address is encoded in the instruction itself rather than supplied at run time in a register whose value an attacker may control. Consequently, indirect branches such as br and blr, and their related instructions, are checked to ensure that they land on proper landing pads. Branches with link, like blr, are typically used to call functions, so the control flow change needs to land on a bti c or bti jc instruction. Branches that do not modify the link register, like br, are used for “jumps” and must transfer control to a bti j or bti jc landing pad. However, in certain scenarios where jump oriented programming models are used, a branch without link may be used to transfer control to a function that is normally called. If that function was compiled from C or C++, its landing pad will be a bti c instruction. BTI enforcement then fails, because branches without link expect the landing pad to be a bti j instruction, and an exception is raised (reported to the process as SIGILL on Linux). To work around this, the architecture allows a br whose target address is in register x16 or x17 to land on either a bti c or a bti j landing pad. This is discussed further in Jump Oriented Programming.




Conclusion

This multi-part tutorial shows how to enable PAC and BTI in assembly functions, how PAC instructions can also serve as BTI landing pads, and how to handle the PAC A and B keys in source. It also highlights how exception handling needs to be augmented through the use of CFI directives, and how to dump the DWARF data generated from those directives.
