{"id":117,"date":"2025-03-18T13:20:18","date_gmt":"2025-03-18T13:20:18","guid":{"rendered":"https:\/\/haco.club\/?p=117"},"modified":"2025-03-21T07:55:01","modified_gmt":"2025-03-21T07:55:01","slug":"enabling-pac-and-bti-on-aarch64-for-linux","status":"publish","type":"post","link":"https:\/\/haco.club\/?p=117","title":{"rendered":"Enabling PAC and BTI on AArch64 for Linux"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><a href=\"https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/enabling-pac-and-bti-on-aarch64\">https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/enabling-pac-and-bti-on-aarch64<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/p2-enabling-pac-and-bti-on-aarch64\">https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/p2-enabling-pac-and-bti-on-aarch64<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/p3-enabling-pac-and-bti-on-aarch64\">https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/p3-enabling-pac-and-bti-on-aarch64<\/a><\/p>\n<\/blockquote>\n\n\n\n<pre class=\"wp-block-code\"><code>Source code for the examples can be found at https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example and the tag will be referenced with the \"Tag\" keyword before source examples.<\/code><\/pre>\n\n\n\n<p>Certain versions of Arm 64-bit processors have features that can help provide control flow integrity and reduce gadget space, making software more robust in the face of attack. Pointer Authentication Codes (PAC) work by signing and verifying indirect branch targets, while Branch Target Identification (BTI) works by marking all valid branch locations. 
These technologies harden the control flow by ensuring that control flow values are cryptographically verified and that control flow can only be transferred to valid locations.&nbsp;Details on how this works can be found&nbsp;in another&nbsp;<a href=\"https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/armv8-1-m-pointer-authentication-and-branch-target-identification-extension.\">Arm blog post on BTI and PAC<\/a>.<\/p>\n\n\n\n<p>This post spares the underlying implementation details and focuses on the A-profile processors and the Linux ecosystem of C\/C++ code,&nbsp;<code>ELF<\/code>,&nbsp;exception handling, and toolchains. The goal is to provide a pragmatic guide for enablement throughout that ecosystem. It is aimed specifically at C and C++ projects that may optionally contain intermixed assembly, as assembly code modification is required to enable support. Other languages may or may not support these technologies at this time and will not be discussed. All these examples were executed on a Linux machine with support for PAC and BTI. To test if your machine has support for&nbsp;<code>pac<\/code>&nbsp;and&nbsp;<code>bti<\/code>,&nbsp;you can run the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cat \/proc\/cpuinfo | grep -E -o \"bti|pac\" | sort | uniq\nbti\npac<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1i9k8hdbk2\">Enabling C\/C++ Code<\/h2>\n\n\n\n<p>Contemporary versions of both the&nbsp;<code>gcc<\/code>&nbsp;and&nbsp;<code>clang<\/code>&nbsp;compiler suites, runtimes and assorted&nbsp;<code>binutils<\/code>&nbsp;support PAC and BTI. Enabling a C or C++ project is as simple as passing the compiler option&nbsp;<code>-mbranch-protection=standard<\/code>. This will enable the standard set of PAC and BTI features. 
To help verify that the project is built with BTI, you can optionally specify the linker options&nbsp;<code>-zforce-bti,--fatal-warnings<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>You can pass linker flags through gcc or clang by specifying -Wl, for example: -Wl,-zforce-bti,--fatal-warnings.<\/code><\/pre>\n\n\n\n<p>The linker flags will force the linker to generate an error and report which object files do not support BTI.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Most build systems will respect the environment variables CFLAGS, CXXFLAGS and LDFLAGS. For example: CFLAGS='-mbranch-protection=standard' make.<\/code><\/pre>\n\n\n\n<p>Additionally, you can check the produced ELF binary for support using&nbsp;<code>readelf -n &lt;binary&gt;<\/code>. We will create an empty C file, compile it to an object file, and check the resulting object file for a set of special flags. For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>touch empty.c\ngcc -mbranch-protection=standard -c -o empty.o empty.c\nreadelf -n empty.o\n\nDisplaying notes found in: .note.gnu.property\nOwner Data size Description\nGNU 0x00000010 NT_GNU_PROPERTY_TYPE_0\nProperties: AArch64 feature: BTI, PAC<\/code><\/pre>\n\n\n\n<p>The &#8220;Properties&#8221; section will indicate PAC and\/or BTI support.&nbsp;The main issue with supporting PAC and BTI is that projects often utilize standalone assembly, and that assembly must be instrumented to provide this support.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Enabling Assembly<\/h2>\n\n\n\n<p>The simplest way to enable assembly is to rewrite it using a combination of C\/C++, intrinsics and inline assembly if needed. Modern compilers are very capable of generating optimized assembly routines that are often better than hand-coded assembly. 
However, certain use cases may dictate otherwise, and thus existing or new assembly will need modification for&nbsp;3 specific cases:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>BTI: instrumenting call and jump points in assembly<\/li>\n\n\n\n<li>PAC: instrumenting routines to sign and verify the link register<\/li>\n\n\n\n<li>ELF: instrumenting the GNU Notes section to indicate PAC and BTI support<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"mcetoc_1i9k8ju4j4\">Example Program<\/h3>\n\n\n\n<p>We will use the following example program containing both C and assembly sources. The C code calls an assembly routine called&nbsp;<code>call_function<\/code>,&nbsp;which takes a function pointer as the first argument and&nbsp;<strong>jumps<\/strong>&nbsp;to it. Create the files shown below. The source code examples can also be found&nbsp;at&nbsp;<a id=\"\" href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\">https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example<\/a>. 
The source repository is annotated with tags, and the tag name will be associated with the example via the &#8220;Tag&#8221; keyword, along with a link, for those using the source code repository.<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-1?ref_type=tags\">Example-1<\/a><\/strong><\/p>\n\n\n\n<p>main.c:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Declaration of the assembly routines\nextern void call_function(void (*func)());\nextern void my_jump();\n\nint main() {\n    \/\/ Call the assembly routine **indirectly** using a function pointer\n    \/\/ and pass the jump location as well.\n    void (*fn)(void (*func)()) = call_function;\n    fn(my_jump);\n    return 0;\n}<\/code><\/pre>\n\n\n\n<p>call_function.S:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>.section .rodata\n.align 3\n.Lstring:\n    .string \"Hello From My Jump!\"\n\n.section .text\n.global my_jump\n.global call_function\n\nmy_jump:\n    stp x29, x30, &#91;sp, #-16]!\n    \/\/ Print \"Hello From My Jump!\" using puts.\n    \/\/ puts can modify registers, so the frame pointer (x29) and the\n    \/\/ return address (x30) were pushed to the stack above\n    adrp    x0, .Lstring        \/\/ Get the page the string is within\n    add x0, x0, :lo12:.Lstring  \/\/ Get the page offset (handles relocations ADD_ABS_LO12_NC)\n    bl      puts                \/\/ puts prints the string in x0\n\n    ldp x29, x30, &#91;sp], #16\n    ret\n\n\/\/ Function prototype\n\/\/ void call_function(void (*func)())\ncall_function:\n    \/\/ Save link register and frame pointer, allocating enough space for\n    \/\/ saving the return location.\n    stp x29, x30, &#91;sp, #-16]!\n    mov x29, sp\n     \n    \/\/ x0 is the caller's first argument, so jump\n    \/\/ to the \"function\" pointed to by x0 after placing\n    \/\/ the return address in the link register\n    adr lr, return_loc\n    br x0  \/\/intentionally avoiding a branch and link, you'll see why later.\nreturn_loc:\n    \/\/ Restore link 
register and frame pointer\n    ldp x29, x30, &#91;sp], #16\n\n    \/\/ Return from the function\n    ret<\/code><\/pre>\n\n\n\n<p>Makefile:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ASFLAGS ?= $(CFLAGS)\n\nOBJS := main.o \\\n\tcall_function.o\n\nmain: $(OBJS)\n\t$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^\n\n.PHONY: clean\nclean:\n\t@printf \"Cleaning...\\n\" &amp;&amp; rm -rf $(OBJS) main<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>Only the modified files will be shown throughout the example code from this point forward. If a file is not shown, it is expected to be unmodified from its previous state.<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"mcetoc_1i9k8nlql5\">Compiling<\/h3>\n\n\n\n<p>To compile and run the example code, execute:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make\n.\/main\nHello From My Jump!<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1i9k8oh0n6\">BTI<\/h2>\n\n\n\n<p>Step one when enabling BTI is to enable it through the compiler. In this case, we will use&nbsp;<code>-mbranch-protection=bti<\/code>&nbsp;so we only get the instructions for BTI and not PAC. We will also add the linker flags to force an error if BTI is not enabled within an&nbsp;<code>ELF<\/code>&nbsp;object file. 
We will use the&nbsp;<code>Makefile<\/code>&nbsp;to compile all the examples with differing sets of&nbsp;<code>CFLAGS<\/code>&nbsp;and&nbsp;<code>LDFLAGS<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Ensure that you make clean between examples<\/code><\/pre>\n\n\n\n<p>Perform the following to compile the code with BTI support:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CFLAGS=\"-mbranch-protection=bti\" LDFLAGS='-Wl,-zforce-bti,--fatal-warnings' make\ncc -mbranch-protection=bti   -c -o main.o main.c\ncc -mbranch-protection=bti   -c -o call_function.o call_function.S\ncc -mbranch-protection=bti -Wl,-zforce-bti,--fatal-warnings -o main main.o call_function.o\n\/usr\/bin\/ld: call_function.o: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section.\ncollect2: error: ld returned 1 exit status\nmake: *** &#91;Makefile:7: main] Error 1<\/code><\/pre>\n\n\n\n<p>As designed, the linker errored and reported that BTI is not enabled in the assembly object file&nbsp;<code>call_function.o<\/code>. The next step is enabling BTI, and a convenient way of doing so is with the C pre-processor, which provides the conditional compilation and include features of the C\/C++ languages. It can be leveraged so that BTI support is included conditionally. BTI can also be included unconditionally; in that case, the linker will discard the GNU note section flags when the object is combined with other object files that do not declare BTI in the GNU Notes section. Additionally, the BTI instructions will then execute as NOPs, but you will still pay a cycle count penalty for each NOP. 
With that stated, let&#8217;s create a header file and include it within our assembly so that the BTI instructions are enabled only when compiled, assembled and linked explicitly with support.&nbsp;Documentation on the feature test macros to use can be found at&nbsp;<a href=\"https:\/\/developer.arm.com\/documentation\/101028\/0012\/5--Feature-test-macros.\" rel=\"noreferrer noopener\" target=\"_blank\">Arm&#8217;s Developer documentation on Feature Test Macros<\/a>.<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-2?ref_type=tags\">Example-2<\/a><\/strong><\/p>\n\n\n\n<p>aarch64.h:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#ifndef _AARCH_64_H_\n#define _AARCH_64_H_\n\n\/*\n * References:\n *  - https:\/\/developer.arm.com\/documentation\/101028\/0012\/5--Feature-test-macros\n *  - https:\/\/github.com\/ARM-software\/abi-aa\/blob\/main\/aaelf64\/aaelf64.rst\n *\/\n\n#if defined(__ARM_FEATURE_BTI_DEFAULT) &amp;&amp; __ARM_FEATURE_BTI_DEFAULT == 1\n  #define BTI_J bti j \/* for indirect jumps, i.e. br instructions *\/\n  #define BTI_C bti c  \/* for indirect calls, i.e. blr instructions *\/\n  #define GNU_PROPERTY_AARCH64_BTI 1 \/* bit 0 GNU Notes is for BTI support *\/\n#else\n  #define BTI_J\n  #define BTI_C\n  #define GNU_PROPERTY_AARCH64_BTI 0\n#endif\n\n\/* Add the BTI support to the GNU Notes section *\/\n#if GNU_PROPERTY_AARCH64_BTI != 0\n    .pushsection .note.gnu.property, \"a\"; \/* Start a new allocatable section *\/\n    .balign 8; \/* align it on an 8-byte boundary *\/\n    .long 4; \/* size of \"GNU\\0\" *\/\n    .long 0x10; \/* size of descriptor *\/\n    .long 0x5; \/* NT_GNU_PROPERTY_TYPE_0 *\/\n    .asciz \"GNU\";\n    .long 0xc0000000; \/* GNU_PROPERTY_AARCH64_FEATURE_1_AND *\/\n    .long 4; \/* Four bytes of data *\/\n    .long GNU_PROPERTY_AARCH64_BTI; \/* BTI is enabled *\/\n    .long 0; \/* padding for 8 byte alignment *\/\n    .popsection; \/* end the section 
*\/\n#endif\n\n#endif<\/code><\/pre>\n\n\n\n<p>Now that the header file is in place, let&#8217;s augment the assembly file.<\/p>\n\n\n\n<p>call_function.S:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include \"aarch64.h\"\n\n.section .rodata\n.align 3\n.Lstring:\n    .string \"Hello From My Jump!\"\n\n.section .text\n.global my_jump\n.global call_function\n\nmy_jump:\n    stp x29, x30, &#91;sp, #-16]!\n    \/\/ Print \"Hello From My Jump!\" using puts.\n    \/\/ puts can modify registers, so the frame pointer (x29) and the\n    \/\/ return address (x30) were pushed to the stack above\n    adrp    x0, .Lstring        \/\/ Get the page the string is within\n    add x0, x0, :lo12:.Lstring  \/\/ Get the page offset (handles relocations ADD_ABS_LO12_NC)\n    bl      puts                \/\/ puts prints the string in x0\n\n    ldp x29, x30, &#91;sp], #16\n    ret\n\n\/\/ Function prototype\n\/\/ void call_function(void (*func)())\ncall_function:\n    BTI_C\n    \/\/ Save link register and frame pointer, allocating enough space for\n    \/\/ saving the return location.\n    stp x29, x30, &#91;sp, #-16]!\n    mov x29, sp\n\n    \/\/ x0 is the caller's first argument, so jump\n    \/\/ to the \"function\" pointed to by x0 after placing\n    \/\/ the return address in the link register\n    adr lr, return_loc\n    br x0  \/\/intentionally avoiding a branch and link, you'll see why later.\nreturn_loc:\n    \/\/ Restore link register and frame pointer\n    ldp x29, x30, &#91;sp], #16\n\n    \/\/ Return from the function\n    ret<\/code><\/pre>\n\n\n\n<p>As a reminder, since&nbsp;<code>main.c<\/code>&nbsp;and&nbsp;<code>Makefile<\/code>&nbsp;require no modifications, they will not be displayed. 
However, we will need to clean and rebuild:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nLDFLAGS='-Wl,-zforce-bti,--fatal-warnings' CFLAGS=\"-mbranch-protection=bti\" make\ncc -mbranch-protection=bti -c -o main.o main.c\ncc -mbranch-protection=bti -c -o call_function.o call_function.S\ncc -mbranch-protection=bti -Wl,-zforce-bti,--fatal-warnings -o main main.o call_function.o<\/code><\/pre>\n\n\n\n<p>Notice that the linker no longer complains about &#8220;missing the BTI in Note section&#8221;, and we can now check that the BTI bit is set in the resulting ELF file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>readelf -n main\n\nDisplaying notes found in: .note.gnu.property\n  Owner                Data size \tDescription\n  GNU                  0x00000010\tNT_GNU_PROPERTY_TYPE_0\n      Properties: AArch64 feature: BTI<\/code><\/pre>\n\n\n\n<p>We can also execute the program:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>.\/main\nIllegal instruction (core dumped)<\/code><\/pre>\n\n\n\n<p>Wait, it did not work. Why?&nbsp;<\/p>\n\n\n\n<p>The example intentionally omitted the&nbsp;<code>bti j<\/code>&nbsp;instruction that serves as the landing pad for the jump. Since the ELF GNU Notes declared that it has support for BTI, the linker or loader mapped the executable pages with&nbsp;<code>PROT_BTI<\/code>&nbsp;and a runtime exception occurred, as designed. 
Now, let&#8217;s add the landing pad to&nbsp;<code>my_jump<\/code>.<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-3?ref_type=tags\">Example-3<\/a><\/strong><\/p>\n\n\n\n<p>call_function.S:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include \"aarch64.h\"\n\n.section .rodata\n.align 3\n.Lstring:\n    .string \"Hello From My Jump!\"\n\n.section .text\n.global my_jump\n.global call_function\n\nmy_jump:\n    BTI_J\n    stp x29, x30, &#91;sp, #-16]!\n    \/\/ Print \"Hello From My Jump!\" using puts.\n    \/\/ puts can modify registers, so the frame pointer (x29) and the\n    \/\/ return address (x30) were pushed to the stack above\n    adrp    x0, .Lstring        \/\/ Get the page the string is within\n    add x0, x0, :lo12:.Lstring  \/\/ Get the page offset (handles relocations ADD_ABS_LO12_NC)\n    bl      puts                \/\/ puts prints the string in x0\n\n    ldp x29, x30, &#91;sp], #16\n    ret\n\n\/\/ Function prototype\n\/\/ void call_function(void (*func)())\ncall_function:\n    BTI_C\n    \/\/ Save link register and frame pointer, allocating enough space for\n    \/\/ saving the return location.\n    stp x29, x30, &#91;sp, #-16]!\n    mov x29, sp\n\n    \/\/ x0 is the caller's first argument, so jump\n    \/\/ to the \"function\" pointed to by x0 after placing\n    \/\/ the return address in the link register\n    adr lr, return_loc\n    br x0  \/\/Later has arrived, it's to highlight use of bti j.\nreturn_loc:\n    \/\/ Restore link register and frame pointer\n    ldp x29, x30, &#91;sp], #16\n\n    \/\/ Return from the function\n    ret<\/code><\/pre>\n\n\n\n<p>Then we can re-build the code and run the executable as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCFLAGS=\"-mbranch-protection=bti\" make\n.\/main\nHello From My Jump!<\/code><\/pre>\n\n\n\n<p>One thing of note is when to use&nbsp;<code>bti j<\/code>&nbsp;vs&nbsp;<code>bti c<\/code>. 
Generally speaking, functions called from C\/C++ are reached through a&nbsp;<code>bl<\/code>&nbsp;or&nbsp;<code>blr<\/code>&nbsp;instruction and would use&nbsp;<code>bti c<\/code>, whereas assembly will need to be audited to understand the context. It is still useful to audit the C\/C++ code with something like&nbsp;<code>objdump -d<\/code>&nbsp;or by having gcc output the assembly. Let&#8217;s audit the generated assembly and verify how it is calling&nbsp;<code>call_function<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gcc -mbranch-protection=bti -S -o main.S main.c<\/code><\/pre>\n\n\n\n<p>Then let&#8217;s review the generated assembly:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    .arch armv8-a\n    .file   \"main.c\"\n    .text\n    .align  2\n    .global main\n    .type   main, %function\nmain:\n.LFB0:\n    .cfi_startproc\n    hint    34 \/\/ bti c\n    stp x29, x30, &#91;sp, -32]!\n    .cfi_def_cfa_offset 32\n    .cfi_offset 29, -32\n    .cfi_offset 30, -24\n    mov x29, sp\n    adrp    x0, call_function\n    add x0, x0, :lo12:call_function\n    str x0, &#91;sp, 24]\n    ldr x1, &#91;sp, 24]\n    adrp    x0, my_jump\n    add x0, x0, :lo12:my_jump\n    blr x1\n    mov w0, 0\n    ldp x29, x30, &#91;sp], 32\n    .cfi_restore 30\n    .cfi_restore 29\n    .cfi_def_cfa_offset 0\n    ret\n    .cfi_endproc\n.LFE0:\n    .size   main, .-main\n    .ident  \"GCC: (GNU) 14.2.1 20240801 (Red Hat 14.2.1-1)\"\n    .section    .note.GNU-stack,\"\",@progbits\n    .section    .note.gnu.property,\"a\"\n    .align  3\n    .word   4\n    .word   16\n    .word   5\n    .string \"GNU\"\n    .word   3221225472\n    .word   4\n    .word   1\n    .align  3<\/code><\/pre>\n\n\n\n<p>Notice the control flow change to&nbsp;<code>call_function<\/code>&nbsp;is through&nbsp;<code>blr<\/code>, so the resulting routine needs a&nbsp;<code>bti c<\/code>. However, the control flow transfer to&nbsp;<code>my_jump<\/code>&nbsp;is through&nbsp;<code>br<\/code>&nbsp;and thus needs a&nbsp;<code>bti j<\/code>. 
To summarize, an indirect branch instruction with a link is classified as a call, whereas a plain indirect branch instruction is considered a jump. It&#8217;s also incredibly important to&nbsp;note that the invocation of&nbsp;<code>ret<\/code>&nbsp;does not need a bti landing pad,&nbsp;even though conceptually it is an indirect branch using the link register. If an indirect branch was used to return to&nbsp;<code>return_loc<\/code>,&nbsp;then it would need a&nbsp;<code>bti j<\/code>&nbsp;landing pad. However, using that approach would also increase the usage of bti landing pads, which would increase the number of entry points in the code that could be called or jumped to, thus increasing the gadget space available to a potential attacker.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>The BTI instruction bti jc would be valid in both locations, but it is best to limit the scope of the target to how the program is using it, as it will limit the attacker's possibilities. If an entry point serves as both a jump and call location, then it would be appropriate to mark it with a bti jc.<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Bill'sdraftpost:EnablingPACandBTIonAArch64onLinux-BTI,linkers,loadersandGNUNotes\">BTI, linkers, loaders and GNU notes<\/h3>\n\n\n\n<p>It is important to state that the GNU notes section is mandatory to get BTI support even if the instructions are present. As showcased above, when we linked in&nbsp;<code>call_function.o<\/code>,&nbsp;the linker reported that this ELF object file does not have the required BTI support indicated. This is because it is missing the GNU notes section. It does not matter if the toolchain is linking an executable or a shared object: every object file must have the support, or the support is stripped from the GNU notes in the linker-produced binary. 
Consequently, it is very important to understand the implications of this behavior. When the loader loads the binary into memory, it checks the GNU Notes section for this support bit to decide what memory protections to apply. The protection itself is&nbsp;<code>PROT_BTI<\/code>,&nbsp;a&nbsp;<code>mprotect<\/code>&nbsp;\/&nbsp;<code>mmap<\/code>&nbsp;flag that can be applied to enable BTI support for that memory mapping in the MMU. If the GNU Notes section is missing the flag indicating BTI protections, then BTI protections will not be enabled for that memory region. Consider a binary that has multiple shared libraries: this allows BTI-aware shared libraries to coexist with non-BTI shared libraries while some protections are still afforded. Namely, when a control flow change is directed into PROT_BTI marked memory, protections are enforced. If control is transferred into non-BTI memory, BTI instructions, if present, are &#8220;NOP&#8217;d&#8221; and thus not enforced. In the case of static linking, one missing object file will disable it for the whole linked binary.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Bill'sdraftpost:EnablingPACandBTIonAArch64onLinux-PAC\">PAC<\/h2>\n\n\n\n<p>Enabling PAC follows the same logical steps as BTI. However, the GNU notes field is optional for PAC; it is nice for auditing purposes, and&nbsp;we recommend adding it. The reason this flag can be omitted is that, unlike BTI,&nbsp;PAC is currently a callee ABI in Linux with no changes to memory permissions. The Linux ABI is that the callee is modified to sign and verify the link register within its own function context. 
So given the most recent assembly sources, let us modify them to support PAC.<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-4?ref_type=tags\">Example-4<\/a><\/strong><\/p>\n\n\n\n<p>aarch64.h:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#ifndef _AARCH_64_H_\n#define _AARCH_64_H_\n \n\/*\n * References:\n *  - https:\/\/developer.arm.com\/documentation\/101028\/0012\/5--Feature-test-macros\n *  - https:\/\/github.com\/ARM-software\/abi-aa\/blob\/main\/aaelf64\/aaelf64.rst\n *\/\n \n#if defined(__ARM_FEATURE_BTI_DEFAULT) &amp;&amp; __ARM_FEATURE_BTI_DEFAULT == 1\n  #define BTI_J bti j \/* for indirect jumps, i.e. br instructions *\/\n  #define BTI_C bti c  \/* for indirect calls, i.e. blr instructions *\/\n  #define GNU_PROPERTY_AARCH64_BTI 1 \/* bit 0 GNU Notes is for BTI support *\/\n#else\n  #define BTI_J\n  #define BTI_C\n  #define GNU_PROPERTY_AARCH64_BTI 0\n#endif\n \n#if defined(__ARM_FEATURE_PAC_DEFAULT)\n  #if __ARM_FEATURE_PAC_DEFAULT &amp; 1\n    #define SIGN_LR paciasp \/* sign with the A key *\/\n    #define VERIFY_LR autiasp \/* verify with the A key *\/\n  #elif __ARM_FEATURE_PAC_DEFAULT &amp; 2\n    #define SIGN_LR pacibsp \/* sign with the B key *\/\n    #define VERIFY_LR autibsp \/* verify with the B key *\/\n  #endif\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 \/* bit 1 GNU Notes is for PAC support *\/\n#else\n  #define SIGN_LR\n  #define VERIFY_LR\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0\n#endif\n \n\/* Add the BTI and PAC support to the GNU Notes section *\/\n#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0\n    .pushsection .note.gnu.property, \"a\"; \/* Start a new allocatable section *\/\n    .balign 8; \/* align it on an 8-byte boundary *\/\n    .long 4; \/* size of \"GNU\\0\" *\/\n    .long 0x10; \/* size of descriptor *\/\n    .long 0x5; \/* NT_GNU_PROPERTY_TYPE_0 *\/\n    .asciz \"GNU\";\n    .long 0xc0000000; \/* GNU_PROPERTY_AARCH64_FEATURE_1_AND *\/\n    .long 4; 
\/* Four bytes of data *\/\n    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); \/* BTI or PAC is enabled *\/\n    .long 0; \/* padding for 8 byte alignment *\/\n    .popsection; \/* end the section *\/\n#endif\n \n#endif<\/code><\/pre>\n\n\n\n<p>call_function.S:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include \"aarch64.h\"\n \n.section .rodata\n.align 3\n.Lstring:\n    .string \"Hello From My Jump!\"\n \n.section .text\n.global my_jump\n.global call_function\n \nmy_jump:\n    BTI_J\n    stp x29, x30, &#91;sp, #-16]!\n    \/\/ Print \"Hello From My Jump!\" using puts.\n    \/\/ puts can modify registers, so the frame pointer (x29) and the\n    \/\/ return address (x30) were pushed to the stack above\n    adrp    x0, .Lstring        \/\/ Get the page the string is within\n    add x0, x0, :lo12:.Lstring  \/\/ Get the page offset (handles relocations ADD_ABS_LO12_NC)\n    bl      puts                \/\/ puts prints the string in x0\n \n    ldp x29, x30, &#91;sp], #16\n    ret\n \n\/\/ Function prototype\n\/\/ void call_function(void (*func)())\ncall_function:\n    BTI_C\n    SIGN_LR\n    \/\/ Save link register and frame pointer, allocating enough space for\n    \/\/ saving the return location.\n    stp x29, x30, &#91;sp, #-16]!\n    mov x29, sp\n \n    \/\/ x0 is the caller's first argument, so jump\n    \/\/ to the \"function\" pointed to by x0 after placing\n    \/\/ the return address in the link register\n    adr lr, return_loc\n    br x0  \/\/Later has arrived, it's to highlight use of bti j.\nreturn_loc:\n    \/\/ Restore link register and frame pointer\n    ldp x29, x30, &#91;sp], #16\n \n    \/\/ Return from the function\n    VERIFY_LR\n    ret<\/code><\/pre>\n\n\n\n<p>Then compile the code with&nbsp;<code>-mbranch-protection=pac-ret<\/code>,&nbsp;which enables standard PAC support only.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCFLAGS=\"-mbranch-protection=pac-ret\" make\n.\/main\nHello From My Jump!<\/code><\/pre>\n\n\n\n<p>Since the PAC support bit was added to the 
GNU Notes section,&nbsp;<code>readelf<\/code>&nbsp;should indicate PAC support.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>readelf -n main\n\nDisplaying notes found in: .note.gnu.property\n  Owner                Data size \tDescription\n  GNU                  0x00000010\tNT_GNU_PROPERTY_TYPE_0\n      Properties: AArch64 feature: PAC<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1ibs5kk581\">PAC and BTI Together<\/h2>\n\n\n\n<p>PAC and BTI will function independently of each other, but like chocolate and peanut butter, they go better together.&nbsp;With&nbsp;<code>-mbranch-protection=standard<\/code>&nbsp;we can enable them both. Currently, the&nbsp;<code>standard<\/code>&nbsp;argument to the&nbsp;<code>-mbranch-protection=<\/code>&nbsp;option is equivalent to&nbsp;<code>pac-ret+bti<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCFLAGS=\"-mbranch-protection=standard\" make<\/code><\/pre>\n\n\n\n<p>And\u00a0<code>readelf<\/code>\u00a0will indicate both PAC and BTI are supported:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>readelf -n main\n \nDisplaying notes found in: .note.gnu.property\n  Owner                Data size    Description\n  GNU                  0x00000010   NT_GNU_PROPERTY_TYPE_0\n      Properties: AArch64 feature: BTI, PAC<\/code><\/pre>\n\n\n\n<p>And the program will execute as expected:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>.\/main\nHello From My Jump!<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"mcetoc_1i9k9j25o2\">Optimizing<\/h3>\n\n\n\n<p>When both PAC and BTI are enabled, function prologs, the common boilerplate at the beginning of a function, will have 2 extra instructions, which is less than ideal. 
However, certain PAC instructions can also act as BTI landing pads. Specifically, in this example,&nbsp;<code>paciasp<\/code>&nbsp;and its B-key variant&nbsp;<code>pacibsp<\/code>&nbsp;can be used to replace a&nbsp;<code>bti c<\/code>&nbsp;instruction. So, let&#8217;s modify the&nbsp;<code>aarch64.h<\/code>&nbsp;and&nbsp;<code>call_function.S<\/code>&nbsp;files to take advantage of this:<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-5?ref_type=tags\">Example-5<\/a><\/strong><\/p>\n\n\n\n<p>aarch64.h:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#ifndef _AARCH_64_H_\n#define _AARCH_64_H_\n\n\/*\n * References:\n *  - https:\/\/developer.arm.com\/documentation\/101028\/0012\/5--Feature-test-macros\n *  - https:\/\/github.com\/ARM-software\/abi-aa\/blob\/main\/aaelf64\/aaelf64.rst\n *\/\n\n#if defined(__ARM_FEATURE_BTI_DEFAULT) &amp;&amp; __ARM_FEATURE_BTI_DEFAULT == 1\n  #define BTI_J bti j \/* for indirect jumps, i.e. br instructions *\/\n  #define BTI_C bti c  \/* for indirect calls, i.e. blr instructions *\/\n  #define GNU_PROPERTY_AARCH64_BTI 1 \/* bit 0 GNU Notes is for BTI support *\/\n#else\n  #define BTI_J\n  #define BTI_C\n  #define GNU_PROPERTY_AARCH64_BTI 0\n#endif\n\n#if defined(__ARM_FEATURE_PAC_DEFAULT)\n  #if __ARM_FEATURE_PAC_DEFAULT &amp; 1\n    #define SIGN_LR paciasp \/* sign with the A key *\/\n    #define VERIFY_LR autiasp \/* verify with the A key *\/\n  #elif __ARM_FEATURE_PAC_DEFAULT &amp; 2\n    #define SIGN_LR pacibsp \/* sign with the B key *\/\n    #define VERIFY_LR autibsp \/* verify with the B key *\/\n  #endif\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 \/* bit 1 GNU Notes is for PAC support *\/\n#else\n  #define SIGN_LR BTI_C\n  #define VERIFY_LR\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0\n#endif\n\n\/* Add the BTI and PAC support to the GNU Notes section *\/\n#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0\n    .pushsection .note.gnu.property, \"a\"; \/* Start a 
new allocatable section *\/\n    .balign 8; \/* align it on an 8-byte boundary *\/\n    .long 4; \/* size of \"GNU\\0\" *\/\n    .long 0x10; \/* size of descriptor *\/\n    .long 0x5; \/* NT_GNU_PROPERTY_TYPE_0 *\/\n    .asciz \"GNU\";\n    .long 0xc0000000; \/* GNU_PROPERTY_AARCH64_FEATURE_1_AND *\/\n    .long 4; \/* Four bytes of data *\/\n    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); \/* BTI or PAC is enabled *\/\n    .long 0; \/* padding for 8 byte alignment *\/\n    .popsection; \/* end the section *\/\n#endif\n\n#endif<\/code><\/pre>\n\n\n\n<p>call_function.S:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include \"aarch64.h\"\n\n.section .rodata\n.align 3\n.Lstring:\n    .string \"Hello From My Jump!\"\n\n.section .text\n.global my_jump\n.global call_function\n\nmy_jump:\n    BTI_J\n    stp x29, x30, &#91;sp, #-16]!\n    \/\/ Print \"Hello From My Jump!\" using puts.\n    \/\/ puts can modify registers, so the frame pointer (x29) and the\n    \/\/ return address (x30) were pushed to the stack above\n    adrp    x0, .Lstring        \/\/ Get the page the string is within\n    add x0, x0, :lo12:.Lstring  \/\/ Get the page offset (handles relocations ADD_ABS_LO12_NC)\n    bl      puts                \/\/ puts prints the string in x0\n\n    ldp x29, x30, &#91;sp], #16\n    ret\n\n\/\/ Function prototype\n\/\/ void call_function(void (*func)())\ncall_function:\n    SIGN_LR\n    \/\/ Save link register and frame pointer, allocating enough space for\n    \/\/ saving the return location.\n    stp x29, x30, &#91;sp, #-16]!\n    mov x29, sp\n\n    \/\/ x0 is the caller's first argument, so jump\n    \/\/ to the \"function\" pointed to by x0 after placing\n    \/\/ the return address in the link register\n    adr lr, return_loc\n    br x0  \/\/Later has arrived, it's to highlight use of bti j.\nreturn_loc:\n    \/\/ Restore link register and frame pointer\n    ldp x29, x30, &#91;sp], #16\n\n    \/\/ Return from the function\n    VERIFY_LR\n    ret<\/code><\/pre>\n\n\n\n<p>\u00a0Then build and run the 
example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCFLAGS=\"-mbranch-protection=standard\" make\n.\/main\nHello From My Jump!<\/code><\/pre>\n\n\n\n<p>Examining the prologue of&nbsp;<code>call_function<\/code>&nbsp;shows a single&nbsp;<code>paciasp<\/code>&nbsp;instruction as the valid BTI landing pad:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>objdump -d main\n&lt;snip\/>\n0000000000410240 &lt;call_function>:\n  410240:       d503233f        paciasp\n  410244:       a9bf7bfd        stp     x29, x30, &#91;sp, #-16]!\n&lt;snip\/><\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1i9oa71840\">Backwards Compatibility<\/h2>\n\n\n\n<p>Throughout this tutorial, we have been using the PAC and BTI instruction mnemonics directly. This poses a problem for older toolchains that do not recognize those mnemonics. Fortunately, the engineers foresaw this problem and placed these instructions in the&nbsp;<code>hint<\/code>&nbsp;space of the Arm architecture. Instructions encoded in the&nbsp;<code>hint<\/code>&nbsp;space execute as a NOP on processors that do not support them and work as intended on processors that do. Existing toolchains are already aware of&nbsp;<code>hint<\/code>&nbsp;instructions, so older toolchains will happily accept new uses of&nbsp;<code>hint<\/code>&nbsp;instructions. Note that a PAC or BTI mnemonic and its corresponding&nbsp;<code>hint<\/code>&nbsp;instruction have the same encoding, so the change matters only to the toolchain and the hardware sees no difference. 
So armed with this knowledge, let us modify the header file to use&nbsp;<code>hint<\/code>&nbsp;instructions so older toolchains can compile our code.<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-6?ref_type=tags\">Example-6<\/a><\/strong><\/p>\n\n\n\n<p>aarch64.h:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#ifndef _AARCH_64_H_\n#define _AARCH_64_H_\n\n\/*\n * References:\n *  - https:\/\/developer.arm.com\/documentation\/101028\/0012\/5--Feature-test-macros\n *  - https:\/\/github.com\/ARM-software\/abi-aa\/blob\/main\/aaelf64\/aaelf64.rst\n *\/\n\n#if defined(__ARM_FEATURE_BTI_DEFAULT) &amp;&amp; __ARM_FEATURE_BTI_DEFAULT == 1\n  #define BTI_J hint 36 \/* bti j: for jumps, IE br instructions *\/\n  #define BTI_C hint 34  \/* bti c: for calls, IE bl instructions *\/\n  #define GNU_PROPERTY_AARCH64_BTI 1 \/* bit 0 GNU Notes is for BTI support *\/\n#else\n  #define BTI_J\n  #define BTI_C\n  #define GNU_PROPERTY_AARCH64_BTI 0\n#endif\n\n#if defined(__ARM_FEATURE_PAC_DEFAULT)\n  #if __ARM_FEATURE_PAC_DEFAULT &amp; 1\n    #define SIGN_LR hint 25 \/* paciasp: sign with the A key *\/\n    #define VERIFY_LR hint 29 \/* autiasp: verify with the A key *\/\n  #elif __ARM_FEATURE_PAC_DEFAULT &amp; 2\n    #define SIGN_LR hint 27 \/* pacibsp: sign with the b key *\/\n    #define VERIFY_LR hint 31 \/* autibsp: verify with the b key *\/\n  #endif\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 \/* bit 1 GNU Notes is for PAC support *\/\n#else\n  #define SIGN_LR BTI_C\n  #define VERIFY_LR\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0\n#endif\n\n\/* Add the BTI support to GNU Notes section *\/\n#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0\n    .pushsection .note.gnu.property, \"a\"; \/* Start a new allocatable section *\/\n    .balign 8; \/* align it on an 8-byte boundary *\/\n    .long 4; \/* size of \"GNU\\0\" *\/\n    .long 0x10; \/* size of descriptor *\/\n    .long 0x5; \/* 
NT_GNU_PROPERTY_TYPE_0 *\/\n    .asciz \"GNU\";\n    .long 0xc0000000; \/* GNU_PROPERTY_AARCH64_FEATURE_1_AND *\/\n    .long 4; \/* Four bytes of data *\/\n    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); \/* BTI or PAC is enabled *\/\n    .long 0; \/* padding for 8 byte alignment *\/\n    .popsection; \/* end the section *\/\n#endif\n\n#endif<\/code><\/pre>\n\n\n\n<p>As always, clean, build, and run the example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCFLAGS=\"-mbranch-protection=standard\" make\n.\/main\nHello From My Jump!<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1i9ka8q1v1\">Exception Handling: DWARF and CFI<\/h2>\n\n\n\n<p>If you want to support exception handling across assembly routines, you must add CFI directives to them. CFI, or Call Frame Information, directives are a set of assembler directives that generate the DWARF data needed to unwind the call frames and stack when a C++ exception occurs. DWARF itself is a Turing-complete stack-based virtual machine, and the CFI directives can be&nbsp;thought&nbsp;of as programming that virtual machine. The DWARF code is executed to generate the data required for handling exceptions. 
Let&#8217;s modify our program to throw an exception and ensure it gets handled.<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-7?ref_type=tags\">Example-7<\/a><\/strong><\/p>\n\n\n\n<p>Makefile:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ASFLAGS ?= $(CXXFLAGS)\n\nOBJS := main.o \\\n\tcall_function.o\n\nmain: $(OBJS)\n\t$(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $^\n\n.PHONY: clean\nclean:\n\t@printf \"Cleaning...\\n\" &amp;&amp; rm -rf $(OBJS) main<\/code><\/pre>\n\n\n\n<p>call_function.S:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include \"aarch64.h\"\n\n.section .text\n.global call_function\n\n\/\/ Function prototype\n\/\/ void call_function(void (*func)())\ncall_function:\n    .cfi_startproc\n    SIGN_LR\n    CFI_WINDOW_SAVE\n    \/\/ Save link register and frame pointer, allocating enough space for\n    \/\/ saving the return location.\n    stp x29, x30, &#91;sp, #-16]!\n    .cfi_def_cfa_offset 16\n    .cfi_offset 29, -16\n    .cfi_offset 30, -8\n    mov x29, sp\n\n    \/\/ x0 is the caller's first argument, so call\n    \/\/ the function pointed to by x0; blr places\n    \/\/ the return address in lr\n    blr x0\nreturn_loc:\n    \/\/ Restore link register and frame pointer\n    ldp x29, x30, &#91;sp], #16\n\n    .cfi_restore 30\n    .cfi_restore 29\n    .cfi_def_cfa_offset 0\n\n    \/\/ Return from the function\n    VERIFY_LR\n    ret\n    .cfi_endproc<\/code><\/pre>\n\n\n\n<p>main.cpp:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;iostream>\n\n\/\/ Declaration of the assembly routines\nextern \"C\" {\nvoid call_function(void (*func)());\n};\nstatic void my_exception() {\n    std::cout &lt;&lt; \"Throwing exception...\" &lt;&lt; std::endl;\n    throw 42;\n}\n\nint main() {\n    try {\n        \/\/ Call the assembly routine **indirectly** using a function pointer\n        \/\/ and pass it the function that throws.\n        void (*fn)(void (*func)()) 
= call_function;\n        fn(my_exception);\n    } catch (int e) {\n        std::cout &lt;&lt; \"Caught exception: \" &lt;&lt; e &lt;&lt; std::endl;\n    }\n    return 0;\n}<\/code><\/pre>\n\n\n\n<p>Now we need to compile and run the C++ example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCXXFLAGS=\"-mbranch-protection=standard\" make\n.\/main\nThrowing exception...\nCaught exception: 42<\/code><\/pre>\n\n\n\n<p>The major difference between this and our previous examples is that&nbsp;<code>main.c<\/code>&nbsp;has been replaced by&nbsp;<code>main.cpp<\/code>&nbsp;so that we can use C++ exceptions. We also modified&nbsp;<code>call_function<\/code>&nbsp;to call the C++ routine that throws an exception by using&nbsp;<code>blr<\/code>&nbsp;rather than&nbsp;<code>br<\/code>, so&nbsp;<code>my_jump<\/code>&nbsp;is no longer needed. Additionally, the code was augmented with the required CFI directives. Note that&nbsp;<code>clang<\/code>&nbsp;and&nbsp;<code>gcc<\/code>&nbsp;will output the CFI directives in their assembly code when generating assembly from C\/C++ code using the option&nbsp;<code>-S<\/code>. We can now examine how to propagate an exception through an assembly layer so various parts of the runtime can make use of it.<\/p>\n\n\n\n<p>An important part of using CFI directives is understanding the meaning of &#8220;CFA&#8221;. The CFA, or Canonical Frame Address, is the reference address that the DWARF data, and ultimately the unwinder, uses to unwind the call stack. Debuggers also make use of this additional DWARF data. In practice, each function gets its own FDE, or Frame Description Entry. Additionally, each FDE is related to a CIE, or Common Information Entry, which, as the name implies, holds common information used by a set of FDEs. 
By default, the CIE states that the&nbsp;<code>sp<\/code>&nbsp;is the CFA, so any time the&nbsp;<code>sp<\/code>&nbsp;is modified we need to let DWARF know through CFI directives. That is what&nbsp;<code>.cfi_def_cfa_offset<\/code>&nbsp;does: it lets DWARF know that the CFA is the current&nbsp;<code>sp<\/code>&nbsp;plus an offset of 16 bytes. The next thing DWARF needs to know is where to find the&nbsp;<code>lr<\/code>&nbsp;and the&nbsp;<code>fp<\/code>&nbsp;relative to the CFA. This is what&nbsp;<code>.cfi_offset<\/code>&nbsp;does: it informs DWARF that the saved value of the&nbsp;<code>fp<\/code>, which is the same register as&nbsp;<code>x29<\/code>, can be found at the CFA minus 16 bytes. The same is done for&nbsp;<code>x30<\/code>, or the&nbsp;<code>lr<\/code>, with the appropriate offset. The next CFI directive,&nbsp;<code>.cfi_restore<\/code>, restores the rule for the register to the state it had when&nbsp;<code>.cfi_startproc<\/code>&nbsp;was issued. After that,&nbsp;<code>.cfi_def_cfa_offset 0<\/code>&nbsp;indicates that the CFA is once again equal to&nbsp;<code>sp<\/code>, and finally&nbsp;<code>.cfi_endproc<\/code>&nbsp;ends the FDE. All of this instruments the DWARF system, which in turn is used by debuggers, runtimes and the unwinder. All of these systems need to know that the address in the pushed&nbsp;<code>lr<\/code>&nbsp;is signed, and they may need to verify the pointer and demangle the address before using it. The unwinder uses the&nbsp;<code>autia1716<\/code>&nbsp;or&nbsp;<code>autib1716<\/code>&nbsp;instructions to demangle the return address. Both of these are within the hint space as&nbsp;<code>hint 12<\/code>&nbsp;and&nbsp;<code>hint 14<\/code>&nbsp;respectively. 
The pointer must be demangled because signing embeds the PAC signature in the pointer itself; removing the signature restores a valid pointer.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Depending on the implementation, the auti(a|b)1716 instructions may return an invalid pointer or raise an illegal instruction exception on signature failures.<\/code><\/pre>\n\n\n\n<p>Our header files and discussions thus far have indicated that PAC supports two keys: the A and B keys. The key can be selected at build time through compiler options, for example by specifying&nbsp;<code>-mbranch-protection=pac-ret+b-key<\/code>. Let&#8217;s modify our latest C++ example, namely&nbsp;<code>call_function.S<\/code>&nbsp;and&nbsp;<code>aarch64.h<\/code>, to support the B key within the required DWARF code:<\/p>\n\n\n\n<p><strong>Tag:&nbsp;<a href=\"https:\/\/gitlab.arm.com\/pac-and-bti-blog\/blog-example\/-\/tree\/Example-8?ref_type=tags\">Example-8<\/a><\/strong><\/p>\n\n\n\n<p>aarch64.h:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#ifndef _AARCH_64_H_\n#define _AARCH_64_H_\n\n\/*\n * References:\n *  - https:\/\/developer.arm.com\/documentation\/101028\/0012\/5--Feature-test-macros\n *  - https:\/\/github.com\/ARM-software\/abi-aa\/blob\/main\/aaelf64\/aaelf64.rst\n *\/\n\n#if defined(__ARM_FEATURE_BTI_DEFAULT) &amp;&amp; __ARM_FEATURE_BTI_DEFAULT == 1\n  #define BTI_J hint 36 \/* bti j: for jumps, IE br instructions *\/\n  #define BTI_C hint 34  \/* bti c: for calls, IE bl instructions *\/\n  #define GNU_PROPERTY_AARCH64_BTI 1 \/* bit 0 GNU Notes is for BTI support *\/\n#else\n  #define BTI_J\n  #define BTI_C\n  #define GNU_PROPERTY_AARCH64_BTI 0\n#endif\n\n#if defined(__ARM_FEATURE_PAC_DEFAULT)\n  #if __ARM_FEATURE_PAC_DEFAULT &amp; 1\n    #define SIGN_LR hint 25 \/* paciasp: sign with the A key *\/\n    #define VERIFY_LR hint 29 \/* autiasp: verify with the A key *\/\n    #define CFI_B_KEY_FRAME \/* empty is no B key *\/\n  #elif __ARM_FEATURE_PAC_DEFAULT &amp; 2\n    #define SIGN_LR hint 27 
\/* pacibsp: sign with the b key *\/\n    #define VERIFY_LR hint 31 \/* autibsp: verify with the b key *\/\n    #define CFI_B_KEY_FRAME .cfi_b_key_frame\n  #endif\n  #define CFI_WINDOW_SAVE .cfi_window_save\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 \/* bit 1 GNU Notes is for PAC support *\/\n#else\n  #define SIGN_LR BTI_C\n  #define VERIFY_LR\n  #define CFI_WINDOW_SAVE\n  #define CFI_B_KEY_FRAME\n  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0\n#endif\n\n\/* Add the BTI support to GNU Notes section *\/\n#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0\n    .pushsection .note.gnu.property, \"a\"; \/* Start a new allocatable section *\/\n    .balign 8; \/* align it on an 8-byte boundary *\/\n    .long 4; \/* size of \"GNU\\0\" *\/\n    .long 0x10; \/* size of descriptor *\/\n    .long 0x5; \/* NT_GNU_PROPERTY_TYPE_0 *\/\n    .asciz \"GNU\";\n    .long 0xc0000000; \/* GNU_PROPERTY_AARCH64_FEATURE_1_AND *\/\n    .long 4; \/* Four bytes of data *\/\n    .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); \/* BTI or PAC is enabled *\/\n    .long 0; \/* padding for 8 byte alignment *\/\n    .popsection; \/* end the section *\/\n#endif\n\n#endif<\/code><\/pre>\n\n\n\n<p>call_function.S:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include \"aarch64.h\"\n\n.section .text\n.global call_function\n\n\/\/ Function prototype\n\/\/ void call_function(void (*func)())\ncall_function:\n    .cfi_startproc\n    SIGN_LR\n    CFI_WINDOW_SAVE\n    CFI_B_KEY_FRAME\n    \/\/ Save link register and frame pointer, allocating enough space for\n    \/\/ saving the return location.\n    stp x29, x30, &#91;sp, #-16]!\n    .cfi_def_cfa_offset 16\n    .cfi_offset 29, -16\n    .cfi_offset 30, -8\n    mov x29, sp\n\n    \/\/ x0 is the caller's first argument, so call\n    \/\/ the function pointed to by x0; blr places\n    \/\/ the return address in lr\n    blr x0\nreturn_loc:\n    \/\/ Restore link register and frame pointer\n    ldp x29, x30, 
&#91;sp], #16\n\n    .cfi_restore 30\n    .cfi_restore 29\n    .cfi_def_cfa_offset 0\n\n    \/\/ Return from the function\n    VERIFY_LR\n    ret\n    .cfi_endproc<\/code><\/pre>\n\n\n\n<p>Compile and run the program:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCXXFLAGS=\"-mbranch-protection=pac-ret+b-key+bti\" make\n.\/main\nThrowing exception...\nCaught exception: 42<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1i9kah5kp2\">Debugging DWARF<\/h2>\n\n\n\n<p>As previously mentioned, DWARF is byte code for a virtual machine. This DWARF information is embedded within different sections of the generated ELF files for the various consumers, like the unwinder and debuggers. It is possible to dump these DWARF instructions in disassembled form, which is handy for debugging. Note that we add&nbsp;<code>-g<\/code>&nbsp;to produce some debug info for the upcoming&nbsp;<code>addr2line<\/code>&nbsp;example.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>make clean\nCXXFLAGS=\"-mbranch-protection=pac-ret+b-key+bti -g\" make\nreadelf --debug-dump=frames call_function.o\nContents of the .eh_frame section:\n\n\n00000000 0000000000000010 00000000 CIE\n  Version:               1\n  Augmentation:          \"zR\"\n  Code alignment factor: 4\n  Data alignment factor: -8\n  Return address column: 30\n  Augmentation data:     1b\n  DW_CFA_def_cfa: r31 (sp) ofs 0\n\n00000014 0000000000000020 00000018 FDE cie=00000000 pc=0000000000000000..0000000000000014\n  DW_CFA_advance_loc: 4 to 0000000000000004\n  DW_CFA_def_cfa_offset: 16\n  DW_CFA_offset: r29 (x29) at cfa-16\n  DW_CFA_offset: r30 (x30) at cfa-8\n  DW_CFA_advance_loc: 12 to 0000000000000010\n  DW_CFA_restore: r30 (x30)\n  DW_CFA_restore: r29 (x29)\n  DW_CFA_def_cfa_offset: 0\n  DW_CFA_nop\n  DW_CFA_nop\n  DW_CFA_nop\n  DW_CFA_nop\n  DW_CFA_nop\n  DW_CFA_nop\n  DW_CFA_nop<\/code><\/pre>\n\n\n\n<p>The first noteworthy element here is the &#8220;B&#8221; in 
the\u00a0<code>Augmentation<\/code>\u00a0string. This is within the CIE, which will be inherited by all FDEs that use it. The &#8220;B&#8221; indicates that the PAC B signing key is used. If &#8220;B&#8221; is not present, then the &#8220;A&#8221; key is in use. Unwinders use this, for example, to choose the right instruction, either\u00a0<code>autib1716<\/code>\u00a0or\u00a0<code>autia1716<\/code>, when demangling PAC signed addresses.\u00a0The other important item to note is the\u00a0<code>DW_CFA_AARCH64_negate_ra_state<\/code>\u00a0opcode, which is the output from the CFI directive\u00a0<code>.cfi_window_save<\/code>. This DWARF opcode indicates that the\u00a0<code>lr<\/code>\u00a0is signed and that anything interpreting the\u00a0<code>lr<\/code>\u00a0needs to demangle it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Each FDE has a corresponding CIE shown by the cie= field, and there can be multiple CIEs. Each FDE also has an associated pc range that it is valid for.<\/code><\/pre>\n\n\n\n<p>It is possible to associate an FDE with a function using\u00a0<code>addr2line<\/code>. Note that it needs\u00a0<code>-g<\/code>\u00a0in the compilation flags or you will see\u00a0<code>?<\/code>\u00a0in the\u00a0<code>addr2line<\/code>\u00a0output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>addr2line -f -e call_function.o 0\ncall_function\n\/home\/bill\/workspace\/blog-example\/call_function.S:10<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1i9ochagh0\">Jumping to Functions<\/h2>\n\n\n\n<p>When an indirect transfer of control flow occurs, BTI-enabled hardware, together with a BTI-enabled software stack, ensures that the transfer lands on a landing pad. Put another way, direct control flow changes are not checked. This is because the target address is encoded in the instruction itself rather than supplied at run time as a potentially attacker-controlled value. 
Consequently, indirect branch instructions like&nbsp;<code>br<\/code>&nbsp;and&nbsp;<code>blr<\/code>, and their related variants, are checked to ensure that they land on proper landing pads. Typically, branch instructions with a link, like&nbsp;<code>blr<\/code>, are used to call functions, and thus the control flow change needs to land on a&nbsp;<code>bti c<\/code>&nbsp;or&nbsp;<code>bti jc<\/code>&nbsp;instruction. Branches that do not modify the link register, like&nbsp;<code>br<\/code>, are used for a &#8220;jump&#8221; and thus must transfer control flow to a&nbsp;<code>bti j<\/code>&nbsp;or&nbsp;<code>bti jc<\/code>&nbsp;landing pad. However, in certain scenarios where jump-oriented programming models are used, a branch or jump may be used to transfer control flow to a function that is typically called. In some cases, the function that was &#8220;jumped to&#8221; using a branch instruction is compiled code from a C or C++ compiler, and thus the landing pad for that function will be a&nbsp;<code>bti c<\/code>&nbsp;instruction. In that case, BTI enforcement raises an exception, because a branch without a link expects the first instruction at the target to be a&nbsp;<code>bti j<\/code>&nbsp;instruction. To work around this issue, the architecture provides an exception to the rule: if the target address is held in register&nbsp;<code>x16<\/code>&nbsp;or&nbsp;<code>x17<\/code>, BTI enforcement allows the jump to land on a&nbsp;<code>bti c<\/code>&nbsp;landing pad&nbsp;<em>or<\/em>&nbsp;a&nbsp;<code>bti j<\/code>&nbsp;landing pad. 
This is further discussed in&nbsp;<a href=\"https:\/\/developer.arm.com\/documentation\/102433\/0200\/Jump-oriented-programming\" rel=\"noreferrer noopener\" target=\"_blank\">Jump-Oriented Programming<\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"mcetoc_1i9ochagh1\">Conclusion<\/h2>\n\n\n\n<p>This multi-part tutorial shows how to enable PAC and BTI in assembly functions, how PAC instructions can also serve as BTI landing pads, and how to handle the PAC A and B keys in source. It also highlights how exception handling needs to be augmented through the use of CFI directives, and how to dump the DWARF data generated by those directives.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/enabling-pac-and-bti-on-aarch64 https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/p2-enabling-pac-and-bti-on-aarch64 https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/p3-enabling-pac-and-bti-on-aarch64 Certain versions of Arm 64-bit processors have 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[13,16,17],"class_list":["post-117","post","type-post","status-publish","format-standard","hentry","category-tutotial","tag-aarch64","tag-bti","tag-pac"],"_links":{"self":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/117","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=117"}],"version-history":[{"count":5,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/117\/revisions"}],"predecessor-version":[{"id":122,"href":"https:\/\/haco.club\/index.php?rest_route=\/wp\/v2\/posts\/117\/revisions\/122"}],"wp:attachment":[{"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=117"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=117"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haco.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=117"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}