Why does _start load the address of main through a GOT entry instead of an adrp+add pair?

The _start function uses a Global Offset Table (GOT) entry to load the address of main primarily because _start is defined in a pre-compiled object file (typically Scrt1.o) that was built with Position-Independent Code (PIC) enabled. Here is the detailed explanation of why this happens and why adrp + add isn't used by default: 1. _start is Pre-Compiled Generic Code The _start function is not compiled at the same time as your application's main.c. It is part of the C Runtime (CRT) startup files (specifically Scrt1.o for Position…

AArch64 Pre/Post Indexing

In AArch64 (ARMv8-A 64-bit architecture), Pre-indexing and Post-indexing are memory addressing modes used with Load (LDR) and Store (STR) instructions. Their primary purpose is to perform Writeback: they automatically update the base register (the pointer) with a new address as part of the instruction execution. This is extremely efficient for iterating through arrays or managing stacks because it eliminates the need for a separate ADD or SUB instruction to move the pointer. Here is the breakdown of how they work. 1. Pre-Indexed Addressing Syntax: [base, #offset]! Key Symbol: The…
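
The writeback behavior described above can be illustrated with a small Python model. This is only a sketch, not an emulator: the dictionaries `memory` and `regs` and the helper names `ldr_pre_indexed` and `ldr_post_indexed` are hypothetical, and the post-indexed form is assumed to follow the usual `[base], #offset` syntax.

```python
# Toy model of AArch64 writeback addressing (illustrative only).

def ldr_pre_indexed(memory, regs, base, offset):
    """Model of `LDR Xt, [base, #offset]!`: update the base first, then load from it."""
    regs[base] += offset
    return memory[regs[base]]

def ldr_post_indexed(memory, regs, base, offset):
    """Model of `LDR Xt, [base], #offset`: load from the base, then update it."""
    value = memory[regs[base]]
    regs[base] += offset
    return value

# Hypothetical memory contents: three 8-byte slots.
memory = {0x1000: 11, 0x1008: 22, 0x1010: 33}
regs = {"x1": 0x1000}

pre_value = ldr_pre_indexed(memory, regs, "x1", 8)    # x1 becomes 0x1008, then load
pre_base = regs["x1"]

post_value = ldr_post_indexed(memory, regs, "x1", 8)  # load at 0x1008, then x1 becomes 0x1010
post_base = regs["x1"]
```

In both cases the base register ends up modified; the only difference is whether the access uses the address before or after the update.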

Check whether an executable is pure C or C++

Distinguishing between a pure C and a C++ executable can be achieved by examining the symbols and library dependencies of the binary file. C++ compilers employ a technique called "name mangling" to support function overloading and namespaces, which is absent in C. Furthermore, C++ programs have a distinct set of standard library dependencies. Inspecting Symbol Tables for Name Mangling A primary indicator of C++ code is the presence of "mangled" names in the executable's symbol table. C++ compilers alter function and variable names to encode information about their…
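
The symbol-table check described above might be sketched as follows. This assumes the Itanium C++ ABI used by GCC and Clang on Linux, where mangled names begin with `_Z`; the function names `looks_mangled` and `guess_language` and the sample symbol lists are hypothetical, and the symbols are assumed to have been extracted already (for example with a tool such as `nm`).

```python
def looks_mangled(symbol: str) -> bool:
    """Heuristic: Itanium-ABI mangled C++ names start with '_Z'."""
    return symbol.startswith("_Z")

def guess_language(symbols) -> str:
    """Report 'C++' if any symbol looks mangled, otherwise 'C'.

    This is only a hint: a stripped binary, or C++ code that exports
    only extern "C" functions, may show no mangled names at all.
    """
    return "C++" if any(looks_mangled(s) for s in symbols) else "C"

# Hypothetical symbol lists for two binaries.
c_symbols = ["main", "printf", "puts"]
cpp_symbols = ["main", "_ZN3Foo3barEv", "printf"]

c_guess = guess_language(c_symbols)
cpp_guess = guess_language(cpp_symbols)
```

A result of "C" here means only that no mangled names were found, so it is best combined with the library-dependency check.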

PIE Relocation: Tagging Addresses

In a Position-Independent Executable (PIE), absolute addresses aren't "tagged" directly within the machine code. Instead, the linker creates a separate list of the instruction and data locations that need fixing, and this list is stored in a special section of the binary called the relocation table. The dynamic loader uses this table at runtime to patch those locations with the correct memory addresses once the binary's actual location in memory is known. The Core Mechanism: Linker and Loader Teamwork 🤝 Think of it like moving into a new apartment building. You…
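
As a rough illustration of how a loader could walk the relocation table described above, the sketch below models only the simplest kind of entry, a relative relocation (in the spirit of R_AARCH64_RELATIVE), where the patched value is the load base plus an addend. The function name `apply_relative_relocations`, the `(offset, addend)` tuple layout, and all addresses are assumptions chosen for the example, not a real loader's data structures.

```python
import struct

def apply_relative_relocations(image: bytearray, load_base: int, relocations) -> None:
    """For each (offset, addend) entry, write load_base + addend
    as a 64-bit little-endian value at image[offset]."""
    for offset, addend in relocations:
        struct.pack_into("<Q", image, offset, load_base + addend)

# Hypothetical 32-byte image with two pointer slots that need fixing.
image = bytearray(32)
relocations = [(8, 0x1234), (24, 0x2000)]
load_base = 0x5555_0000_0000  # address chosen by the loader at runtime (assumed)

apply_relative_relocations(image, load_base, relocations)

# Read the patched slots back to inspect the result.
patched = [struct.unpack_from("<Q", image, off)[0] for off, _ in relocations]
```

Only the listed offsets are touched; the rest of the image is left exactly as the linker produced it.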

Understanding Binary File Components

Symbol Table Think of the symbol table as a directory for the "named things" within your code, like functions and global variables. When you compile a source file, the compiler creates an object file. This object file contains the machine code for your functions and space for your global variables, but it doesn't yet know the final memory addresses of everything. The symbol table maps these symbolic names (e.g., my_function, global_variable) to their locations within the object file. This is vital for the linker, the tool that combines multiple object files and libraries…

Uncovering Supply Chain Attack with Code Genome Framework

This talk from IBM Research focuses on using AI and machine learning to combat supply chain attacks. The presenters highlight the increasing lack of trust in software due to major security breaches like the XZ backdoor. Here are the key takeaways: The Problem: There's a "semantic gap" between what code is expected to do and what it actually does. This gap is exploited in supply chain attacks where malicious code is hidden in software updates or open-source projects. The Solution: The researchers introduce the "Code Genome Framework," an…