Understanding Binary File Components

Symbol Table

Think of the symbol table as a directory for the “named things” within your code, like functions and global variables. When you compile a source file, the compiler creates an object file. This object file contains the machine code for your functions and space for your global variables, but it doesn’t yet know the final memory addresses of everything.

The symbol table maps these symbolic names (e.g., my_function, global_variable) to their locations within the object file. This is vital for the linker, the tool that combines multiple object files and libraries into a single executable. The linker uses the symbol tables from all the input files to resolve references. For instance, if main.c calls a function defined in helper.c, the linker looks the function up in the symbol table of the object file built from helper.c, works out its final address, and patches the call in the object code built from main.c with that address.
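The resolution step described above can be sketched with a toy model. This is not how a real linker stores its data: the dictionaries, the file sizes, the base address 0x1000, and the names main, helper, and counter are all made up for illustration.

```python
# Toy model (hypothetical data): each "object file" lists the symbols it
# defines (name -> offset within that file) and the names it only references.
objects = {
    "main.o": {"size": 0x40, "defined": {"main": 0x0}, "undefined": ["helper"]},
    "helper.o": {
        "size": 0x20,
        "defined": {"helper": 0x0, "counter": 0x10},
        "undefined": [],
    },
}


def resolve(objects, base=0x1000):
    """Lay the objects out back to back and build a name -> address map."""
    addresses = {}
    cursor = base
    for obj in objects.values():
        for sym, offset in obj["defined"].items():
            if sym in addresses:
                raise ValueError(f"duplicate definition of {sym}")
            addresses[sym] = cursor + offset
        cursor += obj["size"]

    # Every referenced name must have been defined by some input file.
    for file_name, obj in objects.items():
        for sym in obj["undefined"]:
            if sym not in addresses:
                raise ValueError(f"undefined reference to {sym} in {file_name}")
    return addresses


table = resolve(objects)
for name, address in table.items():
    print(f"{name}: {address:#x}")
```

The two error cases mirror the familiar "duplicate definition" and "undefined reference" failures that show up at link time.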

There are typically two main types of symbol tables in the context of the Executable and Linkable Format (ELF), which is common on Linux and other Unix-like systems:

.symtab: This is the full symbol table, containing all symbols, including local ones that are only visible within their own file. This table is often removed by a process called “stripping” to reduce the size of the final executable, as it’s not strictly necessary for execution.
.dynsym: This is a smaller, dynamic symbol table that only contains symbols that are needed for dynamic linking (i.e., symbols that are referenced by or exported to shared libraries). This table is essential for the runtime linker to resolve symbols at load time or runtime.
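To make the idea of a symbol table entry more concrete, here is a minimal sketch that decodes one entry using the 64-bit ELF layout (Elf64_Sym, 24 bytes, assumed little-endian here). The Symbol tuple and the parse_symbol helper are names invented for this example, and the sample entry is synthetic rather than taken from a real binary.

```python
import struct
from typing import NamedTuple

# Elf64_Sym fields in order: st_name, st_info, st_other, st_shndx,
# st_value, st_size (little-endian assumed).
ELF64_SYM = struct.Struct("<IBBHQQ")

STB_GLOBAL = 1  # binding: visible to other object files
STT_FUNC = 2    # type: the symbol names a function


class Symbol(NamedTuple):
    name_offset: int    # offset of the name in the associated string table
    binding: int        # upper 4 bits of st_info
    sym_type: int       # lower 4 bits of st_info
    section_index: int  # section the symbol is defined in
    value: int          # in an object file, the offset within that section
    size: int


def parse_symbol(raw: bytes) -> Symbol:
    st_name, st_info, _st_other, st_shndx, st_value, st_size = ELF64_SYM.unpack(raw)
    return Symbol(st_name, st_info >> 4, st_info & 0xF, st_shndx, st_value, st_size)


# Synthetic entry: a global function at offset 0x40 of section 1, 23 bytes long.
raw_entry = ELF64_SYM.pack(1, (STB_GLOBAL << 4) | STT_FUNC, 0, 1, 0x40, 23)
sym = parse_symbol(raw_entry)
print(sym)
```

Note that the entry holds no name at all, only name_offset, which is where the string table comes in.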

String Table

The string table is a simple yet crucial section. It contains the actual character strings for the names of symbols, section headers, and other textual data within the binary. The symbol table and other sections don’t store the names directly. Instead, they contain an index (an offset) into the string table.

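This approach is efficient for two reasons. Entries in the symbol table and the section header table stay a fixed size no matter how long the names are, and strings don’t have to be duplicated. If multiple symbols or sections have the same name (which is rare but possible), they can all point to the same string in the string table.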
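A lookup by offset is short enough to sketch in a few lines. ELF string tables are sequences of null-terminated strings whose first byte is a null, so offset 0 always means "no name". The read_name helper and the table contents below are made up for illustration.

```python
def read_name(strtab: bytes, offset: int) -> str:
    """Return the null-terminated string that starts at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


# Synthetic string table: leading null byte, then two names.
strtab = b"\x00my_function\x00global_variable\x00"

first = read_name(strtab, 1)    # name stored at offset 1
second = read_name(strtab, 13)  # name stored at offset 13
empty = read_name(strtab, 0)    # offset 0 is the empty name
tail = read_name(strtab, 4)     # an offset may also point into the middle of a string
print(first, second, repr(empty), tail)
```

Because a name is just "everything from the offset to the next null byte", an offset can point into the middle of an existing string, which lets a name that is a suffix of another share its bytes.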

For ELF files, you’ll typically find these string tables:

.strtab: This table holds the strings for the symbols found in the .symtab section.
.dynstr: This table contains the strings for the symbols in the .dynsym section.
.shstrtab: This table stores the names of the sections themselves (e.g., .text, .data).

Relocation Table

A relocation table contains information that tells the linker or the dynamic linker how to modify the machine code to use the correct memory addresses. When a compiler generates code, it often doesn’t know the final memory addresses of functions and variables, especially those defined in other files or shared libraries. It leaves placeholders in the machine code and creates entries in the relocation table.

Each entry in the relocation table specifies:

The location to be modified: This is an offset within a section (like the .text section containing the code) that needs to be “patched.”
The symbol to be referenced: This is an index into the symbol table, identifying the function or variable whose address is needed.
The type of relocation: This specifies how the address should be calculated and inserted into the code. For example, it might be a direct 32-bit address or a relative offset from the current instruction.

During linking, the static linker processes these relocation entries to create a fully resolved executable. For dynamically linked executables, some relocation entries are left for the dynamic linker to handle when the program is loaded into memory.
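As a rough sketch of what "patching" means, the snippet below applies two x86-64 relocation types to a buffer of bytes. The type numbers and formulas (S + A for R_X86_64_64, S + A - P for R_X86_64_PC32) come from the x86-64 ABI. The apply_relocation helper, the addresses, and the single call instruction are assumptions chosen to keep the example small, and the addend is passed explicitly, as in a .rela-style entry.

```python
import struct

R_X86_64_64 = 1    # absolute 64-bit address: S + A
R_X86_64_PC32 = 2  # 32-bit offset relative to the patched location: S + A - P


def apply_relocation(section: bytearray, section_addr: int, r_offset: int,
                     r_type: int, symbol_addr: int, addend: int) -> None:
    """Patch `section` in place at `r_offset` according to `r_type`."""
    place = section_addr + r_offset  # P: final address of the bytes being patched
    if r_type == R_X86_64_64:
        value = (symbol_addr + addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", section, r_offset, value)
    elif r_type == R_X86_64_PC32:
        struct.pack_into("<i", section, r_offset, symbol_addr + addend - place)
    else:
        raise NotImplementedError(f"relocation type {r_type} not handled in this sketch")


# A call instruction (opcode 0xE8) with a 4-byte placeholder for the target.
text = bytearray(b"\xe8\x00\x00\x00\x00")

# Assumed layout: this code ends up at 0x1000 and the callee at 0x1040.
# The addend of -4 accounts for the CPU measuring the offset from the end
# of the 4-byte field rather than from its start.
apply_relocation(text, section_addr=0x1000, r_offset=1,
                 r_type=R_X86_64_PC32, symbol_addr=0x1040, addend=-4)
print(text.hex())
```

Here the placeholder starts at address 0x1001, so the patched value is 0x1040 - 4 - 0x1001 = 0x3b, and the instruction following the call sits at 0x1005, exactly 0x3b bytes before the callee.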

Dynamic Section

The dynamic section, typically named .dynamic, is essential for executables and shared libraries that are dynamically linked. It contains a series of entries that provide information to the dynamic linker (like ld-linux.so on Linux).

Think of the dynamic section as a set of instructions for the dynamic linker. It tells the dynamic linker things like:

Needed Libraries: A list of the shared libraries that the program depends on.
Location of Dynamic Symbol and String Tables: Pointers to the .dynsym and .dynstr sections.
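Location of Relocation Tables: Pointers to the relocation tables that need to be processed at runtime (e.g., .rel.plt and .rel.dyn, or .rela.plt and .rela.dyn on architectures such as x86-64 whose relocation entries carry an explicit addend).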
Initialization and Finalization Functions: The addresses of functions that should be run when the library is loaded (_init) and unloaded (_fini).
Hash Table: A pointer to a hash table that speeds up symbol lookups.
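The entries listed above are stored as simple (tag, value) pairs, terminated by a DT_NULL entry. Here is a minimal sketch that walks such an array using the 64-bit layout (Elf64_Dyn, assumed little-endian) and pulls out the needed libraries. The tag numbers are the standard ELF ones, while the helper names, the library names, and the addresses are synthetic.

```python
import struct

# Standard ELF dynamic tags used in this sketch.
DT_NULL, DT_NEEDED, DT_STRTAB, DT_SYMTAB = 0, 1, 5, 6

# Elf64_Dyn: a signed 64-bit tag followed by a 64-bit value or address.
ELF64_DYN = struct.Struct("<qQ")


def parse_dynamic(raw: bytes):
    """Return (tag, value) pairs up to, but not including, DT_NULL."""
    entries = []
    for d_tag, d_val in ELF64_DYN.iter_unpack(raw):
        if d_tag == DT_NULL:
            break
        entries.append((d_tag, d_val))
    return entries


def needed_libraries(entries, dynstr: bytes):
    """A DT_NEEDED value is an offset into the dynamic string table."""
    names = []
    for tag, val in entries:
        if tag == DT_NEEDED:
            end = dynstr.index(b"\x00", val)
            names.append(dynstr[val:end].decode("ascii"))
    return names


# Synthetic .dynstr and .dynamic contents (the addresses are made up).
dynstr = b"\x00libc.so.6\x00libm.so.6\x00"
dynamic = b"".join(
    ELF64_DYN.pack(tag, val)
    for tag, val in [
        (DT_NEEDED, 1),
        (DT_NEEDED, 11),
        (DT_STRTAB, 0x400),
        (DT_SYMTAB, 0x300),
        (DT_NULL, 0),
    ]
)

entries = parse_dynamic(dynamic)
libs = needed_libraries(entries, dynstr)
print(libs)
```

In a real file the DT_STRTAB value would be used to locate the string table first. The sketch skips that step and passes the table in directly.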

In essence, the dynamic section is the central hub of information that enables a program to be linked with its required shared libraries at runtime, allowing for code sharing and easier updates.
