From 82e9ac4163b46b59e121194f84ac370818482923 Mon Sep 17 00:00:00 2001
From: johannst
Date: Thu, 15 Jul 2021 21:20:14 +0200
Subject: use proper date fmt in content file names that zola can automatically can derive the date

---
 content/20191118-dynamic-linking-linux-x86_64.md | 339 -----------------------
 1 file changed, 339 deletions(-)
 delete mode 100644 content/20191118-dynamic-linking-linux-x86_64.md

(limited to 'content/20191118-dynamic-linking-linux-x86_64.md')

diff --git a/content/20191118-dynamic-linking-linux-x86_64.md b/content/20191118-dynamic-linking-linux-x86_64.md
deleted file mode 100644
index 9265671..0000000
--- a/content/20191118-dynamic-linking-linux-x86_64.md
+++ /dev/null
@@ -1,339 +0,0 @@
-+++
-title = "Dynamic linking on Linux (x86_64)"
-date = 2019-11-18
-
-[taxonomies]
-tags = ["elf", "linux", "x86"]
-+++
-
-As I was interested in how the bits behind dynamic linking work, this article
-is about exploring this topic.
-However, since dynamic linking strongly depends on the OS, the architecture and
-the binary format, I only focus on one combination here.
-Spending most of my time with Linux on `x86` or `ARM` I chose the following
-for this article:
-- OS: Linux
-- arch: x86_64
-- binfmt: [`Executable and Linking Format (ELF)`][elf-1.2]
-
-## Introduction to dynamic linking
-
-Dynamic linking is used when an application is not statically linked.
-This means an application uses code which is not included in the application
-itself, but in a shared library. The shared libraries in turn can be used by
-multiple applications.
-The applications contain `relocation` entries which need to be resolved at
-runtime, because shared libraries are compiled as `position independent code
-(PIC)` so that they can be loaded at any address in the application's
-virtual address space.
-This process of resolving the relocation entries at runtime is what I am
-referring to as dynamic linking in this article.
-
-The following figure shows a simple example, where we have an application
-**foo** using a function **bar** from the shared library **libbar.so**. The
-boxes show the virtual memory mapping for **foo** over time where time
-increases to the right.
-```
-         foo                                        foo
-    +-----------+                              +-----------+
-    |           |                              |           |
-    +-----------+                              +-----------+
-    | .text.foo |                              | .text.foo |
-    |           |                              |           |
-    |  ...      |    trigger resolve reloc     |  ...      |
-pc->|  call bar | X----+                       |  call bar |--+
-    |  ...      |      |                       |  ...      |  |
-    +-----------+      |                       +-----------+  |
-    |           |      |                       |           |  |
-    |           |      |                       |           |  |
-    +-----------+      |                       +-----------+  |
-    | .text.bar |      |                       | .text.bar |  |
-    |  ...      |      |                       |  ...      |  |
-    |  bar:     |      +---->[ld.so]---->      |  bar:     |<-+pc
-    |  ...      |                              |  ...      |
-    +-----------+                              +-----------+
-    |           |                              |           |
-    +-----------+                              +-----------+
-
-```
-
-## Conceptual overview && important parts of "the" ELF
-
-> In the following I assume a basic understanding of the ELF binary format.
-
-Before jumping into the details of dynamic linking it is important to get a
-conceptual overview, as well as to understand which sections of the ELF file
-actually matter.
-
-On x86 calling a function in a shared library works via one indirect jump.
-When the application wants to call a function in a shared library it jumps to a
-well-known location contained in the code of the application, called a
-`trampoline`. From there the application then jumps to a function pointer
-stored in a global table (`GOT = global offset table`). The application
-contains **one** trampoline per function used from a shared library.
-
-When the application jumps to a trampoline for the first time the trampoline
-will dispatch to the dynamic linker with the request to resolve the symbol.
-Once the dynamic linker has found the address of the symbol it patches the
-function pointer in the `GOT` so that subsequent calls directly dispatch to the
-library function.
-```
-      foo:                              GOT
-        ...                        +------------+
-  +---- call bar_trampoline     +- | 0xcafeface | [0] resolve (dynamic linker)
-  |     call bar_trampoline     |  +------------+
-  |     ...                     |  | 0xcafeface | [1] resolve (dynamic linker)
-  |                             |  +------------+
-  +-> bar_trampoline:           |
-        jump GOT[0] <-----------+
-      bar2_trampoline:
-        jump GOT[1]
-```
-Once this is done, further calls to this symbol will be directly forwarded to
-the correct address from the corresponding trampoline.
-```
-      foo:                              GOT
-        ...                        +------------+
-        call bar_trampoline     +- | 0x01234567 | [0] bar (libbar.so)
-  +---- call bar_trampoline     |  +------------+
-  |     ....                    |  | 0xcafeface | [1] resolve (dynamic linker)
-  |                             |  +------------+
-  +-> bar_trampoline:           |
-        jump GOT[0] <-----------+
-      bar2_trampoline:
-        jump GOT[1]
-```
-
----
-
-With that in mind we can take a look and check which sections of the ELF file
-are important for the dynamic linking process.
-- `.plt`
-> This section contains all the trampolines for the external functions used by
-> the ELF file.
-- `.got.plt`
-> This section contains the global offset table `GOT` for this ELF file's trampolines.
-
-- `.rel.plt` / `.rela.plt`
-> This section holds the `relocation` entries, which are used by the dynamic
-> linker to find which symbol needs to be resolved and which location in the
-> `GOT` has to be patched. (Whether it is `rel` or `rela` depends on the
-> **DT_PLTREL** entry in the [`.dynamic` section](#dynamic-section))
-
-
-## The bits behind dynamic linking
-
-Now that we have the basic concept and know which sections of the ELF file
-matter we can take a look at an actual example. For the analysis I am going to
-use the following C program and build it explicitly as non `position
-independent executable (PIE)`.
-
-> Using `-no-pie` has no functional impact, it is only used to get absolute
-> virtual addresses in the ELF file, which makes the analysis easier to follow.
-
-```cpp
-// main.c
-#include <stdio.h>
-int main(int argc, const char* argv[]) {
-    printf("%s argc=%d\n", argv[0], argc);
-    puts("done");
-    return 0;
-}
-```
-
-```console
-> gcc -o main main.c -no-pie
-```
-
-We use [radare2][r2] to open the compiled file and print the disassembly of
-the `.got.plt` and `.plt` sections.
-
-```nasm
-> r2 -A ./main
---snip--
-[0x00401050]> pd5 @ section..got.plt
-            ;-- section..got.plt:
-            ;-- _GLOBAL_OFFSET_TABLE_:
-       [0]  0x00404000      .qword 0x0000000000403e10 ; section..dynamic ; sym..dynamic
-       [1]  0x00404008      .qword 0x0000000000000000
-       [2]  0x00404010      .qword 0x0000000000000000
-            ;-- reloc.puts:
-       [3]  0x00404018      .qword 0x0000000000401036
-            ;-- reloc.printf:
-       [4]  0x00404020      .qword 0x0000000000401046
-
-[0x00401050]> pd9 @ section..plt
-            ;-- section..plt:
-       ┌┌─> 0x00401020      ff35e22f0000   push qword [0x00404008]
-       ╎╎   0x00401026      ff25e42f0000   jmp qword [0x00404010]
-       ╎╎   0x0040102c      0f1f4000       nop dword [rax]
-            int sym.imp.puts (const char *s);
-       ╎╎   0x00401030      ff25e22f0000   jmp qword [reloc.puts]      ; 0x00404018
-       ╎╎   0x00401036      6800000000     push 0
-       └──< 0x0040103b      e9e0ffffff     jmp sym..plt
-            int sym.imp.printf (const char *format);
-        ╎   0x00401040      ff25da2f0000   jmp qword [reloc.printf]    ; 0x00404020
-        ╎   0x00401046      6801000000     push 1
-        └─< 0x0040104b      e9d0ffffff     jmp sym..plt
-[0x00401050]>
-```
-
-Taking a quick look at the `.got.plt` section we see the *global offset table GOT*.
-The entries *GOT[0..2]* have special meanings: *GOT[0]* holds the address of the
-[`.dynamic` section](#dynamic-section) for this ELF file, *GOT[1..2]* will be
-filled by the dynamic linker at program startup.
-Entries *GOT[3]* and *GOT[4]* contain the function pointers for **puts** and
-**printf** respectively.
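
The bracketed addresses in the `.plt` dump are computed by radare2; the instruction bytes only hold a signed 32-bit displacement relative to the address of the *next* instruction. As a small sketch of that arithmetic (the helper name `rip_relative_target` is made up for this illustration and is not part of radare2 or any library):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: absolute address referenced by a RIP-relative
 * instruction. `insn_addr` is the address of the first instruction byte,
 * `insn_len` the instruction length in bytes and `disp` points at the four
 * little-endian displacement bytes inside the encoding. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    const uint8_t *disp) {
    assert(insn_len >= 4); /* a disp32 must fit into the instruction */

    /* Assemble the little-endian disp32 independent of host byte order. */
    uint32_t raw = (uint32_t)disp[0] | ((uint32_t)disp[1] << 8) |
                   ((uint32_t)disp[2] << 16) | ((uint32_t)disp[3] << 24);
    /* Reinterpret as signed (two's complement, as on gcc/clang). */
    int32_t rel = (int32_t)raw;

    /* RIP already points behind the instruction when it executes. */
    return insn_addr + insn_len + (uint64_t)(int64_t)rel;
}
```

For the **puts** trampoline at `0x00401030` the bytes are `ff25e22f0000`, so the displacement is `0x2fe2` and the slot is `0x401036 + 0x2fe2 = 0x404018`, which is GOT[3]. The same calculation applied to `e9e0ffffff` at `0x0040103b` gives `0x401040 - 0x20 = 0x401020`, the first trampoline.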
-
-In the `.plt` section we can find three trampolines:
-1. `0x00401020` dispatch to runtime linker (special role)
-1. `0x00401030` **puts**
-1. `0x00401040` **printf**
-
-Looking at the **puts** trampoline we can see that the first instruction jumps
-to the address stored at `0x00404018` (reloc.puts) which is GOT[3]. In the
-beginning this entry contains the address of the `push 0` instruction coming
-right after the `jmp`. This push instruction sets up some metadata for the
-dynamic linker. The next instruction then jumps into the first trampoline,
-which pushes more metadata (GOT[1]) onto the stack and then jumps to the
-address stored in GOT[2].
-> GOT[1] & GOT[2] are zero here because they get filled by the dynamic linker
-> at program startup.
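
The lazy patching described above can be mimicked in plain C with a table of function pointers. This is only a toy model under stated assumptions: the real trampolines are emitted by the linker and the real resolver lives in `ld.so`; every name below (`bar_impl`, `bar_resolve`, `bar_trampoline`, `got`, `resolve_calls`) is made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*bar_fn)(int);

/* Counts how often the stand-in for the dynamic linker ran. */
static int resolve_calls = 0;

static int bar_impl(int x);    /* stands in for bar in libbar.so   */
static int bar_resolve(int x); /* stands in for the resolver path  */

/* One GOT slot for bar; initially it leads to the resolver. */
static bar_fn got[1] = { bar_resolve };

static int bar_impl(int x) { return x + 1; }

static int bar_resolve(int x) {
    resolve_calls++;
    got[0] = bar_impl; /* patch the GOT slot with the real address */
    return got[0](x);  /* complete the call that triggered the resolve */
}

/* The trampoline itself never changes, it always jumps through the slot. */
static int bar_trampoline(int x) {
    assert(got[0] != NULL);
    return got[0](x);
}
```

Calling `bar_trampoline` twice goes through `bar_resolve` only on the first call; afterwards the slot holds `bar_impl` and the trampoline dispatches to it directly, which is the behaviour the two GOT diagrams illustrate.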
-
-To understand the `push 0` instruction in the **puts** trampoline we have to
-take a look at the third section of interest in the ELF file, the `.rela.plt`
-section.
-
-```console
-# -r print relocations
-# -D use .dynamic info when displaying info
-> readelf -W -r ./main
---snip--
-Relocation section '.rela.plt' at offset 0x4004d8 contains 2 entries:
-    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
-0000000000404018  0000000200000007 R_X86_64_JUMP_SLOT     0000000000000000 puts@GLIBC_2.2.5 + 0
-0000000000404020  0000000300000007 R_X86_64_JUMP_SLOT     0000000000000000 printf@GLIBC_2.2.5 + 0
-```
-
-The `0` passed as metadata to the dynamic linker means to use the relocation
-at index [0] in the `.rela.plt` section. From the ELF specification we can
-find how a relocation of type `rela` is defined:
-
-```c
-// man 5 elf
-typedef struct {
-    Elf64_Addr r_offset;
-    uint64_t   r_info;
-    int64_t    r_addend;
-} Elf64_Rela;
-
-#define ELF64_R_SYM(i)  ((i) >> 32)
-#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
-```
-
-`r_offset` holds the address of the GOT entry which the dynamic linker should
-patch once it has found the address of the requested symbol.
-The offset here is `0x00404018` which is exactly the address of GOT[3], the
-function pointer used in the **puts** trampoline.
-From `r_info` the dynamic linker can find out which symbol it should look for.
-
-```c
-ELF64_R_SYM(0x0000000200000007) -> 0x2
-```
-
-The result [2] is the index into the dynamic symbol table
-(`.dynsym`). Dumping the dynamic symbol table with readelf we can see that the
-symbol at index [2] is **puts**.
-
-```console
-# -s print symbols
-> readelf -W -s ./main
-Symbol table '.dynsym' contains 7 entries:
-   Num:    Value          Size Type    Bind   Vis      Ndx Name
-     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
-     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTable
-     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)
-     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
---snip--
-```
-
-
-## Appendix: .dynamic section
-
-The `.dynamic` section of an ELF file contains important information for the
-dynamic linking process and is created when linking the ELF file.
-
-The information can be accessed at runtime using the following symbol
-```c
-extern Elf64_Dyn _DYNAMIC[];
-```
-which is an array of `Elf64_Dyn` entries
-```c
-typedef struct {
-    Elf64_Sxword d_tag;
-    union {
-        Elf64_Xword d_val;
-        Elf64_Addr  d_ptr;
-    } d_un;
-} Elf64_Dyn;
-```
-> Since this meta-information is specific to an ELF file, every ELF file has
-> its own `.dynamic` section and `_DYNAMIC` symbol.
-
-The following entries are most interesting for dynamic linking:
-
- d_tag       | d_un  | description
--------------|-------|-------------------------------------------------
- DT_PLTGOT   | d_ptr | address of .got.plt
- DT_JMPREL   | d_ptr | address of .rela.plt
- DT_PLTREL   | d_val | DT_REL or DT_RELA
- DT_PLTRELSZ | d_val | size of .rela.plt table in bytes
- DT_RELENT   | d_val | size of a single REL entry (PLTREL == DT_REL)
- DT_RELAENT  | d_val | size of a single RELA entry (PLTREL == DT_RELA)
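
To illustrate how these tags fit together, here is a minimal sketch that walks an `Elf64_Dyn` array and derives the number of PLT relocations from it. It assumes a Linux system providing `<elf.h>`; the function name `plt_reloc_count` is made up for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: number of PLT relocation entries described by a
 * dynamic array terminated by DT_NULL. Returns 0 if the needed tags are
 * missing. */
static size_t plt_reloc_count(const Elf64_Dyn *dyn) {
    assert(dyn != NULL);

    Elf64_Xword table_size = 0; /* DT_PLTRELSZ: table size in bytes */
    Elf64_Xword rel_size = 0;   /* DT_RELENT:   size of one REL entry */
    Elf64_Xword rela_size = 0;  /* DT_RELAENT:  size of one RELA entry */
    Elf64_Xword kind = 0;       /* DT_PLTREL:   DT_REL or DT_RELA */

    for (; dyn->d_tag != DT_NULL; ++dyn) {
        switch (dyn->d_tag) {
        case DT_PLTRELSZ: table_size = dyn->d_un.d_val; break;
        case DT_PLTREL:   kind = dyn->d_un.d_val;       break;
        case DT_RELENT:   rel_size = dyn->d_un.d_val;   break;
        case DT_RELAENT:  rela_size = dyn->d_un.d_val;  break;
        default: break;
        }
    }

    /* Pick the entry size that matches the relocation kind of the PLT. */
    Elf64_Xword entry_size = 0;
    if (kind == DT_RELA)
        entry_size = rela_size;
    else if (kind == DT_REL)
        entry_size = rel_size;

    return entry_size ? (size_t)(table_size / entry_size) : 0;
}
```

In a dynamically linked program the `_DYNAMIC` symbol shown above could be passed directly, e.g. `plt_reloc_count(_DYNAMIC)`. The function only reads `d_val` fields, so it does not depend on how the loader treats the `d_ptr` entries.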
-
-We can use readelf to dump the `.dynamic` section. In the following snippet I
-only kept the relevant entries:
-```console
-# -d dump .dynamic section
-> readelf -d ./main
-
-Dynamic section at offset 0x2e10 contains 24 entries:
-  Tag        Type                         Name/Value
- 0x0000000000000003 (PLTGOT)             0x404000
- 0x0000000000000002 (PLTRELSZ)           48 (bytes)
- 0x0000000000000014 (PLTREL)             RELA
- 0x0000000000000017 (JMPREL)             0x4004d8
- 0x0000000000000009 (RELAENT)            24 (bytes)
-```
-
-We can see that **PLTGOT** points to address **0x404000** which is the address
-of the GOT as we saw in the [radare2 dump](#code-gotplt-dump).
-Also we can see that **JMPREL** points to the [relocation table](#code-relaplt-dump).
-**PLTRELSZ / RELAENT** (48 / 24) tells us that we have 2 relocation entries,
-which are exactly the ones for **puts** and **printf**.
-
-
-## References
-- [`man 5 elf`][man-elf]
-- [Executable and Linking Format (ELF)][elf-1.2]
-- [SystemV ABI 4.1][systemv-abi-4.1]
-- [SystemV ABI 1.0 (x86_64)][systemv-abi-1.0-x86_64]
-- [`man 1 readelf`][man-readelf]
-
-
-[r2]: https://rada.re/n/radare2.html
-[man-elf]: http://man7.org/linux/man-pages/man5/elf.5.html
-[man-readelf]: http://man7.org/linux/man-pages/man1/readelf.1.html
-[elf-1.2]: http://refspecs.linuxbase.org/elf/elf.pdf
-[systemv-abi-4.1]: https://refspecs.linuxfoundation.org/elf/gabi41.pdf
-[systemv-abi-1.0-x86_64]: https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf
-
--
--
cgit v1.2.3