path: root/content/20191118-dynamic-linking-linux-x86_64.md
author: johannst <johannes.stoelp@gmail.com> 2021-07-15 21:20:14 +0200
committer: johannst <johannes.stoelp@gmail.com> 2021-07-15 21:20:14 +0200
commit: 82e9ac4163b46b59e121194f84ac370818482923
tree: b52adf8f5b2bafe7904f563c33ec23f46cb7c87c /content/20191118-dynamic-linking-linux-x86_64.md
parent: 617d73fc9eff5b08a80b873fd97f66caa7e80fc9
use proper date fmt in content file names so that zola can automatically derive the date
Diffstat (limited to 'content/20191118-dynamic-linking-linux-x86_64.md')
-rw-r--r-- content/20191118-dynamic-linking-linux-x86_64.md 339
1 files changed, 0 insertions, 339 deletions
diff --git a/content/20191118-dynamic-linking-linux-x86_64.md b/content/20191118-dynamic-linking-linux-x86_64.md
deleted file mode 100644
index 9265671..0000000
--- a/content/20191118-dynamic-linking-linux-x86_64.md
+++ /dev/null
@@ -1,339 +0,0 @@
-+++
-title = "Dynamic linking on Linux (x86_64)"
-date = 2019-11-18
-
-[taxonomies]
-tags = ["elf", "linux", "x86"]
-+++
-
-I was interested in how the bits behind dynamic linking work, so this article
-explores that topic.
-However, since dynamic linking strongly depends on the OS, the architecture and
-the binary format, I focus on only one combination here.
-Since I spend most of my time with Linux on `x86` or `ARM`, I chose the
-following for this article:
-- OS: Linux
-- arch: x86_64
-- binfmt: [`Executable and Linking Format (ELF)`][elf-1.2]
-
-## Introduction to dynamic linking
-
-Dynamic linking is used when an application is not statically linked.
-This means the application uses code which is not included in the application
-itself, but in a shared library. A shared library in turn can be used by
-multiple applications.
-The applications contain `relocation` entries which need to be resolved at
-runtime, because shared libraries are compiled as `position independent code
-(PIC)` so that they can be loaded at any address in the application's
-virtual address space.
-This process of resolving the relocation entries at runtime is what I refer
-to as dynamic linking in this article.
-
-The following figure shows a simple example, where we have an application
-**foo** using a function **bar** from the shared library **libbar.so**. The
-boxes show the virtual memory mapping for **foo** over time where time
-increases to the right.
-```
-         foo                                        foo
-    +-----------+                              +-----------+
-    |           |                              |           |
-    +-----------+                              +-----------+
-    | .text.foo |                              | .text.foo |
-    |           |                              |           |
-    | ...       |    trigger resolve reloc     | ...       |
-pc->| call bar  | X----+                       | call bar  |--+
-    | ...       |      |                       | ...       |  |
-    +-----------+      |                       +-----------+  |
-    |           |      |                       |           |  |
-    |           |      |                       |           |  |
-    +-----------+      |                       +-----------+  |
-    | .text.bar |      |                       | .text.bar |  |
-    | ...       |      |                       | ...       |  |
-    | bar:      |      +---->[ld.so]---->      | bar:      |<-+pc
-    | ...       |                              | ...       |
-    +-----------+                              +-----------+
-    |           |                              |           |
-    +-----------+                              +-----------+
-
-```
-
-## Conceptual overview && important parts of "the" ELF
-
-> In the following I assume a basic understanding of the ELF binary format.
-
-Before jumping into the details of dynamic linking it is important to get a
-conceptual overview, as well as to understand which sections of the ELF file
-actually matter.
-
-<br>
-
-On x86, calling a function in a shared library works via one indirect jump.
-When the application wants to call a function in a shared library it jumps to a
-well-known location contained in the code of the application, called a
-`trampoline`. From there the application then jumps to a function pointer
-stored in a global table (`GOT = global offset table`). The application
-contains **one** trampoline per function used from a shared library.
-
-When the application jumps to a trampoline for the first time, the trampoline
-dispatches to the dynamic linker with the request to resolve the symbol.
-Once the dynamic linker has found the address of the symbol, it patches the
-function pointer in the `GOT` so that subsequent calls dispatch directly to the
-library function.
-```
-    foo:                              GOT
-      ...                        +------------+
-+---- call bar_trampoline     +- | 0xcafeface | [0] resolve (dynamic linker)
-|     call bar_trampoline     |  +------------+
-|     ...                     |  | 0xcafeface | [1] resolve (dynamic linker)
-|                             |  +------------+
-+-> bar_trampoline:           |
-      jump GOT[0] <-----------+
-    bar2_trampoline:
-      jump GOT[1]
-```
-Once this is done, further calls to this symbol will be directly forwarded to
-the correct address from the corresponding trampoline.
-```
-    foo:                              GOT
-      ...                        +------------+
-      call bar_trampoline     +- | 0x01234567 | [0] bar (libbar.so)
-+---- call bar_trampoline     |  +------------+
-|     ...                     |  | 0xcafeface | [1] resolve (dynamic linker)
-|                             |  +------------+
-+-> bar_trampoline:           |
-      jump GOT[0] <-----------+
-    bar2_trampoline:
-      jump GOT[1]
-```
-
----
-
-With that in mind we can take a look and check which sections of the ELF file
-are important for the dynamic linking process.
-- `.plt`
-> This section contains all the trampolines for the external functions used by
-> the ELF file.
-- `.got.plt`
-> This section contains the global offset table `GOT` for this ELF file's trampolines.
-- `.rel.plt` / `.rela.plt`
-> This section holds the `relocation` entries, which the dynamic linker uses
-> to find which symbol needs to be resolved and which location in the `GOT`
-> must be patched. (Whether it is `rel` or `rela` depends on the
-> **DT_PLTREL** entry in the [`.dynamic` section](#dynamic-section).)
-
-
-## The bits behind dynamic linking
-
-Now that we have the basic concept and know which sections of the ELF file
-matter, we can take a look at an actual example. For the analysis I am going to
-use the following C program and build it explicitly as a non-`position
-independent executable (PIE)`.
-
-> Using `-no-pie` has no functional impact; it is only used to get absolute
-> virtual addresses in the ELF file, which makes the analysis easier to follow.
-
-```c
-// main.c
-#include <stdio.h>
-int main(int argc, const char* argv[]) {
-    printf("%s argc=%d\n", argv[0], argc);
-    puts("done");
-    return 0;
-}
-```
-
-```console
-> gcc -o main main.c -no-pie
-```
-
-We use [radare2][r2] to open the compiled file and print the disassembly of
-the `.got.plt` and `.plt` sections.
-
-```nasm
-> r2 -A ./main
---snip--
-[0x00401050]> pd5 @ section..got.plt
- ;-- section..got.plt:
- ;-- _GLOBAL_OFFSET_TABLE_:
- [0] 0x00404000 .qword 0x0000000000403e10 ; section..dynamic ; sym..dynamic
- [1] 0x00404008 .qword 0x0000000000000000
- [2] 0x00404010 .qword 0x0000000000000000
- ;-- reloc.puts:
- [3] 0x00404018 .qword 0x0000000000401036
- ;-- reloc.printf:
- [4] 0x00404020 .qword 0x0000000000401046
-
-[0x00401050]> pd9 @ section..plt
- ;-- section..plt:
- ┌┌─> 0x00401020 ff35e22f0000 push qword [0x00404008]
- ╎╎ 0x00401026 ff25e42f0000 jmp qword [0x00404010]
- ╎╎ 0x0040102c 0f1f4000 nop dword [rax]
- int sym.imp.puts (const char *s);
- ╎╎ 0x00401030 ff25e22f0000 jmp qword [reloc.puts] ; 0x00404018
- ╎╎ 0x00401036 6800000000 push 0
- └──< 0x0040103b e9e0ffffff jmp sym..plt
- int sym.imp.printf (const char *format);
- ╎ 0x00401040 ff25da2f0000 jmp qword [reloc.printf] ; 0x00404020
- ╎ 0x00401046 6801000000 push 1
- └─< 0x0040104b e9d0ffffff jmp sym..plt
-[0x00401050]>
-```
-
-Taking a quick look at the `.got.plt` section we see the *global offset table GOT*.
-The entries *GOT[0..2]* have special meanings: *GOT[0]* holds the address of the
-[`.dynamic` section](#dynamic-section) for this ELF file, and *GOT[1..2]* are
-filled in by the dynamic linker at program startup.
-Entries *GOT[3]* and *GOT[4]* contain the function pointers for **puts** and
-**printf** respectively.
-
-<br>
-
-In the `.plt` section we can find three trampolines:
-1. `0x00401020` dispatch to runtime linker (special role)
-1. `0x00401030` **puts**
-1. `0x00401040` **printf**
-
-Looking at the **puts** trampoline we can see that the first instruction jumps
-to the address stored at `0x00404018` (reloc.puts), which is GOT[3]. In the
-beginning this entry contains the address of the `push 0` instruction coming
-right after the `jmp`. This push instruction sets up some metadata for the
-dynamic linker. The next instruction then jumps into the first trampoline,
-which pushes more metadata (GOT[1]) onto the stack and then jumps to the
-address stored in GOT[2].
-> GOT[1] & GOT[2] are zero here because they get filled by the dynamic linker
-> at program startup.
-
-
-<br>
-
-To understand the `push 0` instruction in the **puts** trampoline we have to
-take a look at the third section of interest in the ELF file, the `.rela.plt`
-section.
-
-```console
-# -W wide output (do not wrap lines at 80 columns)
-# -r print relocations
-> readelf -W -r ./main
---snip--
-Relocation section '.rela.plt' at offset 0x4004d8 contains 2 entries:
- Offset Info Type Symbol's Value Symbol's Name + Addend
-0000000000404018 0000000200000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0
-0000000000404020 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 printf@GLIBC_2.2.5 + 0
-```
-
-The `0` passed as metadata to the dynamic linker means to use the relocation
-at index [0] in the `.rela.plt` section. From the ELF specification we can
-find how a relocation of type `rela` is defined:
-
-```c
-// man 5 elf
-typedef struct {
-    Elf64_Addr r_offset;
-    uint64_t   r_info;
-    int64_t    r_addend;
-} Elf64_Rela;
-
-#define ELF64_R_SYM(i) ((i) >> 32)
-#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
-```
-
-`r_offset` holds the address of the GOT entry which the dynamic linker should
-patch once it has found the address of the requested symbol.
-The offset here is `0x00404018` which is exactly the address of GOT[3], the
-function pointer used in the **puts** trampoline.
-From `r_info` the dynamic linker can find out which symbol it should look for.
-
-```c
-ELF64_R_SYM(0x0000000200000007) -> 0x2
-```
-
-The resulting value [2] is the index into the dynamic symbol table
-(`.dynsym`). Dumping the dynamic symbol table with readelf we can see that the
-symbol at index [2] is **puts**.
-
-```console
-# -s print symbols
-> readelf -W -s ./main
-Symbol table '.dynsym' contains 7 entries:
- Num: Value Size Type Bind Vis Ndx Name
- 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
- 1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable
- 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)
- 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
---snip--
-```
-
-
-## Appendix: .dynamic section
-
-The `.dynamic` section of an ELF file contains important information for the
-dynamic linking process and is created when linking the ELF file.
-
-The information can be accessed at runtime using the following symbol:
-```c
-extern Elf64_Dyn _DYNAMIC[];
-```
-which is an array of `Elf64_Dyn` entries:
-```c
-typedef struct {
-    Elf64_Sxword d_tag;
-    union {
-        Elf64_Xword d_val;
-        Elf64_Addr  d_ptr;
-    } d_un;
-} Elf64_Dyn;
-```
-> Since this meta-information is specific to an ELF file, every ELF file has
-> its own `.dynamic` section and `_DYNAMIC` symbol.
-
-The following entries are the most interesting for dynamic linking:
-
- d_tag | d_un | description
--------------|-------|-------------------------------------------------
- DT_PLTGOT | d_ptr | address of .got.plt
- DT_JMPREL | d_ptr | address of .rela.plt
- DT_PLTREL | d_val | DT_REL or DT_RELA
- DT_PLTRELSZ | d_val | size of .rela.plt table
- DT_RELENT | d_val | size of a single REL entry (PLTREL == DT_REL)
- DT_RELAENT | d_val | size of a single RELA entry (PLTREL == DT_RELA)
-
-<br>
-
-We can use readelf to dump the `.dynamic` section. In the following snippet I
-only kept the relevant entries:
-```console
-# -d dump .dynamic section
-> readelf -d ./main
-
-Dynamic section at offset 0x2e10 contains 24 entries:
- Tag Type Name/Value
- 0x0000000000000003 (PLTGOT) 0x404000
- 0x0000000000000002 (PLTRELSZ) 48 (bytes)
- 0x0000000000000014 (PLTREL) RELA
- 0x0000000000000017 (JMPREL) 0x4004d8
- 0x0000000000000009 (RELAENT) 24 (bytes)
-```
-
-We can see that **PLTGOT** points to address **0x404000** which is the address
-of the GOT as we saw in the [radare2 dump](#code-gotplt-dump).
-We can also see that **JMPREL** points to the [relocation table](#code-relaplt-dump).
-Dividing **PLTRELSZ** by **RELAENT** (48 / 24) tells us that we have 2
-relocation entries, which are exactly the ones for **puts** and **printf**.
-
-
-## References
-- [`man 5 elf`][man-elf]
-- [Executable and Linking Format (ELF)][elf-1.2]
-- [SystemV ABI 4.1][systemv-abi-4.1]
-- [SystemV ABI 1.0 (x86_64)][systemv-abi-1.0-x86_64]
-- [`man 1 readelf`][man-readelf]
-
-
-[r2]: https://rada.re/n/radare2.html
-[man-elf]: http://man7.org/linux/man-pages/man5/elf.5.html
-[man-readelf]: http://man7.org/linux/man-pages/man1/readelf.1.html
-[elf-1.2]: http://refspecs.linuxbase.org/elf/elf.pdf
-[systemv-abi-4.1]: https://refspecs.linuxfoundation.org/elf/gabi41.pdf
-[systemv-abi-1.0-x86_64]: https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf
-
-