-rw-r--r-- | content/20191027-kernel-debugging-qemu.md | 196
-rw-r--r-- | content/20191027-kernel-debugging-qemu/build_initrd.sh | 50
-rw-r--r-- | content/20191027-kernel-debugging-qemu/build_kernel.sh | 38
-rw-r--r-- | content/20191118-dynamic-linking-linux-x86_64.md | 339
-rw-r--r-- | content/_index.md | 3
5 files changed, 626 insertions, 0 deletions
diff --git a/content/20191027-kernel-debugging-qemu.md b/content/20191027-kernel-debugging-qemu.md new file mode 100644 index 0000000..5189f86 --- /dev/null +++ b/content/20191027-kernel-debugging-qemu.md @@ -0,0 +1,196 @@ ++++ +title = "Linux Kernel debugging with QEMU" +date = 2019-10-27 + +[taxonomies] +tags = ["linux", "qemu"] ++++ + +The other evening while staring at some Linux kernel code I thought, let me +set up a minimal environment so I can easily step through the code and examine +the state. + +I ended up creating: +- a [Linux kernel][linux-kernel] with minimal configuration +- a minimal [ramdisk][initrd] to boot into which is based on [busybox][busybox] + +In the remaining part of this article we will go through each step by first +building the kernel, then building the initrd and then running the kernel using +[QEMU][qemu] and debugging it with [GDB][gdb]. + +## $> make kernel + +Before building the kernel we first need to generate a configuration. As a +starting point we generate a minimal config with the `make tinyconfig` make +target. Running this command will generate a `.config` file. After generating +the initial config file we customize the kernel using the merge fragment flow. +This allows us to merge a fragment file into the current configuration by +running the `scripts/kconfig/merge_config.sh` script. + +Let's quickly go over some customizations we do. +The following two lines enable support for gzipped initramdisks: +```config +CONFIG_BLK_DEV_INITRD=y +CONFIG_RD_GZIP=y +``` +The next two configurations are important as they enable the binary loaders for +[ELF][binfmt-elf] and [script #!][binfmt-script] files. +```config +CONFIG_BINFMT_ELF=y +CONFIG_BINFMT_SCRIPT=y +``` + +> Note: In the curses-based configuration `make menuconfig` we can search for +> configurations using the `/` key and then select a match using the number keys. +> After selecting a match we can check the `Help` to get a description for the +> configuration parameter. 
+ +Building the kernel with the default make target will give us the following two +files: +- `vmlinux` statically linked kernel (ELF file) containing symbol information for debugging +- `arch/x86_64/boot/bzImage` compressed kernel image for booting + +Full configure & build script: +```sh +{{ include(path="content/20191027-kernel-debugging-qemu/build_kernel.sh") }} +``` + +## $> make initrd + +The next step is to build the initrd, which we base on [busybox][busybox]. Therefore +we first build the busybox project in its default configuration with one +change: we enable the following configuration to build a static binary so it can be +used stand-alone: +```sh +sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config +``` + +One important step before creating the final initrd is to create an init +process. This will be the first process executed in userspace after the kernel +has finished its initialization. We just create a script that drops us into a +shell: +```sh +cat <<EOF > init +#!/bin/sh + +mount -t proc none /proc +mount -t sysfs none /sys + +exec setsid cttyhack sh +EOF +``` +> By default the kernel looks for `/sbin/init` in the root file system, but the +> location can optionally be specified with the [`init=`][kernel-param] kernel +> parameter. + +Full busybox & initrd build script: +```sh +{{ include(path="content/20191027-kernel-debugging-qemu/build_initrd.sh") }} +``` + +## Running QEMU && GDB + +After finishing the previous steps we have all we need to run and debug the +kernel. We have `arch/x86/boot/bzImage` and `initramfs.cpio.gz` to boot the +kernel into a shell and we have `vmlinux` to feed the debugger with debug +symbols. 
+ +We start QEMU as follows. Thanks to the `-S` flag the CPU will freeze until we +have connected the debugger: +```sh +# -S freeze CPU until debugger connected +> qemu-system-x86_64 \ + -kernel ./linux-5.3.7/arch/x86/boot/bzImage \ + -nographic \ + -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \ + -initrd ./initramfs.cpio.gz \ + -gdb tcp::1234 \ + -S +``` + +Then we can start GDB and connect to the GDB server running in QEMU (configured +via `-gdb tcp::1234`). From now on we can start to debug the +kernel. +```sh +> gdb linux-5.3.7/vmlinux -ex 'target remote :1234' +(gdb) b do_execve +Breakpoint 1 at 0xffffffff810a1a60: file fs/exec.c, line 1885. +(gdb) c +Breakpoint 1, do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 <argv_init>, __envp=0xffffffff8181e040 <envp_init>) at fs/exec.c:1885 +1885 return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); +(gdb) bt +#0 do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 <argv_init>, __envp=0xffffffff8181e040 <envp_init>) at fs/exec.c:1885 +#1 0xffffffff81000498 in run_init_process (init_filename=<optimized out>) at init/main.c:1048 +#2 0xffffffff81116b75 in kernel_init (unused=<optimized out>) at init/main.c:1129 +#3 0xffffffff8120014f in ret_from_fork () at arch/x86/entry/entry_64.S:352 +#4 0x0000000000000000 in ?? () +(gdb) +``` + +--- + +## Appendix: Try to get around `<optimized out>` + +When debugging the kernel we often face the following situation in gdb: +```text +(gdb) frame +#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c + +(gdb) info args +fd = <optimized out> +filename = 0xffff888000060000 +argv = <optimized out> +envp = <optimized out> +flags = <optimized out> +file = 0x0 +``` +The problem is that the Linux kernel requires certain code to be compiled with +optimizations enabled. 
+ +In this situation we can "try" to reduce the optimization for single compilation +units or a whole directory ("try" because reducing the optimization could break the +build). To do so we adapt the Makefile in the corresponding directory. +```make +# fs/Makefile + +# configure for single compilation unit +CFLAGS_exec.o := -Og + +# configure for the directory where the Makefile resides (subdir-ccflags-y also covers subdirectories) +ccflags-y := -Og +``` + +After enabling `-Og` (optimize for the debugging experience) we can now see the following +in gdb: +```txt +(gdb) frame +#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c + +(gdb) info args +fd = -100 +filename = 0xffff888000120000 +argv = {ptr = {native = 0x10c5980}} +envp = {ptr = {native = 0x10c5990}} +flags = 0 + +(gdb) p *filename +$3 = {name = 0xffff888000120020 "/bin/ls", uptr = 0x10c59b8 "/bin/ls", refcnt = 1, aname = 0x0, iname = 0xffff888000120020 "/bin/ls"} + +(gdb) ptype filename +type = struct filename { + const char *name; + const char *uptr; + int refcnt; + struct audit_names *aname; + const char iname[]; +} +``` + +[linux-kernel]: https://www.kernel.org +[initrd]: https://www.kernel.org/doc/html/latest/admin-guide/initrd.html +[busybox]: https://busybox.net +[qemu]: https://www.qemu.org +[gdb]: https://www.gnu.org/software/gdb +[binfmt-elf]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c +[binfmt-script]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c +[kernel-param]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html diff --git a/content/20191027-kernel-debugging-qemu/build_initrd.sh b/content/20191027-kernel-debugging-qemu/build_initrd.sh new file mode 100644 index 0000000..74f9896 --- /dev/null +++ b/content/20191027-kernel-debugging-qemu/build_initrd.sh @@ -0,0 +1,50 @@ +#!/bin/bash + +set -e + +BUSYBOX=busybox-1.31.0 +INITRD=$PWD/initramfs.cpio.gz 
+ +## Build busybox + +echo "[+] configure & build $BUSYBOX ..." +[[ ! -d $BUSYBOX ]] && { + wget https://busybox.net/downloads/$BUSYBOX.tar.bz2 + bunzip2 $BUSYBOX.tar.bz2 && tar xf $BUSYBOX.tar +} + +cd $BUSYBOX +make defconfig +sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config +make -j4 busybox +make install + +## Create initrd + +echo "[+] create initrd $INITRD ..." + +cd _install + +# 1. create initrd folder structure +mkdir -p bin sbin etc proc sys usr/bin usr/sbin dev + +# 2. create init process +cat <<EOF > init +#!/bin/sh + +mount -t proc none /proc +mount -t sysfs none /sys + +exec setsid cttyhack sh +EOF +chmod +x init + +# 3. create device nodes +sudo mknod dev/tty c 5 0 +sudo mknod dev/tty0 c 4 0 +sudo mknod dev/ttyS0 c 4 64 + +# 4. created compressed initrd +find . -print0 \ + | cpio --null -ov --format=newc \ + | gzip -9 > $INITRD diff --git a/content/20191027-kernel-debugging-qemu/build_kernel.sh b/content/20191027-kernel-debugging-qemu/build_kernel.sh new file mode 100644 index 0000000..f1e15bb --- /dev/null +++ b/content/20191027-kernel-debugging-qemu/build_kernel.sh @@ -0,0 +1,38 @@ +#!/bin/bash + +set -e + +LINUX=linux-5.3.7 +wget https://cdn.kernel.org/pub/linux/kernel/v5.x/$LINUX.tar.xz +unxz $LINUX.tar.xz && tar xf $LINUX.tar + +cd $LINUX + +cat <<EOF > kernel_fragment.config +# 64bit kernel +CONFIG_64BIT=y +# enable support for compressed initrd (gzip) +CONFIG_BLK_DEV_INITRD=y +CONFIG_RD_GZIP=y +# support for ELF and #! 
binary format +CONFIG_BINFMT_ELF=y +CONFIG_BINFMT_SCRIPT=y +# /dev +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y +# tty & console +CONFIG_TTY=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +# pseudo fs +CONFIG_PROC_FS=y +CONFIG_SYSFS=y +# debugging +CONFIG_DEBUG_INFO=y +CONFIG_PRINTK=y +CONFIG_EARLY_PRINTK=y +EOF + +make tinyconfig +./scripts/kconfig/merge_config.sh -n ./kernel_fragment.config +make -j4 diff --git a/content/20191118-dynamic-linking-linux-x86_64.md b/content/20191118-dynamic-linking-linux-x86_64.md new file mode 100644 index 0000000..9265671 --- /dev/null +++ b/content/20191118-dynamic-linking-linux-x86_64.md @@ -0,0 +1,339 @@ ++++ +title = "Dynamic linking on Linux (x86_64)" +date = 2019-11-18 + +[taxonomies] +tags = ["elf", "linux", "x86"] ++++ + +As I was interested in how the bits behind dynamic linking work, this article +is about exploring this topic. +However, since dynamic linking strongly depends on the OS, the architecture and +the binary format, I only focus on one combination here. +Spending most of my time with Linux on `x86` or `ARM` I chose the following +for this article: +- OS: Linux +- arch: x86_64 +- binfmt: [`Executable and Linking Format (ELF)`][elf-1.2] + +## Introduction to dynamic linking + +Dynamic linking is used when we have non-statically linked applications. +This means an application uses code which is not included in the application +itself, but in a shared library. The shared libraries in turn can be used by +multiple applications. +The applications contain `relocation` entries which need to be resolved during +runtime, because shared libraries are compiled as `position independent code +(PIC)` so that they can be loaded at any address in the application's +virtual address space. +This process of resolving the relocation entries at runtime is what I am +referring to as dynamic linking in this article. 
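To make the idea of resolving a reference at runtime a bit more concrete, here is a toy model in C. It is only a sketch under loud assumptions: `bar_slot`, `bar_resolve_stub`, `real_bar`, `call_bar` and `resolutions` are hypothetical names invented for this illustration, and the real dynamic linker works on relocation entries in the ELF file, not on a hand-written function pointer.

```c
#include <assert.h>

/* Toy model of a reference that is resolved on first use.
 * All names are hypothetical; this is not how ld.so is implemented. */

static int resolve_count = 0;            /* how often the "resolver" ran */

/* Stands in for the function bar() living in a shared library. */
static int real_bar(int x) {
    return x + 1;
}

static int bar_resolve_stub(int x);      /* forward declaration */

/* The patchable slot: it starts out pointing at the resolver stub. */
static int (*bar_slot)(int) = bar_resolve_stub;

/* Runs only while the slot is still unresolved. */
static int bar_resolve_stub(int x) {
    assert(bar_slot == bar_resolve_stub);
    resolve_count++;                     /* pretend to look up the symbol */
    bar_slot = real_bar;                 /* patch the slot with the result */
    return bar_slot(x);                  /* finish the original call */
}

/* What the application does: it always calls through the slot. */
int call_bar(int x) {
    return bar_slot(x);
}

/* Number of times the resolver stub has been entered. */
int resolutions(void) {
    return resolve_count;
}
```

Calling `call_bar(1)` twice returns 2 both times, while `resolutions()` stays at 1 after the first call, because the second call goes straight to `real_bar` through the patched slot.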
+ +The following figure shows a simple example, where we have an application +**foo** using a function **bar** from the shared library **libbar.so**. The +boxes show the virtual memory mapping for **foo** over time where time +increases to the right. +``` + foo foo + +-----------+ +-----------+ + | | | | + +-----------+ +-----------+ + | .text.foo | | .text.foo | + | | | | + | ... | trigger resolve reloc | ... | +pc->| call bar | X----+ | call bar |--+ + | ... | | | ... | | + +-----------+ | +-----------+ | + | | | | | | + | | | | | | + +-----------+ | +-----------+ | + | .text.bar | | | .text.bar | | + | ... | | | ... | | + | bar: | +---->[ld.so]----> | bar: |<-+pc + | ... | | ... | + +-----------+ +-----------+ + | | | | + +-----------+ +-----------+ + +``` + +## Conceptual overview && important parts of "the" ELF + +> In the following I assume a basic understanding of the ELF binary format. + +Before jumping into the details of dynamic linking it is important to get a +conceptual overview, as well as to understand which sections of the ELF file +actually matter. + +<br> + +On x86 calling a function in a shared library works via one indirect jump. +When the application wants to call a function in a shared library it jumps to a +well-known location contained in the code of the application, called a +`trampoline`. From there the application then jumps to a function pointer +stored in a global table (`GOT = global offset table`). The application +contains **one** trampoline per function used from a shared library. + +When the application jumps to a trampoline for the first time the trampoline +will dispatch to the dynamic linker with the request to resolve the symbol. +Once the dynamic linker has found the address of the symbol it patches the function +pointer in the `GOT` so that subsequent calls directly dispatch to the library +function. +``` + foo: GOT + ... 
+------------+ ++---- call bar_trampoline +- | 0xcafeface | [0] resolve (dynamic linker) +| call bar_trampoline | +------------+ +| ... | | 0xcafeface | [1] resolve (dynamic linker) +| | +------------+ ++-> bar_trampoline: | + jump GOT[0] <-----------+ + bar2_trampoline: + jump GOT[1] +``` +Once this is done, further calls to this symbol will be directly forwarded to +the correct address from the corresponding trampoline. +``` + foo: GOT + ... +------------+ + call bar_trampoline +- | 0x01234567 | [0] bar (libbar.so) ++---- call bar_trampoline | +------------+ +| .... | | 0xcafeface | [1] resolve (dynamic linker) +| | +------------+ ++-> bar_trampoline: | + jump GOT[0] <-----------+ + bar2_trampoline: + jump GOT[1] +``` + +--- + +With that in mind we can take a look and check which sections of the ELF file +are important for the dynamic linking process. +- `.plt` +> This section contains all the trampolines for the external functions used by +> the ELF file +- `.got.plt` +> This section contains the global offset table `GOT` for this ELF file's trampolines. +- `.rel.plt` / `.rela.plt` +> This section holds the `relocation` entries, which are used by the dynamic +> linker to find which symbol needs to be resolved and which location in the +> `GOT` needs to be patched. (Whether it is `rel` or `rela` depends on the +> **DT_PLTREL** entry in the [`.dynamic` section](#dynamic-section)) + + +## The bits behind dynamic linking + +Now that we have the basic concept and know which sections of the ELF file +matter we can take a look at an actual example. For the analysis I am going to +use the following C program and build it explicitly as a non-`position +independent executable (PIE)`. + +> Using `-no-pie` has no functional impact; it is only used to get absolute +> virtual addresses in the ELF file, which makes the analysis easier to follow. 
+ +```cpp +// main.c +#include <stdio.h> +int main(int argc, const char* argv[]) { + printf("%s argc=%d\n", argv[0], argc); + puts("done"); + return 0; +} +``` + +```console +> gcc -o main main.c -no-pie +``` + +We use [radare2][r2] to open the compiled file and print the disassembly of +the `.got.plt` and `.plt` sections. + +```nasm +> r2 -A ./main +--snip-- +[0x00401050]> pd5 @ section..got.plt + ;-- section..got.plt: + ;-- _GLOBAL_OFFSET_TABLE_: + [0] 0x00404000 .qword 0x0000000000403e10 ; section..dynamic ; sym..dynamic + [1] 0x00404008 .qword 0x0000000000000000 + [2] 0x00404010 .qword 0x0000000000000000 + ;-- reloc.puts: + [3] 0x00404018 .qword 0x0000000000401036 + ;-- reloc.printf: + [4] 0x00404020 .qword 0x0000000000401046 + +[0x00401050]> pd9 @ section..plt + ;-- section..plt: + ┌┌─> 0x00401020 ff35e22f0000 push qword [0x00404008] + ╎╎ 0x00401026 ff25e42f0000 jmp qword [0x00404010] + ╎╎ 0x0040102c 0f1f4000 nop dword [rax] + int sym.imp.puts (const char *s); + ╎╎ 0x00401030 ff25e22f0000 jmp qword [reloc.puts] ; 0x00404018 + ╎╎ 0x00401036 6800000000 push 0 + └──< 0x0040103b e9e0ffffff jmp sym..plt + int sym.imp.printf (const char *format); + ╎ 0x00401040 ff25da2f0000 jmp qword [reloc.printf] ; 0x00404020 + ╎ 0x00401046 6801000000 push 1 + └─< 0x0040104b e9d0ffffff jmp sym..plt +[0x00401050]> +``` + +Taking a quick look at the `.got.plt` section we see the *global offset table GOT*. +The entries *GOT[0..2]* have special meanings, *GOT[0]* holds the address of the +[`.dynamic` section](#dynamic-section) for this ELF file, *GOT[1..2]* will be +filled by the dynamic linker at program startup. +Entries *GOT[3]* and *GOT[4]* contain the function pointers for **puts** and +**printf** accordingly. + +<br> + +In the `.plt` section we can find three trampolines +1. `0x00401020` dispatch to runtime linker (special role) +1. `0x00401030` **puts** +1. 
`0x00401040` **printf** + +Looking at the **puts** trampoline we can see that the first instruction jumps +to a location stored at `0x00404018` (reloc.puts) which is the GOT[3]. In the +beginning this entry contains the address of the `push 0` instruction coming +right after the `jmp`. This push instruction sets up some meta data for the +dynamic linker. The next instruction then jumps into the first trampoline, +which pushes more meta data (GOT[1]) onto the stack and then jumps to the +address stored in GOT[2]. +> GOT[1] & GOT[2] are zero here because they get filled by the dynamic linker +> at program startup. + + +<br> + +To understand the `push 0` instruction in the **puts** trampoline we have to +take a look at the third section of interest in the ELF file, the `.rela.plt` +section. + +```console +# -r print relocations +# -D use .dynamic info when displaying info +> readelf -W -r ./main +--snip-- +Relocation section '.rela.plt' at offset 0x4004d8 contains 2 entries: + Offset Info Type Symbol's Value Symbol's Name + Addend +0000000000404018 0000000200000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0 +0000000000404020 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 printf@GLIBC_2.2.5 + 0 +``` + +The `0` passed as meta data to the dynamic linker means to use the relocation +at index [0] in the `.rela.plt` section. From the ELF specification we can +find how a relocation of type `rela` is defined: + +```c +// man 5 elf +typedef struct { + Elf64_Addr r_offset; + uint64_t r_info; + int64_t r_addend; +} Elf64_Rela; + +#define ELF64_R_SYM(i) ((i) >> 32) +#define ELF64_R_TYPE(i) ((i) & 0xffffffff) +``` + +`r_offset` holds the address to the GOT entry which the dynamic linker should +patch once it found the address of the requested symbol. +The offset here is `0x00404018` which is exactly the address of GOT[3], the +function pointer used in the **puts** trampoline. +From `r_info` the dynamic linker can find out which symbol it should look for. 
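As a small worked example, the `r_info` value of the **puts** relocation from the readelf dump can be split into its two halves. The sketch below mirrors the `ELF64_R_SYM` and `ELF64_R_TYPE` macros as functions so that it does not depend on `<elf.h>`. The helper names and the `JUMP_SLOT_TYPE` constant are assumptions made for this example; 7 is the value that shows up in the low half of `Info` next to `R_X86_64_JUMP_SLOT` in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed name for this sketch: numeric value of R_X86_64_JUMP_SLOT. */
#define JUMP_SLOT_TYPE 7u

/* Upper 32 bits of r_info: index into the dynamic symbol table. */
static inline uint32_t rela_sym_index(uint64_t r_info) {
    return (uint32_t)(r_info >> 32);
}

/* Lower 32 bits of r_info: relocation type. */
static inline uint32_t rela_type(uint64_t r_info) {
    return (uint32_t)(r_info & 0xffffffffu);
}

/* Symbol index of a .rela.plt entry. The entries shown in the dump are all
 * JUMP_SLOT relocations, so this helper insists on that type. */
static inline uint32_t plt_reloc_symbol(uint64_t r_info) {
    assert(rela_type(r_info) == JUMP_SLOT_TYPE);
    return rela_sym_index(r_info);
}
```

With the values from the dump, `plt_reloc_symbol(0x0000000200000007)` yields 2 and `plt_reloc_symbol(0x0000000300000007)` yields 3, the `.dynsym` indices of the two imported functions.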
+ +```c +ELF64_R_SYM(0x0000000200000007) -> 0x2 +``` + +The resulting index [2] is the offset into the dynamic symbol table +(`.dynsym`). Dumping the dynamic symbol table with readelf we can see that the +symbol at index [2] is **puts**. + +```console +# -s print symbols +> readelf -W -s ./main +Symbol table '.dynsym' contains 7 entries: + Num: Value Size Type Bind Vis Ndx Name + 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND + 1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable + 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2) + 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2) +--snip-- +``` + + +## Appendix: .dynamic section + +The `.dynamic` section of an ELF file contains important information for the +dynamic linking process and is created when linking the ELF file. + +The information can be accessed at runtime using the following symbol +```c +extern Elf64_Dyn _DYNAMIC[]; +``` +which is an array of `Elf64_Dyn` entries +```c +typedef struct { + Elf64_Sxword d_tag; + union { + Elf64_Xword d_val; + Elf64_Addr d_ptr; + } d_un; +} Elf64_Dyn; +``` +> Since this meta-information is specific to an ELF file, every ELF file has +> its own `.dynamic` section and `_DYNAMIC` symbol. + +The following entries are the most interesting for dynamic linking: + + d_tag | d_un | description +-------------|-------|------------------------------------------------- + DT_PLTGOT | d_ptr | address of .got.plt + DT_JMPREL | d_ptr | address of .rela.plt + DT_PLTREL | d_val | DT_REL or DT_RELA + DT_PLTRELSZ | d_val | size of .rela.plt table + DT_RELENT | d_val | size of a single REL entry (PLTREL == DT_REL) + DT_RELAENT | d_val | size of a single RELA entry (PLTREL == DT_RELA) + +<br> + +We can use readelf to dump the `.dynamic` section. 
In the following snippet I +only kept the relevant entries: +```console +# -d dump .dynamic section +> readelf -d ./main + +Dynamic section at offset 0x2e10 contains 24 entries: + Tag Type Name/Value + 0x0000000000000003 (PLTGOT) 0x404000 + 0x0000000000000002 (PLTRELSZ) 48 (bytes) + 0x0000000000000014 (PLTREL) RELA + 0x0000000000000017 (JMPREL) 0x4004d8 + 0x0000000000000009 (RELAENT) 24 (bytes) +``` + +We can see that **PLTGOT** points to address **0x404000** which is the address +of the GOT as we saw in the [radare2 dump](#code-gotplt-dump). +Also we can see that **JMPREL** points to the [relocation table](#code-relaplt-dump). +**PLTRELSZ / RELAENT** tells us that we have 2 relocation entries which are +exactly the ones for **puts** and **printf**. + + +## References +- [`man 5 elf`][man-elf] +- [Executable and Linking Format (ELF)][elf-1.2] +- [SystemV ABI 4.1][systemv-abi-4.1] +- [SystemV ABI 1.0 (x86_64)][systemv-abi-1.0-x86_64] +- [`man 1 readelf`][man-readelf] + + +[r2]: https://rada.re/n/radare2.html +[man-elf]: http://man7.org/linux/man-pages/man5/elf.5.html +[man-readelf]: http://man7.org/linux/man-pages/man1/readelf.1.html +[elf-1.2]: http://refspecs.linuxbase.org/elf/elf.pdf +[systemv-abi-4.1]: https://refspecs.linuxfoundation.org/elf/gabi41.pdf +[systemv-abi-1.0-x86_64]: https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf + + diff --git a/content/_index.md b/content/_index.md new file mode 100644 index 0000000..8bc0069 --- /dev/null +++ b/content/_index.md @@ -0,0 +1,3 @@ ++++ +sort_by = "date" ++++ |