From 82e9ac4163b46b59e121194f84ac370818482923 Mon Sep 17 00:00:00 2001 From: johannst Date: Thu, 15 Jul 2021 21:20:14 +0200 Subject: use proper date fmt in content file names that zola can automatically can derive the date --- content/2019-10-27-kernel-debugging-qemu.md | 225 ++++++++++++++ .../2019-10-27-kernel-debugging-qemu/Dockerfile | 32 ++ content/2019-10-27-kernel-debugging-qemu/Makefile | 12 + .../build_initrd.sh | 54 ++++ .../build_kernel.sh | 38 +++ content/2019-10-27-kernel-debugging-qemu/run.sh | 24 ++ content/2019-11-18-dynamic-linking-linux-x86_64.md | 338 ++++++++++++++++++++ content/20191027-kernel-debugging-qemu.md | 226 -------------- content/20191027-kernel-debugging-qemu/Dockerfile | 32 -- content/20191027-kernel-debugging-qemu/Makefile | 12 - .../20191027-kernel-debugging-qemu/build_initrd.sh | 54 ---- .../20191027-kernel-debugging-qemu/build_kernel.sh | 38 --- content/20191027-kernel-debugging-qemu/run.sh | 24 -- content/20191118-dynamic-linking-linux-x86_64.md | 339 --------------------- content/2021-05-15-pthread_cancel-noexcept.md | 110 +++++++ .../2021-05-15-pthread_cancel-noexcept/thread.cc | 40 +++ content/20210515-pthread_cancel-noexcept.md | 111 ------- content/20210515-pthread_cancel-noexcept/thread.cc | 40 --- 18 files changed, 873 insertions(+), 876 deletions(-) create mode 100644 content/2019-10-27-kernel-debugging-qemu.md create mode 100644 content/2019-10-27-kernel-debugging-qemu/Dockerfile create mode 100644 content/2019-10-27-kernel-debugging-qemu/Makefile create mode 100755 content/2019-10-27-kernel-debugging-qemu/build_initrd.sh create mode 100755 content/2019-10-27-kernel-debugging-qemu/build_kernel.sh create mode 100755 content/2019-10-27-kernel-debugging-qemu/run.sh create mode 100644 content/2019-11-18-dynamic-linking-linux-x86_64.md delete mode 100644 content/20191027-kernel-debugging-qemu.md delete mode 100644 content/20191027-kernel-debugging-qemu/Dockerfile delete mode 100644 
content/20191027-kernel-debugging-qemu/Makefile delete mode 100755 content/20191027-kernel-debugging-qemu/build_initrd.sh delete mode 100755 content/20191027-kernel-debugging-qemu/build_kernel.sh delete mode 100755 content/20191027-kernel-debugging-qemu/run.sh delete mode 100644 content/20191118-dynamic-linking-linux-x86_64.md create mode 100644 content/2021-05-15-pthread_cancel-noexcept.md create mode 100644 content/2021-05-15-pthread_cancel-noexcept/thread.cc delete mode 100644 content/20210515-pthread_cancel-noexcept.md delete mode 100644 content/20210515-pthread_cancel-noexcept/thread.cc diff --git a/content/2019-10-27-kernel-debugging-qemu.md b/content/2019-10-27-kernel-debugging-qemu.md new file mode 100644 index 0000000..cdf68aa --- /dev/null +++ b/content/2019-10-27-kernel-debugging-qemu.md @@ -0,0 +1,225 @@ ++++ +title = "Linux Kernel debugging with QEMU" + +[taxonomies] +tags = ["linux", "qemu"] ++++ + +**EDIT**: +- 2021-07-15: Added `Appendix: Dockerfile for Kernel development` and updated + busybox + Kernel versions. + +The other evening while staring at some Linux kernel code I thought, let me +set up a minimal environment so I can easily step through the code and examine +the state. + +I ended up creating: +- a [Linux kernel][linux-kernel] with minimal configuration +- a minimal [ramdisk][initrd] to boot into which is based on [busybox][busybox] + +In the remaining part of this article we will go through each step by first +building the kernel, then building the initrd and then running the kernel using +[QEMU][qemu] and debugging it with [GDB][gdb]. + +## $> make kernel + +Before building the kernel we first need to generate a configuration. As a +starting point we generate a minimal config with the `make tinyconfig` make +target. Running this command will generate a `.config` file. After generating +the initial config file we customize the kernel using the merge fragment flow.
+This allows us to merge a fragment file into the current configuration by +running the `scripts/kconfig/merge_config.sh` script. + +Let's quickly go over some customizations we do. +The following two lines enable support for gzipped initramdisks: +```config +CONFIG_BLK_DEV_INITRD=y +CONFIG_RD_GZIP=y +``` +The next two configurations are important as they enable the binary loaders for +[ELF][binfmt-elf] and [script #!][binfmt-script] files. +```config +CONFIG_BINFMT_ELF=y +CONFIG_BINFMT_SCRIPT=y +``` + +> Note: In the curses-based configuration `make menuconfig` we can search for +> configurations using the `/` key and then select a match using the number keys. +> After selecting a match we can check the `Help` to get a description for the +> configuration parameter. + +Building the kernel with the default make target will give us the following two +files: +- `vmlinux` statically linked kernel (ELF file) containing symbol information for debugging +- `arch/x86_64/boot/bzImage` compressed kernel image for booting + +Full configure & build script: +```sh +{{ include(path="content/20191027-kernel-debugging-qemu/build_kernel.sh") }} +``` + +## $> make initrd + +Next step is to build the initrd which we base on [busybox][busybox]. Therefore +we first build the busybox project in its default configuration with one +change: we enable the following configuration to build a static binary so it can be +used stand-alone: +```sh +sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config +``` + +One important step before creating the final initrd is to create an init +process. This will be the first process executed in userspace after the kernel +finished its initialization.
We just create a script that drops us into a +shell: +```sh +cat <<EOF > init +#!/bin/sh + +mount -t proc none /proc +mount -t sysfs none /sys + +exec setsid cttyhack sh +EOF +``` +> By default the kernel looks for `/sbin/init` in the root file system, but the +> location can optionally be specified with the [`init=`][kernel-param] kernel +> parameter. + +Full busybox & initrd build script: +```sh +{{ include(path="content/20191027-kernel-debugging-qemu/build_initrd.sh") }} +``` + +## Running QEMU && GDB + +After finishing the previous steps we have all we need to run and debug the +kernel. We have `arch/x86/boot/bzImage` and `initramfs.cpio.gz` to boot the +kernel into a shell and we have `vmlinux` to feed the debugger with debug +symbols. + +We start QEMU as follows, thanks to the `-S` flag the CPU will freeze until we +connected the debugger: +```sh +# -S freeze CPU until debugger connected +> qemu-system-x86_64 \ + -kernel ./linux-5.3.7/arch/x86/boot/bzImage \ + -nographic \ + -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \ + -initrd ./initramfs.cpio.gz \ + -gdb tcp::1234 \ + -S +``` + +Then we can start GDB and connect to the GDB server running in QEMU (configured +via `-gdb tcp::1234`). From now on we can start to debug through the +kernel. +```sh +> gdb linux-5.3.7/vmlinux -ex 'target remote :1234' +(gdb) b do_execve +Breakpoint 1 at 0xffffffff810a1a60: file fs/exec.c, line 1885.
+(gdb) c +Breakpoint 1, do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 +1885 return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); +(gdb) bt +#0 do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 +#1 0xffffffff81000498 in run_init_process (init_filename=<optimized out>) at init/main.c:1048 +#2 0xffffffff81116b75 in kernel_init (unused=<optimized out>) at init/main.c:1129 +#3 0xffffffff8120014f in ret_from_fork () at arch/x86/entry/entry_64.S:352 +#4 0x0000000000000000 in ?? () +(gdb) +``` + +--- + +## Appendix: Try to get around `<optimized out>` + +When debugging the kernel we often face following situation in gdb: +```text +(gdb) frame +#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c + +(gdb) info args +fd = <optimized out> +filename = 0xffff888000060000 +argv = <optimized out> +envp = <optimized out> +flags = <optimized out> +file = 0x0 +``` +The problem is that the Linux kernel requires certain code to be compiled with +optimizations enabled. + +In this situation we can "try" to reduce the optimization for single compilation +units or a subtree (try because, reducing the optimization could break the +build). To do so we adapt the Makefile in the corresponding directory.
+```make +# fs/Makefile + +# configure for single compilation unit +CFLAGS_exec.o := -Og + +# configure for the whole subtree of where the Makefile resides +ccflags-y := -Og +``` + +After enabling `-Og` (optimize for the debugging experience) we can now see the +following in gdb: +```txt +(gdb) frame +#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c + +(gdb) info args +fd = -100 +filename = 0xffff888000120000 +argv = {ptr = {native = 0x10c5980}} +envp = {ptr = {native = 0x10c5990}} +flags = 0 + +(gdb) p *filename +$3 = {name = 0xffff888000120020 "/bin/ls", uptr = 0x10c59b8 "/bin/ls", refcnt = 1, aname = 0x0, iname = 0xffff888000120020 "/bin/ls"} + +(gdb) ptype filename +type = struct filename { + const char *name; + const char *uptr; + int refcnt; + struct audit_names *aname; + const char iname[]; +} +``` + +## Appendix: `Dockerfile` for Kernel development + +The following `Dockerfile` provides a development environment with all the +required tools and dependencies to reproduce all the steps of building and +debugging the Linux kernel. +```dockerfile +{{ include(path="content/20191027-kernel-debugging-qemu/Dockerfile") }} +``` + +Save the listing above in a file called `Dockerfile` and build the docker image +as follows. +```sh +docker build -t kernel-dev +``` +> Optionally set `DOCKER_BUILDKIT=1` to use the newer image builder. + +Once the image has been built, an interactive container can be launched as +follows. +```sh +# Some options for convenience: +# -v : Mount host path to guest path. +# --rm Remove the container after exiting.
+ +docker run -it kernel-dev +``` + +[linux-kernel]: https://www.kernel.org +[initrd]: https://www.kernel.org/doc/html/latest/admin-guide/initrd.html +[busybox]: https://busybox.net +[qemu]: https://www.qemu.org +[gdb]: https://www.gnu.org/software/gdb +[binfmt-elf]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c +[binfmt-script]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c +[kernel-param]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html diff --git a/content/2019-10-27-kernel-debugging-qemu/Dockerfile b/content/2019-10-27-kernel-debugging-qemu/Dockerfile new file mode 100644 index 0000000..42e1f05 --- /dev/null +++ b/content/2019-10-27-kernel-debugging-qemu/Dockerfile @@ -0,0 +1,32 @@ +FROM ubuntu:20.04 +MAINTAINER Johannes Stoelp + +RUN apt update \ + && DEBIAN_FRONTEND=noninteractive \ + apt install \ + --yes \ + --no-install-recommends \ + # Download & unpack. + wget \ + ca-certificates \ + xz-utils \ + # Build tools & deps (kernel). + make \ + bc \ + gcc g++ \ + flex bison \ + libelf-dev \ + # Build tools & deps (initrd). + cpio \ + # Run & debug. + qemu-system-x86 \ + gdb \ + telnet \ + # Convenience. + ripgrep \ + fd-find \ + neovim \ + && rm -rf /var/lib/apt/lists/* \ + && apt-get clean + +WORKDIR /develop diff --git a/content/2019-10-27-kernel-debugging-qemu/Makefile b/content/2019-10-27-kernel-debugging-qemu/Makefile new file mode 100644 index 0000000..11e7c7b --- /dev/null +++ b/content/2019-10-27-kernel-debugging-qemu/Makefile @@ -0,0 +1,12 @@ +build: + scripts/build_kernel.sh + scripts/build_initrd.sh + +clean: + $(RM) -r linux-* + $(RM) -r busybox-* + $(RM) initramfs.cpio.gz + +docker: + DOCKER_BUILDKIT=1 docker build -t kernel-dev . 
+ docker run -it --rm -v $(PWD):/develop/scripts -v $(PWD)/Makefile:/develop/Makefile kernel-dev diff --git a/content/2019-10-27-kernel-debugging-qemu/build_initrd.sh b/content/2019-10-27-kernel-debugging-qemu/build_initrd.sh new file mode 100755 index 0000000..fd82990 --- /dev/null +++ b/content/2019-10-27-kernel-debugging-qemu/build_initrd.sh @@ -0,0 +1,54 @@ +#!/bin/bash + +if test $(id -u) -ne 0; then + SUDO=sudo +fi + +set -e + +BUSYBOX=busybox-1.33.1 +INITRD=$PWD/initramfs.cpio.gz + +## Build busybox + +echo "[+] configure & build $BUSYBOX ..." +[[ ! -d $BUSYBOX ]] && { + wget https://busybox.net/downloads/$BUSYBOX.tar.bz2 + bunzip2 $BUSYBOX.tar.bz2 && tar xf $BUSYBOX.tar +} + +cd $BUSYBOX +make defconfig +sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config +make -j4 busybox +make install + +## Create initrd + +echo "[+] create initrd $INITRD ..." + +cd _install + +# 1. create initrd folder structure +mkdir -p bin sbin etc proc sys usr/bin usr/sbin dev + +# 2. create init process +cat <<EOF > init +#!/bin/sh + +mount -t proc none /proc +mount -t sysfs none /sys + +exec setsid cttyhack sh +EOF +chmod +x init + +# 3. create device nodes +$SUDO mknod dev/tty c 5 0 +$SUDO mknod dev/tty0 c 4 0 +$SUDO mknod dev/ttyS0 c 4 64 + +# 4. created compressed initrd +find . -print0 \ + | cpio --null -ov --format=newc \ + | gzip -9 > $INITRD diff --git a/content/2019-10-27-kernel-debugging-qemu/build_kernel.sh b/content/2019-10-27-kernel-debugging-qemu/build_kernel.sh new file mode 100755 index 0000000..7ae3014 --- /dev/null +++ b/content/2019-10-27-kernel-debugging-qemu/build_kernel.sh @@ -0,0 +1,38 @@ +#!/bin/bash + +set -e + +LINUX=linux-5.13.2 +wget https://cdn.kernel.org/pub/linux/kernel/v5.x/$LINUX.tar.xz +unxz $LINUX.tar.xz && tar xf $LINUX.tar + +cd $LINUX + +cat <<EOF > kernel_fragment.config +# 64bit kernel +CONFIG_64BIT=y +# enable support for compressed initrd (gzip) +CONFIG_BLK_DEV_INITRD=y +CONFIG_RD_GZIP=y +# support for ELF and #!
binary format +CONFIG_BINFMT_ELF=y +CONFIG_BINFMT_SCRIPT=y +# /dev +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y +# tty & console +CONFIG_TTY=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +# pseudo fs +CONFIG_PROC_FS=y +CONFIG_SYSFS=y +# debugging +CONFIG_DEBUG_INFO=y +CONFIG_PRINTK=y +CONFIG_EARLY_PRINTK=y +EOF + +make tinyconfig +./scripts/kconfig/merge_config.sh -n ./kernel_fragment.config +make -j4 diff --git a/content/2019-10-27-kernel-debugging-qemu/run.sh b/content/2019-10-27-kernel-debugging-qemu/run.sh new file mode 100755 index 0000000..b0a84ae --- /dev/null +++ b/content/2019-10-27-kernel-debugging-qemu/run.sh @@ -0,0 +1,24 @@ +#!/bin/bash + +VER=5.13.2 + +# Launch the emulator with our kernel. +qemu-system-x86_64 \ + -kernel ./linux-$VER/arch/x86/boot/bzImage \ + -nographic \ + -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \ + -initrd ./initramfs.cpio.gz \ + -serial telnet:localhost:12345,server,nowait \ + -monitor none \ + -gdb tcp::1234 \ + -S & + +# Kill qemu when we exit. +QEMU_PID=$! +trap "kill $QEMU_PID" EXIT + +# Give qemu some time to come up. +sleep 0.5 + +# Attach debugger to qemu and load the kernel symbols. +gdb -ex 'target remote :1234' ./linux-$VER/vmlinux diff --git a/content/2019-11-18-dynamic-linking-linux-x86_64.md b/content/2019-11-18-dynamic-linking-linux-x86_64.md new file mode 100644 index 0000000..7888f42 --- /dev/null +++ b/content/2019-11-18-dynamic-linking-linux-x86_64.md @@ -0,0 +1,338 @@ ++++ +title = "Dynamic linking on Linux (x86_64)" + +[taxonomies] +tags = ["elf", "linux", "x86"] ++++ + +As I was interested in how the bits behind dynamic linking work, this article +is about exploring this topic. +However, since dynamic linking strongly depends on the OS, the architecture and +the binary format, I only focus on one combination here. 
+Spending most of my time with Linux on `x86` or `ARM` I chose the following +for this article: +- OS: Linux +- arch: x86_64 +- binfmt: [`Executable and Linking Format (ELF)`][elf-1.2] + +## Introduction to dynamic linking + +Dynamic linking is used in the case we have non-statically linked applications. +This means an application uses code which is not included in the application +itself, but in a shared library. The shared libraries in turn can be used by +multiple applications. +The applications contain `relocation` entries which need to be resolved during +runtime, because shared libraries are compiled as `position independent code +(PIC)` so that they can be loaded at any address in the application's +virtual address space. +This process of resolving the relocation entries at runtime is what I am +referring to as dynamic linking in this article. + +The following figure shows a simple example, where we have an application +**foo** using a function **bar** from the shared library **libbar.so**. The +boxes show the virtual memory mapping for **foo** over time where time +increases to the right. +``` + foo foo + +-----------+ +-----------+ + | | | | + +-----------+ +-----------+ + | .text.foo | | .text.foo | + | | | | + | ... | trigger resolve reloc | ... | +pc->| call bar | X----+ | call bar |--+ + | ... | | | ... | | + +-----------+ | +-----------+ | + | | | | | | + | | | | | | + +-----------+ | +-----------+ | + | .text.bar | | | .text.bar | | + | ... | | | ... | | + | bar: | +---->[ld.so]----> | bar: |<-+pc + | ... | | ... | + +-----------+ +-----------+ + | | | | + +-----------+ +-----------+ + +``` + +## Conceptual overview && important parts of "the" ELF + +> In the following I assume a basic understanding of the ELF binary format. + +Before jumping into the details of dynamic linking it is important to get a +conceptual overview, as well as to understand which sections of the ELF file +actually matter. + +
+ +On x86 calling a function in a shared library works via one indirect jump. +When the application wants to call a function in a shared library it jumps to a +well-known location contained in the code of the application, called a +`trampoline`. From there the application then jumps to a function pointer +stored in a global table (`GOT = global offset table`). The application +contains **one** trampoline per function used from a shared library. + +When the application jumps to a trampoline for the first time the trampoline +will dispatch to the dynamic linker with the request to resolve the symbol. +Once the dynamic linker has found the address of the symbol it patches the function +pointer in the `GOT` so that subsequent calls directly dispatch to the library +function. +``` + foo: GOT + ... +------------+ ++---- call bar_trampoline +- | 0xcafeface | [0] resolve (dynamic linker) +| call bar_trampoline | +------------+ +| ... | | 0xcafeface | [1] resolve (dynamic linker) +| | +------------+ ++-> bar_trampoline: | + jump GOT[0] <-----------+ + bar2_trampoline: + jump GOT[1] +``` +Once this is done, further calls to this symbol will be directly forwarded to +the correct address from the corresponding trampoline. +``` + foo: GOT + ... +------------+ + call bar_trampoline +- | 0x01234567 | [0] bar (libbar.so) ++---- call bar_trampoline | +------------+ +| .... | | 0xcafeface | [1] resolve (dynamic linker) +| | +------------+ ++-> bar_trampoline: | + jump GOT[0] <-----------+ + bar2_trampoline: + jump GOT[1] +``` + +--- + +With that in mind we can take a look and check which sections of the ELF file +are important for the dynamic linking process. +- `.plt` +> This section contains all the trampolines for the external functions used by +> the ELF file +- `.got.plt` +> This section contains the global offset table `GOT` for this ELF file's trampolines.
+- `.rel.plt` / `.rela.plt` +> This section holds the `relocation` entries, which are used by the dynamic +> linker to find which symbol needs to be resolved and which location in the +> `GOT` to be patched. (Whether it is `rel` or `rela` depends on the +> **DT_PLTREL** entry in the [`.dynamic` section](#dynamic-section)) + + +## The bits behind dynamic linking + +Now that we have the basic concept and know which sections of the ELF file +matter we can take a look at an actual example. For the analysis I am going to +use the following C program and build it explicitly as non `position +independent executable (PIE)`. + +> Using `-no-pie` has no functional impact, it is only used to get absolute +> virtual addresses in the ELF file, which makes the analysis easier to follow. + +```cpp +// main.c +#include <stdio.h> +int main(int argc, const char* argv[]) { + printf("%s argc=%d\n", argv[0], argc); + puts("done"); + return 0; +} +``` + +```console +> gcc -o main main.c -no-pie +``` + +We use [radare2][r2] to open the compiled file and print the disassembly of +the `.got.plt` and `.plt` sections.
+ +```nasm +> r2 -A ./main +--snip-- +[0x00401050]> pd5 @ section..got.plt + ;-- section..got.plt: + ;-- _GLOBAL_OFFSET_TABLE_: + [0] 0x00404000 .qword 0x0000000000403e10 ; section..dynamic ; sym..dynamic + [1] 0x00404008 .qword 0x0000000000000000 + [2] 0x00404010 .qword 0x0000000000000000 + ;-- reloc.puts: + [3] 0x00404018 .qword 0x0000000000401036 + ;-- reloc.printf: + [4] 0x00404020 .qword 0x0000000000401046 + +[0x00401050]> pd9 @ section..plt + ;-- section..plt: + ┌┌─> 0x00401020 ff35e22f0000 push qword [0x00404008] + ╎╎ 0x00401026 ff25e42f0000 jmp qword [0x00404010] + ╎╎ 0x0040102c 0f1f4000 nop dword [rax] + int sym.imp.puts (const char *s); + ╎╎ 0x00401030 ff25e22f0000 jmp qword [reloc.puts] ; 0x00404018 + ╎╎ 0x00401036 6800000000 push 0 + └──< 0x0040103b e9e0ffffff jmp sym..plt + int sym.imp.printf (const char *format); + ╎ 0x00401040 ff25da2f0000 jmp qword [reloc.printf] ; 0x00404020 + ╎ 0x00401046 6801000000 push 1 + └─< 0x0040104b e9d0ffffff jmp sym..plt +[0x00401050]> +``` + +Taking a quick look at the `.got.plt` section we see the *global offset table GOT*. +The entries *GOT[0..2]* have special meanings, *GOT[0]* holds the address of the +[`.dynamic` section](#dynamic-section) for this ELF file, *GOT[1..2]* will be +filled by the dynamic linker at program startup. +Entries *GOT[3]* and *GOT[4]* contain the function pointers for **puts** and +**printf** accordingly. + +
+ +In the `.plt` section we can find three trampolines +1. `0x00401020` dispatch to runtime linker (special role) +1. `0x00401030` **puts** +1. `0x00401040` **printf** + +Looking at the **puts** trampoline we can see that the first instruction jumps +to a location stored at `0x00404018` (reloc.puts) which is the GOT[3]. In the +beginning this entry contains the address of the `push 0` instruction coming +right after the `jmp`. This push instruction sets up some meta data for the +dynamic linker. The next instruction then jumps into the first trampoline, +which pushes more meta data (GOT[1]) onto the stack and then jumps to the +address stored in GOT[2]. +> GOT[1] & GOT[2] are zero here because they get filled by the dynamic linker +> at program startup. + + +
+ +To understand the `push 0` instruction in the **puts** trampoline we have to +take a look at the third section of interest in the ELF file, the `.rela.plt` +section. + +```console +# -r print relocations +# -D use .dynamic info when displaying info +> readelf -W -r ./main +--snip-- +Relocation section '.rela.plt' at offset 0x4004d8 contains 2 entries: + Offset Info Type Symbol's Value Symbol's Name + Addend +0000000000404018 0000000200000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0 +0000000000404020 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 printf@GLIBC_2.2.5 + 0 +``` + +The `0` passed as meta data to the dynamic linker means to use the relocation +at index [0] in the `.rela.plt` section. From the ELF specification we can +find how a relocation of type `rela` is defined: + +```c +// man 5 elf +typedef struct { + Elf64_Addr r_offset; + uint64_t r_info; + int64_t r_addend; +} Elf64_Rela; + +#define ELF64_R_SYM(i) ((i) >> 32) +#define ELF64_R_TYPE(i) ((i) & 0xffffffff) +``` + +`r_offset` holds the address to the GOT entry which the dynamic linker should +patch once it found the address of the requested symbol. +The offset here is `0x00404018` which is exactly the address of GOT[3], the +function pointer used in the **puts** trampoline. +From `r_info` the dynamic linker can find out which symbol it should look for. + +```c +ELF64_R_SYM(0x0000000200000007) -> 0x2 +``` + +The resulting index [2] is the offset into the dynamic symbol table +(`.dynsym`). Dumping the dynamic symbol table with readelf we can see that the +symbol at index [2] is **puts**. 
+ +```console +# -s print symbols +> readelf -W -s ./main +Symbol table '.dynsym' contains 7 entries: + Num: Value Size Type Bind Vis Ndx Name + 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND + 1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable + 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2) + 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2) +--snip-- +``` + + +## Appendix: .dynamic section + +The `.dynamic` section of an ELF file contains important information for the +dynamic linking process and is created when linking the ELF file. + +The information can be accessed at runtime using following symbol +```c +extern Elf64_Dyn _DYNAMIC[]; +``` +which is an array of `Elf64_Dyn` entries +```c +typedef struct { + Elf64_Sxword d_tag; + union { + Elf64_Xword d_val; + Elf64_Addr d_ptr; + } d_un; +} Elf64_Dyn; +``` +> Since this meta-information is specific to an ELF file, every ELF file has +> its own `.dynamic` section and `_DYNAMIC` symbol. + +Following entries are most interesting for dynamic linking: + + d_tag | d_un | description +-------------|-------|------------------------------------------------- + DT_PLTGOT | d_ptr | address of .got.plt + DT_JMPREL | d_ptr | address of .rela.plt + DT_PLTREL | d_val | DT_REL or DT_RELA + DT_PLTRELSZ | d_val | size of .rela.plt table + DT_RELENT | d_val | size of a single REL entry (PLTREL == DT_REL) + DT_RELAENT | d_val | size of a single RELA entry (PLTREL == DT_RELA) + +
+ +We can use readelf to dump the `.dynamic` section. In the following snippet I +only kept the relevant entries: +```console +# -d dump .dynamic section +> readelf -d ./main + +Dynamic section at offset 0x2e10 contains 24 entries: + Tag Type Name/Value + 0x0000000000000003 (PLTGOT) 0x404000 + 0x0000000000000002 (PLTRELSZ) 48 (bytes) + 0x0000000000000014 (PLTREL) RELA + 0x0000000000000017 (JMPREL) 0x4004d8 + 0x0000000000000009 (RELAENT) 24 (bytes) +``` + +We can see that **PLTGOT** points to address **0x404000** which is the address +of the GOT as we saw in the [radare2 dump](#code-gotplt-dump). +Also we can see that **JMPREL** points to the [relocation table](#code-relaplt-dump). +**PLTRELSZ / RELAENT** tells us that we have 2 relocation entries which are +exactly the ones for **puts** and **printf**. + + +## References +- [`man 5 elf`][man-elf] +- [Executable and Linking Format (ELF)][elf-1.2] +- [SystemV ABI 4.1][systemv-abi-4.1] +- [SystemV ABI 1.0 (x86_64)][systemv-abi-1.0-x86_64] +- [`man 1 readelf`][man-readelf] + + +[r2]: https://rada.re/n/radare2.html +[man-elf]: http://man7.org/linux/man-pages/man5/elf.5.html +[man-readelf]: http://man7.org/linux/man-pages/man1/readelf.1.html +[elf-1.2]: http://refspecs.linuxbase.org/elf/elf.pdf +[systemv-abi-4.1]: https://refspecs.linuxfoundation.org/elf/gabi41.pdf +[systemv-abi-1.0-x86_64]: https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf + + diff --git a/content/20191027-kernel-debugging-qemu.md b/content/20191027-kernel-debugging-qemu.md deleted file mode 100644 index 7a97fbc..0000000 --- a/content/20191027-kernel-debugging-qemu.md +++ /dev/null @@ -1,226 +0,0 @@ -+++ -title = "Linux Kernel debugging with QEMU" -date = 2019-10-27 - -[taxonomies] -tags = ["linux", "qemu"] -+++ - -**EDIT**: -- 2021-07-15: Added `Appendix: Dockerfile for Kernel development` and updated - busybox + Kernel versions. 
- -The other evening while starring at some Linux kernel code I thought, let me -setup a minimal environment so I can easily step through the code and examine -the state. - -I ended up creating: -- a [Linux kernel][linux-kernel] with minimal configuration -- a minimal [ramdisk][initrd] to boot into which is based on [busybox][busybox] - -In the remaing part of this article we will go through each step by first -building the kernel, then building the initrd and then running the kernel using -[QEMU][qemu] and debugging it with [GDB][gdb]. - -## $> make kernel - -Before building the kernel we first need to generate a configuration. As a -starting point we generate a minimal config with the `make tinyconfig` make -target. Running this command will generate a `.config` file. After generating -the initial config file we customize the kernel using the merge fragment flow. -This allows us to merge a fragment file into the current configuration by -running the `scripts/kconfig/merge_config.sh` script. - -Let's quickly go over some customizations we do. -The following two lines enable support for gzipped initramdisks: -```config -CONFIG_BLK_DEV_INITRD=y -CONFIG_RD_GZIP=y -``` -The next two configurations are important as they enable the binary loaders for -[ELF][binfmt-elf] and [script #!][binfmt-script] files. -```config -CONFIG_BINFMT_ELF=y -CONFIG_BINFMT_SCRIPT=y -``` - -> Note: In the cursed based configuration `make menuconfig` we can search for -> configurations using the `/` key and then select a match using the number keys. -> After selecting a match we can check the `Help` to get a description for the -> configuration parameter. 
- -Building the kernel with the default make target will give us the following two -files: -- `vmlinux` statically linked kernel (ELF file) containing symbol information for debugging -- `arch/x86_64/boot/bzImage` compressed kernel image for booting - -Full configure & build script: -```sh -{{ include(path="content/20191027-kernel-debugging-qemu/build_kernel.sh") }} -``` - -## $> make initrd - -Next step is to build the initrd which we base on [busybox][busybox]. Therefore -we first build the busybox project in its default configuration with one -change, we enable following configuration to build a static binary so it can be -used stand-alone: -```sh -sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config -``` - -One important step before creating the final initrd is to create an init -process. This will be the first process executed in userspace after the kernel -finished its initialization. We just create a script that drops us into a -shell: -```sh -cat <<EOF > init -#!/bin/sh - -mount -t proc none /proc -mount -t sysfs none /sys - -exec setsid cttyhack sh -EOF -``` -> By default the kernel looks for `/sbin/init` in the root file system, but the -> location can optionally be specified with the [`init=`][kernel-param] kernel -> parameter. - -Full busybox & initrd build script: -```sh -{{ include(path="content/20191027-kernel-debugging-qemu/build_initrd.sh") }} -``` - -## Running QEMU && GDB - -After finishing the previous steps we have all we need to run and debug the -kernel. We have `arch/x86/boot/bzImage` and `initramfs.cpio.gz` to boot the -kernel into a shell and we have `vmlinux` to feed the debugger with debug -symbols.
- -We start QEMU as follows, thanks to the `-S` flag the CPU will freeze until we -connected the debugger: -```sh -# -S freeze CPU until debugger connected -> qemu-system-x86_64 \ - -kernel ./linux-5.3.7/arch/x86/boot/bzImage \ - -nographic \ - -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \ - -initrd ./initramfs.cpio.gz \ - -gdb tcp::1234 \ - -S -``` - -Then we can start GDB and connect to the GDB server running in QEMU (configured -via `-gdb tcp::1234`). From now on we can start to debug through the -kernel. -```sh -> gdb linux-5.3.7/vmlinux -ex 'target remote :1234' -(gdb) b do_execve -Breakpoint 1 at 0xffffffff810a1a60: file fs/exec.c, line 1885. -(gdb) c -Breakpoint 1, do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 -1885 return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); -(gdb) bt -#0 do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 -#1 0xffffffff81000498 in run_init_process (init_filename=<optimized out>) at init/main.c:1048 -#2 0xffffffff81116b75 in kernel_init (unused=<optimized out>) at init/main.c:1129 -#3 0xffffffff8120014f in ret_from_fork () at arch/x86/entry/entry_64.S:352 -#4 0x0000000000000000 in ?? () -(gdb) -``` - ---- - -## Appendix: Try to get around `<optimized out>` - -When debugging the kernel we often face following situation in gdb: -```text -(gdb) frame -#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c - -(gdb) info args -fd = <optimized out> -filename = 0xffff888000060000 -argv = <optimized out> -envp = <optimized out> -flags = <optimized out> -file = 0x0 -``` -The problem is that the Linux kernel requires certain code to be compiled with -optimizations enabled. - -In this situation we can "try" to reduce the optimization for single compilation -units or a subtree (try because, reducing the optimization could break the -build). To do so we adapt the Makefile in the corresponding directory.
-```make -# fs/Makefile - -# configure for single compilation unit -CFLAGS_exec.o := -Og - -# configure for the whole subtree of where the Makefile resides -ccflags-y := -Og -``` - -After enabling `-Og` (optimize for the debugging experience) we can now see the following -in gdb: -```txt -(gdb) frame -#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c - -(gdb) info args -fd = -100 -filename = 0xffff888000120000 -argv = {ptr = {native = 0x10c5980}} -envp = {ptr = {native = 0x10c5990}} -flags = 0 - -(gdb) p *filename -$3 = {name = 0xffff888000120020 "/bin/ls", uptr = 0x10c59b8 "/bin/ls", refcnt = 1, aname = 0x0, iname = 0xffff888000120020 "/bin/ls"} - -(gdb) ptype filename -type = struct filename { - const char *name; - const char *uptr; - int refcnt; - struct audit_names *aname; - const char iname[]; -} -``` - -## Appendix: `Dockerfile` for Kernel development - -The following `Dockerfile` provides a development environment with all the -required tools and dependencies to reproduce all the steps of building and -debugging the Linux kernel. -```dockerfile -{{ include(path="content/20191027-kernel-debugging-qemu/Dockerfile") }} -``` - -Save the listing above in a file called `Dockerfile` and build the docker image -as follows. -```sh -docker build -t kernel-dev . -``` -> Optionally set `DOCKER_BUILDKIT=1` to use the newer image builder. - -Once the image has been built, an interactive container can be launched as -follows. -```sh -# Some options for convenience: -# -v <host>:<guest> Mount host path to guest path. -# --rm Remove the container after exiting. 
- -docker run -it kernel-dev -``` - -[linux-kernel]: https://www.kernel.org -[initrd]: https://www.kernel.org/doc/html/latest/admin-guide/initrd.html -[busybox]: https://busybox.net -[qemu]: https://www.qemu.org -[gdb]: https://www.gnu.org/software/gdb -[binfmt-elf]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c -[binfmt-script]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c -[kernel-param]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html diff --git a/content/20191027-kernel-debugging-qemu/Dockerfile b/content/20191027-kernel-debugging-qemu/Dockerfile deleted file mode 100644 index 42e1f05..0000000 --- a/content/20191027-kernel-debugging-qemu/Dockerfile +++ /dev/null @@ -1,32 +0,0 @@ -FROM ubuntu:20.04 -MAINTAINER Johannes Stoelp - -RUN apt update \ - && DEBIAN_FRONTEND=noninteractive \ - apt install \ - --yes \ - --no-install-recommends \ - # Download & unpack. - wget \ - ca-certificates \ - xz-utils \ - # Build tools & deps (kernel). - make \ - bc \ - gcc g++ \ - flex bison \ - libelf-dev \ - # Build tools & deps (initrd). - cpio \ - # Run & debug. - qemu-system-x86 \ - gdb \ - telnet \ - # Convenience. - ripgrep \ - fd-find \ - neovim \ - && rm -rf /var/lib/apt/lists/* \ - && apt-get clean - -WORKDIR /develop diff --git a/content/20191027-kernel-debugging-qemu/Makefile b/content/20191027-kernel-debugging-qemu/Makefile deleted file mode 100644 index 11e7c7b..0000000 --- a/content/20191027-kernel-debugging-qemu/Makefile +++ /dev/null @@ -1,12 +0,0 @@ -build: - scripts/build_kernel.sh - scripts/build_initrd.sh - -clean: - $(RM) -r linux-* - $(RM) -r busybox-* - $(RM) initramfs.cpio.gz - -docker: - DOCKER_BUILDKIT=1 docker build -t kernel-dev . 
- docker run -it --rm -v $(PWD):/develop/scripts -v $(PWD)/Makefile:/develop/Makefile kernel-dev diff --git a/content/20191027-kernel-debugging-qemu/build_initrd.sh b/content/20191027-kernel-debugging-qemu/build_initrd.sh deleted file mode 100755 index fd82990..0000000 --- a/content/20191027-kernel-debugging-qemu/build_initrd.sh +++ /dev/null @@ -1,54 +0,0 @@ -#!/bin/bash - -if test $(id -u) -ne 0; then - SUDO=sudo -fi - -set -e - -BUSYBOX=busybox-1.33.1 -INITRD=$PWD/initramfs.cpio.gz - -## Build busybox - -echo "[+] configure & build $BUSYBOX ..." -[[ ! -d $BUSYBOX ]] && { - wget https://busybox.net/downloads/$BUSYBOX.tar.bz2 - bunzip2 $BUSYBOX.tar.bz2 && tar xf $BUSYBOX.tar -} - -cd $BUSYBOX -make defconfig -sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config -make -j4 busybox -make install - -## Create initrd - -echo "[+] create initrd $INITRD ..." - -cd _install - -# 1. create initrd folder structure -mkdir -p bin sbin etc proc sys usr/bin usr/sbin dev - -# 2. create init process -cat <<EOF > init -#!/bin/sh - -mount -t proc none /proc -mount -t sysfs none /sys - -exec setsid cttyhack sh -EOF -chmod +x init - -# 3. create device nodes -$SUDO mknod dev/tty c 5 0 -$SUDO mknod dev/tty0 c 4 0 -$SUDO mknod dev/ttyS0 c 4 64 - -# 4. create compressed initrd -find . -print0 \ - | cpio --null -ov --format=newc \ - | gzip -9 > $INITRD diff --git a/content/20191027-kernel-debugging-qemu/build_kernel.sh b/content/20191027-kernel-debugging-qemu/build_kernel.sh deleted file mode 100755 index 7ae3014..0000000 --- a/content/20191027-kernel-debugging-qemu/build_kernel.sh +++ /dev/null @@ -1,38 +0,0 @@ -#!/bin/bash - -set -e - -LINUX=linux-5.13.2 -wget https://cdn.kernel.org/pub/linux/kernel/v5.x/$LINUX.tar.xz -unxz $LINUX.tar.xz && tar xf $LINUX.tar - -cd $LINUX - -cat <<EOF > kernel_fragment.config -# 64bit kernel -CONFIG_64BIT=y -# enable support for compressed initrd (gzip) -CONFIG_BLK_DEV_INITRD=y -CONFIG_RD_GZIP=y -# support for ELF and #! 
binary format -CONFIG_BINFMT_ELF=y -CONFIG_BINFMT_SCRIPT=y -# /dev -CONFIG_DEVTMPFS=y -CONFIG_DEVTMPFS_MOUNT=y -# tty & console -CONFIG_TTY=y -CONFIG_SERIAL_8250=y -CONFIG_SERIAL_8250_CONSOLE=y -# pseudo fs -CONFIG_PROC_FS=y -CONFIG_SYSFS=y -# debugging -CONFIG_DEBUG_INFO=y -CONFIG_PRINTK=y -CONFIG_EARLY_PRINTK=y -EOF - -make tinyconfig -./scripts/kconfig/merge_config.sh -n ./kernel_fragment.config -make -j4 diff --git a/content/20191027-kernel-debugging-qemu/run.sh b/content/20191027-kernel-debugging-qemu/run.sh deleted file mode 100755 index b0a84ae..0000000 --- a/content/20191027-kernel-debugging-qemu/run.sh +++ /dev/null @@ -1,24 +0,0 @@ -#!/bin/bash - -VER=5.13.2 - -# Launch the emulator with our kernel. -qemu-system-x86_64 \ - -kernel ./linux-$VER/arch/x86/boot/bzImage \ - -nographic \ - -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \ - -initrd ./initramfs.cpio.gz \ - -serial telnet:localhost:12345,server,nowait \ - -monitor none \ - -gdb tcp::1234 \ - -S & - -# Kill qemu when we exit. -QEMU_PID=$! -trap "kill $QEMU_PID" EXIT - -# Give qemu some time to come up. -sleep 0.5 - -# Attach debugger to qemu and load the kernel symbols. -gdb -ex 'target remote :1234' ./linux-$VER/vmlinux diff --git a/content/20191118-dynamic-linking-linux-x86_64.md b/content/20191118-dynamic-linking-linux-x86_64.md deleted file mode 100644 index 9265671..0000000 --- a/content/20191118-dynamic-linking-linux-x86_64.md +++ /dev/null @@ -1,339 +0,0 @@ -+++ -title = "Dynamic linking on Linux (x86_64)" -date = 2019-11-18 - -[taxonomies] -tags = ["elf", "linux", "x86"] -+++ - -As I was interested in how the bits behind dynamic linking work, this article -is about exploring this topic. -However, since dynamic linking strongly depends on the OS, the architecture and -the binary format, I only focus on one combination here. 
-Spending most of my time with Linux on `x86` or `ARM` I chose the following -for this article: -- OS: Linux -- arch: x86_64 -- binfmt: [`Executable and Linking Format (ELF)`][elf-1.2] - -## Introduction to dynamic linking - -Dynamic linking is used when an application is not statically linked. -This means an application uses code which is not included in the application -itself, but in a shared library. The shared libraries in turn can be used by -multiple applications. -The applications contain `relocation` entries which need to be resolved during -runtime, because shared libraries are compiled as `position independent code -(PIC)` so that they can be loaded at any address in the application's -virtual address space. -This process of resolving the relocation entries at runtime is what I am -referring to as dynamic linking in this article. - -The following figure shows a simple example, where we have an application -**foo** using a function **bar** from the shared library **libbar.so**. The -boxes show the virtual memory mapping for **foo** over time where time -increases to the right. -``` - foo foo - +-----------+ +-----------+ - | | | | - +-----------+ +-----------+ - | .text.foo | | .text.foo | - | | | | - | ... | trigger resolve reloc | ... | -pc->| call bar | X----+ | call bar |--+ - | ... | | | ... | | - +-----------+ | +-----------+ | - | | | | | | - | | | | | | - +-----------+ | +-----------+ | - | .text.bar | | | .text.bar | | - | ... | | | ... | | - | bar: | +---->[ld.so]----> | bar: |<-+pc - | ... | | ... | - +-----------+ +-----------+ - | | | | - +-----------+ +-----------+ - -``` - -## Conceptual overview && important parts of "the" ELF - -> In the following I assume a basic understanding of the ELF binary format. - -Before jumping into the details of dynamic linking it is important to get a -conceptual overview, as well as to understand which sections of the ELF file -actually matter. - -
- -On x86 calling a function in a shared library works via one indirect jump. -When the application wants to call a function in a shared library it jumps to a -well-known location contained in the code of the application, called a -`trampoline`. From there the application then jumps to a function pointer -stored in a global table (`GOT = global offset table`). The application -contains **one** trampoline per function used from a shared library. - -When the application jumps to a trampoline for the first time the trampoline -will dispatch to the dynamic linker with the request to resolve the symbol. -Once the dynamic linker has found the address of the symbol it patches the function -pointer in the `GOT` so that subsequent calls directly dispatch to the library -function. -``` - foo: GOT - ... +------------+ -+---- call bar_trampoline +- | 0xcafeface | [0] resolve (dynamic linker) -| call bar_trampoline | +------------+ -| ... | | 0xcafeface | [1] resolve (dynamic linker) -| | +------------+ -+-> bar_trampoline: | - jump GOT[0] <-----------+ - bar2_trampoline: - jump GOT[1] -``` -Once this is done, further calls to this symbol will be directly forwarded to -the correct address from the corresponding trampoline. -``` - foo: GOT - ... +------------+ - call bar_trampoline +- | 0x01234567 | [0] bar (libbar.so) -+---- call bar_trampoline | +------------+ -| .... | | 0xcafeface | [1] resolve (dynamic linker) -| | +------------+ -+-> bar_trampoline: | - jump GOT[0] <-----------+ - bar2_trampoline: - jump GOT[1] -``` - ---- - -With that in mind we can take a look and check which sections of the ELF file -are important for the dynamic linking process. -- `.plt` -> This section contains all the trampolines for the external functions used by -> the ELF file -- `.got.plt` -> This section contains the global offset table `GOT` for this ELF file's trampolines. 
-- `.rel.plt` / `.rela.plt` -> This section holds the `relocation` entries, which are used by the dynamic -> linker to find which symbol needs to be resolved and which location in the -> `GOT` is to be patched. (Whether it is `rel` or `rela` depends on the -> **DT_PLTREL** entry in the [`.dynamic` section](#dynamic-section)) - - -## The bits behind dynamic linking - -Now that we have the basic concept and know which sections of the ELF file -matter we can take a look at an actual example. For the analysis I am going to -use the following C program and build it explicitly as a non-`position -independent executable (PIE)`. - -> Using `-no-pie` has no functional impact, it is only used to get absolute -> virtual addresses in the ELF file, which makes the analysis easier to follow. - -```cpp -// main.c -#include <stdio.h> -int main(int argc, const char* argv[]) { - printf("%s argc=%d\n", argv[0], argc); - puts("done"); - return 0; -} -``` - -```console -> gcc -o main main.c -no-pie -``` - -We use [radare2][r2] to open the compiled file and print the disassembly of -the `.got.plt` and `.plt` sections. 
- -```nasm -> r2 -A ./main ---snip-- -[0x00401050]> pd5 @ section..got.plt - ;-- section..got.plt: - ;-- _GLOBAL_OFFSET_TABLE_: - [0] 0x00404000 .qword 0x0000000000403e10 ; section..dynamic ; sym..dynamic - [1] 0x00404008 .qword 0x0000000000000000 - [2] 0x00404010 .qword 0x0000000000000000 - ;-- reloc.puts: - [3] 0x00404018 .qword 0x0000000000401036 - ;-- reloc.printf: - [4] 0x00404020 .qword 0x0000000000401046 - -[0x00401050]> pd9 @ section..plt - ;-- section..plt: - ┌┌─> 0x00401020 ff35e22f0000 push qword [0x00404008] - ╎╎ 0x00401026 ff25e42f0000 jmp qword [0x00404010] - ╎╎ 0x0040102c 0f1f4000 nop dword [rax] - int sym.imp.puts (const char *s); - ╎╎ 0x00401030 ff25e22f0000 jmp qword [reloc.puts] ; 0x00404018 - ╎╎ 0x00401036 6800000000 push 0 - └──< 0x0040103b e9e0ffffff jmp sym..plt - int sym.imp.printf (const char *format); - ╎ 0x00401040 ff25da2f0000 jmp qword [reloc.printf] ; 0x00404020 - ╎ 0x00401046 6801000000 push 1 - └─< 0x0040104b e9d0ffffff jmp sym..plt -[0x00401050]> -``` - -Taking a quick look at the `.got.plt` section we see the *global offset table GOT*. -The entries *GOT[0..2]* have special meanings, *GOT[0]* holds the address of the -[`.dynamic` section](#dynamic-section) for this ELF file, *GOT[1..2]* will be -filled by the dynamic linker at program startup. -Entries *GOT[3]* and *GOT[4]* contain the function pointers for **puts** and -**printf** accordingly. - -
- -In the `.plt` section we can find three trampolines -1. `0x00401020` dispatch to runtime linker (special role) -1. `0x00401030` **puts** -1. `0x00401040` **printf** - -Looking at the **puts** trampoline we can see that the first instruction jumps -to a location stored at `0x00404018` (reloc.puts) which is the GOT[3]. In the -beginning this entry contains the address of the `push 0` instruction coming -right after the `jmp`. This push instruction sets up some meta data for the -dynamic linker. The next instruction then jumps into the first trampoline, -which pushes more meta data (GOT[1]) onto the stack and then jumps to the -address stored in GOT[2]. -> GOT[1] & GOT[2] are zero here because they get filled by the dynamic linker -> at program startup. - - -
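The lazy binding flow described above can be mimicked in plain C. The following is only a toy model under simplifying assumptions: the "GOT" is an ordinary array of function pointers, the "dynamic linker" is a normal function, and every name (`got`, `resolve`, `stub_double`, `call_double`, ...) is made up for illustration and does not correspond to any real linker interface.

```c
#include <assert.h>

/* Toy model of lazy binding. Every name in here is hypothetical. */
typedef int (*func_t)(int);

/* Counts how often the "dynamic linker" had to resolve a symbol. */
int resolve_count = 0;

/* Functions living in the "shared library". */
static int lib_double(int x) { return 2 * x; }
static int lib_square(int x) { return x * x; }

/* Stand-in for the symbol lookup performed by the dynamic linker. */
static const func_t symbols[] = { lib_double, lib_square };

static int stub_double(int x);
static int stub_square(int x);

/* Toy GOT: each slot initially points at the slow path of its trampoline. */
static func_t got[] = { stub_double, stub_square };

/* Stand-in for the dynamic linker: patch the GOT slot, then call the target. */
static int resolve(int reloc_index, int x) {
    assert(reloc_index >= 0 && reloc_index < 2);
    resolve_count++;
    got[reloc_index] = symbols[reloc_index];
    return got[reloc_index](x);
}

/* Slow path, models "push <reloc index>; jmp <first trampoline>". */
static int stub_double(int x) { return resolve(0, x); }
static int stub_square(int x) { return resolve(1, x); }

/* Trampolines: a single indirect jump through the GOT slot. */
int call_double(int x) { return got[0](x); }
int call_square(int x) { return got[1](x); }
```

Calling `call_double()` twice from a `main()` goes through `resolve()` only on the first call; afterwards the patched slot dispatches directly to `lib_double()`, which is the behaviour the real `.plt` / `.got.plt` pair provides.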
- -To understand the `push 0` instruction in the **puts** trampoline we have to -take a look at the third section of interest in the ELF file, the `.rela.plt` -section. - -```console -# -r print relocations -# -D use .dynamic info when displaying info -> readelf -W -r ./main ---snip-- -Relocation section '.rela.plt' at offset 0x4004d8 contains 2 entries: - Offset Info Type Symbol's Value Symbol's Name + Addend -0000000000404018 0000000200000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0 -0000000000404020 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 printf@GLIBC_2.2.5 + 0 -``` - -The `0` passed as meta data to the dynamic linker means to use the relocation -at index [0] in the `.rela.plt` section. From the ELF specification we can -find how a relocation of type `rela` is defined: - -```c -// man 5 elf -typedef struct { - Elf64_Addr r_offset; - uint64_t r_info; - int64_t r_addend; -} Elf64_Rela; - -#define ELF64_R_SYM(i) ((i) >> 32) -#define ELF64_R_TYPE(i) ((i) & 0xffffffff) -``` - -`r_offset` holds the address to the GOT entry which the dynamic linker should -patch once it found the address of the requested symbol. -The offset here is `0x00404018` which is exactly the address of GOT[3], the -function pointer used in the **puts** trampoline. -From `r_info` the dynamic linker can find out which symbol it should look for. - -```c -ELF64_R_SYM(0x0000000200000007) -> 0x2 -``` - -The resulting index [2] is the offset into the dynamic symbol table -(`.dynsym`). Dumping the dynamic symbol table with readelf we can see that the -symbol at index [2] is **puts**. 
- -```console -# -s print symbols -> readelf -W -s ./main -Symbol table '.dynsym' contains 7 entries: - Num: Value Size Type Bind Vis Ndx Name - 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND - 1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable - 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2) - 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2) ---snip-- -``` - - -## Appendix: .dynamic section - -The `.dynamic` section of an ELF file contains important information for the -dynamic linking process and is created when linking the ELF file. - -The information can be accessed at runtime using following symbol -```c -extern Elf64_Dyn _DYNAMIC[]; -``` -which is an array of `Elf64_Dyn` entries -```c -typedef struct { - Elf64_Sxword d_tag; - union { - Elf64_Xword d_val; - Elf64_Addr d_ptr; - } d_un; -} Elf64_Dyn; -``` -> Since this meta-information is specific to an ELF file, every ELF file has -> its own `.dynamic` section and `_DYNAMIC` symbol. - -Following entries are most interesting for dynamic linking: - - d_tag | d_un | description --------------|-------|------------------------------------------------- - DT_PLTGOT | d_ptr | address of .got.plt - DT_JMPREL | d_ptr | address of .rela.plt - DT_PLTREL | d_val | DT_REL or DT_RELA - DT_PLTRELSZ | d_val | size of .rela.plt table - DT_RELENT | d_val | size of a single REL entry (PLTREL == DT_REL) - DT_RELAENT | d_val | size of a single RELA entry (PLTREL == DT_RELA) - -
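As a rough sketch of how these entries fit together, the following C helpers walk an `Elf64_Dyn` array and derive the number of PLT relocations from the tags in the table above. It assumes a Linux system providing `<elf.h>`; the helper names `find_dyn` and `plt_reloc_count` are hypothetical and not part of any library.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Return the first entry with the given tag, or NULL if there is none.
 * The array is terminated by an entry with the tag DT_NULL. */
const Elf64_Dyn *find_dyn(const Elf64_Dyn *dyn, Elf64_Sxword tag) {
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        if (dyn->d_tag == tag) {
            return dyn;
        }
    }
    return NULL;
}

/* Number of PLT relocation entries: table size divided by entry size. */
size_t plt_reloc_count(const Elf64_Dyn *dyn) {
    const Elf64_Dyn *size = find_dyn(dyn, DT_PLTRELSZ);
    const Elf64_Dyn *kind = find_dyn(dyn, DT_PLTREL);
    if (size == NULL || kind == NULL) {
        return 0;
    }
    /* DT_PLTREL selects whether DT_RELENT or DT_RELAENT gives the entry size. */
    const Elf64_Dyn *ent =
        find_dyn(dyn, kind->d_un.d_val == DT_RELA ? DT_RELAENT : DT_RELENT);
    if (ent == NULL || ent->d_un.d_val == 0) {
        return 0;
    }
    return (size_t)(size->d_un.d_val / ent->d_un.d_val);
}
```

In a dynamically linked program the array could be passed in as `plt_reloc_count(_DYNAMIC)` using the `extern Elf64_Dyn _DYNAMIC[];` declaration shown above; taking the array as a parameter keeps the helpers usable on a hand-built table as well.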
- -We can use readelf to dump the `.dynamic` section. In the following snippet I -only kept the relevant entries: -```console -# -d dump .dynamic section -> readelf -d ./main - -Dynamic section at offset 0x2e10 contains 24 entries: - Tag Type Name/Value - 0x0000000000000003 (PLTGOT) 0x404000 - 0x0000000000000002 (PLTRELSZ) 48 (bytes) - 0x0000000000000014 (PLTREL) RELA - 0x0000000000000017 (JMPREL) 0x4004d8 - 0x0000000000000009 (RELAENT) 24 (bytes) -``` - -We can see that **PLTGOT** points to address **0x404000** which is the address -of the GOT as we saw in the [radare2 dump](#code-gotplt-dump). -Also we can see that **JMPREL** points to the [relocation table](#code-relaplt-dump). -**PLTRELSZ / RELAENT** tells us that we have 2 relocation entries which are -exactly the ones for **puts** and **printf**. - - -## References -- [`man 5 elf`][man-elf] -- [Executable and Linking Format (ELF)][elf-1.2] -- [SystemV ABI 4.1][systemv-abi-4.1] -- [SystemV ABI 1.0 (x86_64)][systemv-abi-1.0-x86_64] -- [`man 1 readelf`][man-readelf] - - -[r2]: https://rada.re/n/radare2.html -[man-elf]: http://man7.org/linux/man-pages/man5/elf.5.html -[man-readelf]: http://man7.org/linux/man-pages/man1/readelf.1.html -[elf-1.2]: http://refspecs.linuxbase.org/elf/elf.pdf -[systemv-abi-4.1]: https://refspecs.linuxfoundation.org/elf/gabi41.pdf -[systemv-abi-1.0-x86_64]: https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf - - diff --git a/content/2021-05-15-pthread_cancel-noexcept.md b/content/2021-05-15-pthread_cancel-noexcept.md new file mode 100644 index 0000000..f012cd1 --- /dev/null +++ b/content/2021-05-15-pthread_cancel-noexcept.md @@ -0,0 +1,110 @@ ++++ +title = "pthread_cancel in c++ code" + +[taxonomies] +tags = ["linux", "threading", "c++"] ++++ + +Few weeks ago I was debugging a random crash in a legacy code base at work. 
Whenever +the crash occurred, the following message was printed on `stderr` of the +process: +```text +terminate called without an active exception +``` + +Looking at the reasons why [`std::terminate()`][std_terminate] is +called, and the message that `std::terminate()` was called without an active +exception, the initial assumption was one of the following: +- `10) a joinable std::thread is destroyed or assigned to`. +- Invoked explicitly by the user. + +After receiving a backtrace captured by a customer it wasn't directly obvious +to me why `std::terminate()` was called here. The backtrace received looked +something like the following: +```text +#0 0x00007fb21df22ef5 in raise () from /usr/lib/libc.so.6 +#1 0x00007fb21df0c862 in abort () from /usr/lib/libc.so.6 +#2 0x00007fb21e2a886a in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95 +#3 0x00007fb21e2b4d3a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48 +#4 0x00007fb21e2b4da7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58 +#5 0x00007fb21e2b470d in __cxxabiv1::__gxx_personality_v0 (version=, actions=10, exception_class=0, ue_header=0x7fb21dee0cb0, context=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc:673 +#6 0x00007fb21e0c3814 in _Unwind_ForcedUnwind_Phase2 (exc=0x7fb21dee0cb0, context=0x7fb21dedfc50, frames_p=0x7fb21dedfb58) at /build/gcc/src/gcc/libgcc/unwind.inc:182 +#7 0x00007fb21e0c3f12 in _Unwind_ForcedUnwind (exc=0x7fb21dee0cb0, stop=, stop_argument=0x7fb21dedfe70) at /build/gcc/src/gcc/libgcc/unwind.inc:217 +#8 0x00007fb21e401434 in __pthread_unwind () from /usr/lib/libpthread.so.0 +#9 0x00007fb21e401582 in __pthread_enable_asynccancel () from /usr/lib/libpthread.so.0 +#10 0x00007fb21e4017c7 in write () from /usr/lib/libpthread.so.0 +#11 0x000055f6b8149320 in S::~S (this=0x7fb21dedfe37, __in_chrg=) at 
20210515-pthread_cancel-noexcept/thread.cc:9 +#12 0x000055f6b81491bb in threadFn () at 20210515-pthread_cancel-noexcept/thread.cc:18 +#13 0x00007fb21e3f8299 in start_thread () from /usr/lib/libpthread.so.0 +#14 0x00007fb21dfe5053 in clone () from /usr/lib/libc.so.6 +``` +Looking at frames `#6 - #9` we can see that the crashing thread is +executing `forced unwinding`, which performs the stack unwinding as part of +the thread being cancelled by [`pthread_cancel(3)`][pthread_cancel]. +Thread cancellation starts here from the call to `write()` at frame `#10`, as +pthreads in their default configuration only act on thread cancellation +requests when passing a `cancellation point` as described in +[pthreads(7)][pthreads]. +> The pthread cancel type can either be `PTHREAD_CANCEL_DEFERRED (default)` or +> `PTHREAD_CANCEL_ASYNCHRONOUS` and can be set with +> [`pthread_setcanceltype(3)`][pthread_canceltype]. + +With these findings we can take another look at the reasons why +[`std::terminate()`][std_terminate] is called. The interesting item on +the list this time is the following: +- `7) a noexcept specification is violated` + +This item is of particular interest because: +- In C++ `destructors` are implicitly marked [`noexcept`][noexcept]. +- For NPTL, thread cancellation is implemented by throwing an exception of type + `abi::__forced_unwind`. + +With all these findings, the random crash in the application can be explained +as follows: the `pthread_cancel` call happened asynchronously to the cancelled +thread, so there was a chance that a `cancellation point` was hit in a +`destructor`. + +## Conclusion +In general `pthread_cancel` should not be used in C++ code at all; instead the +thread should offer a way to request a clean shutdown (for example similar to +[`std::jthread`][jthread]). + +However if thread cancellation is **required** then the code should be audited +very carefully and the cancellation points controlled explicitly. 
This can be +achieved by inserting cancellation points at **safe** sections as follows: +```c +pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL); +pthread_testcancel(); +pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL); +``` +> On thread entry, the cancel state should be set to `PTHREAD_CANCEL_DISABLE` +> to disable thread cancellation. + +## Appendix: `abi::__forced_unwind` exception +As mentioned above, thread cancellation for NPTL is implemented by throwing an +exception of type `abi::__forced_unwind`. This exception can actually be caught +in case some extra clean-up steps need to be performed on thread cancellation. +However it is **required** to `rethrow` the exception. +```cpp +#include <cxxabi.h> + +try { + // ... +} catch (abi::__forced_unwind&) { + // Do some extra cleanup. + throw; +} +``` + +## Appendix: Minimal reproducer +```cpp +{{ include(path="content/2021-05-15-pthread_cancel-noexcept/thread.cc") }} +``` + +[std_terminate]: https://en.cppreference.com/w/cpp/error/terminate +[pthread_cancel]: https://man7.org/linux/man-pages/man3/pthread_cancel.3.html +[pthread_canceltype]: https://man7.org/linux/man-pages/man3/pthread_setcanceltype.3.html +[pthread_testcancel]: https://man7.org/linux/man-pages/man3/pthread_testcancel.3.html +[pthreads]: https://man7.org/linux/man-pages/man7/pthreads.7.html +[noexcept]: https://en.cppreference.com/w/cpp/language/noexcept_spec +[jthread]: https://en.cppreference.com/w/cpp/thread/jthread/request_stop diff --git a/content/2021-05-15-pthread_cancel-noexcept/thread.cc b/content/2021-05-15-pthread_cancel-noexcept/thread.cc new file mode 100644 index 0000000..73370be --- /dev/null +++ b/content/2021-05-15-pthread_cancel-noexcept/thread.cc @@ -0,0 +1,40 @@ +// file : thread.cc +// compile: g++ thread.cc -o thread -lpthread + +#include <atomic> + +#include <pthread.h> +#include <unistd.h> + +struct S { + ~S() { + const char msg[] = "cancellation-point\n"; + // write() -> pthread cancellation point. 
+ write(STDOUT_FILENO, msg, sizeof(msg)); + } +}; + +std::atomic<bool> gReleaseThread{false}; + +void* threadFn(void*) { + while (!gReleaseThread) {} + + // Hit cancellation point in destructor which + // is implicitly `noexcept`. + S s; + + return nullptr; +} + +int main() { + pthread_t t; + pthread_create(&t, nullptr /* attr */, threadFn, nullptr /* arg */); + + // Cancel thread and release it to hit the cancellation point. + pthread_cancel(t); + gReleaseThread = true; + + pthread_join(t, nullptr /* retval */); + + return 0; +} diff --git a/content/20210515-pthread_cancel-noexcept.md b/content/20210515-pthread_cancel-noexcept.md deleted file mode 100644 index 8cb4b52..0000000 --- a/content/20210515-pthread_cancel-noexcept.md +++ /dev/null @@ -1,111 +0,0 @@ -+++ -title = "pthread_cancel in c++ code" -date = 2021-05-15 - -[taxonomies] -tags = ["linux", "threading", "c++"] -+++ - -A few weeks ago I was debugging a random crash in a legacy code base at work. Whenever -the crash occurred, the following message was printed on `stderr` of the -process: -```text -terminate called without an active exception -``` - -Looking at the reasons why [`std::terminate()`][std_terminate] is -called, and the message that `std::terminate()` was called without an active -exception, the initial assumption was one of the following: -- `10) a joinable std::thread is destroyed or assigned to`. -- Invoked explicitly by the user. - -After receiving a backtrace captured by a customer it wasn't directly obvious -to me why `std::terminate()` was called here. 
The backtrace received looked -something like the following: -```text -#0 0x00007fb21df22ef5 in raise () from /usr/lib/libc.so.6 -#1 0x00007fb21df0c862 in abort () from /usr/lib/libc.so.6 -#2 0x00007fb21e2a886a in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95 -#3 0x00007fb21e2b4d3a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48 -#4 0x00007fb21e2b4da7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58 -#5 0x00007fb21e2b470d in __cxxabiv1::__gxx_personality_v0 (version=, actions=10, exception_class=0, ue_header=0x7fb21dee0cb0, context=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc:673 -#6 0x00007fb21e0c3814 in _Unwind_ForcedUnwind_Phase2 (exc=0x7fb21dee0cb0, context=0x7fb21dedfc50, frames_p=0x7fb21dedfb58) at /build/gcc/src/gcc/libgcc/unwind.inc:182 -#7 0x00007fb21e0c3f12 in _Unwind_ForcedUnwind (exc=0x7fb21dee0cb0, stop=, stop_argument=0x7fb21dedfe70) at /build/gcc/src/gcc/libgcc/unwind.inc:217 -#8 0x00007fb21e401434 in __pthread_unwind () from /usr/lib/libpthread.so.0 -#9 0x00007fb21e401582 in __pthread_enable_asynccancel () from /usr/lib/libpthread.so.0 -#10 0x00007fb21e4017c7 in write () from /usr/lib/libpthread.so.0 -#11 0x000055f6b8149320 in S::~S (this=0x7fb21dedfe37, __in_chrg=) at 20210515-pthread_cancel-noexcept/thread.cc:9 -#12 0x000055f6b81491bb in threadFn () at 20210515-pthread_cancel-noexcept/thread.cc:18 -#13 0x00007fb21e3f8299 in start_thread () from /usr/lib/libpthread.so.0 -#14 0x00007fb21dfe5053 in clone () from /usr/lib/libc.so.6 -``` -Looking at frames `#6 - #9` we can see that the crashing thread is just -executing `forced unwinding` which is performing the stack unwinding as part of -the thread being cancelled by [`pthread_cancel(3)`][pthread_cancel]. 
-Thread cancellation starts here from the call to `write()` at frame `#10`, as -pthreads in their default configuration only act on thread cancellation -requests when passing a `cancellation point` as described in -[pthreads(7)][pthreads]. -> The pthread cancel type can either be `PTHREAD_CANCEL_DEFERRED (default)` or -> `PTHREAD_CANCEL_ASYNCHRONOUS` and can be set with -> [`pthread_setcanceltype(3)`][pthread_canceltype]. - -With these findings we can take another look at the reasons why -[`std::terminate()`][std_terminate] is called. The interesting item on -the list this time is the following: -- `7) a noexcept specification is violated` - -This item is of particular interest because: -- In C++ `destructors` are implicitly marked [`noexcept`][noexcept]. -- For NPTL, thread cancellation is implemented by throwing an exception of type - `abi::__forced_unwind`. - -With all these findings, the random crash in the application can be explained -as follows: the `pthread_cancel` call happened asynchronously to the cancelled -thread, so there was a chance that a `cancellation point` was hit in a -`destructor`. - -## Conclusion -In general `pthread_cancel` should not be used in C++ code at all; instead the -thread should offer a way to request a clean shutdown (for example similar to -[`std::jthread`][jthread]). - -However if thread cancellation is **required** then the code should be audited -very carefully and the cancellation points controlled explicitly. This can be -achieved by inserting cancellation points at **safe** sections as follows: -```c -pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL); -pthread_testcancel(); -pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL); -``` -> On thread entry, the cancel state should be set to `PTHREAD_CANCEL_DISABLE` -> to disable thread cancellation. - -## Appendix: `abi::__forced_unwind` exception -As mentioned above, thread cancellation for NPTL is implemented by throwing an -exception of type `abi::__forced_unwind`. 
This exception can actually be caught -in case some extra clean-up steps need to be performed on thread cancellation. -However it is **required** to `rethrow` the exception. -```cpp -#include <cxxabi.h> - -try { - // ... -} catch (abi::__forced_unwind&) { - // Do some extra cleanup. - throw; -} -``` - -## Appendix: Minimal reproducer -```cpp -{{ include(path="content/20210515-pthread_cancel-noexcept/thread.cc") }} -``` - -[std_terminate]: https://en.cppreference.com/w/cpp/error/terminate -[pthread_cancel]: https://man7.org/linux/man-pages/man3/pthread_cancel.3.html -[pthread_canceltype]: https://man7.org/linux/man-pages/man3/pthread_setcanceltype.3.html -[pthread_testcancel]: https://man7.org/linux/man-pages/man3/pthread_testcancel.3.html -[pthreads]: https://man7.org/linux/man-pages/man7/pthreads.7.html -[noexcept]: https://en.cppreference.com/w/cpp/language/noexcept_spec -[jthread]: https://en.cppreference.com/w/cpp/thread/jthread/request_stop diff --git a/content/20210515-pthread_cancel-noexcept/thread.cc b/content/20210515-pthread_cancel-noexcept/thread.cc deleted file mode 100644 index 73370be..0000000 --- a/content/20210515-pthread_cancel-noexcept/thread.cc +++ /dev/null @@ -1,40 +0,0 @@ -// file : thread.cc -// compile: g++ thread.cc -o thread -lpthread - -#include <atomic> - -#include <pthread.h> -#include <unistd.h> - -struct S { - ~S() { - const char msg[] = "cancellation-point\n"; - // write() -> pthread cancellation point. - write(STDOUT_FILENO, msg, sizeof(msg)); - } -}; - -std::atomic<bool> gReleaseThread{false}; - -void* threadFn(void*) { - while (!gReleaseThread) {} - - // Hit cancellation point in destructor which - // is implicitly `noexcept`. - S s; - - return nullptr; -} - -int main() { - pthread_t t; - pthread_create(&t, nullptr /* attr */, threadFn, nullptr /* arg */); - - // Cancel thread and release it to hit the cancellation point. - pthread_cancel(t); - gReleaseThread = true; - - pthread_join(t, nullptr /* retval */); - - return 0; -} -- cgit v1.2.3