From 1c123a75e9b17858b783fbc6533417bdfa9794eb Mon Sep 17 00:00:00 2001 From: Johannes Stoelp Date: Mon, 19 Dec 2022 18:54:41 +0100 Subject: Migrate pages with assets according to zola best practice https://www.getzola.org/documentation/content/overview/#asset-colocation --- content/2019-10-27-kernel-debugging-qemu.md | 225 ------------- content/2019-10-27-kernel-debugging-qemu/index.md | 225 +++++++++++++ content/2021-05-15-pthread_cancel-noexcept.md | 110 ------ .../2021-05-15-pthread_cancel-noexcept/index.md | 110 ++++++ content/2021-12-02-toying-with-virtio.md | 373 --------------------- content/2021-12-02-toying-with-virtio/index.md | 373 +++++++++++++++++++++ content/2022-06-18-libclang-c-to-llvm-ir.md | 32 -- content/2022-06-18-libclang-c-to-llvm-ir/index.md | 32 ++ content/2022-07-07-llvm-orc-jit.md | 42 --- content/2022-07-07-llvm-orc-jit/index.md | 42 +++ 10 files changed, 782 insertions(+), 782 deletions(-) delete mode 100644 content/2019-10-27-kernel-debugging-qemu.md create mode 100644 content/2019-10-27-kernel-debugging-qemu/index.md delete mode 100644 content/2021-05-15-pthread_cancel-noexcept.md create mode 100644 content/2021-05-15-pthread_cancel-noexcept/index.md delete mode 100644 content/2021-12-02-toying-with-virtio.md create mode 100644 content/2021-12-02-toying-with-virtio/index.md delete mode 100644 content/2022-06-18-libclang-c-to-llvm-ir.md create mode 100644 content/2022-06-18-libclang-c-to-llvm-ir/index.md delete mode 100644 content/2022-07-07-llvm-orc-jit.md create mode 100644 content/2022-07-07-llvm-orc-jit/index.md (limited to 'content') diff --git a/content/2019-10-27-kernel-debugging-qemu.md b/content/2019-10-27-kernel-debugging-qemu.md deleted file mode 100644 index 518b3d5..0000000 --- a/content/2019-10-27-kernel-debugging-qemu.md +++ /dev/null @@ -1,225 +0,0 @@ -+++ -title = "Linux Kernel debugging with QEMU" - -[taxonomies] -tags = ["linux", "qemu"] -+++ - -**EDIT**: -- 2021-07-15: Added `Appendix: Dockerfile for Kernel development` and updated - busybox + Kernel versions. - -The other evening while starring at some Linux kernel code I thought, let me -setup a minimal environment so I can easily step through the code and examine -the state. - -I ended up creating: -- a [Linux kernel][linux-kernel] with minimal configuration -- a minimal [ramdisk][initrd] to boot into which is based on [busybox][busybox] - -In the remaing part of this article we will go through each step by first -building the kernel, then building the initrd and then running the kernel using -[QEMU][qemu] and debugging it with [GDB][gdb]. - -## $> make kernel - -Before building the kernel we first need to generate a configuration. As a -starting point we generate a minimal config with the `make tinyconfig` make -target. Running this command will generate a `.config` file. After generating -the initial config file we customize the kernel using the merge fragment flow. -This allows us to merge a fragment file into the current configuration by -running the `scripts/kconfig/merge_config.sh` script. - -Let's quickly go over some customizations we do. -The following two lines enable support for gzipped initramdisks: -```config -CONFIG_BLK_DEV_INITRD=y -CONFIG_RD_GZIP=y -``` -The next two configurations are important as they enable the binary loaders for -[ELF][binfmt-elf] and [script #!][binfmt-script] files. 
-```config -CONFIG_BINFMT_ELF=y -CONFIG_BINFMT_SCRIPT=y -``` - -> Note: In the cursed based configuration `make menuconfig` we can search for -> configurations using the `/` key and then select a match using the number keys. -> After selecting a match we can check the `Help` to get a description for the -> configuration parameter. - -Building the kernel with the default make target will give us the following two -files: -- `vmlinux` statically linked kernel (ELF file) containing symbol information for debugging -- `arch/x86_64/boot/bzImage` compressed kernel image for booting - -Full configure & build script: -```sh -{{ include(path="content/2019-10-27-kernel-debugging-qemu/build_kernel.sh") }} -``` - -## $> make initrd - -Next step is to build the initrd which we base on [busybox][busybox]. Therefore -we first build the busybox project in its default configuration with one -change, we enable following configuration to build a static binary so it can be -used stand-alone: -```sh -sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config -``` - -One important step before creating the final initrd is to create an init -process. This will be the first process executed in userspace after the kernel -finished its initialization. We just create a script that drops us into a -shell: -```sh -cat < init -#!/bin/sh - -mount -t proc none /proc -mount -t sysfs none /sys - -exec setsid cttyhack sh -EOF -``` -> By default the kernel looks for `/sbin/init` in the root file system, but the -> location can optionally be specified with the [`init=`][kernel-param] kernel -> parameter. - -Full busybox & initrd build script: -```sh -{{ include(path="content/2019-10-27-kernel-debugging-qemu/build_initrd.sh") }} -``` - -## Running QEMU && GDB - -After finishing the previous steps we have all we need to run and debug the -kernel. We have `arch/x86/boot/bzImage` and `initramfs.cpio.gz` to boot the -kernel into a shell and we have `vmlinux` to feed the debugger with debug -symbols. - -We start QEMU as follows, thanks to the `-S` flag the CPU will freeze until we -connected the debugger: -```sh -# -S freeze CPU until debugger connected -> qemu-system-x86_64 \ - -kernel ./linux-5.3.7/arch/x86/boot/bzImage \ - -nographic \ - -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \ - -initrd ./initramfs.cpio.gz \ - -gdb tcp::1234 \ - -S -``` - -Then we can start GDB and connect to the GDB server running in QEMU (configured -via `-gdb tcp::1234`). From now on we can start to debug through the -kernel. -```sh -> gdb linux-5.3.7/vmlinux -ex 'target remote :1234' -(gdb) b do_execve -Breakpoint 1 at 0xffffffff810a1a60: file fs/exec.c, line 1885. -(gdb) c -Breakpoint 1, do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 -1885 return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); -(gdb) bt -#0 do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 -#1 0xffffffff81000498 in run_init_process (init_filename=) at init/main.c:1048 -#2 0xffffffff81116b75 in kernel_init (unused=) at init/main.c:1129 -#3 0xffffffff8120014f in ret_from_fork () at arch/x86/entry/entry_64.S:352 -#4 0x0000000000000000 in ?? 
() -(gdb) -``` - ---- - -## Appendix: Try to get around `` - -When debugging the kernel we often face following situation in gdb: -``` -(gdb) frame -#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c - -(gdb) info args -fd = -filename = 0xffff888000060000 -argv = -envp = -flags = -file = 0x0 -``` -The problem is that the Linux kernel requires certain code to be compiled with -optimizations enabled. - -In this situation we can "try" to reduce the optimization for single compilation -units or a subtree (try because, reducing the optimization could break the -build). To do so we adapt the Makefile in the corresponding directory. -```make -# fs/Makefile - -# configure for single compilation unit -CFLAGS_exec.o := -Og - -# configure for the whole subtree of where the Makefile resides -ccflags-y := -Og -``` - -After enabling optimize for debug experience `-Og` we can see the following now -in gdb: -``` -(gdb) frame -#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c - -(gdb) info args -fd = -100 -filename = 0xffff888000120000 -argv = {ptr = {native = 0x10c5980}} -envp = {ptr = {native = 0x10c5990}} -flags = 0 - -(gdb) p *filename -$3 = {name = 0xffff888000120020 "/bin/ls", uptr = 0x10c59b8 "/bin/ls", refcnt = 1, aname = 0x0, iname = 0xffff888000120020 "/bin/ls"} - -(gdb) ptype filename -type = struct filename { - const char *name; - const char *uptr; - int refcnt; - struct audit_names *aname; - const char iname[]; -} -``` - -## Appendix: `Dockerfile` for Kernel development - -The following `Dockerfile` provides a development environment with all the -required tools and dependencies, to re-produce all the steps of building and -debugging the Linux kernel. -```dockerfile -{{ include(path="content/2019-10-27-kernel-debugging-qemu/Dockerfile") }} -``` - -Save the listing above in a file called `Dockerfile` and build the docker image -as follows. -```sh -docker build -t kernel-dev -``` -> Optionally set `DOCKER_BUILDKIT=1` to use the newer image builder. - -Once the image has been built, an interactive container can be launched as -follows. -```sh -# Some options for conveniene: -# -v : Mount host path to guest path. -# --rm Remove the container after exiting. - -docker run -it kernel-dev -``` - -[linux-kernel]: https://www.kernel.org -[initrd]: https://www.kernel.org/doc/html/latest/admin-guide/initrd.html -[busybox]: https://busybox.net -[qemu]: https://www.qemu.org -[gdb]: https://www.gnu.org/software/gdb -[binfmt-elf]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c -[binfmt-script]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c -[kernel-param]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html diff --git a/content/2019-10-27-kernel-debugging-qemu/index.md b/content/2019-10-27-kernel-debugging-qemu/index.md new file mode 100644 index 0000000..518b3d5 --- /dev/null +++ b/content/2019-10-27-kernel-debugging-qemu/index.md @@ -0,0 +1,225 @@ ++++ +title = "Linux Kernel debugging with QEMU" + +[taxonomies] +tags = ["linux", "qemu"] ++++ + +**EDIT**: +- 2021-07-15: Added `Appendix: Dockerfile for Kernel development` and updated + busybox + Kernel versions. 
+ +The other evening while starring at some Linux kernel code I thought, let me +setup a minimal environment so I can easily step through the code and examine +the state. + +I ended up creating: +- a [Linux kernel][linux-kernel] with minimal configuration +- a minimal [ramdisk][initrd] to boot into which is based on [busybox][busybox] + +In the remaing part of this article we will go through each step by first +building the kernel, then building the initrd and then running the kernel using +[QEMU][qemu] and debugging it with [GDB][gdb]. + +## $> make kernel + +Before building the kernel we first need to generate a configuration. As a +starting point we generate a minimal config with the `make tinyconfig` make +target. Running this command will generate a `.config` file. After generating +the initial config file we customize the kernel using the merge fragment flow. +This allows us to merge a fragment file into the current configuration by +running the `scripts/kconfig/merge_config.sh` script. + +Let's quickly go over some customizations we do. +The following two lines enable support for gzipped initramdisks: +```config +CONFIG_BLK_DEV_INITRD=y +CONFIG_RD_GZIP=y +``` +The next two configurations are important as they enable the binary loaders for +[ELF][binfmt-elf] and [script #!][binfmt-script] files. +```config +CONFIG_BINFMT_ELF=y +CONFIG_BINFMT_SCRIPT=y +``` + +> Note: In the cursed based configuration `make menuconfig` we can search for +> configurations using the `/` key and then select a match using the number keys. +> After selecting a match we can check the `Help` to get a description for the +> configuration parameter. + +Building the kernel with the default make target will give us the following two +files: +- `vmlinux` statically linked kernel (ELF file) containing symbol information for debugging +- `arch/x86_64/boot/bzImage` compressed kernel image for booting + +Full configure & build script: +```sh +{{ include(path="content/2019-10-27-kernel-debugging-qemu/build_kernel.sh") }} +``` + +## $> make initrd + +Next step is to build the initrd which we base on [busybox][busybox]. Therefore +we first build the busybox project in its default configuration with one +change, we enable following configuration to build a static binary so it can be +used stand-alone: +```sh +sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config +``` + +One important step before creating the final initrd is to create an init +process. This will be the first process executed in userspace after the kernel +finished its initialization. We just create a script that drops us into a +shell: +```sh +cat < init +#!/bin/sh + +mount -t proc none /proc +mount -t sysfs none /sys + +exec setsid cttyhack sh +EOF +``` +> By default the kernel looks for `/sbin/init` in the root file system, but the +> location can optionally be specified with the [`init=`][kernel-param] kernel +> parameter. + +Full busybox & initrd build script: +```sh +{{ include(path="content/2019-10-27-kernel-debugging-qemu/build_initrd.sh") }} +``` + +## Running QEMU && GDB + +After finishing the previous steps we have all we need to run and debug the +kernel. We have `arch/x86/boot/bzImage` and `initramfs.cpio.gz` to boot the +kernel into a shell and we have `vmlinux` to feed the debugger with debug +symbols. 
+ +We start QEMU as follows, thanks to the `-S` flag the CPU will freeze until we +connected the debugger: +```sh +# -S freeze CPU until debugger connected +> qemu-system-x86_64 \ + -kernel ./linux-5.3.7/arch/x86/boot/bzImage \ + -nographic \ + -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \ + -initrd ./initramfs.cpio.gz \ + -gdb tcp::1234 \ + -S +``` + +Then we can start GDB and connect to the GDB server running in QEMU (configured +via `-gdb tcp::1234`). From now on we can start to debug through the +kernel. +```sh +> gdb linux-5.3.7/vmlinux -ex 'target remote :1234' +(gdb) b do_execve +Breakpoint 1 at 0xffffffff810a1a60: file fs/exec.c, line 1885. +(gdb) c +Breakpoint 1, do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 +1885 return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); +(gdb) bt +#0 do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 , __envp=0xffffffff8181e040 ) at fs/exec.c:1885 +#1 0xffffffff81000498 in run_init_process (init_filename=) at init/main.c:1048 +#2 0xffffffff81116b75 in kernel_init (unused=) at init/main.c:1129 +#3 0xffffffff8120014f in ret_from_fork () at arch/x86/entry/entry_64.S:352 +#4 0x0000000000000000 in ?? () +(gdb) +``` + +--- + +## Appendix: Try to get around `` + +When debugging the kernel we often face following situation in gdb: +``` +(gdb) frame +#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c + +(gdb) info args +fd = +filename = 0xffff888000060000 +argv = +envp = +flags = +file = 0x0 +``` +The problem is that the Linux kernel requires certain code to be compiled with +optimizations enabled. + +In this situation we can "try" to reduce the optimization for single compilation +units or a subtree (try because, reducing the optimization could break the +build). To do so we adapt the Makefile in the corresponding directory. +```make +# fs/Makefile + +# configure for single compilation unit +CFLAGS_exec.o := -Og + +# configure for the whole subtree of where the Makefile resides +ccflags-y := -Og +``` + +After enabling optimize for debug experience `-Og` we can see the following now +in gdb: +``` +(gdb) frame +#0 do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c + +(gdb) info args +fd = -100 +filename = 0xffff888000120000 +argv = {ptr = {native = 0x10c5980}} +envp = {ptr = {native = 0x10c5990}} +flags = 0 + +(gdb) p *filename +$3 = {name = 0xffff888000120020 "/bin/ls", uptr = 0x10c59b8 "/bin/ls", refcnt = 1, aname = 0x0, iname = 0xffff888000120020 "/bin/ls"} + +(gdb) ptype filename +type = struct filename { + const char *name; + const char *uptr; + int refcnt; + struct audit_names *aname; + const char iname[]; +} +``` + +## Appendix: `Dockerfile` for Kernel development + +The following `Dockerfile` provides a development environment with all the +required tools and dependencies, to re-produce all the steps of building and +debugging the Linux kernel. +```dockerfile +{{ include(path="content/2019-10-27-kernel-debugging-qemu/Dockerfile") }} +``` + +Save the listing above in a file called `Dockerfile` and build the docker image +as follows. +```sh +docker build -t kernel-dev +``` +> Optionally set `DOCKER_BUILDKIT=1` to use the newer image builder. + +Once the image has been built, an interactive container can be launched as +follows. 
+```sh +# Some options for conveniene: +# -v : Mount host path to guest path. +# --rm Remove the container after exiting. + +docker run -it kernel-dev +``` + +[linux-kernel]: https://www.kernel.org +[initrd]: https://www.kernel.org/doc/html/latest/admin-guide/initrd.html +[busybox]: https://busybox.net +[qemu]: https://www.qemu.org +[gdb]: https://www.gnu.org/software/gdb +[binfmt-elf]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c +[binfmt-script]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c +[kernel-param]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html diff --git a/content/2021-05-15-pthread_cancel-noexcept.md b/content/2021-05-15-pthread_cancel-noexcept.md deleted file mode 100644 index f09cadc..0000000 --- a/content/2021-05-15-pthread_cancel-noexcept.md +++ /dev/null @@ -1,110 +0,0 @@ -+++ -title = "pthread_cancel in c++ code" - -[taxonomies] -tags = ["linux", "threading", "c++"] -+++ - -Few weeks ago I was debugging a random crash in a legacy code base at work. In -case the crash occurred the following message was printed on `stdout` of the -process: -``` -terminate called without an active exception -``` - -Looking at the reasons when [`std::terminate()`][std_terminate] is being -called, and the message that `std::terminate()` was called without an active -exception, the initial assumption was one of the following: -- `10) a joinable std::thread is destroyed or assigned to`. -- Invoked explicitly by the user. - -After receiving a backtrace captured by a customer it wasn't directly obvious -to me why `std::terminate()` was called here. The backtrace received looked -something like the following: -``` -#0 0x00007fb21df22ef5 in raise () from /usr/lib/libc.so.6 -#1 0x00007fb21df0c862 in abort () from /usr/lib/libc.so.6 -#2 0x00007fb21e2a886a in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95 -#3 0x00007fb21e2b4d3a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48 -#4 0x00007fb21e2b4da7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58 -#5 0x00007fb21e2b470d in __cxxabiv1::__gxx_personality_v0 (version=, actions=10, exception_class=0, ue_header=0x7fb21dee0cb0, context=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc:673 -#6 0x00007fb21e0c3814 in _Unwind_ForcedUnwind_Phase2 (exc=0x7fb21dee0cb0, context=0x7fb21dedfc50, frames_p=0x7fb21dedfb58) at /build/gcc/src/gcc/libgcc/unwind.inc:182 -#7 0x00007fb21e0c3f12 in _Unwind_ForcedUnwind (exc=0x7fb21dee0cb0, stop=, stop_argument=0x7fb21dedfe70) at /build/gcc/src/gcc/libgcc/unwind.inc:217 -#8 0x00007fb21e401434 in __pthread_unwind () from /usr/lib/libpthread.so.0 -#9 0x00007fb21e401582 in __pthread_enable_asynccancel () from /usr/lib/libpthread.so.0 -#10 0x00007fb21e4017c7 in write () from /usr/lib/libpthread.so.0 -#11 0x000055f6b8149320 in S::~S (this=0x7fb21dedfe37, __in_chrg=) at 20210515-pthread_cancel-noexcept/thread.cc:9 -#12 0x000055f6b81491bb in threadFn () at 20210515-pthread_cancel-noexcept/thread.cc:18 -#13 0x00007fb21e3f8299 in start_thread () from /usr/lib/libpthread.so.0 -#14 0x00007fb21dfe5053 in clone () from /usr/lib/libc.so.6 -``` -Looking at frames `#6 - #9` we can see that the crashing thread is just -executing `forced unwinding` which is performing the stack unwinding as part of -the thread being cancelled by [`pthread_cancel(3)`][pthread_cancel]. 
-Thread cancellation starts here from the call to `write()` at frame `#10`, as -pthreads in their default configuration only perform thread cancellation -requests when passing a `cancellation point` as described in -[pthreads(7)][pthreads]. -> The pthread cancel type can either be `PTHREAD_CANCEL_DEFERRED (default)` or -> `PTHREAD_CANCEL_ASYNCHRONOUS` and can be set with -> [`pthread_setcanceltype(3)`][pthread_canceltype]. - -With this findings we can take another look at the reasons when -[`std::terminate()`][std_terminate] is being called. The interesting item on -the list this time is the following: -- `7) a noexcept specification is violated` - -This item is of particular interest because: -- In c++ `destructors` are implicitly marked [`noexcept`][noexcept]. -- For NPTL, thread cancellation is implemented by throwing an exception of type - `abi::__forced_unwind`. - -With all these findings, the random crash in the application can be explained -as that the `pthread_cancel` call was happening asynchronous to the cancelled -thread and there was a chance that a `cancellation point` was hit in a -`destructor`. - -## Conclusion -In general `pthread_cancel` should not be used in c++ code at all, but the -thread should have a way to request a clean shutdown (for example similar to -[`std::jthread`][jthread]). - -However if thread cancellation is **required** then the code should be audited -very carefully and the cancellation points controlled explicitly. This can be -achieved by inserting cancellation points at **safe** sections as: -```c -pthread_setcancelstate(PTHREAD_CANCEL_ENABLE); -pthread_testcancel(); -pthread_setcancelstate(PTHREAD_CANCEL_DISABLE); -``` -> On thread entry, the cancel state should be set to `PTHREAD_CANCEL_DISABLE` -> to disable thread cancellation. - -## Appendix: `abi::__forced_unwind` exception -As mentioned above, thread cancellation for NPTL is implemented by throwing an -exception of type `abi::__forced_unwind`. This exception can actually be caught -in case some extra clean-up steps need to be performed on thread cancellation. -However it is **required** to `rethrow` the exception. -```cpp -#include - -try { - // ... -} catch (abi::__forced_unwind&) { - // Do some extra cleanup. - throw; -} -``` - -## Appendix: Minimal reproducer -```cpp -{{ include(path="content/2021-05-15-pthread_cancel-noexcept/thread.cc") }} -``` - -[std_terminate]: https://en.cppreference.com/w/cpp/error/terminate -[pthread_cancel]: https://man7.org/linux/man-pages/man3/pthread_cancel.3.html -[pthread_canceltype]: https://man7.org/linux/man-pages/man3/pthread_setcanceltype.3.html -[pthread_testcancel]: https://man7.org/linux/man-pages/man3/pthread_testcancel.3.html -[pthreads]: https://man7.org/linux/man-pages/man7/pthreads.7.html -[noexcept]: https://en.cppreference.com/w/cpp/language/noexcept_spec -[jthread]: https://en.cppreference.com/w/cpp/thread/jthread/request_stop diff --git a/content/2021-05-15-pthread_cancel-noexcept/index.md b/content/2021-05-15-pthread_cancel-noexcept/index.md new file mode 100644 index 0000000..f09cadc --- /dev/null +++ b/content/2021-05-15-pthread_cancel-noexcept/index.md @@ -0,0 +1,110 @@ ++++ +title = "pthread_cancel in c++ code" + +[taxonomies] +tags = ["linux", "threading", "c++"] ++++ + +Few weeks ago I was debugging a random crash in a legacy code base at work. 
In +case the crash occurred the following message was printed on `stdout` of the +process: +``` +terminate called without an active exception +``` + +Looking at the reasons when [`std::terminate()`][std_terminate] is being +called, and the message that `std::terminate()` was called without an active +exception, the initial assumption was one of the following: +- `10) a joinable std::thread is destroyed or assigned to`. +- Invoked explicitly by the user. + +After receiving a backtrace captured by a customer it wasn't directly obvious +to me why `std::terminate()` was called here. The backtrace received looked +something like the following: +``` +#0 0x00007fb21df22ef5 in raise () from /usr/lib/libc.so.6 +#1 0x00007fb21df0c862 in abort () from /usr/lib/libc.so.6 +#2 0x00007fb21e2a886a in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95 +#3 0x00007fb21e2b4d3a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48 +#4 0x00007fb21e2b4da7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58 +#5 0x00007fb21e2b470d in __cxxabiv1::__gxx_personality_v0 (version=, actions=10, exception_class=0, ue_header=0x7fb21dee0cb0, context=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc:673 +#6 0x00007fb21e0c3814 in _Unwind_ForcedUnwind_Phase2 (exc=0x7fb21dee0cb0, context=0x7fb21dedfc50, frames_p=0x7fb21dedfb58) at /build/gcc/src/gcc/libgcc/unwind.inc:182 +#7 0x00007fb21e0c3f12 in _Unwind_ForcedUnwind (exc=0x7fb21dee0cb0, stop=, stop_argument=0x7fb21dedfe70) at /build/gcc/src/gcc/libgcc/unwind.inc:217 +#8 0x00007fb21e401434 in __pthread_unwind () from /usr/lib/libpthread.so.0 +#9 0x00007fb21e401582 in __pthread_enable_asynccancel () from /usr/lib/libpthread.so.0 +#10 0x00007fb21e4017c7 in write () from /usr/lib/libpthread.so.0 +#11 0x000055f6b8149320 in S::~S (this=0x7fb21dedfe37, __in_chrg=) at 20210515-pthread_cancel-noexcept/thread.cc:9 +#12 0x000055f6b81491bb in threadFn () at 20210515-pthread_cancel-noexcept/thread.cc:18 +#13 0x00007fb21e3f8299 in start_thread () from /usr/lib/libpthread.so.0 +#14 0x00007fb21dfe5053 in clone () from /usr/lib/libc.so.6 +``` +Looking at frames `#6 - #9` we can see that the crashing thread is just +executing `forced unwinding` which is performing the stack unwinding as part of +the thread being cancelled by [`pthread_cancel(3)`][pthread_cancel]. +Thread cancellation starts here from the call to `write()` at frame `#10`, as +pthreads in their default configuration only perform thread cancellation +requests when passing a `cancellation point` as described in +[pthreads(7)][pthreads]. +> The pthread cancel type can either be `PTHREAD_CANCEL_DEFERRED (default)` or +> `PTHREAD_CANCEL_ASYNCHRONOUS` and can be set with +> [`pthread_setcanceltype(3)`][pthread_canceltype]. + +With this findings we can take another look at the reasons when +[`std::terminate()`][std_terminate] is being called. The interesting item on +the list this time is the following: +- `7) a noexcept specification is violated` + +This item is of particular interest because: +- In c++ `destructors` are implicitly marked [`noexcept`][noexcept]. +- For NPTL, thread cancellation is implemented by throwing an exception of type + `abi::__forced_unwind`. 
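
To make the failure mode concrete, here is a rough sketch of the pattern (this is **not** the `thread.cc` reproducer from the appendix, and the timing is deliberately simplified): a deferred cancellation request gets delivered at the `write(2)` cancellation point inside an implicitly `noexcept` destructor, so the forced-unwind exception escapes a `noexcept` function and `std::terminate()` is called.

```cpp
#include <pthread.h>
#include <unistd.h>

struct S {
  ~S() {                               // implicitly noexcept
    write(STDOUT_FILENO, "~S\n", 3);   // cancellation point: a pending cancel
  }                                    // starts the forced unwind from here
};

static void* threadFn(void*) {
  S s;
  // Busy work without cancellation points, so the cancel requested by main()
  // is typically still pending when ~S() runs at the end of this scope.
  for (volatile unsigned long i = 0; i < 100000000ul; ++i) {}
  return nullptr;
}

int main() {
  pthread_t t;
  pthread_create(&t, nullptr, threadFn, nullptr);
  pthread_cancel(t);         // default cancel type: PTHREAD_CANCEL_DEFERRED
  pthread_join(t, nullptr);  // may abort: "terminate called without an active exception"
}
```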
+ +With all these findings, the random crash in the application can be explained +as that the `pthread_cancel` call was happening asynchronous to the cancelled +thread and there was a chance that a `cancellation point` was hit in a +`destructor`. + +## Conclusion +In general `pthread_cancel` should not be used in c++ code at all, but the +thread should have a way to request a clean shutdown (for example similar to +[`std::jthread`][jthread]). + +However if thread cancellation is **required** then the code should be audited +very carefully and the cancellation points controlled explicitly. This can be +achieved by inserting cancellation points at **safe** sections as: +```c +pthread_setcancelstate(PTHREAD_CANCEL_ENABLE); +pthread_testcancel(); +pthread_setcancelstate(PTHREAD_CANCEL_DISABLE); +``` +> On thread entry, the cancel state should be set to `PTHREAD_CANCEL_DISABLE` +> to disable thread cancellation. + +## Appendix: `abi::__forced_unwind` exception +As mentioned above, thread cancellation for NPTL is implemented by throwing an +exception of type `abi::__forced_unwind`. This exception can actually be caught +in case some extra clean-up steps need to be performed on thread cancellation. +However it is **required** to `rethrow` the exception. +```cpp +#include + +try { + // ... +} catch (abi::__forced_unwind&) { + // Do some extra cleanup. + throw; +} +``` + +## Appendix: Minimal reproducer +```cpp +{{ include(path="content/2021-05-15-pthread_cancel-noexcept/thread.cc") }} +``` + +[std_terminate]: https://en.cppreference.com/w/cpp/error/terminate +[pthread_cancel]: https://man7.org/linux/man-pages/man3/pthread_cancel.3.html +[pthread_canceltype]: https://man7.org/linux/man-pages/man3/pthread_setcanceltype.3.html +[pthread_testcancel]: https://man7.org/linux/man-pages/man3/pthread_testcancel.3.html +[pthreads]: https://man7.org/linux/man-pages/man7/pthreads.7.html +[noexcept]: https://en.cppreference.com/w/cpp/language/noexcept_spec +[jthread]: https://en.cppreference.com/w/cpp/thread/jthread/request_stop diff --git a/content/2021-12-02-toying-with-virtio.md b/content/2021-12-02-toying-with-virtio.md deleted file mode 100644 index c2ff031..0000000 --- a/content/2021-12-02-toying-with-virtio.md +++ /dev/null @@ -1,373 +0,0 @@ -+++ -title = "QEMU virtio configurations" - -[taxonomies] -tags = ["linux", "qemu", "virtio"] -+++ - -For my own reference I wanted to document some minimal [`virtio`][virtio] -device configurations with qemu and the required Linux kernel configuration to -enable those devices. - -The devices we will use are `virtio console`, `virtio blk` and `virtio net`. - -To make use of the virtio devices in qemu we are going to build and boot into -busybox based [`initramfs`][initramfs]. - -## Build initramfs - -For the initramfs there is not much magic, we will grab a copy of busybox, -configure it with the default config (`defconfig`) and enable static linking as -we will use it as rootfs. - -For the `init` process we will use the one provided by busybox but we have to -symlink it to `/init` as during boot, the kernel will extract the cpio -compressed initramfs into `rootfs` and look for the `/init` file. If that's not -found the kernel will fallback to an older mechanism an try to mount a root -partition (which we don't have). -> Optionally the init binary could be specified with the `rdinit=` kernel boot -> parameter. 
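
A minimal sketch of what that means for the image layout (paths are illustrative, and the actual commands live in the linked `build_initramfs.sh` and may differ): busybox's `init` applet is exposed as `/init`, and the staging tree is packed as a gzip compressed `newc` cpio archive.

```sh
# Inside the initramfs staging directory (illustrative layout).
ln -s bin/busybox init                    # kernel executes /init after unpacking

find . -print0 \
  | cpio --null --create --format=newc \
  | gzip -9 > ../initramfs.cpio.gz
```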
- -We populate the `/etc/inittab` and `/etc/init.d/rcS` with a minimal -configuration to mount the `proc`, `sys` and `dev` filesystems and drop into a -shell after the boot is completed. \ -Additionally we setup `/etc/passwd` and `/etc/shadow` with an entry for the -`root` user with the password `1234`, so we can login via the virtio console -later. - -```sh -{{ include_range(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh", start=31, end=67) }} -``` - -The full build script is available under [build_initramfs.sh][build-initramfs]. - -## Virtio console - -To enable support for the virtio console we enable the kernel configs shown -below. -The pci configurations are enabled because in qemu the virtio console front-end -device (the one presented to the guest) is attached to the pci bus. - -```sh -{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=32, end=38) }} -``` - -The full build script is available under [build_kernel.sh][build-kernel]. - -To boot-up the guest we use the following qemu configuration. - -```sh -qemu-system-x86_64 \ - -nographic \ - -cpu host \ - -enable-kvm \ - -kernel ./linux-$(VER)/arch/x86/boot/bzImage \ - -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/ram0 ro" \ - -initrd ./initramfs.cpio.gz \ - -device virtio-serial-pci \ - -device virtconsole,chardev=vcon,name=console.0 \ - -chardev socket,id=vcon,ipv4=on,host=localhost,port=2222,server,telnet=on,wait=off -``` - -The important parts in this configuration are the last three lines. - -The `virtio-serial-pci` device creates the serial bus where the virtio console -is attached to. - -The `virtconsole` creates the virtio console device exposed to the guest -(front-end). The `chardev=vcon` option specifies that the chardev with -`id=vcon` is attached as back-end to the virtio console. -The back-end device is the one we will have access to from the host running the -emulation. - -The chardev back-end we configure to be a `socket`, running a telnet server -listening on port 2222. The `wait=off` tells qemu that it can directly boot -without waiting for a client connection. - -After booting the guest we are dropped into a shell and can verify that our -device is being detected properly. -```sh -root@virtio-box ~ # ls /sys/bus/virtio/devices/ -virtio0 -root@virtio-box ~ # cat /sys/bus/virtio/devices/virtio0/virtio-ports/vport0p0/name -console.0 -``` - -In `/etc/inittab`, we already configured to spawn `getty` on the first -hypervisor console `/dev/hvc0`. This will effectively run `login(1)` over the -serial console. - -From the host we can run `telnet localhost 2222` and are presented with a login shell to the guest. - -As we already included to launch `getty` on the first hypervisor console -`/dev/hvc0` in `/etc/inittab`, we can directly connect to the back-end chardev -and login to the guest with `root:1234`. - -```sh -> telnet -4 localhost 2222 -Trying 127.0.0.1... -Connected to localhost. -Escape character is '^]'. - -virtio-box login: root -Password: -root@virtio-box ~ # -``` - -## Virtio blk - -To enable support for the virtio block device we enable the kernel configs -shown below. -First we enable general support for block devices and then for virtio block -devices. Additionally we enable support for the `ext2` filesystem because we -are creating an ext2 filesystem to back the virtio block device. 
- -```sh -{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=40, end=47) }} -``` - -The full build script is available under [build_kernel.sh][build-kernel]. - -Next we are creating the ext2 filesystem image. This we'll do by creating an -`128M` blob and format it with ext2 afterwards. Then we can mount the image -via a `loop` device and populate the filesystem. -```sh -{{ include_range(path="content/2021-12-02-toying-with-virtio/build_ext2.sh", start=3, end=7) }} -``` - -Before booting the guest we will attach the virtio block device to the VM. -Therefore we add the `-drive` configuration to our previous qemu invocation. - -```sh -qemu-system-x86_64 \ - ... - -drive if=virtio,file=fs.ext2,format=raw -``` - -The `-drive` option is a shortcut for a `-device (front-end) / -blockdev -(back-end)` pair. - -The `if=virtio` flag specifies the interface of the front-end device to be -`virtio`. - -The `file` and `format` flags configure the back-end to be a disk image. - -After booting the guest we are dropped into a shell and can verify a few -things. First we check if the virtio block device is detected, then we check if -we have support for the ext2 filesystem and finally we mount the disk. - -```sh -root@virtio-box ~ # ls -l /sys/block/ -lrwxrwxrwx 1 root 0 0 Dec 3 22:46 vda -> ../devices/pci0000:00/0000:00:05.0/virtio1/block/vda - -root@virtio-box ~ # cat /proc/filesystems -... - ext2 - -root@virtio-box ~ # mount -t ext2 /dev/vda /mnt -EXT2-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended -ext2 filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff) - -root@virtio-box ~ # cat /mnt/hello -world -``` - -## Virtio net - -To enable support for the virtio network device we enable the kernel configs -shown below. -First we enable general support for networking and TCP/IP and then enable the -core networking driver and the virtio net driver. - -```sh -{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=49, end=62) }} -``` - -The full build script is available under [build_kernel.sh][build-kernel]. - -For the qemu device emulation we already decided on the front-end device, which -will be our virtio net device. \ -On the back-end we will choose the [`user`][qemu-user-net] option. This enables -a network stack implemented in userspace based on [libslirp][libslirp], which -has the benefit that we do not need to setup additional network interfaces and -therefore require any privileges. Fundamentally, [libslirp][libslirp] works by -replaying [Layer 2][osi-2] packets received from the guest NIC via the socket -API on the host ([Layer 4][osi-4]) and vice versa. User networking comes with a -set of limitations, for example -- Can not use `ping` inside the guest as `ICMP` is not supported. -- The guest is not accessible from the host. - -With the guest, qemu and the host in the picture this looks something like the -following. 
-``` -+--------------------------------------------+ -| host | -| +-------------------------+ | -| | guest | | -| | | | -| | user | | -| +------+------+-----------+ | -| | | eth0 | kernel | | -| | +--+---+ | | -| | | | | -| | +-----v--------+ | | -| | | nic (virtio) | | | -| +--+---+-----+--------+------+--+ | -| | | Layer 2 qemu | | -| | | (eth frames) | | -| | +----v-----+ | | -| | | libslirp | | | -| | +----+-----+ | | -| | | Layer 4 | | -| | | (socket API) | user | -+--+---------+--v---+--------------+---------+ -| | eth0 | kernel | -| +------+ | -+--------------------------------------------+ -``` - -The user networking implements a virtually NAT'ed sub-network with the address -range `10.0.2.0/24` running an internal dhcp server. By default, the dhcp -server assigns the following IP addresses which are interesting to us: -- `10.0.2.2` host running the qemu emulation -- `10.0.2.3` virtual DNS server -> The netdev options `net=addr/mask`, `host=addr`, `dns=addr` can be used to -> re-configure the sub-network (see [network options][qemu-nic-opts]). - -With the details of the sub-network in mind we can add some additional setup to -the initramfs which performs the basic network setup. - -We add the virtual DNS server to `/etc/resolv.conf` which will be used by the -libc resolver functions. - -Additionally we assign a static ip to the `eth0` network interface, bring the -interface up and define the default route via the host `10.0.2.2`. - -```sh -{{ include_range(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh", start=69, end=85) }} -``` - -The full build script is available under [build_initramfs.sh][build-initramfs]. - -Before booting the guest we will attach the virtio net device and configure to -use the user network stack. -Therefore we add the `-nic` configuration to our previous qemu invocation. - -```sh -qemu-system-x86_64 \ - ... - -nic user,model=virtio-net-pci -``` - -The `-nic` option is a shortcut for a `-device (front-end) / -netdev -(back-end)` pair. - -After booting the guest we are dropped into a shell and can verify a few -things. First we check if the virtio net device is detected. Then we check if -the interface got configured and brought up correctly. - -```sh -root@virtio-box ~ # ls -l /sys/class/net/ -lrwxrwxrwx 1 root 0 0 Dec 4 16:56 eth0 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/eth0 -lrwxrwxrwx 1 root 0 0 Dec 4 16:56 lo -> ../../devices/virtual/net/lo - - -root@virtio-box ~ # ip -o a -2: eth0 inet 10.0.2.15/24 scope global eth0 ... - -root@virtio-box ~ # ip route -default via 10.0.2.2 dev eth0 -10.0.2.0/24 dev eth0 scope link src 10.0.2.15 -``` - -We can resolve out domain and see that the virtual DNS gets contacted. - -```sh -root@virtio-box ~ # nslookup memzero.de -Server: 10.0.2.3 -Address: 10.0.2.3:53 - -Non-authoritative answer: -Name: memzero.de -Address: 46.101.148.203 -``` - -Additionally we can try to access a service running on the host. Therefore we -run a simple http server on the host (where we launched qemu) with the -following command `python3 -m http.server --bind 0.0.0.0 1234`. This will -launch the server to listen for any incoming address at port `1234`. - -From within the guest we can manually craft a simple http `GET` request and -send it to the http server running on the host. For that we use the IP address -`10.0.2.2` which the dhcp assigned to our host. 
- -```sh -root@virtio-box ~ # echo "GET / HTTP/1.0" | nc 10.0.2.2 1234 -HTTP/1.0 200 OK -Server: SimpleHTTP/0.6 Python/3.9.7 -Date: Sat, 04 Dec 2021 16:58:56 GMT -Content-type: text/html; charset=utf-8 -Content-Length: 917 - - - - - -Directory listing for / - - -

- - -``` - -## Appendix: Workspace - -To re-produce the setup and play around with it just grab a copy of the -following files: -- [Dockerfile][dockerfile] -- [Makefile][makefile] -- [build_initramfs.sh][build-initramfs] -- [build_kernel.sh][build-kernel] -- [build_ext2.sh][build-ext2] - -Then run the following steps to build everything. The prefix `[H]` and `[C]` -indicate whether this command is run on the host or inside the container -respectively. -```sh -# To see all the make targets. -[H] make help - -# Build docker image, start a container with the current working dir -# mounted. On the first invocation this takes some minutes to build -# the image. -[H]: make docker - -# Build kernel and initramfs. -[C]: make - -# Build ext2 fs as virtio blkdev backend. -[H]: make ext2 - -# Start qemu guest. -[H]: make run -``` - -[build-initramfs]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_initramfs.sh?h=main -[build-kernel]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_kernel.sh?h=main -[build-ext2]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_ext2.sh?h=main -[makefile]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/Makefile?h=main -[dockerfile]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/Dockerfile?h=main -[initramfs]: https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt -[virtio]: http://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.pdf -[qemu-nic-opts]: https://www.qemu.org/docs/master/system/invocation.html#hxtool-5 -[qemu-user-net]: https://www.qemu.org/docs/master/system/devices/net.html#using-the-user-mode-network-stack -[libslirp]: https://gitlab.com/qemu-project/libslirp -[osi-2]: https://osi-model.com/data-link-layer -[osi-4]: https://osi-model.com/transport-layer diff --git a/content/2021-12-02-toying-with-virtio/index.md b/content/2021-12-02-toying-with-virtio/index.md new file mode 100644 index 0000000..c2ff031 --- /dev/null +++ b/content/2021-12-02-toying-with-virtio/index.md @@ -0,0 +1,373 @@ ++++ +title = "QEMU virtio configurations" + +[taxonomies] +tags = ["linux", "qemu", "virtio"] ++++ + +For my own reference I wanted to document some minimal [`virtio`][virtio] +device configurations with qemu and the required Linux kernel configuration to +enable those devices. + +The devices we will use are `virtio console`, `virtio blk` and `virtio net`. + +To make use of the virtio devices in qemu we are going to build and boot into +busybox based [`initramfs`][initramfs]. + +## Build initramfs + +For the initramfs there is not much magic, we will grab a copy of busybox, +configure it with the default config (`defconfig`) and enable static linking as +we will use it as rootfs. + +For the `init` process we will use the one provided by busybox but we have to +symlink it to `/init` as during boot, the kernel will extract the cpio +compressed initramfs into `rootfs` and look for the `/init` file. If that's not +found the kernel will fallback to an older mechanism an try to mount a root +partition (which we don't have). +> Optionally the init binary could be specified with the `rdinit=` kernel boot +> parameter. + +We populate the `/etc/inittab` and `/etc/init.d/rcS` with a minimal +configuration to mount the `proc`, `sys` and `dev` filesystems and drop into a +shell after the boot is completed. 
\ +Additionally we setup `/etc/passwd` and `/etc/shadow` with an entry for the +`root` user with the password `1234`, so we can login via the virtio console +later. + +```sh +{{ include_range(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh", start=31, end=67) }} +``` + +The full build script is available under [build_initramfs.sh][build-initramfs]. + +## Virtio console + +To enable support for the virtio console we enable the kernel configs shown +below. +The pci configurations are enabled because in qemu the virtio console front-end +device (the one presented to the guest) is attached to the pci bus. + +```sh +{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=32, end=38) }} +``` + +The full build script is available under [build_kernel.sh][build-kernel]. + +To boot-up the guest we use the following qemu configuration. + +```sh +qemu-system-x86_64 \ + -nographic \ + -cpu host \ + -enable-kvm \ + -kernel ./linux-$(VER)/arch/x86/boot/bzImage \ + -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/ram0 ro" \ + -initrd ./initramfs.cpio.gz \ + -device virtio-serial-pci \ + -device virtconsole,chardev=vcon,name=console.0 \ + -chardev socket,id=vcon,ipv4=on,host=localhost,port=2222,server,telnet=on,wait=off +``` + +The important parts in this configuration are the last three lines. + +The `virtio-serial-pci` device creates the serial bus where the virtio console +is attached to. + +The `virtconsole` creates the virtio console device exposed to the guest +(front-end). The `chardev=vcon` option specifies that the chardev with +`id=vcon` is attached as back-end to the virtio console. +The back-end device is the one we will have access to from the host running the +emulation. + +The chardev back-end we configure to be a `socket`, running a telnet server +listening on port 2222. The `wait=off` tells qemu that it can directly boot +without waiting for a client connection. + +After booting the guest we are dropped into a shell and can verify that our +device is being detected properly. +```sh +root@virtio-box ~ # ls /sys/bus/virtio/devices/ +virtio0 +root@virtio-box ~ # cat /sys/bus/virtio/devices/virtio0/virtio-ports/vport0p0/name +console.0 +``` + +In `/etc/inittab`, we already configured to spawn `getty` on the first +hypervisor console `/dev/hvc0`. This will effectively run `login(1)` over the +serial console. + +From the host we can run `telnet localhost 2222` and are presented with a login shell to the guest. + +As we already included to launch `getty` on the first hypervisor console +`/dev/hvc0` in `/etc/inittab`, we can directly connect to the back-end chardev +and login to the guest with `root:1234`. + +```sh +> telnet -4 localhost 2222 +Trying 127.0.0.1... +Connected to localhost. +Escape character is '^]'. + +virtio-box login: root +Password: +root@virtio-box ~ # +``` + +## Virtio blk + +To enable support for the virtio block device we enable the kernel configs +shown below. +First we enable general support for block devices and then for virtio block +devices. Additionally we enable support for the `ext2` filesystem because we +are creating an ext2 filesystem to back the virtio block device. + +```sh +{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=40, end=47) }} +``` + +The full build script is available under [build_kernel.sh][build-kernel]. + +Next we are creating the ext2 filesystem image. This we'll do by creating an +`128M` blob and format it with ext2 afterwards. 
Then we can mount the image +via a `loop` device and populate the filesystem. +```sh +{{ include_range(path="content/2021-12-02-toying-with-virtio/build_ext2.sh", start=3, end=7) }} +``` + +Before booting the guest we will attach the virtio block device to the VM. +Therefore we add the `-drive` configuration to our previous qemu invocation. + +```sh +qemu-system-x86_64 \ + ... + -drive if=virtio,file=fs.ext2,format=raw +``` + +The `-drive` option is a shortcut for a `-device (front-end) / -blockdev +(back-end)` pair. + +The `if=virtio` flag specifies the interface of the front-end device to be +`virtio`. + +The `file` and `format` flags configure the back-end to be a disk image. + +After booting the guest we are dropped into a shell and can verify a few +things. First we check if the virtio block device is detected, then we check if +we have support for the ext2 filesystem and finally we mount the disk. + +```sh +root@virtio-box ~ # ls -l /sys/block/ +lrwxrwxrwx 1 root 0 0 Dec 3 22:46 vda -> ../devices/pci0000:00/0000:00:05.0/virtio1/block/vda + +root@virtio-box ~ # cat /proc/filesystems +... + ext2 + +root@virtio-box ~ # mount -t ext2 /dev/vda /mnt +EXT2-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended +ext2 filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff) + +root@virtio-box ~ # cat /mnt/hello +world +``` + +## Virtio net + +To enable support for the virtio network device we enable the kernel configs +shown below. +First we enable general support for networking and TCP/IP and then enable the +core networking driver and the virtio net driver. + +```sh +{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=49, end=62) }} +``` + +The full build script is available under [build_kernel.sh][build-kernel]. + +For the qemu device emulation we already decided on the front-end device, which +will be our virtio net device. \ +On the back-end we will choose the [`user`][qemu-user-net] option. This enables +a network stack implemented in userspace based on [libslirp][libslirp], which +has the benefit that we do not need to setup additional network interfaces and +therefore require any privileges. Fundamentally, [libslirp][libslirp] works by +replaying [Layer 2][osi-2] packets received from the guest NIC via the socket +API on the host ([Layer 4][osi-4]) and vice versa. User networking comes with a +set of limitations, for example +- Can not use `ping` inside the guest as `ICMP` is not supported. +- The guest is not accessible from the host. + +With the guest, qemu and the host in the picture this looks something like the +following. +``` ++--------------------------------------------+ +| host | +| +-------------------------+ | +| | guest | | +| | | | +| | user | | +| +------+------+-----------+ | +| | | eth0 | kernel | | +| | +--+---+ | | +| | | | | +| | +-----v--------+ | | +| | | nic (virtio) | | | +| +--+---+-----+--------+------+--+ | +| | | Layer 2 qemu | | +| | | (eth frames) | | +| | +----v-----+ | | +| | | libslirp | | | +| | +----+-----+ | | +| | | Layer 4 | | +| | | (socket API) | user | ++--+---------+--v---+--------------+---------+ +| | eth0 | kernel | +| +------+ | ++--------------------------------------------+ +``` + +The user networking implements a virtually NAT'ed sub-network with the address +range `10.0.2.0/24` running an internal dhcp server. 
By default, the dhcp +server assigns the following IP addresses which are interesting to us: +- `10.0.2.2` host running the qemu emulation +- `10.0.2.3` virtual DNS server +> The netdev options `net=addr/mask`, `host=addr`, `dns=addr` can be used to +> re-configure the sub-network (see [network options][qemu-nic-opts]). + +With the details of the sub-network in mind we can add some additional setup to +the initramfs which performs the basic network setup. + +We add the virtual DNS server to `/etc/resolv.conf` which will be used by the +libc resolver functions. + +Additionally we assign a static ip to the `eth0` network interface, bring the +interface up and define the default route via the host `10.0.2.2`. + +```sh +{{ include_range(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh", start=69, end=85) }} +``` + +The full build script is available under [build_initramfs.sh][build-initramfs]. + +Before booting the guest we will attach the virtio net device and configure to +use the user network stack. +Therefore we add the `-nic` configuration to our previous qemu invocation. + +```sh +qemu-system-x86_64 \ + ... + -nic user,model=virtio-net-pci +``` + +The `-nic` option is a shortcut for a `-device (front-end) / -netdev +(back-end)` pair. + +After booting the guest we are dropped into a shell and can verify a few +things. First we check if the virtio net device is detected. Then we check if +the interface got configured and brought up correctly. + +```sh +root@virtio-box ~ # ls -l /sys/class/net/ +lrwxrwxrwx 1 root 0 0 Dec 4 16:56 eth0 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/eth0 +lrwxrwxrwx 1 root 0 0 Dec 4 16:56 lo -> ../../devices/virtual/net/lo + + +root@virtio-box ~ # ip -o a +2: eth0 inet 10.0.2.15/24 scope global eth0 ... + +root@virtio-box ~ # ip route +default via 10.0.2.2 dev eth0 +10.0.2.0/24 dev eth0 scope link src 10.0.2.15 +``` + +We can resolve out domain and see that the virtual DNS gets contacted. + +```sh +root@virtio-box ~ # nslookup memzero.de +Server: 10.0.2.3 +Address: 10.0.2.3:53 + +Non-authoritative answer: +Name: memzero.de +Address: 46.101.148.203 +``` + +Additionally we can try to access a service running on the host. Therefore we +run a simple http server on the host (where we launched qemu) with the +following command `python3 -m http.server --bind 0.0.0.0 1234`. This will +launch the server to listen for any incoming address at port `1234`. + +From within the guest we can manually craft a simple http `GET` request and +send it to the http server running on the host. For that we use the IP address +`10.0.2.2` which the dhcp assigned to our host. + +```sh +root@virtio-box ~ # echo "GET / HTTP/1.0" | nc 10.0.2.2 1234 +HTTP/1.0 200 OK +Server: SimpleHTTP/0.6 Python/3.9.7 +Date: Sat, 04 Dec 2021 16:58:56 GMT +Content-type: text/html; charset=utf-8 +Content-Length: 917 + + + + + +Directory listing for / + + +

+ + +``` + +## Appendix: Workspace + +To re-produce the setup and play around with it just grab a copy of the +following files: +- [Dockerfile][dockerfile] +- [Makefile][makefile] +- [build_initramfs.sh][build-initramfs] +- [build_kernel.sh][build-kernel] +- [build_ext2.sh][build-ext2] + +Then run the following steps to build everything. The prefix `[H]` and `[C]` +indicate whether this command is run on the host or inside the container +respectively. +```sh +# To see all the make targets. +[H] make help + +# Build docker image, start a container with the current working dir +# mounted. On the first invocation this takes some minutes to build +# the image. +[H]: make docker + +# Build kernel and initramfs. +[C]: make + +# Build ext2 fs as virtio blkdev backend. +[H]: make ext2 + +# Start qemu guest. +[H]: make run +``` + +[build-initramfs]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_initramfs.sh?h=main +[build-kernel]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_kernel.sh?h=main +[build-ext2]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_ext2.sh?h=main +[makefile]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/Makefile?h=main +[dockerfile]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/Dockerfile?h=main +[initramfs]: https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt +[virtio]: http://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.pdf +[qemu-nic-opts]: https://www.qemu.org/docs/master/system/invocation.html#hxtool-5 +[qemu-user-net]: https://www.qemu.org/docs/master/system/devices/net.html#using-the-user-mode-network-stack +[libslirp]: https://gitlab.com/qemu-project/libslirp +[osi-2]: https://osi-model.com/data-link-layer +[osi-4]: https://osi-model.com/transport-layer diff --git a/content/2022-06-18-libclang-c-to-llvm-ir.md b/content/2022-06-18-libclang-c-to-llvm-ir.md deleted file mode 100644 index 7d3ee63..0000000 --- a/content/2022-06-18-libclang-c-to-llvm-ir.md +++ /dev/null @@ -1,32 +0,0 @@ -+++ -title = "C to LLVM IR in memory using libclang" - -[taxonomies] -tags = ["llvm", "clang", "c++"] -+++ - -For some experiments with the LLVM just in time (JIT) APIs, I was looking for a -way to compile in memory from `C -> LLVM IR` and without invoking Clang as a -child process. - -I created a minimal example for my purpose based on the [Clang -source][src-clang] code and the example given in the blog post [Compiling C++ -Code In Memory With Clang][blog-clang-in-memory]. - -The code listing below shows the example with detailed comments inlined, hence -I am not further describing any details here. - -> The example was build & tested with LLVM & Clang 13. - -```cpp -{{ include(path="content/2022-06-18-libclang-c-to-llvm-ir/gen-ir.cc") }} -``` - -The following Makefile can be used to compile and run the example. 
- -```make -{{ include(path="content/2022-06-18-libclang-c-to-llvm-ir/Makefile") }} -``` - -[src-clang]: https://github.com/llvm/llvm-project/tree/main/clang -[blog-clang-in-memory]: https://blog.audio-tk.com/2018/09/18/compiling-c-code-in-memory-with-clang/ diff --git a/content/2022-06-18-libclang-c-to-llvm-ir/index.md b/content/2022-06-18-libclang-c-to-llvm-ir/index.md new file mode 100644 index 0000000..7d3ee63 --- /dev/null +++ b/content/2022-06-18-libclang-c-to-llvm-ir/index.md @@ -0,0 +1,32 @@ ++++ +title = "C to LLVM IR in memory using libclang" + +[taxonomies] +tags = ["llvm", "clang", "c++"] ++++ + +For some experiments with the LLVM just in time (JIT) APIs, I was looking for a +way to compile in memory from `C -> LLVM IR` and without invoking Clang as a +child process. + +I created a minimal example for my purpose based on the [Clang +source][src-clang] code and the example given in the blog post [Compiling C++ +Code In Memory With Clang][blog-clang-in-memory]. + +The code listing below shows the example with detailed comments inlined, hence +I am not further describing any details here. + +> The example was build & tested with LLVM & Clang 13. + +```cpp +{{ include(path="content/2022-06-18-libclang-c-to-llvm-ir/gen-ir.cc") }} +``` + +The following Makefile can be used to compile and run the example. + +```make +{{ include(path="content/2022-06-18-libclang-c-to-llvm-ir/Makefile") }} +``` + +[src-clang]: https://github.com/llvm/llvm-project/tree/main/clang +[blog-clang-in-memory]: https://blog.audio-tk.com/2018/09/18/compiling-c-code-in-memory-with-clang/ diff --git a/content/2022-07-07-llvm-orc-jit.md b/content/2022-07-07-llvm-orc-jit.md deleted file mode 100644 index 6868518..0000000 --- a/content/2022-07-07-llvm-orc-jit.md +++ /dev/null @@ -1,42 +0,0 @@ -+++ -title = "Jit C in memory using LLVM ORC api" - -[taxonomies] -tags = ["llvm", "clang", "c++"] -+++ - -Based on the in-memory compiler shared in the last post ([C to LLVM IR in -memory using libclang](@/2022-06-18-libclang-c-to-llvm-ir.md)), this post -demonstrates a small *just in time (JIT)* compiler which allows to compile C -code to host native code in-memory. - -The JIT compiler is based on the LLVM [ORCv2 API][llvm-orc2] (the newest LLVM -JIT API at the time of writing) and the crucial parts are taken from the [JIT -tutorial][llvm-jit-tut]. - -The sources are available under [llvm-orc-jit][post-src]. 
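
For orientation before the full listings, the following is a rough sketch of the core flow using ORC's `LLJIT` convenience API (hedged: this is **not** the hand-rolled JIT class from `jit.h` below, it assumes an LLVM 13 era API, and the module and symbol names are made up). The listings below build the same functionality from the lower-level ORC pieces.

```cpp
#include "llvm/ExecutionEngine/JITSymbol.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetSelect.h"

// `mod`/`ctx` would come from the in-memory C -> LLVM IR compiler of the
// previous post; here they are simply assumed to exist.
llvm::Expected<int> runAdd(std::unique_ptr<llvm::Module> mod,
                           std::unique_ptr<llvm::LLVMContext> ctx) {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  auto jit = llvm::orc::LLJITBuilder().create();
  if (!jit)
    return jit.takeError();

  if (auto err = (*jit)->addIRModule(
          llvm::orc::ThreadSafeModule(std::move(mod), std::move(ctx))))
    return std::move(err);

  // Symbol name "add" is illustrative.
  auto sym = (*jit)->lookup("add");
  if (!sym)
    return sym.takeError();

  auto *add = llvm::jitTargetAddressToFunction<int (*)(int, int)>(sym->getAddress());
  return add(1, 2);
}
```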
- -### main.cc -```cpp -{{ include(path="content/2022-07-07-llvm-orc-jit/main.cc") }} -``` - -### jit.h -```cpp -{{ include(path="content/2022-07-07-llvm-orc-jit/jit.h") }} -``` - -### compiler.h -```cpp -{{ include(path="content/2022-07-07-llvm-orc-jit/ccompiler.h") }} -``` - -### Makefile -```make -{{ include(path="content/2022-07-07-llvm-orc-jit/Makefile") }} -``` -[post-src]: https://git.memzero.de/blog/tree/content/2022-07-07-llvm-orc-jit?h=main -[src-clang]: https://github.com/llvm/llvm-project/tree/main/clang -[blog-clang-in-memory]: https://blog.audio-tk.com/2018/09/18/compiling-c-code-in-memory-with-clang/ -[llvm-jit-tut]: https://www.llvm.org/docs/tutorial/BuildingAJIT1.html -[llvm-orc2]: https://www.llvm.org/docs/ORCv2.html diff --git a/content/2022-07-07-llvm-orc-jit/index.md b/content/2022-07-07-llvm-orc-jit/index.md new file mode 100644 index 0000000..4b2add0 --- /dev/null +++ b/content/2022-07-07-llvm-orc-jit/index.md @@ -0,0 +1,42 @@ ++++ +title = "Jit C in memory using LLVM ORC api" + +[taxonomies] +tags = ["llvm", "clang", "c++"] ++++ + +Based on the in-memory compiler shared in the last post ([C to LLVM IR in +memory using libclang](@/2022-06-18-libclang-c-to-llvm-ir/index.md)), this post +demonstrates a small *just in time (JIT)* compiler which allows to compile C +code to host native code in-memory. + +The JIT compiler is based on the LLVM [ORCv2 API][llvm-orc2] (the newest LLVM +JIT API at the time of writing) and the crucial parts are taken from the [JIT +tutorial][llvm-jit-tut]. + +The sources are available under [llvm-orc-jit][post-src]. + +### main.cc +```cpp +{{ include(path="content/2022-07-07-llvm-orc-jit/main.cc") }} +``` + +### jit.h +```cpp +{{ include(path="content/2022-07-07-llvm-orc-jit/jit.h") }} +``` + +### compiler.h +```cpp +{{ include(path="content/2022-07-07-llvm-orc-jit/ccompiler.h") }} +``` + +### Makefile +```make +{{ include(path="content/2022-07-07-llvm-orc-jit/Makefile") }} +``` +[post-src]: https://git.memzero.de/blog/tree/content/2022-07-07-llvm-orc-jit?h=main +[src-clang]: https://github.com/llvm/llvm-project/tree/main/clang +[blog-clang-in-memory]: https://blog.audio-tk.com/2018/09/18/compiling-c-code-in-memory-with-clang/ +[llvm-jit-tut]: https://www.llvm.org/docs/tutorial/BuildingAJIT1.html +[llvm-orc2]: https://www.llvm.org/docs/ORCv2.html -- cgit v1.2.3