+++
title = "Linux Kernel debugging with QEMU"

[taxonomies]
tags = ["linux", "qemu"]
+++

**EDIT**:
- 2021-07-15: Added `Appendix: Dockerfile for Kernel development` and updated
  busybox + Kernel versions.
- 2023-11-23: Fixed ramdisk vs ramfs ([ref][ramfs-vs-ramdisk]), switched to
  `devtmpfs`, and updated busybox + Kernel versions.

The other evening, while staring at some Linux kernel code, I thought: let me
set up a minimal environment so I can easily step through the code and examine
the state.

I ended up creating:
- a [Linux kernel][linux-kernel] with minimal configuration
- a minimal [initramfs][initramfs], based on [busybox][busybox], to boot into

In the remaining part of this article we will go through each step: first
building the kernel, then building the initrd, and finally running the kernel
with [QEMU][qemu] and debugging it with [GDB][gdb].

## $> make kernel

Before building the kernel we first need to generate a configuration. As a
starting point we generate a minimal config with `make tinyconfig`, which writes
a `.config` file. We then customize this initial configuration using
configuration fragments: the `scripts/kconfig/merge_config.sh` script merges a
fragment file (a list of `CONFIG_...` lines) into the current configuration.

Let's quickly go over some of the customizations.
The following two lines enable initramfs support and gzip decompression for it:
```config
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
```
The next two configurations are important as they enable the binary loaders for
[ELF][binfmt-elf] and [script #!][binfmt-script] files.
```config
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
```
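Put together, the four options above can live in one fragment that is merged
into the `tinyconfig` output. The sketch below assumes a hypothetical fragment
name `debug.config`; the `make` steps only run when the script is executed from
the top of a kernel source tree.

```sh
#!/bin/sh
# Sketch: merge a config fragment into a tinyconfig base.
# Assumption: "debug.config" is a hypothetical fragment file name.
set -eu

# Fragment with the options discussed above.
cat > debug.config <<'EOF'
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
EOF

# Only touch the kernel configuration when run from a kernel source tree.
if [ -x scripts/kconfig/merge_config.sh ]; then
    make tinyconfig
    # -m: only merge the fragment into .config, do not run make afterwards
    scripts/kconfig/merge_config.sh -m .config debug.config
    # Set any options that became visible through the merge to their defaults.
    make olddefconfig
fi
```

With `-m` the script only merges the files; `make olddefconfig` then resolves
the remaining dependencies without asking questions.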

> Note: In the curses-based configuration tool `make menuconfig` we can search
> for configuration options using the `/` key and then jump to a match using the
> number keys. After selecting a match we can open `Help` to get a description
> of the configuration option.

Building the kernel with the default make target will give us the following two
files:
- `vmlinux` statically linked kernel (ELF file) containing symbol information for debugging
- `arch/x86/boot/bzImage` compressed kernel image for booting
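Source-level debugging only works if `vmlinux` contains DWARF debug
information. `tinyconfig` does not enable it, so the configuration has to turn
on `CONFIG_DEBUG_INFO` (the exact related options differ between kernel
versions). A quick check of a built `vmlinux` could look like this, assuming
`readelf` from binutils is installed; `has_debug_info` is a hypothetical helper
name.

```sh
#!/bin/sh
# Sketch: check whether an ELF file (e.g. vmlinux) contains DWARF debug info.
# has_debug_info is a hypothetical helper; it relies on readelf (binutils).
has_debug_info() {
    # readelf -S lists the section headers; .debug_info holds the DWARF data.
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Only check when run from a built kernel tree.
if [ -f vmlinux ]; then
    if has_debug_info vmlinux; then
        echo "vmlinux: debug info present"
    else
        echo "vmlinux: no debug info, enable CONFIG_DEBUG_INFO" >&2
    fi
fi
```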

Full configure & build script:
```sh
{{ include(path="content/2019-10-27-kernel-debugging-qemu/build_kernel.sh") }}
```

## $> make initrd

The next step is to build the initrd, which is based on [busybox][busybox]. We
first build busybox in its default configuration with one change: we enable the
following option to build a static binary, so that it can run stand-alone
without any shared libraries in the initrd:
```sh
sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config
```
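The busybox part can be sketched as follows. `enable_static` and
`build_busybox` are hypothetical helper names, and `build_busybox` assumes it
is run from the top of an unpacked busybox source tree. `make install` places
busybox and its applet symlinks into `./_install` by default, which can then
serve as the base of the initrd.

```sh
#!/bin/sh
# Sketch of the busybox build. Helper names are hypothetical.

# enable_static CONFIG: turn "# CONFIG_STATIC is not set" into "CONFIG_STATIC=y".
enable_static() {
    sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' "$1"
}

# build_busybox: run from the top of a busybox source tree.
build_busybox() {
    make defconfig           # default configuration
    enable_static .config    # the one change: link statically
    make -j"$(nproc)"
    make install             # installs busybox + applet links into ./_install
}
```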

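One important step before creating the final initrd is to create an init
process: the first process executed in userspace after the kernel has finished
its initialization. Our init is a small script that mounts the `proc`, `sysfs`
and `devtmpfs` file systems and then drops us into a shell (`cttyhack` gives the
shell a controlling terminal, so that job control works):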
```sh
cat <<EOF > init
#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev

exec setsid cttyhack sh
EOF
chmod +x init  # the kernel can only execute an init that is executable
```
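> For an initramfs the kernel executes `/init` by default, which can be changed
> with the [`rdinit=`][kernel-param] kernel parameter. If `/init` does not exist,
> the kernel mounts the regular root file system instead and runs `/sbin/init`
> (or the program given with the [`init=`][kernel-param] parameter).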
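The remaining step packs busybox's install directory together with the `init`
script into a gzip-compressed cpio archive in the `newc` format, the format the
kernel expects for an initramfs. A minimal sketch, assuming `cpio` and `gzip`
are installed; `pack_initramfs` and its arguments are hypothetical.

```sh
#!/bin/sh
# Sketch: pack a root directory into a gzip-compressed newc cpio archive.
# pack_initramfs ROOTDIR OUTFILE  (hypothetical helper)
pack_initramfs() {
    root=$1
    # Make OUTFILE absolute, because the archive is created from inside ROOTDIR.
    case $2 in
        /*) out=$2 ;;
        *)  out=$PWD/$2 ;;
    esac
    # Mount points used by the init script must exist in the archive.
    mkdir -p "$root/proc" "$root/sys" "$root/dev"
    # The kernel can only run /init if it is executable.
    if [ -f "$root/init" ]; then
        chmod +x "$root/init"
    fi
    # -o: create an archive, -H newc: the cpio format used for initramfs.
    (cd "$root" && find . | cpio -o -H newc --quiet) | gzip -9 > "$out"
}
```

For example, after copying `init` into busybox's `_install` directory:
`pack_initramfs _install initramfs.cpio.gz`.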

Full busybox & initrd build script:
```sh
{{ include(path="content/2019-10-27-kernel-debugging-qemu/build_initrd.sh") }}
```

## Running QEMU && GDB

After finishing the previous steps we have all we need to run and debug the
kernel. We have `arch/x86/boot/bzImage` and `initramfs.cpio.gz` to boot the
kernel into a shell and we have `vmlinux` to feed the debugger with debug
symbols.

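We start QEMU as follows. Thanks to the `-S` flag the CPU stays halted until the
debugger has connected and continues it. The `nokaslr` parameter disables kernel
address space layout randomization, so the kernel runs at the addresses recorded
in `vmlinux` and breakpoints resolve correctly: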
```sh
# -S    freeze CPU until debugger connected
> qemu-system-x86_64                                                 \
  -kernel ./linux-5.3.7/arch/x86/boot/bzImage                        \
  -nographic                                                         \
  -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \
  -initrd ./initramfs.cpio.gz                                        \
  -gdb tcp::1234                                                     \
  -S
```
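QEMU also offers `-s` as a shorthand for `-gdb tcp::1234`. If the command above
is used often, a small wrapper can document each flag and make the GDB port
configurable. A sketch; `run_qemu` and its optional port argument are
hypothetical additions, the kernel command line is the one from above.

```sh
#!/bin/sh
# Sketch: boot a kernel + initramfs in QEMU and wait for GDB.
# run_qemu KERNEL INITRD [PORT]  (hypothetical helper, PORT defaults to 1234)
run_qemu() {
    kernel=$1
    initrd=$2
    port=${3:-1234}
    # -gdb tcp::PORT  start a GDB server (-s is shorthand for tcp::1234)
    # -S              do not start the CPU until GDB (or the monitor) continues
    # nokaslr         keep the load address fixed so vmlinux symbols match
    qemu-system-x86_64 \
        -kernel "$kernel" \
        -initrd "$initrd" \
        -nographic \
        -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \
        -gdb "tcp::$port" \
        -S
}
```

Usage: `run_qemu ./linux-5.3.7/arch/x86/boot/bzImage ./initramfs.cpio.gz`.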

Then we can start GDB and connect to the GDB server running in QEMU (configured
via `-gdb tcp::1234`). From here on we can debug the kernel:
```sh
> gdb linux-5.3.7/vmlinux -ex 'target remote :1234'
(gdb) b do_execve
Breakpoint 1 at 0xffffffff810a1a60: file fs/exec.c, line 1885.
(gdb) c
Breakpoint 1, do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 <argv_init>, __envp=0xffffffff8181e040 <envp_init>) at fs/exec.c:1885
1885          return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
(gdb) bt
#0  do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 <argv_init>, __envp=0xffffffff8181e040 <envp_init>) at fs/exec.c:1885
#1  0xffffffff81000498 in run_init_process (init_filename=<optimized out>) at init/main.c:1048
#2  0xffffffff81116b75 in kernel_init (unused=<optimized out>) at init/main.c:1129
#3  0xffffffff8120014f in ret_from_fork () at arch/x86/entry/entry_64.S:352
#4  0x0000000000000000 in ?? ()
(gdb)
```
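Instead of typing these commands for every session, they can be stored in a GDB
command file and passed with `-x`. A sketch that reproduces the session above;
the file name `kernel.gdb` is hypothetical and the `vmlinux` path is the one
used above.

```sh
#!/bin/sh
# Sketch: store the GDB session from above in a command file.
# The file name "kernel.gdb" is hypothetical.
cat > kernel.gdb <<'EOF'
# Connect to the GDB server started by QEMU via -gdb tcp::1234.
target remote :1234
# Stop at the first execve (starting init above), then print the call chain.
break do_execve
continue
bt
EOF

# Start GDB with the symbols from vmlinux and run the commands (if built).
if [ -f linux-5.3.7/vmlinux ]; then
    gdb -x kernel.gdb linux-5.3.7/vmlinux
fi
```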

---

## Appendix: Try to get around `<optimized out>`

When debugging the kernel we often face the following situation in gdb:
```
(gdb) frame
#0  do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c

(gdb) info args
fd = <optimized out>
filename = 0xffff888000060000
argv = <optimized out>
envp = <optimized out>
flags = <optimized out>
file = 0x0
```
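The problem is that the Linux kernel has to be compiled with optimizations
enabled (`-O2` or `-Os`; it does not support `-O0`), so arguments and local
variables are often kept only in registers or optimized away entirely.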

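In this situation we can "try" to reduce the optimization level for a single
compilation unit or a subtree ("try" because reducing the optimization level can
break the build). To do so, we adapt the Makefile in the corresponding
directory. Kbuild appends these flags after the global `KBUILD_CFLAGS`, and since
GCC honors the last `-O` option, `-Og` takes precedence over the default level.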
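```make
# fs/Makefile (pick one of the following)

# configure for a single compilation unit (fs/exec.c)
CFLAGS_exec.o := -Og

# configure for all files in the directory of this Makefile
ccflags-y += -Og

# configure for this directory and all of its subdirectories
subdir-ccflags-y += -Og
```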
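When switching between files frequently, the per-file flag can also be added by
a small helper that only appends the line once. A sketch; `add_og` and its
arguments (directory and source file stem) are hypothetical. After such a
change a plain `make` rebuilds only the affected object, because kbuild records
the command line used for every object, and then relinks `vmlinux`.

```sh
#!/bin/sh
# Sketch: add "CFLAGS_<stem>.o := -Og" to DIR/Makefile exactly once.
# add_og DIR STEM  (hypothetical helper, e.g. add_og fs exec)
add_og() {
    makefile="$1/Makefile"
    line="CFLAGS_$2.o := -Og"
    # -x: match the whole line, -F: treat it as a fixed string (no regex)
    if ! grep -qxF "$line" "$makefile"; then
        printf '%s\n' "$line" >> "$makefile"
    fi
}
```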

After enabling `-Og` (optimize for the debugging experience), gdb now shows the
following:
```
(gdb) frame
#0  do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c

(gdb) info args
fd = -100
filename = 0xffff888000120000
argv = {ptr = {native = 0x10c5980}}
envp = {ptr = {native = 0x10c5990}}
flags = 0

(gdb) p *filename
$3 = {name = 0xffff888000120020 "/bin/ls", uptr = 0x10c59b8 "/bin/ls", refcnt = 1, aname = 0x0, iname = 0xffff888000120020 "/bin/ls"}

(gdb) ptype filename
type = struct filename {
    const char *name;
    const char *uptr;
    int refcnt;
    struct audit_names *aname;
    const char iname[];
}
```

## Appendix: `Dockerfile` for Kernel development

The following `Dockerfile` provides a development environment with all the
required tools and dependencies to reproduce all the steps of building and
debugging the Linux kernel.
```dockerfile
{{ include(path="content/2019-10-27-kernel-debugging-qemu/Dockerfile") }}
```

Save the listing above in a file called `Dockerfile` and build the docker image
as follows.
```sh
docker build -t kernel-dev .
```
> Optionally set `DOCKER_BUILDKIT=1` to use the newer image builder.

Once the image has been built, an interactive container can be launched as
follows.
```sh
# Some options for convenience:
#   -v <HOST>:<GUEST>     Mount host path to guest path.
#   --rm                  Remove the container after exiting.

docker run -it kernel-dev
```
> Alternatively use podman.

## Appendix: Screencast of an example debug session

The screencast shows an example session debugging the Linux kernel using the
Dockerfile above.

<video width="100%" height="auto" controls>
    <source src="demo.mp4" type="video/mp4">
</video>

[linux-kernel]: https://www.kernel.org
[initrd]: https://www.kernel.org/doc/html/latest/admin-guide/initrd.html
[initramfs]: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html
[ramfs-vs-ramdisk]: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html#ramfs-and-ramdisk
[busybox]: https://busybox.net
[qemu]: https://www.qemu.org
[gdb]: https://www.gnu.org/software/gdb
[binfmt-elf]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c
[binfmt-script]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_script.c
[kernel-param]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html