+++
title = "QEMU virtio configurations"
[taxonomies]
tags = ["linux", "qemu", "virtio"]
+++
For my own reference I wanted to document some minimal [`virtio`][virtio]
device configurations with qemu and the required Linux kernel configuration to
enable those devices.
The devices we will use are `virtio console`, `virtio blk` and `virtio net`.
To make use of the virtio devices in qemu we are going to build and boot into a
busybox-based [`initramfs`][initramfs].
## Build initramfs
For the initramfs there is not much magic: we will grab a copy of busybox,
configure it with the default config (`defconfig`) and enable static linking,
as we will use it as rootfs.
For the `init` process we will use the one provided by busybox, but we have to
symlink it to `/init`: during boot, the kernel extracts the compressed cpio
initramfs into `rootfs` and looks for an `/init` file. If that is not found,
the kernel falls back to an older mechanism and tries to mount a root
partition (which we don't have).
> Optionally the init binary could be specified with the `rdinit=` kernel boot
> parameter.
We populate the `/etc/inittab` and `/etc/init.d/rcS` with a minimal
configuration to mount the `proc`, `sys` and `dev` filesystems and drop into a
shell after the boot is completed. \
Additionally we set up `/etc/passwd` and `/etc/shadow` with an entry for the
`root` user with the password `1234`, so we can login via the virtio console
later.
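A rough sketch of what these files could contain is shown below; the exact `getty` arguments and mount options here are assumptions on my part, the authoritative version lives in the build script linked further down.

```sh
# /etc/inittab (sketch): run rcS once at boot, spawn getty on the
# first hypervisor console so we can login via the virtio console.
::sysinit:/etc/init.d/rcS
hvc0::respawn:/sbin/getty -L hvc0 115200 vt100

# /etc/init.d/rcS (sketch): mount the pseudo filesystems.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev
```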
```sh,hide_lines=1-30 68-1000
{{ include(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh") }}
```
The full build script is available under [build_initramfs.sh][build-initramfs].
## Virtio console
To enable support for the virtio console we enable the kernel configs shown
below.
The pci configurations are enabled because in qemu the virtio console front-end
device (the one presented to the guest) is attached to the pci bus.
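In terms of kconfig symbols this presumably amounts to a fragment along the following lines (an assumption; the build script is authoritative).

```sh
CONFIG_PCI=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_CONSOLE=y
```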
```sh,hide_lines=1-31 39-1000
{{ include(path="content/2021-12-02-toying-with-virtio/build_kernel.sh") }}
```
The full build script is available under [build_kernel.sh][build-kernel].
To boot-up the guest we use the following qemu configuration.
```sh
qemu-system-x86_64 \
-nographic \
-cpu host \
-enable-kvm \
-kernel ./linux-$(VER)/arch/x86/boot/bzImage \
-append "earlyprintk=ttyS0 console=ttyS0 root=/dev/ram0 ro" \
-initrd ./initramfs.cpio.gz \
-device virtio-serial-pci \
-device virtconsole,chardev=vcon,name=console.0 \
-chardev socket,id=vcon,ipv4=on,host=localhost,port=2222,server,telnet=on,wait=off
```
The important parts in this configuration are the last three lines.
The `virtio-serial-pci` device creates the serial bus to which the virtio
console is attached.
The `virtconsole` creates the virtio console device exposed to the guest
(front-end). The `chardev=vcon` option specifies that the chardev with
`id=vcon` is attached as back-end to the virtio console.
The back-end device is the one we will have access to from the host running the
emulation.
We configure the chardev back-end to be a `socket` running a telnet server
that listens on port 2222. The `wait=off` option tells qemu to boot directly
without waiting for a client connection.
After booting the guest we are dropped into a shell and can verify that our
device is being detected properly.
```sh
root@virtio-box ~ # ls /sys/bus/virtio/devices/
virtio0
root@virtio-box ~ # cat /sys/bus/virtio/devices/virtio0/virtio-ports/vport0p0/name
console.0
```
In `/etc/inittab`, we already configured `getty` to be spawned on the first
hypervisor console `/dev/hvc0`, which effectively runs `login(1)` over the
serial console.
From the host we can therefore connect to the back-end chardev with
`telnet localhost 2222` and login to the guest with `root:1234`.
```sh
> telnet -4 localhost 2222
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
virtio-box login: root
Password:
root@virtio-box ~ #
```
## Virtio blk
To enable support for the virtio block device we enable the kernel configs
shown below.
First we enable general support for block devices and then for virtio block
devices. Additionally we enable support for the `ext2` filesystem, as we will
back the virtio block device with an ext2 image.
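As a sketch, the relevant kconfig symbols are presumably along these lines (again an assumption; the build script is authoritative).

```sh
CONFIG_BLK_DEV=y
CONFIG_VIRTIO_BLK=y
CONFIG_EXT2_FS=y
```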
```sh,hide_lines=1-39 48-1000
{{ include(path="content/2021-12-02-toying-with-virtio/build_kernel.sh") }}
```
The full build script is available under [build_kernel.sh][build-kernel].
Next we create the ext2 filesystem image. We do this by creating a `128M` blob
and then formatting it with ext2. Afterwards we can mount the image via a
`loop` device and populate the filesystem.
```sh,hide_lines=1-2 8-1000
{{ include(path="content/2021-12-02-toying-with-virtio/build_ext2.sh") }}
```
Before booting the guest we will attach the virtio block device to the VM.
To do so, we add the `-drive` configuration to our previous qemu invocation.
```sh
qemu-system-x86_64 \
...
-drive if=virtio,file=fs.ext2,format=raw
```
The `-drive` option is a shortcut for a `-device (front-end) / -blockdev
(back-end)` pair.
The `if=virtio` flag specifies the interface of the front-end device to be
`virtio`.
The `file` and `format` flags configure the back-end to be a disk image.
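For illustration, the shortcut could be spelled out explicitly with a node name of our choosing (`disk0` here is made up); this expanded form is a sketch and not verified against every qemu version.

```sh
qemu-system-x86_64 \
  ...
  -blockdev driver=file,filename=fs.ext2,node-name=disk0 \
  -device virtio-blk-pci,drive=disk0
```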
After booting the guest we are dropped into a shell and can verify a few
things. First we check if the virtio block device is detected, then we check if
we have support for the ext2 filesystem and finally we mount the disk.
```sh
root@virtio-box ~ # ls -l /sys/block/
lrwxrwxrwx 1 root 0 0 Dec 3 22:46 vda -> ../devices/pci0000:00/0000:00:05.0/virtio1/block/vda
root@virtio-box ~ # cat /proc/filesystems
...
ext2
root@virtio-box ~ # mount -t ext2 /dev/vda /mnt
EXT2-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended
ext2 filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff)
root@virtio-box ~ # cat /mnt/hello
world
```
## Virtio net
To enable support for the virtio network device we enable the kernel configs
shown below.
First we enable general support for networking and TCP/IP and then enable the
core networking driver and the virtio net driver.
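Presumably this boils down to kconfig symbols along the lines of the fragment below (the build script is authoritative).

```sh
CONFIG_NET=y
CONFIG_INET=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_VIRTIO_NET=y
```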
```sh,hide_lines=1-48 63-1000
{{ include(path="content/2021-12-02-toying-with-virtio/build_kernel.sh") }}
```
The full build script is available under [build_kernel.sh][build-kernel].
For the qemu device emulation we already decided on the front-end device, which
will be our virtio net device. \
On the back-end we will choose the [`user`][qemu-user-net] option. This enables
a network stack implemented in userspace based on [libslirp][libslirp], which
has the benefit that we neither need to set up additional network interfaces
nor require any privileges. Fundamentally, [libslirp][libslirp] works by
replaying [Layer 2][osi-2] packets received from the guest NIC via the socket
API on the host ([Layer 4][osi-4]) and vice versa. User networking comes with a
set of limitations, for example:
- We can not use `ping` inside the guest as `ICMP` is not supported.
- The guest is not accessible from the host.
With the guest, qemu and the host in the picture this looks something like the
following.
```
+--------------------------------------------+
| host |
| +-------------------------+ |
| | guest | |
| | | |
| | user | |
| +------+------+-----------+ |
| | | eth0 | kernel | |
| | +--+---+ | |
| | | | |
| | +-----v--------+ | |
| | | nic (virtio) | | |
| +--+---+-----+--------+------+--+ |
| | | Layer 2 qemu | |
| | | (eth frames) | |
| | +----v-----+ | |
| | | libslirp | | |
| | +----+-----+ | |
| | | Layer 4 | |
| | | (socket API) | user |
+--+---------+--v---+--------------+---------+
| | eth0 | kernel |
| +------+ |
+--------------------------------------------+
```
The user networking implements a virtual NAT'ed sub-network with the address
range `10.0.2.0/24`, running an internal dhcp server. By default, the dhcp
server assigns the following IP addresses which are interesting to us:
- `10.0.2.2` host running the qemu emulation
- `10.0.2.3` virtual DNS server
> The netdev options `net=addr/mask`, `host=addr`, `dns=addr` can be used to
> re-configure the sub-network (see [network options][qemu-nic-opts]).
With the details of the sub-network in mind we can add some additional setup to
the initramfs which performs the basic network configuration.
We add the virtual DNS server to `/etc/resolv.conf`, which will be used by the
libc resolver functions.
Additionally we assign a static IP to the `eth0` network interface, bring the
interface up and define the default route via the host `10.0.2.2`.
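This setup boils down to a few lines like the following sketch; the static address `10.0.2.15` is an assumption matching the default guest address handed out by the internal dhcp server.

```sh
echo "nameserver 10.0.2.3" > /etc/resolv.conf
ip addr add 10.0.2.15/24 dev eth0
ip link set eth0 up
ip route add default via 10.0.2.2 dev eth0
```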
```sh,hide_lines=1-68 86-1000
{{ include(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh") }}
```
The full build script is available under [build_initramfs.sh][build-initramfs].
Before booting the guest we will attach the virtio net device and configure it
to use the user network stack.
To do so, we add the `-nic` configuration to our previous qemu invocation.
```sh
qemu-system-x86_64 \
...
-nic user,model=virtio-net-pci
```
The `-nic` option is a shortcut for a `-device (front-end) / -netdev
(back-end)` pair.
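Spelled out without the shortcut, the pair could look like the following sketch (the `id` `net0` is arbitrary).

```sh
qemu-system-x86_64 \
  ...
  -netdev user,id=net0 \
  -device virtio-net-pci,netdev=net0
```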
After booting the guest we are dropped into a shell and can verify a few
things. First we check if the virtio net device is detected. Then we check if
the interface got configured and brought up correctly.
```sh
root@virtio-box ~ # ls -l /sys/class/net/
lrwxrwxrwx 1 root 0 0 Dec 4 16:56 eth0 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/eth0
lrwxrwxrwx 1 root 0 0 Dec 4 16:56 lo -> ../../devices/virtual/net/lo
root@virtio-box ~ # ip -o a
2: eth0 inet 10.0.2.15/24 scope global eth0 ...
root@virtio-box ~ # ip route
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 scope link src 10.0.2.15
```
We can resolve our domain and see that the virtual DNS server gets contacted.
```sh
root@virtio-box ~ # nslookup memzero.de
Server: 10.0.2.3
Address: 10.0.2.3:53
Non-authoritative answer:
Name: memzero.de
Address: 46.101.148.203
```
Additionally we can try to access a service running on the host. For that we
run a simple http server on the host (where we launched qemu) with the command
`python3 -m http.server --bind 0.0.0.0 1234`, which listens on any incoming
address at port `1234`.
From within the guest we can manually craft a simple http `GET` request and
send it to the http server running on the host. For that we use the IP address
`10.0.2.2`, which the user networking assigns to the host.
```sh
root@virtio-box ~ # echo "GET / HTTP/1.0" | nc 10.0.2.2 1234
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.9.7
Date: Sat, 04 Dec 2021 16:58:56 GMT
Content-type: text/html; charset=utf-8
Content-Length: 917
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
<li><a href="build_initramfs.sh">build_initramfs.sh</a></li>
...
</ul>
<hr>
</body>
</html>
```
## Appendix: Workspace
To reproduce the setup and play around with it, just grab a copy of the
following files:
- [Dockerfile][dockerfile]
- [Makefile][makefile]
- [build_initramfs.sh][build-initramfs]
- [build_kernel.sh][build-kernel]
- [build_ext2.sh][build-ext2]
Then run the following steps to build everything. The prefixes `[H]` and `[C]`
indicate whether a command is run on the host or inside the container,
respectively.
```sh
# To see all the make targets.
[H] make help
# Build the docker image and start a container with the current working
# dir mounted. On the first invocation this takes some minutes to build
# the image.
[H] make docker
# Build kernel and initramfs.
[C] make
# Build the ext2 fs as virtio blkdev back-end.
[H] make ext2
# Start the qemu guest.
[H] make run
```
[build-initramfs]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_initramfs.sh?h=main
[build-kernel]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_kernel.sh?h=main
[build-ext2]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/build_ext2.sh?h=main
[makefile]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/Makefile?h=main
[dockerfile]: https://git.memzero.de/blog/tree/content/2021-12-02-toying-with-virtio/Dockerfile?h=main
[initramfs]: https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
[virtio]: http://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.pdf
[qemu-nic-opts]: https://www.qemu.org/docs/master/system/invocation.html#hxtool-5
[qemu-user-net]: https://www.qemu.org/docs/master/system/devices/net.html#using-the-user-mode-network-stack
[libslirp]: https://gitlab.com/qemu-project/libslirp
[osi-2]: https://osi-model.com/data-link-layer
[osi-4]: https://osi-model.com/transport-layer