+++
title = "QEMU virtio configurations"

[taxonomies]
tags = ["linux", "qemu", "virtio"]
+++

For my own reference I wanted to document some minimal [`virtio`][virtio]
device configurations with qemu and the required Linux kernel configuration to
enable those devices.

The devices we will use are `virtio console`, `virtio blk` and `virtio net`.

To make use of the virtio devices in qemu, we build and boot into a
busybox-based [`initramfs`][initramfs].

## Build initramfs

For the initramfs there is not much magic: we grab a copy of busybox,
configure it with the default config (`defconfig`) and enable static linking,
as we will use it as the rootfs.

For the `init` process we use the one provided by busybox, but we have to
symlink it to `/init`: during boot, the kernel extracts the compressed cpio
initramfs into `rootfs` and looks for the `/init` file. If that is not found,
the kernel falls back to an older mechanism and tries to mount a root
partition (which we don't have).
> Alternatively, the init binary can be specified with the `rdinit=` kernel
> boot parameter.

We populate `/etc/inittab` and `/etc/init.d/rcS` with a minimal configuration
to mount the `proc`, `sys` and `dev` filesystems and drop into a shell after
the boot has completed. \
Additionally we set up `/etc/passwd` and `/etc/shadow` with an entry for the
`root` user with the password `1234`, so we can log in via the virtio console
later.

```sh
{{ include_range(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh", start=31, end=67) }}
```

The full build script is available under [build_initramfs.sh][build-initramfs].
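To make the `/init` symlink and the inittab/rcS pieces more concrete, here is a
minimal sketch of populating such a staging directory. It is not the linked
build script; the function name `populate_rootfs` is hypothetical, and it
assumes busybox was already installed into the directory (for example with
busybox's `make CONFIG_PREFIX=<dir> install`), so that `bin/busybox` exists.

```sh
#!/bin/sh
# Hypothetical helper: populate a minimal busybox initramfs staging dir.
# Assumes busybox is already installed into "$1" (provides bin/busybox).
populate_rootfs() {
    root=$1

    mkdir -p "$root/proc" "$root/sys" "$root/dev" "$root/etc/init.d"

    # The kernel executes /init from the initramfs. busybox runs its init
    # applet when invoked under the name "init".
    ln -sf bin/busybox "$root/init"

    # busybox inittab lines are <id>:<runlevels>:<action>:<process>;
    # runlevels are ignored by busybox init.
    cat > "$root/etc/inittab" <<'EOF'
::sysinit:/etc/init.d/rcS
::respawn:-/bin/sh
::respawn:/sbin/getty 38400 hvc0
::shutdown:/bin/umount -a -r
EOF

    # rcS mounts the pseudo filesystems during early boot.
    cat > "$root/etc/init.d/rcS" <<'EOF'
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev
EOF
    chmod +x "$root/etc/init.d/rcS"
}
```

The staging directory can then be packed into the gzip'ed cpio archive the
kernel expects, e.g. with `cd <dir> && find . | cpio -o -H newc | gzip > ../initramfs.cpio.gz`.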

## Virtio console

To enable support for the virtio console we enable the kernel configs shown
below.
The PCI options are needed because in qemu the virtio console front-end
device (the one presented to the guest) is attached to the PCI bus.

```sh
{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=32, end=38) }}
```

The full build script is available under [build_kernel.sh][build-kernel].
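If options are toggled with a tool that does not resolve Kconfig dependencies
(such as the kernel's `scripts/config`), an option with unmet dependencies can
be silently dropped again by `make olddefconfig`. A small hypothetical helper
to double check the final `.config` could look like this; the symbol names in
the usage line are assumptions for this section, the authoritative list is in
the build script.

```sh
#!/bin/sh
# Hypothetical helper: check that Kconfig symbols are built in (=y).
# Usage: check_kconfig <path/to/.config> SYMBOL...
# Prints every missing symbol; returns non-zero if any is missing.
check_kconfig() {
    cfg=$1
    shift
    missing=0
    for sym in "$@"; do
        # Match e.g. "CONFIG_PCI=y" as a whole line.
        if ! grep -q "^CONFIG_${sym}=y\$" "$cfg"; then
            echo "missing: CONFIG_${sym}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_kconfig linux-${VER}/.config PCI VIRTIO_PCI VIRTIO_CONSOLE`.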

To boot the guest we use the following qemu invocation.

```sh
qemu-system-x86_64                                            \
  -nographic                                                  \
  -cpu host                                                   \
  -enable-kvm                                                 \
  -kernel ./linux-${VER}/arch/x86/boot/bzImage                \
  -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/ram0 ro" \
  -initrd ./initramfs.cpio.gz                                 \
  -device virtio-serial-pci                                   \
  -device virtconsole,chardev=vcon,name=console.0             \
  -chardev socket,id=vcon,ipv4=on,host=localhost,port=2222,server=on,telnet=on,wait=off
```

The important parts in this configuration are the last three lines.

The `virtio-serial-pci` device creates the virtio serial bus to which the
virtio console is attached.

The `virtconsole` creates the virtio console device exposed to the guest
(front-end). The `chardev=vcon` option specifies that the chardev with
`id=vcon` is attached as back-end to the virtio console.
The back-end device is the one we will have access to from the host running the
emulation.

We configure the chardev back-end as a `socket` running a telnet server that
listens on port 2222. The `wait=off` option tells qemu to boot directly without
waiting for a client to connect.

After booting the guest we are dropped into a shell and can verify that our
device is being detected properly.
```sh
root@virtio-box ~ # ls /sys/bus/virtio/devices/
virtio0
root@virtio-box ~ # cat /sys/bus/virtio/devices/virtio0/virtio-ports/vport0p0/name
console.0
```
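With more than one virtio device attached it can be handy to see which
`virtioN` is which. A minimal sketch, assuming the kernel exposes the virtio
device ID in `/sys/bus/virtio/devices/*/device` formatted as `0x%04x`; the IDs
1, 2 and 3 are network, block and console in the virtio specification. The
function name `list_virtio` is hypothetical, and the sysfs root is a parameter
only so it can be tried against a fake directory tree.

```sh
#!/bin/sh
# Hypothetical helper: list virtio devices and their type (guest side).
# Usage: list_virtio [sysfs-root]   (defaults to /sys)
list_virtio() {
    sysfs=${1:-/sys}
    for dev in "$sysfs"/bus/virtio/devices/*; do
        # Skip the unexpanded glob if there are no virtio devices.
        [ -e "$dev/device" ] || continue
        id=$(cat "$dev/device")
        # Device IDs as defined by the virtio specification.
        case "$id" in
            0x0001) kind=net ;;
            0x0002) kind=blk ;;
            0x0003) kind=console ;;
            *)      kind="unknown ($id)" ;;
        esac
        echo "${dev##*/} $kind"
    done
}
```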

In `/etc/inittab` we already configured `getty` to be spawned on the first
hypervisor console `/dev/hvc0`, which effectively runs `login(1)` over the
virtio console.

From the host we can therefore connect to the back-end chardev with
`telnet localhost 2222` and log in to the guest with `root:1234`.

```sh
> telnet -4 localhost 2222
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

virtio-box login: root
Password:
root@virtio-box ~ #
```

## Virtio blk

To enable support for the virtio block device we enable the kernel configs
shown below.
First we enable general support for block devices and then for virtio block
devices. Additionally we enable support for the `ext2` filesystem because we
are creating an ext2 filesystem to back the virtio block device.

```sh
{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=40, end=47) }}
```

The full build script is available under [build_kernel.sh][build-kernel].

Next we create the ext2 filesystem image. We do this by creating a `128M` blob
and formatting it with ext2. Then we mount the image via a `loop` device (which
requires root) and populate the filesystem.
```sh
# Create a 128 MiB zero-filled blob and format it with ext2.
dd if=/dev/zero of=rootfs.ext2 bs=1M count=128
mkfs.ext2 rootfs.ext2
# Loop-mount the image (as root) and add a test file.
mount -t ext2 -o loop rootfs.ext2 /mnt
echo world > /mnt/hello
umount /mnt
```
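If mounting as root is inconvenient, the image can also be populated directly
from a staging directory. This is a sketch assuming e2fsprogs 1.43 or newer,
which added the `-d` option to `mke2fs`; the function name `make_ext2_image` is
hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: build an ext2 image from a directory without root.
# Usage: make_ext2_image <image> <size-in-MiB> <source-dir>
make_ext2_image() {
    img=$1 size_mb=$2 src=$3
    # Create a sparse blob of the requested size (reads back as zeros).
    dd if=/dev/zero of="$img" bs=1M count=0 seek="$size_mb" 2>/dev/null
    # -F: proceed although $img is a regular file, -q: quiet,
    # -d: copy the contents of $src into the new filesystem's root.
    mkfs.ext2 -q -F -d "$src" "$img"
}
```

For the example above: `mkdir staging && echo world > staging/hello && make_ext2_image rootfs.ext2 128 staging`.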

To attach the virtio block device to the VM, we add the `-drive` option to our
previous qemu invocation before booting the guest.

```sh
qemu-system-x86_64 \
  ...
  -drive if=virtio,file=rootfs.ext2,format=raw
```

The `-drive` option is a shortcut for a `-device (front-end) / -blockdev
(back-end)` pair.

The `if=virtio` flag specifies the interface of the front-end device to be
`virtio`.

The `file` and `format` flags configure the back-end to be a disk image.
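For comparison, a roughly equivalent explicit pair could look like the sketch
below, with two `-blockdev` nodes (the image file and the raw format on top of
it) as back-end and a `virtio-blk-pci` front-end. The node names `disk0-file`
and `disk0` are arbitrary, and this variant was not tested as part of this
setup.

```sh
qemu-system-x86_64 \
  ...
  -blockdev driver=file,node-name=disk0-file,filename=rootfs.ext2 \
  -blockdev driver=raw,node-name=disk0,file=disk0-file \
  -device virtio-blk-pci,drive=disk0
```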

After booting the guest we are dropped into a shell and can verify a few
things. First we check if the virtio block device is detected, then we check if
we have support for the ext2 filesystem and finally we mount the disk.

```sh
root@virtio-box ~ # ls -l /sys/block/
lrwxrwxrwx 1 root 0 0 Dec  3 22:46 vda -> ../devices/pci0000:00/0000:00:05.0/virtio1/block/vda

root@virtio-box ~ # cat /proc/filesystems
...
       ext2

root@virtio-box ~ # mount -t ext2 /dev/vda /mnt
EXT2-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended
ext2 filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff)

root@virtio-box ~ # cat /mnt/hello
world
```

## Virtio net

To enable support for the virtio network device we enable the kernel configs
shown below.
First we enable general support for networking and TCP/IP and then enable the
core networking driver and the virtio net driver.

```sh
{{ include_range(path="content/2021-12-02-toying-with-virtio/build_kernel.sh", start=49, end=62) }}
```

The full build script is available under [build_kernel.sh][build-kernel].

For the qemu device emulation we have already decided on the front-end device,
which will be our virtio net device. \
For the back-end we choose the [`user`][qemu-user-net] option. This enables a
network stack implemented in userspace based on [libslirp][libslirp], which
has the benefit that we do not need to set up additional network interfaces
on the host and therefore do not require any privileges. Fundamentally,
[libslirp][libslirp] works by replaying the [Layer 2][osi-2] packets received
from the guest NIC via the socket API on the host ([Layer 4][osi-4]), and vice
versa. User networking comes with a set of limitations, for example:
- `ping` generally cannot be used inside the guest, as `ICMP` is in general
  not supported.
- The guest is not directly accessible from the host.

With the guest, qemu and the host in the picture this looks something like the
following.
```text
+--------------------------------------------+
|                                       host |
|     +-------------------------+            |
|     | guest                   |            |
|     |                         |            |
|     |                    user |            |
|     +------+------+-----------+            |
|     |      | eth0 |    kernel |            |
|     |      +--+---+           |            |
|     |         |               |            |
|     |   +-----v--------+      |            |
|     |   | nic (virtio) |      |            |
|  +--+---+-----+--------+------+--+         |
|  |            | Layer 2     qemu |         |
|  |            | (eth frames)     |         |
|  |       +----v-----+            |         |
|  |       | libslirp |            |         |
|  |       +----+-----+            |         |
|  |            | Layer 4          |         |
|  |            | (socket API)     |    user |
+--+---------+--v---+--------------+---------+
|            | eth0 |                 kernel |
|            +------+                        |
+--------------------------------------------+
```

User networking implements a virtual NAT'ed sub-network with the address range
`10.0.2.0/24` and runs an internal dhcp server. By default, the following
addresses in this sub-network are interesting to us:
- `10.0.2.2` the host running the qemu emulation (also acts as the gateway)
- `10.0.2.3` virtual DNS server
> The netdev options `net=addr/mask`, `host=addr`, `dns=addr` can be used to
> re-configure the sub-network (see [network options][qemu-nic-opts]).

With the details of the sub-network in mind, we add some basic network setup
to the initramfs.

We add the virtual DNS server to `/etc/resolv.conf` which will be used by the
libc resolver functions.

Additionally we assign a static ip to the `eth0` network interface, bring the
interface up and define the default route via the host `10.0.2.2`.

```sh
{{ include_range(path="content/2021-12-02-toying-with-virtio/build_initramfs.sh", start=69, end=85) }}
```

The full build script is available under [build_initramfs.sh][build-initramfs].
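As an illustration of those steps (not the linked script itself), a
hypothetical helper `add_net_setup` could write them into an initramfs staging
directory. It uses the default addresses listed above; the static address
`10.0.2.15` is an assumption, chosen because it is the first address the
user-network DHCP server hands out.

```sh
#!/bin/sh
# Hypothetical helper: add static network setup for qemu user networking
# to an initramfs staging directory.
add_net_setup() {
    root=$1
    rcs=$root/etc/init.d/rcS
    mkdir -p "$root/etc/init.d"

    # Resolver config for libc: use the virtual DNS server.
    echo "nameserver 10.0.2.3" > "$root/etc/resolv.conf"

    # Create an executable rcS if there is none yet.
    [ -f "$rcs" ] || { printf '#!/bin/sh\n' > "$rcs"; chmod +x "$rcs"; }

    # Appended to rcS so it runs at boot (busybox ip applet).
    cat >> "$rcs" <<'EOF'
ip addr add 10.0.2.15/24 dev eth0
ip link set eth0 up
ip route add default via 10.0.2.2
EOF
}
```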

Before booting the guest we attach the virtio net device and configure it to
use the user network stack.
To do so we add the `-nic` option to our previous qemu invocation.

```sh
qemu-system-x86_64 \
  ...
  -nic user,model=virtio-net-pci
```

The `-nic` option is a shortcut for a `-device (front-end) / -netdev
(back-end)` pair.
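Spelled out as such a pair, the same configuration could look roughly like the
sketch below; `net0` is an arbitrary id that links the front-end to the
back-end.

```sh
qemu-system-x86_64 \
  ...
  -netdev user,id=net0 \
  -device virtio-net-pci,netdev=net0
```

The `user` back-end also accepts port forwarding rules of the form
`hostfwd=tcp::HOSTPORT-:GUESTPORT` (e.g. `-netdev user,id=net0,hostfwd=tcp::5555-:22`),
which is the usual way to reach a service inside the guest from the host
despite the limitation mentioned earlier.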

After booting the guest we are dropped into a shell and can verify a few
things. First we check if the virtio net device is detected. Then we check if
the interface got configured and brought up correctly.

```sh
root@virtio-box ~ # ls -l /sys/class/net/
lrwxrwxrwx 1 root 0 0 Dec  4 16:56 eth0 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/eth0
lrwxrwxrwx 1 root 0 0 Dec  4 16:56 lo -> ../../devices/virtual/net/lo


root@virtio-box ~ # ip -o a
2: eth0    inet 10.0.2.15/24 scope global eth0  ...

root@virtio-box ~ # ip route
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 scope link  src 10.0.2.15
```

We can resolve our domain and see that the virtual DNS server gets contacted.

```sh
root@virtio-box ~ # nslookup memzero.de
Server:   10.0.2.3
Address:  10.0.2.3:53

Non-authoritative answer:
Name:    memzero.de
Address: 46.101.148.203
```

Additionally we can try to access a service running on the host. To do so we
run a simple http server on the host (where we launched qemu) with
`python3 -m http.server --bind 0.0.0.0 1234`, which listens on all addresses
at port `1234`.

From within the guest we can then manually craft a simple http `GET` request
and send it to the http server running on the host. For that we use the
address `10.0.2.2`, under which user networking makes the host reachable.

```sh
root@virtio-box ~ # echo "GET / HTTP/1.0" | nc 10.0.2.2 1234
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.9.7
Date: Sat, 04 Dec 2021 16:58:56 GMT
Content-type: text/html; charset=utf-8
Content-Length: 917

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
<li><a href="build_initramfs.sh">build_initramfs.sh</a></li>
...
</ul>
<hr>
</body>
</html>
```
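The `echo` above sends only the request line, without the empty line that
normally terminates the header section of an HTTP request; presumably Python's
server accepts it because it treats the end of input as the end of the
headers. A slightly more careful sketch uses CRLF line endings and an explicit
empty line; the function name `http_request` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print a well-formed HTTP/1.0 GET request.
# Header lines end in CRLF and an empty line ends the header section.
# Usage: http_request <path> <host>
http_request() {
    printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$1" "$2"
}

# In the guest the request could then be sent with busybox nc, e.g.:
#   http_request / 10.0.2.2 | nc 10.0.2.2 1234
```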

## Appendix: Workspace

To reproduce the setup and play around with it, grab a copy of the following
files:
- [Dockerfile][dockerfile]
- [Makefile][makefile]
- [build_initramfs.sh][build-initramfs]
- [build_kernel.sh][build-kernel]

Then run the following steps to build everything. The prefixes `[H]` and `[C]`
indicate whether a command is run on the host or inside the container,
respectively.
```sh
# To see all the make targets.
[H]: make help

# Build the docker image and start a container with the current working
# dir mounted. On the first invocation building the image takes a few
# minutes.
[H]: make docker

# Build kernel and initramfs.
[C]: make

# Create the rootfs.ext2 disk image as described in the virtio blk
# section above, or remove the drive from the qemu command line
# in the make `run` target.

# Start qemu guest.
[H]: make run
```

[build-initramfs]: https://git.memzero.de/johannst/blog/src/branch/main/content/2021-12-02-toying-with-virtio/build_initramfs.sh
[build-kernel]: https://git.memzero.de/johannst/blog/src/branch/main/content/2021-12-02-toying-with-virtio/build_kernel.sh
[makefile]: https://git.memzero.de/johannst/blog/src/branch/main/content/2021-12-02-toying-with-virtio/Makefile
[dockerfile]: https://git.memzero.de/johannst/blog/src/branch/main/content/2021-12-02-toying-with-virtio/Dockerfile
[initramfs]: https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
[virtio]: http://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.pdf
[qemu-nic-opts]: https://www.qemu.org/docs/master/system/invocation.html#hxtool-5
[qemu-user-net]: https://www.qemu.org/docs/master/system/devices/net.html#using-the-user-mode-network-stack
[libslirp]: https://gitlab.com/qemu-project/libslirp
[osi-2]: https://osi-model.com/data-link-layer
[osi-4]: https://osi-model.com/transport-layer