aboutsummaryrefslogtreecommitdiff
path: root/README.md
blob: 34afd4b7eafc315c0e808bd44af0dd269985581a (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
# `vdso` proxy proof-of-concept

## Background
Simply spoken, the `vdso` is an ELF file provided by the Linux Kernel and
mapped into a process to provide the implementation of certain `syscalls` in
userspace. Userspace can call those `virtual` syscalls without invoking a
_real_ syscall (eg on x86-64 `syscall` instruction).

The location where the Kernel mapped the `vdso` can be found in the `maps`
(procfs) labeled with the `[vdso]` tag.
```bash
> cat /proc/self/maps | grep vdso
7ffeae5fb000-7ffeae5fd000 r-xp 00000000 00:00 0     [vdso]
```

More details about the `vdso` can be found here:
- https://man7.org/linux/man-pages/man7/vdso.7.html
- https://www.kernel.org/doc/Documentation/ABI/stable/vdso

## Why do this?
This is some toying around and proof-of-concept for `process-checkpoint`
scenarios with `migration` in mind.
Typically a process checkpoint contains a dump of the virtual memory regions of
a process which are then re-mapped when restoring the process at a later point
in time. The vdso in this case needs some special treatment as the user code in
the checkpoint image might have some references into the vdso segment (usually
this is done behind the scenes by the `libc`) where it was when taking the
checkpoint .
When restoring a checkpoint, the Kernel will map the `vdso` to a random virtual
address in the restoring process, therfore there are two cases to distinguish:
1. Restoring the checkpoint with the same Kernel.
1. Restoring the checkpoint with a different Kernel (`migration`).

For case `(1)` the `vdso` can be [`mremap(2)`][man-mremap]-ed to the virtual
address where the vdso resided when creating the checkpoint. This is fine
because the _new_ and the _old_ `vdso` are compatible.

For case `(2)` however it is possible that the binary layout of the _new_
`vdso` has changed (eg different offsets for a given symbol) and is therefore
incompatible with the _old_ `vdso`. In that case a simple
[`mremap(2)`][man-mremap] won't do the trick.
This case is explored in this repository with a `proxy` mechanism which is
described by the figure below.

```text
# Before checkpoint create.

          VMA
          +---------------------+
          | libc:               |
          | gettimeofday(...)   |
          |   ..                |
          |   call              | --+
          |   ..                |   | User code binds to symbols in the vdso.
eg    +-- +---------------------+   |
+0x10 |   | vdso:               |   |
      +-> | __vdso_gettimeofday | <-+
          |   ..                |
          +---------------------+


# After checkpoint restore.

          VMA
          +---------------------+
          | libc:               |
          | gettimeofday(...)   |
          |   ..                |
          |   call              | --+
          |   ..                |   | After restoring the memory of the process checkpoint,
eg    +-- +---------------------+   | user code still binds to symbols in the _old_ vdso region.
+0x10 |   | [old] vdso:         |   |
      +-> | __vdso_gettimeofday | <-+
          |   jmp               | --+
          |   ..                |   | After restore, the functions in the _old_ vdso region
eg    +-- +---------------------+   | are patched with a trampoline forwarding to the
+0x40 |   | [new] vdso:         |   | corresponding function in the _new_ vdso region.
      +-> | __vdso_gettimeofday | <-+
          |   ..                |
          +---------------------+
```

This approach introduces the need for a higher-level synchronization as it must
be ensured that no thread is in the middle of executing a `vdso` function when
creating the process checkpoint. This PoC doesn't take this into account as it
merely focuses on the mechanics described above.

## License
This project is licensed under the [MIT](LICENSE) license.

[man-mremap]: https://man7.org/linux/man-pages/man2/mremap.2.html