Diffstat (limited to 'README.md')
-rw-r--r-- README.md 91
1 files changed, 91 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..34afd4b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,91 @@
+# `vdso` proxy proof-of-concept
+
+## Background
+Simply put, the `vdso` is an ELF image provided by the Linux Kernel and
+mapped into a process to provide the implementation of certain syscalls in
+userspace. Userspace can call those _virtual_ syscalls without invoking a
+_real_ syscall (e.g. the `syscall` instruction on x86-64).
+
+The location where the Kernel mapped the `vdso` can be found in the process's
+`maps` file (procfs), where the mapping is labeled with the `[vdso]` tag.
+```bash
+> cat /proc/self/maps | grep vdso
+7ffeae5fb000-7ffeae5fd000 r-xp 00000000 00:00 0 [vdso]
+```
+
+More details about the `vdso` can be found here:
+- https://man7.org/linux/man-pages/man7/vdso.7.html
+- https://www.kernel.org/doc/Documentation/ABI/stable/vdso
+
+## Why do this?
+This is an experiment and proof-of-concept for `process-checkpoint` scenarios
+with `migration` in mind.
+Typically, a process checkpoint contains a dump of the process's virtual memory
+regions, which are re-mapped when the process is restored at a later point in
+time. The `vdso` needs special treatment here, because user code in the
+checkpoint image may hold references into the `vdso` segment at the address
+where it was mapped when the checkpoint was taken (usually the `libc` sets
+these up behind the scenes).
+When restoring a checkpoint, the Kernel maps the `vdso` at a random virtual
+address in the restoring process. There are therefore two cases to distinguish:
+1. Restoring the checkpoint with the same Kernel.
+2. Restoring the checkpoint with a different Kernel (`migration`).
+
+For case `(1)` the `vdso` can be [`mremap(2)`][man-mremap]-ed to the virtual
+address where the vdso resided when creating the checkpoint. This is fine
+because the _new_ and the _old_ `vdso` are compatible.
+
+For case `(2)` however it is possible that the binary layout of the _new_
+`vdso` has changed (e.g. different offsets for a given symbol) and is therefore
+incompatible with the _old_ `vdso`. In that case a simple
+[`mremap(2)`][man-mremap] won't do the trick.
+This case is explored in this repository with a `proxy` mechanism which is
+described by the figure below.
+
+```text
+# Before checkpoint create.
+
+ VMA
+ +---------------------+
+ | libc: |
+ | gettimeofday(...) |
+ | .. |
+ | call | --+
+ | .. | | User code binds to symbols in the vdso.
+eg +-- +---------------------+ |
++0x10 | | vdso: | |
+ +-> | __vdso_gettimeofday | <-+
+ | .. |
+ +---------------------+
+
+
+# After checkpoint restore.
+
+ VMA
+ +---------------------+
+ | libc: |
+ | gettimeofday(...) |
+ | .. |
+ | call | --+
+ | .. | | After restoring the memory of the process checkpoint,
+eg +-- +---------------------+ | user code still binds to symbols in the _old_ vdso region.
++0x10 | | [old] vdso: | |
+ +-> | __vdso_gettimeofday | <-+
+ | jmp | --+
+ | .. | | After restore, the functions in the _old_ vdso region
+eg +-- +---------------------+ | are patched with a trampoline forwarding to the
++0x40 | | [new] vdso: | | corresponding function in the _new_ vdso region.
+ +-> | __vdso_gettimeofday | <-+
+ | .. |
+ +---------------------+
+```
+
+This approach requires higher-level synchronization: it must be ensured that
+no thread is in the middle of executing a `vdso` function when the process
+checkpoint is created. This PoC doesn't take this into account, as it merely
+focuses on the mechanics described above.
+
+## License
+This project is licensed under the [MIT](LICENSE) license.
+
+[man-mremap]: https://man7.org/linux/man-pages/man2/mremap.2.html
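
The address arithmetic behind the proxy mechanism in the README above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PoC's actual implementation. The helper names (`parse_vdso_range`, `build_trampoline`, `plan_patches`) are hypothetical, and the second `maps` line in the demo is made up. The `0x10` and `0x40` offsets are the example offsets from the figure. The README does not say which jump encoding the PoC uses, so the x86-64 `jmp qword ptr [rip+0]` form followed by an absolute 64-bit target is an assumption. The sketch only computes which bytes would go where. Actually writing them into the restored _old_ region is out of scope.

```python
import struct


def parse_vdso_range(maps_text):
    """Return (start, end) of the [vdso] mapping in /proc/<pid>/maps text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vdso]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None


def build_trampoline(target):
    """Encode an x86-64 `jmp qword ptr [rip+0]` followed by the absolute target.

    Assumed encoding: FF 25 00 00 00 00 reads the jump target from the
    8 bytes directly behind the instruction, so no register is clobbered.
    """
    return b"\xff\x25\x00\x00\x00\x00" + struct.pack("<Q", target)


def plan_patches(old_base, old_syms, new_base, new_syms):
    """Map each function address in the old vdso to the trampoline bytes
    that forward to the same symbol in the new vdso."""
    patches = {}
    for name, old_offset in old_syms.items():
        if name not in new_syms:
            raise KeyError(f"{name} is missing from the new vdso")
        patches[old_base + old_offset] = build_trampoline(new_base + new_syms[name])
    return patches


if __name__ == "__main__":
    # Old mapping: the example line from the README. New mapping: made up.
    old_maps = "7ffeae5fb000-7ffeae5fd000 r-xp 00000000 00:00 0 [vdso]\n"
    new_maps = "7ffd1c3f9000-7ffd1c3fb000 r-xp 00000000 00:00 0 [vdso]\n"
    old_base, _ = parse_vdso_range(old_maps)
    new_base, _ = parse_vdso_range(new_maps)

    # Symbol offsets as in the figure: +0x10 in the old vdso, +0x40 in the new.
    patches = plan_patches(
        old_base, {"__vdso_gettimeofday": 0x10},
        new_base, {"__vdso_gettimeofday": 0x40},
    )
    for address, code in patches.items():
        print(f"patch {address:#x} with {code.hex()}")
```

Looking symbols up by name rather than by offset is what makes the scheme tolerate a changed binary layout: the _old_ offsets come from the checkpoint, the _new_ ones from the `vdso` the restoring Kernel mapped.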