# `vdso` proxy proof-of-concept

## Background
Simply put, the `vdso` is an ELF shared object provided by the Linux Kernel and
mapped into a process to provide the implementation of certain `syscalls` in
userspace. Userspace can call those _virtual_ syscalls without executing a
_real_ syscall (e.g. the `syscall` instruction on x86-64).

The address range where the Kernel mapped the `vdso` can be found in the
process's `maps` file in procfs (`/proc/<pid>/maps`), labeled with the `[vdso]`
tag:
```bash
> cat /proc/self/maps | grep vdso
7ffeae5fb000-7ffeae5fd000 r-xp 00000000 00:00 0 [vdso]
```

More details about the `vdso` can be found here:
- https://man7.org/linux/man-pages/man7/vdso.7.html
- https://www.kernel.org/doc/Documentation/ABI/stable/vdso

## Why do this?
This is some toying around and a proof-of-concept for `process-checkpoint`
scenarios with `migration` in mind.
Typically a process checkpoint contains a dump of the virtual memory regions of
a process, which are re-mapped when the process is restored at a later point
in time. The `vdso` needs special treatment here, because the user code in the
checkpoint image might hold references into the `vdso` segment at the address
where it was mapped when the checkpoint was taken. Usually this happens behind
the scenes: the `libc` typically resolves the `vdso` symbols once at startup and
keeps pointers to them.
When restoring a checkpoint, the Kernel maps the `vdso` at a random virtual
address in the restoring process, therefore there are two cases to distinguish:
1. Restoring the checkpoint with the same Kernel.
1. Restoring the checkpoint with a different Kernel (`migration`).

For case `(1)` the `vdso` can be [`mremap(2)`][man-mremap]-ed to the virtual
address where it resided when the checkpoint was created. This is fine because
the _new_ and the _old_ `vdso` are compatible.
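Before the `vdso` can be moved, the restorer first has to find out where the Kernel placed it in the restoring process. The following C sketch does this in two ways. It reads `/proc/self/maps`, like the shell example above. It also cross-checks the result against `getauxval(AT_SYSINFO_EHDR)`, which returns the address of the `vdso` ELF header. The helper names `find_mapping` and `vdso_base_from_auxv` are hypothetical and not part of this repository.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: look up the address range of a special mapping such
 * as "[vdso]" or "[vvar]" in /proc/self/maps.
 * Returns 0 on success and -1 if the tag was not found. */
int find_mapping(const char *tag, uintptr_t *start, uintptr_t *end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    int rc = -1;

    /* Each line looks like "7ffeae5fb000-7ffeae5fd000 r-xp ... [vdso]". */
    while (getline(&line, &cap, maps) != -1) {
        if (strstr(line, tag) == NULL)
            continue;
        unsigned long lo, hi;
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            *start = (uintptr_t)lo;
            *end = (uintptr_t)hi;
            rc = 0;
        }
        break;
    }

    free(line);
    fclose(maps);
    return rc;
}

/* AT_SYSINFO_EHDR in the auxiliary vector holds the address of the vdso
 * ELF header, i.e. the start of the [vdso] mapping (0 if there is none). */
uintptr_t vdso_base_from_auxv(void)
{
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}
```

The same helper can also look up `[vvar]`. On current x86-64 kernels, the `vdso` code reads its data from this separate mapping at a fixed offset relative to its own address. Moving only `[vdso]` with `mremap(2)` is therefore generally not enough: `[vvar]` has to be moved by the same delta.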

For case `(2)`, however, the binary layout of the _new_ `vdso` may have changed
(e.g. different offsets for a given symbol), making it incompatible with the
_old_ `vdso`. In that case a simple [`mremap(2)`][man-mremap] won't do the trick.
This case is explored in this repository with a `proxy` mechanism, which is
described by the figure below.

```text
# Before checkpoint create.

          VMA
          +---------------------+
          | libc:               |
          |   gettimeofday(...) |
          |   ..                |
          |   call              | --+
          |   ..                |   |  User code binds to symbols in the vdso.
e.g.  +-- +---------------------+   |
+0x10 |   | vdso:               |   |
      +-> | __vdso_gettimeofday | <-+
          |   ..                |
          +---------------------+


# After checkpoint restore.

          VMA
          +---------------------+
          | libc:               |
          |   gettimeofday(...) |
          |   ..                |
          |   call              | --+
          |   ..                |   |  After restoring the memory of the process checkpoint,
e.g.  +-- +---------------------+   |  user code still binds to symbols in the _old_ vdso region.
+0x10 |   | [old] vdso:         |   |
      +-> | __vdso_gettimeofday | <-+
          |   jmp               | --+
          |   ..                |   |  After restore, the functions in the _old_ vdso region
e.g.  +-- +---------------------+   |  are patched with a trampoline forwarding to the
+0x40 |   | [new] vdso:         |   |  corresponding function in the _new_ vdso region.
      +-> | __vdso_gettimeofday | <-+
          |   ..                |
          +---------------------+
```

This approach introduces the need for higher-level synchronization, as it must
be ensured that no thread is in the middle of executing a `vdso` function when
the process checkpoint is created. This PoC doesn't take this into account, as
it merely focuses on the mechanics described above.

## License
This project is licensed under the [MIT](LICENSE) license.

[man-mremap]: https://man7.org/linux/man-pages/man2/mremap.2.html
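## Trampoline sketch
On x86-64, the trampoline patched into the _old_ `vdso` in the proxy figure could be as small as a 14-byte indirect jump. The following C sketch shows one way such a patch might look. It is an assumption, not the code of this PoC:

- `encode_trampoline`, `patch_with_trampoline` and `TRAMPOLINE_SIZE` are hypothetical names.
- Resolving the matching symbol addresses in the old and new `vdso` (e.g. by parsing their ELF dynamic symbol tables) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* x86-64 only: "jmp *0(%rip)" (FF 25 00 00 00 00) immediately followed by
 * the 64-bit absolute target address. 14 bytes, clobbers no register. */
#define TRAMPOLINE_SIZE 14

/* Encode the trampoline into dst (at least TRAMPOLINE_SIZE bytes). */
size_t encode_trampoline(uint8_t *dst, uint64_t target)
{
    dst[0] = 0xff; /* JMP r/m64 (opcode FF /4) */
    dst[1] = 0x25; /* ModRM: mod=00, reg=4, rm=101 -> [rip + disp32] */
    dst[2] = dst[3] = dst[4] = dst[5] = 0x00; /* disp32 = 0: address follows */
    for (int i = 0; i < 8; i++)
        dst[6 + i] = (uint8_t)(target >> (8 * i)); /* little-endian */
    return TRAMPOLINE_SIZE;
}

/* Hypothetical helper: overwrite the entry of a function in the _old_ vdso
 * region (old_fn) with a jump to the same function in the _new_ vdso
 * (new_fn). Returns 0 on success, -1 if mprotect(2) fails. */
int patch_with_trampoline(void *old_fn, const void *new_fn)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t first = (uintptr_t)old_fn & ~(page - 1);
    /* Covers every page touched by the trampoline, even across a boundary. */
    size_t len = (uintptr_t)old_fn + TRAMPOLINE_SIZE - first;

    /* The restored old vdso region is typically r-x: make it writable. */
    if (mprotect((void *)first, len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    encode_trampoline((uint8_t *)old_fn, (uint64_t)(uintptr_t)new_fn);
    /* Switch back to r-x; x86 keeps instruction and data caches coherent. */
    if (mprotect((void *)first, len, PROT_READ | PROT_EXEC) != 0)
        return -1;
    return 0;
}
```

The sketch assumes the following:

- Every patched function in the _old_ `vdso` is at least `TRAMPOLINE_SIZE` bytes long.
- As in the PoC itself, no thread is executing inside the _old_ `vdso` while it is being patched.

The RIP-relative form was chosen over `movabs rax, imm64; jmp rax` because it does not clobber a register.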