aboutsummaryrefslogtreecommitdiff

vdso proxy proof-of-concept

Background

Simply spoken, the vdso is an ELF file provided by the Linux Kernel and mapped into a process to provide the implementation of certain syscalls in userspace. Userspace can call those virtual syscalls without invoking a real syscall (eg on x86-64 syscall instruction).

The location where the Kernel mapped the vdso can be found in the maps (procfs) labeled with the [vdso] tag.

> cat /proc/self/maps | grep vdso
7ffeae5fb000-7ffeae5fd000 r-xp 00000000 00:00 0     [vdso]

More details about the vdso can be found here: - https://man7.org/linux/man-pages/man7/vdso.7.html - https://www.kernel.org/doc/Documentation/ABI/stable/vdso

Why do this?

This is some toying around and proof-of-concept for process-checkpoint scenarios with migration in mind. Typically a process checkpoint contains a dump of the virtual memory regions of a process which are then re-mapped when restoring the process at a later point in time. The vdso in this case needs some special treatment as the user code in the checkpoint image might have some references into the vdso segment (usually this is done behind the scenes by the libc) where it was when taking the checkpoint . When restoring a checkpoint, the Kernel will map the vdso to a random virtual address in the restoring process, therfore there are two cases to distinguish: 1. Restoring the checkpoint with the same Kernel. 1. Restoring the checkpoint with a different Kernel (migration).

For case (1) the vdso can be mremap(2)-ed to the virtual address where the vdso resided when creating the checkpoint. This is fine because the new and the old vdso are compatible.

For case (2) however it is possible that the binary layout of the new vdso has changed (eg different offsets for a given symbol) and is therefore incompatible with the old vdso. In that case a simple mremap(2) won't do the trick. This case is explored in this repository with a proxy mechanism which is described by the figure below.

# Before checkpoint create.

          VMA
          +---------------------+
          | libc:               |
          | gettimeofday(...)   |
          |   ..                |
          |   call              | --+
          |   ..                |   | User code binds to symbols in the vdso.
eg    +-- +---------------------+   |
+0x10 |   | vdso:               |   |
      +-> | __vdso_gettimeofday | <-+
          |   ..                |
          +---------------------+


# After checkpoint restore.

          VMA
          +---------------------+
          | libc:               |
          | gettimeofday(...)   |
          |   ..                |
          |   call              | --+
          |   ..                |   | After restoring the memory of the process checkpoint,
eg    +-- +---------------------+   | user code still binds to symbols in the _old_ vdso region.
+0x10 |   | [old] vdso:         |   |
      +-> | __vdso_gettimeofday | <-+
          |   jmp               | --+
          |   ..                |   | After restore, the functions in the _old_ vdso region
eg    +-- +---------------------+   | are patched with a trampoline forwarding to the
+0x40 |   | [new] vdso:         |   | corresponding function in the _new_ vdso region.
      +-> | __vdso_gettimeofday | <-+
          |   ..                |
          +---------------------+

This approach introduces the need for a higher-level synchronization as it must be ensured that no thread is in the middle of executing a vdso function when creating the process checkpoint. This PoC doesn't take this into account as it merely focuses on the mechanics described above.

License

This project is licensed under the MIT license.