-rw-r--r-- | 01_dynamic_linking/README.md | 6
-rw-r--r-- | 02_process_init/Makefile | 12
-rw-r--r-- | 02_process_init/README.md | 90
-rw-r--r-- | 02_process_init/entry.c | 18
4 files changed, 70 insertions, 56 deletions
diff --git a/01_dynamic_linking/README.md b/01_dynamic_linking/README.md index b80d055..67e29d6 100644 --- a/01_dynamic_linking/README.md +++ b/01_dynamic_linking/README.md @@ -26,9 +26,9 @@ control to the interpreter rather than the user program. The interpreter will prepare the execution environment for the user program and pass control to it afterwards. Typical tasks of the interpreter are: -- loading shared objects into memory -- performing re-location -- running initialization routines +- Loading shared objects into memory (dependencies). +- Performing re-location. +- Running initialization routines. ```text @vm @vm | | | | diff --git a/02_process_init/Makefile b/02_process_init/Makefile index 430b0a9..db04941 100644 --- a/02_process_init/Makefile +++ b/02_process_init/Makefile @@ -4,12 +4,12 @@ show: entry ./entry 1 2 3 4 entry: entry.S entry.c - gcc -o $@ \ - -g -O0 \ - -Wall -Wextra \ - -I ../include \ - -nostartfiles -nodefaultlibs \ - -static \ + gcc -o $@ \ + -g -O0 \ + -Wall -Wextra \ + -I ../include \ + -nostdlib \ + -static \ $^ clean: diff --git a/02_process_init/README.md b/02_process_init/README.md index fbe9c4a..26e1cb3 100644 --- a/02_process_init/README.md +++ b/02_process_init/README.md @@ -1,28 +1,38 @@ # Process Initialization ### Goals -- Understand process state on process entry as specified by the - [SystemV x86-64 ABI][sysv_x86_64] -- Build `no-std` program to visualize process state +- Understand initial process state on process entry as specified by the + [SystemV x86-64 ABI][sysv_x86_64]. +- Build a `no-std` program to analyze & visualize the initial process state. --- Before starting to implement a minimal dynamic linker the first step is to -understand the `process initialization` in further depth. 
-Which is important because when starting a new process -- the dynamic linker must setup the execution environment for the user program - (eg load dependencies, pass command line arguments) -- the control is first passed to the dynamic linker (interpreter) by - the Linux Kernel as mentioned in - [01_dynamic_linking](../01_dynamic_linking/README.md) -- the dynamic linker must be a stand-alone executable with no dependencies - -Before transferring control to a new user process the Linux Kernel provides some -data on the `stack` with the format following the specification in the -[SystemV x86-64 ABI][sysv_x86_64] chapter _Initial Stack and Register State_. +understand the `process initialization` procedure. +This is important because when starting a `dynamically-linked` +executable the control is first passed to the dynamic linker +(interpreter) by the Linux Kernel as mentioned in +[01_dynamic_linking](../01_dynamic_linking/README.md). + +Once the dynamic linker is executing it needs to prepare the execution +environment for the dynamically-linked executable. The dynamic linker's main tasks are: +- To load dependencies. +- Perform re-locations. +- Run initialization routines. + +After the execution environment is prepared the dynamic linker hands +control to the user executable. + +Due to all this requirements the dynamic must be a free-standing +executable with no dependencies. ## Stack state on process entry +When launching an ELF executable the Linux Kernel will map in the +memory segments from the ELF file and setup some data on the `stack` +according to the specification in the [SystemV x86-64 ABI][sysv_x86_64] +chapter _Initial Stack and Register State_. 
+ On process startup after `execve(2)` the stack looks as follows ```text +------------+ High Address @@ -59,23 +69,25 @@ ARGV | $rsp + 8 | const char* [] | Argument ptrs ARGC | $rsp | uint64_t | Argument count ``` -Where `ARGV` is an array of pointers to strings holding the command line -arguments passed to the user program and `ARGC` the number of arguments passed -+1 as `ARGV[0]` holds the path of the program started. Similar `ENVP` is an -array of pointers to strings holding the environment variables as seen by this -process. -The `AUXV` is the auxiliary vector and holds additional information as for -example the `entry point` or the `program header` of the program. Entries in -`AUXV` are encoded as given -in `AuxvEntry`. +- `ARGV : const char* []` is an array of pointers to string literals + holding the command line arguments. + - `ARGV[0]` is special as it holds the path of the launched program. +- `ARGC : uint64_t` is the number of command line arguments + 1 +- `ENVP : const char* []` is an array of pointers to string literals + holding the environment variables as seen by this process +- `AUXV : uint64_t[2]` is the `auxiliary vector` providing additional + information like the `entry point` or the `program header` of the + program. + +The `AUXV` segment consists of consecutive `AuxvEntry` elements terminated by the `DT_NULL` element. ```c struct AuxvEntry { uint64_t tag; uint64_t val; }; ``` -The [`x86-64 System V ABI`][sysv_x86_64] chapter _Auxiliary Vector_ specifies -the following tags +The _Auxiliary Vector_ chapter in the [`x86-64 System V ABI`][sysv_x86_64] specifies +the following tags: ```text AT_NULL = 0 AT_IGNORE = 1 @@ -107,7 +119,7 @@ below are in an unspecified state: - `$rdx`: function pointer that the application should register with `atexit(BA_OS)`. > Not sure here if clearing `$rbp` is strictly required as frame-pointer -> chaining is optional and can be omitted (eg `gcc -fomit-frame-pointer`). 
+> chaining is optional and can be omitted (`gcc -fomit-frame-pointer`). ## Hands-on the first instruction @@ -133,7 +145,7 @@ This is because by default the `static linker` adds some extra code & libraries to the program like for example the `libc` and the `C-runtime (crt)` which contains the `_start` symbol and hence the first instruction executed. -Passing `--trace` down to the `static linker` it sheds some light onto which +Passing `--trace` down to the `static linker` sheds some light onto which input files the static linker actually processes. ```bash echo 'void main() {}' | gcc -x c -o /dev/null - -Wl,--trace @@ -187,8 +199,8 @@ The full source code of the `_start` function is available in [entry.S](./entry. The pointer passed to the `entry` function can be used to compute `ARGC`, `ARGV` and `ENVP` accordingly. ```c -void entry(long* prctx) { - long argc = *prctx; +void entry(uint64_t* prctx) { + uint64_t argc = *prctx; const char** argv = (const char**)(prctx + 1); const char** envv = (const char**)(argv + argc + 1); ... @@ -197,7 +209,7 @@ void entry(long* prctx) { To collect the `AUXV` entries we first need to count the number of environment variables as follows. ```c -// entry +// entry.c ... int envc = 0; for (const char** env = envv; *env; ++env) { @@ -209,10 +221,10 @@ variables as follows. auxv[i] = 0; } - const uint64_t* auxvp = (const uint64_t*)(envv + envc + 1); - for (unsigned i = 0; auxvp[i] != AT_NULL; i += 2) { - if (auxvp[i] < AT_MAX_CNT) { - auxv[auxvp[i]] = auxvp[i + 1]; + const Auxv64Entry* auxvp = (const Auxv64Entry*)(envv + envc + 1); + for (; auxvp->tag != AT_NULL; ++auxvp) { + if (auxvp->tag < AT_MAX_CNT) { + auxv[auxvp->tag] = auxvp->val; } } ... @@ -220,7 +232,7 @@ variables as follows. Finally the data can be printed as ```c -// entry +// entry.c ... 
pfmt("Got %d arg(s)\n", argc); for (const char** arg = argv; *arg; ++arg) { @@ -268,7 +280,7 @@ Print first 9 env var(s) env = GDM_LANG=en_US.utf8 env = PWD=/home/johannst/dev/dynld/02_process_init env = MAIL=/var/spool/mail/johannst - env = XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session env = LANG=en_US.utf8 + env = XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session Print auxiliary vector AT_EXECFD: 0 AT_PHDR : 0x400040 @@ -287,9 +299,9 @@ Print auxiliary vector ## Things to remember - On process entry the Linux Kernel provides data on the stack as specified in - the [SystemV ABI][sysv_x86_64] + the [SystemV ABI][sysv_x86_64]. - By default the `static linker` adds additional code which contains the - `_start` symbol being the default process `entry point` + `_start` symbol being the default process `entry point`. ## References & Source Code - [x86-64 SystemV ABI][sysv_x86_64] diff --git a/02_process_init/entry.c b/02_process_init/entry.c index 37f57c8..37d5fc8 100644 --- a/02_process_init/entry.c +++ b/02_process_init/entry.c @@ -1,5 +1,6 @@ // Copyright (c) 2020 Johannes Stoelp +#include <auxv.h> #include <elf.h> #define MAX_PRINTF_LEN 128 #include <io.h> @@ -8,14 +9,14 @@ # error "Only supported in linux(x86_64)!" #endif -void entry(const long* prctx) { +void entry(const uint64_t* prctx) { // Interpret data on the stack passed by the OS kernel as specified in the // x86_64 SysV ABI. - - long argc = *prctx; + uint64_t argc = *prctx; const char** argv = (const char**)(prctx + 1); const char** envv = (const char**)(argv + argc + 1); + // Count the number of environment variables in the `ENVP` segment. 
int envc = 0; for (const char** env = envv; *env; ++env) { ++envc; @@ -26,14 +27,15 @@ void entry(const long* prctx) { auxv[i] = 0; } - const uint64_t* auxvp = (const uint64_t*)(envv + envc + 1); - for (unsigned i = 0; auxvp[i] != AT_NULL; i += 2) { - if (auxvp[i] < AT_MAX_CNT) { - auxv[auxvp[i]] = auxvp[i + 1]; + // Read the `AUXV` auxiliary vector segment. + const Auxv64Entry* auxvp = (const Auxv64Entry*)(envv + envc + 1); + for (; auxvp->tag != AT_NULL; ++auxvp) { + if (auxvp->tag < AT_MAX_CNT) { + auxv[auxvp->tag] = auxvp->val; } } - // Print for demonstration + // Print the data provided by the Linux Kernel on the stack. pfmt("Got %d arg(s)\n", argc); for (const char** arg = argv; *arg; ++arg) { |
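The stack walk in `entry.c` (read `ARGC`, skip `ARGV` and its terminating null, count `ENVP` up to its terminating null, then read `(tag, val)` pairs until `AT_NULL`) can also be tried out on a synthetic stack image in an ordinary hosted program. The following is only a minimal sketch under that assumption: `StackView`, `AuxEntry`, `parse_stack`, `aux_lookup` and the `*_TAG` constants are hypothetical names introduced for illustration and are not part of the repository. The tag values are the ones listed in the ABI's _Auxiliary Vector_ table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Tag values as listed in the SysV x86-64 ABI auxiliary vector table.
// The names are hypothetical to avoid clashing with <elf.h>.
enum { AT_NULL_TAG = 0, AT_PHDR_TAG = 3, AT_ENTRY_TAG = 9 };

// One auxiliary vector element: a (tag, value) pair of 64-bit words.
typedef struct {
    uint64_t tag;
    uint64_t val;
} AuxEntry;

// Pointers into the stack image, mirroring ARGC / ARGV / ENVP / AUXV.
typedef struct {
    uint64_t argc;
    const uint64_t* argv;  // argc slots followed by one zero slot
    const uint64_t* envp;  // envc slots followed by one zero slot
    uint64_t envc;
    const AuxEntry* auxv;  // pairs terminated by an AT_NULL entry
} StackView;

// Walk a stack image laid out as on process entry. `sp` plays the role of
// the value in $rsp. The image must be well-formed (all terminators present).
static StackView parse_stack(const uint64_t* sp) {
    assert(sp != NULL);
    StackView v;
    v.argc = sp[0];
    v.argv = sp + 1;
    // Skip the argc argument slots plus the terminating null slot.
    v.envp = v.argv + v.argc + 1;
    // Count environment slots up to the terminating null slot.
    v.envc = 0;
    while (v.envp[v.envc] != 0) {
        ++v.envc;
    }
    // The auxiliary vector starts right after the ENVP terminator.
    v.auxv = (const AuxEntry*)(v.envp + v.envc + 1);
    return v;
}

// Return the value stored for `tag`, or `fallback` if the tag is absent.
static uint64_t aux_lookup(const AuxEntry* auxv, uint64_t tag, uint64_t fallback) {
    for (; auxv->tag != AT_NULL_TAG; ++auxv) {
        if (auxv->tag == tag) {
            return auxv->val;
        }
    }
    return fallback;
}
```

To use it, fill a `uint64_t` array with an argument count, that many placeholder argument slots, a zero, some environment slots, a zero, and a few `(tag, val)` pairs ending in `AT_NULL`, then pass the array to `parse_stack`. Because the slots are treated as opaque 64-bit words, the sketch never dereferences the string pointers and needs no real process stack.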