diff options
author | Johannes Stoelp <johannes.stoelp@gmail.com> | 2024-12-20 23:28:04 +0100 |
---|---|---|
committer | Johannes Stoelp <johannes.stoelp@gmail.com> | 2024-12-20 23:28:04 +0100 |
commit | cb0562f9fba187a0db111a7f4532706a5140945e (patch) | |
tree | 922512c4ab4fe9450a538e5090db830e410437f1 /content/2024-12-20-xpost-juicebox-asm/index.md | |
parent | 9610877ef54536386b6ea7de29270993194161a5 (diff) | |
download | blog-cb0562f9fba187a0db111a7f4532706a5140945e.tar.gz blog-cb0562f9fba187a0db111a7f4532706a5140945e.zip |
Diffstat (limited to 'content/2024-12-20-xpost-juicebox-asm/index.md')
-rw-r--r-- | content/2024-12-20-xpost-juicebox-asm/index.md | 335 |
1 files changed, 335 insertions, 0 deletions
diff --git a/content/2024-12-20-xpost-juicebox-asm/index.md b/content/2024-12-20-xpost-juicebox-asm/index.md new file mode 100644 index 0000000..d8796bd --- /dev/null +++ b/content/2024-12-20-xpost-juicebox-asm/index.md @@ -0,0 +1,335 @@ ++++ +title = "xpost: x64 jit assembler (juicebox-asm)" + +[taxonomies] +tags = ["x86", "rust", "linux"] ++++ + +This is a cross post to a repository implementing an `x64` _jit assembler_, +which I used to study the x86 instruction encoding and how to write such a tool +in rust. The code is available on github [>> juicebox-asm <<][jb]. + +The _assembler_ only implements a small set of the x86 instruction set, but +enough to create some interesting examples. Additionally, the repository +provides a simple _runtime_ based on [mmap(2)][mmap] (only on linux), which is +used to execute the dynamically assembled code and provided examples. + +## Assembler +The _assembler_ implements all the supported instructions and provides a similar +interface as when writing assembly code by hand. + +```rust +let mut asm = Asm::new(); +asm.xor(rax, rax); +asm.add(rax, rdi); +asm.add(rax, Mem64::indirect_base_index(rdi, rsi)); +asm.ret(); + +asm.disasm(); +// Outputs: +// 00000000 4831C0 xor rax,rax +// 00000003 4801F8 add rax,rdi +// 00000006 48030437 add rax,[rdi+rsi] +// 00000006 C3 ret +``` + +An interesting design-choice is given by the implementation of the assembler API +for the various instructions. Since x86 is a _CISC_ instruction set, it allows +same the mnemonic to be used with different types and permutations of +operands. This can be seen in the code listing above where the _add_ instruction +is used with both `register-register` and `register-memory` operands. It becomes +even clearer when looking into the instruction reference manual, in the example +of the _add_ instruction. + +<img src="x86-add.png"> + +> Table taken from [Intel® 64 and IA-32 Architectures Software Developer’s +> Manual - Volume 2 Instruction Set Reference][intel-im]. + +Since rust does not have function overloading +<sup id="sup-1-src"><a href="#sup-1-dst">1</a></sup> +as in cpp, one does need to define functions with different function names to +implement the _add_ instruction for different combinations of operands. +```rust +// Pseudo code. +fn add_r64_r64(Reg64, Reg64) +fn add_r64_m64(Reg64, Mem64) +// .. +``` + +However, in this experiment, I did not go with that approach. Instead I used a +different solution and defined generic traits for the various instructions which +have different operand combinations. The _Add_ trait below gives an example of +that. + +```rust +pub trait Add<T, U> { + fn add(&mut self, op1: T, op2: U); +} +``` + +This in turn allows to implement the traits for all the different combinations +of operands that should be supported by each instruction. It gives fine control +of what to support and provides an ergonomic API as seen in the first example +code listing. + +```rust +impl Add<Reg64, Reg64> for Asm { + fn add(&mut self, op1: Reg64, op2: Reg64) { + self.encode_rr(&[0x01], op1, op2); + } +} + +impl Add<Reg64, Mem64> for Asm { + fn add(&mut self, op1: Reg64, op2: Mem64) { + self.encode_rm(0x03, op1, op2); + } +} +``` + +All instructions currently implemented with their variations can be found in the +documentation of the [Asm][jb-asm] _assembler_. + +## Runtime +With the help of the provided runtime, code assembled with the _assembler_ can +be put into executable memory and turned into a function pointer, allowing to +call into the generated code. + +```rust +extern "C" fn add(a: u32, b: u32) -> u32 { + a + b +} + +fn main() { + // SystemV abi: + // rdi -> first argument + // rsi -> second argument + // rax -> return value + + let mut asm = Asm::new(); + asm.mov(rsi, Imm64::from(42)); + asm.mov(rax, Imm64::from(add as usize)); + asm.call(rax); + asm.ret(); + + let mut rt = Runtime::new(); + let add42 = unsafe { rt.add_code::<extern "C" fn(u32) -> u32>(asm.into_code()) }; + + let res = add42(5); + assert_eq!(res, 47); +} +``` +> This example must be run on a system with the [SystemV][sysv] ABI. + +## Label +A label is used to associate a location in the generated code with a symbolic +name (the label), which can then be used as branch target for control-flow +instructions as for example `jz`, `jnz` and so on. + +The act of associating a label with a location is called _binding_. Each label +can only be bound **once**, else the branch target would become _ambiguous_. + +Binding and jumping to a label can come in two orders. +1. A label is first bound and then jumped to (backwards-reference). +1. A label is first jumped to and then bound (forwards-reference). + +In the following both cases will be discussed in a bit more detail, which +hopefully helps when browsing the implementation. One thing to keep in mind, +since the generated code must be relocatable, all jump instructions will be +encoded relative to the current program counter (pc). + +The first case where the label is first bound and later used, is somewhat more +natural to think about, as first defining then using something is pretty +clear. The code listing below shows an example of a loop with a back-reference +summing up all integer between 0 and 42. + +```rust,linenos +let mut asm = Asm::new(); +let mut lp = Label::new(); + +asm.mov(eax, Imm32::from(0)); // int ret = 0; +asm.mov(ecx, Imm32::from(42)); // int n = 42; + +asm.bind(&mut lp); // lp: +asm.add(eax, ecx); // ret += n; +asm.dec(ecx); // --n; +asm.test(ecx, ecx); // if (n != 0) +asm.jnz(&mut lp); // goto lp + +asm.ret(); +``` + +From the point of view of the code buffer this looks as follows. When the label +is bound (line 7), the current offset into the code buffer is recorded as the +labels _location_. Then when emitting the jump instruction (line 11) later, this +location is used with the current code buffers offset to compute the +_displacement_ (disp32) forming the relative jump. + +<div style="margin:auto; width:85%;"> +<img src="label-lj.svg"> +</div> + +The second case where the label is first used and then bound later, is not +really more complex, but one has to reverse the steps. The code listing below +gives an example, demonstrating forwards jumps by validating two input +arguments to be not equal to 0 and returning of one check fails. + +```rust,linenos +let mut asm = Asm::new(); +let mut ret = Label::new(); + +// Validate inputs (sysv abi). +asm.test(rdi, rdi); // if (rdi == 0) +asm.jz(&mut ret); // return; +asm.test(rsi, rsi); // if (rsi == 0) +asm.jz(&mut ret); // return; + +// Here, rdi != 0 && rsi != 0. +// nop as placeholder for some real work. +asm.nop(); + +asm.bind(&mut ret); +asm.ret(); +``` + +Again from the point of view of the code buffer this looks as follows. When the +jump instructions are emitted (line 6, 8), the _displacement_ can not be +computed, as the label has no defined location yet. A placeholder displacement +of value 0 is emitted into the code buffer and the offset to the displacement +value is recorded by the label as pending _relocation_. Once the label is bound +to a location, all pending relocations are resolved by consuming the recorded +offsets, computing the corresponding displacements and patching the values in +the code buffer. + +<div style="margin:auto; width:85%;"> +<img src="label-jl.svg"> +</div> + +## Example: bf + +The [bf.rs][jb-bf] example implements a _jit compiler_ for the [brainfuck][bf] +language. In this example the _bf program_ is jit compiled to one code blob as a +whole and evaluated then after. The example only returns from the jit code if +the bf program terminates or a runtime error is detected. For reference an +_interpreter_ is also provided. + +## Example: tiny-vm + +The [tiny_vm.rs][jb-tiny] example defines a _tiny virtual machine (TinyVm)_ with +a set of _registers_ and custom _instructions_. It then goes on and implements a +_jit compiler_ for the TinyVm. In comparison to the jit compiler in the previous +bf example, the TinyVm compiler jit compiles the program per basic-block (bb), +each time a new bb is encountered. \ +After executing a jit compiled basic block, the code returns into the +TinyVm. The vm then checks if the branch target has an entry in the _jit cache_, +which maps guest addresses to compiled basic-blocks. If the cache returns a hit +the TinyVm jumps to the generated code, if the cache returns a miss the new bb is +compiled and inserted into the jit cache. + +Similar to the bf example, this example also provides an _interpreter_ as +reference. + +## Appendix: x86 instruction encoding +x86 instructions are encoded with _variable length_ and follow the format below. +A single instruction can be as small as 1 byte up to a double-digit number of +bytes. + +<img src="x86-insn-fmt.png"> + +> Figure taken from [Intel® 64 and IA-32 Architectures Software Developer’s +> Manual - Volume 2 Instruction Set Reference][intel-im]. + +_Prefixes_ are typically modifiers for the instruction, where some examples are +`LOCK` for bus locking, `REP` for instruction repetition and `REX` for +specifying extended operand sizes and to access extended registers (like +r8-r15). + +The `ModR/M` and `SIB` bytes are used to encode the register and memory operands +of the instruction in the general case. Whether the `SIB` byte is used depends +on the _addressing mode_. + +In case the instruction also requires an _address displacement_ or an _immediate +value_, the bytes of the value directly follow the instruction bytes. + +Historically, the x86 ISA only had 8 general-purpose registers (ax, cx, dx, bx, +sp, bp, si, di). This is still visible today in various fields which encode +register operands, where one such example is the **ModRM.Reg** field. Later +on, 8 additional general-purpose registers (r8-r15) were added and one +additional bit was needed to encode all 16 registers, while keeping backwards +compatibility. \ +As mentioned above, the **REX** prefix is used to access the _extended_ +registers. This prefix contains the **R, X, B** bits, which are combined with +the various register operand fields, as shown by the figures below for the case +of the ModRM byte and the ModRM + SIB bytes. + +<div style="margin:auto; width:85%;"> +<img src="x86-insn-modrm.png"> +<img src="x86-insn-modrm-sib.png"> +</div> + +> Figures taken from [Intel® 64 and IA-32 Architectures Software Developer’s +> Manual - Volume 2 Instruction Set Reference][intel-im]. + +The following code listing provides an example for two load instructions with +different _addressing modes_ and gives a detailed breakdown of the instruction +bytes for each instruction. + +```rust +let mut asm = Asm::new(); +asm.mov(r8, Mem64::indirect_disp(r9, 0xaabb)); +asm.mov(r8, Mem64::indirect_base_index(r9, r10)); + +asm.disasm(); +// Outputs: +// 4D8B81BBAA0000 mov r8,[r9+0xaabb] +// 4F8B0411 mov r8,[r9+r10] + + +// Breakdown of the indirect + displacement load. +// +// 4D 8B 81 BBAA0000 mov r8,[r9+0xaabb] +// ^ ^ ^ `Imm32 +// | | `ModRM: mod=10,reg=000,rm=001 +// | `Opc: mov +// `REX: W=1,R=1,X=0,B=1 +// +// REX.R + ModRM.reg = 1000 (r8) - dest reg +// REX.B + ModRm.rm = 1001 (r9) - base reg + + +// Breakdown of the base + index load. +// +// 4F 8B 04 11 mov r8,[r9+r10] +// ^ ^ ^ `SIB: scale=00,index=010,base=001 +// | | `ModRM: mod=00,reg=000,rm=100 +// | `Opc: mov +// `REX: W=1,R=1,X=1,B=1 +// +// ModRM.rm = 100 -> use SIB byte +// +// REX.R + ModRM.reg = 1000 ( r8) - dest reg +// REX.X + SIB.index = 1010 (r10) - index reg +// REX.B + SIB.base = 1001 ( r9) - base reg +``` + +## References +- [juicebox-asm][jb] +- [osdev: x86 instruction encoding](https://wiki.osdev.org/X86-64_Instruction_Encoding) +- [intel instruction set reference][intel-im] + +## Footnotes +<span id="sup-1-dst">**1)**</span> +In cpp functions with the _same_ name but _different_ arguments can be +defined. During compilation [overload resolution][cpp-or] takes place and picks +the "right" function to call on each call site. +<a href="#sup-1-src">↩</a> + +[jb]: https://github.com/johannst/juicebox-asm +[jb-asm]: https://johannst.github.io/juicebox-asm/juicebox_asm/struct.Asm.html +[jb-bf]: https://github.com/johannst/juicebox-asm/blob/main/examples/bf.rs +[jb-tiny]: https://github.com/johannst/juicebox-asm/blob/main/examples/tiny_vm.rs +[mmap]: https://www.man7.org/linux/man-pages/man2/mmap.2.html +[intel-im]: https://cdrdv2.intel.com/v1/dl/getContent/671110 +[cpp-or]: https://en.cppreference.com/w/cpp/language/overload_resolution +[sysv]: https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build +[bf]: https://brainfuck.org/brainfuck.html |