+++ title = "xpost: x64 jit assembler (juicebox-asm)" [taxonomies] tags = ["x86", "rust", "linux"] +++ This is a cross post to a repository implementing an `x64` _jit assembler_, which I used to study the x86 instruction encoding and how to write such a tool in rust. The code is available on github [>> juicebox-asm <<][jb]. The _assembler_ only implements a small set of the x86 instruction set, but enough to create some interesting examples. Additionally, the repository provides a simple _runtime_ based on [mmap(2)][mmap] (linux only), which is used to execute the dynamically assembled code and the provided examples. ## Assembler The _assembler_ implements all the supported instructions and provides a similar interface as when writing assembly code by hand. ```rust let mut asm = Asm::new(); asm.xor(rax, rax); asm.add(rax, rdi); asm.add(rax, Mem64::indirect_base_index(rdi, rsi)); asm.ret(); asm.disasm(); // Outputs: // 00000000 4831C0 xor rax,rax // 00000003 4801F8 add rax,rdi // 00000006 48030437 add rax,[rdi+rsi] // 00000006 C3 ret ``` An interesting design-choice is given by the implementation of the assembler API for the various instructions, since x86 overloads instruction mnemonics with many operand types. This can be seen in the code listing above where the _add_ instruction is used with both `register-register` and `register-memory` operands. It becomes even clearer when looking into the instruction reference manual, in the example of the _add_ instruction. > Table taken from [Intel® 64 and IA-32 Architectures Software Developer’s > Manual - Volume 2 Instruction Set Reference][intel-im]. One approach is to implement the functions with _explicit_ types for the operands. This has the benefit that the API clearly expresses which operands are allowed for a given instruction and the compiler can check if wrong operands are provided. However, since rust does not have function overloading 1 as in cpp, one does need to define functions with different names to implement the instructions. The following shows an example for the _add_ instruction with different operands. ```rust // Pseudo code. fn add_r64_r64(Reg64, Reg64) fn add_r64_m64(Reg64, Mem64) // .. ``` Another approach is to provided a _flexible_ `Operand` type and implement the instructions with that. The benefit here is that there is only **one** _add_ function. The downside however, is that the API does not clearly express which operands are allowed and the operand check must be done at _runtime_ rather than _compile-time_. ```rust // Pseudo code. fn add(Operand, Operand) ``` During this experiment I have learned that there is a way in rust to emulate sort of function overloading, which allows to get all the benefits of the above discussed approaches while getting rid of all the downsides at the same time at the expense of a little boiler-plate code. The idea is to define generic traits for the various instructions with overloaded mnemonics. The listing below shows the `Add` trait as example for the _add_ instruction. ```rust pub trait Add { fn add(&mut self, op1: T, op2: U); } ``` This in turn allows to implement the traits for all the different combinations of (explicit) operands that should be supported by each instruction. It gives fine control of what to support and provides an ergonomic API as seen in the first example code listing. ```rust impl Add for Asm { fn add(&mut self, op1: Reg64, op2: Reg64) { self.encode_rr(&[0x01], op1, op2); } } impl Add for Asm { fn add(&mut self, op1: Reg64, op2: Mem64) { self.encode_rm(0x03, op1, op2); } } ``` All instructions currently implemented with their variations can be found in the documentation of the [Asm][jb-asm] _assembler_. ## Runtime With the help of the provided runtime, code assembled with the _assembler_ can be put into executable memory and turned into a function pointer, allowing to call into the generated code. ```rust extern "C" fn add(a: u32, b: u32) -> u32 { a + b } fn main() { // SystemV abi: // rdi -> first argument // rsi -> second argument // rax -> return value let mut asm = Asm::new(); asm.mov(rsi, Imm64::from(42)); asm.mov(rax, Imm64::from(add as usize)); asm.call(rax); asm.ret(); let mut rt = Runtime::new(); let add42 = unsafe { rt.add_code:: u32>(asm.into_code()) }; let res = add42(5); assert_eq!(res, 47); } ``` > This example must be run on a system with the [SystemV][sysv] ABI. ## Label A label is used to associate a location in the generated code with a symbolic name (the label), which can then be used as branch target for control-flow instructions as for example `jz`, `jnz`. The act of associating a label with a location is called _binding_. Each label can only be bound **once**, else the branch target would become _ambiguous_. Binding and jumping to a label can come in two orders. 1. A label is first bound and then jumped to (backwards-reference). 1. A label is first jumped to and then bound (forwards-reference). In the following both cases will be discussed in a bit more detail, which hopefully helps when browsing the implementation. One thing to keep in mind, since the generated code must be relocatable, all jump instructions will be encoded relative to the current program counter (pc). The first case where the label is first bound and later used, is somewhat more natural to think about, as first defining then using something is pretty clear. The code listing below shows an example of a loop with a back-reference summing up all integer between 0 and 42. ```rust,linenos let mut asm = Asm::new(); let mut lp = Label::new(); asm.mov(eax, Imm32::from(0)); // int ret = 0; asm.mov(ecx, Imm32::from(42)); // int n = 42; asm.bind(&mut lp); // lp: asm.add(eax, ecx); // ret += n; asm.dec(ecx); // --n; asm.test(ecx, ecx); // if (n != 0) asm.jnz(&mut lp); // goto lp asm.ret(); ``` From the point of view of the code buffer this looks as follows. When the label is bound (line 7), the current offset into the code buffer is recorded as the labels _location_. Then when emitting the jump instruction (line 11) later, this location is used with the current code buffers offset to compute the _displacement_ (disp32) forming the relative jump.
The second case where the label is first used and then bound later, is not really more complex, but one has to reverse the steps. The code listing below gives an example, demonstrating forwards jumps by validating two input arguments to be not equal to 0 and returning of one check fails. ```rust,linenos let mut asm = Asm::new(); let mut ret = Label::new(); // Validate inputs (sysv abi). asm.test(rdi, rdi); // if (rdi == 0) asm.jz(&mut ret); // return; asm.test(rsi, rsi); // if (rsi == 0) asm.jz(&mut ret); // return; // Here, rdi != 0 && rsi != 0. // nop as placeholder for some real work. asm.nop(); asm.bind(&mut ret); asm.ret(); ``` Again from the point of view of the code buffer this looks as follows. When the jump instructions are emitted (line 6, 8), the _displacement_ can not be computed, as the label has no defined location yet. A placeholder displacement of value 0 is emitted into the code buffer and the offset to the displacement value is recorded by the label as pending _relocation_. Once the label is bound to a location, all pending relocations are resolved by consuming the recorded offsets, computing the corresponding displacements and patching the values in the code buffer.
## Example: bf The [bf.rs][jb-bf] example implements a _jit compiler_ for the [brainfuck][bf] language. In this example the _bf program_ is jit compiled to one code blob as a whole and evaluated then after. The example only returns from the jit code if the bf program terminates or a runtime error is detected. For reference an _interpreter_ is also provided. ## Example: tiny-vm The [tiny_vm.rs][jb-tiny] example defines a _tiny virtual machine (TinyVm)_ with a set of _registers_ and custom _instructions_. It then goes on and implements a _jit compiler_ for the TinyVm. In comparison to the jit compiler in the previous bf example, the TinyVm compiler jit compiles the program per basic-block (bb), each time a new bb is encountered. \ After executing a jit compiled basic block, the code returns into the TinyVm. The vm then checks if the branch target has an entry in the _jit cache_, which maps guest addresses to compiled basic-blocks. If the cache returns a hit the TinyVm jumps to the generated code, if the cache returns a miss the new bb is compiled and inserted into the jit cache. Similar to the bf example, this example also provides an _interpreter_ as reference. ## Appendix: x86 instruction encoding x86 instructions are encoded with _variable length_ and follow the format below. A single instruction can be as small as 1 byte up to a double-digit number of bytes. > Figure taken from [Intel® 64 and IA-32 Architectures Software Developer’s > Manual - Volume 2 Instruction Set Reference][intel-im]. _Prefixes_ are typically modifiers for the instruction, where some examples are `LOCK` for bus locking, `REP` for instruction repetition and `REX` for specifying extended operand sizes and to access extended registers (like r8-r15). The `ModR/M` and `SIB` bytes are used to encode the register and memory operands of the instruction in the general case. Whether the `SIB` byte is used depends on the _addressing mode_. In case the instruction also requires an _address displacement_ or an _immediate value_, the bytes of the value directly follow the instruction bytes. Historically, the x86 ISA only had 8 general-purpose registers (ax, cx, dx, bx, sp, bp, si, di). This is still visible today in various fields which encode register operands, where one such example is the **ModRM.Reg** field. Later on, 8 additional general-purpose registers (r8-r15) were added and one additional bit was needed to encode all 16 registers, while keeping backwards compatibility. \ As mentioned above, the **REX** prefix is used to access the _extended_ registers. This prefix contains the **R, X, B** bits, which are combined with the various register operand fields, as shown by the figures below for the case of the ModRM byte and the ModRM + SIB bytes.
> Figures taken from [Intel® 64 and IA-32 Architectures Software Developer’s > Manual - Volume 2 Instruction Set Reference][intel-im]. The following code listing provides an example for two load instructions with different _addressing modes_ and gives a detailed breakdown of the instruction bytes for each instruction. ```rust let mut asm = Asm::new(); asm.mov(r8, Mem64::indirect_disp(r9, 0xaabb)); asm.mov(r8, Mem64::indirect_base_index(r9, r10)); asm.disasm(); // Outputs: // 4D8B81BBAA0000 mov r8,[r9+0xaabb] // 4F8B0411 mov r8,[r9+r10] // Breakdown of the indirect + displacement load. // // 4D 8B 81 BBAA0000 mov r8,[r9+0xaabb] // ^ ^ ^ `Imm32 // | | `ModRM: mod=10,reg=000,rm=001 // | `Opc: mov // `REX: W=1,R=1,X=0,B=1 // // REX.R + ModRM.reg = 1000 (r8) - dest reg // REX.B + ModRm.rm = 1001 (r9) - base reg // Breakdown of the base + index load. // // 4F 8B 04 11 mov r8,[r9+r10] // ^ ^ ^ `SIB: scale=00,index=010,base=001 // | | `ModRM: mod=00,reg=000,rm=100 // | `Opc: mov // `REX: W=1,R=1,X=1,B=1 // // ModRM.rm = 100 -> use SIB byte // // REX.R + ModRM.reg = 1000 ( r8) - dest reg // REX.X + SIB.index = 1010 (r10) - index reg // REX.B + SIB.base = 1001 ( r9) - base reg ``` ## References - [juicebox-asm][jb] - [osdev: x86 instruction encoding](https://wiki.osdev.org/X86-64_Instruction_Encoding) - [intel instruction set reference][intel-im] ## Footnotes **1)** In cpp functions with the _same_ name but _different_ arguments can be defined. During compilation [overload resolution][cpp-or] takes place and picks the "right" function to call on each call site. [jb]: https://github.com/johannst/juicebox-asm [jb-asm]: https://johannst.github.io/juicebox-asm/juicebox_asm/struct.Asm.html [jb-bf]: https://github.com/johannst/juicebox-asm/blob/main/examples/bf.rs [jb-tiny]: https://github.com/johannst/juicebox-asm/blob/main/examples/tiny_vm.rs [mmap]: https://www.man7.org/linux/man-pages/man2/mmap.2.html [intel-im]: https://cdrdv2.intel.com/v1/dl/getContent/671110 [cpp-or]: https://en.cppreference.com/w/cpp/language/overload_resolution [sysv]: https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build [bf]: https://brainfuck.org/brainfuck.html