1 files changed, 335 insertions, 0 deletions
diff --git a/content/2024-12-20-xpost-juicebox-asm/index.md b/content/2024-12-20-xpost-juicebox-asm/index.md
new file mode 100644
index 0000000..d8796bd
--- /dev/null
+++ b/content/2024-12-20-xpost-juicebox-asm/index.md
@@ -0,0 +1,335 @@
++++
+title = "xpost: x64 jit assembler (juicebox-asm)"
+
+[taxonomies]
+tags = ["x86", "rust", "linux"]
++++
+
+This is a cross post to a repository implementing an `x64` _jit assembler_,
+which I used to study the x86 instruction encoding and how to write such a tool
+in rust. The code is available on github [>> juicebox-asm <<][jb].
+
+The _assembler_ only implements a small set of the x86 instruction set, but
+enough to create some interesting examples. Additionally, the repository
+provides a simple _runtime_ based on [mmap(2)][mmap] (only on linux), which is
+used to execute the dynamically assembled code and provided examples.
+
+## Assembler
+The _assembler_ implements all the supported instructions and provides a similar
+interface as when writing assembly code by hand.
+
+```rust
+let mut asm = Asm::new();
+asm.xor(rax, rax);
+asm.add(rax, rdi);
+asm.add(rax, Mem64::indirect_base_index(rdi, rsi));
+asm.ret();
+
+asm.disasm();
+// Outputs:
+//   00000000  4831C0    xor rax,rax
+//   00000003  4801F8    add rax,rdi
+//   00000006  48030437  add rax,[rdi+rsi]
+//   00000006  C3        ret
+```
+
+An interesting design-choice is given by the implementation of the assembler API
+for the various instructions. Since x86 is a _CISC_ instruction set, it allows
+same the mnemonic to be used with different types and permutations of
+operands. This can be seen in the code listing above where the _add_ instruction
+is used with both `register-register` and `register-memory` operands. It becomes
+even clearer when looking into the instruction reference manual, in the example
+of the _add_ instruction.
+
+<img src="x86-add.png">
+
+> Table taken from [Intel® 64 and IA-32 Architectures Software Developer’s
+> Manual - Volume 2 Instruction Set Reference][intel-im].
+
+Since rust does not have function overloading
+<sup id="sup-1-src"><a href="#sup-1-dst">1</a></sup>
+as in cpp, one does need to define functions with different function names to
+implement the _add_ instruction for different combinations of operands.
+```rust
+// Pseudo code.
+fn add_r64_r64(Reg64, Reg64)
+fn add_r64_m64(Reg64, Mem64)
+// ..
+```
+
+However, in this experiment, I did not go with that approach. Instead I used a
+different solution and defined generic traits for the various instructions which
+have different operand combinations. The _Add_ trait below gives an example of
+that.
+
+```rust
+pub trait Add<T, U> {
+    fn add(&mut self, op1: T, op2: U);
+}
+```
+
+This in turn allows to implement the traits for all the different combinations
+of operands that should be supported by each instruction. It gives fine control
+of what to support and provides an ergonomic API as seen in the first example
+code listing.
+
+```rust
+impl Add<Reg64, Reg64> for Asm {
+    fn add(&mut self, op1: Reg64, op2: Reg64) {
+        self.encode_rr(&[0x01], op1, op2);
+    }
+}
+
+impl Add<Reg64, Mem64> for Asm {
+    fn add(&mut self, op1: Reg64, op2: Mem64) {
+        self.encode_rm(0x03, op1, op2);
+    }
+}
+```
+
+All instructions currently implemented with their variations can be found in the
+documentation of the [Asm][jb-asm] _assembler_.
+
+## Runtime
+With the help of the provided runtime, code assembled with the _assembler_ can
+be put into executable memory and turned into a function pointer, allowing to
+call into the generated code.
+
+```rust
+extern "C" fn add(a: u32, b: u32) -> u32 {
+    a + b
+}
+
+fn main() {
+    // SystemV abi:
+    //   rdi -> first argument
+    //   rsi -> second argument
+    //   rax -> return value
+
+    let mut asm = Asm::new();
+    asm.mov(rsi, Imm64::from(42));
+    asm.mov(rax, Imm64::from(add as usize));
+    asm.call(rax);
+    asm.ret();
+
+    let mut rt = Runtime::new();
+    let add42 = unsafe { rt.add_code::<extern "C" fn(u32) -> u32>(asm.into_code()) };
+
+    let res = add42(5);
+    assert_eq!(res, 47);
+}
+```
+> This example must be run on a system with the [SystemV][sysv] ABI.
+
+## Label
+A label is used to associate a location in the generated code with a symbolic
+name (the label), which can then be used as branch target for control-flow
+instructions as for example `jz`, `jnz` and so on.
+
+The act of associating a label with a location is called _binding_. Each label
+can only be bound **once**, else the branch target would become _ambiguous_.
+
+Binding and jumping to a label can come in two orders.
+1. A label is first bound and then jumped to (backwards-reference).
+1. A label is first jumped to and then bound (forwards-reference).
+
+In the following both cases will be discussed in a bit more detail, which
+hopefully helps when browsing the implementation. One thing to keep in mind,
+since the generated code must be relocatable, all jump instructions will be
+encoded relative to the current program counter (pc).
+
+The first case where the label is first bound and later used, is somewhat more
+natural to think about, as first defining then using something is pretty
+clear. The code listing below shows an example of a loop with a back-reference
+summing up all integer between 0 and 42.
+
+```rust,linenos
+let mut asm = Asm::new();
+let mut lp  = Label::new();
+
+asm.mov(eax, Imm32::from(0));   // int ret = 0;
+asm.mov(ecx, Imm32::from(42));  // int n   = 42;
+
+asm.bind(&mut lp);              // lp:
+asm.add(eax, ecx);              //   ret += n;
+asm.dec(ecx);                   //   --n;
+asm.test(ecx, ecx);             //   if (n != 0)
+asm.jnz(&mut lp);               //     goto lp
+
+asm.ret();
+```
+
+From the point of view of the code buffer this looks as follows. When the label
+is bound (line 7), the current offset into the code buffer is recorded as the
+labels _location_. Then when emitting the jump instruction (line 11) later, this
+location is used with the current code buffers offset to compute the
+_displacement_ (disp32) forming the relative jump.
+
+<div style="margin:auto; width:85%;">
+<img src="label-lj.svg">
+</div>
+
+The second case where the label is first used and then bound later, is not
+really more complex, but one has to reverse the steps. The code listing below
+gives an example, demonstrating forwards jumps by validating two input
+arguments to be not equal to 0 and returning of one check fails.
+
+```rust,linenos
+let mut asm = Asm::new();
+let mut ret = Label::new();
+
+// Validate inputs (sysv abi).
+asm.test(rdi, rdi);     // if (rdi == 0)
+asm.jz(&mut ret);       //   return;
+asm.test(rsi, rsi);     // if (rsi == 0)
+asm.jz(&mut ret);       //   return;
+
+// Here, rdi != 0 && rsi != 0.
+// nop as placeholder for some real work.
+asm.nop();
+
+asm.bind(&mut ret);
+asm.ret();
+```
+
+Again from the point of view of the code buffer this looks as follows. When the
+jump instructions are emitted (line 6, 8), the _displacement_ can not be
+computed, as the label has no defined location yet. A placeholder displacement
+of value 0 is emitted into the code buffer and the offset to the displacement
+value is recorded by the label as pending _relocation_. Once the label is bound
+to a location, all pending relocations are resolved by consuming the recorded
+offsets, computing the corresponding displacements and patching the values in
+the code buffer.
+
+<div style="margin:auto; width:85%;">
+<img src="label-jl.svg">
+</div>
+
+## Example: bf
+
+The [bf.rs][jb-bf] example implements a _jit compiler_ for the [brainfuck][bf]
+language. In this example the _bf program_ is jit compiled to one code blob as a
+whole and evaluated then after. The example only returns from the jit code if
+the bf program terminates or a runtime error is detected. For reference an
+_interpreter_ is also provided.
+
+## Example: tiny-vm
+
+The [tiny_vm.rs][jb-tiny] example defines a _tiny virtual machine (TinyVm)_ with
+a set of _registers_ and custom _instructions_. It then goes on and implements a
+_jit compiler_ for the TinyVm. In comparison to the jit compiler in the previous
+bf example, the TinyVm compiler jit compiles the program per basic-block (bb),
+each time a new bb is encountered. \
+After executing a jit compiled basic block, the code returns into the
+TinyVm. The vm then checks if the branch target has an entry in the _jit cache_,
+which maps guest addresses to compiled basic-blocks. If the cache returns a hit
+the TinyVm jumps to the generated code, if the cache returns a miss the new bb is
+compiled and inserted into the jit cache.
+
+Similar to the bf example, this example also provides an _interpreter_ as
+reference.
+
+## Appendix: x86 instruction encoding
+x86 instructions are encoded with _variable length_ and follow the format below.
+A single instruction can be as small as 1 byte up to a double-digit number of
+bytes.
+
+<img src="x86-insn-fmt.png">
+
+> Figure taken from [Intel® 64 and IA-32 Architectures Software Developer’s
+> Manual - Volume 2 Instruction Set Reference][intel-im].
+
+_Prefixes_ are typically modifiers for the instruction, where some examples are
+`LOCK` for bus locking, `REP` for instruction repetition and `REX` for
+specifying extended operand sizes and to access extended registers (like
+r8-r15).
+
+The `ModR/M` and `SIB` bytes are used to encode the register and memory operands
+of the instruction in the general case. Whether the `SIB` byte is used depends
+on the _addressing mode_.
+
+In case the instruction also requires an _address displacement_ or an _immediate
+value_, the bytes of the value directly follow the instruction bytes.
+
+Historically, the x86 ISA only had 8 general-purpose registers (ax, cx, dx, bx,
+sp, bp, si, di). This is still visible today in various fields which encode
+register operands, where one such example is the **ModRM.Reg** field. Later
+on, 8 additional general-purpose registers (r8-r15) were added and one
+additional bit was needed to encode all 16 registers, while keeping backwards
+compatibility. \
+As mentioned above, the **REX** prefix is used to access the _extended_
+registers. This prefix contains the **R, X, B** bits, which are combined with
+the various register operand fields, as shown by the figures below for the case
+of the ModRM byte and the ModRM + SIB bytes.
+
+<div style="margin:auto; width:85%;">
+<img src="x86-insn-modrm.png">
+<img src="x86-insn-modrm-sib.png">
+</div>
+
+> Figures taken from [Intel® 64 and IA-32 Architectures Software Developer’s
+> Manual - Volume 2 Instruction Set Reference][intel-im].
+
+The following code listing provides an example for two load instructions with
+different _addressing modes_ and gives a detailed breakdown of the instruction
+bytes for each instruction.
+
+```rust
+let mut asm = Asm::new();
+asm.mov(r8, Mem64::indirect_disp(r9, 0xaabb));
+asm.mov(r8, Mem64::indirect_base_index(r9, r10));
+
+asm.disasm();
+// Outputs:
+//   4D8B81BBAA0000    mov r8,[r9+0xaabb]
+//   4F8B0411          mov r8,[r9+r10]
+
+
+// Breakdown of the indirect + displacement load.
+//
+//   4D 8B 81 BBAA0000    mov r8,[r9+0xaabb]
+//   ^  ^  ^  `Imm32
+//   |  |  `ModRM: mod=10,reg=000,rm=001
+//   |  `Opc: mov
+//   `REX: W=1,R=1,X=0,B=1
+//
+//   REX.R + ModRM.reg = 1000 (r8) - dest reg
+//   REX.B + ModRm.rm  = 1001 (r9) - base reg
+
+
+// Breakdown of the base + index load.
+//
+//   4F 8B 04 11          mov r8,[r9+r10]
+//   ^  ^  ^  `SIB: scale=00,index=010,base=001
+//   |  |  `ModRM: mod=00,reg=000,rm=100
+//   |  `Opc: mov
+//   `REX: W=1,R=1,X=1,B=1
+//
+//  ModRM.rm = 100 -> use SIB byte
+//
+//   REX.R + ModRM.reg = 1000 ( r8) - dest reg
+//   REX.X + SIB.index = 1010 (r10) - index reg
+//   REX.B + SIB.base  = 1001 ( r9) - base reg
+```
+
+## References
+- [juicebox-asm][jb]
+- [osdev: x86 instruction encoding](https://wiki.osdev.org/X86-64_Instruction_Encoding)
+- [intel instruction set reference][intel-im]
+
+## Footnotes
+<span id="sup-1-dst">**1)**</span>
+In cpp functions with the _same_ name but _different_ arguments can be
+defined. During compilation [overload resolution][cpp-or] takes place and picks
+the "right" function to call on each call site.
+<a href="#sup-1-src">&#x21A9;</a>
+
+[jb]: https://github.com/johannst/juicebox-asm
+[jb-asm]: https://johannst.github.io/juicebox-asm/juicebox_asm/struct.Asm.html
+[jb-bf]: https://github.com/johannst/juicebox-asm/blob/main/examples/bf.rs
+[jb-tiny]: https://github.com/johannst/juicebox-asm/blob/main/examples/tiny_vm.rs
+[mmap]: https://www.man7.org/linux/man-pages/man2/mmap.2.html
+[intel-im]: https://cdrdv2.intel.com/v1/dl/getContent/671110
+[cpp-or]: https://en.cppreference.com/w/cpp/language/overload_resolution
+[sysv]: https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
+[bf]: https://brainfuck.org/brainfuck.html