content/2023-01-14-xpost-matcha-threads/index.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91

+++
title = "xpost: Cooperative-multitasking studies (matcha-threads)"

[taxonomies]
tags = ["threading", "linux", "x86", "arm", "riscv"]
+++

This is a cross post to a **cooperative-multitasking implementation** that I
did in the past and hosted on github [>> matcha-threads <<][matcha].

Cooperative-multitasking allows to perform the thread scheduling in user space.
Executing threads need to give the control back to the `scheduler` such that
other threads can run. Since control is returned explicitly to the scheduler,
threads need to **"cooperate"**.

The following code snippet shows an example of two such threads:
```cpp
#include "lib/executor.h"
#include "lib/thread.h"
#include <cstdio>

void like_tea(nMatcha::Yielder& y) {
    std::puts("like");
    y.yield();
    std::puts("tea");
}

int main() {
    nMatcha::Executor e;
    e.spawn(nMatcha::FnThread::make(like_tea));
    e.spawn(nMatcha::FnThread::make([](nMatcha::Yielder& y) {
        std::puts("I");
        y.yield();
        std::puts("green");
    }));
    e.run();
    return 0;
}
```

Which gives the following output when being run:
```txt
I
like
green
tea
```

The main focus of that project was to understand the fundamental mechanism
underlying cooperative-multitasking and implement such a `yield()` function as
shown in the example above.

Looking at the final implementation, the yield function does the following:
```txt
yield:
  1. function prologue
  2. push callee-saved regs to current stack
  3. swap stack pointers (current - new)
  4. pop callee-saved regs from new stack
  5. function epilogue & return
```

Implementations for different ISAs are available here:
- [x86_64][yield-x86]
- [arm64][yield-arm64]
- [armv7a][yield-arm]
- [riscv64][yield-rv64]

## Appendix: os-level vs user-level threading

The figure below depicts *os-level* threading (left) vs *user-level* threading
(right).

The main difference is that in the case of user-level threading, the operating
system (os) does not now anything about the user threads. In the concrete
example, only a **single** user thread can run at any given time, whereas in
the case of os-level threading, all threads can run truly parallel.

When a user-level thread is scheduled, the *stack-pointer* (sp) of the os
thread is adjusted to the user threads' stack. For the example below, if the
user thread **A** is scheduled (yielded to), the stack-pointer for the os
thread **S** is switched to the **stack A**. Once the user thread yields back
to the scheduler, the stack-pointer is switched back to **stack S**.

<img src="os-vs-user-threads.svg">

[matcha]: https://github.com/johannst/matcha-threads
[yield-x86]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/x86_64/yield.s
[yield-arm]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/arm/yield.s
[yield-arm64]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/arm64/yield.s
[yield-rv64]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/riscv64/yield.s