1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
|
+++
title = "xpost: Cooperative-multitasking studies (matcha-threads)"
[taxonomies]
tags = ["threading", "linux", "x86", "arm", "riscv"]
+++
This is a cross post to a **cooperative-multitasking implementation** that I
did in the past and hosted on github [>> matcha-threads <<][matcha].
Cooperative-multitasking allows to perform the thread scheduling in user space.
Executing threads need to give the control back to the `scheduler` such that
other threads can run. Since control is returned explicitly to the scheduler,
threads need to **"cooperate"**.
The following code snippet shows an example of two such threads:
```cpp
#include "lib/executor.h"
#include "lib/thread.h"
#include <cstdio>
void like_tea(nMatcha::Yielder& y) {
std::puts("like");
y.yield();
std::puts("tea");
}
int main() {
nMatcha::Executor e;
e.spawn(nMatcha::FnThread::make(like_tea));
e.spawn(nMatcha::FnThread::make([](nMatcha::Yielder& y) {
std::puts("I");
y.yield();
std::puts("green");
}));
e.run();
return 0;
}
```
Which gives the following output when being run:
```txt
I
like
green
tea
```
The main focus of that project was to understand the fundamental mechanism
underlying cooperative-multitasking and implement such a `yield()` function as
shown in the example above.
Looking at the final implementation, the yield function does the following:
```txt
yield:
1. function prologue
2. push callee-saved regs to current stack
3. swap stack pointers (current - new)
4. pop callee-saved regs from new stack
5. function epilogue & return
```
Implementations for different ISAs are available here:
- [x86_64][yield-x86]
- [arm64][yield-arm64]
- [armv7a][yield-arm]
- [riscv64][yield-rv64]
## Appendix: os-level vs user-level threading
The figure below depicts *os-level* threading (left) vs *user-level* threading
(right).
The main difference is that in the case of user-level threading, the operating
system (os) does not now anything about the user threads. In the concrete
example, only a **single** user thread can run at any given time, whereas in
the case of os-level threading, all threads can run truly parallel.
When a user-level thread is scheduled, the *stack-pointer* (sp) of the os
thread is adjusted to the user threads' stack. For the example below, if the
user thread **A** is scheduled (yielded to), the stack-pointer for the os
thread **S** is switched to the **stack A**. Once the user thread yields back
to the scheduler, the stack-pointer is switched back to **stack S**.
<img src="os-vs-user-threads.svg">
[matcha]: https://github.com/johannst/matcha-threads
[yield-x86]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/x86_64/yield.s
[yield-arm]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/arm/yield.s
[yield-arm64]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/arm64/yield.s
[yield-rv64]: https://github.com/johannst/matcha-threads/blob/master/lib/arch/riscv64/yield.s
|