# 2021q1 Homework6 (quiz6)
contributed by < [`WayneLin1992`](https://github.com/WayneLin1992) >
###### tags: `linux2021`
## 測驗`1`
### 延伸問題
- [ ] 解釋上述程式碼運作原理,指出設計和實作的缺失並改進
- [ ] 研讀 [sysprog21/bignum](https://github.com/sysprog21/bignum) 提供一套更高效率的 [arbitrary-precision arithmetic](https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic) 實作
- [ ] 並改寫上述程式碼,以允許階乘運算的十進位輸出
- [ ] [Karatsuba algorithm](https://en.wikipedia.org/wiki/Karatsuba_algorithm) 和 [Schönhage–Strassen algorithm](https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm) 這類的快速乘法演算法相繼提出,請引入到上述程式碼
- [ ] 在 Linux 核心原始程式碼中,找出大數運算的案例
### 解釋運作原理
大數運算,這邊將會以16進位表示,結果為 100! 的結果
#### macro and sturct
```cpp
/* how large the underlying array size should be */
#define UNIT_SIZE 4
/* These are dedicated to UNIT_SIZE == 4 */
#define UTYPE uint32_t
#define UTYPE_TMP uint64_t
#define FORMAT_STRING "%.08x"
#define MAX_VAL ((UTYPE_TMP) 0xFFFFFFFF)
#define BN_ARRAY_SIZE (128 / UNIT_SIZE) /* size of big-numbers in bytes */
struct bn { UTYPE array[BN_ARRAY_SIZE]; };
```
```graphviz
digraph fbstring {
node [shape=record];
foo [label="<f6>0|0|0|0|0|0|0|0|0|0|...|0|<f5>0|<f4>0|<f3>0|<f1>0|<f0>0"];
bn -> foo:f6
fo1 [label="32 bits"]
foo:f6 -> fo1
}
```
UTYPE 為 uint_32 ,其中 BN_ARRAY_SIZE 為 128/4=32
#### `bn_init`
```cpp
static inline void bn_init(struct bn *n) {
for (int i = 0; i < BN_ARRAY_SIZE; ++i)
n->array[i] = 0;
}
```
初始化 bn array。
#### `bn_from_int`
```cpp
static inline void bn_from_int(struct bn *n, UTYPE_TMP i) {
bn_init(n);
/* FIXME: what if machine is not little-endian? */
n->array[0] = i;
/* bit-shift with U64 operands to force 64-bit results */
UTYPE_TMP tmp = i >> 32;
n->array[1] = tmp;
}
```
UTYPE_TMP 為 64 bits ,本來的 i 為 32 bit ,當 i 超過 uint_32 能表達的時就會存入 `n->array[1]` 表達,避免 overflow 的狀況,假如:他是 8 進位,就只能解讀其中的24位因為其他會 overflow 的狀況,所以必須存入下一格在進行解讀。
:::info
big-endian 的修正
```cpp
static inline void bn_from_int(struct bn *n, UTYPE_TMP i) {
bn_init(n);
if(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__){
/* FIXME: what if machine is not little-endian? */
n->array[0] = i;
/* bit-shift with U64 operands to force 64-bit results */
UTYPE_TMP tmp = i >> 32;
n->array[1] = tmp;
}else{
n->array[1] = i;
UTYPE_TMP tmp = i >> 32;
n->array[0] = tmp;
}
}
```
:::
#### `bn_to_str`
```cpp
static void bn_to_str(struct bn *n, char *str, int nbytes) {
/* index into array - reading "MSB" first -> big-endian */
int j = BN_ARRAY_SIZE - 1;
int i = 0; /* index into string representation */
/* reading last array-element "MSB" first -> big endian */
while ((j >= 0) && (nbytes > (i + 1))) {
sprintf(&str[i], FORMAT_STRING, n->array[j]);
i += (2 *
UNIT_SIZE); /* step UNIT_SIZE hex-byte(s) forward in the string */
j -= 1; /* step one element back in the array */
}
/* Count leading zeros: */
for (j = 0; str[j] == '0'; j++)
;
/* Move string j places ahead, effectively skipping leading zeros */
for (i = 0; i < (nbytes - j); ++i)
str[i] = str[i + j];
str[i] = 0;
}
```
big-endian:最大的數會存放在最低的記憶體位置,如 TCP/IP
little-endian:最小的數字存放在最低的記憶體位置,MSB 將會存進array 較大的位置,使用 little-endian 如 x86 risc-v。
:::info
可以藉由 `__BYTE_ORDER` 來判斷硬體為 big-endian 還是 little-endian。
`#define ORDER (__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)`
參考資料:
[GCC macro](https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html)
:::
```cpp
while ((j >= 0) && (nbytes > (i + 1))) {
sprintf(&str[i], FORMAT_STRING, n->array[j]);
i += (2 *
UNIT_SIZE); /* step UNIT_SIZE hex-byte(s) forward in the string */
j -= 1; /* step one element back in the array */
}
```
sprintf 將 n->array[j] 的內容轉至 char 的形式並存入 &str[i] 當中, j 會從 31開始,i 會以 8 為單位
`for (j = 0; str[j] == '0'; j++)` 計算 0 的數量
```cpp
for (i = 0; i < (nbytes - j); ++i)
str[i] = str[i + j];
str[i] = 0;
```
將其 shift 把0消除,並在最後放入 null terminator 。
#### `bn_dec`
```cpp
/* Decrement: subtract 1 from n */
static void bn_dec(struct bn *n) {
for (int i = 0; i < BN_ARRAY_SIZE; ++i) {
UTYPE tmp = n->array[i];
UTYPE res = tmp - 1;
n->array[i] = res;
if (!(res > tmp)) break; //COND
}
}
```
因為 unint 所以當負值時,就會變大數
#### `bn_add`
```cpp
static void bn_add(struct bn *a, struct bn *b, struct bn *c) {
int carry = 0;
for (int i = 0; i < BN_ARRAY_SIZE; ++i) {
UTYPE_TMP tmp = (UTYPE_TMP) a->array[i] + b->array[i] + carry;
carry = (tmp > MAX_VAL);
c->array[i] = (tmp & MAX_VAL);
}
}
```
`MAX_VAL = 0xFFFFFFFF` 以2進位表示為32個1 & 後都會使 tmp 寫入 c 當中。
#### `lshift_unit`
```cpp
static inline void lshift_unit(struct bn *a, int n_units) {
int i;
/* Shift whole units */
for (i = (BN_ARRAY_SIZE - 1); i >= n_units; --i)
a->array[i] = a->array[i - n_units];
/* Zero pad shifted units */
for (; i >= 0; --i)
a->array[i] = 0;
}
```
將64位元存成32位元,並使用 0 padding
#### `bn_mul`
```cpp
static void bn_mul(struct bn *a, struct bn *b, struct bn *c) {
struct bn row, tmp;
bn_init(c);
for (int i = 0; i < BN_ARRAY_SIZE; ++i) {
bn_init(&row);
for (int j = 0; j < BN_ARRAY_SIZE; ++j) {
if (i + j < BN_ARRAY_SIZE) {
bn_init(&tmp);
UTYPE_TMP intermediate = a->array[i] * (UTYPE_TMP) b->array[j]; //III
bn_from_int(&tmp, intermediate);
lshift_unit(&tmp, i + j);
bn_add(&tmp, &row, &row);
}
}
bn_add(c, &row, c);
}
}
```
`UTYPE_TMP intermediate = a->array[i] * (UTYPE_TMP) b->array[j];` intermediate 32位元 轉乘32位元最大會變成 64位元,但最後又要以32位元存取,最後會寫入 c 當中。
#### `factorial`
```cpp
static void factorial(struct bn *n, struct bn *res) {
struct bn tmp;
bn_assign(&tmp, n);
bn_dec(n);
while (!bn_is_zero(n)) {
bn_mul(&tmp, n, res); /* res = tmp * n */
bn_dec(n); /* n -= 1 */
bn_assign(&tmp, res); /* tmp = res */
}
bn_assign(res, &tmp);
}
```
一開始`tmp->array[0] = 100` `n->array[0] = 99` 透過 `bn_mul` 相乘,之後將 `n -= 1` ,再將 `bn_assign` 將 `tmp = res` 執行下一個迴圈。
### 需要改進的地方
sprintf 中 FORMAT_STRING 是存成16進位字串的關鍵,由定義知道 `#define FORMAT_STRING "%.08x"`
:::info
```cpp
#include <stdio.h>
#define FORMAT_STRING "%012llx"
int main() {
int data = 29;
printf("%x\n", data); // just print data
printf("%0x\n", data); // just print data ('0' on its own has no effect)
printf("%8x\n", data); // print in 8 width and pad with blank spaces
printf("%08x\n", data); // print in 8 width and pad with 0's
printf(FORMAT_STRING, data)
return 0;
}
```
>1d
1d
xxxxxx1d
0000001d
00000000001d
資料來源
[c the x format specifier](https://stackoverflow.com/questions/15108932/c-the-x-format-specifier)
:::
所以如果要改變為 10進位就要增加定義 `#define FORMAT_DEC "%.08d"` 要注意 int 為有號所以要使用 `#define FORMAT_DEC "%.08u"`
>696025217442369334856850671927523212172697863944854924868418374238231163332824851760301512674002087665783554000000000000000000000000
python
>93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
`#define FORMAT_OCT "%.08o"` 來表示為8進位
>1546022623541334110035065245624403213146737153531033222416431337040341052560622410143263556273564260147153430000000000000000000000000
python
>6630226235416256702200645224325227366440321335131567632341654415533122416431034676100710564252703145224101433545473345664567213005650471534303200000000000000000000000000000000
發現到結果都不正確,猜測是在 overflow 方面出了錯誤導致無法顯示出正確的數值或是讀取的方式部正確導致
gdb 觀察記憶體所存取的內容

可以看出來 result 的結果與 `%u` 輸出的結果相同,檢視代入 FORMAT_STRING 的過程。
由 6960 以十進位輸入,在 hex 表示為 1b30 ,因此不只讀取方式要改變,連存取的資料也要對應的改變才行,由 `bn_mul` 才是真正存取 result 的結果,關於 overflow 就需要修改 0xFFFFFFFF 為16位元的上限,要修改成10位元上限,如: 99999999 等...... 但因為十進位不是2的冪,所以先以8進位進行嘗試。
設上限設為 8進位 `MAX_VAL` = 8進位 77777777 = 16進位 0xFFFFFF 對應到16位元就是他只能對應到6個 F ,所以為 >>24 bits 避免產生 overflow 產生 overflow 使用下一格來解讀,但還是有問題,所以簡化問題,先處理較低階成的問題,可以觀察到,在沒 overflow 時都正確,觀察期中 12! = 3443176000 但結果為 3243176000 可以發現到 3176000 是正確的,代表 overflow 部分,其中 2 和 4 之間的差距,來至 bit-shift 因為在2進位中只差一位而已,所以改成 >> 25 就得到解,但是在 13! = 56312146000 但程式的解 37312146000,經過課堂上的講解後,0xFFFFFFFF = 4,294,967,295 已 7 為表示,0x7777777777 = 1,073,741,823 共十個7,
## 測驗`2`
### 延伸問題
- [ ] 解釋程式碼運作原理
- [ ] 研讀 Linux 核心原始程式碼 [include/linux/hashtable.h](https://github.com/torvalds/linux/blob/master/include/linux/hashtable.h) 及對應的文件 [How does the kernel implements Hashtables?](https://kernelnewbies.org/FAQ/Hashtables)
- [ ] 解釋 hash table 的設計和實作手法,並留意到 [tools/include/linux/hash.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/hash.h) 的 `GOLDEN_RATIO_PRIME`,探討其實作考量
### 解釋運作原理
##### `hlist_xx`
```cpp
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };
typedef struct { int bits; struct hlist_head *ht; } map_t;
```
與 `list_head` 結構相當類似,但差別在 `**pprev` 的部分,之後配合 `map_add` 一起觀察。
##### `map_init`
```cpp
#define MAP_HASH_SIZE(bits) (1 << bits)
map_t *map_init(int bits) {
map_t *map = malloc(sizeof(map_t));
if (!map)
return NULL;
map->bits = bits;
map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits));
if (map->ht) {
for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++)
(map->ht)[i].first = NULL;
} else {
free(map);
map = NULL;
}
return map;
}
```
初始化 map ,而 map 大小由 `MAP_HASH_SIZE` 決定
#### `hash_key`
```cpp
struct hash_key {
int key;
void *data;
struct hlist_node node;
};
```
```graphviz
digraph hash_list {
node [shape=record];
rankdir=LR;
subgraph cluster_queue {
label = "hash_key"
hash_key [label = "{{key|data|{pprev|next}}}"]
}
}
```
##### `container_of`
```cpp
#define container_of(ptr, type, member) \
({ \
void *__mptr = (void *) (ptr); \
((type *) (__mptr - offsetof(type, member))); \
})
```
container_of 可以知道 struct 起始位置
:::info
[quiz2](https://hackmd.io/@sysprog/linux2021-quiz2)
:::
#### `hash`
```cpp
#define GOLDEN_RATIO_32 0x61C88647
static inline unsigned int hash(unsigned int val, unsigned int bits) {
/* High bits are more random, so use them. */
return (val * GOLDEN_RATIO_32) >> (32 - bits);
}
```
hash table 需要注意到此 hash table 對於 collision 處理及 hash function 是否足夠均勻,可以看出這 hash table 處裡 collision 的方式為使用 linked list 使一個 hash 值能存取超過一個資料,再來看一下 hash function。
由 `twosum` 了解到 `map_init(10)` 所以 bits 設為 10
0x61C88647 = 1,640,531,527
#### `find_key`
```cpp
static struct hash_key *find_key(map_t *map, int key) {
struct hlist_head *head = &(map->ht)[hash(key, map->bits)];
for (struct hlist_node *p = head->first; p; p = p->next) {
struct hash_key *kn = container_of(p, struct hash_key, node);
if (kn->key == key)
return kn;
}
return NULL;
}
```
先算出他可能所在的 index 之後將使用 `container_of` 得到 hash_key 的位置並比較 key 的值,假如不一樣就尋找下一個,依序搜尋,沒找到 `return NULL`
#### `map_get`
```cpp
void *map_get(map_t *map, int key)
{
struct hash_key *kn = find_key(map, key);
return kn ? kn->data : NULL;
}
```
從 map 中尋找出 key ,由 `find_key` 來包裝,並 `return data`
#### `map_add`
```cpp
void map_add(map_t *map, int key, void *data)
{
struct hash_key *kn = find_key(map, key);
if (kn)
return;
kn = malloc(sizeof(struct hash_key));
kn->key = key, kn->data = data;
struct hlist_head *h = &map->ht[hash(key, map->bits)];
struct hlist_node *n = &kn->node, *first = h->first;
n->next = first;//NNN;
if (first)
first->pprev = &n->next;
h->first = n;
n->pprev = &h->first;//PPP
}
```
```graphviz
digraph hash_list {
node [shape=record];
rankdir=LR;
map [label = "map|<f0>0|<f1>1|..."]
hash_key1 [label = "n|{<f3>pprev|<f4>next}"]
hash_key2 [label = "first|{<f3>pprev|<f4>next}"]
hash_key1->hash_key2
hash_key2:f3->hash_key1:f4
map:f1->hash_key1
hash_key1:f3->map:f1
}
```
n 為新的節點,first 為本來 map 所連到的節點,先將 n next 指向 first, first pprev 指向 n ,然後將 n pprev 指向 map 當中
#### `map_deinit`
```cpp
void map_deinit(map_t *map)
{
if (!map)
return;
for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) {
struct hlist_head *head = &map->ht[i];
for (struct hlist_node *p = head->first; p;) {
struct hash_key *kn = container_of(p, struct hash_key, node);
struct hlist_node *n = p;
p = p->next;
if (!n->pprev) /* unhashed */
goto bail;
struct hlist_node *next = n->next, **pprev = n->pprev;
*pprev = next;
if (next)
next->pprev = pprev;
n->next = NULL, n->pprev = NULL;
bail:
free(kn->data);
free(kn);
}
}
free(map);
}
```
將 hash table 中使用 containor_of 依序將其中的節點 free 掉。
#### `twoSum`
```cpp
int *twoSum(int *nums, int numsSize, int target, int *returnSize)
{
map_t *map = map_init(10);
*returnSize = 0;
int *ret = malloc(sizeof(int) * 2);
if (!ret)
goto bail;
for (int i = 0; i < numsSize; i++) {
int *p = map_get(map, target - nums[i]);
if (p) { /* found */
ret[0] = i, ret[1] = *p;
*returnSize = 2;
break;
}
p = malloc(sizeof(int));
*p = i;
map_add(map, nums[i], p);
}
bail:
map_deinit(map);
return ret;
}
```
## 測驗`3`
### 延伸問題
- [x] 解釋運作原理,指出設計和實作的缺失,並予以改進
- [x] 反覆執行程式碼,可發現 fibonacci 和 squares 這兩個函式的輸出字串可能會遇到無法一整行表示,請指出原因並修正
- [ ] 研讀 [A (Bare-Bones) User-Level Thread Library](https://www.schaertl.me/posts/a-bare-bones-user-level-thread-library/),理解 fiber 的實作手法,並且設計實驗來量化 process, native thread (指符合 [NPTL](https://man7.org/linux/man-pages/man7/nptl.7.html) 的實作), fiber / coroutine 等執行單元的切換成本
### 解釋運作原理
[fiber](https://en.wikipedia.org/wiki/Fiber_(computer_science)) 定義:輕量級 thread 在 user-space 透過 coroutine 才將資源 yield 出,所以是 thread safetly 而且是同步的,所以 spin lock 及 atomic operator 都是非必要的操作。
#### `fiber_init`
```cpp
void fiber_init()
{
for (int i = 0; i < MAX_FIBERS; ++i)
fiber_list[i].pid = 0, fiber_list[i].stack = 0;
parent = getpid();
}
```
初始化 fiber
`fiber_list`
```graphviz
digraph fbstring {
node [shape=record];
rankdir=LR;
foo [label="<f6>0|1|2|3|4|...|<f5>9"];
fiber_list -> foo:f6
fo1 [label="pid|stack"]
foo:f6 -> fo1
}
```
:::info
pid_t getpid(void);
> getpid() returns the process ID (PID) of the calling process.(This is often used by routines that generate unique temporary filenames.)
其中 return type 為 pid_t 所以要對應 include library <sys/types.h>
pid_t
>The pid_t data type is a signed integer type which is capable of representing a process ID. In the GNU C Library, this is an int.
由此可以知道 pid_t 和 int 資料結構相等。
參考資料:
[getpid](https://man7.org/linux/man-pages/man2/getpid.2.html)
[<sys/types.h>](https://pubs.opengroup.org/onlinepubs/009696899/basedefs/sys/types.h.html)
[Process Identification](https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html)
:::
#### `fiber_spawn`
```cpp
/* Creates a new fiber, running the func that is passed as an argument. */
int fiber_spawn(void (*func)(void))
{
if (num_fibers == MAX_FIBERS)
return FIBER_MAXFIBERS;
if ((fiber_list[num_fibers].stack = malloc(FIBER_STACK)) == 0)
return FIBER_MALLOC_ERROR;
struct fiber_args *args;
if ((args = malloc(sizeof(*args))) == 0) {
free(fiber_list[num_fibers].stack);
return FIBER_MALLOC_ERROR;
}
args->func = func;
fiber_list[num_fibers].pid = clone(
fiber_start, (char *) fiber_list[num_fibers].stack + FIBER_STACK,//KKK
SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, args);
if (fiber_list[num_fibers].pid == -1) {
free(fiber_list[num_fibers].stack);
free(args);
return FIBER_CLONE_ERROR;
}
num_fibers++;
return FIBER_NOERROR;
}
```
clone 的系統呼叫成功, pid 將存入 child_pid 失敗將會存入 -1 ,並在全域變數 num_fibers +1。
:::info
```cpp
int clone(int (*fn)(void *), void *child_stack,
int flags, void *arg, ...
/* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );
```
>These system calls create a new ("child") process, in a manner similar to fork(2).
On success, the thread ID of the child process is returned in the caller's thread of execution. On failure, -1 is returned in the caller's context, no child process will be created, and errno will be set appropriately.
成功將會回傳 child pid 失敗將會回傳 -1
chile_stack:
The child_stack argument specifies the location of the stack used by the child process.
由此可知道 chile_stack 為 1024 * 1024 大小
flag: in order to specify what is shared between the calling process and the child process
flag 將決定哪些東西需要分享。
SIGCHLD:the number of the termination signal sent to the parent when the child dies.
當 child 結束時將會發送信號給 parent
CLONE_FS: If CLONE_FS is set, the caller and the child process share the same file system information.
CLONE_FILES:If CLONE_FILES is set, the calling process and the child process share the same file descriptor table.
CLONE_SIGHAND:If CLONE_SIGHAND is set, the calling process and the child process share the same table of signal handlers.
CLONE_VM:If CLONE_VM is set, the calling process and the child process run in the same memory space
參考資料:
[clone](https://man7.org/linux/man-pages/man2/clone3.2.html)
[manpage clone](https://linux.die.net/man/2/clone)
:::
#### `fiber_wait_all`
```cpp
int fiber_wait_all()
{
/* Check to see if we are in a fiber, since we do not get signals in the
* child threads
*/
pid_t pid = getpid();
if (pid != parent)
return FIBER_INFIBER;
/* Wait for the fibers to quit, then free the stacks */
while (num_fibers > 0) {
if ((pid = wait(0)) == -1)
exit(1);
/* Find the fiber, free the stack, and swap it with the last one */
for (int i = 0; i < num_fibers; ++i) {
if (fiber_list[i].pid == pid) { //CCC
free(fiber_list[i].stack);
if (i != --num_fibers)
fiber_list[i] = fiber_list[num_fibers];
break;
}
}
}
return FIBER_NOERROR;
}
```
將所有 `fiber_list` 所使用的空間 free 掉。
:::info
`pid_t wait(int *stat_loc);`
>wait for a child process to stop or terminate.
>the value of the argument stat_loc is usually a null pointer.
>these functions shall return a value equal to the process ID of the child process for which status is reported.
>-1 shall be returned, and errno set to indicate the error.
[wait](https://man7.org/linux/man-pages/man2/wait.2.html)
:::
### 執行結果與修正
```shell
Fib(0) = 0
Fib(1) = 1
1 * 1 = 1
1 * 1 = 1
Fib(2) = 1
2 * 2 = 4
2 * 2 = 4
Fib(3) = 2
3 * 3 = 9
Fib(4) = 3
4 * 4 = 16
Fib(5) = 5
5 * 5 = 25
5 * 5 = 25
Fib(6) = 8
6 * 6 = 36
Fib(7) = 13
7 * 7 = 49
7 * 7 = 49
Fib(8) = 21
8 * 8 = 64
Fib(9) = 34
Fib(9) = 34
9 * 9 = 81
Fib(10) = 55
Fib(11) = 89
Fib(12) = 144
Fib(13) = 233
Fib(14) = 377
```
經過幾次之後
```shell
Fib(0) = 0
Fib(1) = 1
Fib(2) = 1
Fib(3) = 2
Fib(4) = 3
Fib(5) = 5
Fib(6) = 8
Fib(7) = 13
Fib(8) = 21
Fib(9) = 34
Fib(10) = 55
Fib(11) = 89
Fib(12) = 144
Fib(13) = 233
Fib(14) = 377
1 * 1 = 1
2 * 2 = 4
3 * 3 = 9
4 * 4 = 16
5 * 5 = 25
6 * 6 = 36
7 * 7 = 49
8 * 8 = 64
9 * 9 = 81
```
之後還是會亂,因為期望是會是先顯示 Fib 在顯示 square ,所以我觀察 yield 的部份的實作
:::info
`int sched_yield(void);`
>causes the calling thread to relinquish the CPU. The thread is moved to the end of the queue for its static priority and a new thread gets to run.
returns 0. On error, -1 is returned, and errno is set appropriately.
>If the calling thread is the only thread in the highest priority list at that time, it will continue to run after a call to sched_yield().
假如只有一個 thread 或是還是最高優先權,就會持續執行這 thread。
[sched_yield](https://man7.org/linux/man-pages/man2/sched_yield.2.html)
:::
為了確保 critical section 其中的操作順序正確,思考試試加入 futex 讓 thread 的順序正確
:::info
```cpp
#include <linux/futex.h>
#include <syscall.h>
int futex_addr;
int futex_wait(void* addr, int val1){
return syscall(SYS_futex,&futex_addr,val1, NULL, NULL, 0);
}
int futex_wake(void* addr, int n){
return syscall(SYS_futex, addr, FUTEX_WAKE, n, NULL, NULL, 0);
}
```
`fibonacci`
```diff
fib[1] = nextFib;
+futex_addr = 0;
+futex_wait(&futex_addr,0);
fiber_yield();
```
`main`
```diff
fiber_spawn(&fibonacci);
+futex_wake(&futex_addr,1);
fiber_spawn(&squares);
```
futex
>fast user-space locking
參考資料:
[How futex works in this case?](https://stackoverflow.com/questions/24488564/how-futex-works-in-this-case)
[futex](https://man7.org/linux/man-pages/man2/futex.2.html)
:::
```shell
Fib(0) = 0
Fib(1) = 1
Fib(2) = 1
Fib(3) = 2
1 * 1 = 1
2 * 2 = 4
3 * 3 = 9
4 * 4 = 16
5 * 5 = 25
6 * 6 = 36
7 * 7 = 49
8 * 8 = 64
9 * 9 = 81
```
結果還是錯的! 並把範例重新跑發現是錯的,所以思考 futex 指令
:::info
```cpp
long futex(uint32_t *uaddr, int futex_op, uint32_t val,
const struct timespec *timeout, /* or: uint32_t val2 */
uint32_t *uaddr2, uint32_t val3);
```
>uaddr: The uaddr argument points to the futex word. On all platforms, futexes are four-byte integers that must be aligned on a four-byte boundary.
>futex_op: The operation to perform on the futex is specified in the futex_op argument;
>val: val is a value whose meaning and purpose depends on futex_op.
>timeout, uaddr2, and val3: The remaining arguments (timeout, uaddr2, and val3) are required only for certain of the futex operations described below. Where one of these arguments is not required, it is ignored.
當有設定對應的 futex_op 才需要設定 timeout, uaddr2, and val3。
重新設定 futex,並研究一下 man page example
- [ ] [fiber.c](https://github.com/WayneLin1992/linux2021_week6/blob/main/fiber.c)
```cpp
futex(uint32_t *uaddr, int futex_op, uint32_t val, const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
{
return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}
```
```diff
+uint32_t futex_addr = 1;
+uint32_t futex_addr_1 = 1;
```
`fibonacci`
```diff
+futex(&futex_addr, FUTEX_WAIT, 0, NULL, NULL, 0);
printf("Fib(%d) = %d\n", i, nextFib);
fib[0] = fib[1];
fib[1] = nextFib;
+futex(&futex_addr, FUTEX_WAKE, 1, NULL, NULL, 0);
fiber_yield();
```
`squares`
```diff
+futex(&futex_addr_1, FUTEX_WAIT, 0, NULL, NULL, 0);
printf("%d * %d = %d\n", i, i, i * i);
+futex(&futex_addr_1, FUTEX_WAKE, 1, NULL, NULL, 0);
fiber_yield();
```
`main`
```diff
fiber_spawn(&fibonacci);
+sleep(1);
fiber_spawn(&squares);
```
增加 sleep 是為了讓 child thread 有時間完成。
參考資料:
[futex](https://man7.org/linux/man-pages/man2/futex.2.html)
:::
```shell
Fib(0) = 0
Fib(1) = 1
Fib(2) = 1
Fib(3) = 2
Fib(4) = 3
Fib(5) = 5
Fib(6) = 8
Fib(7) = 13
Fib(8) = 21
Fib(9) = 34
Fib(10) = 55
Fib(11) = 89
Fib(12) = 144
Fib(13) = 233
Fib(14) = 377
1 * 1 = 1
2 * 2 = 4
3 * 3 = 9
4 * 4 = 16
5 * 5 = 25
6 * 6 = 36
7 * 7 = 49
8 * 8 = 64
9 * 9 = 81
```
### 研讀 [A (Bare-Bones) User-Level Thread Library](https://www.schaertl.me/posts/a-bare-bones-user-level-thread-library/)