2024q1 Homework1 (lab0)

contributed by < HotMercury >

開發環境

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
    CPU family:          6
    Model:               165
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            5
    CPU max MHz:         4800.0000
    CPU min MHz:         800.0000

Timsort 研讀測驗題

特性：為了解決日常可能會遇到的排序，通常為非亂序，有一定排序的 list
優化 (相較於 mersge sort)

降低 cache : 會在部分排序完後嘗試合併，減少 cache miss 所造成的成本
降低記憶體開銷 : 只需要配置重疊部分

某些濫用「優化」的語境，根本與「改良」和「改善」無異，那為何不直接講「改良」和「改善」呢？

\to

資訊科技詞彙翻譯

Implement queue.c funciton

q_free

改進你的漢語表達。

台大電機系李宏毅教授對於 ChatGPT 的看法是，要善用人工智慧的成果，一旦人們得以善用，人類的書寫不該比 GPT 一類的語言模型差。

\to

2023 年 Linux 核心設計/實作課程第一次作業檢討

要釋放所有的記憶體，這裡考慮到的不只是透過 container of 所找到的外層結構，也需要考慮到成員是否需要釋放記憶體。
TODO : 補充 rust 的生命週期

可藉由 container_of 巨集，從 list_head 找到 element_t 結構體開頭地址。
container_of(node, type, member)
Linux 核心原始程式碼巨集: container_of

宣告 struct 時會認為 byte 數是緊密排列的編譯器為了滿足 alignment 需求,所以要使用 __attribute__((packed)) 就會緊密排列, 但可能會造成效能問題
c89/c99 提供 offsetof 巨集
ex: offsetof(struct ele,mem)會回傳 mem 減去struct 起始位置
舊式寫法

#define offsetof(TYPE,MEMBER)((size_t)&((TYPE *)0)->member)

利用 0 看作指標操作運算元
container_of

#define container_of(ptr,type,member)\
    __extension__({
    const __typeof__(((type *)0)->member) *__pmember =(ptr);
    (type*)((char*) __pmember - offsetof(type,member));
})

先算出目前 member 位址
再透過減去 offsetof(可算出與起點距離)得知起點位置

q_insert_head and q_insert_tail

q_insert_head
strdup, strcpy, strncpy, memcpy 差異

strdup 會 malloc 所以要記得釋放記憶體

The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).

strcpy 要注意 dest src 位置不能重疊
strncpy 可以決定複製 n 個字元，後面會用 '\0' 補完

q_insert_tail
與 q_insert_head 解法相似，再利用 list_add_tail 將 node 加入 queue.

TODO 上述兩個 function 相似，因此考慮維護性可以取出通用行為。

q_remove_head

使用 list_first_entry 可以找到第一個 element
判斷 bufsize 的關係：如果 bufsize 小於 element.value 的 size 要截斷 string 並且補上結束字元

q_remove_tail

使用 list_last_entry 可以找到最後一個 element
判斷 bufsize 的關係：如果 bufsize 小於 element.value 的 size 要截斷 string 並且補上結束字元

q_delete_mid

原本是想走訪完後算出總數再找到中間值再刪除, 後來參考 linked list 和非連續記憶體,可以利用 next->next 與 next 的速度差找到中間項, 再來因為 list_head 是雙向的結構也可以透過 next and prev 的交接處找尋中間項

刪除的 node 需要釋放記憶體嗎？
delete 需要釋放記憶體，remove 則是斷開節點

搞清楚 remove 和 delete 的差異，詳閱作業說明。

q_size

走訪後計算總數

q_delete_dup

刪除已經 sort 後的 queue
可以增加一條 list 來紀錄重複的 element, 走訪完後再一次 free 掉整條 queue

TODO : 刪除未 sort 過的 queue

改進你的漢語表達。

q_reverse

將目前的 last 記住，只要 head->next 不是 last 就插入到 head 的前一個

while (head->next != last) {
    list_move(head->next, last);
}

這裡有個問題把底下的 list_move 替換成 list_del and list_add 就會成功，但其實 code 是一樣的，還不知道為什麼。

你用 gcc -E 比對過嗎？

while (tail_it->prev != head) {
    struct list_head *tmp;
    tmp = head->next;
    // list_del(tmp);
    // list_add(tmp, tail_it->prev);
    list_move(tmp,tail_it->prev);
    tail_it = tail_it->prev;
    }

q_merge

merge all sorted queue

q : pointer to element_t, 也就是排序過的 queue
chain : 串了多個 queue, 也就是可能會有多條的 queue
size : q 有幾個 element
id : unique id

typedef struct {
    struct list_head *q;
    struct list_head chain;
    int size;
    int id;
} queue_contex_t;

走訪所有的 chain 並兩兩呼叫 merge, 我這邊使用的方式是 first (為第一個 queue) 與 second 合併到 first, 再使用 first 跟剩餘的一一 merge, 但會發現 first 越來越長，潛在的走訪成本會很大
TODO : 減少成本的作法

q_sort

方案 1 merge sort
參考 link list 實作 merge sort, 稍微有些不一樣的是在 merge 兩條 queue 的時候，第一個元素是純粹 list_head, 且還要考慮雙向鏈結以及 環狀佇列 的議題

找到中間的節點
切斷 list 製作出兩個出兩個獨立的 list
遞迴 1
merge

TODO : 將 function 變成可替換的風格，使用 callback function 的方式

你如何確認目前的測試方式已涵蓋排序演算法的最差狀況？

檢討

第一次完成所有 funciton 得到分數為 77, 因此要著手來改善並了解問題所在
錯誤題號 3, 6, 15, 17

Test 3

測試內容 : merge
執行後錯誤訊息為 ERROR: Attempted to free unallocated block. Address = 0xdeadbeef, 因此使用 valgrind --leak-check=full 來執行，以下為顯示提示

==14400== Invalid read of size 8
==14400==    at 0x10F43C: find_header (harness.c:98)
==14400==    by 0x10F43C: test_free (harness.c:181)
==14400==    by 0x10F746: q_free (queue.c:38)
==14400==    by 0x10BCB4: do_merge (qtest.c:836)
==14400==    by 0x10DFD1: interpret_cmda (console.c:181)
==14400==    by 0x10E586: interpret_cmd (console.c:201)
==14400==    by 0x10F1F0: run_console (console.c:691)
==14400==    by 0x10D3C3: main (qtest.c:1258)
==14400==  Address 0xdeadbee7 is not stack'd, malloc'd or (recently) free'd

看起來是 do_merge 與 q_free 間會出現問題，因此著手處理。
研究後發現目前的實做實作 q_merge 時所呼叫的 merge_two_queues 會有問題，在當走訪完 q1 後會直接將 q2 剩下的 element 加入 q1。
e.g. q1 : {1,2,3}, q2 : {4,5}, 走訪完 q1 都發現沒有大於 q2, 所以最後直接將 q2 剩餘 element 加入 q1, 此時卻忘記要將 q2 清空。

- list_splice_tail(q2,q1);
+ list_splice_tail_init(q2,q1);

改善後 test 3 成功通過

Test 6

使用作業指定的程式碼縮排風格

發現 q_sort 所呼叫的 merge_two_queues 會有問題，當檢查第 q2 走訪完後沒有正確跳出迴圈，這裡加上一個 goto 讓迴圈結束

if (descend) {
    while (strcmp(entry->value,
                  list_entry(q2, element_t, list)->value) < 0) {
        if (q2->next == q2_head) {
            q2 = q2->next;
            list_move(q2->prev, entry->list.prev);
            goto out;
        } else {
            q2 = q2->next;
            list_move(q2->prev, entry->list.prev);
        }
    }
}

沒看清楚 q_descend 的定義 : "Remove every node which has a node with a strictly greater value anywhere to the right side of it", 寫成從第一個開始刪除小於左邊的 node.

Test 15

前面解決後就 ok 了

Test 17

Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
從錯誤訊息回推, 判斷 is_insert_tail_const 會 call 到 test_const

ERROR: Probably not constant time or wrong implementation

然後會執行 (ENOUGH_MEASURE / (N_MEASURES - DROP_SIZE * 2) + 1) 次 doit (次數的意義？)，然後從 test_const 可以發現只要 doit 對一次就會成功(這也很奇怪)。

doit
- measure : 測試 function 功能並紀錄開始與結束的時間
- differentiate : 計算時間差
- update_statistic
  - t_push : 計算 mean and m2
- report : 重點 function

struct t_context_t

mean : 平方
m2 : Mean squared error
n : 個數

report
計算 max_t 計算兩個樣本平均值的 t-檢定（t-test）統計值。這種 t-檢定通常用於比較兩個樣本的平均值是否顯著不同。
思路：我們在加入計算前應該先過濾一些極值，這樣可以讓兩個 class 的數值變得相近。

/* Definitely not constant time */
if (max_t > t_threshold_bananas)
    return false;

/* Probably not constant time. */
if (max_t > t_threshold_moderate)
    return false;

最後模仿 dudect/src/dudect.h 實作 prepare_percentiles 函式，在 update_statistics 之前過濾數據就可以通過測試，但原理是什麼，還真的不知道？

確認論文和程式碼實作是否存在出入？前處理的機制是否合理。

shuffle 命令

利用 Fisher–Yates shuffle 演算法來實作洗牌（shuffle）
主要概念是想要達到平均的亂數，假設 range 是 1 到 5, 那 random 所產生的數字在 1 到 5 都出現之前不會重複。
構思時的想法，交換的方式可能有幾種，1. 直接 swap 節點，2. 不改變節點順序只交換值，3. 選擇到的節點插入到 head 的 prev 這樣其實也是不重複亂數，之後會對於 1, 3 做出實驗。

在 qtest.c 中加入 shuffle command, 模仿 do_reverse 產生 do_shuffle
在 queue.c 中加入 q_shuffle
- 這裡為了不要動到 queue.h 因此使用 extern
- debug 時會一直進入 time error 因此將 exception_setup(true) 關掉
引入測試用腳本

$ pyenv install 3.10.4
$ pyenv local 3.10.4
$ python3 script/shuffle.py

測試 [1, 2, 3, 4] 做 shuffle 1,000,000 次的直方圖：
方法 1 直接 swap 節點
chi square sum: 25.61474583593338

方法 3 選擇到的節點插入 tail
這樣得好處是節省一次交換的成本

- struct list_head *tmp = it->prev;
- list_move(it,node);
- list_move(node,tmp);
- node = it;
+ struct list_head* tmp = head->prev;
+ list_move(it, tmp);

chi square sum: 20.501320021120343

TODO : 將測試方式加入 makefile 並可以控制輸入設定。

dudect: dude, is my code constant time? 論文閱讀

論文提到，~~並不是很清楚~~ modeling 是什麼

our solution requires no modeling of hardware behavior. Our solution can be used in black-box testing.

不懂就說不懂，誠實面對自己。

測試方式 Timing leakage detection

A. measure execution time
a) class definition
透過兩種 input data class 來當作 input, 並測量兩者關係
fix-vs-random : 大致上就是分成 fix and random 兩種 case, 其中 fix 適合用於 arithmetic time
b) cycle counters
可以使用 CPU 的 cycle counter
c) environment conditions
each measurement corresponds to a randomly chosen class, 不是很確定為什麼這樣可以降低影響。

B. Apply post-processing
在分析前可以進行一些預處理
a) cropping
通常時間分佈為 positively skewed, 而我們可以刪除一些可能受到作業系統影響的數據，像是過濾一樣。
b) higher-order prepocessing
可以透過像是 centered product mimicking higher-order DPA attacks 得到更好的數據，這裡也不是很清楚。

C. Apply statistical test
Apply a statistical test to try ti disprove the null hypothesis "兩個時間分佈相等"
可以用不同種方法來實作，論文中列出了兩種方法並對其簡單說明
a t-test
論文中提供的簡易檢測方法： Welch's t-test ，可以很輕易地檢測是否超時

b Non-parametric tests
非參數測試，例如：Kolmogorov-Smirnov
優點是這些測試通常對於基礎分佈的假設較少但缺點是它們可能收斂較慢並且需要更多的樣本。

result?

Pre-processing : 會先過濾大於論文所提到的 p percentile

`lib/list_sort.c` 研讀

觀察 lib/list_sort.c, 與目前我在 queue.c 中實作的 q_sort 差異, 接閜來會筆記以及分析

目前已知實作缺陷
使用 bottom-up 的 merge sort

對 cache 不友善

研讀後發現缺陷

維護雙向鏈結串列開銷大

lib_sort

`merge`

__attribute__((nonnull(2,3,4))) : 第 2, 3, 4 不為 null

The nonnull attribute specifies that some function parameters should be non-null pointers. For instance, the declaration

首先從 merge 的註解可以知道，list 在 function 內會變成 null-terminated, no reserved or sentinel head node, and ignore prev, 這樣可以簡少維護雙向 list 的成本, 接下來就是比較 ascend or descend 然後合併成一條。
這裡可以學習的是 list_cmp_func_t cmp 將 compare function 變成參數，這樣可以更好的維護後續的開發。

因為我們目前只比較 descend and ascend 所以把第一個參數當作控制。
加入 compare function 的 merge

typedef int (*list_cmp_func_t)(void* , const struct list_head*, const struct list_head*);

// pseudo code 
void compare(flag, list1, list2){
    if(flag == ascend){
        return strcmp(list1, list2);
    }else{
        return strcmp(list2, list1);
    }
}

`merge_final`

恢復雙向鏈結串列
前面步驟大致與 merge 相似，不一樣的點是做完後會有一個走訪將 prev 設定好
在連接 prev 的迴圈時有一段註解，目前還不是很清楚想要表達的意思。

If the merge is highly unbalanced (e.g. the input is already sorted), this loop may run many iterations. Continue callbacks to the client even though no element comparison is needed, so the client's cmp() routine can invoke cond_resched() periodically.

以下程式碼會進入 cmp，count 一開始設定成 0。

overflow : 但目前不知道設計原理

if (unlikely(!++count))
    cmp(priv, b, b);

工程人員說話要精準，避免說「感覺」。

`list_sort`

你到底在說什麼？

~~想要將 cache 都控制在 L1 cache~~ , 原本的 merge sort 採取 top down 的方式所以 cache miss 機率大。
TODO 補上其餘設計考量

分析第一段 code, 因為只看註解還是不清楚，所以決定帶入例子 trace, 假設只有 4 個 nodes, 所以分析 count 值分別會做的事情。主要 idea 類似 2048 的遊戲，當目前的 count 為 2 的冪時不會合併。
count

00 : list 移到第 2 個 node, pending 切割出來，目前 pending 只有一個 node, 且 pending 為 list 的 prev
01 : 這次還不會 merge, pending 變成第 2 個 node, list 變成第 3 個 node.
10 : tail 設定為 pending(第 2 個 node), a 設定為第 2 個 node, b 設定為第 1 個 node, 然後 merge.
11 : tail, pending 在 3, l 在 4, 不會 merge.

do {
	size_t bits;
	struct list_head **tail = &pending;

	/* Find the least-significant clear bit in count */
	for (bits = count; bits & 1; bits >>= 1)
		tail = &(*tail)->prev;
	/* Do the indicated merge */
	if (likely(bits)) {
		struct list_head *a = *tail, *b = a->prev;

		a = merge(priv, cmp, b, a);
		/* Install the merged result in place of the inputs */
		a->prev = b->prev;
		*tail = a;
	}

	/* Move one element from input list to pending */
	list->prev = pending;
	pending = list;
	list = list->next;
	pending->next = NULL;
	count++;
} while (list);

將 `lib/list_sort.c` 引入佇列操作

模仿 queue.c 的實作，去除所有 gcc attribute, 以及目前還不明所以的 void *priv 參數，加入 compare fucntion 當作參數，並在 queue.c 中命名此排序 function 為 q_libsort。
debug
在測試 q_libsort 時發現會有以下錯誤

Segmentation fault occurred.  You dereferenced a NULL or invalid pointer

為了找出錯誤的地方，使用 valgrind 來動態檢查，可以知道問題出在merge_final 使用到不對的 memory.

$ valgrind --leak-check=full ./qtest < mytest.cmd
==13727== Invalid read of size 8
==13727==    at 0x10F9CD: compare (queue.c:376)
==13727==    by 0x1104B0: merge_final (queue.c:422)
==13727==    by 0x1104B0: q_libsort (queue.c:498)
==13727==    by 0x10AE84: do_libsort (qtest.c:601)
==13727==    by 0x10E206: interpret_cmda (console.c:181)
==13727==    by 0x10E7BB: interpret_cmd (console.c:201)
==13727==    by 0x10F425: run_console (console.c:691)
==13727==    by 0x10D5F8: main (qtest.c:1307)
==13727==  Address 0xfffffffffffffff8 is not stack'd, malloc'd or (recently) free'd

接下來用 dbg tool 查找問題所在，發現 compare function 的時候使用到 0x0 的 address, 因此修正後就可以正常運行。

$ dbg ./qtest
(gdb) r < mycofig.cmd
0x000055555555b9cd in compare (q1=q1@entry=0x0, q2=q2@entry=0x55555556a348, descend=descend@entry=false) at queue.c:376
376             return strcmp(e1->value, e2->value);
(gdb) set print object on
(gdb) print q1
$1 = (struct list_head *) 0x0

分析效能

分析效能前，確認以下:

是否準備得以反映多種情境的資料輸入？
是否有程式碼結果的驗證機制？例如 stable 與否的檢查。

分析方式

這裡做的不好的是沒有明確制定測資，情境，以及正確性驗證，就直接跑分析，可以說是無效分析，所以以下是重新思考測試方式及預測結果。

測試資料不應該只有亂數生成，且自己也對 RAND 會生成的東西沒有把握
分析 worst case 的測資
正確性驗證，需要額外說明

目前為存亂數生成比較，還沒實作正確性驗證

亂數生成測試資料比較

new
it RAND 1000000
time libsort
free
new
it RAND 1000000
time sort

次數 \ Delta time	libsort	merge sort
100000	0.065	0.118
500000	0.477	0.687
1000000	0.861	1.337

perf 測量
在加入 makefile 時有注意到 qtest 後面會加類似 -v num 的參數，目前還不清楚有什麼意義，暫時使用 -v 0.(後來測試發現使用 0 會無視 time alarm 就不用自己去註解掉 exception_setup)
新建立 perf-trace 目錄 ~~資料夾~~，新增 shell script and 測試場景, 最後使用 make perf 就可以得到以下結果。

directory 的翻譯是「目錄」，而非「檔案夾」(folder)

perf-traces/
├── perf-test.sh
├── perf-out.sh
├── generate.sh
├── trace.cmd

會有以下錯誤，但還是能夠跑出數據
make: *** [Makefile:78: perf] Error 1

新增 perf-test.sh 可以使用 make perf 來觀察平均 10 次的各項指數。但目前有個問題這會連同產生測試資料的時間一起算進去。
commit f2742bf

merge sort

node num	time (ms)	branches	instructions	contex-switches
100	0.5221 (±7.20%)	27,6739	124,6968	0
1000	1.899 (±22.86%)	81,2584	334,4947	0
10000	10.64 (±24.28%)	648,3103	2787,1302	0
100000	117.4 (±4.17%)	6684,3990	4,6536,0683	0
1000000	1523 (±0.68%)	6,0459,0220	65,9992,7797	6

lib sort

node num	time (ms)	branches	instructions	contex-switches
100	0.8 (±32.85%)	27,6728	131,7903	0
1000	1.07 (±8.17%)	79,3860	327,3809	0
10000	12.6 (±22.3%)	633,9060	2564,8461	0
100000	99.9 (±3.81%)	6584,4093	4,5496,1186	3
1000000	1103 (±2.23%)	6,9207,1578	47,0139,4457	10

python 繪圖
對 python 實在是超級不熟，因此參閱 blueskyson 並做出對應調整

快速生成測試檔, 會生成對應的數目測試資料及操作方式的檔案，格式為 number.cmd

./perf-traces/generate.sh

新增 perf-out.sh 腳本，主要將前述測試結果輸入到 dump.txt
python script 將 perf 得到的資料透過 python 畫成圖表

總結作法：目前要先透過 generate.sh 手動調整 sort 的模式(libsort or merge sort …)，再透過 pserf-out.sh 生成要給 python 的測試資料，最後再執行 plot.py, 目前覺的這樣測試彈性太小了，且目前 queue.c 內部的 sort 已經變成在內部初始化決定要使用哪種 sort 而不是分別可以使用的 command, 因此這部份應該要重新規劃。

TODO : 理解 gnuplot 用法

gnuplot 腳本: komark06

不要打錯專案名稱。

整合 timsort, mergesort, libsort

timsort 筆記

commit 4e18e92

目前先將 q_sort 變成 sort 的 abstraction, 額外新增設置當前 sort 方式的 struct, 並且獨立 sort 的程式碼到 sort_impl.h 這樣以後可以更好的維護。

改進你的漢語表達。

typedef void (*sort_fn)(struct list_head *head, list_cmp_func_t cmp);
struct task{
    sort_fn sort;
};
struct task current_task;
void sort_init()
{
    current_task.sort = timsort;
}

原本在獨立的 sort_impl.c 中 include list.h 為了可以使用 container of 但 commit 時出先了以下錯誤，compare function 會用到 list_entry 間接使用到 contaner_of 。將 compare function 放在 queue.c 並不會有問題，目前還不知道怎麼解決。所以暫時將 compare function 移動 queue.c

改進你的漢語表達。

sort_impl.c:15:10: error: Null pointer dereference: (struct element_t*)0 [nullPointer]

element_t *e1 = NULL, *e2 = NULL;
e1 = list_entry(q1, element_t, list);

valgrind + 自動測試

test_malloc

在原本要 malloc 的 size 前後加入 magic number, 這讓我想到之前嵌入式專案設計 stack 時會先分配指定的大小例如 512 byte 如何在 schedule 前檢查這個 process 有沒有超過 stack 的使用就是在後面加入一個檢查值 0xdeadbeef, 只要值改變就代表曾經 overflow.

亂數

理想中的亂數應該是事件互相獨立，
Entropy 定義是某個隨機事件平均每個結果發生時所能傳達出的資訊量，參考Entropy (熵)是甚麼，而為什麼會有負號是因為可以讓資訊與熵成為互補，通常一件事發生的機率越低，往往可以得到的資訊量越多，在閱讀

H (X) = - \sum_{i = 1}^{n} P (x) l o g_{b} P (x_{i}) = \sum_{i = 0}^{n} P (x_{i}) I (x_{i})

H 為 entropy, X 為隨機變量，P 為期望值，I 為資訊量

web

~~關於 select 範例的[簡體攻略](https://www.cnblogs.com/skyfsm/p/7079458.html)~~

參閱第一手材料！

問題

在 lab0-c 專案中使用了 linenoise, 可以輔助輸入(tab 補字)以及 histroy 的功能，但當我們執行 web command 時需要關閉 linenose 功能，不然網頁就會卡死。追蹤 linenoise 的程式碼 : linenoise -> line_raw -> line_edit, 在 line_edit 會進入 while 迴圈等待輸入，依照作業的提示可以知道原因是使用到 read 等待使用者輸入 (fd 為 stdin), 因此要用 select or epoll 來處理同時監控多個 file discriptor 的方法。

原版流程

標註對應的 git commit

commit f237c7d
在執行 qtest 時會進入 run_console 因為我們要測試的是 linenoise 所以 fd 會設置成 STDIN_FILENO, 所以主要會走以下程式碼來執行 user command, 現在先假設 web 觸發後會在 interpret_cmda 找到 do_web 並將 use_linenoise 關掉，所以接下來要做的就是保留 use_linenoise 功能也確保網頁可以運行。

if (!has_infile) {
    char *cmdline;
    while (use_linenoise && (cmdline = linenoise(prompt))) {
        interpret_cmd(cmdline);
        line_history_add(cmdline);       /* Add to the history. */
        line_history_save(HISTORY_FILE); /* Save the history on disk. */
        line_free(cmdline);
        while (buf_stack && buf_stack->fd != STDIN_FILENO) {
            cmd_select(0, NULL, NULL, NULL, NULL);
        }
        has_infile = false;
    }
}

你該闡述為何有這個系統呼叫，當初開發者到底要解決什麼問題，又，在本程式中，如何使用該系統呼叫，你又發現什麼值得討論和改進之處。

讀者要看到你的洞見！

著手改善問題

決定使用 select() 的方式同時監控 STDIN_FILENO 及 web request, 要如何作到呢? 這裡先解釋 select 的原理及用法。

select() allows a program to monitor multiple file descriptors,waiting until one or more of the file descriptors become "ready".

int select(int nfds, fd_set *readfds, fd_set *writefds,
                  fd_set *exceptfds, struct timeval *timeout);

為什麼會出現 select() 我參考了A brief history of select(2) 及 wikipedia 並總結
早期的 unix 有檔案描述子 (file descriptors) 數量不足的問題，亦缺乏針對檔案描述子的多工 (或多路復用) 機制，如果要作到相近的事，會使用 fork 的方式，而隨著 4.2BSD 的發行，隨著 BSD Sockets API 的出現，select() 系統呼叫也隨之而來，當網路服務越來越複雜，fork() 的方式不只不能 share, IPC 的成本也相當高。

nfds

因為 STDIN_FILENO 為 0 所以需要監控最高位 + 1 個 fd.

This argument should be set to the highest-numbered file descriptor in any of the three sets, plus 1.

readfds

The file descriptors in this set are watched to see if they are ready for reading.

這裡提到的 not block 是什麼情況？所以可能的情況是 STDIN_FILENO 就算只輸入一個字也會被即時監控，讓 select 即時選擇運作？

A file descriptor is ready for reading if a read operation will not block; in particular, a file descriptor is also ready on end-of-file.

所以只要成功一次就會需要再重新設定，好奇的是有沒有可能在 select 使用前就已經有 fd 是可以 read 的。

After select() has returned, readfds will be cleared of all file descriptors except for those that are ready for reading.

writefds

這次沒有使用到

The file descriptors in this set are watched to see if they are ready for writing.

select macro
FD_ZERO : set fd_set to zero
FD_CLR : clear bit
FD_SET : set specific bit
FD_ISSET : test if set or not

接下來從改原本的架構，我們不將 linenoise 關閉, 而是在 line_edit 內部的 while 迴圈內做 stdin or weq request 的處理，這裡原本只會跑使用者所輸入的 input 做處理，而現在用 select() 來分辨這次是要做 input 的分析還是 web request 的接收，這樣子就等於透過統一的界面來解析命令。但這時就會出現一個 bug, 當使用者在命令列輸入時，系統就會判定這次為 stdin 所以就算沒有送出完整命令，web 發出的 request 也不會被系統處理。

socket tcp flow

commit 48effe8 成功解決 linenoise 問題並且可以在網頁上顯示結果

不該在 linenoise.c 置入不相關的網路封包操作的程式碼，應引入合適的 callback 來調整主要循環的動作。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

在本程式中為了解決兩者衝突的問題，使用了 select() 我讓 web request and command 可以同時在 linenoise 做 cmdline 的分析，透過 select 找到此次對應的 fd, 就不會被讓 process 一直被 read 佔住。

~~想像中~~ ??? 作為一個 server 可以利用 select() 接收到 socket fd, 但將運算完後的結果如何傳遞使得網頁可以顯示出對應的文字 ? 最後發現在 report() 會判斷是否有存在 web 的 fd, 所以最後的方法是將close(web_connfd) 藏在 interpret_cmd 裡面，雖然可以運行，但我覺的不是什麼好方法。

問題與改善

目前的方法會直接忽略原本使用的 cmd_select() 是還有改善的空間，或是直接刪除。
cmd> favicon.ico 依然沒有解決。

整合 ttt

井字遊戲，規則如下，jserv/ttt 不同的地方是，如果超過 3 個的連線要依照 ALLOW_EXCEED 來判定是否分出勝負。

The winner is determined by the first player who successfully places GOAL of their marks in a row, whether it is vertically, horizontally, or diagonally, regardless of the board size.

Monte Carlo tree

總結理解 Monte Carlo Tree Search 解說

Monte Carlo tree 演算法
概念是把遊戲模擬紀錄在搜尋樹，再選出勝率最高的選項

select leaf node
rollout or expand
backpropagate

flow
一開始會選定一個 leaf 節點，接下來會考慮做 expand or rollout，expand 是基於目前節點新曾 children，而 rollout 則是開始一場遊戲模擬一直到結束，接下來 backpropagate 便是將 node 資訊往上更新。
舉個例子，如果一開始 select root 因為沒有資訊所以會作 expand 生成 children，接下來 select UCB 大的 children 因為新節點所以開始作 rollout 直到有勝負後，backpropagate 往回更新。

upper confidence bound

U C B = \frac{w_{i}}{n_{i}} + C \sqrt{\frac{\ln N_{i}}{n_{i}}}

$W i$ # wins

$n_{i}$ # 總共進行 simulations

$N_{i}$ Parent's # simulations

$c = \sqrt{2}$

mcts
這裡利用蒙地卡羅找到棋盤上的位置。
ITERATIONS 可以控制我們要做幾次上述 flow 的運算，重要的選節點的方式在 select_move 會去計算底下 children 的 UCB，而目前實作方式是用浮點數運算，在這裡要改成定點數的方式。

Linux 核心浮點數運算

來源自 Linux Kernel Development 浮點運算:

the kernel does not have the luxury of seamless support for floating point because it cannot easily trap itself

在 Linux 核心中使用浮點數運算會引發 trap, 且需要自行 saving and restoring float-point registers, 為什麼 userspace 不會有問題？嘗試從 Use of floating point in the Linux kernel 理解，推論是當 user space 可以透過 trap 進到 floating mdoe, 但如果是 Linux 核心要使用就要 trap itself，但 trap itself 會有什麼問題？
在 Linux 核心中使用浮點數運算還有資安的風險，並不是每個 process 都會使用到浮點數暫存器，所以不一定會更動這些暫存器，因此在此時的行程可能透過這些未被更動的暫存器以竊取資料。

定點數

在中文敘述中，使用全形標點符號，例如該用「，」，而非 ","

定點數意義上是借位的概念，e.g.,

0.99

用

99

運算後再除

10^{2}

linux uptime 可以顯示一些系統平均負載資訊，此時就會運用到浮點數運算，但 Linux 核心又要避免使用 FPU, 所以參考 fixed_power_init 如何運用定點數。

the binary encoding of numbers used by computers:

n := \sum n_{i} \times 2^{i}

we find:

x^{n} := x^{(\sum n_{i} \times 2^{i})} := \prod x^{(n_{i} \times 2^{i})}

程式碼分析
用於計算

x^{n}

，frac_bits 代表小數 bit 數，+= 1UL << (frac_bits - 1) 是用來 4 捨 5 入進位，主要概念就是假設 n 為

0 b 1101

運算流程就會是

x \times x^{2} \times x^{3}

，複雜度為

O (\log n)

。

fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
{
	unsigned long result = 1UL << frac_bits;

	if (n) {
		for (;;) {
			if (n & 1) {
				result *= x;
				result += 1UL << (frac_bits - 1);
				result >>= frac_bits;
			}
			n >>= 1;
			if (!n)
				break;
			x *= x;
			x += 1UL << (frac_bits - 1);
			x >>= frac_bits;
		}
	}

	return result;
}

接著觀察 fs/proc/loadavg.c 根據作業說明提到這是一個 pseudo file system，如何知道的？

procfs，參閱 Linux 核心原始程式碼的 Documentation 目錄
$\to$ Linux 核心設計: 作業系統術語及概念

在 loadavg_proc_show 中使用到以下的 format &lu.%02lu 會由 LOAD_INT 及 LOAD_FRAC 組成。FIXED_1 / 200 等同於 10 進位的 0.05 用來當作 4 捨 5 入。

get_avenrun(avnrun, FIXED_1 / 200, 0);
seq_printf(m, "%lu.%02lu %lu.%02lu %lu.%02lu %ld/%d %d\n",
       LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]), LOAD_INT(avnrun[1]),
       LOAD_FRAC(avnrun[1]), LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),
       nr_running(), nr_threads,
       idr_get_cursor(&task_active_pid_ns(current)->idr) - 1);

/include/linux/sched/loadavg.h
*100 保留小數點後 2 位。

#define LOAD_INT(x) ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)

#19

#define FSHIFT	  11	   /* nr of bits of precision */
#define FIXED_1  (1<<FSHIFT) /* 1.0 as fixed-point */

再來探討 avnrun 是怎麼得到定點數，從 calc_global_load 註解 "calc_load - update the avenrun load estimates 10 ticks after the" 可以知道這裡主要是在做運算的地方。

每個 cpu 的 run queue 會維護一個獨立的 calc_load_update看到 jiffies 就想順便了解 kernel 內部計時的方法，透過 time_before : time_after(a,b) returns true if the time a is after time b. 也就是時間還沒到。

void calc_global_load(unsigned long ticks)
{
	unsigned long sample_window;
	long active, delta;

	sample_window = READ_ONCE(calc_load_update);
	if (time_before(jiffies, sample_window + 10))
		return;

	/*
	 * Fold the 'old' NO_HZ-delta to include all NO_HZ CPUs.
	 */
	delta = calc_load_nohz_fold();
	if (delta)
		atomic_long_add(delta, &calc_load_tasks);

	active = atomic_long_read(&calc_load_tasks);
	active = active > 0 ? active * FIXED_1 : 0;

	avenrun[0] = calc_load(avenrun[0], EXP_1, active);
	avenrun[1] = calc_load(avenrun[1], EXP_5, active);
	avenrun[2] = calc_load(avenrun[2], EXP_15, active);

	WRITE_ONCE(calc_load_update, sample_window + LOAD_FREQ);

	/*
	 * In case we went to NO_HZ for multiple LOAD_FREQ intervals
	 * catch up in bulk.
	 */
	calc_global_nohz();
}

calc_load 計算值的公式

a1 = a0 * e + a * (1 - e)

calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload;

	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1-1;

	return newload / FIXED_1;
}

加入 ttt cmd

commit db87995
加入此專案的方式是原封不動的加入，所以 makefile 也保留，原本想用 submodule 的方式，但還是先讓它跑起來之後再用。我在改 makefile 發現 %.o: %.c 原來有 recursive 的功能，所以原本就可以編譯我放在 ttt_game/ 底下的 .c 檔，之後嘗試去掉 ttt_game 原有的 makefile，且目前不想動到原本 ttt 專案的東西所以先將 cppcheck 關閉。在 push 遇到以下錯誤，用 git rm --cached . -rf 刪除乾淨。
fatal: in unpopulated submodule 'ttt_game

新增 hlist
commit 45c13ee

makefile 調整

目前可以透過給定特定的參數決定使用何種 AI 演算法
e.g. 使用 MCTS
$make ttt_option=mcts

定點數替換

commit 9ad9712
為了使用定點數加入了以下的替代 function

divide_fix

a \times f i x \div b \times f i x

multiply_fix

相乘後需要 shift right fix number

log_fix

我在思考以 10 為底的 log 的時候想到，小數是不是不太影響 log 的運算，可以利用 10 的次方數逼近，但目前暫時用原本的 log 只是在 return 前換成定點數。

sqrt_fix

使用牛頓法
即求

f (x) = x^{2} - n = 0

用 x_old 保留資料，與新的 x (函數的切線逼近函數的根)

x_{n + 1} = x_{n} - f (x) / f^{'} (x)

又

f^{'} (x) = 2 x

x 距離的負號 x 斜率＋高度會 = 0，因此可以推出以下式子

0 = (x - x_{0}) \times f^{'} (x) + f (x_{0})

再推導出

X_{n + 1} = X_{n} - \frac{f (X_{n})}{f^{'} (X_{n})} ⟺ 2 X_{0} (X - X_{0}) + (X_{0})^{2} - n = 0

X_{1} = \frac{X_{0} + \frac{n}{X_{0}}}{2}

// 固定小數點數的平方根計算函數
int sqrt_fx(int n) {
    // 初始猜測值 - 可以進行更好的猜測
    int x = 1 << SHIFT;
    int n_one = n << SHIFT;
    int x_old;
    while (1) {
        x_old = x;
        x = (x + n_one / x) / 2;
        if (x == x_old) {
            break;
        }
    }
    return x;
}

定點數總結

對於定點數掌握度還是不高，數學的應用也不熟悉，目前只有大致上改成定點數，且速度會肉眼可見的變慢。

擴充 ttt 命令

commit cdbafaf
首先要在 qtest 引入新的 option 這樣可以控制對決模式，所以先查看 add_param 發現只是純粹為了顯示，並不會參預參數的存取，這裡先嘗試用 gdb 10.4 Artificial Arrays 顯示 argv，透過參數決定要 AI vs AI or player vs AI
p *argv@len

You can do this by referring to a contiguous span of memory as an artificial array, using the binary operator ‘@’. The left operand of ‘@’ should be the first element of the desired array and be an individual object.

AI vs AI : negamax vs mcts
player vs AI : Player vs mcts

加入 coroutine

coroutine 當中的 context switch 實作方式，可以藉由 setjmp / longjmp 來完成，首先先聊解 jmp_buf which store information to restore a calling environment.

Saves the current execution context into a variable env of type jmp_buf. This variable can later be used to restore the current execution context by longjmp function

透過以下範例可以快速理解用法，會使用 global 的 jmp_buf 來當作 coroutine 的跳躍條件。

#include <stdio.h>
#include <setjmp.h>
#include <stdnoreturn.h>

jmp_buf my_jump_buffer;

noreturn void foo(int status){
	printf("foo(%d) called\n", status);
	longjmp(my_jump_buffer, status+1);
}

int main(void){
	volatile int count = 0;
	if(setjmp(my_jump_buffer) != 5)
		foo(++count);
}

2024q1 Homework1 (lab0)

開發環境

Timsort 研讀 測驗題

Implement queue.c funciton

q_free

q_insert_head and q_insert_tail

q_remove_head

q_remove_tail

q_delete_mid

q_size

q_delete_dup

q_reverse

q_merge

q_sort

檢討

Test 3

Test 6

Test 15

Test 17

shuffle 命令

dudect: dude, is my code constant time? 論文閱讀

測試方式 Timing leakage detection

result?

lib/list_sort.c 研讀

lib_sort

merge

merge_final

list_sort

將 lib/list_sort.c 引入佇列操作

分析效能

分析方式

整合 timsort, mergesort, libsort

valgrind + 自動測試

test_malloc

亂數

web

問題

原版流程

著手改善問題

問題與改善

整合 ttt

Monte Carlo tree

Linux 核心浮點數運算

定點數

加入 ttt cmd

makefile 調整

定點數替換

定點數總結

擴充 ttt 命令

加入 coroutine

Read more

2024q1 Homework5

2024q1 Homework6 (integration)

2024q1 Homework4

2024q1 Homework2 (quiz1+2)

Timsort 研讀測驗題

`lib/list_sort.c` 研讀

`merge`

`merge_final`

`list_sort`

將 `lib/list_sort.c` 引入佇列操作