2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by < [`yenslife`](https://github.com/yenslife) > #### Reviewed by `HotMercury` [commit e81db36](https://github.com/yenslife/lab0-c/commit/e81db36652ec4cc3d39d3e03f80ffa7e29bfd2f9) 要解決的是非法讀取的行為，依照底下解釋可以理解為當 buffer size 與 value 長度相等 `memcpy` 就是合法的嗎？又或者如果先計算 bufsize 就可以避免發生 Invalid read ? #### Reviewed by `YiChiChao` `q_insert_head` 和 `q_insert_tail` 不必透過輔助函式簡化程式碼，而是在 `q_insert_tail` 中呼叫 `q_insert_head` 即可。 `q_insert_tail` 可以看作以 `Head->prev` 為佇列之首，作 `q_insert_head` 之插入。 ```c bool q_insert_head(struct list_head *head, char *s) { /* The code of inserting new node from the head of the linked list */ } bool q_insert_tail(struct list_head *head, char *s) { return q_insert_head(struct list_head *head->prev, char *s); } ``` ![try](https://hackmd.io/_uploads/BJz9xkfAT.png =75%x) > 已改進 > commit [3c6cd18](https://github.com/yenslife/lab0-c/commit/3c6cd18) > [name=yenslife] ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: aarch64 CPU op-mode(s): 64-bit Byte Order: Little Endian CPU(s): 2 On-line CPU(s) list: 0,1 Thread(s) per core: 1 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: ARM Model: 0 Stepping: r0p0 BogoMIPS: 48.00 NUMA node0 CPU(s): 0,1 ``` ## 佇列操作參考 [yanjiew1](https://hackmd.io/@yanjiew/linux2023q1-lab0#q_new_elem), [Risheng](https://hackmd.io/@Risheng/linux2023-lab0), [Shiritai](https://hackmd.io/@Eroiko/lab0-impl#q_delete_dup) ### `LIST_HEAD()` 使用案例以及為什麼要用 LIST API `LIST_HEAD()` 和 `INIT_LIST_HEAD` 都是將自己的 `next`, `prev` 指向自己，那他們有什麼差別呢？ > An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3. > > For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached. 在 C99 規格書中提到，每個物件 (object) 都有自己的生命週期，又可以分成 "static", "automatic" 和 "allocated"。`LIST_HEAD` 屬於 `automatic` 物件，表示只會存在於區塊 (block) 中，直到該區塊執行完畢；而 `INIT_LIST_HEAD` 處理的是 "allocated" 物件，必須先分配好記憶體空間，才會有可以在不同區塊之間傳遞的記憶體位址。搞清楚生命週期之後，就不用憑感覺寫程式了。先觀察 `INIT_LIST_HEAD`，傳入一個 `struct list_head *`，將其 `next`, `prev` 動態地指向自己，用來初始化結構體相當方便。 ```c static inline void INIT_LIST_HEAD(struct list_head *head) { head->next = head; head->prev = head; } ``` `LIST_HEAD` 則是一個巨集，宣告一個 `struct list_head` 類型的變數，傳入的字串即變數名稱。同時將自己的記憶體位址指派 (assigned) 給 `next`, `prev` 成員。 ```c #define LIST_HEAD(head) struct list_head head = {&(head), &(head)} ``` #### 應用場景那麼 `LIST_HEAD` 到底有什麼用？它不像 `INIT_LIST_HEAD` 一樣有明確的用途，單就宣告一個 `struct list_head` 區域變數到底可以做什麼？答案是：搭配 `list_splice`, `list_cut_position` 等移動節點的 API 使用，當作節點的暫存變數。舉個例子，我的 `q_sort` 實作[合併排序法](https://en.wikipedia.org/wiki/Merge_sort)，需要將左邊半佇列以及右邊半佇列分別遞迴呼叫 `q_sort`，最後合併。初始佇列如下 ```mermaid graph LR head<-->n1<-->n2<-->n3<-->n4<-->n5<-->n6<-->head ``` 切開並接上新的頭節點 "left" ```mermaid graph LR left<-->n1<-->n2<-->n3<-->left head<-->n4<-->n5<-->n6<-->head ``` 這時候就需要一個暫存變數來儲存被分割佇列的頭，先宣告一個 `struct list_head` (automatic 物件)，並用 `INIT_LIST_HEAD` 將其成員指向的自身的位址。 ```c /* divide and conquer */ struct list_head left; INIT_LIST_HEAD(&left); list_cut_position(&left, head, mid->prev); /* 將 head 到 * mid->prev 之間的節點 * 節點切下，並接到 left */ q_sort(head, descend); q_sort(&left, descend); /* merge two list */ merge_two_list(head, &left, descend); ``` 發現了嗎？這邊只有 `q_sort` 函式會用到 `left` 的記憶體位址，函式執行結束就不會再用到了，所以可以把剛才宣告的流程簡化成一個 `LIST_HEAD(left)`，這種用完就丟的場境最適合 `LIST_HEAD` 了。 ```c LIST_HEAD(left); list_cut_position(&left, head, mid->prev); /* 將 head 到 ``` >[commit e7fc3df](https://github.com/yenslife/lab0-c/commit/e7fc3df) :::info 善用 List API 可以讓 C 語言寫起來和 Python, Kotlin 等「高階語言」(C 語言也是高階語言，但是寫起來就沒有後者的「絲滑感」) 一樣的感覺，將較繁瑣的實作細節隱藏起來，用更抽象的方式撰寫程式碼。 ::: ### `q_new` 函式功能：建立新的「空」佇列; 為新的佇列 `malloc` 一段空間，並檢測 `malloc` 是否成功，失敗則回傳 `NULL`。使用 `INIT_LIST_HEAD()` 初始化 `struct list_head`，使之 `next`, `prev` 指到自身 :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: 觀察 `LIST_HEAD()` 及 `INIT_LIST_HEAD()`，<s>我其實想不到</s> `LIST_HEAD()` 的使用場景，只知道他不是動態操作，在 `q_new` 中使用 `INIT_LIST_HEAD()` 會比較方便。 :::warning 現在應該有想法？ > 在使用 `list_splice`, `list_cut_position` 等移動節點的函式時，可以方便地使用 LIST_HEAD() 作為節點的臨時變數。 > [name=yenslife] > > 很好，將這個發現 (附上應用案例) 整理在 `q_new` 之前，作為佇列操作的引言，藉此說明 Linux 核心開發者的巧思，這也是我們真正該學到的議題，即比原始程式碼更重要的素養和態度。 :notes: jserv > > 已附上案例以及引言 [name=yenslife] ::: ```c #define LIST_HEAD(head) struct list_head head = {&(head), &(head)} static inline void INIT_LIST_HEAD(struct list_head *head) { head->next = head; head->prev = head; } ``` 後來我在使用 `list_cut_position` 才知道 `LIST_HEAD` 的方便之處，因為很多時候 `list_head` 操作只需要一個區域變數當作指向佇列的「頭」，不用特別管記憶體位置。 :::danger HackMD 不是讓你張貼完整程式碼的地方，GitHub 才是！因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示)，可附上對應 GitHub commit 的超連結，列出程式碼是為了「檢討」和「便於他人參與討論」，不是用來「假裝自己有付出」 > 收到，謝謝老師！目前已經刪除一些非關鍵程式碼了 > [name=yenslife] ::: ### `q_insert_head`, `q_insert_tail` 在佇列開頭 (head) / 結尾 (tail) 插入 (insert) 給定的新節點; 建立一個節點，並檢測是否建立成功，並且指定內含值，並利用 `list_add` 將新節點插入佇列開頭 :::warning 改進漢語表達。 ::: 觀察 `q_insert_head` 傳入值，發現 `char *s` 參數 ```c bool q_insert_head(struct list_head *head, char *s) ``` 觀察到在 `queue.h` 中有 `element_t` 結構體，其中 `char *s` 對應到 `char *value` 參數。`element_t` 的 `list` 指向一個 `list_head`，整個 `element_t` 結構如下[圖](https://hackmd.io/@sysprog/c-linked-list#circular-linked-list)所示，這時我深刻感受到 Linux 的 `list.h` 的便利之處，因為它提供了統一的介面，使得大家能夠更容易地進行開發。 ```graphviz digraph list { rankdir=LR; node[shape=record, style=bold]; subgraph cluster_0 { node [shape=record]; value [label="{value}"]; head [label="{<prev>prev|<next>next}"]; style=bold; label=element_t } } ``` ```c typedef struct { char *value; struct list_head list; } element_t; ``` 觀察 `list_add`，其功能為將 node 插到 head 的下一個節點之前 (這邊就是 `element_t` 的 `list`)。 ```c static inline void list_add(struct list_head *node, struct list_head *head) { struct list_head *next = head->next; next->prev = node; node->next = next; node->prev = head; head->next = node; } ``` 為了簡化程式碼，額外建立輔助函式 `create_element`，`q_insert_head` 操作如下： ```c element_t *new_ele = create_element(s); if (!new_ele) return false; list_add(&new_ele->list, head); ``` 同理，`q_insert_tail` 的實作則是將 `list_add` 換成 `list_add_tail` API 即可。測試後發現 [`create_element`](https://github.com/sysprog21/lab0-c/commit/1a46f5105714ed762bf8ca2dca7cb3dafb6b4713) 無法通過 `trace-11-malloc.cmd` 和 `trace-12-malloc.cmd` 測試，因為沒有考慮到記憶體不足時，`strdup` 可能會分配 NULL 值的問題。 ```diff element_t *create_element(char *s) if (!ele) return NULL; ele->value = strdup(s); + if (!ele->value) { + free(ele); + return NULL; + } return ele; } ``` ### `q_free` 釋放佇列所佔用的記憶體; :::danger 避免非必要的項目縮排 (即 `* `, `- ` 或數字)，以清晰、明確，且流暢的漢語書寫。授課教師在大學教書十餘年，見到不少學生在課堂表現不俗，但在面試場合卻無法清晰地闡述自己的成果和獨到的成就，從而錯過躋身一流資訊科技企業的機會。為此，要求學員書寫開發紀錄，並隨時做好「當主管問起，就能清晰解釋自己的投入狀況」，是強調讓學員「翻身」的本課程所在意的事。濫用條列 (或說 bullet) 來展現，乍看很清爽，但在面試的場合中，就很難用流暢的話語說明。反之，若平日已有這方面的練習，則面試過程中，才可充分展現自己專業之處。 ::: 檢查傳入值是否為空。接著，使用 `list_for_each_entry_safe` 系列巨集來走訪所有節點，並依次釋放空間：首先釋放 `value`，然後釋放 `element`，最後釋放 `head`。老實說這是我第一次有意識的使用 C 語言的 function-like 巨集，一開始覺得為什麼不要寫成一個函式就好，但看了實作才發現他要簡化的部分只有迴圈，而且是「複雜」的迴圈 (至少對我來說還很複雜)，讓使用者可以更專注在邏輯而不是繁瑣的語法，使用巨集再適合不過了。 ```c #define list_for_each(node, head) \ for (node = (head)->next; node != (head); node = node->next) ``` 但這邊要移除的不是 `list_head`，而是自定義結構體 `element_t`，所以用 `list_for_each_entry` 或是 `list_for_each_entry_safe` 較為合適。參考 [gcc.gnu.org](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 對 `typeof` 的解釋，才知道原來 C 語言標準並沒有 `typeof()`，寫在標頭檔必須用 `__type__()`。 :::warning `typeof` 已納入 C23 標準的支援。 ::: 看到這邊才知道為什麼〈[你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list#list_empty)〉要特別解釋 `container_of()`，因為到處都會出現，只是自己沒有察覺到而已。以前會覺得 C 語言礙手礙腳的，現在我被他的高度彈性給嚇到了。 ```c #define list_entry(node, type, member) container_of(node, type, member) #define list_for_each_entry_safe(entry, safe, head, member) \ for (entry = list_entry((head)->next, __typeof__(*entry), member), \ safe = list_entry(entry->member.next, __typeof__(*entry), member); \ &entry->member != (head); entry = safe, \ safe = list_entry(safe->member.next, __typeof__(*entry), member)) ``` 我後來在 `queue.h` 中發現 `q_release_element` 函式，它做的事情很簡單就是先釋放 element 的 value 的記憶體，然後再釋放 element 本身，使用它來增加可讀性。實際使用： ```diff element_t *current, *next; list_for_each_entry_safe (current, next, l, list) { - free(current->value); - free(current); + q_release_element(current); } free(l); } ``` ### `q_remove_head`, `q_remove_tail` 在佇列開頭 (head) / 結尾 (tail) 移去 (remove) 給定的節點; - 檢測傳入值是否為空 - `head` 要還要特別檢查佇列是否為空，可以用 `list_empty` 來檢查 - 取得 entry 的記憶體位址 (`list_first_entry` / `list_last_entry`) - 使用 `list_del` 巨集來移除節點 - 例外處理 - 檢查緩衝區大小是否為 0 - 回傳被移除節點的位址 `list_del` 只負責將點從鏈結串列中移除 (remove)，不會釋放記憶體，函式名稱的 "remove" 表示只需將斷開節點和節點的關聯。函式的樣子如下，其中 `head` 表示鏈結串列的開頭，`sp` 表示被複製字串的起點，`bufsize` 則是字串長度。 ```c element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) {} ``` 實作到一半發現自己忘了 Linux 已經有提供方便的 `list_first_entry` API，可以增加可讀性。 ```c #define list_first_entry(head, type, member) \ list_entry((head)->next, type, member) #define list_last_entry(head, type, member) \ list_entry((head)->prev, type, member) ``` 為了簡化程式碼，額外建立輔助函式 `remove_element`，檢查 bufsize 是否為 0 的例外情況，並在字串結尾加上 `\0`，複製到 `sp`： ```c element_t *remove_element(element_t *ele, char *sp, size_t bufsize) { list_del(&ele->list); if (sp && bufsize) { memcpy(sp, ele->value, bufsize); sp[bufsize - 1] = '\0'; } return ele; } ``` 操作如下： ```c element_t *ele = list_first_entry(head, element_t, list); remove_element(ele, sp, bufsize); ``` 同理，`q_remove_tail` 的實作只要將 `list_first_entry` 替換成 `list_last_entry` 即可 :::info 我後來發現根本不需要輔助函式，因為後者只用於 `q_remove_head` 和 `q_remove_tail`，那該如何解決程式碼重複的問題呢？只要在 `q_remove_tail` 中呼叫 `q_remove_head` 並稍微修改 head 指向的位置即可。 > commit [2c572a9](https://github.com/yenslife/lab0-c/commit/2c572a9) ::: ### `q_delete_mid` 函式功能：找出中間的節點並刪除 (delete) 他我原本的做法是從頭到尾走訪，計算節點數量，然後在數量除以二的地方將節點刪除 (利用 `list_del`, `q_release_element`)。但後來發現 leetcode 的題目是**單向**鏈結串列，作業的佇列則是**雙向**鏈結串列，因此可以用更有效率的方法 (頭尾往中間走) 來處理，減少記憶體存取次數。 > commit [f142986](https://github.com/yenslife/lab0-c/commit/f14298604f4602d24ef8a89cdd016c1b73b05b9d) :::danger 提供「減少記憶體存取次數」的量化分析，到底是幾次？ > 這邊我寫錯了，是「減少迭代次數」，已補上後續分析 > [name=yenslife] ::: :::warning 額外考慮到環狀且雙向特性，改進程式碼。 ::: 使結尾節點與開頭節點同時往中間走，會有二個情況 - 當二個節點的值相同，表示當下的節點是中間節點 - 當尾端節點與開頭節點的前一個節點相同，表示開頭節點 (first) 為中間節點 ```c struct list_head *first = head->next, *end = head->prev; /* 僅執行 n / 2 次 */ while (first != end && first->prev != end) { first = first->next; end = end->prev; } element_t *ele = list_entry(first, element_t, list); list_del(first); q_release_element(ele); ``` #### 分析「頭尾往中間走」和「計算長度找到中間節點」方法方法 2 ：計算出節點數後，若 $n$ 為節點個數，走訪 $n\div2$ 個節點便可找到中間點 ```c struct list_head *p = head->next; int cnt = 1; while (p != head) { /* n 次 */ cnt++; p = p->next; } cnt = cnt >> 1; p = head; while (cnt--) { /* n / 2 次 */ p = p->next; } element_t *ele = list_entry(p, element_t, list); list_del(p); q_release_element(ele); ``` 前者的迭代次數為 $n\div2$；後者的迭代次數為 $n\div2+n$，所以選擇「頭尾往中間走」的方法會更有效率。 ### `q_delete_dup` 在已經排序的狀況，移走佇列中具備重複內容的節點; 由於佇列已經排序，因此重複的節點會相鄰。為了處理這種情況，我打算額外分配空間來存儲節點的狀態。 :::danger 注意用語 ::: 利用 `list_for_each_entry_safe` 去走訪節點，找到前後值相同的節點後，刪除 (delete) 當下的節點，並且將 `del_flag` 設成 true (為了重複的最後一個節點) :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: 我還在思考有沒有辦法用雙向鏈結串列的特性來提升效率，但我還沒想到，還有這個 flag 真的有必要嗎，總覺得有更好的做法，因為這樣的程式碼很像為了特殊案例存在的，<s>看起來醜醜的</s> ，違反 [Linus Torvalds 的訪談](https://www.ted.com/talks/linus_torvalds_the_mind_behind_linux?language=zh-tw)中的 "Good taste"。[commit](https://github.com/yenslife/lab0-c/commit/ed464fd620d676aa95ad4d4314ec44a06ef5fb3e) :::danger 問題只有「醜」嗎？你要說出原本的問題在哪裡，效能、安全，還是可重用性低？ > 應避免不必要的分支 (已修改) > [name=yenslife] ::: 修改後的版本利用 `cur_dup` 和 `next_dup` 來避免使用 `else if` 處理最後一個節點重複的情況。 > commit [1665478](https://github.com/yenslife/lab0-c/commit/1665478) ```diff - bool del_flag = false; + bool cur_dup = false; list_for_each_entry_safe (cur, next, head, list) { - if (&next->list != head && !strcmp(cur->value, next->value)) { - list_del(&cur->list); - q_release_element(cur); - del_flag = true; - } else if (del_flag) { - del_flag = false; + bool next_dup = &next->list != head && !strcmp(cur->value, next->value); + if (next_dup || cur_dup) { list_del(&cur->list); q_release_element(cur); } + cur_dup = next_dup; } ``` ### `q_reverse` 以反向順序重新排列鏈結串列，該函式不該配置或釋放任何鏈結串列節點，換言之，它只能重排已經存在的節點; - 使用 `list_move` 來移動節點到開頭 (head) - 從頭到尾依序移動 (`list_for_each_safe`) 到開頭，即可得到反向排序 :::warning TODO: 撰寫更精簡的程式碼。 ::: ### `q_reverseK` 函式功能：類似 `q_reverse`，但指定每 k 個節點為一組，進行反向順序排列，LeetCode [25. Reverse Nodes in k-Group](https://leetcode.com/problems/reverse-nodes-in-k-group/)。 - 利用指標指出 K 個節點的開頭 - 如同 `q_reverse` 的做法，將 K 個節點依序放到開頭指標 (會用到額外計數器) - 重複以上步驟直到剩餘節點不足 K 個注意 `list_move` 中的 `list_add` 會假設我們是針對整個佇列的頭去操作，所以下方程式碼才需要 `start = ih->prev` 往前一個節點。 TODO: 用 `list.h` API 嘗試簡化程式碼 ```c list_for_each_safe (ih, it, head) { if (!cnt) start = ih->prev; if (cnt++ == k - 1) { tmp = start->next; while (cnt) { cnt--; next = tmp->next; list_move(tmp, start); tmp = next; } } } ``` ### `q_swap` 交換佇列中鄰近的節點，`q_reverseK` 的變形，把 k 設定成 2 就是兩兩交換。 ### `q_descend`, `q_ascend` 給定一個鍵結串列，走訪每個節點，當節點的右邊有更大的節點時，移除節點。利用雙向鏈結串列的特性，可以從尾端開始往前比較，透過移除節點使佇列維持遞增排序。因為 `list_for` 系列 API 都是從開頭到結尾去走訪，可以先 `q_reverse` 整個佇列，用 `list_for` API 走訪佇列並維持遞增排序，最後再把整個佇列 `q_reverse` 回原本的排序，如下圖。 - 檢測輸入值 - `q_reverse` 目標佇列 - `list_for_each_safe` 走訪節點，紀錄目前最大值 - 移除小於最大值的節點 - 保留不小於最大值的節點 - `q_reverse` 還原佇列排序 ```mermaid graph LR e(5) --> d(2) --> c(13) --> b(3) --> a(8) ``` ```mermaid graph LR a(8) --> b(3) --> c(13) --> d(2) --> e(5) ``` ```mermaid graph LR a(8) --> c(13) ``` ```mermaid graph LR c(13) --> a(8) ``` `q_ascend` 就只要略過 `q_reverse` 的步驟即可。為了增加程式碼重用性，寫好 `q_ascend` 就可以在 `q_descend` 中使用。 > commit [ee42d66](https://github.com/yenslife/lab0-c/commit/ee42d66) ```diff q_reverse(head); - struct list_head *cur, *safe; - element_t *max_ele = list_entry(head->next, element_t, list); - list_for_each_safe (cur, safe, head) { - element_t *cur_ele = list_entry(cur, element_t, list); - if (strcmp(cur_ele->value, max_ele->value) >= 0) - max_ele = cur_ele; - else { - list_del(cur); - q_release_element(cur_ele); - } - } + q_ascend(head); q_reverse(head); return q_size(head); } ``` ### `q_merge` > commit [0dd25ea](https://github.com/yenslife/lab0-c/commit/0dd25ea) 合併所有已排序的佇列，並合而為一個新的**已排序**佇列; :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: 計算目前佇列中的節點數量，若為奇數則跳過最後一個節點，等待下一輪處理。使用迭代的方式從佇列的開頭開始進行節點的走訪。另外，自定義函式 `merge_two_queue` 會逐一比較兩個佇列中的節點，根據大小決定是否將節點插入到目標佇列中。這個函式要處理的是**串連佇列的雙向鏈結串列**，觀察 `queue.h` 中的結構體 `queue_contex_t`，可以發現它也是利用 chain 這個 `list_head` 去連結，所以可以利用 `list_entry` 等 API 來操作。 ```c typedef struct { struct list_head *q; /* head of this queue */ struct list_head chain; /* link other queue */ int size; int id; } queue_contex_t; ``` 我在做這份作業的時候一直有個疑問，就是為什麼 Linux 核心的 `list_head` API 總要假定佇列會有一個「頭」，為什麼不要從節點的開頭開始算就好 (像 `list_entry_first` 的實作實際上是操作 `head->next`)，這樣不是比較方便嗎？直到實作 `q_merge` 才懂，用佇列的頭來將佇列們串起來，就像鑰匙圈一樣。 ### `q_sort` > commit [77799e5](https://github.com/yenslife/lab0-c/commit/77799e5aec85169672ba9d2979cd8a406f614408) 以遞增或遞減順序來排序鏈結串列的節點，這邊我使用 Merge sort 演算法，因為可以搭配 `q_merge` 使用。原本我嘗試使用 Insertion sort 演算法，但是完全忘記考慮 `list_for_xxx_safe` 函式只能移除一個節點的限制 (不能改變當前節點之後的結構)，因此繞了一個大彎，在這邊提醒一股腦衝動的自己：「要好好規劃再寫，就像老師說的不要直接看程式碼，至少要有 domain-know 才行。」 :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: 檢查輸入值是否合法，並確認佇列是否為空或只有一個節點。利用雙向鏈結串列的特性，找出中間點，類似於 `q_delete_mid` 的方法。然後使用 `merge_two_list` 函式將兩個佇列合併。遞迴以上步驟。實作參考〈[你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/YA7RMqd6RE2UL8OmmjU9PQ?view#Merge-Sort-%E7%9A%84%E5%AF%A6%E4%BD%9C)〉在使用 `list_cut_position` 函數時，其 node 參數應指定為中間節點。若佇列節點數為偶數，則應指定為中間二個節點中的前一個節點 (考慮到只剩下二個節點的情況) ```c /* divide and conquer */ struct list_head left; INIT_LIST_HEAD(&left); list_cut_position(&left, head, mid->prev); q_sort(head, descend); q_sort(&left, descend); /* merge two list */ merge_two_list(head, &left, descend); ``` > [How to make Mergesort to perform O(n) comparisons in best case? ](https://www.geeksforgeeks.org/make-mergesort-perform-comparisons-best-case/) 在合併之前先比較兩個佇列的尾端節點的值即可。左右兩邊的佇列都是已排序佇列，若左邊佇列的最後一個節點值比右邊佇列的最後一個節點值大，表示左邊佇列至少有一個節點會被換到右邊；反之，沒有比較的必要。至少要寫出這個判斷式再來討論最差狀況比較好，否則所有情況的比較次數都會相同。 :::danger 你如何確認目前的測試程式已涵蓋排序演算法的最差狀況？如何確保演算法是 stable (sorting)？ ::: #### 檢測是否為 stable sort 為了進行快速進行測試，我修改了 `element_t` 結構體，新增了一個成員 `no`，用於表示節點的編號。同時，在 `q_insert_head` 中為每個節點設置了編號，從 1 開始。這個修改突顯了程式碼重用的重要性，因為我只需要修改 `q_insert_head` 函式，而不必再去修改 `q_insert_tail` 函式。還有修改 `qtest.c` 的 `q_show` 函式，讓節點編號可以顯示出來。 ``` cmd> new l = [] cmd> it a 3 l = [a(1) a(2) a(3)] cmd> ih e 3 l = [e(-2) e(-1) e(0) a(1) a(2) a(3)] cmd> it b 2 l = [e(-2) e(-1) e(0) a(1) a(2) a(3) b(4) b(5)] cmd> sort ERROR: The sorting is not stable. The duplicate string "a" is not in original order l = [a(3) a(2) a(1) b(5) b(4) e(0) e(-1) e(-2)] ``` 很明顯這個 merge sort 並不 stable，因為相同值的相對位置被更動了。從結果分析，原本的編號順序被反過來了，推測在 `merge_two_list` 時，遇到相同大小的情況交換了節點。 ```diff - if ((!descend && cmp_result <= 0) || (descend && cmp_result >= 0)) { + if ((!descend && cmp_result < 0) || (descend && cmp_result > 0)) { ``` 修改後觀察執行結果，排序後相對順序如同排序前，符合 stable sort > [commit 398b3df](https://github.com/yenslife/lab0-c/commit/398b3df) ``` cmd> new l = [] cmd> it e 3 l = [e(1) e(2) e(3)] cmd> it b 3 l = [e(1) e(2) e(3) b(4) b(5) b(6)] cmd> it a 3 l = [e(1) e(2) e(3) b(4) b(5) b(6) a(7) a(8) a(9)] cmd> it b 2 l = [e(1) e(2) e(3) b(4) b(5) b(6) a(7) a(8) a(9) b(10) b(11)] cmd> sort l = [a(7) a(8) a(9) b(4) b(5) b(6) b(10) b(11) e(1) e(2) e(3)] ``` 為什麼更改條件判斷就可以說 Merge sort 是 stable sort？參考課本[〈Introduction to Algorithm〉](https://dahlan.unimal.ac.id/files/ebooks/2009%20Introduction%20to%20Algorithms%20Third%20Ed.pdf)上的 Merge Sort 虛擬碼，對照 `q_sort`，邏輯上相同。 $$ \begin{aligned} &\text{Merge-Sort}(A, p, r)\\ &\qquad 1 \ \ \text{if } p < r\\ &\qquad 2 \qquad q = \lfloor (p + r)/2 \rfloor\\ &\qquad 3 \qquad \text{Merge-Sort}(A, p, q)\\ &\qquad 4 \qquad \text{Merge-Sort}(A, q + 1, r)\\ &\qquad 5 \qquad \text{Merge}(A, p, q, r) \end{aligned} $$ ```c void q_sort(struct list_head *head, bool descend) { /* other code of q_sort */ q_sort(head, descend); q_sort(&left, descend); /* merge two list */ if (!list_empty(&left)) merge_two_list(head, &left, descend); } ``` 觀察 Merge 的 12 ~ 17 行，在比大小之後會才涉及交換操作。由於左部分一定比右部分還要早出現，所以**一旦出現相同值的情況，就需要將左邊的節點放到陣列中**。 $$ \begin{aligned} \text{Merge}(A, p, q, r)\\ 1 \quad & n1 = q - p + 1\\ 2 \quad & n2 = r - q\\ 3 \quad & \text{let } L[1..n1 + 1] \text{ and } R[1..n2 + 1] \text{ be new arrays}\\ 4 \quad & \text{for } i = 1 \text{ to } n1\\ 5 \quad & \qquad L[i] = A[p + i - 1]\\ 6 \quad & \text{for } j = 1 \text{ to } n2\\ 7 \quad & \qquad R[j] = A[q + j]\\ 8 \quad & L[n1 + 1] = \infty\\ 9 \quad & R[n2 + 1] = \infty\\ 10 \quad & i = 1\\ 11 \quad & j = 1\\ 12 \quad & \text{for } k = p \text{ to } r\\ 13 \quad & \qquad \text{if } L[i] \leq R[j]\\ 14 \quad & \qquad \qquad A[k] = L[i]\\ 15 \quad & \qquad \qquad i = i + 1\\ 16 \quad & \qquad \text{else } A[k] = R[j]\\ 17 \quad & \qquad \qquad j = j + 1 \end{aligned} $$ 再來觀察修改後的 `merge_two_list`，如果是相同值的情況，則會將 p2 指向的節點 (也就是傳入的 `&left`，左邊部分) 放到 `head`，與虛擬碼相同。這邊也發現一個問題，變數的命名不應該用 L1, L2，改成 right, left 或者 head, left 可能會更好。 ```c /* Merge two sorted queue */ void merge_two_list(struct list_head *L1, struct list_head *L2, bool descend) { struct list_head *p1 = L1->next, *p2 = L2->next; if (list_empty(L1)) { list_splice_init(L2, L1); return; } /* for k = p to r */ while (!list_empty(L2)) { element_t *ele1 = list_entry(p1, element_t, list); element_t *ele2 = list_entry(p2, element_t, list); int cmp_result = strcmp(ele1->value, ele2->value); if ((!descend && cmp_result < 0) || (descend && cmp_result > 0)) { p1 = p1->next; } else { /* insert left node first if the value is equal*/ struct list_head *tmp = p2->next; list_del(p2); list_move(p2, p1->prev); p2 = tmp; } if (p1 == L1) { /* splice remain L2 to L1 tail */ list_splice_tail_init(L2, L1); break; } } } ``` :::warning TODO: 針對現有 qtest 程式碼對於排序結果的檢查，使其確認排序結果為穩定 (stable)，若不滿足此條件則告知使用者並終結運作。這部分的成果可貢獻回 [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c)。 > 已發送 [PR](https://github.com/sysprog21/lab0-c/pull/177) [name=yenslife] ::: 修改測試程式碼如下 ```diff diff --git a/qtest.c b/qtest.c index 0de4842..4eee392 100644 --- a/qtest.c +++ b/qtest.c @@ -67,6 +67,9 @@ typedef struct { static queue_chain_t chain = {.size = 0}; static queue_contex_t *current = NULL; +int node_no_ih = 0; +int node_no_it = 1; + /* How many times can queue operations fail */ static int fail_limit = BIG_LIST_SIZE; static int fail_count = 0; @@ -254,6 +257,7 @@ static bool queue_insert(position_t pos, int argc, char *argv[]) break; } lasts = cur_inserts; + entry->no = pos == POS_TAIL ? node_no_it++ : node_no_ih--; } else { fail_count++; if (fail_count < fail_limit) @@ -622,6 +626,13 @@ bool do_sort(int argc, char *argv[]) ok = false; break; } + + if (!strcmp(item->value, next_item->value) && + item->no > next_item->no) { + report(1, "ERROR: Not stable sort"); + ok = false; + break; + } } } @@ -919,7 +930,8 @@ static bool q_show(int vlevel) while (ok && ori != cur && cnt < current->size) { element_t *e = list_entry(cur, element_t, list); if (cnt < BIG_LIST_SIZE) { - report_noreturn(vlevel, cnt == 0 ? "%s" : " %s", e->value); + report_noreturn(vlevel, cnt == 0 ? "%s(%d)" : " %s(%d)", + e->value, e->no); if (show_entropy) { report_noreturn( vlevel, "(%3.2f%%)", diff --git a/queue.h b/queue.h index bbea8ec..205f7c9 100644 --- a/queue.h +++ b/queue.h @@ -23,6 +23,7 @@ typedef struct { char *value; struct list_head list; + int no; } element_t; ``` 我後來又發現一個問題，就是當佇列被翻轉 (`q_reverse`, `q_reverseK`) 的時候，原先的「前一個元素應該要比後一個元素的編號小」的規則也要跟著修改，這樣實在太麻煩了。所以我打算在 `q_sort` 執行之前，才設定每個節點的編號。並且將原先的 `int no` 改成 `unsign no`，因為不會出現負數了，可以記錄到更大的數值 ```diff set_noallocate_mode(true); - if (current && exception_setup(true)) + unsigned no = 0; + if (current && exception_setup(true)) { + element_t *entry; + list_for_each_entry (entry, current->q, list) { + entry->no = no++; + } q_sort(current->q, descend); + } ``` --- ## 以 Valgrind 分析記憶體問題使用 `valgrind ./qtest` 來看看有沒有記憶體外洩問題 ### 問題一：特定格式的命令註解會導致 "Conditional jump or move depends on uninitialized value(s)" 當我在使用 `traces/trace-eg.cmd` 來測試時，發現當註解格式為 `# . t`，一個註解符號加上一個點和二個空白字符，最後搭配一個非空白字元時，會跳出以下的警告訊息，目前推測是程式在解析字串時遇到例外情況了，像是用到沒有字串結束字元 (null-terminated string) 的字串。 ```shell =338029== Conditional jump or move depends on uninitialised value(s) ==338029== at 0x484D280: __GI_strlen (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so) ==338029== by 0x10D307: strsave_or_fail (report.c:254) ==338029== by 0x10DD2B: parse_args (console.c:152) ==338029== by 0x10DD2B: interpret_cmd (console.c:200) ==338029== by 0x10EA93: run_console (console.c:691) ==338029== by 0x10C98F: main (qtest.c:1258) ==338029== ==338029== Conditional jump or move depends on uninitialised value(s) ==338029== at 0x484D554: strncpy (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so) ==338029== by 0x10D37B: strncpy (string_fortified.h:106) ==338029== by 0x10D37B: strsave_or_fail (report.c:266) ==338029== by 0x10DD2B: parse_args (console.c:152) ==338029== by 0x10DD2B: interpret_cmd (console.c:200) ==338029== by 0x10EA93: run_console (console.c:691) ==338029== by 0x10C98F: main (qtest.c:1258) ==338029== ``` ### 問題二：`remove` 系列函式錯誤 Valgrind 指出在使用 `memcpy` 時會出現非法讀取的行為。 ```shell cmd> it b l = [a b] cmd> rh ==338578== Invalid read of size 8 ==338578== at 0x484F068: __GI_memcpy (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so) ==338578== by 0x10F1F7: memcpy (string_fortified.h:34) ==338578== by 0x10F1F7: remove_element (queue.c:85) ==338578== by 0x10F22F: q_remove_head (queue.c:99) ==338578== by 0x10BCD7: queue_remove (qtest.c:353) ==338578== by 0x10BE73: do_rh (qtest.c:410) ==338578== by 0x10D6FF: interpret_cmda (console.c:181) ==338578== by 0x10DD5F: interpret_cmd (console.c:201) ==338578== by 0x10EA93: run_console (console.c:691) ==338578== by 0x10C98F: main (qtest.c:1258) ==338578== Address 0x4ac7840 is 6 bytes after a block of size 42 alloc'd ==338578== at 0x4849D8C: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so) ==338578== by 0x10EBCB: test_malloc (harness.c:133) ==338578== by 0x10EE63: test_strdup (harness.c:212) ==338578== by 0x10F0EF: create_element (queue.c:44) ==338578== by 0x10F0EF: q_insert_head (queue.c:55) ==338578== by 0x10C0DF: queue_insert (qtest.c:232) ==338578== by 0x10C2BF: do_ih (qtest.c:280) ==338578== by 0x10D6FF: interpret_cmda (console.c:181) ==338578== by 0x10DD5F: interpret_cmd (console.c:201) ==338578== by 0x10EA93: run_console (console.c:691) ==338578== by 0x10C98F: main (qtest.c:1258) ``` 解決方式：將 `memcpy` 改成 `strncpy`。 > [commit e81db36](https://github.com/yenslife/lab0-c/commit/e81db36652ec4cc3d39d3e03f80ffa7e29bfd2f9) 參考 man page 對 `strncpy` 以及 `memcpy` 的描述，可以知道前者是**最多**會複製 n 個 bytes，後者則是不考慮直接複製 n 個 bytes。在不卻定使用者輸入的情況下，使用 `strncpy` 相對保險一點， > The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to by dest. The strings may not overlap, and the destination string dest must be large enough to receive the copy. Beware of buffer overruns! (See BUGS.) > > The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated. > > The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap. --- ## [〈Dude, is my code constant time?〉](https://eprint.iacr.org/2016/1123.pdf) 論文閱讀測試 `trace-17-complexity` 一直沒有通過 `remove`，錯誤訊息顯示與常數時間相關。 ``` ++ TESTING trace trace-17-complexity: # Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant ERROR: Probably not constant time or wrong implementation Probably constant time ERROR: Probably not constant time or wrong implementation ``` 這篇論文主要介紹 [dudect](https://github.com/oreparaz/dudect) 這個工具，由一段三百行的 C 語言程式碼實作而成，基於洩漏檢測 (time leakage detaction)，可以在任何 CPU 架構下量測一段程式碼是否為 constant time。為什麼要常數時間？因為這樣可以避免 [timing attack](https://en.wikipedia.org/wiki/Timing_attack)。 :::danger 資訊技術的術語避免引用中文 Wikipedia，因為後者通常內容過時且不完整，除非該條目僅有中文描述。 > 已附上英文 Wikipedia > [name=yenslife] ::: Timing attack 是一種測量密碼學實作程式碼的執行時間來推斷密鑰或密碼等秘密資訊，是一種<s>[旁路攻擊](https://zh.wikipedia.org/zh-tw/%E6%97%81%E8%B7%AF%E6%94%BB%E5%87%BB)</s> [side channel attack](https://en.wikipedia.org/wiki/Side-channel_attack)。 Constant time 在這邊指的不是「固定時間」，而是讓執行時間不洩漏有關密碼的資訊。 ### Time leakage detection 這套方法的實作簡單來說就是在執行時間上採取 leakage test，先量測二個不同類別輸入值的執行時間，然後檢查他們有沒有[顯著差異 (統計差異)](https://en.wikipedia.org/wiki/Statistical_significance)。 1. Measure execution time - 定義兩種 (class) data 來做 time leakage detection - 不同的 time leakage detaction 最大的差別也是 class 的定義，這邊採用 fix-vs-random - fix value 是為了 "special" corner case 3. Apply post-processing - Cropping：去除無效數值 (可能是系統中斷、測量誤差等因素導致的過長的、與數據無關的時間) - Higher-order processing：這個的解釋我真的看不懂 5. Apply statistical test - [虛無假說](https://zh.wikipedia.org/zh-tw/%E8%99%9B%E7%84%A1%E5%81%87%E8%AA%AA)：「兩種時間分佈是相同的」 - 如果檢定結果反對虛無假說，那這個程式就不是 constant time - 這邊使用 [Welch's t-test](https://en.wikipedia.org/wiki/Welch's_t-test)，簡單好實作 ### 實作方法 1. Pre-process: 在統計分析之前會先進行 pre-process，將小於 $p$ percentile 的量測結果捨棄，目的是要限制分佈的範圍。 2. 統計分析 ### 記憶體比較測試 #### 變數時間實作 (Variable-time implementation) 論文基於 `memcpy` 函式來實作第一類與第二類的 16 bytes 結果比較。第一類為 "random class"，均勻分布的隨機 16 bytes 字串；第二類為 "fix class"，固定輸入為 $s$。以下為這兩類測量時間的 cumulative distribution function，從中可以輕易觀察到輸入資料的 time leakage。 ![image](https://hackmd.io/_uploads/rJuxsQWAp.png) 將 $t$ 檢定的結果繪製如下圖，不同的線表示不同的 $p$ 值，可以發現量測數值的增長與 $t$ 值成正比。當 $t$ 值大於 4.5 可以明顯看出分佈結果是不同的，有些 $t$ 值甚至到達 $|t| = 1000$。 > $t$ 值越接近 0 表示二個分佈越相似 ![image](https://hackmd.io/_uploads/HkETh7-Aa.png) 再來用不同的測試裝置 (test fixture) 來測試。將 fix class 設定成 0 而不是先前的 $s$，假設 emulator 不知道 $s$，可以看到兩類 timing cdf 的分佈更接近 ![image](https://hackmd.io/_uploads/BkesZyQCp.png) 兩種分佈的結果依然不支持虛無假說，儘管有更多量測結果有顯著差異 (statistically significant) $t$ 值比之前的測試還要低很多，但還是有差異，綠色的部分是 $t < 4.5$ ![image](https://hackmd.io/_uploads/HkHff17A6.png) #### 常數時間實作 (Constant-time implementation) 再來分析一段被認為是常數時間的記憶體比較函式 (使用 logical operation，並且不會在不匹配時提前中止)，可以看到二個分佈是相同的。 ![image](https://hackmd.io/_uploads/r1d94J7R6.png) 可以看到二個分佈是無法區分的 (indistinguishable) ![image](https://hackmd.io/_uploads/BJP-SkXRT.png) 論文還有用 AES, Curve25519 等密碼學方法來驗證測試結果是有效的，這邊就不列舉出來。 ### 實作程式碼在[作業要求 lab0-c(d)](https://hackmd.io/@sysprog/linux2024-lab0-d) 中提到有關 $t$ 值的處理只有到比大小，並沒有去除 (Cropping) 極端值。觀察 [dudect.h](https://github.com/oreparaz/dudect/blob/master/src/dudect.h) ，以下是程式碼重點摘錄 :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: `t_push(ttest_ctx_t *ctx, double x, uint8_t clazz)` 計算 t value 方程式中出現的值，和 lab0-c 唯一不同的只有變數，在 `dudect.h` 中 `class` 寫成 `clazz`，不知道為什麼要這樣？ `static double t_compute(ttest_ctx_t *ctx)` 計算 Welch's t-test 的 $t$ 值，和 lab0-c 一樣，公式如下 $\text{var}[0] = \frac{\text{m2}[0]}{\text{n}[0] - 1} \quad \text{and} \quad \text{var}[1] = \frac{\text{m2}[1]}{\text{n}[1] - 1}$ $\text{num} = \text{mean}[0] - \text{mean}[1]$ $\text{den} = \sqrt{\frac{\text{var}[0]}{\text{n}[0]} + \frac{\text{var}[1]}{\text{n}[1]}}$ $\text{t_value} = \frac{\text{num}}{\text{den}}$ `ttest_ctx_t` 是 t-test context 的意思 `static int64_t percentile(int64_t *a_sorted, double which, size_t size)` 計算某個百分位數的值，`a_sorted` 是已排序之序列，`which` 是百分比大小，`size` 是序列大小 `static void prepare_percentiles(dudect_ctx_t *ctx)` 用來計算百分位數代表的數值，以及要去除的閾值，先將 `exec_times` 排序，然後計算一系列百分位數，將結果存在 `dudect_ctx_t *ctx` 的 `percentiles` 中。 `void randombytes(uint8_t *x, size_t how_much)` 利用 linux 提供的密碼學安全亂數產生器 [`/dev/urandom`](https://en.wikipedia.org/wiki//dev/random) 產生亂數。 :::warning 檢驗 `randombytes` 的品質，其「亂度」如何。 ::: 觀察 `randombytes` 程式碼 ```c void randombytes(uint8_t *x, size_t how_much) { ssize_t i; static int fd = -1; ssize_t xlen = (ssize_t)how_much; /* 為什麼還要這行？ */ assert(xlen >= 0); if (fd == -1) { for (;;) { fd = open("/dev/urandom", O_RDONLY); /* Linux 提供的亂數產生器 */ if (fd != -1) break; sleep(1); } } while (xlen > 0) { if (xlen < 1048576) /* 1048576 == 2^20 */ i = xlen; else i = 1048576; i = read(fd, x, (size_t)i); if (i < 1) { sleep(1); continue; } x += i; xlen -= i; } } ``` --- ## Code Review [Amazing Code Reviews: Creating a Superhero Collective](https://youtu.be/ly86Wq_E18o) 講座重點摘錄 ### 寫程式的人 1. KISS (Keep It Small and Simple) : 當程式碼超過一定的量 (200 ~ 300 行) 就應立即檢討。簡潔的程式碼是為了能讓人「第一眼」就快速理解、減少盲點，以更上層的角度去看整個系統，而不是鑽牛角尖。特別是在龐大的資訊系統，人腦沒辦法記住每一個元件、變數，過長的程式碼容易使審閱者失焦 (Don't lose big picture!)。 2. 善用 `git branch`, `git cherry-pick`, `git rebase -i` 讓 commit 更有組織。 ### 審閱者的角度 1. 審閱者應思考 "How can we imporve our code?"，就像老師說「批評」不是什麼壞事，是針對「程式碼」而不是「人」。所以不用怕得罪人，前提是必須**清楚表明意圖**。 2. 給予正向回饋能讓團隊更有動力。 3. 避免假設立場，腦補作者的意圖。 ### 範例範例 1：沒有清楚定義問題 > bool or int? > > ![image](https://hackmd.io/_uploads/Hy-Z5bGRa.png) 範例 2：沒有清楚表示動機、沒有指出問題所在、沒有指出閱讀手冊的章節、假定對方沒有閱讀手冊 > Read the Google Python style guidelines > > ![image](https://hackmd.io/_uploads/By80FbMCa.png) 範例 3：如果對方是有經驗的開發者，或者已經與審閱者有溝通過類似的問題，毋須過多解釋也能表明意圖；如果對方是新手 (newbie)，這樣的回饋可能不太適合，可以用外部連結補充說明。 >We should avoid single-character variables. How about `board_size` or simply `size` ? > >![image](https://hackmd.io/_uploads/SkQVsZGAa.png) 好的範例：解釋動機 (why to do this way?)、使用 "We" 讓人有「團隊感」、提供解決方案、「建議」而不是「命令」 > How about breaking out the problem description to a Markdown file? That way we have formatting, and we can more easily share the whole file with th candidates. > >![image](https://hackmd.io/_uploads/ryhz5bMRT.png) ### 討論 1. 避免不必要的討論 2. 避免過多人討論 3. 討論需聚焦在同一主題，避免 context switch 4. 盡可能選擇已經理解程式碼前後關係、在這個主題有經驗的人來討論 / 審閱。 ### Every Engineer is also a Writer > [影片](https://youtu.be/hKKeqDERssg?si=k9Eyn293UxwUx82l) 原本的句子使用不恰當的代名詞 "one"，這邊用 "you" 會更好，甚至根本就不需要代名詞 > when one thinks about machine learning models, one must consider that it is really only as good as the data upon which one trains it on. 在開始討論前不需要額外的句子，這也是我以前常犯的錯誤，常常用一些「廢話」來假裝自己「懂很多」 > This is to say that when one thinks about... 一個好的說明應該像這樣 > Machine learning models are only as good as their training data. ## Git rebase 參考：[另一種合併方式 (使用 rebase)](https://gitbook.tw/chapters/github/syncing-a-fork), [怎麼跟上當初 fork 專案的進度](https://gitbook.tw/chapters/branch/merge-with-rebase) 為了跟上 [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c.git) 在我 fork 之後的新進度 (commit)，使用 `git fetch`, `git rebase` 等命令以更新分支，將我自己的修改分支合併到最新的分支上 ### 新增遠端分支並 rebase 新增 `sysprog21/lab0-c` url，取名為 "lab0-c"，並 fetch 遠端分支 ```shell $ git remote add lab0-c https://github.com/sysprog21/lab0-c.git $ git remote -v # 查看現存 url lab0-c https://github.com/sysprog21/lab0-c.git (fetch) lab0-c https://github.com/sysprog21/lab0-c.git (push) origin git@github.com:yenslife/lab0-c.git (fetch) origin git@github.com:yenslife/lab0-c.git (push) $ git fetch lab0-c ``` 查看目前分支，確認 `sysprog21/lab0-c` 的 master 分支已經存在，目前所在分支為本地的 master ```shell $ git branch -a * master remotes/lab0-c/master remotes/origin/HEAD -> origin/master remotes/origin/master remotes/origin/rebase-branch ``` 原本的 base 是 267cca7，使用 `git rebase lab0-c/master` 將本地分支「移花接木」到新的 base d43e072 之後 ```mermaid graph LR n1(4627f69\n Merge pull request #152) n2(267cca7\nFix cppcheck) n3(b6de2ed\nFix typo) n4(34a821c\nMerge pull request #157) n1-->n2-->n3-->n4 n5(4d27632 \nImplement q_size) n6(ecbfe45 \nImplement q_new) n5-->n6 n2-->n5 ``` ```mermaid graph LR n1(d48c56d\nImprove string randomness) n2(d43e072\nlab0-c/master\nMerge pull request #173) n3(4d27632 \nImplement q_size) n4(ecbfe45 \nImplement q_new) n1-->n2-->n3-->n4 ``` ### 解衝突在我 fork 專案之後，sysprog/lab0-c 修改了 queue.c 中的 `q_free` 函式，導致無法自動合併，需要手動解衝突 ```shell $ git rebase lab0-c/master Auto-merging queue.c CONFLICT (content): Merge conflict in queue.c error: could not apply 28e91e6... Implement q_free hint: Resolve all conflicts manually, mark them as resolved with hint: "git add/rm <conflicted_files>", then run "git rebase --continue". hint: You can instead skip this commit: run "git rebase --skip". hint: To abort and get back to the state before "git rebase", run "git rebase --abort". Could not apply 28e91e6... Implement q_free ``` `<<<<<<<HEAD` 到 `=======` 中間的程式碼表示欲合併遠端分支 (在這邊是 sysprog21/lab0-c 的 master) 的版本，`=======` 到 >>>>>>> `28e91e6 (Implement q_free)` 則是本地的版本。要做的就是將這三行刪除，選擇要進行的變更。這個例子中，我將所有 `l` 變數重新命名成 `head`。 ``` /* Free all storage used by queue */ <<<<<<< HEAD void q_free(struct list_head *head) {} ======= void q_free(struct list_head *l) { if (!l) return; element_t *current, *next; list_for_each_entry_safe (current, next, l, list) { free(current->value); free(current); } free(l); } >>>>>>> 28e91e6 (Implement q_free) ``` 修改完畢後，使用 `git add` 將修改好的檔案加入 stage。並使用 `git rebase --continue` 來繼續完成其他 commit 的更新，並編輯 commit message。 ``` $ git add queue.c $ git rebase --continue [detached HEAD 717d9e3] Implement q_free 1 file changed, 12 insertions(+), 1 deletion(-) Auto-merging queue.c CONFLICT (content): Merge conflict in queue.c error: could not apply e435a52... Use q_release_element to simplify code hint: Resolve all conflicts manually, mark them as resolved with hint: "git add/rm <conflicted_files>", then run "git rebase --continue". hint: You can instead skip this commit: run "git rebase --skip". hint: To abort and get back to the state before "git rebase", run "git rebase --abort". Could not apply e435a52... Use q_release_element to simplify code ``` ## 在 qtest 提供新的命令 `shuffle` > commit [4378740](https://github.com/yenslife/lab0-c/commit/4378740) 使用 `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]` 來測試 ## 研讀 Linux 核心原始程式碼的 lib/list_sort.c