Linux 核心專題: 並行的環狀雙向鏈結串列

# Linux 核心專題: 並行的環狀雙向鏈結串列 > 執行人: wu81177 ### Reviewed by `kevinzxc1217` 這裡僅介紹各函式功能，希望有關於如何運用這些函式來進行節點標的流程說明。 > 節點標記相關的函式有三個 :::success 可以參考 concurrent-ll 專案的 [src/lockfree/list.c](https://github.com/sysprog21/concurrent-ll/blob/master/src/lockfree/list.c)，在 `list_search` 中: ```c if (!is_marked_ref(t_next)) { (*left_node) = t; left_node_next = t_next; } ``` 當判斷 `t_next` 沒有被標記，就更新 `left_node` 和 `left_node_next` 到下一個值來走訪 `get_unmarked_ref` 是用來取得沒被標記的版本幫助被困在已移除節點的執行緒脫困 `get_marked_ref` 則是用來取得標記移除的指標，用來把指標修改成標記的版本 ::: ### Reviewed by `stevendd543` 想請問 `add_tail` 中有實作鏈結串列 traversal 的部分，若失敗將會 re-traversal，但在並行環境下如果 prev 被刪掉了是否會造成 dangling pointer 這部分要如何在這個狀況下解決並且維持 lock-free。 ```c while (prev->next != tail) { prev = prev->next; } ``` :::success 若仿照〈[A Pragmatic Implementation of Non-Blocking Linked Lists](https://www.cl.cam.ac.uk/research/srg/netos/papers/2001-caslists.pdf)〉論文的方法，prev 被刪掉只是將指標沒用到的 bit 做修改來表示被移除，停留在被移除節點的執行緒還是可以將被標記的指標還原然後離開 ```graphviz digraph G { rankdir=LR; node [shape=record, style=filled, fillcolor=white]; edge [arrowhead=none]; H [label="{H| }", shape=record, width=0.5, height=0.5]; n10 [label="{10|X}", shape=record, width=1]; n30 [label="{30| }", shape=record, width=1]; T [label="{T| }", shape=record, width=0.5, height=0.5]; H -> n10 [color=gray, arrowhead=none]; n10 -> n30 [color=red, arrowhead=normal]; n30 -> T [color=red, arrowhead=normal]; H -> n30 [color=red, arrowhead=normal, constraint=false]; } ``` ::: ### Reviewed by `youjiaw` 可以使用更多效能分析工具來討論修改後的鏈結串列有哪些方面的效能提升。 > 好的，想請問有建議使用哪些工具嗎 ### Reviewed by `Shiang1212` > commit : [f635ed](https://github.com/wu81177/concurrent_singly_ll/commit/f635ed6bc84d9743e75dd71469a809828f82a62f) > commit : [d48b945](https://github.com/wu81177/concurrent_singly_ll/commit/d48b945a2fdfdb8626e6618dafd9dc7754b7e447) 在撰寫 commit message 時，應參考[如何寫一個 Git Commit Message]([https://](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/))，試著簡述你每次 commit 的修改內容與考量，保持良好的 commit message 撰寫習慣，能提高專案的可維護性。 > 了解，感謝提醒 ## 任務簡介以 Linux 核心的環狀雙向鏈結串列為範本，開發 lock-free 的實作，並探究其效能表現。 ## TODO: 研究[並行的鏈結串列](https://hackmd.io/@sysprog/concurrency-lockfree) > 針對鏈結串列，研究 lock-based 和 lock-free 實作手法、重現實驗，並理解其 scalability 表現。不使用 lock 的前提之下常常會看到兩個詞， lockfree 和 lockless，定義如下： * lock-free: 強調以系統的觀點來看，只要執行足夠長的時間，至少會有一個執行緒會有進展 (progress) * lockless: 避免使用 lock 而能夠安全無誤地操作共用資料為了探討在 [concurrent-ll](https://github.com/sysprog21/concurrent-ll) 程式碼 lock 和 lock-free 版本的差異，老師在 4 核心處理器上做測試: ![image](https://hackmd.io/_uploads/r1UhBPtIR.png) :::danger 你要重現實驗，而非引用。 ::: 可以看到這張圖顯示了 lock-free 和 lock-based 這兩種不同同步機制在多執行緒環境中的效能差異。綠色線條代表的 lock-free 機制，其效能會隨著執行緒數量的增加而顯著提升，這表示 lock-free 在高並行情況下能更好地利用多核心處理器的資源。相較之下，紅色線條代表的 lock-based 機制，其效能幾乎沒有隨著執行緒數量的增加而提升，這顯示了在高並行情境下，lock-based 機制的瓶頸效應，使得它無法充分利用多核心處理器的優勢而上面的實作基礎是一篇 2001 年的論文[〈A Pragmatic Implementation of Non-Blocking Linked Lists〉](https://www.cl.cam.ac.uk/research/srg/netos/papers/2001-caslists.pdf) :::danger linked list 翻譯為「鏈結串列」。 ::: 裡面提到當鏈結串列在很近的時間做 insert 和 remove 如果不考慮 concurrent 有可能會發生錯誤 ![Sy1lwB1QA](https://hackmd.io/_uploads/Sywoc4om0.png) 如圖， remove 10 比 insert 20 早一點發生，就出現了這種錯誤，因此[〈A Pragmatic Implementation of Non-Blocking Linked Lists〉](https://www.cl.cam.ac.uk/research/srg/netos/papers/2001-caslists.pdf)中提出先將欲刪除的節點標記等到之後再拆除的做法 ![SymbwHkXC](https://hackmd.io/_uploads/Sy1OjEim0.png) :::danger 注意用語： * byte 是「位元組」，而非「字節」 * bit 之所以翻譯為「位元」，是避免歧異，例如 "32-bit integer" 在簡體中文寫作「三十二位整数」，bit 乍看就成為量詞，與本意牴觸。 ::: 由於結構定址會對齊，在64<s>位</s> 位元系統會對齊 8 個<s>字節</s> 位元組，所以底下結構的最後一個 bit 不會用到，可以用來標記是否移除 ```c typedef intptr_t val_t; typedef struct node { val_t data; struct node *next; } node_t; ``` 節點標記相關的函式有三個 ```c static inline bool is_marked_ref(void *i) { return (bool) ((uintptr_t) i & 0x1L); } static inline void *get_unmarked_ref(void *w) { return (void *) ((uintptr_t) w & ~0x1L); } static inline void *get_marked_ref(void *w) { return (void *) ((uintptr_t) w | 0x1L); } ``` `is_marked_ref` 會判斷最後一個 bit 是否為 1，如果是代表節點被標記移除 `get_unmarked_ref` 會取得當前節點最後一個 bit 為 0 的值，方便讀取鍵值 `get_marked_ref` 可取得最後一個 bit 為 1 的值，也就是標記的值 :::danger 注意用語： * thread 是「執行緒」，而非「線程」務必使用本課程教材規範的術語。 ::: 而在這裡會用到的 atomic operation 有兩種，第一種是 compare and swap (CAS)，行為如下，如果 a 指向舊值就會換成新值，如果 a 被其他<s>線程</s>執行緒改為不如預期的東西，就會回傳方便查看 :::danger 改進你的漢語表達。 ::: ```c int CAS(int *a, int old_val, int new_val) { if (*a == old_val) { *a = new_val; return old_val; } return *a; } ``` :::danger 注意書寫規範： * 使用 lab0 規範的程式碼書寫風格，務必用 clang-format 確認一致 * 注意看[程式碼規範](https://hackmd.io/@sysprog/linux2024-lab0-a)的「clang-format 工具和一致的程式撰寫風格」，筆記列出的程式碼依循指定的風格。 ::: 實際使用如下 ```c #define CAS_PTR(a, b, c) \ __extension__({ \ typeof(*a) _old = b, _new = c; \ __atomic_compare_exchange(a, &_old, &_new, 0, __ATOMIC_SEQ_CST, \ __ATOMIC_SEQ_CST); \ _old; \ }) ``` :::danger atomic 不能翻譯為「原子」！參見: https://hackmd.io/@sysprog/it-vocabulary ::: 首先使用了 GNU 擴展中的 `__extension__` 來避免特定編譯器警告。程式碼的步驟如下，接著使用 GCC　內建<s>函數</s>　`__atomic_compare_exchange`　進行<s>原子</s> 不可分割的比較並交換操作，0 代表 strong 比較交換，會一直試到成功， `__ATOMIC_SEQ_CST` 代表使用<s>最強</s> the strongest memory order。 :::danger weak 和 strong 不能直接翻譯。區分「函數」和「函式」，留意用語。 ::: 另一個 atomic operation 是 `__atomic_fetch_add` ```c __atomic_fetch_add(a, 1, __ATOMIC_RELAXED) ``` :::danger 注意用語： * memory 是「記憶體」，而非「內存」務必詳閱 [資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) 和 [詞彙對照表](https://hackmd.io/@l10n-tw/glossaries) ::: 作用是將指向 a 所指向的變數的值加1，並返回該變數在加1之前的值 `__ATOMIC_RELAXED` 是指定記憶體順序的標志，在這裡使用的是 relaxed ordering ，表示這個加法操作不需要進行同步或排序保證，它僅僅是對變數值進行加法操作，且不對其他執行緒可見的內存操作進行任何排序約束。 ### 撰寫 Lockfree 程式實作 add_tail 和 remove 附上測試，考慮並行程式的正確性參考 [lockfree/list.c](https://github.com/sysprog21/concurrent-ll/blob/master/src/lockfree/list.c) ，差別是將插入到排序位置的 `list_add` 改為新增到尾部的 `add_tail` ，串列鍵值沒有排序的版本。 :::danger 在中文敘述中，使用全形標點符號，例如該用「，」，而非 "," ::: #### `list_search` :::danger 注意書寫規範： * 使用 lab0 規範的程式碼書寫風格，務必用 clang-format 確認一致 * 在中文敘述中，使用全形標點符號，例如該用「，」，而非 "," ::: 首先理解原版 ```c while (is_marked_ref(t_next) || (t->data < val)) { if (!is_marked_ref(t_next)) { (*left_node) = t; left_node_next = t_next; } t = get_unmarked_ref(t_next); if (t == set->tail) break; t_next = t->next; } ``` :::danger 注意用語： * access 是「存取」，而非「訪問」(visit) ::: 若進入 while 有兩種原因，`t->data < val` 比好理解，就是迴圈跑到了要插入的位置，就算一直跑到 tail 也會被它的鍵值 `INT_MAX` 擋下來而另外一個條件 `is_marked_ref(t_next)` 則是用來跳過已被標記要刪除的節點，用 `t = get_unmarked_ref(t_next);` 來正確 <s>訪問</s> 存取下一個節點 :::danger 改進你的漢語表達。 ::: 而在我的版本中由於是用在未排序的串列，可以做一些簡化 ```diff + while (t != set->tail) { - while (is_marked_ref(t_next) || (t->data < val)) { ``` 再來是做以下修改讓它可以找到對應鍵值 ```diff - if (t == set->tail) + if (t->data == val || t == set->tail) ``` #### `add_tail` :::danger 注意書寫規範： * 使用 lab0 規範的程式碼書寫風格，務必用 clang-format 確認一致 ::: 由於是加到尾部，不像 `list_add` 需要用到 `list_search` ```c bool add_tail(list_t *the_list, val_t val) { if (the_list == NULL) return false; node_t *new_elem = new_node(val, NULL); if (new_elem == NULL) return false; while (1) { node_t *tail = the_list->tail; node_t *prev = the_list->head; while (prev->next != tail) { prev = prev->next; } new_elem->next = tail; if (CAS_PTR(&(prev->next), tail, new_elem) == tail) { FAI_U32(&(the_list->size)); return true; } } } ``` #### 測試 `add_tail` & `list_search` :::danger 闡明你的測試計畫。區分「函數」和「函式」，務必詳閱: https://hackmd.io/@sysprog/it-vocabulary ::: 撰寫 test.c 用 `add_tail` 插入數個節點然後用 `list_search` 找看看剛加入的節點編譯遇到的第一個問題是在 atomics.h 中使用 typeof 時沒有聲明這個<s>函數</s> ``` atomics.h:14:9: warning: implicit declaration of function ‘typeof’ [-Wimplicit-function-declaration] 14 | typeof(*a) _old = b, _new = c; \ | ^~~~~~ ``` 在 Makefile 中將編譯器標準從 `-std=c11` 改成 `-std=gnu99`，這樣才可以確保 GCC 擴展（如 typeof）被正確使用 :::danger 改成 `-std=gnu11` ::: 再來是函式外部引用的問題 ``` /usr/bin/ld: test.o: in function `main': test.c:(.text+0xd5): undefined reference to `list_search' /usr/bin/ld: test.c:(.text+0x129): undefined reference to `list_search' collect2: error: ld returned 1 exit status make: *** [Makefile:11: test] Error 1 ``` 正常來說因為 `list_search` 不會被外部函式引用，所以可以前綴 static ，但是在測試的時候應該要先拿掉才能被測試函式直接引用 :::danger 改進漢語表達。 ::: 成功編譯之後執行遇到 Segmentation fault 於是使用 GNU GDB 找錯 ``` Program received signal SIGSEGV, Segmentation fault. 0x00005555555555c8 in main () at test.c:20 20 if (found_node && found_node->data == 20) { ``` 先確認了 `found_node` 不是 NULL ，覺得很奇怪，經過一段時間的嘗試，發現關鍵問題 ``` left_node_next: 0x562de324a360, right_node: 0x562de324a360, right_node->data: 60 return right_node :0x562de324a360 found_node: 0xffffffffe324a360 ``` :::danger bit 是「位元」，而非「位」，後者是量詞。 bit 之所以翻譯為「位元」，是避免歧異，例如 "32-bit integer" 在簡體中文寫作「三十二位整数」，bit 乍看就成為量詞，與本意牴觸。 ::: return right_node 是 `list_search` 要回傳給 `found_node` 的指標，但高 32 <s>位</s> 被捨棄了，經過查找 <s>[資料](https://blog.csdn.net/musiclvme/article/details/105230955)</s> :::danger 查閱第一手材料，避免參照良莠不齊的 CSDN ::: 發現如果64位系統中使用函式的檔案沒有包含那個函式的原型，雖然 GCC 能夠順利編譯，但在回傳指標的時候會被截斷成32位，因此將 `list_search` 的原型寫進標頭檔裡就能解決這個問題，但如果不是為了在 test.c 中測試 `list_search` 也沒必要多加這行原型，測試完成後可以刪掉，並且加上 static 關鍵字，維持符號限定於所處的編譯單元。 :::danger 注意用語。「封閉性」可能是數學的專有名詞，本專題可能會有數學證明，因此你該避免濫用詞彙。 ::: > commit [8777d6f](https://github.com/wu81177/concurrent_singly_ll/commit/8777d6f95c9e273665d730364622a9d8686df043) #### `list_remove` 原版的 list_remove 適用於存在沒有相同鍵值的串列，這裡我希望修改成可移除多個相同鍵值 :::danger 注意書寫規範： * 使用 lab0 規範的程式碼書寫風格，務必用 clang-format 確認一致 ::: ```c bool list_remove(list_t *the_list, val_t val) { bool removed = false; while (1) { node_t *left = NULL; node_t *right = list_search(the_list, val, &left); /* check if we found our node */ if ((right == the_list->tail) || (right->data != val)) { return removed; // Return true if any element was removed } node_t *right_succ = right->next; if (!is_marked_ref(right_succ)) { if (CAS_PTR(&(right->next), right_succ, get_marked_ref(right_succ)) == right_succ) { FAD_U32(&(the_list->size)); removed = true; } } } } ``` 經過測試會在 `list_search` 中陷入無窮迴圈，嘗試除錯多時無果，暫時放棄 > commit : [f635ed](https://github.com/wu81177/concurrent_singly_ll/commit/f635ed6bc84d9743e75dd71469a809828f82a62f) 修正為 clang-format > commit:[d48b945](https://github.com/wu81177/concurrent_singly_ll/commit/d48b945a2fdfdb8626e6618dafd9dc7754b7e447) ## TODO: 修改為符合 Linux 風格的環狀雙向鏈結串列 > 開發 lock-based 和 lock-free 且符合 Liux 核心風格的環狀雙向鏈結串列，附上完整測試和效能分析。 > 應當證明你的實作能夠正確運作，參照課程教材提及的論文，提供其數學和正規證明。 > 需要使用 ThreadSanitizer 排除並行程式的潛在問題。目標是將 [lockfree/list.c](https://github.com/sysprog21/concurrent-ll/blob/master/src/lockfree/list.c) 修改為符合 Linux 風格的 circular doubly-linked list 下圖展現 `list_head` 中包含 next 和 prev 雙向連接其他 list_head 一個 link list 中會有一個 list_head 單獨存在當 head 其他則是被包含在 node_t 物件裡面 ```graphviz digraph G { node [shape=record]; node1 [label="{ node_t | { list_head } }"]; node2 [label="{ list_head }"]; node3 [label="{ node_t | { list_head } }"]; node1 -> node2; node2 -> node3; node3 -> node1; } ``` 主要的改變是捨棄原本 `list_t` 結構的定義： ``` c typedef struct { node_t *head; node_t *tail; uint32_t size; } list_t; ``` 因為 list head 就足以代表一個 linux like 的串列然後 `node_t` 的定義也跟著改成以下這樣 ```c typedef struct node { val_t data; struct list_head list; } node_t; ``` #### `list_search` & `list_insert` `list_search` 因為捨棄單向鏈結串列中 head 和 tail 的定義，所以要先排除單個和零個成員的情況 ```c if(head->next == head) return NULL; if(head->next == head->prev) return list_entry(head->next, node_t, list); ``` 這裡照理說有兩種修改方式，一種是定義 left, right為 node_t 的指標或間接指標，這樣的優點是取 data 不用經過 `list_entry` ，另一中種方式是將 left, right 定義成 list_head 指標或間接指標類型，優點是移動到下一個節點比較方便我原先是嘗試第一種，但在 `list_search` 上遇到問題，照理說 CAS_PTR 第一個輸入值類型是後面的指標，於是我像以下這樣對 `list_entry` 的值取指標： ```c CAS_PTR(＆(list_entry((*left_node)->list.next)), node_t, list), left_node_next, right_node) ``` :::danger 之所以編譯失敗，是因為你沒搞清楚指標型態，回去看 C 語言規格書。 ::: 但就是編譯不過，所以我改成將左右節點的類型定義為 `list_head` 的方式經過修改可以順利編譯了但是遇到 Segmentation fault > commit：[25b00d7](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/25b00d75edf5cde6e23df18208f374f9ae1e2b7d) > 嘗試用 GDB 尋找問題，發現是在插入第一個節點時就在 `list_insert` 第 100 行發生錯誤: ```c if (CAS_PTR(&(left->next), right, new_elem) == right) { ``` 經過兩個修改，第一個是改變 `list_search` 在沒有在空串列的情況，改成回傳 head 而不是 NULL，並且刪掉單節點的情況，應該不用分開討論令一個改變是分開討論 `list_insert` 插入第一個節點的情況： ```c if (left == NULL) { if (CAS_PTR(&(head->next), head, new_elem)) { head->prev = new_elem; new_elem->next = head; new_elem->prev = head; return true; } continue; } ``` 經過這樣修改只能應付絕對遞增依序加入節點的情況 > commit: [a3e2f61](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/a3e2f61dfe6b8ca3e53df2580aafb49f725015d7) 經過一些修改，現在沒遇到 Segmentation fault 但會在 list_search 的 while(1) 中無限循環，代表 `left_node_next != right_node` >commit:[782de28](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/782de2823f8ec4204adf49979cd6f0c3ae0e0a5e) 接著我意識到如果加回原版有最小和最大鍵值的 head 和 tail 可以少考慮很多邊界條件所以我把 `list_new` 改成以下這樣 ```c list_head *list_new() { list_head *head = malloc(sizeof(list_head)); if (!head) { return NULL; } node_t *min_node = malloc(sizeof(node_t)); if (!min_node) { free(head); return NULL; } min_node->data = INT_MIN; node_t *max_node = malloc(sizeof(node_t)); if (!max_node) { free(min_node); free(head); return NULL; } max_node->data = INT_MAX; head->next = &min_node->list; min_node->list.prev = head; min_node->list.next = &max_node->list; max_node->list.prev = &min_node->list; max_node->list.next = head; head->prev = &max_node->list; return head; } ``` :::danger 使用精準的詞彙。 ::: 建立串列的同時會置入兩個分別有最小和最大值 <s>極值</s> 的節點在加上一些小修正，終於使 `list_insert` 正常工作 >commit:[d8cd6bb](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/d8cd6bb54365e0815e57be0a4f396a2d327e95e4) 隨後又發現其實只有<s>順行正確，逆行</s> 從開頭往 next 的方向正確，往 prev 的方向接不起來，所以在 `list_insert` 做了以下修正 :::danger 避免說「順行」和「逆行」。以環狀雙向鏈結串列的節點存取角度來看，只有「從開頭往 `next` 的方向」和「從開頭往 `prev` 的方向」，二者並非「順」和「逆」的關聯。用語精準是工程人員的必要素養之一。 ::: ```diff if (CAS_PTR(&(left->next), right, new_elem) == right) { - left->next->prev = new_elem; + while (!CAS_PTR(&(right->prev), left, new_elem)) { + right = list_search(head, val, &left); + if (right != head->prev && list_entry(right, node_t, list)->data == val) { + return false; + } + new_elem->next = right; + new_elem->prev = left; + CAS_PTR(&(left->next), right, new_elem); + } return true; } ``` :::danger 改進你的漢語表達。 ::: 在插入新節點 new_elem 後，用 whlie 確保 right 的 prev 指向新節點，如果 CAS_PTR 操作失敗，則重複嘗試，直到成功為止，後面則是多做一些重複動作確保在並行環境下不會出錯 >commit:[818f790](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/818f79011c0ee67db95b39042e944806bb22560c) #### `list_remove` 一樣也是把原版做一些修正，像是 node_t 改成 list_head 等等，但發現在移除的時候 `list_search` 又出現問題了由於我是移除最一個節點所以在這行出現問題 ```c if (list_entry(t, node_t, list)->data == INT_MAX) ``` 經過一些思考發現是我上一行犯了一個錯誤，前一行 `t = t_next` 應該改為 `t = get_marked_ref(list_entry(t_next, node_t, list));` 這樣才能讀到原始的指標，但是經過這樣修正後印出串列會陷入 -2147483648 20 33 的無限循環，其中 -2147483648 是 INT_MIN ，測試的內容是插入 20 10 40 然後移除 40 ，不知道 33 是從哪裡來的 > commit:[88e91ac](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/88e91acbc4cdf7535a2b91afce93ef19f00dc3bd) 然後我又發現，其實剛剛的 `t = t_next` 不用改掉，因為這和原版 singly 的狀況大不同，原本串列的連接是依靠 node_t 裡面的 next 成員來連接，但是在這裡是用裡面的 list_head 來串接，所以 node_t 的 bit 被改寫不會影響 list_head 的連接由於卡關多時，我又回去複習了一下這個演算法的精神 ```graphviz digraph G { rankdir=LR; node [shape=record, style=filled, fillcolor=white]; edge [arrowhead=none]; H [label="{H| }", shape=record, width=0.5, height=0.5]; n10 [label="{10|X}", shape=record, width=1]; n30 [label="{30| }", shape=record, width=1]; T [label="{T| }", shape=record, width=0.5, height=0.5]; H -> n10 [color=gray, arrowhead=none]; n10 -> n30 [color=red, arrowhead=normal]; n30 -> T [color=red, arrowhead=normal]; H -> n30 [color=red, arrowhead=normal, constraint=false]; } ``` :::danger 使用 Graphviz 重新製圖。 ::: 發現我在這個版本中應該標記的是 list_head 的指標才對，而不是 node_t，否則沒有地方把標記的指標存起來但如果要修改就要先檢查 list_head 的對齊狀況，複習了一下 [你所不知道的 C 語言：記憶體管理、對齊及硬體特性](https://hackmd.io/@sysprog/c-memory) 並且測試實際狀況 ```c printf("Alignment of struct list_head *: %zu\n", __alignof__(struct list_head *)); struct list_head node1; struct list_head *ptr = &node1; printf("size of list_head : %d\n", sizeof(node1)); printf("size of list_head* : %d\n", sizeof(ptr)); ``` 輸出： ``` Alignment of struct list_head *: 8 size of list_head : 16 size of list_head* : 8 ``` 可以看到我的電腦是以 8 bytes 去對齊的，代表 list_head 指標最低 3 個 bits 保證是 0，可以用來標記確定策略後先回頭修改 list_search: > commit:[d4fbde7](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/d4fbde727aaf917e1c34344ff7b5d975f0386485) 接著完成了 list_remove 從開頭往 next 的方向的部份，然後往 prev 方向回去看被移除的節點的 next 是否最低位為 1 ``` 0x5cc06d5e1759 ``` 可以看到確實是 1 沒錯 > commit : [38e6af8](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/38e6af8a1d393925383a993e577b32e45e630963) :::danger 此處鏈結串列是「環狀」且「雙向」的鏈結串列，哪來「逆向」呢？你需要定義操作。 ::: 再來就是完成從開頭往 prev 的方向的部份 ```diff list_head *right_succ = right->next; + list_head *right_prev = right->prev; if (!is_marked_ref(right_succ)) { if (CAS_PTR(&(right->next), right_succ, get_marked_ref(right_succ)) == right_succ) { - if (CAS_PTR(&(left->next), right, right_succ) == right) - return true; + if (CAS_PTR(&(left->next), right, right_succ) == right) { + if (CAS_PTR(&(right_succ->prev), right, right_prev) == right) { + return true; + } + } } } } ``` > commit:[a229811](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/a2298115332add2afef0573c2547d94e49e87a3e) 目前為止完成了單執行緒的測試更改為 clang-format > commit:[368bca0](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/368bca0406bd3783ef5219ee27a2cb1261f8d850) ## TODO: 提出效能改進方案 > 善用課程提到的效能分析工具，提出上述程式碼的後續效能改善。 ### 使用 ThreadSanitizer 測試並行首先要在 Makefile 裡面加上這兩行 ``` CFLAGS += -fsanitize=thread LDFLAGS += -fsanitize=thread ``` CFLAGS 是編譯器選項變數，用於指定編譯階段的編譯器選項，第一行的作用是在編譯階段啟用 ThreadSanitizer，使得編譯時插入額外的程式碼來檢測執行緒相關的錯誤 LDFLAGS 是連結器選項變數，用於指定連結階段的連結器選項，第二行使得連結器在生成最終的可執行<s>文件</s> 檔案時包含 ThreadSanitizer 所需的運行時 runtime library <s>庫</s> 和支援程式碼 :::danger 注意用語： * library 在軟體工程，譯作「函式庫」或「程式庫」，而非語焉不詳的「庫」 * file 是「檔案」，而非「文件」(document) ::: :::danger 注意書寫規範： * 使用 lab0 規範的程式碼書寫風格，務必用 clang-format 確認一致 ::: 再來是撰寫測試程式: ```c #define NUM_THREADS 4 #define NUM_ITERATIONS 100 struct list_head *head; void* thread_func(void* arg) { for (int i = 0; i < NUM_ITERATIONS; ++i) { list_insert(head, i); list_remove(head, i); } return NULL; } int main() { pthread_t threads[NUM_THREADS]; head = list_new(); for (int i = 0; i < NUM_THREADS; ++i) { if (pthread_create(&threads[i], NULL, thread_func, NULL) != 0) { perror("pthread_create"); return 1; } } for (int i = 0; i < NUM_THREADS; ++i) { if (pthread_join(threads[i], NULL) != 0) { perror("pthread_join"); return 1; } } printf("All threads completed.\n"); return 0; } ``` 這裡會建立四個執行緒，每個執行緒會連續插入和移除 `i` 值 100 次而 `pthread_join` 是用來確保主執行緒 main 在所有加入的執行緒結束後才結束以下是 ThreadSanitizer 輸出結果 ``` SUMMARY: ThreadSanitizer: data race ================== [Thread 0x7ffff39fc640 (LWP 178998) exited] [Thread 0x7ffff0c13640 (LWP 178999) exited] All threads completed. ThreadSanitizer: reported 4 warnings [Thread 0x7ffff51ff640 (LWP 178995) exited] [Inferior 1 (process 178992) exited with code 0102] ``` 這四個 data race 警告都在講同一個地方，只是是在不同執行緒遇到 ``` WARNING: ThreadSanitizer: data race (pid=178992) Read of size 8 at 0x7b0800002fb0 by thread T4: #0 list_search /home/bohan/linux2024/lockfree_circular_doubly_ll/lockfree.c:83 (test+0x161d) #1 list_insert /home/bohan/linux2024/lockfree_circular_doubly_ll/lockfree.c:107 (test+0x1817) #2 thread_func /home/bohan/linux2024/lockfree_circular_doubly_ll/test.c:13 (test+0x1dd1) ``` 上方所標記之處是在 `list_search` 中的 `t_next = t->next` ，用來走訪鏈節串列然而這就是這個演算法的關鍵所在 ```graphviz digraph G { rankdir=LR; node [shape=record, style=filled, fillcolor=white]; edge [arrowhead=none]; H [label="{H| }", shape=record, width=0.5, height=0.5]; n10 [label="{10|X}", shape=record, width=1]; n30 [label="{30| }", shape=record, width=1]; T [label="{T| }", shape=record, width=0.5, height=0.5]; H -> n10 [color=gray, arrowhead=none]; n10 -> n30 [color=red, arrowhead=normal]; n30 -> T [color=red, arrowhead=normal]; H -> n30 [color=red, arrowhead=normal, constraint=false]; } ``` :::danger 注意用語： * access 是「存取」，而非「訪問」(visit) ::: 如圖，當有一個執行緒在其他執行緒存取 10 的時候把它移除了，那個執行緒便會用 `get_unmarked_ref(t_next)` 去<s>訪問</s> 存取下一個節點 30 ，所以雖然它是 racing 但不影響使用 > commit:[32fbf52](https://github.com/wu81177/lockfree_circular_doubly_ll/commit/32fbf52260c0dde35a2903b3f4db038fd4f7b65e) ### 可能的改進方案在這個實作裡面 `head` 的 `next` 和 `prev` 分別會有存放 `INT_MIN` 和 `INT_MAX` 的節點，雖然這樣做可以簡化程式的邊界條件撰寫，但是每次走訪就要多經過一個節點，如果這個鏈節串列大部分的使用情況是只存放數個節點，那多經過那兩個節點的時間佔比就很可觀，也許以後可以修改成不用這兩個節點的版本另外就是我這次實作是把原本的單向鏈節串列修改為符合 Linux 風格的環狀雙向鏈結串列，但是並沒有發揮環狀和雙向的優勢，程式架構幾乎和原版一樣，也許可以做一些統計，像是紀錄鍵值平均，然後判斷要往 `next` 還是 `prev` 方向去走訪，如果節點操作都是在靠近中後段的的地方這樣可以節省大量走訪時間，充分利用雙向的特性