[2021 年暑期 Linux 核心](https://hackmd.io/@sysprog/linux2021-summer/) 第 2 週測驗題

# [2021 年暑期 Linux 核心](https://hackmd.io/@sysprog/linux2021-summer/) 第 2 週測驗題 ###### tags: `linux2021` :::info 目的: 檢驗學員對 [lock-free 程式設計](https://hackmd.io/@sysprog/concurrency)的認知 ::: ==[作答表單](https://docs.google.com/forms/d/e/1FAIpQLSdIEDLk5ZNlPiaUZP_k_xqkF6x3FXtEZd3Cyz7iGG1lQcbwCA/viewform)== 在並行程式設計中，當我們在存取共用的記憶體物件時，需要考慮到其他執行緒是否有可能也正在存取同一個物件，若要釋放該記憶體物件時，不考慮這個問題，會引發嚴重的後果，例如 [dangling pointer](https://www.wikiwand.com/en/Dangling_pointer)。使用 mutex 是最簡單且直觀的方法：存取共用記憶體時，acquire lock 即可保證沒有其他執行緒正在存取同一物件，也就可安全地釋放記憶體。但若我們正在存取的是一種 lock-free 資料結構，當然就不能恣意地使用 lock，因為會違反 lock-free 特性，即無論任何執行緒失敗，其他執行緒都能可繼續執行。於是乎，我們需要某種同為 lock-free 的記憶體物件回收機制。對於 C 這樣缺乏內建 concurrent GC 機制的程式語言來說，若要實作 lock-free 演算法，就要自行處理記憶體釋放的議題。[Hazard pointer](https://en.wikipedia.org/wiki/Hazard_pointer) 是其中一種解決方案，其原理是讀取端執行緒對指標進行識別，指標 (特別是指向的記憶體區塊) 若要釋放時，會事先保存，延遲到確認沒有讀取端執行緒，才進行真正的釋放。Linux 核心的 [RCU 同步機制](https://hackmd.io/@sysprog/linux-rcu)是另一種 lock-free 程式設計演算法和記憶體回收機制。 :::info "[hazard](https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/hazard)" 一詞多用來指「危險物、危害物」，與 "danger" 的主要區別: 1. hazard 是可數名詞 2. "hazard" 不是立刻會發生的危險，而是可能會帶來危險的「隱患」，例如 a health hazard (一個健康隱患) 和 a safety hazard (一個[行為或功能的]安全隱患) ::: 〈[Lock-Free Data Structures with Hazard Pointers](https://erdani.org/publications/cuj-2004-12.pdf)〉寫道: > Each reader thread owns a single-writer/multi-reader shared pointer called [Hazard pointer](https://en.wikipedia.org/wiki/Hazard_pointer). When a reader thread assigns the address of a map to its hazard pointer, it is basically announcing to other threads (writers), "I am reading this map. You can replace it if you want, but don’t change its contents and certainly keep your deleteing hands off it." [Hazard pointer](https://en.wikipedia.org/wiki/Hazard_pointer) 可簡稱為 "HP"，其關鍵的結構有: * Hazard pointer * retire list (也稱為 thread free list) ![](https://i.imgur.com/XtihewZ.png) Hazard pointer 指一個讀取端執行緒若要使用一個指標時，就會建立一個Hazard pointer 以包裝這個指標。一個 Hazard pointer 會被一個執行緒寫入，被多個執行緒讀取: ```cpp struct HazardPointer { void *real_ptr; /* 包裝的指標 */ }; void func() { HazardPointer *hp = accquire(_real_ptr); ... // use _real_ptr release(hp); } ``` retire list 則是指每個執行緒都該有的 list，保存著將要釋放的指標清單，該清單僅對應到執行緒的讀寫操作: ```cpp void defer_free(void *ptr) { _free_list.push_back(ptr); } ``` 當某個執行緒嘗試釋放 Free List 中的指標時，如指標 `ptr`，就檢查所有其他執行緒使用的 Hazard pointer，檢查是否存在包裝 `ptr` 的 Hazard pointer，若無，則說明沒有讀取端執行緒正在使用 `ptr`，可安全地釋放 `ptr`。 ```cpp void gc() { for (ptr in _free_list) { conflict = false for (hp in _all_hazard_pointers) { if (hp->_real_ptr == ptr) { confilict = true; break; } } if (!conflict) free(ptr); } } ``` CAS: [You Can Do Any Kind of Atomic Read-Modify-Write Operation](https://preshing.com/20150402/you-can-do-any-kind-of-atomic-read-modify-write-operation/) 實作 lock-free 演算法時，無可避免用到 atomic 操作，伴隨而生的 [ABA 問題](https://en.wikipedia.org/wiki/ABA_problem): > 進行 CAS 操作時，因在更改 V 之前，CAS 主要詢問「`V` 的值是否仍然為 `A`？」，所以在第一次讀取 `V` 後及對 `V` 執行 CAS 操作之前，若將值從 `A` 改為 `B`，然後再改回 `A`，會讓運用 CAS 的程式碼混淆。 [Hazard pointer](https://en.wikipedia.org/wiki/Hazard_pointer) 提到一個 ABA 案例: > 在一個 lock-free 的 stack 實作中，若要 pop 時，stack 的內容為 $[A, B, C]$，head 指向 stack 頂部，隨即執行 ```cpp compare_and_swap(target = &head, newvalue = B, expected = A) ``` > 但在這個操作中，其他執行緒將 `A` 和 `B` 都 pop，且刪除 `B`，又把 `A` push 到 stack 中，即 $[A, C]$。於是前一個執行緒的 `compare_and_swap` 能夠成功，此時 `head` 指向一個已被刪除的 `B` gist: [list.c](https://gist.github.com/jserv/6e1f76b494f9157d9e37b4ad275d05f4) 是個使用 [Hazard pointer](https://en.wikipedia.org/wiki/Hazard_pointer) 的並行 linked list 實作。編譯方式: ```shell $ gcc -Wall -o list list.c -lpthread -g -fsanitize=thread ``` 預期執行輸出: ```shell $ ./list inserts = 4098, deletes = 4098 ``` 假設記憶體配置都會成功，且已知上述程式碼在執行時期不會遇到 assertion 或者 ThreadSanitizer 的錯誤。請補完程式碼，使得執行結果符合預期。 ==作答區== RRR = ? NNN = ? FFF = ? DDD = ? :::success 延伸問題: 1. 解釋上述程式碼運作原理 2. 指出改進空間並著手實作 3. 對比 [rcu_list](https://github.com/sysprog21/concurrent-programs/tree/master/rcu_list)，解釋同為 lock-free 演算法，跟上述 [Hazard pointer](https://en.wikipedia.org/wiki/Hazard_pointer) 手法有何異同？能否指出 [rcu_list](https://github.com/sysprog21/concurrent-programs/tree/master/rcu_list) 實作缺陷並改進？ :::