2022q1 Homework5 (quiz5)

--- tags: linux2022 --- # 2022q1 Homework5 (quiz5) contributed by < `sternacht` > > [作業要求](https://hackmd.io/@sysprog/linux2022-homework5) ### 測驗 1 ### 測驗 2 #### 答案 EXP3 = `list_remove(&dom->retired, ptr)` EXP4 = `list_remove(&dom->retired, ptr)` [github](https://github.com/sternacht/quiz5-B) #### 延伸問題 - 解釋上述程式碼運作原理，指出可改進之處並實作本題所提到的 `hazard pointer` 是一種 lock-free 的資料結構，用來解決多執行緒下可能會發生的問題，如[ dangling pointer ](https://www.wikiwand.com/en/Dangling_pointer)。 `hazard pointer` 的概念是，允許多個 readers 以及一個 writer 在 lock-free 的情境下，reader thread 在讀取的時候會將 `hazard pointer` 標記起來，當 writer thread 要將 `hazard pointer` 刪除時不會直接將其刪除，而要確認沒有被其他 reader thread 標記的才能將其空間還給系統，下圖是每個 thread 中的 `hazard pointer` 與 `retire list` 結構 (圖源 :[ linux2022-quiz5#測驗-2 ](https://hackmd.io/@sysprog/linux2022-quiz5#測驗-2)) ![](https://i.imgur.com/X8i2Kej.png) 當 thread 在進行讀取的操作時，`hazard pointer` 指向讀取的位址，此時若有其他的 thread 要對同一個位址進行釋放的動作，就需要先遍尋 `hazard pointers` (`hazard pointers` 為 list 結構)，確認是否有相同的位址存在，如果存在就不能釋放。`retire list` 則對應到該 thread 預定要釋放的指標空間，同樣以 list 做串接，一定條件之後會對 list 裡的所有節點進行嘗試釋放的動作，也就是上述提到的遍尋 `hazard pointer`，滿足條件才會將其釋放，此 `retire list` 不需對其他 thread 同步，只有持有的 thread 自己才看的到。 ```c typedef struct __hp { uintptr_t ptr; struct __hp *next; } hp_t; typedef struct { hp_t *pointers; hp_t *retired; void (*deallocator)(void *); } domain_t; ``` `hazard pointer` 及 `retired list` 的基本結構中， `__hp` 是最小的單元，也就是一個節點，存有一個指向目標的指標以及另一個用來串接 linked list 的指標。 `domain_t`，則是一個 thread 中各持有一個的結構，包含 `hazard pointer` 以及 `retired list` ， `deallocator` 照其字面上的意思就是用來釋放空間的函式。 **針對 list 的操作** ```c /* Allocate a new node with specified value and append to list */ static hp_t *list_append(hp_t **head, uintptr_t ptr) { hp_t *new = calloc(1, sizeof(hp_t)); if (!new) return NULL; new->ptr = ptr; hp_t *old = atomic_load(head); do { new->next = old; } while (!atomic_cas(head, &old, &new)); return new; } /* Attempt to find an empty node to store value, otherwise append a new node. * Returns the node containing the newly added value. */ hp_t *list_insert_or_append(hp_t **head, uintptr_t ptr) { hp_t *node; bool need_alloc = true; LIST_ITER (head, node) { uintptr_t expected = atomic_load(&node->ptr); if (expected == 0 && atomic_cas(&node->ptr, &expected, &ptr)) { need_alloc = false; break; } } if (need_alloc) node = list_append(head, ptr); return node; } /* Remove a node from the list with the specified value */ bool list_remove(hp_t **head, uintptr_t ptr) { hp_t *node; const uintptr_t nullptr = 0; LIST_ITER (head, node) { uintptr_t expected = atomic_load(&node->ptr); if (expected == ptr && atomic_cas(&node->ptr, &expected, &nullptr)) return true; } return false; } /* Returns 1 if the list currently contains an node with the specified value */ bool list_contains(hp_t **head, uintptr_t ptr) { hp_t *node; LIST_ITER (head, node) { if (atomic_load(&node->ptr) == ptr) return true; } return false; } /* Frees all the nodes in a list - NOT THREAD SAFE */ void list_free(hp_t **head) { hp_t *cur = *head; while (cur) { hp_t *old = cur; cur = cur->next; free(old); } } ``` 針對 list 的操作共有四個，分別是 `insert_and_append`, `remove`, `contains`, `free`。 * `insert_and_append` 操作會有兩種不同的結果，一開始先遍尋 list，若當中有一個指標是空的，則該函式會作 CAS 操作，此為 insert，若沒有空指標的話則是作 append，向系統索取一塊新的空間來存放，返回值為新加入的節點。 * `remove` 逐一比對目標是否存在於 list 並將其移走，成功回傳 true，失敗回傳 false。 * `contains` 逐一比對目標是否存在 list 中，是則回傳 true，否則回傳 false。 * `free` 將給定的 list 整個釋放掉。 **針對 domain_t 的操作** ```c uintptr_t load(domain_t *dom, const uintptr_t *prot_ptr) { const uintptr_t nullptr = 0; while (1) { uintptr_t val = atomic_load(prot_ptr); hp_t *node = list_insert_or_append(&dom->pointers, val); if (!node) return 0; /* Hazard pointer inserted successfully */ if (atomic_load(prot_ptr) == val) return val; /* * This pointer is being retired by another thread - remove this hazard * pointer and try again. We first try to remove the hazard pointer we * just used. If someone else used it to drop the same pointer, we walk * the list. */ uintptr_t tmp = val; if (!atomic_cas(&node->ptr, &tmp, &nullptr)) list_remove(&dom->pointers, val); } } ``` `load` 對應到 reader thread 讀取某一個指標內容的操作，操作時的順序為 * 將目標地址的值讀到 val 變數中，並嘗試將目標地址之指標放入 `hazard pointer` 中，失敗則回傳 0 * 再一次確定目標地址中的值與 val 中的值相同，以確保過程中沒有其他執行緒對該地址的值進行寫入操作 * 若地址中的值被改變，則嘗試將 `hazard pointer` 中的值改寫回空指標，並把 val 從 `hazard pointer` 中刪除。 ```c void swap(domain_t *dom, uintptr_t *prot_ptr, uintptr_t new_val, int flags) { const uintptr_t old_obj = atomic_exchange(prot_ptr, new_val); cleanup_ptr(dom, old_obj, flags); } ``` `swap` 對應到 writer thread 寫入的動作，傳入的 new_val 會取代 old_val ，接著再將 old_val 空間釋放掉。 ```c static void cleanup_ptr(domain_t *dom, uintptr_t ptr, int flags) { if (!list_contains(&dom->pointers, ptr)) { /* deallocate straight away */ dom->deallocator((void *) ptr); } else if (flags & DEFER_DEALLOC) { /* Defer deallocation for later */ list_insert_or_append(&dom->retired, ptr); } else { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; dom->deallocator((void *) ptr); } } ``` `cleanup_ptr` 嘗試將一個指標從 `hazard pointer` 中移除，並釋放其空間，或是先將該指標移到 `retired list` 中存放，待稍後再將其移除。若是 flags 沒有標注要放到 `retired list`，則函式會不斷嘗試將指標釋放直到成功為止。 ```c void cleanup(domain_t *dom, int flags) { hp_t *node; LIST_ITER (&dom->retired, node) { uintptr_t ptr = node->ptr; if (!ptr) continue; if (!list_contains(&dom->pointers, ptr)) { /* We can deallocate straight away */ if (EXP3) dom->deallocator((void *) ptr); } else if (!(flags & DEFER_DEALLOC)) { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; if (EXP4) dom->deallocator((void *) ptr); } } } ``` `cleanup` 則是試圖將整個 `retired list` 刪除，首先要先確認要釋放的指標是否在 `hazard pointer` 內，若沒有則將該指標從 `retired list` 中刪除，並釋放空間。如果還有其他 `hazard pointer` 存有該指標，且 flags 沒有標注要暫緩釋放的動作，則函式會不斷等到可以釋放為止。 ```c void drop(domain_t *dom, uintptr_t safe_val) { if (!list_remove(&dom->pointers, safe_val)) __builtin_unreachable(); } ``` `drop` 的用途很簡單，就是將一個指標從 `hazard pointer` 中刪除。比較特別的是底下的 `__builtin_unreachable()`，其用途是告訴編譯器，在這之後的程式碼是不會執行到的( 執行此函式為未定義行為 )，其中一種比較常見的用法是接在結束程式的函式後頭，或是程式要跳躍到其他地方，而不會回來的時候，加上這個函式則可以避免觸發編譯器的警告，也節省編譯後的組合語言行數，底下是一個例子，在條件結束的函式後面分別加與不加 `__builtin_unreachable()` ，看看編譯後的結果為何。[參考](https://stackoverflow.com/questions/54764535/what-optimizations-does-builtin-unreachable-facilitate) ```c void exit_if_true(bool cond){ if(cond) exit(0); } void foo(bool x){ if(x){ exit_if_true(1); __builtin_unreachable(); printf("shall not come here"); } else return; } void foo2(bool x){ if(x){ exit_if_true(1); // __builtin_unreachable(); printf("shall not come here"); } else return; } ``` 關閉最佳化並輸出組合語言 `gcc -S -O0`，因 `exit_if_true` 傳入值必為真，因此開啟最佳化會自行刪除後面的程式碼 ```asm foo: pushq %rbp movq %rsp, %rbp subq $16, %rsp movl %edi, %eax movb %al, -4(%rbp) cmpb $0, -4(%rbp) je .L7 movl $1, %edi call exit_if_true .L7: nop leave ret .LC0: .string "shall not come here" foo2: pushq %rbp movq %rsp, %rbp subq $16, %rsp movl %edi, %eax movb %al, -4(%rbp) cmpb $0, -4(%rbp) je .L11 movl $1, %edi call exit_if_true movl $.LC0, %edi movl $0, %eax call printf jmp .L8 .L11: nop .L8: leave ret ``` 組合語言的前半部分是相同的，但在 `foo` 裡面，一旦執行到 `call exit_if_true` 之後程式就判斷是結束了，即使後面還有一個 `printf()` ，這就是 `__builin_unreachable()` 的效果，反觀 `foo2` ，即使執行到 `call exit_if_true` 之後程式仍沒有結束，並嘗試執行 `printf()` 輸出字串。 **可改進之處** 仔細看程式碼會發現 `cleanup` 從頭到尾都沒有被使用到，而在 `swap` 函式中，`flag` 參數也是設為 0 ，表示目前的實作並沒有用到 `retired list` ，若 reader 數量很多， writer 將會花費相當大量的時間在 cleanup_ptr 的自旋上。透過 `valgrind --tool=massif` 指令產生的檔案可以觀察到程式執行的指令數量(預設)，或是執行的時間，在有 100 個 reader ，並迭代(N_ITER) 100,000 次的情況下，使用 `retired list` 的版本僅花費 2 分鐘就完成，而原本的版本則在執行 10 分鐘之後仍未完成。 > 若只調整 `flags` ，兩種版本的執行速度都很快，約花費 1.5 秒，而透過輸出來觀察發現，很有可能是 writer 沒有平均分散到整段執行時間所導致的，因此在迭代的過程加上 `usleep(1)` ，使每個執行續的執行過程有稍微停頓，可以讓 writer 分散的更平均一些。在一開始的範例中， writer 只有一個，`retired list` 則是 public 的，因此處理 `retired list` 的步驟可以等到 `deinit` 再去執行，而不是在 writer thread 結束的時候。反之若是只在 writer thread 結束之前做處理，則在 writer 比任何 reader 更早完成的情況下，會產生 memory leak 的問題。 :::info `deinit` 中的 `domain_free` 也會嘗試對 `retired list` 做釋放的動作，但若不在此函式之前先做 `cleanup` ，則會產生 memory leak 的情況，why? ::: 但是這麼做會衍伸另一個問題，當我們把迭代次數(N_ITER)設的很高，且有一定數量的 reader 時(共享空間指標容易在 hazard pointers 裡)，會有較多的指標被放到 `retired list` 中等待在程式最後釋放，因此空間就被占據了。在論文 [Lock-Free Data Structures with Hazard Pointers](https://erdani.org/publications/cuj-2004-12.pdf) 中有提到 `retired list` 應該在甚麼時機做 `cleanup` 比較好，即 `retired list` 中的各數達到一個上限時，而這個上限的設定與 reader threads 的數量呈線性關係並略大於該數量，例如當 `retired list` 中的各數達到 reader 的 1.25 倍時就會觸發一次 `cleanup`，為此，我們需要一個額外的計數器來記錄 `retired list` 中目前有多少個待釋放的節點，並在 cleanup 一次之後重新計算剩餘的節點數，改動的部分如下 ```c /* Compute the size of list */ uint32_t list_size(hp_t *head) { if (!head) return 0; uint32_t c = 0; hp_t *node = head; while(node){ if(node->ptr) c++; node = node->next; } return c; } typedef struct { hp_t *pointers; hp_t *retired; uint32_t r_count; void (*deallocator)(void *); } domain_t; static void cleanup_ptr(domain_t *dom, uintptr_t ptr, int flags) { if (!list_contains(&dom->pointers, ptr)) { /* deallocate straight away */ dom->deallocator((void *) ptr); } else if (flags & DEFER_DEALLOC) { /* Defer deallocation for later */ list_insert_or_append(&dom->retired, ptr); dom->r_count += 1; } else { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; dom->deallocator((void *) ptr); } } ... // in 'writer thread' function swap(config_dom, (uintptr_t *) &shared_config, (uintptr_t) new_config, 1); print_config("----updated config ", new_config); if (config_dom->r_count > r_limit){ cleanup(config_dom, 1); config_dom->r_count = list_size(config_dom->retired); } ... ``` 接著同樣用 `valgrind` 來測試執行所需的時間以及記憶體使用量，以下分別是三種不同方式的結果 * **在每次迭代時都做一次 cleanup** ``` -------------------------------------------------------------------------------- n time(ms) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 55 113,229 28,488 27,600 888 0 56 114,315 28,488 27,600 888 0 57 115,402 28,488 27,600 888 0 58 116,487 28,488 27,600 888 0 59 117,580 28,184 27,316 868 0 ``` * **只在最後做 cleanup** ``` -------------------------------------------------------------------------------- n time(ms) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 80 112,131 33,600 30,580 3,020 0 81 113,058 33,600 30,580 3,020 0 82 113,985 33,600 30,580 3,020 0 83 114,912 33,600 30,580 3,020 0 84 116,460 30,728 29,012 1,716 0 ``` * **retired list 達一定數量後做 cleanup** ``` -------------------------------------------------------------------------------- n time(ms) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 49 101,150 30,008 28,540 1,468 0 ``` > 測試參數 > N_READER = 100 > N_WRITER = 1 > N_ITER = 100,000 > r_limit = 125 (N_READER * 1.25) 在記憶體使用上如預期的，第一種方式最少，第二種最多，而第三種則介於兩者之間，而在執行時間方面，第三種方法相較前面兩者得到了約 14% 的進步。 #### 上述程式碼開啟多個 reader 時，無法通過 ThreadSanitizer，請指出具體問題並克服根據 [ThreadSanitizer](https://clang.llvm.org/docs/ThreadSanitizer.html) 網站內容的方式編譯程式，在多個 reader 的情況下並沒有任何錯誤，但是在調整成兩個 writer 的時候就會報錯了，錯誤的原因是 data race，而問題乍看之下是兩個 writer 之間共用同一個 `retired list` 及變數 `r_count` ，因此先改寫 `retired list` 及 `r_count` 成只有持有的 thread 才能讀取。改寫重點主要是將 `retired list`, `r_count` 獨立為另一個 type ，只有 writer 持有， writer 開始執行時建立，結束時清空 `retired list` 釋放其空間，並調整相關函式的傳入參數使用等，部分程式碼如下 ```c typedef struct { hp_t *retired; uint32_t r_count; } wconfig_t; /* Free wconfig */ void wconfig_free(wconfig_t *wcfig) { if (!wcfig) return; if(wcfig->retired) list_free(&wcfig->retired); free(wcfig); } static void cleanup_ptr(domain_t *dom, wconfig_t *wcfig, uintptr_t ptr, int flags) { if (!list_contains(&dom->pointers, ptr)) { /* deallocate straight away */ dom->deallocator((void *) ptr); } else if (flags & DEFER_DEALLOC) { /* Defer deallocation for later */ list_insert_or_append(&wcfig->retired, ptr); wcfig->r_count += 1; } else { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; dom->deallocator((void *) ptr); } } static void *writer_thread(void *arg) { (void) arg; wconfig_t *wconfig = create_wconfig(); for (int i = 0; i < N_ITERS; ++i) { config_t *new_config = create_config(); new_config->v1 = rand(); new_config->v2 = rand(); new_config->v3 = rand(); // print_config("----updating config", new_config); swap(config_dom, wconfig, (uintptr_t *) &shared_config, (uintptr_t) new_config, 1); // print_config("----updated config ", new_config); if (wconfig->r_count > r_limit){ cleanup(config_dom, wconfig, 1); wconfig->r_count = list_size(wconfig->retired); } usleep(1); } cleanup(config_dom, wconfig, 0); wconfig_free(wconfig); return NULL; } ``` 在這裡有一個地方需要注意， writer thread 結束之前要將自己的 `retired list` 全部清空，而這會受到 reader 的影響，一但有 reader 還在讀取， `retired list` 就無法全部釋放，因此最後一個 `cleanup` 的 `flags` 需要設為 0 ，以確保 writer 結束前能全部清空。但這樣做會造成一個問題，當 reader 比 writer 晚結束， writer 的 spinlock 會佔據大量的運算，導致 reader 的進行緩慢，因此我在 spinlock 的過程中同樣使用了 `usleep()`，讓 reader 有機會繼續執行。 ```c void cleanup(domain_t *dom, wconfig_t *wcfig, int flags) { hp_t *node; LIST_ITER (&wcfig->retired, node) { uintptr_t ptr = node->ptr; if (!ptr) continue; if (!list_contains(&dom->pointers, ptr)) { /* We can deallocate straight away */ if (list_remove(&wcfig->retired, ptr)) dom->deallocator((void *) ptr); } else if (!(flags & DEFER_DEALLOC)) { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)){ usleep(10); }; if (list_remove(&wcfig->retired, ptr)) dom->deallocator((void *) ptr); } } } ``` 此時重新編譯並執行並不會有 data race 報錯，表示先前的問題確實是在於共用 `retired list` 及 `r_count` 上然而，問題到此還沒結束，先前為了減少執行時間而將輸出內容的 `print_config` 函式暫時註解掉，而若重新使用這些函式，即會看到 data race 的問題又重新出現。觀察這次的錯訊息，發現問題還是出在兩個 writer 之間，當其中一個 writer 寫入完畢之後，會執行第二個 `print_config` 函式，若在輸出的同時另一個 writer 也去更改共用空間，並替換、釋放掉前一個 writer 寫入的內容，此時 `print_config` 就會出現問題。追根究柢，這個問題的原因是 print 是相當花費時間的一個函式，因此在 print 的過程中被其他 thread 搶占並更動到要使用的記憶體空間，造成錯誤。記得在課堂上有聽到老師提到過，類似的問題有一解法，就是在輸出之前先複製一個備份，並將第二次輸出改成輸出備份的內容，如此一來即使在輸出的時候記憶體空間被更改或釋放，都不影響輸出結果。於是再對 writer thread 作一些改動 ```c static void *writer_thread(void *arg) { (void) arg; wconfig_t *wconfig = create_wconfig(); config_t *backup = create_config(); for (int i = 0; i < N_ITERS; ++i) { config_t *new_config = create_config(); new_config->v1 = rand(); new_config->v2 = rand(); new_config->v3 = rand(); *backup = *new_config; print_config("----updating config", new_config); swap(config_dom, wconfig, (uintptr_t *) &shared_config, (uintptr_t) new_config, 1); print_config("----updated config ", backup); if (wconfig->r_count > r_limit){ cleanup(config_dom, wconfig, 1); wconfig->r_count = list_size(wconfig->retired); } usleep(1); } cleanup(config_dom, wconfig, 0); wconfig_free(wconfig); free(backup); return NULL; } ``` ### 〈[Lock-Free Data Structures with Hazard Pointers](https://erdani.org/publications/cuj-2004-12.pdf)〉論文閱讀問題 - reader 數量不固定 (或 reader 持有多個 hazard pointers) 時，writer 該如何調整釋放時機，且在實際應用環境下，該如何確定 reader 數量 - hazard pointer 是否能運作在不同形式的資料結構中，如讀寫 linked-list - 論文中提到 hazard pointer 不怕遇到 ABA problem ，為何? > m could have been removed from and installed in pMap_ one or more times during that interval, and that doesn’t matter. What matters is that at the time of the second read, m is certainly not removed (because pMap_ points to it) and at that point the reader’s hazard pointer already points to it. - update (swap) 也能是 wait-free 嗎? :::warning 論文的意思並非「不怕遇到 ABA problem」，注意看前後文。 :notes: jserv ::: > 重新閱讀該部分的論文，整理一下思緒，發現論文的意思與 ABA problem 並無關係，而是指在第二次讀取 `ptr` 之前，不論 `ptr` 被更改過多少次都會因為第二次檢查而被擋下來並重新執行，只有在 `m` 不被釋放，以及 hazard pointer 被設定的前提下，才會成功。 ### 從 lock-free 到 wait-free #### lock-free hazard pointer 測驗2 中的 hazard pointer 實作實際上是一個 lock-free 的機制，而非 wait-free 的原因是在 reader thread 的 load 。 ```c uintptr_t load(domain_t *dom, const uintptr_t *prot_ptr) { const uintptr_t nullptr = 0; while (1) { ... if (!atomic_cas(&node->ptr, &tmp, &nullptr)) list_remove(&dom->pointers, val); } } ``` load 在完成讀取並設定好 hazard pointer 之後，會再一次的對原本讀取的指標 `p` 內的值做一次比較，目的是避免在 **讀取** 到 **設定 hazard pointer** 之間被搶佔，並且 `p` 被釋放，導致 reader 後續使用時發生錯誤。而這樣的保護機制並不保證能夠在 constant time 裡完成，假設有一個 writer 一直頻繁的寫入新的值並釋放舊指標，那 load 即很有可能會長時間的困在 while 迴圈內。 ```c void swap(domain_t *dom, uintptr_t *prot_ptr, uintptr_t new_val, int flags) { // In multiple-writer situation, CAS replace exchange here const uintptr_t old_obj = atomic_exchange(prot_ptr, new_val); cleanup_ptr(dom, old_obj, flags); } ``` 同理在 multi-writer 的情況中， writer 也可能會在寫入的過程中與其他 writer 發生衝突，也就是在同一時間對同一個共享物件進行寫入，先完成的 writer 會成功，後完成的則會失敗。在部分實作中不會嘗試對失敗的寫入進行重新寫入，但如果有，則 writer 也並非 wait-free。 > 測驗 2 原本的程式碼適用於 single-writer-multiple-reader 的環境，因此不會有 writers 衝突的情況發生，用 atomic-exchange 是可行的 #### trap 機制假設 writer 不會嘗試寫入直到成功，或是在 single-writer 的環境中，那 writer 就算是 wait-free 了，至於 reader 該如何成為 wait-free，〈[Practical Lock-Free and Wait-Free LL/SC/VL Implementations Using 64-Bit CAS](http://www.cs.tau.ac.il/~afek/Maged64bit-disc-2004.pdf)〉給出了一種途徑，也就是 trap 機制。首先先簡述 trap 的使用方式 - reader : reader 會在嘗試設定 `hazard pointer` 失敗一次之後設置一個 trap，並重新設定一次 `hazard pointer`， seq (sequence number) 是物件內一個變數。接著檢查剛剛設置的 trap 中是否有符合條件的指標，若沒有，則表示該物件在重新設定的過程中沒有被變更，因此可以安全的讀取物件內的值，反之則表示內容被變更，但我們可以從 trap 內取得一個符合條件且尚未被釋放的指標來讀取其中的值，並把舊的指標設為 NULL 。最後，返回之前再把 trap 給釋放掉，以防其他指標落入 trap 中。 - writer : writer 在寫入時，先取一塊新的區塊 b 來存放要寫入的值，接著讀取變數 seq，該變數用於計數該物件有幾次成功的寫入，當前為 $seq_{th}$ ；同時將 b 的 seq 加一，表示當 b 成功寫入時，是為 $seq+1_{th}$ 更新。用在 trap 上，就只會捕捉大於等於自身 seq 的物件，表示只有在**更新過 seq 次**之後的物件才對 trap 是有效的，目的是防止讀取到舊的值。接著 writer 利用 CAS 嘗試將新的值寫入，一旦失敗，整個寫入就算是結束了，而成功的時後則會對目前所有的 trap 做遍尋，確定所更動的物件是否有相對應的 trap。 ![](https://i.imgur.com/PijFZA0.png) 圖片截取自論文 p.8 接著講解各項參數與實作細節 - 一個 trap 由 `SetTrap` 函式建立，並設定其各項參數。$Var$ 推測是指向一個物件的指標，表示這個 trap 只對該物件生效；$Seq$ 的用途在前面已經提過；$Captured$ 用來標記目前 trap 的狀態，以及捕捉到的物件之指標；$traphp_p$ 本質上是 `hazard pointer` ，用途上也相同，保證直到 `hazard pointer` 被釋放為止，其所指向的空間不會被釋放；$Active$ 用來標記 trap 是否被啟用，在這個函式中被寫為 true。 $tag_p$ 被用在 $Captured$ 及 $traphp_p$ 的初始化之中，其值為 `non-pointer value` ，對 64-bits 電腦來說，一次取值的大小為 64-bits，也就是 8 bytes ，在位址上要做到 alignment ，就必須為 8 的倍數。因此只要知道位址的最後三位，就可以得知是否為 `non-pointer value` ， (詳細關於 data alignment，可參考 [你所不知道的 C 語言：記憶體管理、對齊及硬體特性](https://hackmd.io/@sysprog/c-memory?type=view#data-alignment)) 。當 $Captured$ 內的值為 `non-pointer value` 時，表示 trap 尚未捕捉到任何物件，且尚未被釋放。 > 論文中有提到 tag 會逐漸向上加，同時因為 `non-pointer value` 範圍極大因此不需要額外考慮 `wrap-around` ，`wrap-around` 如何影響 trap 運作? - trap 透過 `ReleaseTrap` 清除掉舊的資訊，但並不一定是釋放空間，而是等待下一次的使用，過程中先將 $Active$ 寫為 false ，此一條件也是後續 ScanTraps 判斷的第一個依據；接著將 $Captured$ 及 $traphp_p$ 寫入 NULL，表示 trap 已經被釋放掉了。 - `GetCapturedBlock` 取出 $Captured$ 中的值，若是 `non-pointer value` ，表示 trap 未捕捉到任何物件，此時回傳 NULL 給 reader。若是一個 `pointer value`，則直接回傳。 - `ScanTrap` 在流程上會逐一對每個 trap 檢查 $Active$ 是否為 true 以及 $Captured$ 中的值，確認 trap 的狀態是否還在運作，並且尚未捕捉到任何物件。接著讀取 trap 中的參數，並在 list 中尋找是否有符合條件的物件，若有則嘗試將找到的物件之指標寫入 $Captured$ 及 $traphp_p$ ，寫入失敗或是沒找到時，就換下一個 trap，最後結束前呼叫 `RetireNode(list)` 。 `ScanTrap` 在前述應用範例中，是每一次 writer 寫入成功就做一次，但這樣做的缺點是時間花費與寫入的成功次數呈線性關係，意味著寫入越多就越浪費時間。因此在論文中採取另一種方式，先累積一定數量因寫入成功被交換下來的舊物件，再一口氣做 `ScanTrap`。 ![](https://i.imgur.com/NaJ3kEG.png) 圖片截取自論文 p.11 ### Userspace RCU 版本 mrsw :::warning 考慮到接下來會比較 HP 和 RCU，原本的程式碼需要調整，才能藉由條件編譯或執行時期的設定，讓這二種 memory reclamation 方式存在於同一份實作中。 > [liburcu](https://liburcu.org/) :notes: jserv ::: RCU 版本中所用的函式庫是從上方老師提供的 [liburcu](https://liburcu.org/) 編譯並使用，使用前需要編譯函式庫，並在編譯程式的同時將函式庫連結上才能正常編譯，並且作者有提供簡單的 [URCU API](https://github.com/urcu/userspace-rcu/blob/master/doc/rcu-api.md) 供參考。 RCU (Read-Copy-Update) , 相較於對 reader 友善的 rwlock 或對 writer 友善的 seqlock， RCU 比較有 HP 樣子，都嘗試在不影響 reader 讀取的情況下去修改共享空間，並尋找合適的時機將修改的資訊放回，再釋放舊的指標。由於是使用現成的函式庫撰寫，因此大部分的函式都不會使用到而不須更改，只在 `reader_thread`, `writer_thread` 及 `init`, `deinit` 部份作調整。reader 在讀取共享空間的時間段稱為 critical section (簡稱CS)，在進入 CS 前後需要用 `rcu_read_lock()`, `rcu_read_unlock()` 來標記，而使用到此兩函式的 thread 則需要在使用前呼叫 `rcu_register_thread()` 來告訴系統接下來要使用 RCU ，並且會進入 CS ，並在離開 thread 之前呼叫 `rcu_unregister_thread()`。 `writer_thread` 需要用 spin lock 來保證同一段時間內只會有一個 writer 在寫入，為此這裡選擇 linux 內的 `pthread_spin_lock()`，並且將 `lock` 宣告為全域變數，讓全部的 writer 都能讀取到 `lock`。寫入的過程用 `pthread_spin_lock()`, `pthread_spin_unlock()` 包起來，並且令一個新的指標來除存稍後要釋放的指標，這個動作也要包含在 spin lock 內，以保證不會有兩個指標存到相同的值，造成重複釋放的問題。最後在釋放前呼叫 `synchronize_rcu()` 等待所有在 CS 內的 reader 離開，再釋放指標。 init() 及 deinit() 則調整為負責 spinlock 的初始化等相關工作 ```c void init() { shared_config = create_config(); pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE); } void deinit() { delete_config(shared_config); pthread_spin_destroy(&lock); } uintptr_t rcu_load(const uintptr_t *prot_ptr) { rcu_read_lock(); uintptr_t val = atomic_load(prot_ptr); rcu_read_unlock(); return val; } static void *reader_thread(void *arg) { int N_ITERS = *(int *)arg; rcu_register_thread(); for (int i = 0; i < N_ITERS; ++i) { config_t *safe_config = (config_t *)rcu_load((uintptr_t *)&shared_config); if (!safe_config) err(EXIT_FAILURE, "load"); print_config("read config ", safe_config); } rcu_unregister_thread(); return NULL; } static void *writer_thread(void *arg) { int N_ITERS = *(int *)arg; config_t *cloned_config = create_config(); for (int i = 0; i < N_ITERS / 2; ++i) { config_t *new_config = create_config(); new_config->v1 = rand(); new_config->v2 = rand(); new_config->v3 = rand(); *cloned_config = *new_config; print_config("updating config", new_config); pthread_spin_lock(&lock); config_t *old_config = shared_config; rcu_assign_pointer(shared_config, new_config); pthread_spin_unlock(&lock); print_config("updated config ", cloned_config); synchronize_rcu(); delete_config(old_config); } delete_config(cloned_config); return NULL; } ``` > 釋放舊指標時不可直接釋放 `shared_config` ，而要 assign 到一個新的指標來做釋放，雖然是很基本的概念，但意外的是執行時期沒有任何報錯，直到用 valgrind 作檢查時才發現。接下來利用條件編譯，可以將兩份實做合併做同一份，以下展示條件編譯的結構 ```c #define mode 1 #if mode == 0 typedef struct __hp { uintptr_t ptr; struct __hp *next; } hp_t; ... // hp function, reader, writer implementation #else #include <urcu.h> uintptr_t rcu_load(const uintptr_t *prot_ptr) { rcu_read_lock(); uintptr_t val = atomic_load(prot_ptr); rcu_read_unlock(); return val; } ... // rcu function, reader, writer implementation #endif int main(int argc, void *argv[]) { ... } ``` ### HP object 的重用 :question: 考慮到原論文的 `active` 成員，現有的程式碼要作哪些修改，才可重用 HP object? 考慮最簡單的情況， hazard pointer 一旦結束工作就被釋放掉，在多個 reader 大量 read 的狀況，預期將會花費大量的時間在 HP object 的 allocate 與 free。參考論文〈[Lock-Free Data Structures with Hazard Pointers](https://erdani.org/publications/cuj-2004-12.pdf)〉中的做法，我們可以在 HP structure 中加入一個參數 `active`，用以表示該 HP object 是否正被使用來取代 free。既然多了一個參數要考慮，現有的程式碼勢必要做調整，以下分成 reader 與 writer 兩部分討論 - **Reader** 前述提過 `active` 用以表示 HP 是否被使用，因此最主要的概念就是使用時必須使 `active` 為 true，並在使用結束後 `active` 為 false。對應使用 HP 的操作分別為 `load` 裡的 `list_insert_or_append` 及 `list_remove` ，前者根據 HP list 的狀態選擇 insert 或 append 一個 HP object，後者將 HP object 從 list 中移除。 - **Writer** writer 對 HP object 的操作則比較少，僅有在 `cleanup_ptr` 及 `cleanup` 兩個函式中，需要透過 `list_contain` 來確認要釋放的目標是否還被 HP 保護。在加上 `active` 之後，還需要進一步確認該 HP object 是否正被使用。 ```c typedef struct __hp { bool active; uintptr_t ptr; struct __hp *next; } hp_t; /* Allocate a new node with specified value and append to list */ static hp_t *list_append(hp_t **head, uintptr_t ptr) { hp_t *new = calloc(1, sizeof(hp_t)); if (!new) return NULL; new->active = true; new->ptr = ptr; hp_t *old = atomic_load(head); do { new->next = old; } while (!atomic_cas(head, &old, &new)); return new; } /* Attempt to find an empty node to store value, otherwise append a new node. * Returns the node containing the newly added value. */ hp_t *list_insert_or_append(hp_t **head, uintptr_t ptr) { hp_t *node; bool need_alloc = true; const bool act_true = 1; const bool act_false = 0; LIST_ITER (head, node) { uintptr_t expected = atomic_load(&node->ptr); if (expected == 0 && atomic_cas(&node->active, &act_false, &act_true)) { atomic_cas(&node->ptr, &expected, &ptr); need_alloc = false; break; } } if (need_alloc) node = list_append(head, ptr); return node; } /* Remove a node from the list with the specified value */ bool list_remove(hp_t **head, uintptr_t ptr) { hp_t *node; const uintptr_t nullptr = 0; const bool act_true = 1; const bool act_false = 0; LIST_ITER (head, node) { uintptr_t expected = atomic_load(&node->ptr); if (expected == ptr && atomic_cas(&node->active, &act_true, &act_false)){ atomic_cas(&node->ptr, &expected, &nullptr); return true; } } return false; } /* Returns 1 if the list currently contains an node with the specified value */ bool list_contains(hp_t **head, uintptr_t ptr) { hp_t *node; LIST_ITER (head, node) { if (atomic_load(node->active) && atomic_load(&node->ptr) == ptr) return true; } return false; } ``` 包含 `active` 成員的程式碼如上方的範例，但執行時仍有錯誤尚待完善，推測原因是因為加入、移出 hp_list 都要同時對 `active` 及 `ptr` 做修改，在沒有適當的規劃下，可能有指令 reorder 或 data race 發生，導致結果不如預期。更簡潔的作法可參考 [quiz5-B](https://hackmd.io/@sysprog/linux2022-quiz5#測驗-2) 裡面的實作，捨棄 `active` 成員，只用 `ptr` 是否為 null 來判斷，簡化程式碼節省執行時間，同時將需要做 CAS 的數量降低也使程式碼能正確運行 :::warning 只用 CAS 更新全域的 HP 物件和 retire list，會有嚴重的效能疑慮，於是可用 thread-local storage (TLS) 來加速，避免頻繁更新全域物件的操作。可參見 Folly 的實作: https://github.com/facebook/folly/blob/main/folly/synchronization/Hazptr.h :notes: jserv :::

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.