--- tags: linux2022 --- # 2022q1 Homework5 (quiz5) contributed by < `Xx-oX` > > [題目](https://hackmd.io/@sysprog/linux2022-quiz5) ## 測驗 `2` ### [Hazard Pointer](https://en.wikipedia.org/wiki/Hazard_pointer) 在並行程式設計中,當我們在存取共用的記憶體物件時,需要考慮到其他執行緒是否有可能也正在存取同一個物件,若要釋放該記憶體物件時,不考慮這個問題,會引發嚴重的後果,例如 [dangling pointer](https://www.wikiwand.com/en/Dangling_pointer)。 使用 mutex 是最簡單且直觀的方法:存取共用記憶體時,acquire lock 即可保證沒有其他執行緒正在存取同一物件,也就可安全地釋放記憶體。但若我們正在存取的是一種 lock-free 資料結構,當然就不能恣意地使用 lock,因為會違反 lock-free 特性,即無論任何執行緒失敗,其他執行緒都能可繼續執行。於是乎,我們需要某種同為 lock-free 的記憶體物件回收機制。 >[lock-free](https://hackmd.io/@sysprog/concurrency/https%3A%2F%2Fhackmd.io%2F%40sysprog%2Fconcurrency-lockfree): 以系統的觀點來看,只要執行足夠長的時間,至少會有一個執行緒會有進展 (progress),是一種性質 >lock-less: 不使用 mutex / semaphore 之類的 lock 以達到安全無誤地操作共用資料 #### [ABA Problem](https://en.wikipedia.org/wiki/ABA_problem) 任何使用 [compare and swap](https://en.wikipedia.org/wiki/Compare-and-swap) primitive 的 lock-free 資料結構都有可能碰上 ABA 問題。 ABA 問題源自於 cas 的 c (compare),只要被比較的值沒有改變,就會執行 swap 操作,但這個被比較的值可能已經被動過手腳。 舉個例子,設計一個 lock-free stack (改寫自 [lab](https://lumian2015.github.io/lockFreeProgramming/lock-free-stack.html)) ```c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #define atomic_load(src) __atomic_load_n(src, __ATOMIC_SEQ_CST) #define atomic_cas(dst, expected, desired) \ __atomic_compare_exchange(dst, expected, desired, 0, __ATOMIC_SEQ_CST, \ __ATOMIC_SEQ_CST) struct node { int data; struct node *next; }; struct stack { struct node *top; } void push(struct stack *s, int val) { struct stack next; struct stack orig = atomic_load(s); struct node *new_el = malloc(sizeof(struct node)); do { new_el->data = val; new_el->next = orig.top; } while(!atomic_cas(s, &orig, next)); } int pop(struct stack *s) { struct stack next; struct stack orig = atomic_load(s); do{ if(orig.top == NULL) { return -1; } next.top = orig.top->next; } while(!atomic_cas(s, &orig, next)); int ret = orig.top.data; free(orig.top); return ret; } ``` 假設現在 stack 裡面資料如下 (其中 0x01, 0x02, 0x03 為記憶體位置),然後有兩個 thread T1, T2 ```graphviz digraph g{ rankdir = LR node[shape=record] top[label="<p>top|"] 1[label="<p>0x01|1"] 2[label="<p>0x02|2"] 3[label="<p>0x03|3"] top:p -> 1:p -> 2:p -> 3:p } ``` T1 執行 pop() ```c orig.top = 0x01 next.top = 0x02 ``` 被 T2 插隊, T2 執行 2 次 pop() stack 變成 ```graphviz digraph g{ rankdir = LR node[shape=record] top[label="<p>top|"] 3[label="<p>0x03|3"] top:p -> 3:p } ``` T2 再執行一次 push() stack 變成 ```graphviz digraph g{ rankdir = LR node[shape=record] top[label="<p>top|"] 1[label="<p>0x01|4"] 3[label="<p>0x03|3"] top:p -> 1:p -> 3:p } ``` 這邊假設 0x01 被重新使用了 如此一來,當 T2 結束, T1 繼續運行時執行剛剛被中斷的下一步 ```c atomic_cas(s, &orig, next) ``` 此時 orig->top 依舊是 0x01 compare 成立 ```c s->top == orig.top == 0x01 ``` 所以便將 s 的值 swap 成 next 的值 ```c s->top = next.top == 0x02 ``` 但是 0x02 已經被釋放了 (在 T2 pop 時),於是就導致了錯誤!! 解決的方法: * Tagged state reference: 用一個 tag 記錄 pointer 被更動的次數,或者記錄時間戳,以便比對,例如 java 中的 [AtomicStampedReference](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicStampedReference.html) * Intermediate nodes: 使用一些不代表任何 data 的中間節點來標記哪些節點是被刪除的,但 cost 太高 * Deferred reclamation: 利用 [Garbage collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)) 機制、 Hazard pointer 或 [RCU (read copy update)](https://en.wikipedia.org/wiki/Read-copy-update) 來解決 > defer: to delay something until a later time > defer to sb/sth: to allow someone or something to make decisions for you or tell you what to do because of your respect for them or because of their higher rank, authority, knowledge, etc. > 與 delay 的差別在於,defer 表示主動推遲某件事,而不是因為突發或不可控的狀況造成的延遲 對於 C 這樣缺乏內建 concurrent GC 機制的程式語言來說。Hazard pointer 是其中一種解決方案,其原理是讀取端執行緒對指標進行識別,指標 (特別是指向的記憶體區塊) 若要釋放時,會事先保存,延遲到確認沒有讀取端執行緒,才進行真正的釋放。Linux 核心的 [RCU 同步機制](https://hackmd.io/@sysprog/linux-rcu)是另一種 lock-free 程式設計演算法和記憶體回收機制。 Hazard pointer 可簡稱為 “HP”,其關鍵的結構有: * Hazard pointer: 儲存此 thread 正在存取的指標,因此該指標不能直接被釋放 * Retire list (也稱為 thread free list): 被這個 thread 要求釋放的指標,但實際尚未釋放 ![](https://i.imgur.com/P1Yh5AV.png) 要安全的釋放記憶體,其基本想法就是: * 每個執行緒放在 hazard pointer 中的指標尚不能被釋放 * 每個執行緒要求釋放的指標先放在 retire list 中 * 掃描 retire list 可以得知所有執行緒皆不使用的指標,則可將其真正的釋放給作業系統 ### 程式碼解釋 定義 atomic 操作,其中 cas 為 compare and swap 的縮寫 ```c /* shortcuts */ #define atomic_load(src) __atomic_load_n(src, __ATOMIC_SEQ_CST) #define atomic_store(dst, val) __atomic_store(dst, val, __ATOMIC_SEQ_CST) #define atomic_exchange(ptr, val) \ __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST) #define atomic_cas(dst, expected, desired) \ __atomic_compare_exchange(dst, expected, desired, 0, __ATOMIC_SEQ_CST, \ __ATOMIC_SEQ_CST) ``` > `__ATOMIC_SEQ_CST` Enforces total ordering with all other `__ATOMIC_SEQ_CST` operations. 儲存 hazard pointer 跟 retire list 的 linked list 資料結構 ```c typedef struct __hp { uintptr_t ptr; struct __hp *next; } hp_t; ``` 管理 hazard pointer 的結構 ```c typedef struct { hp_t *pointers; hp_t *retired; void (*deallocator)(void *); } domain_t; ``` 測試用的資源 ```c typedef struct { unsigned int v1, v2, v3; } config_t; ``` 要填空的部分,cleanup 用來釋放已經在 retire list 中,但還沒被 deallocate 的資源。 flag: 為 0 或者 DEFER_DEALLOC (1) 若為 0 則等待無人使用指標指向的資源 為 DEFER_DEALLOC 時,則先不 deallocate ```c /* Forces the cleanup of old objects that have not been deallocated yet. Just * like `swap`, if `flags` is 0, this function will wait until there are no * more references to each object. If `flags` is `DEFER_DEALLOC`, only * objects that already have no living references will be deallocated. */ void cleanup(domain_t *dom, int flags) { hp_t *node; LIST_ITER (&dom->retired, node) { uintptr_t ptr = node->ptr; if (!ptr) continue; if (!list_contains(&dom->pointers, ptr)) { /* We can deallocate straight away */ if (EXP3) dom->deallocator((void *) ptr); } else if (!(flags & DEFER_DEALLOC)) { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; if (EXP4) dom->deallocator((void *) ptr); } } } ``` `EXP3`: list_remove(&dom->retired, ptr) `EXP4`: list_remove(&dom->retired, ptr) 要先將 ptr 從 retired list 中移除,再進行 deallocate,故使用 list_remove list_remove: 將 linked list 中特定的節點移除,成功時回傳 true 使用 atomic_cas 以便在不被插隊的情況下將要移除的位址指向 nullptr ```c /* Remove a node from the list with the specified value */ bool list_remove(hp_t **head, uintptr_t ptr) { hp_t *node; const uintptr_t nullptr = 0; LIST_ITER (head, node) { uintptr_t expected = atomic_load(&node->ptr); if (expected == ptr && atomic_cas(&node->ptr, &expected, &nullptr)) return true; } return false; } ``` :::spoiler 完整程式碼 ```c /* shortcuts */ #define atomic_load(src) __atomic_load_n(src, __ATOMIC_SEQ_CST) #define atomic_store(dst, val) __atomic_store(dst, val, __ATOMIC_SEQ_CST) #define atomic_exchange(ptr, val) \ __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST) #define atomic_cas(dst, expected, desired) \ __atomic_compare_exchange(dst, expected, desired, 0, __ATOMIC_SEQ_CST, \ __ATOMIC_SEQ_CST) #include <stdint.h> #define LIST_ITER(head, node) \ for (node = atomic_load(head); node; node = atomic_load(&node->next)) typedef struct __hp { uintptr_t ptr; struct __hp *next; } hp_t; #include <stdbool.h> #include <stdlib.h> #include <string.h> /* Allocate a new node with specified value and append to list */ static hp_t *list_append(hp_t **head, uintptr_t ptr) { hp_t *new = calloc(1, sizeof(hp_t)); if (!new) return NULL; new->ptr = ptr; hp_t *old = atomic_load(head); do { new->next = old; } while (!atomic_cas(head, &old, &new)); return new; } /* Attempt to find an empty node to store value, otherwise append a new node. * Returns the node containing the newly added value. */ hp_t *list_insert_or_append(hp_t **head, uintptr_t ptr) { hp_t *node; bool need_alloc = true; LIST_ITER (head, node) { uintptr_t expected = atomic_load(&node->ptr); if (expected == 0 && atomic_cas(&node->ptr, &expected, &ptr)) { need_alloc = false; break; } } if (need_alloc) node = list_append(head, ptr); return node; } /* Remove a node from the list with the specified value */ bool list_remove(hp_t **head, uintptr_t ptr) { hp_t *node; const uintptr_t nullptr = 0; LIST_ITER (head, node) { uintptr_t expected = atomic_load(&node->ptr); if (expected == ptr && atomic_cas(&node->ptr, &expected, &nullptr)) return true; } return false; } /* Returns 1 if the list currently contains an node with the specified value */ bool list_contains(hp_t **head, uintptr_t ptr) { hp_t *node; LIST_ITER (head, node) { if (atomic_load(&node->ptr) == ptr) return true; } return false; } /* Frees all the nodes in a list - NOT THREAD SAFE */ void list_free(hp_t **head) { hp_t *cur = *head; while (cur) { hp_t *old = cur; cur = cur->next; free(old); } } #define DEFER_DEALLOC 1 typedef struct { hp_t *pointers; hp_t *retired; void (*deallocator)(void *); } domain_t; /* Create a new domain on the heap */ domain_t *domain_new(void (*deallocator)(void *)) { domain_t *dom = calloc(1, sizeof(domain_t)); if (!dom) return NULL; dom->deallocator = deallocator; return dom; } /* Free a previously allocated domain */ void domain_free(domain_t *dom) { if (!dom) return; if (dom->pointers) list_free(&dom->pointers); if (dom->retired) list_free(&dom->retired); free(dom); } /* * Load a safe pointer to a shared object. This pointer must be passed to * `drop` once it is no longer needed. Returns 0 (NULL) on error. */ uintptr_t load(domain_t *dom, const uintptr_t *prot_ptr) { const uintptr_t nullptr = 0; while (1) { uintptr_t val = atomic_load(prot_ptr); hp_t *node = list_insert_or_append(&dom->pointers, val); if (!node) return 0; /* Hazard pointer inserted successfully */ if (atomic_load(prot_ptr) == val) return val; /* * This pointer is being retired by another thread - remove this hazard * pointer and try again. We first try to remove the hazard pointer we * just used. If someone else used it to drop the same pointer, we walk * the list. */ uintptr_t tmp = val; if (!atomic_cas(&node->ptr, &tmp, &nullptr)) list_remove(&dom->pointers, val); } } /* * Drop a safe pointer to a shared object. This pointer (`safe_val`) must have * come from `load` */ void drop(domain_t *dom, uintptr_t safe_val) { if (!list_remove(&dom->pointers, safe_val)) __builtin_unreachable(); } static void cleanup_ptr(domain_t *dom, uintptr_t ptr, int flags) { if (!list_contains(&dom->pointers, ptr)) { /* deallocate straight away */ dom->deallocator((void *) ptr); } else if (flags & DEFER_DEALLOC) { /* Defer deallocation for later */ list_insert_or_append(&dom->retired, ptr); } else { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; dom->deallocator((void *) ptr); } } /* Swaps the contents of a shared pointer with a new pointer. The old value will * be deallocated by calling the `deallocator` function for the domain, provided * when `domain_new` was called. If `flags` is 0, this function will wait * until no more references to the old object are held in order to deallocate * it. If flags is `DEFER_DEALLOC`, the old object will only be deallocated * if there are already no references to it; otherwise the cleanup will be done * the next time `cleanup` is called. */ void swap(domain_t *dom, uintptr_t *prot_ptr, uintptr_t new_val, int flags) { const uintptr_t old_obj = atomic_exchange(prot_ptr, new_val); cleanup_ptr(dom, old_obj, flags); } /* Forces the cleanup of old objects that have not been deallocated yet. Just * like `swap`, if `flags` is 0, this function will wait until there are no * more references to each object. If `flags` is `DEFER_DEALLOC`, only * objects that already have no living references will be deallocated. */ void cleanup(domain_t *dom, int flags) { hp_t *node; LIST_ITER (&dom->retired, node) { uintptr_t ptr = node->ptr; if (!ptr) continue; if (!list_contains(&dom->pointers, ptr)) { /* We can deallocate straight away */ if (list_remove(&dom->retired, ptr)) dom->deallocator((void *) ptr); } else if (!(flags & DEFER_DEALLOC)) { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; if (list_remove(&dom->retired, ptr)) dom->deallocator((void *) ptr); } } } #include <assert.h> #include <err.h> #include <pthread.h> #include <stdio.h> #define N_READERS 1 #define N_WRITERS 1 #define N_ITERS 20 #define ARRAY_SIZE(x) sizeof(x) / sizeof(*x) typedef struct { unsigned int v1, v2, v3; } config_t; static config_t *shared_config; static domain_t *config_dom; config_t *create_config() { config_t *out = calloc(1, sizeof(config_t)); if (!out) err(EXIT_FAILURE, "calloc"); return out; } void delete_config(config_t *conf) { assert(conf); free(conf); } static void print_config(const char *name, const config_t *conf) { printf("%s : { 0x%08x, 0x%08x, 0x%08x }\n", name, conf->v1, conf->v2, conf->v3); } void init() { shared_config = create_config(); config_dom = domain_new(delete_config); if (!config_dom) err(EXIT_FAILURE, "domain_new"); } void deinit() { delete_config(shared_config); domain_free(config_dom); } static void *reader_thread(void *arg) { (void) arg; for (int i = 0; i < N_ITERS; ++i) { config_t *safe_config = (config_t *) load(config_dom, (uintptr_t *) &shared_config); if (!safe_config) err(EXIT_FAILURE, "load"); print_config("read config ", safe_config); drop(config_dom, (uintptr_t) safe_config); } return NULL; } static void *writer_thread(void *arg) { (void) arg; for (int i = 0; i < N_ITERS / 2; ++i) { config_t *new_config = create_config(); new_config->v1 = rand(); new_config->v2 = rand(); new_config->v3 = rand(); print_config("updating config", new_config); swap(config_dom, (uintptr_t *) &shared_config, (uintptr_t) new_config, 0); print_config("updated config ", new_config); } return NULL; } int main() { pthread_t readers[N_READERS], writers[N_WRITERS]; init(); /* Start threads */ for (size_t i = 0; i < ARRAY_SIZE(readers); ++i) { if (pthread_create(readers + i, NULL, reader_thread, NULL)) warn("pthread_create"); } for (size_t i = 0; i < ARRAY_SIZE(writers); ++i) { if (pthread_create(writers + i, NULL, writer_thread, NULL)) warn("pthread_create"); } /* Wait for threads to finish */ for (size_t i = 0; i < ARRAY_SIZE(readers); ++i) { if (pthread_join(readers[i], NULL)) warn("pthread_join"); } for (size_t i = 0; i < ARRAY_SIZE(writers); ++i) { if (pthread_join(writers[i], NULL)) warn("pthread_join"); } deinit(); return EXIT_SUCCESS; } ``` ::: ### 無法通過 Thread Sanitizer #### 1 reader 1 writer :::spoiler ThreadSanitizer 訊息 ``` read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } updating config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } ================== WARNING: ThreadSanitizer: data race (pid=14932) Write of size 8 at 0x7b0400000000 by thread T2: #0 free ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:707 (libtsan.so.0+0x35f25) #1 delete_config <null> (test+0x13f2) #2 cleanup_ptr <null> (test+0x1df3) Previous read of size 4 at 0x7b0400000004 by thread T1: #0 print_config <null> (test+0x1445) #1 reader_thread <null> (test+0x15ee) Thread T2 (tid=14935, running) created by main thread at: #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79) #1 main <null> (test+0x1858) Thread T1 (tid=14934, finished) created by main thread at: #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79) #1 main <null> (test+0x1801) SUMMARY: ThreadSanitizer: data race (/home/ycwu4142/linux2022/quiz5/test+0x13f2) in delete_config ================== ================== WARNING: ThreadSanitizer: data race (pid=14932) Write of size 8 at 0x7b0400000008 by thread T2: #0 free ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:707 (libtsan.so.0+0x35f25) #1 delete_config <null> (test+0x13f2) #2 cleanup_ptr <null> (test+0x1df3) Previous read of size 4 at 0x7b0400000008 by thread T1: #0 print_config <null> (test+0x142d) #1 reader_thread <null> (test+0x15ee) Thread T2 (tid=14935, running) created by main thread at: #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79) #1 main <null> (test+0x1858) Thread T1 (tid=14934, finished) created by main thread at: #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79) #1 main <null> (test+0x1801) SUMMARY: ThreadSanitizer: data race (/home/ycwu4142/linux2022/quiz5/test+0x13f2) in delete_config ================== updated config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } ThreadSanitizer: reported 2 warnings ``` ::: > 為了簡化,僅使用 2 個 iter 大意是說,T1 (reader thread) 在 print_config (讀取並印出) 時跟 T2 (writer thread) 執行的 delete_config 會有 data race 的情形 仔細查看,發現在 writer_thread 中 swap(將 shared_config 的值更新)時,會執行 cleanup_ptr 進而引發 delete_config 將舊的指標釋放掉 ```c void cleanup_ptr(domain_t *dom, uintptr_t ptr, int flags) { if (!list_contains(&dom->pointers, ptr)) { /* deallocate straight away */ dom->deallocator((void *) ptr); } else if (flags & DEFER_DEALLOC) { /* Defer deallocation for later */ list_insert_or_append(&dom->retired, ptr); } else { /* Spin until all readers are done, then deallocate */ while (list_contains(&dom->pointers, ptr)) ; dom->deallocator((void *) ptr); } } ``` 將傳入的 flags 改成 `DEFER_DEALLOC`,試圖讓 cleanup_ptr 將指標 (ptr) 丟到 retired list 而不是直接 deallocate,但發現結果沒變。 更進一步查看發現原來是進到了第一個 case - hazard pointer 裡面沒有這個指標 (ptr) :::warning 若是 writer 在 reader 讀取的過程中試圖 swap ,理論上該指標必然會在 hazard pointer 的串列之中,因為 reader 使用了 load 來讀取,而 load 會執行 insert_or_append ,將指標放進 hazard pointer 中,直到呼叫 drop 來解除。 ::: > 參考 [kdnvt](https://hackmd.io/@kdnvt/linux2022-quiz5#Memory-leak) 同學與 [bakudr18](https://hackmd.io/@bakudr18/r1Lbsu27q#%E9%87%8D%E6%96%B0%E5%AF%A6%E9%A9%97) 同學的共筆中的分析方式 ```c= reader | writer --------------------+------------------------ | /* swap */ | val = atomic_exchange(ptr,new); | /* cleanup_ptr */ | if(!list_contain(val)){ /* load */ | list_insert(val); | | /* load on ptr */ | if(ptr == val){ | return val; | } | | free(val); | } ``` 由於 reader 在第 9 行才將 val insert 到 hazard pointer 裡面,而 writer 在第 6 行檢查時就找不到 val,所以便直接釋放 val 另外從兩位同學的共筆中還發現第 14 行的 free 會將 reader 讀到的值 free 掉,巧合的是正好我的 cpu 就照這樣的順序跑... 以上的結果是我將 hp 相關程式碼分離出來,編譯成 .o 檔 (ELF 格式),再連結的結果 > 見 [code](https://github.com/Xx-oX/Quiz5-Hazard-Pointer) 但如果按直接塞在一個檔案中直接編譯則有機率跑出以下結果 ```c= read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } updating config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } updated config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } updating config : { 0x66334873, 0x74b0dc51, 0x19495cff } read config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } read config : { 0x66334873, 0x74b0dc51, 0x19495cff } read config : { 0x66334873, 0x74b0dc51, 0x19495cff } updated config : { 0x66334873, 0x74b0dc51, 0x19495cff } updating config : { 0x2ae8944a, 0x625558ec, 0x238e1f29 } read config : { 0x66334873, 0x74b0dc51, 0x19495cff } updated config : { 0x2ae8944a, 0x625558ec, 0x238e1f29 } updating config : { 0x46e87ccd, 0x3d1b58ba, 0x507ed7ab } updated config : { 0x46e87ccd, 0x3d1b58ba, 0x507ed7ab } updating config : { 0x2eb141f2, 0x41b71efb, 0x79e2a9e3 } updated config : { 0x2eb141f2, 0x41b71efb, 0x79e2a9e3 } updating config : { 0x7545e146, 0x515f007c, 0x5bd062c2 } updated config : { 0x7545e146, 0x515f007c, 0x5bd062c2 } updating config : { 0x12200854, 0x4db127f8, 0x0216231b } updated config : { 0x12200854, 0x4db127f8, 0x0216231b } updating config : { 0x1f16e9e8, 0x1190cde7, 0x66ef438d } updated config : { 0x1f16e9e8, 0x1190cde7, 0x66ef438d } updating config : { 0x140e0f76, 0x3352255a, 0x109cf92e } updated config : { 0x140e0f76, 0x3352255a, 0x109cf92e } updating config : { 0x0ded7263, 0x7fdcc233, 0x1befd79f } updated config : { 0x0ded7263, 0x7fdcc233, 0x1befd79f } ``` 其中 15 ~ 18 行很明顯就有 racing 的狀況,但沒有被 thread sanitizer 抓出來 #### multi-readers multi-writers 當有多個 writer 時也會發生 data race :::spoiler Thread Sanitizer 訊息 ```c read config : { 0x00000000, 0x00000000, 0x00000000 } read config : { 0x00000000, 0x00000000, 0x00000000 } updating config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } updated config : { 0x6b8b4567, 0x327b23c6, 0x643c9869 } updating config : { 0x66334873, 0x74b0dc51, 0x19495cff } ================== WARNING: ThreadSanitizer: data race (pid=2670) Write of size 8 at 0x7b0400000810 by thread T3: #0 free <null> (libtsan.so.0+0x35f45) #1 delete_config <null> (a.out+0x1d7c) #2 cleanup_ptr <null> (a.out+0x1ac3) #3 swap <null> (a.out+0x1b82) #4 writer_thread <null> (a.out+0x2089) #5 <null> <null> (libtsan.so.0+0x2d1af) Previous read of size 4 at 0x7b0400000810 by thread T2: #0 print_config <null> (a.out+0x1de2) #1 writer_thread <null> (a.out+0x209c) #2 <null> <null> (libtsan.so.0+0x2d1af) Thread T3 (tid=2674, running) created by main thread at: #0 pthread_create <null> (libtsan.so.0+0x5ea99) #1 main <null> (a.out+0x217d) Thread T2 (tid=2673, finished) created by main thread at: #0 pthread_create <null> (libtsan.so.0+0x5ea99) #1 main <null> (a.out+0x217d) SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libtsan.so.0+0x35f45) in __interceptor_free ================== ================== WARNING: ThreadSanitizer: data race (pid=2670) Write of size 8 at 0x7b0400000818 by thread T3: #0 free <null> (libtsan.so.0+0x35f45) #1 delete_config <null> (a.out+0x1d7c) #2 cleanup_ptr <null> (a.out+0x1ac3) #3 swap <null> (a.out+0x1b82) #4 writer_thread <null> (a.out+0x2089) #5 <null> <null> (libtsan.so.0+0x2d1af) Previous read of size 4 at 0x7b0400000818 by thread T2: #0 print_config <null> (a.out+0x1db7) #1 writer_thread <null> (a.out+0x209c) #2 <null> <null> (libtsan.so.0+0x2d1af) Thread T3 (tid=2674, running) created by main thread at: #0 pthread_create <null> (libtsan.so.0+0x5ea99) #1 main <null> (a.out+0x217d) Thread T2 (tid=2673, finished) created by main thread at: #0 pthread_create <null> (libtsan.so.0+0x5ea99) #1 main <null> (a.out+0x217d) SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libtsan.so.0+0x35f45) in __interceptor_free ================== updated config : { 0x66334873, 0x74b0dc51, 0x19495cff } ThreadSanitizer: reported 2 warnings ``` ::: 可以看出 writer 跟 writer 間有 UAF (use after free) 的問題 解決方法:在 writer print_config 之前強行 load 一次 shared_config,確保待會印出的內容沒有被其他人釋放 ```c static void *writer_thread(void *arg) { (void) arg; for (int i = 0; i < N_ITERS / 2; ++i) { config_t *new_config = create_config(); new_config->v1 = rand(); new_config->v2 = rand(); new_config->v3 = rand(); print_config("updating config", new_config); swap(config_dom, (uintptr_t *) &shared_config, (uintptr_t) new_config, 0); config_t *safe_config = (config_t *) load(config_dom, (uintptr_t *) &shared_config); if (!safe_config) err(EXIT_FAILURE, "load"); print_config("updated config ", safe_config); drop(config_dom, (uintptr_t) safe_config); } return NULL; } ``` 注意 load 完要 drop,不然會一直占用資源 (ptr 持續留在 hp 中) ### 比較 RCU 與 HP RCU 適合用於 1 個 writer 多個 reader 的情形 利用 [訂閱-發布機制](https://hackmd.io/@sysprog/linux-rcu#%E8%A8%82%E9%96%B1%E2%80%94%E7%99%BC%E5%B8%83%E6%A9%9F%E5%88%B6) 來確保所有 reader 讀取到的資源都沒有被釋放,雖然內容可能不是最新的 (因此只能用於 [DNS resolution](https://nlp.stanford.edu/IR-book/html/htmledition/dns-resolution-1.html) 之類即使存取舊的資料,也不會影響最終行為正確性的場景)。 > 訂閱-發布機制: writer 更改資料然後發布,此時會進入「等待讀取端執行緒離開 critical section」等待所有訂閱該資料的 reader 都離開這個 critical section 後才釋放舊內容的記憶體 RCU 與 Hazard pointer 最大的區別在於 HP 保護的是一個 pointer 或者說一個記憶體區塊,而 RCU 則是保護一整個 critical section 整體來說 RCU 適用於 read-intensive workloads 而 HP 適用於 update rates 高的 workload > 參見 [Proposed RCU C++ API](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0461r1.pdf) 也就是說 write/read 比率高 => HP write/read 比率低 => RCU > 取自 [對比其他 lock-free 同步機制](https://hackmd.io/@sysprog/linux-rcu#%E5%B0%8D%E6%AF%94%E5%85%B6%E4%BB%96-lock-free-%E5%90%8C%E6%AD%A5%E6%A9%9F%E5%88%B6) | | RCU| HP | | -------- | -------- | -------- | |Unreclaimed objects|Unbounded|Bounded| |Traversal forward progress|wait-free|lock-free| |Reclamation forward progress|blocking|wait-free| |Reference acquisition|unconditional|conditional| > * wait-free 是 non-blocking 之中強度最高的,它保證所有的指令可以 bound 在某個時間內完成,也就是所有的執行緒經過一段足夠的時間後,一定會有所進展 > * lock-free 稍微弱一點,它允許部份執行緒被暫停,但仍然保證整體程式的其他部份會有進展 最差的情況下,就算只有一個執行緒在執行,其他所有執行緒都在等待,仍然屬於 lock-free 在 [ABA 問題](https://en.wikipedia.org/wiki/ABA_problem) 中解決辦法的部分提到 > Hazard pointers are lock-free, but can only track at most a fixed number of elements per thread as being in-use. 不確定跟表格中的 Unreclaimed objects 是不是同一件事 我認為能追蹤的數量有限是因為 retire list 需要 clean up,而執行 clean up 的時機,通常是 retire list 中的內容到達一定數量 (就好比垃圾桶滿了 => 此時需要傾倒)。因此需要設置上限 (定義何謂「滿」) ,而這個滿就限制了 HP 能夠追蹤的量