並行程式設計: Reference Counting

--- tags: concurrency --- # [並行程式設計](https://hackmd.io/@sysprog/concurrency): Reference Counting > 貢獻者: csm1735, jserv ## Reference counting 〈[並行和多執行緒程式設計](https://hackmd.io/@sysprog/concurrency)〉和〈[Linux 核心模組運作原理](https://hackmd.io/@sysprog/linux-kernel-module)〉提及 [reference counting](https://en.wikipedia.org/wiki/Reference_counting)，[refcnt](https://github.com/sysprog21/concurrent-programs/tree/master/refcnt) 嘗試以 C11 Atomics 實作，參考執行輸出: ``` hread 1632626432, 0: Hello, world! Thread 1632626432, 1: Hello, world! Thread 1632626432, 2: Hello, world! Thread 1632626432, 3: Hello, world! ... Thread 1103886080, 97: Hello, world! Thread 1103886080, 98: Hello, world! Thread 1103886080, 99: Hello, world! ``` 其中 `1103886080` 是執行緒的編號 (thread ID)。一旦 `REFCNT_TRACE` 事先定義，程式輸出如下: ``` ... main.c:141:(test_thread) refcnt_t_unref(str2)main.c:138:(test_thread) refcnt_t_ref(str)Thread 1174378240, 97: Hello, world! main.c:141:(test_thread) refcnt_t_unref(str2)main.c:138:(test_thread) refcnt_t_ref(str)Thread 1174378240, 98: Hello, world! main.c:141:(test_thread) refcnt_t_unref(str2)main.c:138:(test_thread) refcnt_t_ref(str)Thread 1174378240, 99: Hello, world! ``` ## 程式碼運作原理 ```c #define maybe_unused __attribute__((unused)) ``` 先定義 `maybe_unused` ，其作用為針對可能不會使用到的部份，消除編譯時的警告。 ```c typedef struct { #ifdef REFCNT_CHECK int magic; #endif atomic_uint refcount; char data[]; } refcnt_t; ``` 宣告了一個名為 `refcnt_t` 的 struct，可見其中除了 `data` 外，還有一個 atomic_uint 的變數 `refcount` ，用來以 atomic 的型態儲存 reference counting ，而如果程式中有定義 `REFCNT_CHECK` ， struct 中還會再多一個 `magic` 的變數，用途為檢查傳入 `refcnt_ref` 和 `refcnt_unref` 的 reference 是否由 `refcnt_malloc` 或 `refcnt_strdup` 所建立。 ```c static maybe_unused void *refcnt_malloc(size_t len) { refcnt_t *ref = malloc(sizeof(refcnt_t) + len); if (!ref) return NULL; #ifdef REFCNT_CHECK ref->magic = REFCNT_MAGIC; #endif atomic_init(&ref->refcount, 1); return ref->data; } ``` 先透過 malloc 配置大小為 `sizeof(refcnt_t) + len` 的記憶體空間，如果配置失敗則直接回傳 NULL ，如果有定義 `REFCNT_CHECK` 的話，則將 `ref->magic` 設為事先定義好的魔術數 `REFCNT_MAGIC` ，接著再透過 atomic_init 將代表 reference counting 的 `ref->refcount` 初始化為 1，最後會回傳 `ref->data`。 ```c static maybe_unused void *refcnt_realloc(void *ptr, size_t len) { refcnt_t *ref = (void *) (ptr - offsetof(refcnt_t, data)); #ifdef REFCNT_CHECK assert(ref->magic == REFCNT_MAGIC); #endif ref = realloc(ref, sizeof(refcnt_t) + len); if (!ref) return NULL; return ref->data; } ``` 使用 offsetof 計算 `refcnt_t` 的成員 `data` 距離結構起始位址的偏移量，以透過 `(ptr - offsetof(refcnt_t, data))` 取得結構起始位址，如果有定義 `REFCNT_CHECK` 的話，則會去檢查 `ref->magic` ，接著透過 realloc 重新調整 `ref` 所指向的記憶體空間的大小為 `sizeof(refcnt_t) + len` ，如果配置失敗則回傳 NULL ，最後會回傳 `ref->data`。 ```c static maybe_unused void *refcnt_ref(void *ptr) { refcnt_t *ref = (void *) (ptr - offsetof(refcnt_t, data)); #ifdef REFCNT_CHECK assert(ref->magic == REFCNT_MAGIC && "Invalid refcnt pointer"); #endif atomic_fetch_add(&ref->refcount, 1); return ref->data; } ``` 使用 offsetof 計算 `refcnt_t` 的成員 `data` 距離結構起始位址的偏移量，以透過 `(ptr - offsetof(refcnt_t, data))` 取得結構起始位址，如果有定義 `REFCNT_CHECK` 的話，則會去檢查 `ref->magic` ，接著再透過 atomic_fetch_add 去將 `ref->refcount` 做加一的動作，最後會回傳 `ref->data`。 ```c static maybe_unused void refcnt_unref(void *ptr) { refcnt_t *ref = (void *) (ptr - offsetof(refcnt_t, data)); #ifdef REFCNT_CHECK assert(ref->magic == REFCNT_MAGIC && "Invalid refcnt pointer"); #endif if (atomic_fetch_sub(&ref->refcount, 1) == 1) free(ref); } ``` 使用 offsetof 計算 `refcnt_t` 的成員 `data` 距離結構起始位址的偏移量，以透過 `(ptr - offsetof(refcnt_t, data))` 取得結構起始位址，如果有定義 `REFCNT_CHECK` 的話，則會去檢查 `ref->magic` ，接著再透過 `atomic_fetch_sub` 去將 `ref->refcount` 做減一的動作，如果 `ref->refcount` 的值在減一之前為 1 ，即做完減一的動作後會變成 0 的話，則將 `ref` 做 free 的動作。 ```c static maybe_unused char *refcnt_strdup(char *str) { refcnt_t *ref = malloc(sizeof(refcnt_t) + strlen(str) + 1); if (!ref) return NULL; #ifdef REFCNT_CHECK ref->magic = REFCNT_MAGIC; #endif atomic_init(&ref->refcount, 1); strcpy(ref->data, str); return ref->data; } ``` 先透過 malloc 配置大小為 `sizeof(refcnt_t) + strlen(str) + 1` 的記憶體空間，如果配置失敗則直接回傳 NULL ，如果有定義 `REFCNT_CHECK` 的話，則將 `ref->magic` 設為事先定義好的魔術數 `REFCNT_MAGIC` ，接著再透過 atomic_init 將代表 reference counting 的 `ref->refcount` 初始化為 1，再使用 strcpy 將字串 `str` 複製到 `ref->data`，最後會回傳 `ref->data`。 ```c #define N_ITERATIONS 100 static void *test_thread(void *arg) { char *str = arg; for (int i = 0; i < N_ITERATIONS; i++) { char *str2 = refcnt_ref(str); fprintf(stderr, "Thread %u, %i: %s\n", (unsigned int) pthread_self(), i, str2); refcnt_unref(str2); } refcnt_unref(str); return NULL; } #define N_THREADS 64 int main(int argc, char **argv) { /* Create threads */ pthread_t threads[N_THREADS]; /* Create a new string that is count referenced */ char *str = refcnt_strdup("Hello, world!"); /* Start the threads, passing a new counted copy of the referece */ for (int i = 0; i < N_THREADS; i++) pthread_create(&threads[i], NULL, test_thread, refcnt_ref(str)); /* We no longer own the reference */ refcnt_unref(str); /* Whichever thread finishes last will free the string */ for (int i = 0; i < N_THREADS; i++) pthread_join(threads[i], NULL); void *ptr = malloc(100); /* This should cause a heap overflow while checking the magic num which the * sanitizer checks. * Leaving commented out for now */ // refcnt_ref(ptr); free(ptr); return 0; } ``` `main` 中使用 `pthread_create` 來建立 `N_THREADS` 個執行緒，而新建立的執行緒會去調用 `test_thread` 來開始執行，`refcnt_ref(str)` 則是作為 `test_thread` 的參數傳遞。在 `test_thread` 中可發現每次做 fprintf 前都會做 `refcnt_ref` 的動作，在fprintf 完後再做 `refcnt_unref` 的動作，而在 fprintf 中可發現有使用到 `pthread_self()` ，其功能為取得執行緒自身的 ID 。此外， `main` 中也使用了 `pthread_join` ，其作用為去等待指定的執行緒執行完畢，如果沒有去使用 `pthread_join` 可能會造成我們所建立的執行緒沒有執行完的問題。 ### 在 Linux 核心原始程式碼中找出相關 [reference counting](https://en.wikipedia.org/wiki/Reference_counting) 的程式碼在取得 Linux 核心原始程式碼後，可用以下命令去搜尋包含關鍵字 reference count 的 commit。 ```bash $ git log --grep="reference count" ``` 可發現在 [commit ddaf098](https://github.com/torvalds/linux/commit/ddaf098ea779b3c1302c7843f6ff01e89b1fd380) 提及 reference count ，而相關的程式碼可以在 [drivers/base/class.c](https://github.com/torvalds/linux/blob/master/drivers/base/class.c) 及 [include/linux/device/class.h](https://github.com/torvalds/linux/blob/master/include/linux/device/class.h) 裡面查看。根據註解及程式碼推測，`subsys_get` 的功能與 reference count 的 INCREMENT 有關，而 `subsys_put` 的功能則與 reference count 的 DECREMENT 有關。在 [lxr](https://elixir.bootlin.com/linux/latest/source) 搜尋 `subsys_get` ```c static inline struct subsys_private *subsys_get(struct subsys_private *sp) { if (sp) kset_get(&sp->subsys); return sp; } ``` 在 `subsys_get` 中呼叫 `kset_get` ```c static inline struct kset *kset_get(struct kset *k) { return k ? to_kset(kobject_get(&k->kobj)) : NULL; } ``` 在 `kset_get` 中呼叫 `kobject_get` ```c struct kobject *kobject_get(struct kobject *kobj) { if (kobj) { if (!kobj->state_initialized) WARN(1, KERN_WARNING "kobject: '%s' (%p): is not initialized, yet kobject_get() is being called.\n", kobject_name(kobj), kobj); kref_get(&kobj->kref); } return kobj; } ``` 在 `kobject_get` 中呼叫 `kref_get` ```c /** * kref_get - increment refcount for object. * @kref: object. */ static inline void kref_get(struct kref *kref) { refcount_inc(&kref->refcount); } ``` 在 `kref_get` 中會呼叫 `refcount_inc` 去做 reference count 的 INCREMENT。由此可見，`subsys_get` 確實與增加 reference count 有所關聯。接著搜尋 `subsys_put` ```c static inline void subsys_put(struct subsys_private *sp) { if (sp) kset_put(&sp->subsys); } ``` 在 `subsys_put` 中呼叫 `kset_put` ```c static inline void kset_put(struct kset *k) { kobject_put(&k->kobj); } ``` 在 `kset_put` 中呼叫 `kobject_put` ```c void kobject_put(struct kobject *kobj) { if (kobj) { if (!kobj->state_initialized) WARN(1, KERN_WARNING "kobject: '%s' (%p): is not initialized, yet kobject_put() is being called.\n", kobject_name(kobj), kobj); kref_put(&kobj->kref, kobject_release); } } ``` 在 `kobject_put` 中呼叫 `kref_put` ```c /** * kref_put - decrement refcount for object. * @kref: object. * @release: pointer to the function that will clean up the object when the * last reference to the object is released. * This pointer is required, and it is not acceptable to pass kfree * in as this function. * * Decrement the refcount, and if 0, call release(). * Return 1 if the object was removed, otherwise return 0. Beware, if this * function returns 0, you still can not count on the kref from remaining in * memory. Only use the return value if you want to see if the kref is now * gone, not present. */ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)) { if (refcount_dec_and_test(&kref->refcount)) { release(kref); return 1; } return 0; } ``` `kref_put` 中會去做 reference count 的 DECREMENT，而當最後的 reference 被釋放之後，該物件也會被釋放。由此可見，`subsys_put` 也確實與減少 reference count 有所關聯。在 [drivers/base/class.c](https://github.com/torvalds/linux/blob/master/drivers/base/class.c) 中 `class_to_subsys` 的功能是將 `struct class` 轉換為 `struct subsys_private`，做這樣的轉換是因為 driver core 內部是需要處理 `struct subsys_private`，不是外部的 `struct class`，而在找到匹配的 class 後就會 return 與該 class 相關的內部結構 subsys_private，如果沒有找到匹配的 class 則會 return NULL，而在不為 NULL 的時候，就會呼叫 `subsys_get` 去做 reference count 的 INCREMENT，因此在使用完畢後必須呼叫 `subsys_put()` 才能正確地去做釋放的動作。而 [commit ddaf098](https://github.com/torvalds/linux/commit/ddaf098ea779b3c1302c7843f6ff01e89b1fd380) 所修正的問題就是當呼叫 `class_dev_iter_init` 函式去初始化 `class_dev_iter` 的時候使用了 `class_to_subsys` 去取得 `subsys_private` 結構並使其 reference count 增加，但當 `class_dev_iter` 使用完畢後，卻沒有將 `subsys_private` 的 reference count 做減少的動作，由於這個缺失，漸漸地就會造成 memory leak 的問題。 ### 為何 Linux 核心的同步機制不依賴 [reference counting](https://en.wikipedia.org/wiki/Reference_counting)？參閱 [RCU 同步機制](https://hackmd.io/@sysprog/linux-rcu)，RCU 與 reference counting 的對比如下 | | [Reference Counting](https://en.wikipedia.org/wiki/Reference_counting) | RCU |:----------------:|:------------------:|:---:| | Unreclaimed objects | Bounded | Unbounded | | Contention among readers | High | None | | Traversal forward progress | lock-free | wait-free | | Reclamation forward progress | lock-free | blocking | | Traversal speed | atomic | no-overhead | | | Reference acquisition | unconditional | unconditional | | Automatic reclamation | yes | no | | | Purpose of domains | N/A | isolate slow reader | 首先，reference counting 的主要問題是 atomic。多個執行緒可能同時增加或減少 reference counting，這就需要使用 atomic operation 來確保計數的一致性。然而，atomic operation 是相對高成本的操作，特別是當多個執行緒同時競爭一個資源時。在高度並行的情況下，可能會導致性能下降。再來，reference counting 可能會在多個 readers 的情況下導致高度競爭，如果多個執行緒同時增加或減少引用計數，可能會產生競爭，導致計數結果不正確，而為了解決這個問題，需要使用其他同步機制，但也會使得整體的複雜性增加。另外，當 reference counting 存在 reference cycles (an object which refers directly or indirectly to itself) 的情況時，會導致計數保持非零，無法正確釋放資源。這種情況需要特殊的機制來檢測和處理，以確保資源能夠正確地釋放，但也會增加 reference counting 的成本和複雜性。基於以上這些性能、競爭、和資源釋放等方面的考慮，這也是為什麼 Linux 核心的同步機制不依賴 reference counting。