2022q1 Homework1 (quiz1)

--- tags: linux2022 --- # 2022q1 Homework1 (quiz1) contributed by < `cwl0429` > > [測驗題目](https://hackmd.io/@sysprog/linux2022-quiz1) ## 測驗 `1` - 藉由 bitwise operation，定義 hash table 大小 ```c #define MAP_HASH_SIZE(bits) (1 << bits) ``` - 配置空間給 map 及 map->ht，建立一個 map ```c map_t *map_init(int bits) { map_t *map = malloc(sizeof(map_t)); if (!map) return NULL; map->bits = bits; map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits)); if (map->ht) { for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) (map->ht)[i].first = NULL; } else { free(map); map = NULL; } return map; } ``` 記得將 map->ht 所有的 first 指標初始化，避免有存取安全疑慮 ```c for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) (map->ht)[i].first = NULL; ``` - 用 linux 風格，將 hlist_node 嵌在 hash_key 內，方便後續操作 ```c struct hash_key { int key; void *data; struct hlist_node node; }; ``` - 藉由 type 及當前 ptr 位址，回推出此 type 結構的起始位址 ```c #define container_of(ptr, type, member) \ ({ \ void *__mptr = (void *) (ptr); \ ((type *) (__mptr - offsetof(type, member))); \ }) ``` - 根據 hash table 的 bits 數量，將 hash 結果限縮在多少 bits 內 - 譬如 bits = 4，則 hash 範圍在 0~15 (0000~1111) ```c #define GOLDEN_RATIO_32 0x61C88647 static inline unsigned int hash(unsigned int val, unsigned int bits) { /* High bits are more random, so use them. */ return (val * GOLDEN_RATIO_32) >> (32 - bits); } ``` 為何使用 GOLDEN RATIO 作為 hash function? - 根據 [Fibonacci Hashing](https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/) 的說明，在作者設計的實驗中，fibonacci hashing 較傳統的 integer modulo 快了兩倍左右(在資料量大於 400000 時)，圖表如下: ![](https://i.imgur.com/eiKVFn4.png) 其中只有 ska::unordered_map 使用 fibonacci hashing - 先藉由 key 找到對應的 hlist_head，接著在此 head 不斷往後找，直到有 key 相等或找完所有 hlist ```c static struct hash_key *find_key(map_t *map, int key) { struct hlist_head *head = &(map->ht)[hash(key, map->bits)]; for (struct hlist_node *p = head->first; p; p = p->next) { struct hash_key *kn = container_of(p, struct hash_key, node); if (kn->key == key) return kn; } return NULL; } ``` - 取得 key 所在位址，若找到則傳該 key node ```c void *map_get(map_t *map, int key) { struct hash_key *kn = find_key(map, key); return kn ? kn->data : NULL; } ``` - 將 data 加入 map 中 ```c void map_add(map_t *map, int key, void *data) { struct hash_key *kn = find_key(map, key); if (kn) return; kn = malloc(sizeof(struct hash_key)); kn->key = key, kn->data = data; struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; n->next = first; // AAA; if (first) first->pprev = &n->next; h->first = n; n->pprev = &h->first; // BBB; ``` 原先的 code 缺少了 AAA 及 BBB 兩部分，以下是不加 AAA 及 BBB 的示意圖(圖表改自[開發紀錄(quiz1)](https://hackmd.io/@Risheng/linux2022-quiz1)，假設 hash key 為 n ): ```graphviz digraph G { rankdir = LR; node[shape = "record"] map[label = "map|bits|<h>ht"] ht[label = "Hash table|<m1>first_1|<m2>first_2|first_3|...|first_n-1|<mn>first_n"] f2_hk_1[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] f2_hk_2[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] f2_hk_3[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] fn_hk_1[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] fn_hk_2[label = "hash_key(new) | key | data | <m>hlist_node | {pprev | <n>next}"] NULL_1[shape = plaintext, label = "NULL"] NULL_2[shape = plaintext, label = "NULL"] map:h -> ht:m1 ht:m2 -> f2_hk_1:m f2_hk_1:p -> ht:m2 f2_hk_1:n -> f2_hk_2:m f2_hk_2:p -> f2_hk_1:n f2_hk_2:n -> f2_hk_3:m f2_hk_3:p -> f2_hk_2:n f2_hk_3:n -> NULL_1 ht:mn -> fn_hk_2:m fn_hk_1:p -> fn_hk_2:n fn_hk_1:n -> NULL_2 } ``` 可以看出，有兩條 link 沒有正確串接，分別是 n->next 和 n->pprev，於是將其補上，使其變成下面的圖表: ```graphviz digraph G { rankdir = LR; node[shape = "record"] map[label = "map|bits|<h>ht"] ht[label = "Hash table|<m1>first_1|<m2>first_2|first_3|...|first_n-1|<mn>first_n"] f2_hk_1[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] f2_hk_2[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] f2_hk_3[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] fn_hk_1[label = "hash_key | key | data | <m>hlist_node | {pprev | <n>next}"] fn_hk_2[label = "hash_key(new) | key | data | <m>hlist_node | {pprev | <n>next}"] NULL_1[shape = plaintext, label = "NULL"] NULL_2[shape = plaintext, label = "NULL"] map:h -> ht:m1 ht:m2 -> f2_hk_1:m f2_hk_1:p -> ht:m2 f2_hk_1:n -> f2_hk_2:m f2_hk_2:p -> f2_hk_1:n f2_hk_2:n -> f2_hk_3:m f2_hk_3:p -> f2_hk_2:n f2_hk_3:n -> NULL_1 ht:mn -> fn_hk_2:m fn_hk_2:p -> ht:mn fn_hk_2:n -> fn_hk_1 fn_hk_1:p -> fn_hk_2:n fn_hk_1:n -> NULL_2 } ``` - 使 map 完整釋放空間 - 從 `map->ht[0]` 開始，先將此 slot 內的 hlist_node 逐一釋放，重複此動作，直到釋放完所有的 `map->ht[MAP_HASH_SIZE]` :::info 需要注意的是，這段 code 使用到 goto 技巧，用以跳過 16~20 行，但我目前還沒想到何時會有 unhash 的狀況發生 ::: ```c= void map_deinit(map_t *map) { if (!map) return; for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) { struct hlist_head *head = &map->ht[i]; for (struct hlist_node *p = head->first; p;) { struct hash_key *kn = container_of(p, struct hash_key, node); struct hlist_node *n = p; p = p->next; if (!n->pprev) /* unhashed */ goto bail; struct hlist_node *next = n->next, **pprev = n->pprev; *pprev = next; if (next) next->pprev = pprev; n->next = NULL, n->pprev = NULL; bail: free(kn->data); free(kn); } } free(map); } ``` - 找到二數相加後等於 sum 想法是不斷地將 nums 的數放入 map 中，直到 map_get 找到正確的數進行回傳 - 這裡也用到了 goto，目的是在 ret 無法被配置時，jump 至 bail 將配置完成的 map 釋放 ```c int *twoSum(int *nums, int numsSize, int target, int *returnSize) { map_t *map = map_init(10); *returnSize = 0; int *ret = malloc(sizeof(int) * 2); if (!ret) goto bail; for (int i = 0; i < numsSize; i++) { int *p = map_get(map, target - nums[i]); if (p) { /* found */ ret[0] = i, ret[1] = *p; *returnSize = 2; break; } p = malloc(sizeof(int)); *p = i; map_add(map, nums[i], p); } bail: map_deinit(map); return ret; } ``` :::spoiler TODO 研讀 Linux 核心原始程式碼 include/linux/hashtable.h 及對應的文件 How does the kernel implements Hashtables?，解釋 hash table 的設計和實作手法，並留意到 tools/include/linux/hash.h 的 GOLDEN_RATIO_PRIME，探討其實作考量 ::: ## 測驗 `2` - 目標是將重複的 node 移出 sorted list 觀察 `COND1`，此時需要進入 if 內做刪除重複 node 的動作，因此推斷 `COND1` 會需要 `head->val == head->next->val` 的條件，另外又需要考慮到若是 `head->next` 為 NULL 會產生問題，所以再將 `head->next` 加入判斷，於是得到結論 `COND1:` `head->next && head->val == head->next->val` 再觀察 `COND2` 會發現需要的條件和 `COND2` 相同，所以 `COND2:` `head->next && head->val == head->next->val` :::spoiler 完整程式碼 ```c #include <stddef.h> struct ListNode { int val; struct ListNode *next; }; struct ListNode *deleteDuplicates(struct ListNode *head) { if (!head) return NULL; if (COND1) { /* Remove all duplicate numbers */ while (COND2) head = head->next; return deleteDuplicates(head->next); } head->next = deleteDuplicates(head->next); return head; } ``` ::: ### 嘗試避免遞迴的版本利用指標的指標 `**tmp` 移除重複的 node，方便移除 node ```c= struct ListNode* deleteDuplicates(struct ListNode* head) { if(!head){ return NULL; } int curr = -111; for(struct ListNode **tmp = &head; *tmp ;) { if((*tmp)->next && (*tmp)->val == (*tmp)->next->val) { *tmp = (*tmp)->next; curr = (*tmp)->val; } if(curr == (*tmp)->val) *tmp = (*tmp)->next; else tmp = &(*tmp)->next; } return head; } ``` ### 用 linux 核心的 circular doubly-linked list 改寫 #### 遞迴版