2022q1 Homework1 (quiz1)

--- tags: linux2022 --- # 2022q1 Homework1 (quiz1) contributed by < `sternacht` > > [作業要求](https://hackmd.io/@sysprog/B166rc3Jq) ### 測驗1 [LeetCode 1. Two Sum](https://leetcode.com/problems/two-sum/) #### 答案 AAA = `(c)` `n->next = first` BBB = `(a)` `n->pprev = &h->first` #### 解析先了解這道題想做甚麼，題目出自 LeetCode 1. Two Sum ，要求從給定的一個整數陣列 `nums` 找到一組兩個數相加能等於目標 `target` ，已知該陣列中一定存在且只存在一組解，要求回傳該解之兩個值得 index ，例如 `nums` 為 [2, 7, 11, 15] ， `target` 為 9 ，很明顯 2 + 7 = 9 是答案，也就是回傳應為 [0, 1] 。最直接的方法就是逐個去尋找兩兩相加等於 `target` 的數，但在這種方法下的時間複雜度為 $O(n^2)$ ，因此改用建立 hash table 的方式，以空間換時間達到縮短執行時間的目標，具體的方式會在接下來的 [延伸題目](https://hackmd.io/tPI7uafoQ1G2t5Bv9vsxEg#延伸題目) 中解釋，簡單的概念是假設有一個足夠好的 hash function ，我們可以利用其建立一個 hash table ，結構長得像這樣 ![](https://i.imgur.com/BtQLtvJ.png) 在程式中會利用 `nums` 裡的值透過 hash function 得到 key ，找到其所屬的 `hlist_head[key]` ，並建立一個新的 hlist_node 插入 hlist 中做儲存。如果要確認一個值 n 是否為組成 target 的一部分，則要對 `target - n` 做 hash 得到 key 值，並查找 `hlist_head[key]` 中是否有相對應的 `target - n` 存在，若否，則依據前述把 n 存進 hlist 。由於好的 hash function 可以平均分散資料所屬的 hlist ，因此只要 key 的範圍足夠，就能使每個 hlist 中的節點數足夠少，進而讓遍尋任意 hlist 的時間趨近於 $O(1)$ ，且整個程式的時間複雜度為 $O(n)$ 。接著回過頭來看看這兩行題目是出現在程式碼的何處 ```c void map_add(map_t *map, int key, void *data) { struct hash_key *kn = find_key(map, key); if (kn) return; kn = malloc(sizeof(struct hash_key)); kn->key = key, kn->data = data; struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; AAA; if (first) first->pprev = &n->next; h->first = n; BBB; } ``` 由函式名稱、內容可以大概看出這一段程式碼是要將一個值放入其所屬的 hlist 中，且從前後幾行程式碼可以看出是 AAA, BBB 是指標的更新操作，倒數第二行也暗示這是 insert head 的形式。在 AAA 的選項中， `(B) n->pprev = first` 、 `(D) n->pprev = n` 是不合理的，因為 pprev 是 "指標的指標" 型態，但賦予的型態卻是單純的指標，除了看型態之外，如果能夠知道這裡是 insert head 形式的話，也可以判斷出 `(B)` 、 `(D)` 的指向是錯的，而 `(A) /* no operation */` 固然沒有語法上的錯誤(根本沒語法)，但算一算指標更新的個數就會知道若是 AAA 沒有動作的話就不夠了，因此此題為 `(C)` 。有了 AAA 的答案，接下來就很容易看出 BBB 應該就是最後剩下的 n->pprev 的更新了，因此答案選 `(A)` 。同樣的，利用形態可以刪除 `(E)` ，判斷指標方向可以刪除 `(B)`、`(C)` ， `(D)` 選項雖然型態正確，指向也沒錯，但其實該操作在 AAA 就做過了。 #### 延伸問題-解釋程式碼運作原理 ```c // 初始化 map (存放 hlist_head 的 array) // 型態為自定義的 map_t，包含一個 int bits 表示大小，及 struct hlist_head *ht 指向 hlist // map 是一個指標指向 hlist_head 的 array ，並根據傳入的 bits 值設定 hash table 的大小 // 接著向記憶體要求空間放入 hlist_head ，並逐個做初始化的動作(設為 NULL)，最後回傳 map // malloc 時失敗則回傳 NULL map_t *map_init(int bits) { map_t *map = malloc(sizeof(map_t)); if (!map) return NULL; map->bits = bits; // 根據 #define MAP_HASH_SIZE(bits) (1 << bits)，該函式的回傳值是 // 2 的 'bits' 次方 map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits)); if (map->ht) { for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) (map->ht)[i].first = NULL; } else { free(map); map = NULL; } return map; } ``` ```c // 定義 hash_key 結構 struct hash_key { int key; void *data; struct hlist_node node; }; ``` ```c // 定義 container_of 巨集，以及 GOLDEN_RATIO_32 的值 #define container_of(ptr, type, member) \ ({ \ void *__mptr = (void *) (ptr); \ ((type *) (__mptr - offsetof(type, member))); \ }) #define GOLDEN_RATIO_32 0x61C88647 ``` ```c // hash function : 將 val 乘上 GOLDEN_RATIO_32 後，取前 'bits' 個 bits 的值作為key static inline unsigned int hash(unsigned int val, unsigned int bits) { /* High bits are more random, so use them. */ return (val * GOLDEN_RATIO_32) >> (32 - bits); } ``` ```c // 給予一個值 key ，並嘗試在 map 中找尋其是否存在，是則返回該節點，否則回傳 NULL // 先用 hash() 找到 key 的 hash 值，接著在對應的 hlist_head 下逐個尋找 hash_key 是否 // 有跟 key 相同的值。 static struct hash_key *find_key(map_t *map, int key) { struct hlist_head *head = &(map->ht)[hash(key, map->bits)]; for (struct hlist_node *p = head->first; p; p = p->next) { struct hash_key *kn = container_of(p, struct hash_key, node); if (kn->key == key) return kn; } return NULL; } ``` ```c // 給定一個值 key ，嘗試在 map 中找存其是否存在，是則返回該該節點內的資料，實際儲存的是在 nums // 中的 index 值，否則回傳 NULL void *map_get(map_t *map, int key) { struct hash_key *kn = find_key(map, key); return kn ? kn->data : NULL; } ``` ```c // 先找 map 中是否已經有 key 值存在了，有的話就返回(理論上leetcode題目中不應該出現這種狀況)， // 沒有的話就要將 key 值放入 map 裡，先向記憶體索取空間放入 key 值以及 data (num 的 index)， // 接著更新指標，將其放入最前面，當目前 key 對應的 hlist 不為空時有例外狀況。 void map_add(map_t *map, int key, void *data) { struct hash_key *kn = find_key(map, key); if (kn) return; kn = malloc(sizeof(struct hash_key)); kn->key = key, kn->data = data; struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; n->next = first; if (first) first->pprev = &n->next; h->first = n; n->pprev = &h->first; } ``` ```c // 將 map 的內容全部刪除並釋放記憶體空間，這裡比較複雜，因此我做了一些圖讓我比較好理解 // 首先考慮有圖 1 這樣子的一個 map ，當 head 指向 hlist_head[0] 的時候， p 指向 NULL， // 因此這裡會直接跳出迴圈; 到圖二，當 head 指向 hlist_head[1] 的時候， p 會先指向第一個 // hlist_node，接著先保留後面連接的 hlist_node ，至此如圖三，接下來會有刪除的動作， // 就在 goto bail 的部分，會判斷目前的 node 是否已經脫離 hlist_head 所能到達位址，是的話 // 就將其記憶體釋放。 // 最後將 map 釋放掉 void map_deinit(map_t *map) { if (!map) return; for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) { struct hlist_head *head = &map->ht[i]; for (struct hlist_node *p = head->first; p;) { struct hash_key *kn = container_of(p, struct hash_key, node); struct hlist_node *n = p; p = p->next; if (!n->pprev) /* unhashed */ goto bail; //////////////// 這一塊還沒看懂 struct hlist_node *next = n->next, **pprev = n->pprev; *pprev = next; if (next) next->pprev = pprev; n->next = NULL, n->pprev = NULL; //////////////// bail: free(kn->data); free(kn); } } free(map); } ``` ```graphviz digraph structs { node [shape = record]; struct1 [label = "<h0> hlist_head[0].first | <h1> hlist_head[1].first \ | <h2> hlist_head[2].first | <h3> ... "]; struct2 [label = "<hn1> hlist_node|{<hn1p>pprev|<hn1n>next}"]; subgraph cluster_1{ label = "key_hash"; struct2; } subgraph cluster_2{ label = "key_hash"; struct3; } struct3 [label = "<hn2> hlist_node|{<hn2p>pprev|<hn2n>next}"]; node [shape = none]; NULL1 [label = "NULL"]; map [label = "map"]; head [label = "head"]; rankdir = "LR"; struct1:h1-> struct2:hn1; struct2:hn1n-> struct3:hn2; struct2:hn1p-> struct1:h1; struct3:hn2p-> struct2:hn1n; struct3:hn2n-> NULL1 map:e -> struct1:n; head-> struct1:h0; p->NULL[color="red"]; } ``` 圖一 ```graphviz digraph structs { node [shape = record]; struct1 [label = "<h0> hlist_head[0].first | <h1> hlist_head[1].first \ | <h2> hlist_head[2].first | <h3> ..."]; struct2 [label = "<hn1> hlist_node|{<hn1p>pprev|<hn1n>next}"]; subgraph cluster_1{ label = "key_hash"; struct2; } subgraph cluster_2{ label = "key_hash"; struct3; } struct3 [label = "<hn2> hlist_node|{<hn2p>pprev|<hn2n>next}"]; node [shape = none]; NULL1 [label = "NULL"]; map [label = "map"]; head [label = "head"]; rankdir = "LR"; struct1:h1-> struct2:hn1; struct2:hn1n-> struct3:hn2; struct2:hn1p-> struct1:h1; struct3:hn2p-> struct2:hn1n; struct3:hn2n-> NULL1 map:e -> struct1:n; head-> struct1:h1; p->struct2[color="red"] } ``` 圖二 ```graphviz digraph structs { compound=true; node [shape = record]; struct1 [label = "<h0> hlist_head[0].first | <h1> hlist_head[1].first \ | <h2> hlist_head[2].first | <h3> ..."]; struct2 [label = "<hn1> hlist_node|{<hn1p>pprev|<hn1n>next}"]; subgraph cluster_1{ label = "key_hash"; struct2; } subgraph cluster_2{ label = "key_hash"; struct3; } struct3 [label = "<hn2> hlist_node|{<hn2p>pprev|<hn2n>next}"]; node [shape = none]; NULL1 [label = "NULL"]; map [label = "map"]; head [label = "head"]; rankdir = "LR"; struct1:h1-> struct2:hn1; struct2:hn1n-> struct3:hn2; struct2:hn1p-> struct1:h1; struct3:hn2p-> struct2:hn1n; struct3:hn2n-> NULL1 map:e -> struct1:n; head-> struct1:h1; p->struct3[color="red"]; n->struct2[color="red"]; kn->struct2:n[lhead=cluster_1 color="red"] } ``` 圖三 ```c // hash 版本的 two sum 實作，先將 map 初始化，並向記憶體預先索取存放返回值的大小(兩個 int)， // 接著對 nums 裡的每個值，用 target - n 來確認是否有符合條件的值存在，若有的話，則跳出迴圈， // 並回傳結果，若否則將 n 以及其 index 存入 map 中。 int *twoSum(int *nums, int numsSize, int target, int *returnSize) { map_t *map = map_init(10); *returnSize = 0; int *ret = malloc(sizeof(int) * 2); // 索取記憶體失敗，跳至 bail 結束程式。 if (!ret) goto bail; for (int i = 0; i < numsSize; i++) { int *p = map_get(map, target - nums[i]); if (p) { /* found */ ret[0] = i, ret[1] = *p; *returnSize = 2; break; } p = malloc(sizeof(int)); *p = i; map_add(map, nums[i], p); } bail: map_deinit(map); return ret; } ``` #### 延伸問題 - 解釋 hash table 的設計和實作手法，並探討 GOLDEN_RATIO_PRIME 實作考量 TODO ### 測驗2 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/) #### 答案 COND1 = `head->next && head->val == head->next->val` COND2 = `head->next && head->val == head->next->val` #### 解析 ```c #include <stddef.h> struct ListNode { int val; struct ListNode *next; }; struct ListNode *deleteDuplicates(struct ListNode *head) { if (!head) return NULL; if (COND1) { /* Remove all duplicate numbers */ while (COND2) head = head->next; return deleteDuplicates(head->next); } head->next = deleteDuplicates(head->next); return head; } ``` 在範例程式碼中，是以 recursive 的形式去實作，先撇除 if(COND1) 的部分，剩下四行的功能其實就是把 linked list 連起來，因此中間的 if 便是決定要捨棄哪些節點的關鍵部分。先考慮刪除的條件 : 當前的 val 與下一個 node 的 val 值相同，因此 COND1 必定有一部分是 `head->val == head->next->val` ，接著在考慮到 head->next 可能是 NULL 的情況，也就是碰到 list 的最後一個值的時候， `head->next->val` 會出錯，因此要再補上一個判斷式，最後就寫成 `head->next && head->val == head->next->val` ，而接著的 while(COND2) 則是用來判斷後續是否還有相同的值相連，全部都要跳過，因此思考的邏輯與 COND1 就會完全一樣了，最後得到 COND2 = `head && head->val == head->next->val`。 #### 延伸問題 - 嘗試避免遞迴，寫出同樣作用的程式碼 ```c struct ListNode *deleteDuplicates(struct ListNode *head) { if (!head) return head; ListNode* node = head; ListNode* prev = NULL; while (node->next){ if (node->val == node->next->val){ while (node->next && node->val == node->next->val) { node = node->next; } if (!prev) head = node->next; else prev->next = node->next; if(!node->next) { if (prev) prev->next = NULL; return head; } } else prev = node; node = node->next; } return head; } ``` 在 singly linked list 中，因為無法取得前一個節點的位置，因此需要建立 prev 來紀錄上一個安全的節點位置(沒有重複的節點)，但如此一來就需要多考慮一種情況，就是開頭第一組數字就重複，此時 prev 尚未給值，因此要額外寫判斷式，而且根據結束狀況的不同(結尾值是否重複)，也需要判斷式加以分離。 #### 延伸問題 - 以類似 Linux 核心的 circular doubly-linked list 改寫，撰寫遞迴和迭代 (iterative) 的程式碼考慮一個 struct ListNode ```c typedef struct{ int data; struct list_head list; } ListNode; ``` 並且該 linked list 結構為 ([圖片參考](https://hackmd.io/@sysprog/c-linked-list#Linked-list-在-Linux-核心原始程式碼)) ```graphviz digraph ele_list { rankdir=LR; node[shape=record]; h [label="{<left>prev|<right>next}", style="bold"]; e1 [label="data|{<left>prev|<right>next}", style="bold"]; e2 [label="data|{<left>prev|<right>next}", style="bold"]; e3 [label="...", style="bold"]; e4 [label="data|{<left>prev|<right>next}", style="bold"]; e5 [label="data|{<left>prev|<right>next}", style="bold"]; h:right:e -> e1:left:w[arrowhead=normal, arrowtail=normal, dir=both] e1:right:e -> e2:left:w[arrowhead=normal, arrowtail=normal, dir=both]; e2:right:e -> e3:left:w[arrowhead=normal, arrowtail=normal, dir=both]; e3:right:e -> e4:left:w[arrowhead=normal, arrowtail=normal, dir=both]; e4:right:e -> e5:left:w[arrowhead=normal, arrowtail=normal, dir=both]; e5:right:s -> h:left:s[arrowhead=normal, arrowtail=normal, dir=both]; head -> h; } ``` 則迭代版的程式可寫成 ```c struct list_head *deleteDuplicates(struct list_head *head) { // head 指向 NULL 、佇列為空或是只有一個節點存在，此三種情況直接返回 head if (!head || head->next == head->prev) return head; struct list_head *node = head->next; ListNode *entry1 = NULL, *entry2 = NULL; while (node->next != head) { // 取出兩個 entry 比較兩著的 data 是否相同 entry1 = list_entry(node, ListNode, list); entry2 = list_entry(node->next, ListNode, list); if (entry1->data == entry2->data) { while (node->next != head && entry1->data == entry2->data) { // 刪除重複的後者並取得下一個 entry list_del(node->next); entry2 = list_entry(node->next, ListNode, list); } // 將重複的開頭也刪除掉 node = node->next; list_del(node->prev); if (node == head) break; } else { node = node->next; } } return head; } ``` 遞迴版本 TODO ### 測驗3 [LeetCode 146. LRU Cache](https://leetcode.com/problems/lru-cache/) #### 答案 MMM1 = `list_for_each_entry_safe` MMM2 = `list_for_each_entry` MMM3 = `list_for_each_entry` MMM4 = `list_last_entry` #### 解析個別看題目出現的程式碼 ```c void lRUCacheFree(LRUCache *obj) { LRUNode *lru, *n; MMM1 (lru, n, &obj->dhead, dlink) { list_del(&lru->dlink); free(lru); } free(obj); } ``` 首先 MMM1 出現在刪除 Cache 的函式中，很容易就可以想到要對節點做改變的話，就必須是帶有 _safe 的巨集，而此題中要刪除的並不只是 `list_head` ，而是整個 entry ，因此要使用 `list_for_each_entry_safe` 。 ```c int lRUCacheGet(LRUCache *obj, int key) { LRUNode *lru; int hash = key % obj->capacity; MMM2 (lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); return lru->value; } } return -1; } ``` MMM2出現在查找 key 是否存在對應的 list 中的函式裡，因此在 MMM2 的部分要對一個 list 做遍尋的操作，而且遍尋的目標 lru 是屬於 entry ，使用 `list_for_each_entry` ```c void lRUCachePut(LRUCache *obj, int key, int value) { LRUNode *lru; int hash = key % obj->capacity; MMM3 (lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); lru->value = value; return; } } if (obj->count == obj->capacity) { lru = MMM4(&obj->dhead, LRUNode, dlink); list_del(&lru->dlink); list_del(&lru->hlink); } else { lru = malloc(sizeof(LRUNode)); obj->count++; } lru->key = key; list_add(&lru->dlink, &obj->dhead); list_add(&lru->hlink, &obj->hheads[hash]); lru->value = value; } ``` MMM3 與 MMM4 出現在將 value 放入 key 對應的 list 中，首先要在 MMM3 的部分確認 key 是否已經存在 list 中，有的話就做更新 value 的動作，因此這也是一個遍尋的操作，且目標是 entry ，因此 MMM3 是 `list_for_each_entry` 。如果 list 已經滿了，則在 MMM4 的部分要先將一個 entry 刪除，再把 value 放進去，因此 MMM4 就是要挑選一個 entry 將其刪除，可能的答案有 `list_first_entry` 及 `list_last_entry` 兩個，因為更新或放入永遠會在 list 最前面，選擇 `list_last_entry` 是最符合 "last recently used" 的答案。 #### 延伸 - 解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作 ```c // 定義 struct ， LRUCache 用來存放 Cache 的訊息， dhead 指向近期被呼叫過的 LRUNode， // 而 hhead[i] 則根據 hash 值存放近期被呼叫過的 LRUNode ，方便尋找某一個 Node 是否存在 typedef struct { int capacity, count; struct list_head dhead, hheads[]; } LRUCache; // 定義 LRUNode ，存放單個節點的訊息，當該存在 Cache 中時， dlink 指向 dhead所串接的 // linked list， hlink 則是指向 hhead[i] 所串接的 linked list typedef struct { int key, value; struct list_head hlink, dlink; } LRUNode; ``` ```c // 建立新的 Cache 並對所有 linked list 初始化，大小由傳入的 capacity 決定。 LRUCache *lRUCacheCreate(int capacity) { LRUCache *obj = malloc(sizeof(*obj) + capacity * sizeof(struct list_head)); obj->count = 0; obj->capacity = capacity; INIT_LIST_HEAD(&obj->dhead); for (int i = 0; i < capacity; i++) INIT_LIST_HEAD(&obj->hheads[i]); return obj; } ``` ```c // 釋放 cahce 中的所有節點，包括 cache 本身。 void lRUCacheFree(LRUCache *obj) { LRUNode *lru, *n; list_for_each_entry_safe (lru, n, &obj->dhead, dlink) { list_del(&lru->dlink); free(lru); } free(obj); } ``` ```c // 取得一個 key 值對應的 LRUNode->value ，並回傳value。若該節點不存在，回傳 -1。 // valid value : 0 <= value <= 10^5 int lRUCacheGet(LRUCache *obj, int key) { LRUNode *lru; int hash = key % obj->capacity; list_for_each_entry (lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); return lru->value; } } return -1; } ``` ```c // 放入一個近期被呼叫的節點於 cahce 的最前端，若該節點已經存在於 cahce 中，則對其 value 做更新， // 並移動至最前端。若不在 cache 中，則考慮 cache 是否已滿，還沒滿就建立一個新的節點並放於最前端， // 若已滿則先刪除最後端的節點(距離上次呼叫最遠)，再將新節點放入。 void lRUCachePut(LRUCache *obj, int key, int value) { LRUNode *lru; int hash = key % obj->capacity; list_for_each_entry (lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); lru->value = value; return; } } if (obj->count == obj->capacity) { lru = list_last_entry(&obj->dhead, LRUNode, dlink); list_del(&lru->dlink); list_del(&lru->hlink); } else { lru = malloc(sizeof(LRUNode)); obj->count++; } lru->key = key; list_add(&lru->dlink, &obj->dhead); list_add(&lru->hlink, &obj->hheads[hash]); lru->value = value; } ``` 接著我補上以下幾段程式碼來測試上方的函式 ```c // 印出目前 cache 中的所有 node，格式為 'key':value，並且依照 cache 內的順序印出 // 若 cache 不存在或為空時則印出文字訊息 void cache_print(LRUCache *obj){ if (!obj) printf("cache not exist\n"); else if (list_empty(&obj->dhead)) printf("cache is empty\n"); else{ printf("cache : "); LRUNode *node = NULL; list_for_each_entry(node, &obj->dhead, dlink){ printf("'%d':%d, ", node->key, node->value); } printf("\n"); } return; } // 搜尋給定的 key 值之 value ，若該 key 值不存在於 cache 內，則印出文字訊息。 void key_print(LRUCache *obj, int key){ int value = lRUCacheGet(obj, key); if (value >= 0) printf("'%d': %d\n", key, value); else printf("no node with key '%d' exist\n", key); return; } // 測試 int main() { int cap = 5, value = 0; LRUCache *cache = NULL; // test print, NULL expected cache_print(cache); cache = lRUCacheCreate(cap); // test print, empty list expected cache_print(cache); lRUCachePut(cache, 1, value++); lRUCachePut(cache, 2, value++); lRUCachePut(cache, 3, value++); lRUCachePut(cache, 4, value++); // test print cache, four key and value expected (4,3,2,1) cache_print(cache); lRUCachePut(cache, 1, value++); // test print cache, key '1' overwiritten expected (1,4,3,2) cache_print(cache); lRUCachePut(cache, 5, value++); lRUCachePut(cache, 6, value++); // test print cache, key '2' removed expected (6,5,1,4,3) cache_print(cache); // test cachceget key_print(cache, 5); key_print(cache, 2); lRUCacheFree(cache); //test print, NULL expected cache_print(cache); return 0; } ``` 執行結果 ![](https://i.imgur.com/HVHEyzg.png) 大部分的內容都按照預期的輸出了，只有在最後 free 之後，預期要輸出 `cache not exist` ，指標卻沒有讓它指回 NULL ，只變成空的 list 。但因為記憶體都已經釋放了，因此讓指標繼續存在會有相對的風險，應該讓 obj 指向 NULL ，這是第一個可以改進的地方。接著我注意到一件事情，在 [leetcode](https://leetcode.com/problems/lru-cache/) 的規則中， key 值與 value 值是指定大於 0 的，但是在這個程式中卻沒有相對應的檢查機制，因此若是 key 值給了小於 0 的值，則會發生讀取到 head[-1] 這樣的事發生，而若是 value 給了 -1 ，則在 `LRUCacheGet` 的時候會與沒找到時回傳的 -1 有所重疊，造成無法用回傳值判斷 key 值是否存在於 cache 內。例如稍微寫成以下的形式，可以發現雖然 key '2' 存在，而且可以被找到，但是 lRUCacheGet 預設給無效的返回值就會跟正確的 value 相同，而使程式無法判斷。 ```c lRUCachePut(cache, 1, value--); lRUCachePut(cache, 2, value++); lRUCachePut(cache, 3, value++); lRUCachePut(cache, 4, value++); // test print cache, four key and value expected (4,3,2,1) cache_print(cache); key_print(cache, 2) ``` ![](https://i.imgur.com/VLr6F7Y.png) > 註 : hash < 0 值的問題可以用第四題的方式將其轉換成 >=0 的整數，但根據 leetcode 的規定來寫的話，應該是要盡量避免 key < 0 的輸入才對。針對以上兩點，我將程式改寫如下 ```c // 改變函式回傳型態，使 cache free 後回傳 NULL struct LRUCache *lRUCacheFree(LRUCache *obj) { // 判斷 cache 是否存在 if (!obj){ printf("cache not exist"); return NULL; } // 這個判斷應該加在每一個函式最前面，以免記憶體位址讀取錯誤發生 // 這裡僅列出兩個修改較多的函式之程式碼 LRUNode *lru, *n; list_for_each_entry_safe (lru, n, &obj->dhead, dlink) { list_del(&lru->dlink); free(lru); } free(obj); return NULL; } void lRUCachePut(LRUCache *obj, int key, int value) { if (!obj){ printf("cache not exist"); return; } // 確認 key 與 value 是否在大於 0 的範圍，否則印出文字訊息並結束函式 if (key < 0 || value < 0){ printf("invalid key or value\n"); return; } LRUNode *lru; int hash = key % obj->capacity; list_for_each_entry (lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); lru->value = value; return; } } if (obj->count == obj->capacity) { lru = list_last_entry(&obj->dhead, LRUNode, dlink); list_del(&lru->dlink); list_del(&lru->hlink); } else { lru = malloc(sizeof(LRUNode)); obj->count++; } lru->key = key; list_add(&lru->dlink, &obj->dhead); list_add(&lru->hlink, &obj->hheads[hash]); lru->value = value; } ``` ### 測驗4 [LeetCode 128. Longest Consecutive Sequence](https://leetcode.com/problems/longest-consecutive-sequence/description/) #### 答案 LLL = `--left` RRR = `++right` #### 解析題目給予一個 nums 陣列，要求回傳一個整數為 nums 中最長的相聯數字個數，例如 [5, 9, 6, 1, 7, 3, 8] ，其中 5 到 9 是相連的，因此就要回傳 5 。這個解法在結構上跟第一題有點相似，同樣是以 hash table 來實作，以達到 $O(n)$ 的時間複雜度。 >不同的是，一般 hash table 為了讓 hash function 盡量讓避免相連的 key 值得到相連的 hash 值，需要較複雜的 function ，但這題並無此需求，因此 hash function 相對簡單。知道這個程式的概念之後，接著看 LLL 跟 RRR 的使用 : ```c int longestConsecutive(int *nums, int n_size) { int hash, length = 0; struct seq_node *node; struct list_head *heads = malloc(n_size * sizeof(*heads)); for (int i = 0; i < n_size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); } } for (int i = 0; i < n_size; i++) { int len = 0; int num; node = find(nums[i], n_size, heads); while (node) { len++; num = node->num; list_del(&node->link); int left = num, right = num; while ((node = find(LLL, n_size, heads))) { len++; list_del(&node->link); } while ((node = find(RRR, n_size, heads))) { len++; list_del(&node->link); } length = len > length ? len : length; } } return length; } ``` LLL 跟 RRR 分別在一個 while 迴圈的判斷式之中，當前後的 heads list 之中有跟 num 相連的數字時，將 len++ ，因此 LLL 與 RRR 就是一個與 num 相連的計數器，而且兩個分別往不同方向去數。而被作為計數器使用的變數 left, right ，因為初始值是 num ，因此要 ++( 或 -- ) 的動作就要加在變數之前，最後考慮到命名方式，所以 LLL 應該是 `--left` ，而 RRR 則是 `++right`。 > 以實作的邏輯正確方面來看，將 LLL 寫作 `++left` 、 RRR 寫作 `--right` ，也是沒錯的，但容易使人混淆。 #### 延伸 - 解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作結構圖範例 ```graphviz digraph structs { compound=true; node [shape = record]; head [label = "<h0> head[0] | <h1> head[1] |\ <h2> head[2] | <h3> ..."]; node1 [label = "<hn1> num[i]|{<hn1p>prev|<hn1n>next}"]; node2 [label = "<hn2> num[j]|{<hn2p>prev|<hn2n>next}"]; subgraph cluster_1{ label = "seq_node"; node1; } subgraph cluster_2{ label = "seq_node"; node2; } node [shape = none]; map [label = "head"]; rankdir = "LR"; head:h1-> node1:hn1p [dir = "both"]; node1:hn1n:e-> node2:hn2p:w [dir = "both"]; head:h1:e->node2:hn2n:s [dir = "both",constraint=false]; map:e -> head:n; } ``` ```c // 定義 struct ，存放數值以及利用 list_head 實現 hash table struct seq_node { int num; struct list_head link; }; ``` ```c // 查找 num 是否在 hash table 中，是則返回該節點，否則返回 NULL // @num: 要找的值 // @size: nums 的長度，用來做為 hash 值的計算 // @*head: 指向 hash table 的指標 static struct seq_node *find(int num, int size, struct list_head *heads) { struct seq_node *node; int hash = num < 0 ? -num % size : num % size; list_for_each_entry (node, &heads[hash], link) { if (node->num == num) return node; } return NULL; } ``` ```c // 尋找最長連續整數之個數 // @nums: 整數陣列 // @n_size: 整數陣列之長度 int longestConsecutive(int *nums, int n_size) { int hash, length = 0; struct seq_node *node; // 建立長度為 n_size 的 list_head array，並對每一個 list_head 做初始化 struct list_head *heads = malloc(n_size * sizeof(*heads)); for (int i = 0; i < n_size; i++) INIT_LIST_HEAD(&heads[i]); // 將 nums 的每一個值放入其中，做出 hash table， hash 值為 nums[i] % n_size 的絕對值 for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); } } // 依序對 nums 的值尋找是否有相連的數值在 hash table 中，並記錄相連的長度，若長度大於當前 // 紀錄的最長長度，則對其做更新，最後回傳最長長度 for (int i = 0; i < n_size; i++) { int len = 0; int num; node = find(nums[i], n_size, heads); while (node) { len++; num = node->num; // list_del 將找到的節點刪掉，避免重複的計算 list_del(&node->link); int left = num, right = num; while ((node = find(LLL, n_size, heads))) { len++; list_del(&node->link); } while ((node = find(RRR, n_size, heads))) { len++; list_del(&node->link); } length = len > length ? len : length; } } return length; } ``` 目前程式的遍尋會讓 nums 裡的每一個數都輪到一次，但若是一開始第一個數所連接的就是最長長度的數列，則後續的走訪就沒有意義，因此可以將 nsize做調整，並改成以下的方式 ```c int longestConsecutive(int *nums, int n_size) { int hash, length = 0; struct seq_node *node; struct list_head *heads = malloc(n_size * sizeof(*heads)); for (int i = 0; i < n_size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); } } for (int i = 0; length < n_size;) { int len = 0; int num; node = find(nums[i], n_size, heads); len++; num = node->num; list_del(&node->link); int left = num, right = num; while ((node = find(LLL, n_size, heads))) { len++; list_del(&node->link); } while ((node = find(RRR, n_size, heads))) { len++; list_del(&node->link); } nsize -= len; i += len; length = len > length ? len : length; } return length; } ``` 如此一來，利用 nsize 紀錄剩餘的節點個數，以及用 i 來紀錄已經走訪過的節點個數，如果剩餘節點數小於等於目前記錄的最長長度，則表示剩餘的節點中已經不可能存在更長的連續整數了，程式可以提前結束。此外，雖然因為 hash function 簡單快速而有必要讓 hash 值的範圍擴大，確保其遍尋每個 list_head 盡量為 $O(1)$ 的時間複雜度，但相對來說也浪費了很多不必要的空間，光是 list_head 的數量就是 n_size 的兩倍。為此，我認為是可以適當的縮減 hash 值的範圍，如 n_size 除以一個常數，以達到更好的空間利用 #### 延伸 - 嘗試用 Linux 核心風格的 hash table 重新實作上述程式碼 ```c #include <stdio.h> #include <stdlib.h> #include "list.h" struct seq_node { int num; struct hlist_node link; }; static struct seq_node *find(int num, int size, struct list_head *heads) { struct seq_node *node; int hash = num < 0 ? -num % size : num % size; hlist_for_each_entry (node, &heads[hash], link) { if (node->num == num) return node; } return NULL; } int longestConsecutive(int *nums, int n_size) { int hash, length = 0; struct seq_node *node; struct hlist_head *heads = malloc(n_size * sizeof(*heads)); for (int i = 0; i < n_size; i++) &heads[i]->first = NULL; for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; hlist_add_head_rcu(&node->link, &heads[hash]); } } for (int i = 0; i < n_size; ) { int len = 0; int num; node = find(nums[i], n_size, heads); len++; num = node->num; hlist_del_rcu(&node->link); int left = num, right = num; while ((node = find(LLL, n_size, heads))) { len++; hlist_del_rcu(&node->link); } while ((node = find(RRR, n_size, heads))) { len++; hlist_del_rcu(&node->link); } nsize -= len; i += len; length = len > length ? len : length; } return length; } ```