# 2022q1 Homework1 (quiz1) contributed by < `robertnthu` > ###### tags: `linux2022` # 測驗 1 [Two Sum](https://leetcode.com/problems/two-sum/) ## map_add() > AAA = n->next = first; BBB = n->pprev = &h->first; 首先,這個程式是要加入一個 `hlist_node` 到 `map_t` 裡面,而觀察可以判斷出,這個新的 `hlist_node` 要被加在最前面 (graphviz參考 [jim12312321](https://hackmd.io/xPI27Wb1Rc2mvs3VkiiaTA?view)的圖並作[筆記](https://www.notion.so/Graphviz-6af840044c5c456e9989320e75abef6e) ```graphviz digraph G { rankdir = LR; splines = true; node[shape = "record"] num[label="2^bits",shape=plaintext] head[label ="<h>||<f>list_head.first\n||<e>",width = 1.5] node_1[label = "<m>hlist_node_1 | {<p>pprev \n| <n>next \n}",width = 1.5]; node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next \n}",width = 1.5]; node_3[label = "<m>hlist_node_3 | {<p>pprev \n| <n>next \n}",width = 1.5]; etc_next[shape = plaintext,label = "..."] num->head -> node_1 -> node_2 -> node_3 -> etc_next[weight = 10, style = invis]; head:h -> head:e[dir=both]; head:f -> node_1:m; node_1:p -> head:f; node_1:n -> node_2:m; node_2:p -> node_1:n; node_2:n -> node_3:m; node_3:p -> node_2:n; node_3:n -> etc_next; } ``` ```graphviz digraph G { rankdir = LR; splines = true; node[shape = "record"] num[label="2^bits",shape=plaintext] head[label ="<h>||<f>list_head.first\n||<e>",width = 1.5] new[label = "<m>new_node | {<p>pprev | <n>next}", width = 1.5, color = "red"] node_1[label = "<m>hlist_node_1 | {<p>pprev \n| <n>next \n}",width = 1.5]; node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next \n}",width = 1.5]; etc_next[shape = plaintext,label = "..."] num -> head -> new -> node_1 -> node_2 -> etc_next[weight = 10, style = invis]; head:h -> head:e[dir=both]; head:f -> new:m[color = "red"]; new:p -> head:f[color = "red"]; new:n -> node_1:m[color = "red"]; node_1:p -> new:n[color = "red"]; node_1:n -> node_2:m; node_2:p -> node_1:n; node_2:n -> etc_next; } ``` 所以可以看出有四個指標要更動,以下程式 ```c struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; // Insert hlist_node into the list // but kn is hash key // we did not add any hlist_node to kn->node now // *n is the node that kn point to. // n->pprev = &first n->next = first; if (first) // if first exists, put its pprev points to &n->next, which is a pointer point to itself first->pprev = &n->next; h->first = n; n->pprev = &h->first; ``` * 先把新的 `struct hlist_node n`的`next`指向 `first`,不管 `first` 是不是空的 * 如果`first`非空,也就是本來那裡有其他`hlist_node`,就把本來的`first->pprev`指向`&n->next`,也就是指向自己的指標(指向指標,所以是指標的指標) * 再把`h->first`指向新加入的`n` * 最後把`n->pprev`指到「指向`n`的指標」,也就是`&h->first` ### map_init() 造一個2^10^個 array,每個 element 都是 `hlist_head`,並指向`NULL` ### map_get(map_t *map, int key) * 呼叫`find_key()`找跟參數`key`有一樣`key`的`hash_key`,找到的話就回傳`data`,這個`data`就是「所求的index」 * 沒找到就回傳`NULL` ### find_key() * 到 `hash_table`找,有沒有`target-nums[i]`這個值的`hlist_node`,有的話就回傳`hlist_node` * 沒有就回傳`NULL` ## 程式運作原理 * 遍歷整個`nums`,查詢 hash table 裡面有沒有值是 **target-nums[i]** * 如果找到的話,代表有另外一個元素值等於 **target-nums[i]**,跟現在這個元素`nums[i]`,相加之後剛好會等於`target`,那就是我們所求,於是回傳那個元素在`nums`的 index,以及當前元素的 index ,也就是 i * 如果沒找到的話,把當前元素放進 hash table ### 時間複雜度 如果在 hash table 搜尋的速度視為 constant,那時間複雜度就只跟 for-loop 有關,也就是 O(n) ## 研讀 linux/hashtable.h >When first encountering this, most people are confused because they have been taught to implement hashtables differently. So Why do we have a pointer to hlist_node? Why do do we need a list in the first place? 因為上述這句話,我認為 hashtable.h 的實作是:盡可能滿足所有需求的同時,固定每個人的實作方式。 # 測驗 2 Remove Duplicates from Sorted List II >COND1= head->next && head->val == head->next->val COND2= head->next && head->val == head->next->val ```graphviz digraph G { rankdir = LR; splines = true; node[shape = "record"] 1[label = "1"]; 2[label = "2"]; 3[label = "2"]; 4[label = "2"]; 5[label = "3"]; 6[label = "4"]; 1 -> 2 -> 3 -> 4 -> 5 -> 6 } ``` 刪除重複節點的方式 ```graphviz digraph G { rankdir = LR; splines = true; node[shape = "record"] 1[label = "1"]; 2[label = "2"]; 3[label = "2"]; 4[label = "2"]; 5[label = "3"]; 6[label = "4"]; 1 -> 2 -> 3 -> 4 -> 5 -> 6[weight = 10, style = "invis"] 1->5[color = "red"] 5->6 2 -> 3 -> 4 -> 5 } ``` ## 避免遞迴的方法 * 用一個指標,記住「指向 node 的指標」然後一直往後走 * 如果遇到一樣的值,就 iterate 找到沒有相同值的 node ,並連接上 以下是在 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/) 測試通過的實作 ```c ListNode* deleteDuplicates(ListNode* head) { if (!head) return NULL; struct ListNode **indir = &head; while (*indir != NULL && (*indir)->next != NULL) { if ((*indir)->val == (*indir)->next->val) { struct ListNode *tmp = (*indir)->next; while (tmp != NULL && tmp->val == (*indir)->val) { tmp = tmp->next; } *indir = tmp; } else { indir = &(*indir)->next; } } return head; } ``` 概念跟遞迴版本一樣 ## 以類似 Linux 核心的 circular doubly-linked list 改寫 以下是在 [lab0-c](https://hackmd.io/xPI27Wb1Rc2mvs3VkiiaTA?view#q_delete_dup) 的實作程式 ```c bool q_delete_dup(struct list_head *head) { // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/ if (!head || list_empty(head)) return false; struct list_head *node = head->next; // iterate node and check whether there are duplicates // if val(node) == val(node->next), there are duplicates // and we have to check whether node->next exists while (node != head) { if (node->next != head && strcmp(val(node), val(node->next)) == 0) { // there are duplicates, iterate to delete the node char *tar_val = strdup(val(node)); // copy the target value while (node != head && strcmp(val(node), tar_val) == 0) { struct list_head *del = node; node = node->next; // delete del list_del(del); q_release_element(list_entry(del, element_t, list)); } free(tar_val); } else { node = node->next; } } return true; } char *val(struct list_head *l) { return (list_entry(l, element_t, list))->value; } ``` # 測驗 3 LRU Cache >MMM1 = list_for_each_entry_safe >MMM2 = list_for_each_entry >MMM3 = list_for_each_entry >MMM4 = list_last_entry ## 程式運作 * 這個程式利用 linked list 來進行 least recently used 的判斷。 * 如果一個 `LRUNode` 被讀或寫,那這個 `LRUNode`會被找出來,並且放在 linked list 的最前面 * 而當 LRUCache 滿的時候,如果要再加入新的 LRUNode ,則會從 linked list 的 tail 刪除 LRUNode ,因為最後一個就是都沒被讀或寫到的 LRUNode * 以下是 LRUCache 的結構 ```graphviz digraph G { rankdir = LR; splines = true; node[shape = "record"] node_1[label = "<m>dhead | {<p>prev \n| <n>next \n}",width = 1.5]; node_2[label = "<m>hlist_node_2 | {<p>prev | <n>next \n}",width = 1.5]; node_3[label = "<m>hlist_node_3 | {<p>prev \n| <n>next \n}",width = 1.5]; node_4[label = "<m>hlist_node_4 | {<p>prev \n| <n>next \n}",width = 1.5]; node_1 -> node_2 -> node_3 -> node_4[weight = 10, style = invis]; node_1:p -> node_4:m; node_1:n -> node_2:m; node_2:p -> node_1:m; node_2:n -> node_3:m; node_3:p -> node_2:m; node_3:n -> node_4:m; node_4:p -> node_3:m; node_4:n -> node_1:m; } ``` ```graphviz digraph G { rankdir = LR; splines = true; node[shape = "record"] num[label="hheads[]",shape=plaintext] head[label ="<h>||<m>list_head | {<p>prev | <n>next}||<e>",width = 1.5] node_1[label = "<m>hlist_node_x | {<p>prev \n| <n>next \n}",width = 1.5]; node_2[label = "<m>hlist_node_y | {<p>prev | <n>next \n}",width = 1.5]; node_3[label = "<m>hlist_node_z | {<p>prev \n| <n>next \n}",width = 1.5]; etc_next[shape = plaintext,label = "..."] num->head -> node_1 -> node_2 -> node_3 -> etc_next[weight = 10, style = invis]; head:h -> head:e[style = "invis"]; head:n -> node_1:m; node_1:p -> head:m; node_1:n -> node_2:m; node_2:p -> node_1:m; node_2:n -> node_3:m; node_3:p -> node_2:m; node_3:n -> head:f; head:p -> node_3; } ``` ```c int lRUCacheGet(LRUCache *obj, int key) { LRUNode *lru; int hash = key % obj->capacity; list_for_each_entry(lru, &obj->hheads[hash], hlink) { // list_for_each_entry if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); // This node was used, move it to the head return lru->value; } } return -1; } ``` * 當 `lRUCacheGet` 時,去 hash table 找,如果有找到的話,把它在 dlink 移到第一個,代表它最近被使用到 * 沒找到就回傳 -1 ```c void lRUCachePut(LRUCache *obj, int key, int value) { LRUNode *lru; int hash = key % obj->capacity; list_for_each_entry(lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); // This node was write again, so put it to head lru->value = value; return; } } if (obj->count == obj->capacity) { // Cache is full lru = list_last_entry(&obj->dhead, LRUNode, dlink); // get the LRU node (head, entry, member) list_del(&lru->dlink); // del lru->dlink and lru->hlink list_del(&lru->hlink); } else { lru = malloc(sizeof(LRUNode)); obj->count++; } // add lru into the LRUCache lru->key = key; list_add(&lru->dlink, &obj->dhead); // dlink is added in &obj->dhead list_add(&lru->hlink, &obj->hheads[hash]); // hlink is added into &obj->hheads[hash] lru->value = value; } ``` * 中間是一個條件判斷,如果 cache 滿了,就把最後一個 LRUNode 給刪除 * 在兩個結構都要刪,然後空間沒有馬上 free 掉,留給要加進去的 LRUNode 使用 * 然後在兩個結構都把新的 LRUNode 都加進去 # 測驗 4 Longest Consecutive Sequence >LLL = --left >RRR = ++right ```c static struct seq_node *find(int num, int size, struct list_head *heads) { struct seq_node *node; int hash = num < 0 ? -num % size : num % size; list_for_each_entry (node, &heads[hash], link) { if (node->num == num) return node; } return NULL; } ``` 到 hash table 的 slot 去找有當前 key 值的 seq_node ```c for (int i = 0; i < n_size; i++) // init every list_head INIT_LIST_HEAD(&heads[i]); for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { // if we do not find nums[i] hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); // add a new seq_node into the hash table } } ``` 初始化 hash table,並把 nums[ ] 裡面的元素都加進去 hash table 裡面 ```c for (int i = 0; i < n_size; i++) { int len = 0; int num; node = find(nums[i], n_size, heads); // find nums[i] from hash table while (node) { len++; num = node->num; // num is the current value list_del(&node->link); // remove the current list_head int left = num, right = num; while ((node = find(LLL, n_size, heads))) { len++; list_del(&node->link); } while ((node = find(RRR, n_size, heads))) { len++; list_del(&node->link); } length = len > length ? len : length; // length = max(len, length) } } ``` 迭代 nums[ ] 裡面的數字 * 向左(比 nums[i] 小1 的數字)一個一個往下找 * 向右(比nums[i] 大1 的數字)一個一個往下找 如果有找到,`while`迴圈會繼續做,直到沒有連續數字才會停止。 ## 改進程式 這個程式儘管使用 hash table,但因為有兩個迴圈存在,時間複雜度會是O(n^2^),如果先排序之後,再迭代整個 linked list,時間複雜度就是O(n*log(n)) ## linux 核心的 hash table * 利用 `hash_for_each()` ## 未完成