# 2022q1 Homework1 (quiz1)
contributed by < `robertnthu` >
###### tags: `linux2022`
# 測驗 1 [Two Sum](https://leetcode.com/problems/two-sum/)
## map_add()
> AAA = n->next = first;
BBB = n->pprev = &h->first;
首先,這個程式是要加入一個 `hlist_node` 到 `map_t` 裡面,而觀察可以判斷出,這個新的 `hlist_node` 要被加在最前面
(graphviz參考 [jim12312321](https://hackmd.io/xPI27Wb1Rc2mvs3VkiiaTA?view)的圖並作[筆記](https://www.notion.so/Graphviz-6af840044c5c456e9989320e75abef6e)
```graphviz
digraph G {
rankdir = LR;
splines = true;
node[shape = "record"]
num[label="2^bits",shape=plaintext]
head[label ="<h>||<f>list_head.first\n||<e>",width = 1.5]
node_1[label = "<m>hlist_node_1 | {<p>pprev \n| <n>next \n}",width = 1.5];
node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next \n}",width = 1.5];
node_3[label = "<m>hlist_node_3 | {<p>pprev \n| <n>next \n}",width = 1.5];
etc_next[shape = plaintext,label = "..."]
num->head -> node_1 -> node_2 -> node_3 -> etc_next[weight = 10, style = invis];
head:h -> head:e[dir=both];
head:f -> node_1:m;
node_1:p -> head:f;
node_1:n -> node_2:m;
node_2:p -> node_1:n;
node_2:n -> node_3:m;
node_3:p -> node_2:n;
node_3:n -> etc_next;
}
```
```graphviz
digraph G {
rankdir = LR;
splines = true;
node[shape = "record"]
num[label="2^bits",shape=plaintext]
head[label ="<h>||<f>list_head.first\n||<e>",width = 1.5]
new[label = "<m>new_node | {<p>pprev | <n>next}", width = 1.5, color = "red"]
node_1[label = "<m>hlist_node_1 | {<p>pprev \n| <n>next \n}",width = 1.5];
node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next \n}",width = 1.5];
etc_next[shape = plaintext,label = "..."]
num -> head -> new -> node_1 -> node_2 -> etc_next[weight = 10, style = invis];
head:h -> head:e[dir=both];
head:f -> new:m[color = "red"];
new:p -> head:f[color = "red"];
new:n -> node_1:m[color = "red"];
node_1:p -> new:n[color = "red"];
node_1:n -> node_2:m;
node_2:p -> node_1:n;
node_2:n -> etc_next;
}
```
所以可以看出有四個指標要更動,以下程式
```c
struct hlist_head *h = &map->ht[hash(key, map->bits)];
struct hlist_node *n = &kn->node, *first = h->first;
// Insert hlist_node into the list
// but kn is hash key
// we did not add any hlist_node to kn->node now
// *n is the node that kn point to.
// n->pprev = &first
n->next = first;
if (first) // if first exists, put its pprev points to &n->next, which is a pointer point to itself
first->pprev = &n->next;
h->first = n;
n->pprev = &h->first;
```
* 先把新的 `struct hlist_node n`的`next`指向 `first`,不管 `first` 是不是空的
* 如果`first`非空,也就是本來那裡有其他`hlist_node`,就把本來的`first->pprev`指向`&n->next`,也就是指向自己的指標(指向指標,所以是指標的指標)
* 再把`h->first`指向新加入的`n`
* 最後把`n->pprev`指到「指向`n`的指標」,也就是`&h->first`
### map_init()
造一個2^10^個 array,每個 element 都是 `hlist_head`,並指向`NULL`
### map_get(map_t *map, int key)
* 呼叫`find_key()`找跟參數`key`有一樣`key`的`hash_key`,找到的話就回傳`data`,這個`data`就是「所求的index」
* 沒找到就回傳`NULL`
### find_key()
* 到 `hash_table`找,有沒有`target-nums[i]`這個值的`hlist_node`,有的話就回傳`hlist_node`
* 沒有就回傳`NULL`
## 程式運作原理
* 遍歷整個`nums`,查詢 hash table 裡面有沒有值是 **target-nums[i]**
* 如果找到的話,代表有另外一個元素值等於 **target-nums[i]**,跟現在這個元素`nums[i]`,相加之後剛好會等於`target`,那就是我們所求,於是回傳那個元素在`nums`的 index,以及當前元素的 index ,也就是 i
* 如果沒找到的話,把當前元素放進 hash table
### 時間複雜度
如果在 hash table 搜尋的速度視為 constant,那時間複雜度就只跟 for-loop 有關,也就是 O(n)
## 研讀 linux/hashtable.h
>When first encountering this, most people are confused because they have been taught to implement hashtables differently. So Why do we have a pointer to hlist_node? Why do do we need a list in the first place?
因為上述這句話,我認為 hashtable.h 的實作是:盡可能滿足所有需求的同時,固定每個人的實作方式。
# 測驗 2 Remove Duplicates from Sorted List II
>COND1= head->next && head->val == head->next->val
COND2= head->next && head->val == head->next->val
```graphviz
digraph G {
rankdir = LR;
splines = true;
node[shape = "record"]
1[label = "1"];
2[label = "2"];
3[label = "2"];
4[label = "2"];
5[label = "3"];
6[label = "4"];
1 -> 2 -> 3 -> 4 -> 5 -> 6
}
```
刪除重複節點的方式
```graphviz
digraph G {
rankdir = LR;
splines = true;
node[shape = "record"]
1[label = "1"];
2[label = "2"];
3[label = "2"];
4[label = "2"];
5[label = "3"];
6[label = "4"];
1 -> 2 -> 3 -> 4 -> 5 -> 6[weight = 10, style = "invis"]
1->5[color = "red"]
5->6
2 -> 3 -> 4 -> 5
}
```
## 避免遞迴的方法
* 用一個指標,記住「指向 node 的指標」然後一直往後走
* 如果遇到一樣的值,就 iterate 找到沒有相同值的 node ,並連接上
以下是在 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/) 測試通過的實作
```c
ListNode* deleteDuplicates(ListNode* head) {
if (!head)
return NULL;
struct ListNode **indir = &head;
while (*indir != NULL && (*indir)->next != NULL) {
if ((*indir)->val == (*indir)->next->val) {
struct ListNode *tmp = (*indir)->next;
while (tmp != NULL && tmp->val == (*indir)->val) {
tmp = tmp->next;
}
*indir = tmp;
} else {
indir = &(*indir)->next;
}
}
return head;
}
```
概念跟遞迴版本一樣
## 以類似 Linux 核心的 circular doubly-linked list 改寫
以下是在 [lab0-c](https://hackmd.io/xPI27Wb1Rc2mvs3VkiiaTA?view#q_delete_dup) 的實作程式
```c
bool q_delete_dup(struct list_head *head)
{
// https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
if (!head || list_empty(head))
return false;
struct list_head *node = head->next;
// iterate node and check whether there are duplicates
// if val(node) == val(node->next), there are duplicates
// and we have to check whether node->next exists
while (node != head) {
if (node->next != head && strcmp(val(node), val(node->next)) == 0) {
// there are duplicates, iterate to delete the node
char *tar_val = strdup(val(node)); // copy the target value
while (node != head && strcmp(val(node), tar_val) == 0) {
struct list_head *del = node;
node = node->next;
// delete del
list_del(del);
q_release_element(list_entry(del, element_t, list));
}
free(tar_val);
} else {
node = node->next;
}
}
return true;
}
char *val(struct list_head *l)
{
return (list_entry(l, element_t, list))->value;
}
```
# 測驗 3 LRU Cache
>MMM1 = list_for_each_entry_safe
>MMM2 = list_for_each_entry
>MMM3 = list_for_each_entry
>MMM4 = list_last_entry
## 程式運作
* 這個程式利用 linked list 來進行 least recently used 的判斷。
* 如果一個 `LRUNode` 被讀或寫,那這個 `LRUNode`會被找出來,並且放在 linked list 的最前面
* 而當 LRUCache 滿的時候,如果要再加入新的 LRUNode ,則會從 linked list 的 tail 刪除 LRUNode ,因為最後一個就是都沒被讀或寫到的 LRUNode
* 以下是 LRUCache 的結構
```graphviz
digraph G {
rankdir = LR;
splines = true;
node[shape = "record"]
node_1[label = "<m>dhead | {<p>prev \n| <n>next \n}",width = 1.5];
node_2[label = "<m>hlist_node_2 | {<p>prev | <n>next \n}",width = 1.5];
node_3[label = "<m>hlist_node_3 | {<p>prev \n| <n>next \n}",width = 1.5];
node_4[label = "<m>hlist_node_4 | {<p>prev \n| <n>next \n}",width = 1.5];
node_1 -> node_2 -> node_3 -> node_4[weight = 10, style = invis];
node_1:p -> node_4:m;
node_1:n -> node_2:m;
node_2:p -> node_1:m;
node_2:n -> node_3:m;
node_3:p -> node_2:m;
node_3:n -> node_4:m;
node_4:p -> node_3:m;
node_4:n -> node_1:m;
}
```
```graphviz
digraph G {
rankdir = LR;
splines = true;
node[shape = "record"]
num[label="hheads[]",shape=plaintext]
head[label ="<h>||<m>list_head | {<p>prev | <n>next}||<e>",width = 1.5]
node_1[label = "<m>hlist_node_x | {<p>prev \n| <n>next \n}",width = 1.5];
node_2[label = "<m>hlist_node_y | {<p>prev | <n>next \n}",width = 1.5];
node_3[label = "<m>hlist_node_z | {<p>prev \n| <n>next \n}",width = 1.5];
etc_next[shape = plaintext,label = "..."]
num->head -> node_1 -> node_2 -> node_3 -> etc_next[weight = 10, style = invis];
head:h -> head:e[style = "invis"];
head:n -> node_1:m;
node_1:p -> head:m;
node_1:n -> node_2:m;
node_2:p -> node_1:m;
node_2:n -> node_3:m;
node_3:p -> node_2:m;
node_3:n -> head:f;
head:p -> node_3;
}
```
```c
int lRUCacheGet(LRUCache *obj, int key)
{
LRUNode *lru;
int hash = key % obj->capacity;
list_for_each_entry(lru, &obj->hheads[hash], hlink) { // list_for_each_entry
if (lru->key == key) {
list_move(&lru->dlink, &obj->dhead); // This node was used, move it to the head
return lru->value;
}
}
return -1;
}
```
* 當 `lRUCacheGet` 時,去 hash table 找,如果有找到的話,把它在 dlink 移到第一個,代表它最近被使用到
* 沒找到就回傳 -1
```c
void lRUCachePut(LRUCache *obj, int key, int value)
{
LRUNode *lru;
int hash = key % obj->capacity;
list_for_each_entry(lru, &obj->hheads[hash], hlink) {
if (lru->key == key) {
list_move(&lru->dlink, &obj->dhead); // This node was write again, so put it to head
lru->value = value;
return;
}
}
if (obj->count == obj->capacity) { // Cache is full
lru = list_last_entry(&obj->dhead, LRUNode, dlink); // get the LRU node (head, entry, member)
list_del(&lru->dlink); // del lru->dlink and lru->hlink
list_del(&lru->hlink);
} else {
lru = malloc(sizeof(LRUNode));
obj->count++;
}
// add lru into the LRUCache
lru->key = key;
list_add(&lru->dlink, &obj->dhead); // dlink is added in &obj->dhead
list_add(&lru->hlink, &obj->hheads[hash]); // hlink is added into &obj->hheads[hash]
lru->value = value;
}
```
* 中間是一個條件判斷,如果 cache 滿了,就把最後一個 LRUNode 給刪除
* 在兩個結構都要刪,然後空間沒有馬上 free 掉,留給要加進去的 LRUNode 使用
* 然後在兩個結構都把新的 LRUNode 都加進去
# 測驗 4 Longest Consecutive Sequence
>LLL = --left
>RRR = ++right
```c
static struct seq_node *find(int num, int size, struct list_head *heads)
{
struct seq_node *node;
int hash = num < 0 ? -num % size : num % size;
list_for_each_entry (node, &heads[hash], link) {
if (node->num == num)
return node;
}
return NULL;
}
```
到 hash table 的 slot 去找有當前 key 值的 seq_node
```c
for (int i = 0; i < n_size; i++) // init every list_head
INIT_LIST_HEAD(&heads[i]);
for (int i = 0; i < n_size; i++) {
if (!find(nums[i], n_size, heads)) { // if we do not find nums[i]
hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size;
node = malloc(sizeof(*node));
node->num = nums[i];
list_add(&node->link, &heads[hash]); // add a new seq_node into the hash table
}
}
```
初始化 hash table,並把 nums[ ] 裡面的元素都加進去 hash table 裡面
```c
for (int i = 0; i < n_size; i++) {
int len = 0;
int num;
node = find(nums[i], n_size, heads); // find nums[i] from hash table
while (node) {
len++;
num = node->num; // num is the current value
list_del(&node->link); // remove the current list_head
int left = num, right = num;
while ((node = find(LLL, n_size, heads))) {
len++;
list_del(&node->link);
}
while ((node = find(RRR, n_size, heads))) {
len++;
list_del(&node->link);
}
length = len > length ? len : length; // length = max(len, length)
}
}
```
迭代 nums[ ] 裡面的數字
* 向左(比 nums[i] 小1 的數字)一個一個往下找
* 向右(比nums[i] 大1 的數字)一個一個往下找
如果有找到,`while`迴圈會繼續做,直到沒有連續數字才會停止。
## 改進程式
這個程式儘管使用 hash table,但因為有兩個迴圈存在,時間複雜度會是O(n^2^),如果先排序之後,再迭代整個 linked list,時間複雜度就是O(n*log(n))
## linux 核心的 hash table
* 利用 `hash_for_each()`
## 未完成