# 2022q1 Homework1 (quiz1)
contributed by < [`linjohnss`](https://github.com/linjohnss) >
> [測驗題目](https://hackmd.io/@sysprog/linux2022-quiz1)
## 測驗 1
> Leetcode [1. Two Sum](https://leetcode.com/problems/two-sum/)
### 變數結構
變數節有 4 種:
1. `map_t` 為 hash table 的結構,包含一個紀錄 hash index 數量的 `bit` 以及指向一個 `hlist_head` 陣列的指標。
2. `hlist_node` 與 `hlist_head` 為 Linux 核心中專門為了 hash table 而設計的資料結構,由一個 `hlist_head` 連接多個 `hlist_node` 組成非環狀 doubly linked list。
3. `hash_key` 為儲存陣列值作為 `key` 以及其 index 作為 `data`。可以藉由 `container_of` 讀取。
```c
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };
typedef struct { int bits; struct hlist_head *ht; } map_t;
struct hash_key {
int key;
void *data;
struct hlist_node node;
};
```
### 初始化 hash table
```graphviz
digraph hash_table {
layout=fdp;
subgraph cluster_0 {
label="map_t";
style=filled;
color=lightgrey;
node [shape=record,color=black];
map_t [label="{ bits |<a> ht }", width=1, height=0.2];
}
subgraph cluster_1 {
label="struct hlist_head";
style=filled;
color=lightgrey;
node [shape=record,color=black];
hlist_head [label="{<b> first[0] | <b1> first[1] | <b2> first[2] | <b3> ... | <b4> first[\(1 \<\< bits\) - 1]}", width=1, height=0.2];
}
NULL [shape=plaintext]
{
edge[style=solid];
N1 [pos="0,4!", style=invis]
N2 [pos="2,1!", style=invis]
N3 [pos="4,4!", style=invis]
edge[style=invis, rankdir=LR]
N1 -> N2 -> N3
}
{
// rankdir=TB;
edge [style=invis]
map_t -> N1;
cluster_1 -> N2;
NULL -> N3;
}
// next
map_t:a -> hlist_head:b -> NULL
}
```
> graphviz 參考 [cy023 quiz1 開發筆記](https://hackmd.io/I3tnQ5a5SsiQuiLhW6yPLg?view)
1. 先利用 `malloc` 配置 `map_t` 空間。
2. 接著配置 `hlist_head` 指標陣列,共 `MAP_HASH_SIZE(map->bits)` 個 bucket,全指向 NULL。
3. `MAP_HASH_SIZE` 是用於取得 bucket 數量(2^bit)。
```c
#define MAP_HASH_SIZE(bits) (1 << bits)
```
```c
map_t *map_init(int bits) {
map_t *map = malloc(sizeof(map_t));
if (!map)
return NULL;
map->bits = bits;
map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits));
if (map->ht) {
for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++)
(map->ht)[i].first = NULL;
} else {
free(map);
map = NULL;
}
return map;
}
```
### hash function
```c
#define GOLDEN_RATIO_32 0x61C88647
static inline unsigned int hash(unsigned int val, unsigned int bits) {
/* High bits are more random, so use them. */
return (val * GOLDEN_RATIO_32) >> (32 - bits);
}
```
### 搜尋 hash table
```graphviz
digraph hash_table {
layout=fdp;
subgraph cluster_0 {
label="map_t";
style=filled;
color=lightgrey;
node [shape=record,color=black];
map_t [label="{ bits |<a> ht }", width=1, height=0.2];
}
subgraph cluster_1 {
label="struct hlist_head";
style=filled;
color=lightgrey;
node [shape=record,color=black];
hlist_head [label="{<b> first[0] | <b1> first[1] | <b2> first[2] | <b3> ... | <b4> first[\(1 \<\< bits\) - 1]}", width=1, height=0.2];
}
subgraph cluster_2 {
label="struct hlist_node";
style=filled;
color=lightgrey;
node [shape=record,color=black,rankdir=LR];
hlist_node0 [label="<c> pprev | <d> next", width=1, height=0.2];
}
subgraph cluster_3 {
label="struct hlist_node";
style=filled;
color=lightgrey;
node [shape=record,color=black,rankdir=LR];
hlist_node1 [label="<c> pprev | <d> next", width=1, height=0.2];
}
NULL [shape=plaintext]
{
edge[style=solid];
N1 [pos="0,4!", style=invis]
N2 [pos="2,1!", style=invis]
N3 [pos="7,1!", style=invis]
N4 [pos="10,1!", style=invis]
N5 [pos="13,1!", style=invis]
edge[style=invis, rankdir=LR]
N1 -> N2 -> N3 -> N4 -> N5
}
{
// rankdir=TB;
edge [style=invis]
map_t -> N1;
cluster_1 -> N2;
cluster_2 -> N3;
cluster_3 -> N4;
NULL -> N5;
}
// next
map_t:a -> hlist_head:b -> cluster_2
hlist_node0:d -> cluster_3
hlist_node1:d -> NULL
p -> cluster_2
// pprev
edge[splines=curved];
hlist_node0:c -> hlist_head:b [color=lightskyblue4]
hlist_node1:c -> hlist_node0:d [color=lightskyblue4]
}
```
走訪相同 index 的 list,並找出 `key` 相同的 `hash_key`,若沒找到則回傳 NULL。
```c
static struct hash_key *find_key(map_t *map, int key) {
struct hlist_head *head = &(map->ht)[hash(key, map->bits)];
for (struct hlist_node *p = head->first; p; p = p->next) {
struct hash_key *kn = container_of(p, struct hash_key, node);
if (kn->key == key)
return kn;
}
return NULL;
}
void *map_get(map_t *map, int key)
{
struct hash_key *kn = find_key(map, key);
return kn ? kn->data : NULL;
}
```
### 新增 hash table
```graphviz
digraph hash_table {
layout=fdp;
subgraph cluster_0 {
label="map_t";
style=filled;
color=lightgrey;
node [shape=record,color=black];
map_t [label="{ bits |<a> ht }", width=1, height=0.2];
}
subgraph cluster_1 {
label="struct hlist_head";
style=filled;
color=lightgrey;
node [shape=record,color=black];
hlist_head [label="{<b> first[0] | <b1> first[1] | <b2> first[2] | <b3> ... | <b4> first[\(1 \<\< bits\) - 1]}", width=1, height=0.2];
}
subgraph cluster_2 {
label="struct hlist_node";
style=filled;
color=lightgrey;
node [shape=record,color=black,rankdir=LR];
hlist_node0 [label="<c> pprev | <d> next", width=1, height=0.2];
}
subgraph cluster_3 {
label="struct hlist_node";
style=filled;
color=lightgrey;
node [shape=record,color=black,rankdir=LR];
hlist_node1 [label="<c> pprev | <d> next", width=1, height=0.2];
}
NULL [shape=plaintext]
{
edge[style=solid];
N1 [pos="0,4!", style=invis]
N2 [pos="2,1!", style=invis]
N3 [pos="7,1!", style=invis]
N4 [pos="10,1!", style=invis]
N5 [pos="13,1!", style=invis]
edge[style=invis, rankdir=LR]
N1 -> N2 -> N3 -> N4 -> N5
}
{
// rankdir=TB;
edge [style=invis]
map_t -> N1;
cluster_1 -> N2;
cluster_2 -> N3;
cluster_3 -> N4;
NULL -> N5;
}
// next
map_t:a -> hlist_head:b -> cluster_2
hlist_node0:d -> cluster_3
hlist_node1:d -> NULL
// pprev
edge[splines=curved];
hlist_node0:c -> hlist_head:b [color=lightskyblue4]
hlist_node1:c -> hlist_node0:d [color=lightskyblue4]
}
```
1. 用 `find_key` 進行搜尋,如果有找到相同的 `key` 表示已經找到答案,故 `return`。
2. 若沒找到,則配置一個 `hash_key` 空間,
```c
void map_add(map_t *map, int key, void *data)
{
struct hash_key *kn = find_key(map, key);
if (kn)
return;
kn = malloc(sizeof(struct hash_key));
kn->key = key, kn->data = data;
struct hlist_head *h = &map->ht[hash(key, map->bits)];
struct hlist_node *n = &kn->node, *first = h->first;
AAA;
if (first)
first->pprev = &n->next;
h->first = n;
BBB;
}
```
:::info
**AAA**:n->next = first
**BBB**:n->pprev = &h->first
1. 先將 `n` 放在 `first` 的前面。
2. 若 `first` 不為 NULL,則需將 `first->pprev` 指向 `&n-<next`。
3. 最後將 `h` 與 `n` 連接,即可完成 node 的插入。
```c
n->next = first;
if (first)
first->pprev = &n->next;
h->first = n;
n->pprev = &h->first;
}
```
:::
### 清除 hash table
走訪整個 hash table 並將記憶體空間釋放掉。
1. 走訪 `map->ht` 的每一個元素,用 `head` 指向該元素。
2. 走訪每一個元素的 list,用 `kn` 指向 `hash_key`、`n` 指向該 node。
3. 把 `n` 從 list remove。
4. 釋放 `kn` 的記憶體空間。
5. 走訪完整個 hash table 後,釋放 `map` 的記憶體空間。
```c
void map_deinit(map_t *map)
{
if (!map)
return;
for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) {
struct hlist_head *head = &map->ht[i];
for (struct hlist_node *p = head->first; p;) {
struct hash_key *kn = container_of(p, struct hash_key, node);
struct hlist_node *n = p;
p = p->next;
if (!n->pprev) /* unhashed */
goto bail;
struct hlist_node *next = n->next, **pprev = n->pprev;
*pprev = next;
if (next)
next->pprev = pprev;
n->next = NULL, n->pprev = NULL;
bail:
free(kn->data);
free(kn);
}
}
free(map);
}
```
### Two Sum
1. 給定一個 int 陣列 `num`、陣列大小 `numSize`、目標總和 `target` 與回傳結果的陣列大小 `returnSize`。
2. 先初始化 hash table(`map_init`),並配置結果陣列的記憶體空間 `ret`。
3. 找尋 hash table 中是否有與 `target` 差值的 index 存在。
4. 若存在,則回傳匹配的兩個元素所組成的陣列指標(`ret`)。
5. 若不存在,則將該元素加入 hash table。
6. 找完所有元素都沒有匹配的話,清除整個 hash table。
```c
int *twoSum(int *nums, int numsSize, int target, int *returnSize)
{
map_t *map = map_init(10);
*returnSize = 0;
int *ret = malloc(sizeof(int) * 2);
if (!ret)
goto bail;
for (int i = 0; i < numsSize; i++) {
int *p = map_get(map, target - nums[i]);
if (p) { /* found */
ret[0] = i, ret[1] = *p;
*returnSize = 2;
break;
}
p = malloc(sizeof(int));
*p = i;
map_add(map, nums[i], p);
}
bail:
map_deinit(map);
return ret;
}
```
### Linux 核心原始程式碼 include/linux/hashtable.h
根據 [include/linux/types.h](https://github.com/torvalds/linux/blob/master/include/linux/types.h) 中定義,`hlist_head` 以及 `hlist_node` 的結構如下:
```c
struct hlist_head {
struct hlist_node *first;
};
struct hlist_node {
struct hlist_node *next, **pprev;
};
```
宣告一個有 2^bits 個元素的 hash table,並初始化每一個元素。
```c
#define DEFINE_HASHTABLE(name, bits) \
struct hlist_head name[1 << (bits)] = \
{ [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }
```
根據 `hash.h` 的註解,提到將輸入乘以一個大的奇數並取高位,可以有效分散數列。其中 GOLDEN_RATIO `phi = (sqrt(5)-1)/2` 或其負數具有特別好的特性,而取負數 `(1 - phi) = phi**2 = (3 - sqrt(5))/2` 更方便實做乘法,且對成果不影響,故選擇以下數字作為 GOLDEN_RATIO。
```c
#define GOLDEN_RATIO_32 0x61C88647
#define GOLDEN_RATIO_64 0x61C8864680B583EBull
```
## 測驗 2
> LeetCode [82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/)
### recursive
```graphviz
digraph "Doubly Linked List" {
rankdir=LR;
node [shape=record];
a [label="{ <data> 1 | <ref> }"];
b [label="{ <data> 1 | <ref> }"];
c [label="{ <data> 1 | <ref> }"];
d [label="{ <data> 2 | <ref> }"];
e [label="{ <data> 3 | <ref> }"];
a:ref:c -> b:data:n [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
b:ref:c -> c:data:n [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
c:ref:c -> d:data:n [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
d:ref:c -> e:data:n [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
e:ref:c -> NULL:n [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
}
```
```c
#include <stddef.h>
struct ListNode {
int val;
struct ListNode *next;
};
struct ListNode *deleteDuplicates(struct ListNode *head)
{
if (!head)
return NULL;
if (COND1) {
/* Remove all duplicate numbers */
while (COND2)
head = head->next;
return deleteDuplicates(head->next);
}
head->next = deleteDuplicates(head->next);
return head;
}
```
:::info
判斷 `head->val` 與 `head->neat->val` 是否相等。
COND1:head->neat && head->val == head->next->val
COND2:head->next && head->val == head->next->val
:::
### iterative
```c
struct ListNode* deleteDuplicates(struct ListNode* head){
if (!head)
return NULL;
bool found = false;
struct ListNode **target = &head;
while ((*target)->next) {
if((*target)->val == (*target)->next->val) {
struct ListNode **indirect = &head;
while (*indirect != *target)
indirect = &(*indirect)->next;
*indirect = (*target)->next;
found = true;
target = &(*indirect);
} else if (found) {
struct ListNode **indirect = &head;
while (*indirect != *target)
indirect = &(*indirect)->next;
*indirect = (*target)->next;
found = false;
target = &(*indirect);
} else {
target = &(*target)->next;
}
}
return head;
}
```
### Linux 核心
```c
struct ListNode *deleteDuplicates(struct list_head *head)
{
if (!head || list_empty(head))
return NULL;
element_t *entry, *safe;
bool found = NULL; // use left to delete the last duplicate element
list_for_each_entry_safe (entry, safe, head, list) {
if (&safe->list != head && !strcmp(entry->value, safe->value)) {
list_del(&entry->list);
free(entry->val);
free(entry)
found = true;
} else if (left) {
list_del(&entry->list);
free(entry->val);
free(entry);
found = false;
}
}
return true;
}
```
## 測驗 3
> Leetcode [146. LRU Cache](https://leetcode.com/problems/lru-cache/)
### lRUCacheCreate
1. 用 `malloc` 配置一個 `LRUCache` 空間,`capacity` 為 cache 的容量。
2. 初始化 `obj->dhead` 與 `obj->hheads`
```c
LRUCache *lRUCacheCreate(int capacity)
{
LRUCache *obj = malloc(sizeof(*obj) + capacity * sizeof(struct list_head));
obj->count = 0;
obj->capacity = capacity;
INIT_LIST_HEAD(&obj->dhead);
for (int i = 0; i < capacity; i++)
INIT_LIST_HEAD(&obj->hheads[i]);
return obj;
}
```
### lRUCacheFree
1. 用 `list_for_each_entry_safe` 走訪 `obj->dhead`,釋放 list 節點的記憶體。
2. 釋放全部節點的記憶體後,釋放整個 cache 的記憶體。
```c
void lRUCacheFree(LRUCache *obj)
{
LRUNode *lru, *n;
MMM1 (lru, n, &obj->dhead, dlink) {
list_del(&lru->dlink);
free(lru);
}
free(obj);
}
```
:::info
**MM1**:list_for_each_entry_safe
:::
### lRUCacheGet
1. 先計算 `key` 的 `hash`。
2. 用 `list_for_each_entry_safe` 走訪 `obj->hheads[hash]` 的 list。
3. 尋找相同 `key` 的節點,並將其移動至最前端,然後回傳 `lru->value`。
4. 若沒找到,則回傳 -1。
```c
int lRUCacheGet(LRUCache *obj, int key)
{
LRUNode *lru;
int hash = key % obj->capacity;
MMM2 (lru, &obj->hheads[hash], hlink) {
if (lru->key == key) {
list_move(&lru->dlink, &obj->dhead);
return lru->value;
}
}
return -1;
}
```
:::info
**MM2**:list_for_each_entry
:::
### lRUCachePut
1. 計算 `key` 的 `hash`。
2. 先利用 `list_for_each_entry` 走訪 `obj->hheads[hash]` 的 list,找到是否已經存在,若存在則用 `list_move` 把 `lru->dlink` 移到最前端。
3. 若為第一次加入,判斷 cache 是否滿了,如果沒滿,則配置 `lru` 的空間,並計算 `count++`。
4. 如果滿了,則移除 list 的最後一個節點(LRU)。
5. 將新的節點加到 list 的最前端。
```c
void lRUCachePut(LRUCache *obj, int key, int value)
{
LRUNode *lru;
int hash = key % obj->capacity;
MMM3 (lru, &obj->hheads[hash], hlink) {
if (lru->key == key) {
list_move(&lru->dlink, &obj->dhead);
lru->value = value;
return;
}
}
if (obj->count == obj->capacity) {
lru = MMM4(&obj->dhead, LRUNode, dlink);
list_del(&lru->dlink);
list_del(&lru->hlink);
} else {
lru = malloc(sizeof(LRUNode));
obj->count++;
}
lru->key = key;
list_add(&lru->dlink, &obj->dhead);
list_add(&lru->hlink, &obj->hheads[hash]);
lru->value = value;
}
```
:::info
**MM3**:list_for_each_entry
**MM4**:list_last_entry
:::
### 測試程式
```c
void show(int value, int key)
{
if (value != -1)
printf("GET [%d, %d]\n", key, value);
else
printf("Key %d not found!\n", key);
};
int main()
{
int capacity = 2;
LRUCache *cache = lRUCacheCreate(capacity);
lRUCachePut(cache, 1, 1);
lRUCachePut(cache, 2, 2);
show(lRUCacheGet(cache, 1), 1);
lRUCachePut(cache, 3, 3);
show(lRUCacheGet(cache, 2), 2);
lRUCachePut(cache, 4, 4);
show(lRUCacheGet(cache, 1), 1);
show(lRUCacheGet(cache, 3), 3);
show(lRUCacheGet(cache, 4), 4);
lRUCacheFree(cache);
return 0;
}
```
```
$ ./q3
GET [1, 1]
Key 2 not found!
Key 1 not found!
GET [3, 3]
GET [4, 4]
```
### Linux 核心 LRU ([include/linux/lru_cache.h](https://elixir.bootlin.com/linux/latest/source/include/linux/lru_cache.h))
在 `linux/mm/swap.c` 中有一個函式 `folio_add_lru`,其運用到 LRU 相關的實做。
```c
void folio_add_lru(struct folio *folio)
{
struct pagevec *pvec;
VM_BUG_ON_FOLIO(folio_test_active(folio) && folio_test_unevictable(folio), folio);
VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
folio_get(folio);
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_add);
if (pagevec_add_and_need_flush(pvec, &folio->page))
__pagevec_lru_add(pvec);
local_unlock(&lru_pvecs.lock);
}
EXPORT_SYMBOL(folio_add_lru);
```
> 參考資料
> https://www.kernel.org/doc/gorman/html/understand/understand013.html
> https://elixir.bootlin.com/linux/latest/source/mm/swap.c#L458
## 測驗 4
> Leetcode [128. Longest Consecutive Sequence](https://leetcode.com/problems/longest-consecutive-sequence/description/)
### find
1. 計算 `hash`,需取絕對值。
2. 走訪 `heads[hash]` 找到是否有相同的值。
3. 若有則回傳該 `node` ; 否則回傳 NULL。
```c
static struct seq_node *find(int num, int size, struct list_head *heads)
{
struct seq_node *node;
int hash = num < 0 ? -num % size : num % size;
list_for_each_entry (node, &heads[hash], link) {
if (node->num == num)
return node;
}
return NULL;
}
```
### longestConsecutive
1. 先配置 `head` 的記憶體空間(指標陣列),並初始化每個元素。
2. 將每一個 input 元素加入到 hash table(若已經存在,則不加入)。
3. 尋找最長的連續數列
1. 先判斷該數字是否存在 hash table
2. 若存在,則向它的左右節點開始尋找最常的連續數列
3. 判斷是否為最常的連續數列
4. 回傳最長的數列
```c
int longestConsecutive(int *nums, int n_size)
{
int hash, length = 0;
struct seq_node *node;
struct list_head *heads = malloc(n_size * sizeof(*heads));
for (int i = 0; i < n_size; i++)
INIT_LIST_HEAD(&heads[i]);
for (int i = 0; i < n_size; i++) {
if (!find(nums[i], n_size, heads)) {
hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size;
node = malloc(sizeof(*node));
node->num = nums[i];
list_add(&node->link, &heads[hash]);
}
}
for (int i = 0; i < n_size; i++) {
int len = 0;
int num;
node = find(nums[i], n_size, heads);
while (node) {
len++;
num = node->num;
list_del(&node->link);
int left = num, right = num;
while ((node = find(LLL, n_size, heads))) {
len++;
list_del(&node->link);
}
while ((node = find(RRR, n_size, heads))) {
len++;
list_del(&node->link);
}
length = len > length ? len : length;
}
}
return length;
}
```
:::info
**LLL**: --left
**RRR**:++right
:::
### 測試程式
```c
int main()
{
int num[10] = {0,3,7,2,5,8,4,6,0,1};
int *ptr = num;
int length = longestConsecutive(num, 10);
printf("%d\n", length);
return 0;
}
```
```
$ ./q4
9
```
###### tags: `linux kernel` `linux2022`