---
tags: linux2022
---
# 2022q1 Homework1 (quiz1)
> [作業要求](https://hackmd.io/@sysprog/B166rc3Jq)
> [quiz1](https://hackmd.io/@sysprog/linux2022-quiz1)
### 測驗`1` Two Sum 以 Linux 核心風格的 hash table 解之
#### 說明及補完程式碼
##### 結構
參考 Linux 原始碼中 type.h :
```c
struct list_head {
struct list_head *next, *prev; // circular doubly-linked list
};
struct hlist_head {
struct hlist_node *first; // linked list
};
struct hlist_node {
struct hlist_node *next, **pprev; // doubly-linked list
};
```
```c
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };
typedef struct { int bits; struct hlist_head *ht; } map_t;
#define MAP_HASH_SIZE(bits) (1 << bits)
map_t *map_init(int bits) {
map_t *map = malloc(sizeof(map_t));
if (!map)
return NULL;
map->bits = bits;
map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits));
if (map->ht) {
for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++)
(map->ht)[i].first = NULL;
} else {
free(map);
map = NULL;
}
return map;
}
struct hash_key {
int key;
void *data;
struct hlist_node node;
};
```
```graphviz
digraph structs {
rankdir=LR
node [shape=record]
map [label="map_t|{bits|<ht>ht}"]
hlist_head [label="{{Hash Table|<h1>hlist_head[0]-first|<h2>hlist_head[1]-first|...|<hn>hlist_head[2^bits-1]-first}}"]
node1 [label="{hash_key |{key|data|<h_node>hlist_node|{ <p>pprev| <next>next}}}"]
node2 [label="{hash_key |{key|data|<h_node>hlist_node|{ <p>pprev| <next>next}}}"]
node3 [label="{NULL}"]
node4 [label="{hash_key |{key|data|<h_node>hlist_node|{ <p>pprev| <next>next}}}"]
node5 [label="{NULL}"]
node6 [label="{seq_node6|{key|data|<h_node>hlist_node|{ <p>pprev| <next>next}}}"]
node7 [label="{NULL}"]
map -> hlist_head -> node1 -> node2 -> node3[weight = 10, style = invis];
hlist_head -> node4 -> node5[weight = 10, style = invis];
hlist_head -> node6[weight = 10, style = invis];
map:ht -> hlist_head:h1;
hlist_head :h1 -> node1:h_node:t;
node1:p -> hlist_head:h1:t;
node1:next -> node2:h_node:w;
node2:p -> node1:h_nod;
node2:next -> node3:h_node:w;
hlist_head :h2 -> node4:h_node:t;
node4:p -> hlist_head:h2:t;
node4:next -> node5:h_node:t;
hlist_head :hn -> node6:h_node:t;
node6:p -> hlist_head:hn:t;
node6:next -> node7:h_node:w;
}
```
* 由 `map_t` 裡的 `hlist_head` array 當作 bucket,每個 bucket 裝有一串結構體:`hash_key`,並由 `hash_key` 裡的 `hlist_node` 串起。
* `1 << bits` 代表可以有多少不同的十進制數字,也就是 hash table 有多少 bucket(`hlist_head`[] size)。
##### 在 hash table 裡找 key 及 value
```c=
#define container_of(ptr, type, member) \
({ \
void *__mptr = (void *) (ptr); \
((type *) (__mptr - offsetof(type, member))); \
})
#define GOLDEN_RATIO_32 0x61C88647
static inline unsigned int hash(unsigned int val, unsigned int bits) {
/* High bits are more random, so use them. */
return (val * GOLDEN_RATIO_32) >> (32 - bits);
}
static struct hash_key *find_key(map_t *map, int key) {
struct hlist_head *head = &(map->ht)[hash(key, map->bits)];
for (struct hlist_node *p = head->first; p; p = p->next) {
struct hash_key *kn = container_of(p, struct hash_key, node);
if (kn->key == key)
return kn;
}
return NULL;
}
void *map_get(map_t *map, int key)
{
struct hash_key *kn = find_key(map, key);
return kn ? kn->data : NULL;
}
```
* 在 `find_key` 函式中,將 key 代入 hash 函式找出對應的 hash 值,來得到該 bucket(`hlist_head`),再由 `hlist_head->first` 開始迭代 `hlist_node`。
* 透過 `container_of` 找出 結構體 `hash_key` 地址,便可以找到該結構體的 key。
##### 在 hash table 裡儲存 key 及 value
```c=
void map_add(map_t *map, int key, void *data)
{
struct hash_key *kn = find_key(map, key);
if (kn)
return;
kn = malloc(sizeof(struct hash_key));
kn->key = key, kn->data = data;
struct hlist_head *h = &map->ht[hash(key, map->bits)];
struct hlist_node *n = &kn->node, *first = h->first;
AAA;
if (first)
first->pprev = &n->next;
h->first = n;
BBB;
}
```
* 第 7 行配置一個結構體空間:`hash_key`,第 10 行找到該b ucket(`hlist_head`):`h`,第 11 行找到該新增的結構體裡的 `hlist_node`:`n`。
* 接下來要處理新增的 `hlist_node` 的 `next` 及 `pprev`,新增的 `hlist_node` 會直接插入 bucket 的第一個位置, 因此將新增的 `hlist_node->next` 指向 `hlist_head` 現在的 `first`,`AAA` = `n->next = first`。
第 13-15 行,若 `hlist_head` 裡有 `hlist_node`,`hlist_head` 現在的 `first->pprev` 指向前一個元素的記憶體位址本身:`&n->next`==(為何不指向 &n?)==。`hlist_head` 現在的 `first` 替換成新增的 `hlist_node`。
新增的 `hlist_node->pprev` 指向前一個元素的記憶體位址本身,因此 `BBB` = `n->pprev = &h->first` ==(為何不指向 &h?)==。
```graphviz
digraph structs{
rankdir = LR
node[shape=record]
h [label="<a>hlist_head:h|<f>first"]
n [label="<a>hlist_node:n|{<f>pprev|<n>next}"]
o [label="<a>first|{<f>pprev|next}"]
n:n -> o:a;
o:f -> n:n;
h:f -> n:a;
n:f -> h;
}
```
:::info
上述 highlight 處,我認為有可能的原因在於方便其他操作,如下方 `map_deinit` 函式裡第 18 行,只需 dereference `pprev` 就可以拿到指向下一個 node 的指標。
:::
##### 釋放 map 記憶體
```c=
void map_deinit(map_t *map)
{
if (!map)
return;
for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) {
struct hlist_head *head = &map->ht[i];
for (struct hlist_node *p = head->first; p;) {
struct hash_key *kn = container_of(p, struct hash_key, node);
struct hlist_node *n = p;
p = p->next;
if (!n->pprev) /* unhashed */
goto bail;
// Remove n from the list
struct hlist_node *next = n->next, **pprev = n->pprev;
*pprev = next;
if (next)
next->pprev = pprev;
n->next = NULL, n->pprev = NULL;
bail:
free(kn->data);
free(kn);
}
}
free(map);
}
```
* 釋放資源的順序為 hash table 裡的每個 bucket -> bucket 裡的每個 hash_key 及 hlist_node。
* ==第 13 行,unhashed 意思?==
* 第 17 至 21 行將 node:n 移出 bucket,把 n 的前一個節點的 next 指到 n 的下一個節點,且把 n 的下個節點的 pprev 指向 n 的前一個節點。
##### Two Sum 主程式
```c
int *twoSum(int *nums, int numsSize, int target, int *returnSize)
{
map_t *map = map_init(10);
*returnSize = 0;
int *ret = malloc(sizeof(int) * 2);
if (!ret)
goto bail;
for (int i = 0; i < numsSize; i++) {
int *p = map_get(map, target - nums[i]);
if (p) { /* found */
ret[0] = i, ret[1] = *p;
*returnSize = 2;
break;
}
p = malloc(sizeof(int));
*p = i;
map_add(map, nums[i], p);
}
bail:
map_deinit(map);
return ret;
}
```
* 初始化 hash table,若無法配置回傳結果的空間,便釋放掉 hash table。
* 在 hash table 中找出 nums 裡對應的 two sum 值,找完後便釋放 hash table,若其中有任一 nums 找不到,便將該 key(nums 值)及 value(index)存入 hash map 中,讓該 nums 值對應的 two sum 值之後可以找到該 nums 值。
:::danger
疑問:上述 highlight 處
:::
:::warning
TODO:
* 研讀 Linux 核心原始程式碼 [include/linux/hashtable.h](https://github.com/torvalds/linux/blob/master/include/linux/hashtable.h) 及對應的文件 [How does the kernel implements Hashtables?](https://kernelnewbies.org/FAQ/Hashtables),解釋 hash table 的設計和實作手法,並留意到 [tools/include/linux/hash.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/hash.h) 的 GOLDEN_RATIO_PRIME,探討其實作考量
:::
### 測驗`2` Remove Duplicates from Sorted List
#### 說明及補完程式碼
針對 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/),以下是可能的合法 C 程式實作:
```c
#include <stddef.h>
struct ListNode {
int val;
struct ListNode *next;
};
struct ListNode *deleteDuplicates(struct ListNode *head)
{
if (!head)
return NULL;
if (COND1) {
/* Remove all duplicate numbers */
while (COND2)
head = head->next;
return deleteDuplicates(head->next);
}
head->next = deleteDuplicates(head->next);
return head;
}
```
此遞迴函式的終止條件分成三部份:
1. 當前節點為空時,回傳空節點。
2. 當前節點的下個節點不為空,且當前節點的值等於下個節點的值,當前節點會往下迭代,如此可以跳過重複的節點。
```graphviz
digraph struct{
label = "at the beginning";
node[shape=box];
a1 [label="a"];
a2 [label="a"];
a3 [label="a"];
b [label="b"];
c [label="c"];
{
head[shape=plain]
rank = same;
head [label="head"];
head-> a1[arrowhead=vee];
}
rankdir = LR;
a1 -> a2 -> a3 -> b -> c;
}
```
```graphviz
digraph struct{
label = "after iteration";
node[shape=box];
a1 [label="a"];
a2 [label="a"];
a3 [label="a"];
b [label="b"];
c [label="c"];
{
head[shape=plain]
rank = same;
head [label="head"];
head-> a3[arrowhead=vee];
}
rankdir = LR;
a1 -> a2 -> a3 -> b -> c;
}
```
```graphviz
digraph struct{
label = "function call";
node[shape=box];
a1 [label="a"];
a2 [label="a"];
a3 [label="a"];
b [label="b"];
c [label="c"];
{
head[shape=plain]
rank = same;
head [label="head"];
head-> a3[arrowhead=vee];
}
{
entry[shape=plain]
rank = same;
entry [label="function entry"];
entry-> b[arrowhead=onormal];
}
rankdir = LR;
a1 -> a2 -> a3 -> b -> c;
}
```
回傳下個節點為首的 linked-list 。
所以 `COND1` 以及 `COND2` = `head->next && head->val == head->next->val` 。
3. 當前節點的下個節點不為空,且當前節點的值與下個節點的值不同,以當前節點的下個節點為首呼叫此遞迴函式,來刪掉下個節點為首的 linked list 裡重複的節點。
```graphviz
digraph struct{
label = "at the beginning";
node[shape=box];
a [label="a"];
b1 [label="b"];
b2 [label="b"];
b3 [label="b"];
c [label="c"];
{
head[shape=plain]
rank = same;
head [label="head"];
head-> a[arrowhead=vee];
}
rankdir = LR;
a -> b1 -> b2 -> b3 -> c;
}
```
```graphviz
digraph struct{
label = "function call";
node[shape=box];
a [label="a"];
b1 [label="b"];
b2 [label="b"];
b3 [label="b"];
c [label="c"];
{
head[shape=plain]
rank = same;
head [label="head"];
head-> a[arrowhead=vee];
}
{
entry[shape=plain]
rank = same;
entry [label="function entry"];
entry-> b1[arrowhead=onormal];
}
rankdir = LR;
a -> b1 -> b2 -> b3 -> c;
}
```
回傳當前節點為首的 linked-list 。
#### 嘗試避免遞迴,寫出同樣作用的程式碼
```c
struct ListNode* deleteDuplicates(struct ListNode* head){
if (!head)
return NULL;
struct ListNode* iterator = head;
struct ListNode* prev = NULL;
bool isDuplicated = 0;
while (iterator) {
if (iterator->next && iterator->val == iterator->next->val) {
iterator->next = iterator->next->next;
isDuplicated = 1;
} else {
if (isDuplicated && prev != NULL) {
prev->next = iterator->next;
isDuplicated = 0;
} else if (isDuplicated && prev == NULL) {
head = head->next;
prev = NULL;
isDuplicated = 0;
} else {
prev = iterator;
}
iterator = iterator->next;
}
}
return head;
}
```
想法是迭代整個 linked list,只要遇到連續同樣值的節點,便將「出現同樣值的第一個節點」的 `next` 指向下下一個節點。
要注意的是需要移除所有重複的節點,所以「出現同樣值的第一個節點」也必需被移除。而這裡的 linked list 為單向,所以需要宣告新的變數來儲存前一個節點,來讓「出現同樣值的第一個節點」被其前節點跳過。
:::warning
TODO:
* 以類似 Linux 核心的 circular doubly-linked list 改寫,撰寫遞迴和迭代 (iterative) 的程式碼
:::
### 測驗`3` LRU Cache
#### 說明及補完程式碼
針對 [LeetCode 146. LRU Cache](https://leetcode.com/problems/lru-cache/),以下是 [Least Recently Used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)) 可能的合法 C 程式實作
補完程式碼 `MMM1`, `MMM2`, `MMM3`, `MMM4`,都是 [Linux 核心風格的 list 巨集](https://github.com/sysprog21/lab0-c/blob/master/list.h),以 list_ 開頭
以下為參考 [`jim12312321`](https://hackmd.io/@jim12312321/H1ml7rzlq#%E6%B8%AC%E9%A9%973-%EF%BC%9A-%E5%AF%A6%E4%BD%9C-LeetCode-146) 且經消化過的內容
##### 結構
```c
struct list_head {
struct list_head *prev;
struct list_head *next;
};
```
```c
typedef struct {
int capacity, count;
struct list_head dhead, hheads[];
} LRUCache;
```
* capacity:表示 `LRUCache` 裡 `hheads` 有多大,也就是 LRUNode 的上限
* count:表示目前有幾個 `LRUNode`
* dhead:紀錄最近被使用次序的 `LRUNode` 的 head
* hheads:根據不同 hash 值紀錄 `LRUNode` 的 head
```c
typedef struct {
int key, value;
struct list_head hlink, dlink;
} LRUNode;
```
* key:`LRUNode` 的編號
* value:`LRUNode` 的值
* hlink:接在 `hheads[hash值]` 後面
* dlink:接在 `dhead` 後面,越前面代表越近被使用過
```graphviz
digraph structs {
rankdir=LR
node [shape=record]
cache [label="{LRUCache|{capacity = 2|count|<dh> dhead|<hh0> hheads[0]|<hh1> hheads[1]}}"]
node1 [label="{LRUNode|{key|value |<d> dlink|<h> hlink}}"]
node2 [label="{LRUNode|{key|value |<d> dlink|<h> hlink}}"]
cache -> node1 -> node2[weight = 10, style = invis];
cache:dh -> node1:d:w[color="darkgreen"];
cache:hh0 -> node1:h:w;
cache:hh1 -> node2:h:w;
node1:d -> node2:d:w[color="darkgreen"];
}
```
##### 建立 LRU Cache
```c
LRUCache *lRUCacheCreate(int capacity)
{
LRUCache *obj = malloc(sizeof(*obj) + capacity * sizeof(struct list_head));
obj->count = 0;
obj->capacity = capacity;
INIT_LIST_HEAD(&obj->dhead);
for (int i = 0; i < capacity; i++)
INIT_LIST_HEAD(&obj->hheads[i]);
return obj;
}
```
因為 `hheads` 是 array,得知 capacity 時需配置另外的空間。
`INIT_LIST_HEAD` 為將結點設為雙向鏈結串列。
##### 釋放 LRU Cache
```c
void lRUCacheFree(LRUCache *obj)
{
LRUNode *lru, *n;
MMM1 (lru, n, &obj->dhead, dlink) {
list_del(&lru->dlink);
free(lru);
}
free(obj);
}
```
`MMM1` = `list_for_each_entry_safe`,透過此 preprocessor directive 可以取得每個 `LRUNode->dlink` 的地址,其定義為:
```c
/**
* list_for_each_entry_safe - iterate over list entries and allow deletes
* @entry: pointer used as iterator
* @safe: @type pointer used to store info for next entry in list
* @head: pointer to the head of the list
* @member: name of the list_head member variable in struct type of @entry
*
* The current node (iterator) is allowed to be removed from the list. Any
* other modifications to the the list will cause undefined behavior.
*
* FIXME: remove dependency of __typeof__ extension
*/
#define list_for_each_entry_safe(entry, safe, head, member) \
for (entry = list_entry((head)->next, __typeof__(*entry), member), \
safe = list_entry(entry->member.next, __typeof__(*entry), member); \
&entry->member != (head); entry = safe, \
safe = list_entry(safe->member.next, __typeof__(*entry), member))
```
entry:由 `list_head` 串起的主體,在這裡即 `LRUNode`
head:整串 list,在這裡即 `LRUCache` 的 `dhead`
採用 safe 的版本是因為需要存下一個 node 用以接著 delete
##### 從 LRU Cache 取得資訊
```c
int lRUCacheGet(LRUCache *obj, int key)
{
LRUNode *lru;
int hash = key % obj->capacity;
MMM2 (lru, &obj->hheads[hash], hlink) {
if (lru->key == key) {
list_move(&lru->dlink, &obj->dhead);
return lru->value;
}
}
return -1;
}
```
`MMM2` = `list_for_each_entry`,透過此可以取得 `hlink` 的地址,其作用為 `list_for_each_entry_safe` 沒有儲存下一個 node 的版本:
```c
#ifdef __LIST_HAVE_TYPEOF
#define list_for_each_entry(entry, head, member) \
for (entry = list_entry((head)->next, __typeof__(*entry), member); \
&entry->member != (head); \
entry = list_entry(entry->member.next, __typeof__(*entry), member))
#endif
```
entry:由 `list_head` 串起的主體,在這裡即 `LRUNode`
head:整串 list,在這裡即 `obj->hheads[hash值]`
`lRUCacheGet` 目的在於依據 `key` 值,將對應的 `LRUNode` 取出,並透過 `list_move` 將此 `LRUNode` 移到 `dhead` 的開頭,代表目前最常被使用過。
##### 放置資訊至 LRU Cache
```c
void lRUCachePut(LRUCache *obj, int key, int value)
{
LRUNode *lru;
int hash = key % obj->capacity;
MMM3 (lru, &obj->hheads[hash], hlink) {
if (lru->key == key) {
list_move(&lru->dlink, &obj->dhead);
lru->value = value;
return;
}
}
if (obj->count == obj->capacity) {
lru = MMM4(&obj->dhead, LRUNode, dlink);
list_del(&lru->dlink);
list_del(&lru->hlink);
} else {
lru = malloc(sizeof(LRUNode));
obj->count++;
}
lru->key = key;
list_add(&lru->dlink, &obj->dhead);
list_add(&lru->hlink, &obj->hheads[hash]);
lru->value = value;
}
```
`MMM3` = `list_for_each_entry`,解釋同`MMM2`
`MMM4` = `list_last_entry`,作用為取出整串 `dhead` 的最後一個 `LRUNode->dlink` 的地址,其定義為:
```c
/**
* list_last_entry() - get last entry of the list
* @head: pointer to the head of the list
* @type: type of the entry containing the list node
* @member: name of the list_head member variable in struct @type
*
* Return: @type pointer of last entry in list
*/
#define list_last_entry(head, type, member) \
list_entry((head)->prev, type, member)
```
`lRUCachePut` 目的為當有找到對應 `key` 值的 `LRUNode` 時將其值取代,並將此 `LRUNode` 移到 `dhead` 的開頭,代表目前最常被使用過。
倘若 `LRUCache` 沒有空間時,利用 `list_last_entry` 刪掉 `dhead` 最後一個節點(代表目前最不常使用的)。若仍有空間,`配置一個` `LRUNode` 的空間。最後更新 `LRUNode` 所需資訊。
:::warning
TODO:
* 撰寫完整的測試程式, 指出其中可改進之處並實作
* 在 Linux 核心找出 LRU 相關程式碼並探討
:::
### 測驗`4` Longest Consecutive Sequence
#### 說明及補完程式碼
針對 [LeetCode 128. Longest Consecutive Sequence](https://leetcode.com/problems/longest-consecutive-sequence/description/),以下是可能的合法 C 程式實作:
```c
#include <stdio.h>
#include <stdlib.h>
#include "list.h"
struct seq_node {
int num;
struct list_head link;
};
static struct seq_node *find(int num, int size, struct list_head *heads)
{
struct seq_node *node;
int hash = num < 0 ? -num % size : num % size;
list_for_each_entry (node, &heads[hash], link) {
if (node->num == num)
return node;
}
return NULL;
}
int longestConsecutive(int *nums, int n_size)
{
int hash, length = 0;
struct seq_node *node;
struct list_head *heads = malloc(n_size * sizeof(*heads));
for (int i = 0; i < n_size; i++)
INIT_LIST_HEAD(&heads[i]);
for (int i = 0; i < n_size; i++) {
if (!find(nums[i], n_size, heads)) {
hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size;
node = malloc(sizeof(*node));
node->num = nums[i];
list_add(&node->link, &heads[hash]);
}
}
for (int i = 0; i < n_size; i++) {
int len = 0;
int num;
node = find(nums[i], n_size, heads);
while (node) {
len++;
num = node->num;
list_del(&node->link);
int left = num, right = num;
while ((node = find(LLL, n_size, heads))) {
len++;
list_del(&node->link);
}
while ((node = find(RRR, n_size, heads))) {
len++;
list_del(&node->link);
}
length = len > length ? len : length;
}
}
return length;
}
```
根據程式碼畫出結構示意圖:
```graphviz
digraph structs {
rankdir=LR
node [shape=record]
heads [label="{heads|{<h1>[hash1]|<h2>[hash2]|...|<hn>[hashn]}}"]
node1 [label="{seq_node1|{num|<link>link}}"]
node2 [label="{seq_node2|{num|<link>link}}"]
node3 [label="{seq_node3|{num|<link>link}}"]
node4 [label="{seq_node4|{num|<link>link}}"]
node5 [label="{seq_node5|{num|<link>link}}"]
node6 [label="{seq_node6|{num|<link>link}}"]
heads -> node1 -> node2 -> node3[weight = 10, style = invis];
heads -> node4 -> node5[weight = 10, style = invis];
heads -> node6[weight = 10, style = invis];
heads:h1 -> node1:link:w;
node1:link -> node2:link:w;
node2:link -> node3:link:w;
heads:h2 -> node4:link:w;
node4:link -> node5:link:w;
heads:hn -> node6:link:w;
}
```
`find` 函式裡的參數 `size` 即為 `heads` 的大小。根據參數 `num` 做模餘對應到 hash 串列開頭,透過此串列(`link`)逐一尋訪 `seq_node`,試圖找到有著 `num` 的 `seq_node`。
`longestConsecutive` 函式在開始時配置 `n_size` 的 `heads` 空間,並根據 `nums` array 裡的值,逐一配置 `seq_node` 的空間再將其串在對應 hash 值的 `heads` 後。
最後一個 for loop 即在找尋`nums` array 裡有幾個連續數值。迭代`nums` array,將每個值連續相加及連續相減直到找不到相應值的 `seq_node` 為止,過程中將長度記下。且在找到相應值的 `seq_node` 時透過 `list_del` 將該 `seq_node` 的 `link` 從 `heads` 串列中移除,避免接下來的尋訪再次重複。
因為在此 for loop 中任意 `nums` array 的值已經找過,接下來要分別往上及往下尋找,因此`LLL` = `--left`,`RRR` = `++right`。
:::warning
TODO:
* 撰寫完整的測試程式, 指出其中可改進之處並實作
* 嘗試用 Linux 核心風格的 hash table 重新實作上述程式碼
:::