陳彥甫
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note No publishing access yet

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.

      Your account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

      Your team account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

      Explore these features while you wait
      Complete general settings
      Bookmark and like published notes
      Write a few more notes
      Complete general settings
      Write a few more notes
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note No publishing access yet

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.

    Your account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

    Your team account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

    Explore these features while you wait
    Complete general settings
    Bookmark and like published notes
    Write a few more notes
    Complete general settings
    Write a few more notes
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    1
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    --- tags: linux2022 --- # 2022q1 Homework1 (quiz1) contributed by < [`qwe661234`](https://github.com/qwe661234) > [測驗題目](https://hackmd.io/@sysprog/linux2022-quiz1) ## 測驗一 Two Sum ### Hash Table Structure ![](https://i.imgur.com/5FQZ6Lo.png) hash table bucket 以 non-circular doubly-linked list 來儲存 * hlist_head: 指向 bucket list 的第一個節點 * hlist_node: bucket list 中的節點 `map_t` * bits: hash table 的 size. * ht: array of hlist_head ```c struct hlist_head { struct hlist_node *first; }; struct hlist_node { struct hlist_node *next, **pprev; }; typedef struct { int bits; struct hlist_head *ht; } map_t; ``` ### map_init Create hash table ```c #define MAP_HASH_SIZE(bits) (1 << bits) map_t *map_init(int bits) { map_t *map = malloc(sizeof(map_t)); if (!map) return NULL; map->bits = bits; map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits)); if (map->ht) { for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) (map->ht)[i].first = NULL; } else { free(map); map = NULL; } return map; } ``` ### container_of 給定結構體成員的位址, 回推該結構體的起始位置 reference: https://hackmd.io/@sysprog/linux-macro-containerof ```c #define container_of(ptr, type, member) \ ({ \ void *__mptr = (void *) (ptr); \ ((type *) (__mptr - offsetof(type, member))); \ }) ``` ### hash function ```c #define GOLDEN_RATIO_32 0x61C88647 static inline unsigned int hash(unsigned int val, unsigned int bits) { /* High bits are more random, so use them. */ return (val * GOLDEN_RATIO_32) >> (32 - bits); } ``` ### hash key * key: value of array element * data: array index ```c struct hash_key { int key; void *data; struct hlist_node node; }; ``` ### find_key 由於不同的 key value 經過 hash funciton 後可能得到一樣的 value, 所以經過 hash 後需要到對應的 bucket 中尋找 `node->key == key` 的 node 並回傳 ```c static struct hash_key *find_key(map_t *map, int key) { struct hlist_head *head = &(map->ht)[hash(key, map->bits)]; for (struct hlist_node *p = head->first; p; p = p->next) { struct hash_key *kn = container_of(p, struct hash_key, node); if (kn->key == key) return kn; } return NULL; } ``` ### map_get 在 hash table 中尋找對應 key value 的 node ```c void *map_get(map_t *map, int key) { struct hash_key *kn = find_key(map, key); return kn ? kn->data : NULL; } ``` ### map_add 一開始會在 hash table 中尋找對應 key value 的 node, 若已存在就不重複加入 若不存在, 先配置記憶體給 new node 並設好 key 與 data, 接著以 hash funciton 得到的 value 去找出 bucket 對應的 `hlist_head`,`first` 為 bucket 中的第一個 node (或是 NULL), 接下來的操作其實就是把原本的 `first` 接在 new node 後面, 把 new node 變成 `first`. ```graphviz digraph structs { rankdir="LR" node[shape=record]; struct1 [label="0|<1>1|2|3|4|5|..."] node[shape=record]; struct2 [shape=record, label="{first|{<p>pprev|<n>next}}"]; node[shape=record]; struct3 [label= "{new node|{<p>pprev|<n>next}}"]; struct1:1 -> struct2; struct2:p -> struct1:1; struct2:n -> NULL; } ``` ```graphviz digraph structs { rankdir="LR" node[shape=record]; struct1 [label="0|<1>1|2|3|4|5|..."] node[shape=record]; struct2 [shape=record, label="{<f>first|{<p>pprev|<n>next}}"]; node[shape=record]; struct3 [label= "{<f>new node|{<p>pprev|<n>next}}"]; struct1:1 -> struct3:f; struct2:p -> struct3:n; struct3:n -> struct2:f struct3:p -> struct1:1 struct2:n -> NULL; } ``` :::info 1. AAA = `n->next = first` 將 new node 的 next 指向原本的 `first` 2. BBB = `n->pprev = &h->first` 將 new node 的 pprev 指向此 bucket 的 hlist_head ::: ```c void map_add(map_t *map, int key, void *data) { struct hash_key *kn = find_key(map, key); if (kn) return; kn = malloc(sizeof(struct hash_key)); kn->key = key, kn->data = data; struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; n->next = first; if (first) first->pprev = &n->next; h->first = n; n->pprev = &h->first; } ``` ### map_deinit 清除 hash table ```c void map_deinit(map_t *map) { if (!map) return; for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) { struct hlist_head *head = &map->ht[i]; for (struct hlist_node *p = head->first; p;) { struct hash_key *kn = container_of(p, struct hash_key, node); struct hlist_node *n = p; p = p->next; if (!n->pprev) /* unhashed */ goto bail; struct hlist_node *next = n->next, **pprev = n->pprev; *pprev = next; if (next) next->pprev = pprev; n->next = NULL, n->pprev = NULL; bail: free(kn->data); free(kn); } } free(map); } ``` ### main funciton 1. 首先建立 hash table 並配置 memory 給 return array 2. 接著 iterate array nums 3. 尋找 key vlaue 為 target - nums[i] 的 node 是否有在 hash table 中 * 找到就將 return array 的第一跟第二個 element 設為這兩個值的 array index, returnSize 設為 2 並 return * 找不到就建立 new node, key 為 nums[i], data 為 array index, 放入 hash table 中 4. 如果 nums 中任兩個 element 相加都不為 target, 則 returnSize = 0 ```c int *twoSum(int *nums, int numsSize, int target, int *returnSize) { map_t *map = map_init(10); *returnSize = 0; int *ret = malloc(sizeof(int) * 2); if (!ret) goto bail; for (int i = 0; i < numsSize; i++) { int *p = map_get(map, target - nums[i]); if (p) { /* found */ ret[0] = i, ret[1] = *p; *returnSize = 2; break; } p = malloc(sizeof(int)); *p = i; map_add(map, nums[i], p); } bail: map_deinit(map); return ret; } ``` ## Hash funciton Hash funciton: hash function 的功能是用來將 key 值轉換成對應的 index ### **Perfect function** 當所有的 key value 經過 hash funciton 轉換後, 得到 index 都不一樣時, 我們稱此 hash funciton 為 **perfect funciton**。 **Perfect funciton** 可以做到完美的 1-to-1, 也就是一組 key 對應到一組 value。 假如 hash function 的時間複雜度是 O(1), 這樣我們僅僅只需要時間複雜度 O(1) 的時間就可以查找到 hash table 中的資料。 但實際上 **perfect funciton** 很難做到, 我們不知道 key value 的 range 有多大, 所以不知道 bucket 的數量要多少才能滿足 1-to-1, 如果 hash table size 很大, 但其中有很多空間都沒有使用到也會造成不必要的浪費。由於多個 key value 可能對應到同一個 index, 所以一個 index 會對應到一個可以存數筆 data 的 bucket。 綜合以上, 我們只能透過設計 hash function, 使其盡可能接近 **perfect funciton**。 ### Hash function 考慮因素 #### 1. 計算所需的時間複雜度要低 希望 hash funciton 計算的時間複雜度為 O(1)。 #### 2. Collision 減少 Collision 是指不同的 key value 對應到相同的 index, collision 越少越接近 **perfect funciton**。 #### 3. 避免 Clustering 現象 Clustering 是指某些 bucket 存放很多筆資料, 而某些 bucket 存放的很少, 應該盡量讓 bucket 存放的資料筆數平衡, 否則容易造成 `Overflow`, 這樣會造成存取效率變差。 `Overflow` 是指 bucket 的記憶體空間滿了, 需要刪除一筆資料才能存入新的資料, `Overflow` 發生時, 決定哪一筆資料要被刪除也是重要的議題。 ### 常見 hash function 作法 #### 1. Division method h(k) = k % N 以 mod 運算後的餘數作為 index, 由於 key 值得範圍無法得知, 所以如何挑選適合的 N 很關鍵。 #### 2. Mid-square h(k) = $bits_{i,i + r - 1}(k^2)$ 將 key 值平方後, 取其 bit 第 $i$ ~ $i + r - 1$ 位, 得到的數字範圍會是 0 ~ $2^{r - 1}$, 所以 bucket 數量為 $2^r$ 即可。 例如: key value = 6, $6^2 = 36$, 假設 i = 2, r = 3, 所以取其 bit 第 2 ~ 4 位 所以將 $36_{10}$ = $100100_2$, 取第 2 ~ 4 位得 $010_2$, 所以 index = $2_{10}$。 #### 3. Folding addition 先將 key value 切割成片段後相加, 亦可以對相加後的結果做其他運算 例如: key value = 3823749374, 將此 value 每三個數字分成一段 382 | 374 | 937 | 4, 再將這四個數字相加 index = 382 + 374 + 937 + 4 = 1697 可以進一步對 1697 進行其他運算例如: mod, 反轉... #### 4. Multiplication Method 由於大部分的情況下, 我們不知道 key value 的範圍以及分佈情形, 這樣的情形適用 `Multiplication Method` `Multiplication Method` 步驟: ![](https://raw.githubusercontent.com/alrightchiu/SecondRound/master/content/Algorithms%20and%20Data%20Structures/HashTable%20series/Intro/f8.png) key value 乘上 constant A -> 取小數點部分 -> 再乘上 m -> 再取整數部分的一系列步驟讓 hash function 能夠儘量把更多的 key value 中的 bit 納入考慮而增加隨機性。 (原因參考[這篇](http://alrightchiu.github.io/SecondRound/hash-tableintrojian-jie.html#dm)) K: key value A: a constant, 且 0 < A < 1 m: bucket 數量, 且 m = $2^p$ ## Linux hash function 在 linux 的 [hash.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/hash.h) 中, hash funciton 的設計是使用上面所提到的 `Multiplication Method`, 但上面提到的 hash funciton 應該是要長得像 $h(K) = \lfloor m * (KA - \lfloor KA \rfloor) \rfloor$ 這個樣子, 但在 [hash.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/hash.h) 中卻用到了 bit-shifting 的方式, 這是由於上述的 funciton 與 $h(K) = K * 2^w * A >> (w - p)$ 是等價的, 而且 bit-shifting 的效率比較好, 所以以這種方式來實作。 K: key value A: a constant, 且 0 < A < 1 m: bucket 數量, 且 m = $2^p$ $w$: 一個 word 有幾個 bit ### 為什麼兩個 function 等價 我們假設一個 word 是 8 個 bit, 將 KA 的結果以 8 bit 整數配上 8 bit 浮點數表示 ```graphviz graph structs { node[shape=record]; struct1 [label="0|0|0|0|0|1|0|1"] node[shape=point] dot node[shape=record]; struct2 [label="1|1|1|0|1|0|0|0"] } ``` 根據 $h(K) = \lfloor m * (KA - \lfloor KA \rfloor) \rfloor$, 我們只需要 KA 的浮點數部分, 也就是後面的 8 個 bits, 再假設 m 是 8 = $2^3$, 所以乘上 m, 所以以上步驟等同於將前8個 bit 清除再往左移三位 ```graphviz graph structs { node[shape=record]; struct1 [label="0|0|0|0|0|1|1|1"] node[shape=point] dot node[shape=record]; struct2 [label="0|1|0|0|0|0|0|0"] } ``` 最後做下高斯只留下整數部分 ```graphviz graph structs { node[shape=record]; struct1 [label="0|0|0|0|0|1|1|1"] node[shape=point] } ``` 而這樣的操作等同於 KA 的結果先左移 8 位, 這樣前面 8 個 bit 為 KA 結果的浮點數部分, 依照前面的操作我們知道浮點數部分其實只要留下前 p 個 bit, 所以要右移 (w - p) 個 bit KA ```graphviz graph structs { node[shape=record]; struct1 [label="0|0|0|0|0|1|0|1"] node[shape=point] dot node[shape=record]; struct2 [label="1|1|1|0|1|0|0|0"] } ``` K * $2^w$ * A, 乘上 $2^w$$ 等同於左移 8 個 bit ```graphviz graph structs { node[shape=record]; struct1 [label="1|1|1|0|1|0|0|0"] node[shape=point] dot node[shape=record]; struct2 [label="0|0|0|0|0|0|0|0"] } ``` 接著右移 (w - p), 也就是 8 - 3 = 5 個 bit, 因為在 linux 的實用中是用 uint, 所以只會留下整數部分 ```graphviz graph structs { node[shape=record]; struct1 [label="0|0|0|0|0|1|1|1"] node[shape=point] } ``` ### linux 程式碼 我們擷取 32 bit 的 hash funciton 的程式碼 (64 bit 的 hash funciton 邏輯相同, 只是 word size 為 64) source: [hash.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/hash.h) #### 程式碼解釋 * GOLDEN_RATIO_32: A * $2^{32}$ 後的結果 * __hash_32_generic(u32 val) * val: key value K * 這個 function 做的事情是算出 $K * 2^w * A$ 的結果 * hash_32(u32 val, unsigned int bits) * val: key value K * bits: 其實就是 p, $2^{bits}$ = bucket 數量 * 這個 function 做的事情是將 `__hash_32_generic(u32 val)` 算出的結果 $K * 2^w * A$ 右移 (32 - p) 因此, val * GOLDEN_RATIO_32 >> (32 - bits) $\equiv$ K * A * $2^w$ >> (w - p) ```c #define GOLDEN_RATIO_32 0x61C88647 #ifndef HAVE_ARCH__HASH_32 #define __hash_32 __hash_32_generic #endif static inline u32 __hash_32_generic(u32 val) { return val * GOLDEN_RATIO_32; } static inline u32 hash_32(u32 val, unsigned int bits) { /* High bits are more random, so use them. */ return __hash_32(val) >> (32 - bits); } ``` ### GOLDEN_RATIO 至於 A 為什麼使用 [golden ratio](https://zh.wikipedia.org/wiki/%E9%BB%84%E9%87%91%E5%88%86%E5%89%B2%E7%8E%87), A = $(\sqrt{5} - 1) / 2$ = 0.6180339887..., 這個數字是由 [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) 所建議的, 在這份 [hunter college 的課程講義](http://www.compsci.hunter.cuny.edu/~sweiss/course_materials/csci335/lecture_notes/chapter05.pdf) 中有提出實驗結果圖, 其中當 A = [golden ratio](https://zh.wikipedia.org/wiki/%E9%BB%84%E9%87%91%E5%88%86%E5%89%B2%E7%8E%87) 時, 此 hash function 稱為 `Fibonacci hashing`, 且 key value 經此 funciton 轉換得到的 index 是相當分散的, 意即 Collision 很低。 ![](https://i.imgur.com/TU1DpWk.png) ### 比較 golden ratio 與非 golden ratio 這邊挑選了4個不同的數字作為 A * $2^{32}$, 並紀錄對 key value 0 ~ 1500 做 `hash_32(key value, 10)` 的結果 1. 0x61C88647 2. 0x80000000 3. 0x12345678 4. 0x54061094 圖的 x 軸是 key value, y 軸是 `hash_32(key value, 10)` 計算出的值 ![](https://i.imgur.com/kP2RoI9.png =50%x)![](https://i.imgur.com/BFgUtcA.png =50%x) ![](https://i.imgur.com/xEmZWP7.png =50%x)![](https://i.imgur.com/X1D9qBi.png =50%x) 由實驗個果圖可發現, golden ratio 作為 A 結果是最均勻分散的, 另外結果也證明 A 的值的對於 Collision 有很大的影響, 例如 A * $2^{32}$ = 0x80000000 時, 0 ~ 1500 只有兩種可能, 不僅每兩個數字就發生一次 Collision, 由於 data 只會存放於兩個 bucket, 因此 Cluster 程度相當嚴重。 ### Reference: * https://meteorv.dev/Data-Structure/hashing/#Collision-Overflow-%E7%9A%84%E8%99%95%E7%90%86%E6%96%B9%E6%B3%95 * http://alrightchiu.github.io/SecondRound/hash-tableintrojian-jie.html#dm * https://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html * http://www.cs.nthu.edu.tw/~wkhon/ds/ds12/lecture/lecture18.pdf * http://www.compsci.hunter.cuny.edu/~sweiss/course_materials/csci335/lecture_notes/chapter05.pdf ## 測驗二 Remove Duplicates from Sorted List II :::info `COND1` = `COND2` = `head->next && head->val == head->next->val` ::: 邏輯是將值不重複的 node 的 next 指向下一個值不重複的 node, 忽略中間所有值重複的 node 先檢查 next node 是否存在, 接著檢查 next node 的值是否與 current node 的值重複。 遇到重複會進入 `while` loop 直到 next node 與 current node 的值不重複, 對 next node 做遞迴, 唯有值不重複的 node 才會被回傳 ```c #include <stddef.h> struct ListNode { int val; struct ListNode *next; }; struct ListNode *deleteDuplicates(struct ListNode *head) { if (!head) return NULL; if (head->next && head->val == head->next->val) { /* Remove all duplicate numbers */ while (head->next && head->val == head->next->val) head = head->next; return deleteDuplicates(head->next); } head->next = deleteDuplicates(head->next); return head; } ``` ### 非遞迴版本: 邏輯與遞迴版本類似, 以指標的指標操作可以省去檢查 current node 是否為 head #### `*cur = (*cur)->next` 與 `cur = &(*cur)->next`差異 `*cur = (*cur)->next`: 將 head or 前一個 node 的 next 指向位置改為 next node `cur = &(*cur)->next`: iterate next, 將 cur 設為 next node ```c #include <stddef.h> struct ListNode { int val; struct ListNode *next; }; struct ListNode *deleteDuplicates(struct ListNode *head) { struct ListNode **cur; cur = &head; while (*cur) { if ((*cur)->next && (*cur)->val == (*cur)->next->val) { while ((*cur)->next && (*cur)->val == (*cur)->next->val) *cur = (*cur)->next; *cur = (*cur)->next; } else { cur = &(*cur)->next; } } return head; } ``` ## circular doubly-linked list 改寫 ### doubly-linked list structure Doubly-linked list and its value is integer. ```c struct list_head { struct list_head *prev; struct list_head *next; }; typedef struct { int value; struct list_head list; } element_t; ``` ### 迭代 (iterative) version 由於 list_head 不會變, 所以不用回傳 list_head 由於 safe 會保存 next node, 所以直接刪除當前節點, `while` loop 終止時, current node 會指向一串存在重複值 node list 中的最後一個 node, 所以 `while` loop 結束後要刪除 current node ```c void deleteDuplicates(struct list_head *head) { if (!head) return; struct list_head *cur, *safe; list_for_each_safe (cur, safe, head) { if (list_entry(cur, element_t, list)->value == list_entry(safe, element_t, list)->value) { while(safe != head && list_entry(cur, element_t, list)->value == list_entry(safe, element_t, list)->value) { list_del(cur); cur = safe; safe = safe->next; } list_del(cur); } } } ``` ### 遞迴 (recursive) version 如果 current node 有重複, 先遞迴 next node, 遞迴返回後再刪除 current node, 考慮到一串存在重複值的 node list 中的最後一個 node, 所以要檢查與 previous node 的值是否相同. ```c void deleteDuplicates(struct list_head *cur, struct list_head *head) { if(cur == head) return; if ((cur->next != head && list_entry(cur, element_t, list)->value == list_entry(cur->next, element_t, list)->value) || (cur->prev != head && list_entry(cur, element_t, list)->value == list_entry(cur->prev, element_t, list)->value)) { deleteDuplicates(cur->next, head); list_del(cur); } deleteDuplicates(cur->next, head); } ``` ## 測驗三 [LeetCode 146. LRU Cache](https://leetcode.com/problems/lru-cache/) [Least recently used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)) 是一種 cache 的置換機制, 主要是當 cache 記憶體已滿, 將最近最少使用的一筆資料與新資料替換, 這樣的機制是為了降低 cache miss rate, 在這題中要替換的是使用時間距離目前最遠的那筆資料。 ### Cache Data Structure 在 cache 的實作類似測驗一的 hash table, hheads[index] 會指向對應此 index 的 bucket, 而 bucket 是以 circular doubly-linked list 來實作。 除了 hheads array 之外, 還有另外一個 dhead, 會指向一條用來紀錄的 circular doubly-linked list, 這邊紀錄的是 LRUNode 的使用情況, 最近使用的 node 會被擺在越前面, 越之前使用的 node 會被擺在越後面, 所以要替換時, 會選擇這一條 list 中最後面的 node 來做替換。 Cache 中的 node 和 dhead 所指向的 list 中的 node 是同一份。 ```c struct list_head { struct list_head *prev; struct list_head *next; }; ``` capacity: cache 容量, 即可存放的 data 筆數 count: data 目前存放了多少筆 data ```c typedef struct { int capacity, count; struct list_head dhead, hheads[]; } LRUCache; ``` hlink 用來連接 cache 中的 bucket, dlink 則是用來連接紀錄 node 使用情形的 list。 ```c typedef struct { int key, value; struct list_head hlink, dlink; } LRUNode; ``` ### lRUCacheCreate Create LRU cache, 並初始化 dhead and hhead array。 這邊的 `sizeof(*obj)` 印出來是 24, 代表的是 count(4 bytes) + capacity(4 bytes) + dhead(16 bytes) 的空間, 另外的 `capacity * sizeof(struct list_head)` 則是配置給 hheads array 的空間。 ```c LRUCache *lRUCacheCreate(int capacity) { LRUCache *obj = malloc(sizeof(*obj) + capacity * sizeof(struct list_head)); obj->count = 0; obj->capacity = capacity; INIT_LIST_HEAD(&obj->dhead); for (int i = 0; i < capacity; i++) INIT_LIST_HEAD(&obj->hheads[i]); return obj; } ``` ### lRUCacheFree :::info MMM1 = list_for_each_entry_safe ::: dhead 所指向的 list 中代表目前存在 cache 中的所有 node, 所以將這條 list 中的所有 node 釋放等同於釋放 cache, 用 `list_for_each_entry_safe` 是因為有移除 node, 所以要先紀錄下一個 node 位置。 ```c void lRUCacheFree(LRUCache *obj) { LRUNode *lru, *n; list_for_each_entry_safe (lru, n, &obj->dhead, dlink) { list_del(&lru->dlink); free(lru); } free(obj); } ``` ### lRUCacheGet :::info MMM2 = list_for_each_entry ::: key 經過 hash funciton 後, 得到對應的 index, 再以此 index 找到對應的 bucket, 由於 不同的 key 經過 hash function 可能得到相同的 index, 所以要從 bucket 中所有的 node 找到 `node>key == key` 的 node。 ```c int lRUCacheGet(LRUCache *obj, int key) { LRUNode *lru; int hash = key % obj->capacity; list_for_each_entry (lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); return lru->value; } } return -1; } ``` ### lRUCachePut :::info MMM3 = list_for_each_entry MMM4 = list_last_entry **MM4 的正確答案給 list_for_each_entry, 但應該是 list_last_entry, 如果是 list_for_each_entry 參數對不上** ::: key 經過 hash funciton 後, 得到對應的 index, 再以此 index 找到對應的 bucket, 接著到該 bucket 中找 `node->key == key` 的 node。 1. 有找到: 將 value 更新, 並將該 node 移到用來記錄使用情形的 list 中的第一個, 因為這個 node 當前被使用到。 2. 沒找到: 如果找不到代表需要新增 node 到 cache 中, 此時要檢查 cache 記憶體是否滿了。 * cache 記憶體已滿, 這邊的做法是先找到用來記錄使用情形的 list 中的最後一個 node, 因為這個 node 的使用時間是距離目前最遠的, 接著把他從 cache 和用來記錄使用情形的 list 中移除, 但是沒有 free 掉, 而是用同一份記憶體空間, 只是將其 key 和 value 更新, 從新放入 cache 中對應的 bucket 和用來記錄使用情形的 list 中。 * cache 記憶體還有剩餘空間的話, 則配置記憶體給新的 node 並設定好 key 和 value, 放入 cache 中對應的 bucket 和用來記錄使用情形的 list 中。 ```c void lRUCachePut(LRUCache *obj, int key, int value) { LRUNode *lru; int hash = key % obj->capacity; list_for_each_entry (lru, &obj->hheads[hash], hlink) { if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); lru->value = value; return; } } if (obj->count == obj->capacity) { lru = list_last_entry(&obj->dhead, LRUNode, dlink); list_del(&lru->dlink); list_del(&lru->hlink); } else { lru = malloc(sizeof(LRUNode)); obj->count++; } lru->key = key; list_add(&lru->dlink, &obj->dhead); list_add(&lru->hlink, &obj->hheads[hash]); lru->value = value; } ``` ### Test fucntion 測試 `lRUCacheCreate` 是否有成功 ```c bool testLRUCacheCreate(LRUCache *cache) { if(!cache) return false; return true; } ``` 在 lRUCachePut 後測試 cache 是否有新增或更新該 key 所對應的 value。 ```c bool testCacheGet(LRUCache *cache, int key, int rightvalue) { if(!cache) { printf("cache is NULL"); return false; } int target = lRUCacheGet(cache, key); if(target != rightvalue) return false; return true; } ``` ## 改進 觀察程式碼可以發現這邊所使用的 hash function 是 division method, 也就是直接把 key mod cache capacity, 可是者樣的作法可能導致 collision 和 cluster。 例如: 假設 cache capacity 是 10, 而插入的 key value pair 是 (1, 1), (11, 11), (21, 21), (31, 31), (41, 41), 經過 hash funciton `key mod 10` 後, 這五個 node 得到的 index 都是 1, 而且這 5 個 node 都會接在 `hheads[1]` 之後, 因而導致存取效率變差。 我們可以考慮將 hash funciton 改成在 hash.h 中的 `hash_32`, 並比較改變前後的執行時間。 ### 實驗 本次實驗的 capacity 為 128 = $2^{7}$, bits = 7 #### hash_function_1 ```c int hash = key % obj->capacity(128); ``` #### hash_funciton_2 ```c int hash = hash_32(key, bits) % 128; ``` ```c #define GOLDEN_RATIO_32 0x61C88647 static inline u32 __hash_32_generic(u32 val) { return val * GOLDEN_RATIO_32; } static inline u32 hash_32(u32 val, unsigned int bits) { /* High bits are more random, so use them. */ return __hash_32(val) >> (32 - bits(7)); } ``` ### 實驗步驟 取一個隨機整數, 接著到 cache 中尋找, 如果沒找到就將 key 和 value 都設為此隨機整數並加入 cache 中, 重複 n 次 ( n 從 0 ~ 2000000 )。 ```c int main(int argc, char** argv) { for(int size = 0; size <= 20000000; size+= 100000) { LRUCache *cache = lRUCacheCreate(128); int rand_num; clock_t start, end; start = clock(); srand(time(NULL)); for(int i = 0; i < size; i++) { rand_num = rand(); if(!lRUCacheGet(cache, rand_num)) lRUCachePut(cache, rand_num, rand_num); } end = clock(); printf("%d %lf\n", size, (end - start) / (double)(CLOCKS_PER_SEC) * 1000); lRUCacheFree(cache); } } ``` ### 實驗結果 實驗結果圖顯示這兩個 hash function 在效能上的表現並無明顯差異 ![](https://i.imgur.com/6nGLiw8.png) ### 分析原因 我對亂數經過 hash_function 後得到的 index 紀錄並作圖, 結果圖為對 2000 個亂數做 hash funciton 後得到的 index, 對比兩張圖可發現, 其實兩個 hash function 對亂數計算後, 產生的結果分佈狀況差不多, 故推論其為效能無明顯差異的原因。 ![](https://i.imgur.com/6C9OJQ1.png) ![](https://i.imgur.com/iln1OJZ.png) ## 測驗四 [LeetCode 128. Longest Consecutive Sequence](https://leetcode.com/problems/longest-consecutive-sequence/description/) ### Data Structure 這題的 data structure 實作類似測驗一的 hash table, heads[index] 會指向對應此 index 的 bucket, 而 bucket 是以 circular doubly-linked list 來實作。 ```c struct list_head { struct list_head *prev; struct list_head *next; }; struct seq_node { int num; struct list_head link; }; ``` ### find hash table 的 hash function 是 abs(key) % size, size 是 input array 的大小, function `find` 就是在 hash table 中尋找是否存在 key = num 的 node。 ```c static struct seq_node *find(int num, int size, struct list_head *heads) { struct seq_node *node; int hash = num < 0 ? -num % size : num % size; list_for_each_entry (node, &heads[hash], link) { if (node->num == num) return node; } return NULL; } ``` ### longestConsecutive 這個 function 拆成 3 個部分來看 #### part1 建立 hash table 並初始化 ```c int hash, length = 0; struct seq_node *node; struct list_head *heads = malloc(n_size * sizeof(*heads)); for (int i = 0; i < n_size; i++) INIT_LIST_HEAD(&heads[i]); ``` #### part2 interate input array 並為 array 中的數字建立 node 加入對應的 bucket 中。 假設 inpur array = [98,1,3,2], size = 6 圖中的 head 分別有 prev 指向 bucket 的最後一個 node 和 next 指向 bucket 的第一個 node。 ```graphviz digraph structs { rankdir="LR" node[shape=record]; ht [label="0|<1>1|<2>2|<3>3|4|5"] node[shape=record]; node1 [shape=record, label="{<f>1|{<p>pprev|<n>next}}"]; node[shape=record]; node2 [label= "{<f>2|{<p>pprev|<n>next}}"]; node3 [label= "{<f>3|{<p>pprev|<n>next}}"]; node98 [label= "{<f>98|{<p>pprev|<n>next}}"]; node1:p -> ht:1; node1:n -> ht:1; node3:p -> ht:3; node3:n -> ht:3; node2:p -> ht:2; node2:n -> node98; node98:p -> node2:n; node98:n -> ht:2 } ``` ```c for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); } } ``` #### part3 iterate input array 中的所有 element, 對於每一個 element 都去 hash table 找其左半邊連續數字是否存在在 hash table 中, 再搜尋其右半邊。在查找的過程中, 會把走訪過的 node 移除, 避免重複走訪的情形。 例如: 當前 iterate 到的 element 是 5, 他就會先走 4 是否存在 hash table 中, 存在就 `len++`, 接著找 3, 直到該 key 在 hash table 找不到, 接著找 6, 7 ..., 找到做 `len++`, 找不到就結束, 繼續 iterate 下一個 element。 :::info LLL = --left RRR = ++right ::: ```c for (int i = 0; i < n_size; i++) { int len = 0; int num; node = find(nums[i], n_size, heads); while (node) { len++; num = node->num; list_del(&node->link); int left = num, right = num; while ((node = find(--left, n_size, heads))) { len++; list_del(&node->link); } while ((node = find(++right, n_size, heads))) { len++; list_del(&node->link); } length = len > length ? len : length; } } ``` #### 完整 funciton ```c int longestConsecutive(int *nums, int n_size) { int hash, length = 0; struct seq_node *node; struct list_head *heads = malloc(n_size * sizeof(*heads)); for (int i = 0; i < n_size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); } } for (int i = 0; i < n_size; i++) { int len = 0; int num; node = find(nums[i], n_size, heads); while (node) { len++; num = node->num; list_del(&node->link); int left = num, right = num; while ((node = find(LLL, n_size, heads))) { len++; list_del(&node->link); } while ((node = find(RRR, n_size, heads))) { len++; list_del(&node->link); } length = len > length ? len : length; } } return length; } ``` ### 改進之處 原本的 fucntion `longestConsecutive` 在查找後將 node 從 hash table 中移除, 但移除後並未釋放記憶體, 以及在 funciton 結束後並未將配置給 heads 的記憶體釋放, 因此會造成 **memory leak**。 ### 測試 我們以執行以下程式碼並透過 `Valgrind` 分析。 ```c int main(int argc, char** argv) { int a[10] = { 0, 3, 7, 2, 5, 8, 4, 6, 0, 1 }; printf("Longest Consecutive Sequence = %d\n", longestConsecutive(a, 10)); return 0; } ``` #### 分析結果 * `indirectly lost` 中的 1 個 blocks 是我們配置給 array of heads, 我們配置 10 個 `struct list_head`, 而一個 `struct list_head` 需要 16 bytes 的空間。 * `definitely lost` 是我們配置給 node 的空間, 我們總更配置了 9 個 node (key = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 但奇怪的地方是 `struct list_head` 需要 16 bytes, 而 `int` 需要 4 個 bytes, 總共只需 20 個 bytes, 為何印出卻是 24 個 bytes? 透過 `offsetof(struct seq_node, link)` 可以發現, link 的位移量是 8 而不是預期中的 4, 這邊推測是 memory alignment 所導致這樣的結果。 ![](https://i.imgur.com/Ixr7lF8.png) 為了驗證, 我們可以在 `struct seq_node` 的後方加上, 讓結構體的成員實際排列和空間佔用跟程式碼順序一致, 再次印出 `offsetof(struct seq_node, link)` 的結果, 可以發現結果為 4, 驗證我們的推測是正確的。 ```c struct seq_node { int num; struct list_head link; } __attribute__((packed)); ``` ##### reference: https://hackmd.io/@sysprog/linux-macro-containerof #### 改進程式碼 在移除 node 之後, 將 node 記憶體釋放, 並在走訪完每個 node 後將配置給 head 的記憶體釋放。 ```c int longestConsecutive(int *nums, int n_size) { int hash, length = 0; struct seq_node *node; struct list_head *heads = malloc(n_size * sizeof(*heads)); for (int i = 0; i < n_size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); } } for (int i = 0; i < n_size; i++) { int len = 0; int num; node = find(nums[i], n_size, heads); while (node) { len++; num = node->num; list_del(&node->link); free(node); int left = num, right = num; while ((node = find(--left, n_size, heads))) { len++; list_del(&node->link); free(node); } while ((node = find(++right, n_size, heads))) { len++; list_del(&node->link); free(node); } length = len > length ? len : length; } } free(heads); return length; } ``` #### 分析結果 解決 memory leak 問題 ![](https://i.imgur.com/TbjPHj9.png) ### 以 Linux 核心風格的 hash table 重新實作 ```c #include <stddef.h> #include <stdlib.h> #include <stdio.h> struct hlist_node { struct hlist_node *next, **pprev; }; struct hlist_head { struct hlist_node *first; }; typedef struct { int size; struct hlist_head *ht; } map_t; struct hash_key { int key; struct hlist_node node; }; map_t *map_init(int size) { map_t *map = malloc(sizeof(map_t)); if (!map) return NULL; map->size = size; map->ht = malloc(sizeof(struct hlist_head) * size); if (map->ht) { for (int i = 0; i < size; i++) (map->ht)[i].first = NULL; } else { free(map); map = NULL; } return map; } #define container_of(ptr, type, member) \ ({ \ void *__mptr = (void *) (ptr); \ ((type *) (__mptr - offsetof(type, member))); \ }) static struct hash_key *find_key(map_t *map, int key) { int hash = key < 0 ? -key % map->size : key % map->size; struct hlist_head *head = &(map->ht)[hash]; for (struct hlist_node *p = head->first; p; p = p->next) { struct hash_key *kn = container_of(p, struct hash_key, node); if (kn->key == key) return kn; } return NULL; } void map_add(map_t *map, int key) { struct hash_key *kn = find_key(map, key); if (kn) return; kn = malloc(sizeof(struct hash_key)); kn->key = key; int hash = key < 0 ? -key % map->size : key % map->size; struct hlist_head *h = &map->ht[hash]; struct hlist_node *n = &kn->node, *first = h->first; n->next = first; if (first) first->pprev = &n->next; h->first = n; n->pprev = &h->first; } void map_del_node(struct hlist_node *n) { struct hlist_node *next = n->next, **pprev = n->pprev; *pprev = next; if (next) next->pprev = pprev; n->next = NULL, n->pprev = NULL; } int longestConsecutive(int *nums, int n_size) { int length = 0; struct hash_key *node; map_t *map = map_init(n_size); for (int i = 0; i < n_size; i++) { if (!find_key(map, nums[i])) { map_add(map, nums[i]); } } for (int i = 0; i < n_size; i++) { int len = 0; int num; node = find_key(map, nums[i]); while (node) { len++; num = node->key; map_del_node(&node->node); free(node); int left = num, right = num; while ((node = find_key(map, --left))) { len++; map_del_node(&node->node); free(node); } while ((node = find_key(map, ++right))) { len++; map_del_node(&node->node); free(node); } length = len > length ? len : length; } } free(map->ht); free(map); return length; } ```

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password
    or
    Sign in via Google Sign in via Facebook Sign in via X(Twitter) Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    By signing in, you agree to our terms of service.

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully