2022q1 Homework1 (quiz1)

# 2022q1 Homework1 (quiz1) contributed by < [haoyu0970624763](https://github.com/haoyu0970624763) > ### LeetCode 1 [TWO Sum](https://leetcode.com/problems/two-sum/) #### 解釋程式碼運作原理先來看 **`twoSum` 函式** 的程式碼 ```c= int *twoSum(int *nums, int numsSize, int target, int *returnSize) { /* 雜湊表的建立 , 設置雜湊表的 entry 總數為 2^10=1024 * 要設置除了 1024 以外的數字也可以 * 通常來說雜湊表 entry 數越多 , 平均尋找時間越短 ,但記憶體消耗越大 * */ map_t *map = map_init(10); *returnSize = 0; int *ret = malloc(sizeof(int) * 2); /* if mallocing the memory of return address fail , direct to the end of code */ if (!ret) goto bail; for (int i = 0; i < numsSize; i++) { /* 尋找當前 input 所對應之 target 之數字是否存在於雜湊表中 * 例如：要找兩數相加為9 , 當前 input 為 2 , 就要找 7 這個數字是否存在於雜湊表 * */ int *p = map_get(map, target - nums[i]); if (p) { /* found */ ret[0] = i, ret[1] = *p; *returnSize = 2; break; } /* 如果當前 input 對應 target 之數字不存在於雜湊表中 * 就把此 input 加至雜湊表裡面 * */ p = malloc(sizeof(int)); *p = i; map_add(map, nums[i], p); } bail: map_deinit(map); return ret; } ``` 第六行的 **`map_init` 函式**：建立 hash table entry 總數為 $2^{bits}$ , 此題設定成 1024 個 entry ```c #define MAP_HASH_SIZE(bits) (1 << bits) map_t *map_init(int bits) { map_t *map = malloc(sizeof(map_t)); if (!map) return NULL; map->bits = bits; map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits)); if (map->ht) { for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) /* 初始化雜湊表中的每一個 entry */ (map->ht)[i].first = NULL; } else { free(map); map = NULL; } return map; } ``` 第十七行的 **`map_get` 函式** : 尋找特定數值是否存在於雜湊表 ```c void *map_get(map_t *map, int key) { struct hash_key *kn = find_key(map, key); return kn ? kn->data : NULL; } ``` `map_get` 函式中的 **find_key**: 判斷給定的 key 值是否存在於雜湊表 ```c /* 判斷特定 key 值是否存在於雜湊表 , 並回傳 */ static struct hash_key *find_key(map_t *map, int key) { /* hash(key, map->bits) 是計算此 key 值對應到哪個 entry * 在對應的 entry 中之串列中依序搜尋 key 值是否在串列中 * */ struct hlist_head *head = &(map->ht)[hash(key, map->bits)]; for (struct hlist_node *p = head->first; p; p = p->next) { struct hash_key *kn = container_of(p, struct hash_key, node); if (kn->key == key) return kn; } return NULL; } ``` `find_key` 函式中的 **hash** : 去計算給定值的 hash 值可參考 [黃金切割](https://zhuanlan.zhihu.com/p/40515974) , 簡單來說 GOLDEN_RATIO 可以讓 input 平均分散在雜湊表中 ```c #define GOLDEN_RATIO_32 0x61C88647 static inline unsigned int hash(unsigned int val, unsigned int bits) { /* High bits are more random, so use them. */ return (val * GOLDEN_RATIO_32) >> (32 - bits); } ``` 第二十八行的 **`map_add` 函式** : 將 input 加至雜湊表中 ```c void map_add(map_t *map, int key, void *data) { /* 先看此 key 值是否存在於雜湊表中,不在的話再繼續 */ struct hash_key *kn = find_key(map, key); if (kn) return; /* 分配空間以及將資料複製到該節點 */ kn = malloc(sizeof(struct hash_key)); kn->key = key, kn->data = data; /* h 指向雜湊表中相對應 entry 的位址 * first 指向雜湊表中相對應 entry 的首個節點位址 * n 指向剛分配的資料結構中的 node 位址 * */ struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; /* n->next = first * 把原本雜湊表中相對應 entry 的首個節點接在新分配的節點後面 * * if (first) first->pprev = &n->next * 如果原本雜湊表中相對應 entry 的首個節點存在的話 * 將他的 pprev 指向指向新分配的節點的 next 位址 * * h->first = n * 改變雜湊表中相對應 entry 所指向的首個節點 * * n->pprev = &h->first * 將新分配的節點的 pprev 指向指向雜湊表中相對應 entry 的位址 * 可搭配題目上方的圖示可以更好的理解此部份 * */ n->next = first; if (first) first->pprev = &n->next; h->first = n; n->pprev = &h->first; } ``` 第三十二行的 **`map_deinit`函式** : 把原本建立的 hash table 的空間還給記憶體 ```c void map_deinit(map_t *map) { if (!map) return; /* 造訪雜湊表中的每一個 entry*/ for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) { struct hlist_head *head = &map->ht[i]; /* 造訪雜湊表中的每一個 entry 內部連結的節點 */ for (struct hlist_node *p = head->first; p;) { struct hash_key *kn = container_of(p, struct hash_key, node); /* n 紀錄目前節點 * p 則移至下一個節點（主要是用來決定 loop 是否繼續之依據） */ struct hlist_node *n = p; p = p->next; /* 如果當前的節點的 pprev 為空(非正常行為) * 跳至 bail 標籤 , 否則的話處理以下行為 * */ if (!n->pprev) /* unhashed */ goto bail; struct hlist_node *next = n->next, **pprev = n->pprev; *pprev = next; if (next) next->pprev = pprev; n->next = NULL, n->pprev = NULL; bail: free(kn->data); free(kn); } } free(map); } ``` #### Hash Table 設計與實做手法: 採用的方式是當有兩筆或以上的資料發生衝突時採用 chain 的方式 , 把同一個 bucket 的元素用鏈結串列連接起來因此會希望 * hash collision 發生的機率降低 * 如果發生 collision 則希望隨機分佈在各 bucket 中 , 不希望集中在特定 bucket 使得 list 過長 * hash function 計算效率快參考在 hash.h 註解第 31 行中出現的文章 [Linux Kernel Hash Table Behavior: Analysis and Improvements](http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf) 我們可以發現本題的 hash function 採用的是 multiply and shift方法 , 由於 CPU 執行乘法運算的延遲通常低於除法運算,所以我們用此種方法取代常見的取餘數運算, 並且在上面的參考文章中的第 11 頁也有提及此觀點上述引文中的 5-1 提及 :::info The theory posits that machine multiplication by a large number that is likely to cause overflow is the same as finding the modulus by a different number ::: 我們藉由乘上一個很大的數字使其 overflow，造成類似取餘數的結果 , 但讓這個運算從原本複雜的指令變成用簡單的乘法指令達成 `0x61C88647` 則是透過文中的演算法可以求出來的結果 (引文使用的是 INT_64, 我們這邊採用的是 INT_32) 所以算出來的結果不同 , 但概念上是相同的 ### LeetCode 82 [Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/) #### 避免遞迴的版本此題的目的是將一個已經排序過的 singly-linked list 中重複的節點移除解題想法：先把 singly-linked list 中的節點分成兩種狀況 * 節點為空 * 直接回傳 NUL * 節點不為空：又可細分成兩種狀況 * 此節點之值與後面的節點值相同 , 因此需要刪除重複的值 * 此節點之值與後面的節點值不同 ```c #include <stddef.h> struct ListNode { int val; struct ListNode *next; }; struct ListNode *deleteDuplicates(struct ListNode *head) { /* node is empty*/ if (!head) return NULL; /* node is not empty*/ /* this node->val is equal to node->next->val */ if (head->next && head->val == head->next->val) { /* check there are more than one repeated number equal to head node */ while (head->next && head->val == head->next->val) head = head->next; // return the node which val is not equal to head node return deleteDuplicates(head->next); } /* this node->val is not equal to node->next->val directly return the next node */ head->next = deleteDuplicates(head->next); return head; } ``` 以上程式為考試提供之參考解答考量到 linked list 通常是用 mallloc 實做 , 若有節點不會再被使用到應呼叫 free 函式 , 以免造成 memory leak 修改上述程式的一小段部份 ```c struct ListNode *deleteDuplicates(struct ListNode *head) { /* node is empty*/ if (!head) return NULL; /* node is not empty*/ /* this node->val is equal to node->next->val */ if (head->next && head->val == head->next->val) { /* check there are more than one repeated number equal to head node */ while (head->next && head->val == head->next->val) { struct ListNode *tmp = head; head = head->next; /* free the memory allocated to the duplicate node */ free(tmp); } struct ListNode *tmp = head; head = head->next; free(tmp); // return the node which val is not equal to head node return deleteDuplicates(head); } /* this node->val is not equal to node->next->val directly return the next node */ head->next = deleteDuplicates(head->next); return head; } ``` 非遞迴版本：最一開始的解題思路： * Step1: 判斷串列是否為空,空則直接回傳NULL * Step2: 判斷串列最前端的值是否為重複(也就是會不會更改到 head指標指向的位置) , 若重複則把串列前面重複值刪除且更改 head 指向的位置 * Step3 : 檢查修改過 head 後的串列是否為空,空則直接回傳NULL * Step4 : 從被修改過的串列依序追蹤串列 , 檢查後兩個點是否有重複 , `current` 為現在跑到哪邊 , `prev` 紀錄一開始從哪個點開始追蹤 , `prev->next = current` 表示將原本的點的 next 換成沒重複的點,並回到step4直到 current 沒有下一個點為止 ```c struct ListNode *deleteDuplicates(struct ListNode *head) { if (!head) { return NULL; } /* Deal with the duplicate case appear in head */ while (head) { if (head->next && head->val == head->next->val) { while (head->next && head->val == head->next->val) { struct ListNode *tmp = head; head = head->next; /* free the memory allocated to the duplicate node */ free(tmp); } struct ListNode *tmp = head; head = head->next; free(tmp); } else { /* if no duplicate case appear in head , then out of the while loop*/ break; } } if (!head) { return NULL; } /* use current to traverse the list */ struct ListNode *current = head; /* In here , first node will definitely not equal to second node , if there * has second node */ /* if list have just one node , we just return it , otherwise we deal with it */ while (current->next) { /* Use variable prev to record now we traverse and simply check two nodes behind it are duplicate or not if two nodes behind it are duplicate , deal with it otherwise just traverse next node */ if (current->next->next && current->next->val == current->next->next->val) { struct ListNode *prev = current; int same_value = current->next->val; current = current->next; while (current && current->val == same_value) { struct ListNode *tmp = current; current = current->next; /* free the memory allocated to the duplicate node */ free(tmp); } prev->next = current; current = prev; } else { current = current->next; } } return head; } ``` 上述程式需要處理兩種不同的 case * 有改變 head 的 * 沒有改變 head 的並且程式碼整體也偏長使用 pointer to pointer 改寫老師給的範例用這樣的方式可以不用多判斷目前的節點是不是 head 這個 node ```c struct ListNode *deleteDuplicates(struct ListNode *head) { if (!head) return NULL; if(!head->next) return head; struct ListNode **ptr= &head; struct ListNode *current = head; while (current) { if (current->next && current->val == current->next->val) { current = current->next; while (current->next && current->val == current->next->val) current = current->next; current = current->next; *ptr = current; } else { ptr = &(*ptr)->next; current = current->next; } } return head; } ``` 讓此題的程式碼相比自己寫的 version 1 精簡許多 :::info 這一題 LeetCode的討論區以及其他同學寫的程式碼比較沒有在討論到要不要 free 掉不需要的 node 的空間並且 C 不像其他語言有 garbage collection , 所以我認為理論上要對不需要的 node 空間進行回收 , 但回收空間這件事本身也需要執行時間 , 因此程式會相較沒回收空間來的慢 > 搭配閱讀 ==[Hazard pointer](https://hackmd.io/@sysprog/linux2021-summer-quiz2)==: > 在沒有 garbage collection 的語言環境中，lock-free data 的主動釋放需要考慮是否有其他執行緒也同時在持有的問題，而 [hazard pointer](https://en.wikipedia.org/wiki/Hazard_pointer) (簡稱 `HP`) 是一種可以解決 lock-free data 在動態記憶體配置的問題之方法。之後探討並行程式設計時，就會實作 HP 一類的垃圾回收機制。因此，在本次測驗題中，其實就是先作預先準備。 > :notes: jserv 從不同觀點看 * 解題觀點來說 free 是拖慢效率的原因 * 系統觀點來說這樣比較安全 ::: #### circular doubly-linked list 改寫 ##### 迭代版本 ```c struct ListNode { int val; struct ListNode *next; struct ListNode *prev; }; struct ListNode *deleteDuplicates(struct ListNode *head) { if (!head) return NULL; struct ListNode **ptr= &head; struct ListNode *current = head; while (current) { if (current->next && current->val == current->next->val) { struct ListNode *tmp = current-> prev; while (current->next && current->val == current->next->val) current = current->next; tmp->next=current->next; current = current->next; current->prev = tmp; *ptr = current; } else { ptr = &(*ptr)->next; if(current->next){ current = current->next; } else{ current->next = head ; } } } return head; } ``` ### LeetCode 146 [LRU Cache](https://leetcode.com/problems/lru-cache/) #### 解析LRU Cache 程式碼實做 **Least Recently Used (LRU) cache** 需要實做下列函式 * **LRUCache(int capacity)** : Initialize the LRU cache with positive size capacity * **int get(int key)** : Return the value of the key if the key exists, otherwise return -1 * **void put(int key, int value)** : Update the value of the key if the key exists. Otherwise, add the key-value pair to the cache. If the number of keys exceeds the capacity from this operation, evict the least recently used key **get 跟 put 函式平均時間應為 $O(1)$** 此題使用 [lab0-c](https://github.com/sysprog21/lab0-c) 所提供的 [list.h](https://github.com/sysprog21/lab0-c/blob/master/list.h) 實作 linux 風格的 LRU Cache 資料結構 `LRUNode` ```c typedef struct { int key, value; struct list_head hlink, dlink; } LRUNode; ``` * key 紀錄此節點的 index 值 * value 紀錄此節點的值資料結構 `LRUCache` ```c typedef struct { int capacity, count; struct list_head dhead, hheads[]; } LRUCache; ``` * capacity 紀錄 cache 的 entry 數 * count 紀錄 cache 當前的資料數 **首先解析`lRUCacheCreate()`** `LRUCache` 須紀錄 * 1. 其 capicity 以及當前資料數 (count) 其空間需要 malloc(sizeof(*obj)) * 2. hash table 的空間 malloc(capacity * sizeof(struct list_head)) * 3. 接著就是初始化 ```c LRUCache *lRUCacheCreate(int capacity) { /* LRUCache 須紀錄 * 1.capicity 以及當前資料數 (count) * 其空間需要 malloc(sizeof(*obj)) * 2.hash table 的空間 * malloc(capacity * sizeof(struct list_head)) */ LRUCache *obj = malloc(sizeof(*obj) + capacity * sizeof(struct list_head)); obj->count = 0; obj->capacity = capacity; INIT_LIST_HEAD(&obj->dhead); for (int i = 0; i < capacity; i++) INIT_LIST_HEAD(&obj->hheads[i]); return obj; } ``` 接著從 `lRUCacheGet` 這個函式下去解析老師提供的 code ```c int lRUCacheGet(LRUCache *obj, int key) { LRUNode *lru; /* 根據給定的 key 值去計算其應該在 Cache 中的哪個 entry */ int hash = key % obj->capacity; /* 根據計算出的 hash 值去尋找 cache 中 , 此 hash 值的 entry * 所以 &obj->hheads[hash] 表示 Cache 中對應的 entry 的 head * 而同一個 hash 值的 node 會用 linked list 串接起來 , 用 hlink 串連 */ list_for_each_entry(lru, &obj->hheads[hash], hlink) { if (lru->key == key) { /* * 如果有找到相同的 key 值 , 在這個瞬間此 key 值是最近被存取到的 * 把此 key 值放到 cache 的 Least Recently node 的最前頭(最近使用) * dhead 為 cache last reference time 的串列 , 用 dlink 串接 */ list_move(&lru->dlink, &obj->dhead); return lru->value; } } return -1; } ``` 解析完 `lRUCacheGet` 函式, 即可了解 `LRUNode` 此資料結構中各自內部成員的意義 **再接著解析 `lRUCachePut()`** ```c void lRUCachePut(LRUCache *obj, int key, int value) { LRUNode *lru; /* 根據給定的 key 值去計算其應該在 Cache 中的哪個 entry */ int hash = key % obj->capacity; list_for_each_entry(lru, &obj->hheads[hash], hlink) { /* 若有相同的 key 值存在於 Cache 當中 * 則把此 key 值的 node 放到 cache 的 Least Recently node 的最前頭(最近使用) * 並修改其 value 為新 put 的 value */ if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); lru->value = value; return; } } /* count 紀錄 Cache 中的資料數量 * 如果 Cache 的資料數量 < capacity 則向記憶體要求 node 的空間 * 否則的話 , 找出要取代的 node ( dhead's tail) */ if (obj->count == obj->capacity) { lru = list_last_entry(&obj->dhead, LRUNode, dlink); list_del(&lru->dlink); list_del(&lru->hlink); } else { lru = malloc(sizeof(LRUNode)); obj->count++; } lru->key = key; list_add(&lru->dlink, &obj->dhead); list_add(&lru->hlink, &obj->hheads[hash]); lru->value = value; } ``` 解析完 `lRUCachePut` 函式, 即可了解 `LRUCache` 此資料結構中各自內部成員的意義 **最後解析`lRUCacheFree()`** &obj->dhead 是根據 reference time 前後排序的 linked list , 把內部元素都 free 掉以及 LRUCache free 掉就不會造成 memory leak 的問題 ```c void lRUCacheFree(LRUCache *obj) { LRUNode *lru, *n; list_for_each_entry_safe(lru, n, &obj->dhead, dlink) { list_del(&lru->dlink); free(lru); } free(obj); } ``` #### 題目為什麼要用這些資料結構之分析 * **此題之所以要用 `hash` 是因為這樣可以使平均搜尋時間複雜度為 $O(1)$**：假設輸入的資料是平均分佈在各 entry 中 , 這樣可以使資料先找到 hash entry , 再進去 entry list 裡面搜尋 , 可以避免從 cache 的頭找到尾的 O(capacity) 時間 * **此題之所以要用 `linked list` 是因為可以使平均移動資料時間複雜度為 $O(1)$**：而不管是 put 操作或是 get 操作都會修改到 referce time , 如果是用陣列實做的話 , 移動 data的時間是 O(capacity) , 用 linked list 實做的話則是 O(1) #### 設計實驗的想法 **1. 根據測驗1的 [Hash Table 設計與實做手法](https://hackmd.io/bN7BKitBTUqzh3JGkJrMzg?view#Hash-Table-%E8%A8%AD%E8%A8%88%E8%88%87%E5%AF%A6%E5%81%9A%E6%89%8B%E6%B3%95)** 考量到 CPU 的乘法指令通常相較於除法指令迅速 , 我將原本題目中用到的 hash function (mod 取餘數) 改成像測驗一的形式以下為簡易的測試程式 ```c int main(int argc, char *argv[]) { LRUCache *test = lRUCacheCreate(1024); int IndexTable[1024]; srand(1); for (int j=0; j < 1024 ;j++ ) { int index = abs(rand()); int value = abs(rand()); IndexTable[j]=index; lRUCachePut(test, index , value); } for (int j=0; j < 1024 ;j++ ) { lRUCacheGet(test, IndexTable[j]); } lRUCacheFree(test); return 0; } ``` 由於此目標是要測驗 hash function 的效能 , 所以我將此題的測試數目控制在 LRU cache 的 entry 數量之下 , 只需執行 Put 跟 Get 的指令 , 不涉及超出 entry 數量的節點時要刪除哪個節點的選擇 ![](https://i.imgur.com/Zaho5TH.png) :::warning 文字訊息==不要==用圖片來展現，永遠要想到視覺障礙的朋友們。 :notes: jserv ::: 但測試之後發現 , 效能幾乎沒有差異 , 即使是關掉編譯器的自動優化之後 , 發現效能仍舊幾乎沒有差異所以我判斷使用 hash 的部份在 LRU cache 中並不是效能上的重點 **2.刪除多餘的程式碼** 由於 dlink 並非是動態規劃的記憶體空間 , 將 lru free 掉即可一併將 dlink free 掉 ```c void lRUCacheFree(LRUCache *obj) { LRUNode *lru, *n; list_for_each_entry_safe(lru, n, &obj->dhead, dlink) { /* list_del(&lru->dlink); */ free(lru); } free(obj); } ``` 使用指令 `valgrind 執行檔` 可發現刪除 list_del(&lru->dlink) 程式依舊沒有 memory leak **3. 檢查輸入是否合法的機制** ```c LRUCache *lRUCacheCreate(int capacity) { /* LRUCache 須紀錄 * 1.capicity 以及當前資料數 (count) * 其空間需要 malloc(sizeof(*obj)) * 2.hash table 的空間 * malloc(capacity * sizeof(struct list_head)) */ if (capacity <= 0) { exit(-1); } LRUCache *obj = malloc(sizeof(*obj) + capacity * sizeof(struct list_head)); obj->count = 0; obj->capacity = capacity; INIT_LIST_HEAD(&obj->dhead); for (int i = 0; i < capacity; i++) INIT_LIST_HEAD(&obj->hheads[i]); return obj; } int lRUCacheGet(LRUCache *obj, int key) { if (key < 0) { exit(-1); } LRUNode *lru; /* 根據給定的 key 值去計算其應該在 Cache 中的哪個 entry */ int hash = key % obj->capacity; /* 根據計算出的 hash 值去尋找 cache 中 , 此 hash 值的 entry * 所以 &obj->hheads[hash] 表示 Cache 中對應的 entry 的 head * 而同一個 hash 值的 node 會用 linked list 串接起來 , 用 hlink 串連 */ list_for_each_entry(lru, &obj->hheads[hash], hlink) { if (lru->key == key) { /* 如果有找到相同的 key 值 , 在這個瞬間此 key 是最近被存取到的 * 把此 key 值放到 cache 的 Least Recently node 的最前頭(最近使用) * dhead 為 cache last reference time 的串列 , 用 dlink 串接 */ list_move(&lru->dlink, &obj->dhead); return lru->value; } } return -1; } void lRUCachePut(LRUCache *obj, int key, int value) { if (key < 0 || value < 0) { exit(-1); } LRUNode *lru; /* 根據給定的 key 值去計算其應該在 Cache 中的哪個 entry */ int hash = Hash(key, 10); list_for_each_entry(lru, &obj->hheads[hash], hlink) { /* 若有相同的 key 值存在於 Cache 當中 * 則把此 key 值的 node 放到 cache 的 Least Recently node 的最前頭(最近使用) * 並修改其 value 為新 put 的 value */ if (lru->key == key) { list_move(&lru->dlink, &obj->dhead); lru->value = value; return; } } /* count 紀錄 Cache 中的資料數量 * 如果 Cache 的資料數量 < capacity 則向記憶體要求 node 的空間 * 否則的話 , 找出要取代的 node ( dhead's tail) */ if (obj->count == obj->capacity) { lru = list_last_entry(&obj->dhead, LRUNode, dlink); list_del(&lru->dlink); list_del(&lru->hlink); } else { lru = malloc(sizeof(LRUNode)); obj->count++; } lru->key = key; list_add(&lru->dlink, &obj->dhead); list_add(&lru->hlink, &obj->hheads[hash]); lru->value = value; } ``` #### 在 Linux 核心找出 LRU 相關程式碼並探討 [linux/include/linux/lru_cache.h](https://github.com/torvalds/linux/blob/9b76d71fa8be8c52dbc855ab516754f0c93e2980/include/linux/lru_cache.h) 以及 [linux/lib/lru_cache.c](https://github.com/torvalds/linux/blob/master/lib/lru_cache.c) 跟我們的 LRU cache 中顯而易見的不同主要有 1.在 linux 中 LRU cache 中有三種 list * in_use: currently in use (refcnt > 0, lc_number != LC_FREE) * lru: unused but ready to be reused or recycled (lc_refcnt == 0, lc_number != LC_FREE), * free: unused but ready to be recycled (lc_refcnt == 0, lc_number == LC_FREE) 每一種 list 可以視作一種狀態首先：**當 LRU cache 滿的時候 , linux 會把 cache 的一筆資料從 in_use list 放到 lru list 中** , 這樣做的目的是保存那些暫時沒使用但未來有可能使用到的資料,而我們的寫法就沒辦法保留這些資料(用記憶體空間換取未來讀取時的效率) 如果 lru list 中有放置一段時間沒被使用的資料, 會被移至 free list 中若 free list 積累了一定量的資料之後 , 會一次性的回收掉所有空間 , 而不是一個一個回收空間（相較於多次呼叫回收記憶體空間 , 一次回收多個記憶體空間的效能更好） 2.linux 中還考慮到多執行緒同時處理 LRU cache 的情況 , 因此有 lock , 我們寫的是簡化版 , 主要考慮單線程的情況可以搭配 [lru_cache.c 簡易文件](https://docs.huihoo.com/doxygen/linux/kernel/3.7/lru__cache_8c.html) 參考可以更好的理解 linux 的 lru_cache 的原始碼 ### LeetCode 128 [Longest Consecutive Sequence](https://leetcode.com/problems/longest-consecutive-sequence/) #### 改良版的LCS 用 LeetCode 的範例1 input [100,4,200,1,3,2] 來做說明如果是這個 input , 原有的 code 會先找跟 100 相鄰的數字 , 因為沒有 99 或 101 所以最大長度暫時為 1 接著找跟 4 相鄰的數字 , 發現最大長度為 4 此時程式會再找 200 周圍的數字 , **但實際上 200 周圍的數字是不需要找的** **因為根據 input 的長度來說 , 已經不可能找到比長度 4 還要更長的 sequence** 也就是說此時再繼續找 sequence 完全只是多餘的動作 **解決方法: 每次找 sequence 時 , 同時紀錄剩下的 input 數為多少個** 如果當前最大的 sequence 長度超過剩餘的長度時 , 剩下的就不用再找了 , 因為繼續找也不可能超越當前的 sequence 長度改良版附上程式解析如下 ```c int longestConsecutive(int *nums, int n_size) { int hash, length = 0; int remain = n_size ; struct seq_node *node; /* heads 為 size 為 n_size 的 hash table*/ struct list_head *heads = malloc(n_size * sizeof(*heads)); /* 初始化每一個 hash table entry */ for (int i = 0; i < n_size; i++) INIT_LIST_HEAD(&heads[i]); /* 如果 input 的數字不在相對應 hash table 的 entry 中 * 則把此數字放入 hash table entry 的串列當中 */ for (int i = 0; i < n_size; i++) { if (!find(nums[i], n_size, heads)) { /* 因為 input 不一定為正數 , 所以如果是負數要先加負號再進行 hash */ hash = nums[i] < 0 ? -nums[i] % n_size : nums[i] % n_size; node = malloc(sizeof(*node)); node->num = nums[i]; list_add(&node->link, &heads[hash]); } } for (int i = 0; i < n_size; i++) { int len = 0; int num; /* 看當前的 input 存不存在於hash table * 如果不存在表示 input 列表中有跟當前 input 相鄰的數 * 已經被算進去 sequence 長度 */ node = find(nums[i], n_size, heads); remain--; if (node) { /* 若此 input 存在於 hash table中 * 則 Longest Consecutive Sequence長度+1 * 先用 num 紀錄此數字 * 並在串列中刪除此數 , 用意是避免之後重複的搜尋 */ len++; num = node->num; list_del(&node->link); int left = num, right = num; /* 找 num 相鄰的數字(負的方向) * 找到的話在串列中刪除此數 , 用意是避免之後重複的搜尋 */ while ((node = find(--left, n_size, heads))) { len++; list_del(&node->link); remain --; } /* 找 num 相鄰的數字(正的方向) * 找到的話在串列中刪除此數 , 用意是避免之後重複的搜尋 */ while ((node = find(++right, n_size, heads))) { len++; list_del(&node->link); remain --; } /* 判斷目前搜尋的相鄰數字長度有沒有大於當前之最大值 */ length = len > length ? len : length; if(length >= remain){ return length; } } } return length; } ``` ![](https://i.imgur.com/j3JF4rN.png) 使用 LeetCode 測試也可發現這個程式碼已經快過 100 % 的 C 語言