2022q1 Homework1 (quiz1-1)

# 2022q1 Homework1 (quiz1-1) 題目 : [2022q1 第 1 週測驗題](/@sysprog/linux2022-quiz1#%E6%B8%AC%E9%A9%97-1) ## [Two Sum](https://leetcode.com/problems/two-sum/) Leetcode的第一題。給你一個數列和target，求出哪兩個index相加的合為target，返回這兩個index。其中題目說答案只有一個。因為C++ 內建map所以先用 C++驗證一次。走訪這個數列的同時，在i的時候往前看(0 ~ i - 1)是否有target - nums[i]，如果有就返回這兩個的index。 time complexity為$$O(NlogN)$$ 因為hashmap取值為$$O(logN)$$ ```cpp= vector<int> twoSum(vector<int>& nums, int target) { auto n = nums.size(); unordered_map<int, int> m; vector<int> rtn; for(int i = 0; i < n; ++i) { if(m.count(target - nums[i])) { rtn = {m[target - nums[i]], i}; break; } m[nums[i]] = i; } return rtn; } ``` 可是C裡面沒有hash map，所以必須自己實作一個。 ## hlist struct的定義在[include/linux/types.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/types.h) 和list一樣操作都定義在[include/linux/list.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/list.h) 和list不太一樣的是，==hlist_head為singly linked list==。list_head和hlist_node為double linked list。所以要取得tail的time complexity為O(N)。我覺得使用的時候盡量在node數量少的應用上。或是盡量不要對back做操作。像queue的實作應該就不適合，因為要同時存取front和back。 | push_front | pop_front | push_back | pop_back | -------- | -------- | -------- | -------- | | $$O(1)$$ | $$O(1)$$ | $$O(N)$$ | $$O(N)$$ | ```cpp= struct list_head { struct list_head *next, *prev; }; struct hlist_head { struct hlist_node *first; }; struct hlist_node { struct hlist_node *next, **pprev; }; ``` 其中list_head的指標都是指向list_head。關於list_head的操作可以參考之前的筆記[kernel study – list_head](/7dpIj_RnTZyt24ay8lCxDQ)。這邊想專注在hlist。 ### 為什麼 pprev是point到strcut hlist_node *的pointer? 使用hlist_node有兩種情況， case1是hlist_node的前一個是hlist_head。 case2是hlist_node的前一個也是hlist_node。 ![](https://i.imgur.com/B5fRMJQ.png) 如果像list_head一樣使用以下的code，會無法滿足case1，因為case1是指向不同的struct。 ```cpp= struct hlist_node { struct hlist_node *next; struct hlist_node *pprev; }; ``` 反過來如果使用以下的code，則會無法滿足case2，因為struct不一樣。 ```cpp= struct hlist_node { struct hlist_node *next; struct hlist_head *pprev; }; ``` 因為pprev只是為了指到前一個的next，==不管前一個是hlist_node或是hlist_head，裡面的next或是first都是指向sturct hlist_node。所以把pprev宣告成，struct hlist_node **prev就可以解決這個問題。== ## HashMap 參考jserv老師的程式碼 [linux/include/hash.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/hash.h) 計算hash值。==把val乘上GOLDEN_RATIO_32然後取前面MSB長度為bits來當hash值。== > This hash multiplies the input by a large odd number and takes the high bits. Since multiplication propagates changes to the most significant end only, ==it is essential that the high bits of the product be used for the hash value.== ```c= #define MAP_HASH_SIZE(bits) (1 << bits) #define GOLDEN_RATIO_32 0x61C88647 static inline unsigned int hash(unsigned int val, unsigned int bits) { /* High bits are more random, so use them. */ return (val * GOLDEN_RATIO_32) >> (32 - bits); } ``` 程式碼所呈現的hashmap架構圖如下: ![](https://i.imgur.com/sEbAm3G.png) 當key計算出hash之後就去map->ht[]找出對應的hlist_head，然後再traverse整個hlist找出一樣的key值。 ```c= static struct hash_key *find_key(map_t *map, int key) { struct hlist_head *head = &(map->ht)[hash(key, map->bits)]; for (struct hlist_node *p = head->first; p; p = p->next) { struct hash_key *kn = container_of(p, struct hash_key, node); if (kn->key == key) return kn; } return NULL; } ``` + 有找到key-value pair，則看需要更新裡面的value值。 + 沒找到則把新的key-value pair加到hlist的最前面，因為加到前面time complexity比較好。第一種情況first後面沒node。則直接把new node加到h->first之後。 ![](https://i.imgur.com/VS3IKUA.png) 第二種情況first後面有node，則把new node加到h->first和first中間。 ![](https://i.imgur.com/7Ei0Vgs.png) ```c= void map_add(map_t *map, int key, void *data) { struct hash_key *kn = find_key(map, key); if (kn) return; kn = malloc(sizeof(struct hash_key)); kn->key = key, kn->data = data; struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; n->next = first; if (first) first->pprev = &n->next; h->first = n; n->pprev = &h->first; } ``` ###### tags: `linux2022`