# 2024q1 Homework2 (quiz1+2) contributed by < `cheezad` > ## [Week 1](https://hackmd.io/@sysprog/linux2024-quiz1) ### Problem 1 #### Code analysis and explanation - `node_t` contains `left` and `right` node forming into a doubly-linked list. `next` is used to point to the next doubly-linked list. `long value` is used to store the value of the current node. ```cpp typedef struct __node { struct __node *left, *right; struct __node *next; long value; } node_t; ``` - `list_add` - adds node to the head of the list - `list_tail` - returns the last node in the list - `list_length` - returns the length of the list - `quick_sort` - `begin[]`, `end[];` - two stacks that stores the starting and ending position of the part of linked list to be sorted - `i` records the current position. When `i < 0`, it means that the stack is empty and there is nothing left to sort. My idea of improving the code above is 1. Find a better way to choose the pivot instead of picking the one on the right-hand side. 2. Reduce the number of max_level, so the program doesn't need to allocate as much space. Maybe with a better pivot found, the maximum level can be reduced. Ways of finding a better pivot: - Median of 3: Picks the first, middle, and last element of the array to be sorted and use the median of the three element as pivot. - Randomly pick pivot: Randomly pick a number N. Modular N by the length of the list, and choose the Nth element as pivot. ##### Original code The original code chooses the first element of the list as pivot. It also sets the `max_level` of `begin` and `end` as `2*n` where `n` is the length of the list. I did a running to see what the actual `max_level` is. The worst case I use is when the input of the list is in descending order, this way every time the pivot would be the largest number. The results are as below: ``` list length(n): 10, max depth is 10 list length(n): 100, max depth is 100 list length(n): 1000, max depth is 1000 list length(n): 10000, max depth is 10000 list length(n): 100000, max depth is 100000 ``` Even in the worst case, the depth would only reach the size of the list length instead of `2*n` as the `max_level` is set. Here is the result where the input is shuffled: ``` count: 10, max depth is 4 count: 100, max depth is 14 count: 1000, max depth is 26 count: 10000, max depth is 38 count: 100000, max depth is 48 ``` This can be set as a standard for the following methods. ##### Median of Three In the case of median of three, it picks out the first, middle, and last element of the list and finds the median of the three elements. Doing so decreases the chance of picking the maxium or minimum value. Just like the original code, I ran a test on it with an input of decending values. The results are as below: ``` list length(n): 10, max depth is 6 list length(n): 100, max depth is 12 list length(n): 1000, max depth is 18 list length(n): 10000, max depth is 24 list length(n): 100000, max depth is 32 ``` We can see that the `max_depth` is significantly smaller than those in the original code. However, this method might come with a cost, it might use more time(CPU cycles). ##### Randomly pick pivot Randomly pick pivot picks a pivot with the value r generated from a random generator and modular it to the length of the segment, r is then used to pick the r^th^ element as the pivot. In the case where the input is shuffled the results are as below: ``` list length(n): 10, max depth is 8 list length(n): 100, max depth is 14 list length(n): 1000, max depth is 28 list length(n): 10000, max depth is 40 list length(n): 100000, max depth is 48 ``` In the case of the input is in desending order, the result are as below: ``` list length(n): 10, max depth is 4 list length(n): 100, max depth is 14 list length(n): 1000, max depth is 28 list length(n): 10000, max depth is 40 list length(n): 100000, max depth is 48 ``` We can see that the results are relatively stable compared the original code. I later did a testing on all three methods where the input is shuffled and in descending order. This time the results also include a clock which is the runtime of each sorting. In this testing the `max_level` of each `quick_sort` is all set to the length of the list. ``` running orginal list length(n): 10, max depth is 4, clock: 46 list length(n): 100, max depth is 14, clock: 71 list length(n): 1000, max depth is 26, clock: 861 list length(n): 10000, max depth is 38, clock: 4995 list length(n): 100000, max depth is 48, clock: 41933 running original descending list length(n): 10, max depth is 10, clock: 9 list length(n): 100, max depth is 100, clock: 37 list length(n): 1000, max depth is 1000, clock: 3259 list length(n): 10000, max depth is 10000, clock: 316207 list length(n): 100000, max depth is 100000, clock: 32950489 running median of three count: 10, max depth is 4, clock: 12 count: 100, max depth is 14, clock: 17 count: 1000, max depth is 22, clock: 277 count: 10000, max depth is 30, clock: 3086 count: 100000, max depth is 38, clock: 47194 running median of three descending list length(n): 10, max depth is 6, clock: 10 list length(n): 100, max depth is 12, clock: 13 list length(n): 1000, max depth is 18, clock: 197 list length(n): 10000, max depth is 24, clock: 1728 list length(n): 100000, max depth is 32, clock: 20529 running random pivot list length(n): 10, max depth is 8, clock: 11 list length(n): 100, max depth is 14, clock: 22 list length(n): 1000, max depth is 26, clock: 266 list length(n): 10000, max depth is 38, clock: 3985 list length(n): 100000, max depth is 48, clock: 58486 running random pivot descending list length(n): 10, max depth is 4, clock: 11 list length(n): 100, max depth is 14, clock: 17 list length(n): 1000, max depth is 28, clock: 212 list length(n): 10000, max depth is 40, clock: 2403 list length(n): 100000, max depth is 48, clock: 30642 ``` :::danger Visualize via gnuplot and do some mathematical analysis. ::: ### Problem 2 #### Code analysis and explanation Timsort is a sorting algorithm used to perform better sorting in real life data. Since data in real life is already partially sorted, timsort exploits this characteristic to speed up merging process and reduce comparing count. Timsort uses an important concept called `run`. `run` is chunks of the original data. The optimal run number should be equal or slightly smaller than any power of 2. `minrun` is the minimum length of every run. This is set to control each length of the run. If the length is too short, binary insert the first element behind the run to the run. Choosing the optimal `minrun` is very crucial in timsort. Timsort uses stack to cache run, this avoids the program to go through the whole data at the start, and thus lowering memory caching. `merge_collapse` is used to ensure the length of the run is balanced. This function checks the top three runs in the stack with the following requirements: 1. The length of A is greater than the total length of B and C. 2. The length of B is greater than C. `merge_collapse` also ensures that only runs that are adjacent to each other are merged, this ensures the sorting is stable. Galloping mode: There are two sorted array of number A and B. Put the shorter array in a cache temp, this means less cache is used. In timsort, we find where the first element of B (B~0~) can fit inside A. We can then know that there is a chunk of A that is smaller than B~0~, store then back in A. Do the same with the rest of A. Find the first element of the remaining A and check where it fits in B. Repeat until the two array are sorted. Example: Given sorted array A and B. ```graphviz digraph G0{ node[shape=plaintext]; A; B; temp; node[shape=box]; ele1[label=1]; ele2[label=2]; ele3[label=3]; ele7[label=7]; ele8[label=8]; ele9[label=9]; { rank="same" A->ele1->ele2->ele3->ele7->ele8->ele9; } node[shape=box]; ele4[label=4]; ele5[label=5]; ele6[label=6]; ele82[label=8]; ele92[label=9]; ele10[label=10]; ele11[label=11]; { rank="same" B->ele4->ele5->ele6->ele82->ele92->ele10->ele11; } temp; } ``` Pick the shorter array and put into temp. (length of A is shorter than length of B) ```graphviz digraph G1{ node[shape=plaintext]; A; B; temp; node[shape=box]; ele1[label=1]; ele2[label=2]; ele3[label=3]; ele7[label=7]; ele8[label=8]; ele9[label=9]; { rank="same" temp->ele1->ele2->ele3->ele7->ele8->ele9; } node[shape=box]; ele4[label=4]; ele5[label=5]; ele6[label=6]; ele82[label=8]; ele92[label=9]; ele10[label=10]; ele11[label=11]; { rank="same" B->ele4->ele5->ele6->ele82->ele92->ele10->ele11; } A; } ``` Find the first element of B and see where it can fit in temp. ```graphviz digraph G1{ node[shape=plaintext]; temp; node[shape=box]; ele1[label=1]; ele2[label=2]; ele3[label=3]; ele7[label=7]; ele8[label=8]; ele9[label=9]; { rank="same" temp->ele1->ele2->ele3 ele7->ele8->ele9; } node[shape=box]; ele4[label=4][color="red"]; { rank="same" ele3->ele4->ele7 } } ``` Put the elements before B~0~ back to A. ```graphviz digraph G0{ node[shape=plaintext]; A; B; temp; node[shape=box]; ele1[label=1]; ele2[label=2]; ele3[label=3]; ele7[label=7]; ele8[label=8]; ele9[label=9]; { rank="same" A->ele1->ele2->ele3; } node[shape=box]; ele4[label=4]; ele5[label=5]; ele6[label=6]; ele82[label=8]; ele92[label=9]; ele10[label=10]; ele11[label=11]; { rank="same" B->ele4->ele5->ele6->ele82->ele92->ele10->ele11; } { rank="same"; temp->ele7->ele8->ele9; } } ``` Find where the first element of temp fits in B. ```graphviz digraph G0{ node[shape=plaintext]; // A; B; node[shape=box]; // ele1[label=1]; // ele2[label=2]; // ele3[label=3]; ele7[label=7][color="red"]; // { // rank="same" // A->ele1->ele2->ele3; // } node[shape=box]; ele4[label=4]; ele5[label=5]; ele6[label=6]; ele82[label=8]; ele92[label=9]; ele10[label=10]; ele11[label=11]; { rank="same" B->ele4->ele5->ele6->ele7->ele82->ele92->ele10->ele11; } } ``` Move the elements before temp~0~ to A. ```graphviz digraph G0{ node[shape=plaintext]; A; B; temp; node[shape=box]; ele1[label=1]; ele2[label=2]; ele3[label=3]; ele7[label=7]; ele8[label=8]; ele9[label=9]; { rank="same" A->ele1->ele2->ele3->ele4->ele5->ele6; } node[shape=box]; ele4[label=4]; ele5[label=5]; ele6[label=6]; ele82[label=8]; ele92[label=9]; ele10[label=10]; ele11[label=11]; { rank="same" B->ele82->ele92->ele10->ele11; } { rank="same"; temp->ele7->ele8->ele9; } } ``` Rinse and repeat the process until B and temp is empty, then return A. #### Improvements How can we decrease the number of comparisons? A way [HotMercury](https://hackmd.io/UAq6Xx8DRieVVlxJUbrzoQ?view#Testing-2--Timsort) mentioned is to improve how run is calculated.`minrun` range is usually chosen from the range 32 to 64, and the size of the data divided by minrun is equal or slightly smaller than a power of two. The way we can calculate `minrun` better is by extracting the 6 bits of msb as `minrun`. If there is any bit left that is `1`, add one to `minrun` for each bit. :::info on my way to change the code 🏃 ::: ## [Week 2](https://hackmd.io/@sysprog/linux2024-quiz2) ### Problem 1 Problem 1 dives into [LeetCode 105 Construct Binary Tree from Preorder and Inorder Traversal](https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/description/) combined with using the hash table implementation used in Linux Kernel. We can first understand how hash table works in Linux Kernel. ```c struct hlist_node { struct hlist_node *next, **pprev; }; ``` In the structure of `hlist_node`, we can see that `pprev` is declared as a pointer to a pointer. This is because when removing a node, if `pprev` is declared as a pointer, it will need extra operations to update `list_head`. [hash.h](https://github.com/torvalds/linux/blob/master/tools/include/linux/hash.h) uses the Multiplication Method to hash but with a tweak, it uses bit-shifting instead of multiplication because bit-shifting is more efficient. On to the functions. `hlist_add_head`: adds a node to the start of a list. `find`: calculates the hash of the node it wants to search for the node that has the same value as the target, returns the index of the node. `dfs`: builds the tree from inorder and preorder nodes. `buildTree`: initializes the hash list, populates with values of inorder sequence, then calls dfs to build the binary tree out. ### Problem 2 Problem 2 is an implementation of [Least Recently Used Cache](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)). When a LRU cache is full, it picks out the least recently used node and replaces it with the new incoming node. In `lRUCachePut` we can see that it checks if the node is in cache. If not, it then checks if the cache is full. When the cache is full, it removes the last entry of the list and adds the new node to the head. ```c if (!cache) { if (obj->count == obj->capacity) { cache = list_last_entry(&obj->dhead, LRUNode, link); list_move(&cache->link, &obj->dhead); hlist_del(&cache->node); hlist_add_head(&cache->node, &obj->hhead[hash]); } else { cache = malloc(sizeof(LRUNode)); hlist_add_head(&cache->node, &obj->hhead[hash]); list_add(&cache->link, &obj->dhead); obj->count++; } cache->key = key; } ``` ### Problem 3 Problem 3 is a implementation of `find_nth_bit`, it returns the N'th set bit in the memory. It is usually used to find the N'th set bit in a bitmap efficiently. ```c static inline unsigned long find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n) { if (n >= size) return size; if (small_const_nbits(size)) { unsigned long val = *addr & GENMASK(size - 1, 0); return val ? fns(val, n) : size; } return FIND_NTH_BIT(addr[idx], size, n); } ``` From the code above, we can see that it first checks if nbits is a constant that fits within the range of a long integer, then checks if the extracted bits from `addr` is zero. If it is greater than zero, it returns the function `fns`, else returns `size`. If size is not constant or greater than long int, it calls the `FIND_NTH_BIT`. Both `fns` and `FIND_NTH_BIT` finds the n'th set bit and returns it.