
2024q1 Homework2 (quiz1+2)

contributed by < cheezad >

Week 1

Problem 1

Code analysis and explanation

  • node_t contains left and right pointers that form a doubly-linked list. next points to the next doubly-linked list, and value stores the value of the current node.
typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;
  • list_add
    • adds node to the head of the list
  • list_tail
    • returns the last node in the list
  • list_length
    • returns the length of the list
  • quick_sort
    • begin[], end[];
      • two stacks that store the starting and ending positions of the sublists waiting to be sorted
    • i records the current top of the stacks. When i < 0, the stacks are empty and there is nothing left to sort.
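Based on the descriptions above, the list helpers can be sketched roughly as follows (a sketch assuming the node_t layout shown earlier, not the verbatim quiz code):

```c
#include <stddef.h>

typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;

/* Insert node at the head of the list. */
static void list_add(node_t **list, node_t *node)
{
    node->next = *list;
    *list = node;
}

/* Walk to the last node; returns NULL for an empty list. */
static node_t *list_tail(node_t **left)
{
    while (*left && (*left)->next)
        left = &(*left)->next;
    return *left;
}

/* Count the nodes in the list. */
static int list_length(node_t **left)
{
    int n = 0;
    while (*left) {
        ++n;
        left = &(*left)->next;
    }
    return n;
}
```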

My ideas for improving the code above are:

  1. Find a better way to choose the pivot instead of always picking an element at a fixed position.
  2. Reduce max_level so the program doesn't need to allocate as much space. With a better pivot, the maximum level may also be reduced.

Ways of finding a better pivot:

  • Median of 3: pick the first, middle, and last elements of the list to be sorted and use the median of the three as the pivot.
  • Randomly picked pivot: generate a random number N, take N modulo the length of the list, and choose the Nth element as the pivot.
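The median-of-3 idea on a singly linked segment can be sketched like this (a hypothetical helper, not the quiz code; it finds the middle with a slow/fast pointer pair):

```c
#include <stddef.h>

typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;

/* Return the median of three pivot candidates: first, middle, last. */
static node_t *median_of_three(node_t *head)
{
    node_t *slow = head, *fast = head, *last = head;
    while (fast && fast->next) {   /* slow ends at the middle node */
        slow = slow->next;
        fast = fast->next->next;
    }
    while (last->next)             /* last ends at the tail */
        last = last->next;

    long a = head->value, b = slow->value, c = last->value;
    if ((a <= b && b <= c) || (c <= b && b <= a))
        return slow;
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return head;
    return last;
}
```

Note the extra traversals to find the middle and last nodes: this is the time cost a linked-list median-of-3 pays for a better pivot.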
Original code

The original code chooses the first element of the list as the pivot. It also sets the max_level of begin and end to 2*n, where n is the length of the list. I ran the code to see what the actual maximum level is. The worst case I used is a list in descending order; this way the pivot is always the largest remaining number.
The results are as below:

list length(n): 10, max depth is 10
list length(n): 100, max depth is 100
list length(n): 1000, max depth is 1000
list length(n): 10000, max depth is 10000
list length(n): 100000, max depth is 100000

Even in the worst case, the depth only reaches the list length n rather than the 2*n that max_level is set to.
Here is the result where the input is shuffled:

count: 10, max depth is 4
count: 100, max depth is 14
count: 1000, max depth is 26
count: 10000, max depth is 38
count: 100000, max depth is 48

These results can serve as a baseline for the following methods.

Median of Three

In the case of median of three, we pick out the first, middle, and last elements of the list and use the median of the three as the pivot. Doing so decreases the chance of picking the maximum or minimum value. Just like with the original code, I ran a test with input in descending order. The results are as below:

list length(n): 10, max depth is 6
list length(n): 100, max depth is 12
list length(n): 1000, max depth is 18
list length(n): 10000, max depth is 24
list length(n): 100000, max depth is 32

We can see that the max depth is significantly smaller than in the original code.
However, this method may come at a cost: selecting the pivot can take more time (CPU cycles).

Randomly pick pivot

The random-pivot method generates a value r from a random generator, takes it modulo the length of the segment, and uses the rth element as the pivot. With shuffled input, the results are as below:

list length(n): 10, max depth is 8
list length(n): 100, max depth is 14
list length(n): 1000, max depth is 28
list length(n): 10000, max depth is 40
list length(n): 100000, max depth is 48

With the input in descending order, the results are as below:

list length(n): 10, max depth is 4
list length(n): 100, max depth is 14
list length(n): 1000, max depth is 28
list length(n): 10000, max depth is 40
list length(n): 100000, max depth is 48

We can see that the results are relatively stable compared to the original code.

I later tested all three methods with input that is shuffled and in descending order. This time the results also include a clock value, the runtime of each sort. In this test, the max_level of each quick_sort is set to the length of the list.

running original
list length(n): 10, max depth is 4, clock: 46
list length(n): 100, max depth is 14, clock: 71
list length(n): 1000, max depth is 26, clock: 861
list length(n): 10000, max depth is 38, clock: 4995
list length(n): 100000, max depth is 48, clock: 41933
running original descending
list length(n): 10, max depth is 10, clock: 9
list length(n): 100, max depth is 100, clock: 37
list length(n): 1000, max depth is 1000, clock: 3259
list length(n): 10000, max depth is 10000, clock: 316207
list length(n): 100000, max depth is 100000, clock: 32950489
running median of three
count: 10, max depth is 4, clock: 12
count: 100, max depth is 14, clock: 17
count: 1000, max depth is 22, clock: 277
count: 10000, max depth is 30, clock: 3086
count: 100000, max depth is 38, clock: 47194
running median of three descending
list length(n): 10, max depth is 6, clock: 10
list length(n): 100, max depth is 12, clock: 13
list length(n): 1000, max depth is 18, clock: 197
list length(n): 10000, max depth is 24, clock: 1728
list length(n): 100000, max depth is 32, clock: 20529
running random pivot
list length(n): 10, max depth is 8, clock: 11
list length(n): 100, max depth is 14, clock: 22
list length(n): 1000, max depth is 26, clock: 266
list length(n): 10000, max depth is 38, clock: 3985
list length(n): 100000, max depth is 48, clock: 58486
running random pivot descending
list length(n): 10, max depth is 4, clock: 11
list length(n): 100, max depth is 14, clock: 17
list length(n): 1000, max depth is 28, clock: 212
list length(n): 10000, max depth is 40, clock: 2403
list length(n): 100000, max depth is 48, clock: 30642

Visualize via gnuplot and do some mathematical analysis.

Problem 2

Code analysis and explanation

Timsort is a sorting algorithm designed to perform well on real-life data. Since real-life data is often already partially sorted, timsort exploits this characteristic to speed up the merging process and reduce the comparison count. Timsort builds on an important concept called a run.

A run is a chunk of the original data. The optimal number of runs should be equal to or slightly smaller than a power of 2.

minrun is the minimum length of every run and is used to control the length of each run. If a run is shorter than minrun, the elements right after the run are added to it with binary insertion. Choosing the optimal minrun is crucial in timsort.
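The binary-insertion step for extending a short run can be sketched on a plain array (a hypothetical helper, not the actual timsort code):

```c
#include <stddef.h>
#include <string.h>

/* Extend the sorted run a[0..len-1] by one element a[len]: find its
 * position by binary search, then shift the tail and insert. */
static void binary_insert_next(long *a, size_t len)
{
    long key = a[len];
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* Shift a[lo..len-1] right by one, then place key at a[lo]. */
    memmove(&a[lo + 1], &a[lo], (len - lo) * sizeof(long));
    a[lo] = key;
}
```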

Timsort uses a stack to cache runs, which avoids going through the whole data set at the start and thus improves memory-cache behavior.

merge_collapse is used to keep the run lengths balanced. This function checks the top three runs in the stack against the following requirements:

  1. The length of A is greater than the total length of B and C.
  2. The length of B is greater than the length of C.

merge_collapse also ensures that only runs adjacent to each other are merged, which keeps the sort stable.
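The two requirements, plus the rule that a violating B merges with the smaller of its neighbors, can be sketched with run lengths only (a simplification of the real stack handling):

```c
#include <stddef.h>

/* Which neighbor should run B merge with, given the top three run
 * lengths (A deepest in the stack, C on top)? Timsort keeps the
 * invariants len(A) > len(B) + len(C) and len(B) > len(C); when one
 * is violated, B merges with the smaller of A and C. */
enum merge_choice { MERGE_NONE, MERGE_WITH_A, MERGE_WITH_C };

static enum merge_choice collapse_choice(size_t a, size_t b, size_t c)
{
    if (a > b + c && b > c)
        return MERGE_NONE;               /* invariants hold */
    return (a < c) ? MERGE_WITH_A : MERGE_WITH_C;
}
```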

Galloping mode:
There are two sorted arrays of numbers, A and B. Put the shorter one into a cache temp, so less cache is used. In timsort, we find where the first element of B (B0) fits inside temp; the chunk of temp that is smaller than B0 is stored back in A. Then do the same from the other side: take the first element of the remaining temp and find where it fits in B. Repeat until the two arrays are merged.
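The "find where B0 fits" step is typically an exponential (galloping) search followed by a binary search. A sketch on a plain array (a hypothetical helper, using arrays instead of the lists drawn below):

```c
#include <stddef.h>

/* Return how many elements of the sorted array a[0..n-1] are strictly
 * less than key: gallop with exponentially growing steps, then binary
 * search inside the last interval. */
static size_t gallop_left(const long *a, size_t n, long key)
{
    size_t lo = 0, hi = 1;
    if (n == 0 || a[0] >= key)
        return 0;
    /* Gallop: grow hi exponentially until a[hi] >= key or out of range. */
    while (hi < n && a[hi] < key) {
        lo = hi;
        hi = hi * 2 + 1;
    }
    if (hi > n)
        hi = n;
    /* Binary search in (lo, hi]: a[lo] < key, a[hi] >= key (or hi == n). */
    while (lo + 1 < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}
```

With A = {1, 2, 3, 7, 8, 9} and B0 = 4 as in the example below, this locates the three-element chunk {1, 2, 3} that goes back first.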

Example:
Given sorted array A and B.







A:    1 → 2 → 3 → 7 → 8 → 9
B:    4 → 5 → 6 → 8 → 9 → 10 → 11
temp: (empty)

Pick the shorter array and put it into temp
(the length of A is shorter than the length of B).







A:    (empty)
B:    4 → 5 → 6 → 8 → 9 → 10 → 11
temp: 1 → 2 → 3 → 7 → 8 → 9

Find the first element of B and see where it can fit in temp.







temp: 1 → 2 → 3 → [4] → 7 → 8 → 9   (B0 = 4 fits after 3)

Put the elements before B0 back to A.







A:    1 → 2 → 3
B:    4 → 5 → 6 → 8 → 9 → 10 → 11
temp: 7 → 8 → 9

Find where the first element of temp fits in B.







B:    4 → 5 → 6 → [7] → 8 → 9 → 10 → 11   (temp0 = 7 fits after 6)

Move the elements before temp0 to A.







A:    1 → 2 → 3 → 4 → 5 → 6
B:    8 → 9 → 10 → 11
temp: 7 → 8 → 9
Rinse and repeat the process until B and temp are empty, then return A.

Improvements

How can we decrease the number of comparisons?
A way HotMercury mentioned is to improve how run is calculated. minrun is usually chosen from the range 32 to 64, such that the size of the data divided by minrun is equal to or slightly smaller than a power of two. A better way to calculate minrun is to take the 6 most significant bits of the length as minrun, then add one to minrun if any of the remaining bits is 1.
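This is the same rule as CPython's merge_compute_minrun; a sketch of that computation, assuming the 32–64 target range (so a threshold of 64):

```c
#include <stddef.h>

/* Keep the six most significant bits of n, adding 1 if any of the
 * shifted-out bits was set. For n >= 64 this yields 32..64; for
 * smaller n it returns n itself (the whole list is one run). */
static size_t compute_minrun(size_t n)
{
    size_t r = 0;          /* becomes 1 if any dropped bit was set */
    while (n >= 64) {
        r |= n & 1;
        n >>= 1;
    }
    return n + r;
}
```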

on my way to change the code 🏃

Week 2

Problem 1

Problem 1 dives into LeetCode 105, Construct Binary Tree from Preorder and Inorder Traversal, combined with the hash table implementation used in the Linux kernel.
We can first look at how a hash table works in the Linux kernel.

struct hlist_node {
    struct hlist_node *next, **pprev;
};

In the structure hlist_node, pprev is declared as a pointer to a pointer. If pprev were a plain pointer to the previous node, removing the first node would need extra operations to update the list head.
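A sketch of the kernel-style hlist operations shows why: hlist_del rewrites whatever slot *pprev points at, so deleting the first node needs no special case (structures simplified from the kernel headers):

```c
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

/* Insert n at the front; pprev always points at the slot that points
 * to us, whether that slot is head->first or a neighbor's next. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;
    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n by rewriting the slot that pointed at it, found via pprev;
 * no branch on whether n was the first node. */
static void hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;
    *pprev = next;
    if (next)
        next->pprev = pprev;
}
```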
hash.h hashes with the multiplication method, but with a tweak: the multiplication is followed by a bit shift instead of a division, because shifting is more efficient.
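A sketch of that multiply-then-shift hash, using the kernel's GOLDEN_RATIO_32 constant (the quiz code may differ in details):

```c
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u

/* Multiplication-method hash: multiply by a golden-ratio-derived
 * constant, then keep the top `bits` bits by shifting, so no modulo
 * is needed to map into 2^bits buckets. */
static inline uint32_t hash_32(uint32_t val, unsigned int bits)
{
    return (val * GOLDEN_RATIO_32) >> (32 - bits);
}
```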
On to the functions:

  • hlist_add_head: adds a node to the start of a list.
  • find: computes the hash of the target value, searches that bucket for a node with the same value, and returns the node's index.
  • dfs: recursively builds the tree from the inorder and preorder sequences.
  • buildTree: initializes the hash list, populates it with the values of the inorder sequence, then calls dfs to build the binary tree.

Problem 2

Problem 2 is an implementation of a Least Recently Used (LRU) cache. When an LRU cache is full, it evicts the least recently used node and replaces it with the new incoming node.

In lRUCachePut we can see that it first checks whether the key is already in the cache. If not, it checks whether the cache is full. When the cache is full, it reuses the last entry of the list: the entry is moved to the head of the list and rehashed under the new key.

if (!cache) {
    if (obj->count == obj->capacity) {
        cache = list_last_entry(&obj->dhead, LRUNode, link);
        list_move(&cache->link, &obj->dhead);
        hlist_del(&cache->node);
        hlist_add_head(&cache->node, &obj->hhead[hash]);
    } else {
        cache = malloc(sizeof(LRUNode));
        hlist_add_head(&cache->node, &obj->hhead[hash]);
        list_add(&cache->link, &obj->dhead);
        obj->count++;
    }
    cache->key = key;
}

Problem 3

Problem 3 is an implementation of find_nth_bit, which returns the position of the N'th set bit in a memory region. It is typically used to efficiently find the N'th set bit in a bitmap.

static inline unsigned long find_nth_bit(const unsigned long *addr,
                                           unsigned long size,
                                           unsigned long n)
{
    if (n >= size)
        return size;

    if (small_const_nbits(size)) {
        unsigned long val = *addr & GENMASK(size - 1, 0);

        return val ? fns(val, n) : size;
    }

    return FIND_NTH_BIT(addr[idx], size, n);
}

From the code above, we can see that it first checks whether size is a compile-time constant no larger than the number of bits in a long. If so, it masks the valid bits of *addr into val; if val is nonzero, it returns fns(val, n), otherwise it returns size. If size is not such a constant, it calls FIND_NTH_BIT. Both fns and FIND_NTH_BIT find the n'th set bit and return its position.
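fns can be sketched as a loop that clears set bits until the n'th one is reached (a sketch using GCC builtins; the kernel version uses __ffs in a similar loop):

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return the position of the n'th (0-based) set bit in word, or
 * BITS_PER_LONG if word has fewer than n + 1 set bits. */
static unsigned long fns(unsigned long word, unsigned int n)
{
    while (word) {
        unsigned int bit = __builtin_ctzl(word); /* lowest set bit */
        if (n-- == 0)
            return bit;
        word &= word - 1;                        /* clear lowest set bit */
    }
    return BITS_PER_LONG;
}
```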