contributed by < cheezad
>
node_t
contains left
and right
node forming into a doubly-linked list. next
is used to point to the next doubly-linked list. long value
is used to store the value of the current node.list_add
list_tail
list_length
quick_sort
begin[]
, end[];
i
records the current position. When i < 0
, it means that the stack is empty and there is nothing left to sort.My idea of improving the code above is
Ways of finding a better pivot:
The original code chooses the first element of the list as pivot. It also sets the max_level
of begin
and end
as 2*n
where n
is the length of the list. I did a running to see what the actual max_level
is. The worst case I use is when the input of the list is in descending order, this way every time the pivot would be the largest number.
The results are as below:
Even in the worst case, the depth would only reach the size of the list length instead of 2*n
as the max_level
is set.
Here is the result where the input is shuffled:
This can be set as a standard for the following methods.
In the case of median of three, it picks out the first, middle, and last element of the list and finds the median of the three elements. Doing so decreases the chance of picking the maxium or minimum value. Just like the original code, I ran a test on it with an input of decending values. The results are as below:
We can see that the max_depth
is significantly smaller than those in the original code.
However, this method might come with a cost, it might use more time(CPU cycles).
Randomly pick pivot picks a pivot with the value r generated from a random generator and modular it to the length of the segment, r is then used to pick the rth element as the pivot. In the case where the input is shuffled the results are as below:
In the case of the input is in desending order, the result are as below:
We can see that the results are relatively stable compared the original code.
I later did a testing on all three methods where the input is shuffled and in descending order. This time the results also include a clock which is the runtime of each sorting. In this testing the max_level
of each quick_sort
is all set to the length of the list.
Visualize via gnuplot and do some mathematical analysis.
Timsort is a sorting algorithm used to perform better sorting in real life data. Since data in real life is already partially sorted, timsort exploits this characteristic to speed up merging process and reduce comparing count. Timsort uses an important concept called run
.
run
is chunks of the original data. The optimal run number should be equal or slightly smaller than any power of 2.
minrun
is the minimum length of every run. This is set to control each length of the run. If the length is too short, binary insert the first element behind the run to the run. Choosing the optimal minrun
is very crucial in timsort.
Timsort uses stack to cache run, this avoids the program to go through the whole data at the start, and thus lowering memory caching.
merge_collapse
is used to ensure the length of the run is balanced. This function checks the top three runs in the stack with the following requirements:
merge_collapse
also ensures that only runs that are adjacent to each other are merged, this ensures the sorting is stable.Galloping mode:
There are two sorted array of number A and B. Put the shorter array in a cache temp, this means less cache is used. In timsort, we find where the first element of B (B0) can fit inside A. We can then know that there is a chunk of A that is smaller than B0, store then back in A. Do the same with the rest of A. Find the first element of the remaining A and check where it fits in B. Repeat until the two array are sorted.
Example:
Given sorted array A and B.
Pick the shorter array and put into temp.
(length of A is shorter than length of B)
Find the first element of B and see where it can fit in temp.
Put the elements before B0 back to A.
Find where the first element of temp fits in B.
Move the elements before temp0 to A.
Rinse and repeat the process until B and temp is empty, then return A.
How can we decrease the number of comparisons?
A way HotMercury mentioned is to improve how run is calculated.minrun
range is usually chosen from the range 32 to 64, and the size of the data divided by minrun is equal or slightly smaller than a power of two. The way we can calculate minrun
better is by extracting the 6 bits of msb as minrun
. If there is any bit left that is 1
, add one to minrun
for each bit.
on my way to change the code 🏃
Problem 1 dives into LeetCode 105 Construct Binary Tree from Preorder and Inorder Traversal combined with using the hash table implementation used in Linux Kernel.
We can first understand how hash table works in Linux Kernel.
In the structure of hlist_node
, we can see that pprev
is declared as a pointer to a pointer. This is because when removing a node, if pprev
is declared as a pointer, it will need extra operations to update list_head
.
hash.h uses the Multiplication Method to hash but with a tweak, it uses bit-shifting instead of multiplication because bit-shifting is more efficient.
On to the functions.
hlist_add_head
: adds a node to the start of a list.
find
: calculates the hash of the node it wants to search for the node that has the same value as the target, returns the index of the node.
dfs
: builds the tree from inorder and preorder nodes.
buildTree
: initializes the hash list, populates with values of inorder sequence, then calls dfs to build the binary tree out.
Problem 2 is an implementation of Least Recently Used Cache. When a LRU cache is full, it picks out the least recently used node and replaces it with the new incoming node.
In lRUCachePut
we can see that it checks if the node is in cache. If not, it then checks if the cache is full. When the cache is full, it removes the last entry of the list and adds the new node to the head.
Problem 3 is a implementation of find_nth_bit
, it returns the N'th set bit in the memory. It is usually used to find the N'th set bit in a bitmap efficiently.
From the code above, we can see that it first checks if nbits is a constant that fits within the range of a long integer, then checks if the extracted bits from addr
is zero. If it is greater than zero, it returns the function fns
, else returns size
. If size is not constant or greater than long int, it calls the FIND_NTH_BIT
. Both fns
and FIND_NTH_BIT
finds the n'th set bit and returns it.