# Algorithm note

## Stirling's approximation

$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$

### Natural log form

$$
\begin{align*}
\ln (n!) &\sim \ln\left(\sqrt{2\pi n}\left(\frac{n}{e}\right)^n\right) \\
&= \ln\sqrt{2\pi n} + \ln\left(\frac{n}{e}\right)^n \\
&= \frac{1}{2} \ln(2\pi n) + n\ln n - n
\end{align*}
$$

Change to base $2$:

$$
\begin{align*}
\log_2(n!) &= \log_2(e^{\ln(n!)}) \\
&= \ln(n!) \log_2(e) \\
&= \frac{\ln(n!)}{\ln 2} \\
&\sim n \log_2 n - \frac{1}{\ln 2} n + \frac{1}{2} \log_2 (2\pi n)
\end{align*}
$$

## Time complexity

A summary of the time complexity of different sorting algorithms:

| Sorting Algorithm | Worst case | Best case | Average case |
| ----------------- | ---------- | --------- | ------------ |
| Insertion sort | $\Theta(n^2)$ | $\Theta(n)$ | $\Theta(n^2)$ |
| Merge sort | $\Theta(n\lg n)$ | $\Theta(n \lg n)$ | $\Theta(n\lg n)$ |
| Quicksort | $\Theta(n^2)$ | $\Theta(n \lg n)$ | $O(n\lg n)$ |

### Time complexity of insertion sort

$$
\begin{align*}
T(n) &= c_1 n + c_2 (n - 1) + c_4 (n - 1) + c_5 \sum_{i=2}^{n} t_i + c_6 \sum_{i=2}^{n} (t_i - 1) \\
&\phantom{={}} + c_7 \sum_{i=2}^{n} (t_i - 1) + c_8 (n - 1)
\end{align*}
$$

#### Best case

The best case occurs when $t_i = 1$, i.e., the array is already sorted. The running time is linear in $n$:

$$
\begin{align*}
T(n) &= c_1 n + c_2 (n - 1) + c_4 (n - 1) + c_5 (n - 1) + c_8 (n - 1) \\
&= (c_1 + c_2 + c_4 + c_5 + c_8) n - (c_2 + c_4 + c_5 + c_8)
\end{align*}
$$

#### Worst case

The worst case arises when the array is in descending order and $t_i = i$. The running time is a quadratic function of $n$. Using

$$
\sum_{i=2}^n i = \frac{n (n + 1)}{2} - 1,
$$

$$
\begin{align*}
T(n) &= c_1 n + c_2 (n - 1) + c_4 (n - 1) + c_5 \sum_{i=2}^{n} i + c_6 \sum_{i=2}^{n} (i - 1) \\
&\phantom{={}} + c_7 \sum_{i=2}^{n} (i - 1) + c_8 (n - 1) \\
&= c_1 n + c_2 (n - 1) + c_4 (n - 1) + c_5 \left(\frac{n (n + 1)}{2} - 1\right) \\
&\phantom{={}} + (c_6 + c_7) \left(\frac{n (n + 1)}{2} - 1 - (n - 1)\right) + c_8 (n - 1) \\
&= \left(\frac{c_5 + c_6 + c_7}{2}\right) n^2 + \ldots
\end{align*}
$$

#### Average case

The average case arises when $t_i \approx i/2$. The running time is still a quadratic function of $n$.

### Time complexity of merge sort

Recurrence of merge sort:

$$
T(n) =
\begin{cases}
\Theta(1) & \text{if } n < n_0, \\
D(n) + 2 T(n/2) + C(n) & \text{otherwise.}
\end{cases}
$$

#### Worst case

$$
\begin{align*}
T(n) &= D(n) + 2 T(n/2) + C(n) \\
&= 2 T(n/2) + \Theta (n)
\end{align*}
$$

![image](https://hackmd.io/_uploads/Hyf5Nrb5ll.png)

The time complexity is $\Theta(n\lg n)$.

#### Best case and average case

The best and average cases still have a time complexity of $\Theta(n\lg n)$. The slight difference is whether the merge step needs to scan through all elements of both the left and right subarrays.

### Time complexity of quicksort

#### Worst case

Maximally unbalanced partitioning produces one subproblem with $n-1$ elements and one with $0$ elements in every recursive call. The recurrence for the running time is

$$
\begin{align*}
T(n) &= T(n-1) + T(0) + \Theta(n) \\
&= T(n-1) + \Theta(n),
\end{align*}
$$

where $\Theta(n)$ is the cost of partitioning. $T(n)$ unrolls into a general arithmetic series:

$$
T(n) = \sum_{k=1}^n (a + bk) = \Theta(n^2)
$$

#### Best case

The best case happens when partitioning produces two subproblems, each of size no more than $n/2$. The recurrence is

$$
\begin{align*}
T(n) &= 2 T(n/2) + \Theta(n) \\
&= \Theta(n\lg n).
\end{align*}
$$

Besides the master theorem, the recursion-tree figure used for the worst case of merge sort also solves this recurrence.

#### Average case

Suppose that partitioning always produces a 9-to-1 split. The recurrence is

$$
T(n) = T(9n/10) + T(n/10) + \Theta(n).
$$

The depth of the recursion tree is between $\log_{10} n$ and $\log_{10/9} n$, both of which equal $\Theta (\lg n)$ by the change-of-base formula

$$
\log_a n = \frac{\log_b n}{\log_b a}.
$$

Therefore, the running time is $O(n\lg n)$ whenever the split has constant proportionality.
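The $t_i$ analysis above can be checked empirically. Below is a minimal Python sketch (the function and counter names are my own) that counts key comparisons in insertion sort; a sorted input of size $n$ takes $n-1$ comparisons and a reversed one takes $n(n-1)/2$, matching the best- and worst-case derivations.

```python
def insertion_sort(a):
    """Sort a in place; return the number of key comparisons performed."""
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one test of a[j] vs. key per inner iteration
            if a[j] <= key:
                break
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = key
    return comparisons

n = 64
best = insertion_sort(list(range(n)))           # already sorted: t_i = 1
worst = insertion_sort(list(range(n, 0, -1)))   # descending order: t_i = i
print(best, worst)                              # n - 1 and n(n-1)/2
```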
## Bottom-up mergesort

Refer to the paper *Bottom-up mergesort - A detailed analysis*.

### Best case

$$
\begin{align*}
C_{b,bu}(N) &= \sum_{0 < j \le k} 2^{j-1} \left\lfloor \frac{N}{2^j} \right\rfloor + \sum_{0 < j \le k} b_j (b_{j-1} \ldots b_1 b_0)_2 \\
&= \sum_{n < N} \nu(n)
\end{align*}
$$

For example, if $N = 15 = (1111)_2$, the bottom-up mergesort proceeds as follows:

```graphviz
digraph BottomUpMergeSort15 {
    rankdir=TB;
    graph [label="Bottom-up Merge Sort (N = 15, starting from single elements)", labelloc=top, fontsize=20];
    node [shape=box, style="rounded,filled", fillcolor="#f5f5f5", fontname="Inter"];

    // Length = 1 (original single elements)
    subgraph cluster_length1 {
        label="Length = 1 (original single elements)";
        color="#dddddd";
        n1 [label="1"]; n2 [label="2"]; n3 [label="3"]; n4 [label="4"]; n5 [label="5"];
        n6 [label="6"]; n7 [label="7"]; n8 [label="8"]; n9 [label="9"]; n10 [label="10"];
        n11 [label="11"]; n12 [label="12"]; n13 [label="13"]; n14 [label="14"]; n15 [label="15"];
        {rank=same; n1 -> n2 -> n3 -> n4 -> n5 -> n6 -> n7 -> n8 -> n9 -> n10 -> n11 -> n12 -> n13 -> n14 -> n15 [style=invis]}
    }

    // Length = 2 (merge 1:1 runs)
    subgraph cluster_length2 {
        label="Length = 2 (after first 1:1 merges)";
        color="#dddddd";
        a0 [label="1–2"]; a1 [label="3–4"]; a2 [label="5–6"]; a3 [label="7–8"];
        a4 [label="9–10"]; a5 [label="11–12"]; a6 [label="13–14"]; a7 [label="15 (leftover)"];
        {rank=same; a0 -> a1 -> a2 -> a3 -> a4 -> a5 -> a6 -> a7 [style=invis]}
    }

    // Length = 4
    subgraph cluster_length4 {
        label="Length = 4";
        color="#dddddd";
        b0 [label="1–4"]; b1 [label="5–8"]; b2 [label="9–12"]; b3 [label="13–15"];
        {rank=same; b0 -> b1 -> b2 -> b3 [style=invis]}
    }

    // Length = 8
    subgraph cluster_length8 {
        label="Length = 8";
        color="#dddddd";
        c0 [label="1–8"]; c1 [label="9–15"];
        {rank=same; c0 -> c1 [style=invis]}
    }

    // Length = 16 (final)
    subgraph cluster_length16 {
        label="Final merge (sorted sequence)";
        color="#dddddd";
        d0 [label="1–15"];
        {rank=same; d0}
    }

    // Edges: length=1 → length=2
    n1
    -> a0; n2 -> a0; n3 -> a1; n4 -> a1; n5 -> a2; n6 -> a2; n7 -> a3; n8 -> a3;
    n9 -> a4; n10 -> a4; n11 -> a5; n12 -> a5; n13 -> a6; n14 -> a6; n15 -> a7;

    // Edges: length=2 → length=4
    a0 -> b0; a1 -> b0; a2 -> b1; a3 -> b1; a4 -> b2; a5 -> b2; a6 -> b3; a7 -> b3;

    // Edges: length=4 → length=8
    b0 -> c0; b1 -> c0; b2 -> c1; b3 -> c1;

    // Edges: length=8 → final
    c0 -> d0; c1 -> d0;
}
```

With $k = 3$:

$$
\begin{align*}
C_{b,bu}(15) &= \sum_{0 < j \le 3} 2^{j-1} \left\lfloor \frac{15}{2^j} \right\rfloor + \sum_{0 < j \le 3} b_j (b_{j-1} \ldots b_1 b_0)_2 \\
&= 2^0 \times 7 \\
&\phantom{={}} + 2^1 \times 3 + 1 \times (1)_2 \\
&\phantom{={}} + 2^2 \times 1 + 1 \times (11)_2 \\
&\phantom{={}} + 1 \times (111)_2 \\
&= 28
\end{align*}
$$

For $N = 13 = (1101)_2$:

```graphviz
digraph BottomUpMergeSort13 {
    rankdir=TB;
    graph [label="Bottom-up Merge Sort (N = 13, starting from single elements)", labelloc=top, fontsize=20];
    node [shape=box, style="rounded,filled", fillcolor="#f5f5f5", fontname="Inter"];

    // Length = 1 (original single elements)
    subgraph cluster_length1 {
        label="Length = 1 (original single elements)";
        color="#dddddd";
        n1 [label="1"]; n2 [label="2"]; n3 [label="3"]; n4 [label="4"]; n5 [label="5"];
        n6 [label="6"]; n7 [label="7"]; n8 [label="8"]; n9 [label="9"]; n10 [label="10"];
        n11 [label="11"]; n12 [label="12"]; n13 [label="13"];
        {rank=same; n1 -> n2 -> n3 -> n4 -> n5 -> n6 -> n7 -> n8 -> n9 -> n10 -> n11 -> n12 -> n13 [style=invis]}
    }

    // Length = 2
    subgraph cluster_length2 {
        label="Length = 2 (after first 1:1 merges)";
        color="#dddddd";
        a0 [label="1–2"]; a1 [label="3–4"]; a2 [label="5–6"]; a3 [label="7–8"];
        a4 [label="9–10"]; a5 [label="11–12"]; a6 [label="13 (leftover)"];
        {rank=same; a0 -> a1 -> a2 -> a3 -> a4 -> a5 -> a6 [style=invis]}
    }

    // Length = 4
    subgraph cluster_length4 {
        label="Length = 4";
        color="#dddddd";
        b0 [label="1–4"]; b1 [label="5–8"]; b2 [label="9–12"]; b3 [label="13 (leftover)"];
        {rank=same; b0 -> b1 -> b2 -> b3 [style=invis]}
    }

    // Length = 8
    subgraph cluster_length8 {
        label="Length = 8";
        color="#dddddd";
        c0 [label="1–8"]; c1 [label="9–13"];
        {rank=same; c0 -> c1 [style=invis]}
    }

    // Final merge
    subgraph cluster_final {
        label="Final merge (sorted sequence)";
        color="#dddddd";
        d0 [label="1–13 (sorted)"];
        {rank=same; d0}
    }

    // Edges: length=1 → length=2
    n1 -> a0; n2 -> a0; n3 -> a1; n4 -> a1; n5 -> a2; n6 -> a2; n7 -> a3; n8 -> a3;
    n9 -> a4; n10 -> a4; n11 -> a5; n12 -> a5; n13 -> a6;

    // Edges: length=2 → length=4
    a0 -> b0; a1 -> b0; a2 -> b1; a3 -> b1; a4 -> b2; a5 -> b2; a6 -> b3;

    // Edges: length=4 → length=8
    b0 -> c0; b1 -> c0; b2 -> c1; b3 -> c1;

    // Edges: length=8 → final
    c0 -> d0; c1 -> d0;
}
```

## [Commit b5c56e0 in Linux kernel](https://github.com/torvalds/linux/commit/b5c56e0cdd62979dd538e5363b06be5bdf735a09)

### What is CONFIG_RETPOLINE?

`CONFIG_RETPOLINE` is a Linux kernel build option that enables **retpoline (return trampoline)** mitigations against *Spectre variant 2* vulnerabilities. It works by replacing **indirect branches** (instructions whose targets are computed at runtime) with a special code sequence that uses `ret` instructions to safely **trap speculative execution**. This prevents the CPU from speculatively jumping to unintended or unsafe locations, so speculative paths cannot access or leak sensitive data while normal execution continues correctly.

### Why can the commit save 0.2\*n comparisons?

#### How to derive the theoretical limit?

An array of size $n$ has $n!$ possible permutations. Each comparison in a comparison-based sorting algorithm represents a binary decision, and we can visualize all possible permutations as the leaves of a binary decision tree. Therefore, the minimal number of comparisons required to distinguish all permutations is $\log_2 (n!)$. According to [Stirling's approximation](https://hackmd.io/0P1WCIddQWa4v-6EgNTQfA#Sterling-approximation),

$$
\begin{align*}
\log_2 (n!)
&\sim n \log_2 n - \frac{1}{\ln 2} n + \frac{1}{2} \log_2 (2\pi n) \\
&\approx n \log_2 n - 1.4427 n + 0.5 \log_2 (n) + 1.3257
\end{align*}
$$

#### What is the number of comparisons of merge sort (top-down)?

According to the paper *Bottom-up mergesort — A detailed analysis*, the number of comparisons in the average case is

$$
C_{a,tp} = n \lg n + n B(\lg n) + O(n^\epsilon), \quad \epsilon > 0.
$$

For the "best case" of the average case (array size a power of $2$), $B(x)$ attains its minimum of $-1.2645$.

#### Why does the simple eager merge pattern cause bad performance?

* Why does the simple eager merge pattern cause bad performance when $n$ is just over a power of 2?
    * Because the final merge involves runs of very different sizes, making the process inefficient.
* Why does bottom-up mergesort achieve $K < 0.5$ for such sizes?
    * Refer to Fig. 3 of *Bottom-up mergesort — A detailed analysis*: the worst performance has $K < 0.5$.

#### Why do merge passes that are at worst 1:2 unbalanced achieve $K = 1.207$?

Proved by experiment.

### How to experiment?

* Refer to https://github.com/python/cpython/blob/main/Objects/listsort.txt
* Evaluate with comparison counts.
* Refer to [sortperf.py](https://github.com/python/cpython/blob/main/Tools/scripts/sortperf.py)

#### Issues

1. How to version control the external Linux kernel repository?
    * Use `git submodule` and `git sparse-checkout`.
2. To solve inclusion errors:
    1. I use `linux/tools/include` to build for a user-space application.
    2. I use a symlink of `linux/include/linux/list_sort.h` to
    3. I try to make.
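As a starting point for the comparison-count experiment, here is a minimal Python sketch in the spirit of `sortperf.py`. The `Counted` wrapper class is my own, and CPython's Timsort stands in for the kernel's `list_sort` (measuring the kernel code itself would need a user-space C harness around `lib/list_sort.c`). The idea: make `__lt__` increment a counter, then compare the measured count against the information-theoretic limit $\log_2(n!)$ derived above.

```python
import math
import random

class Counted:
    """Wrap a value and count every `<` comparison made on it."""
    count = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.count += 1
        return self.v < other.v

random.seed(0)                      # reproducible input
n = 1024
data = [Counted(x) for x in random.sample(range(n), n)]

Counted.count = 0
out = sorted(data)                  # CPython's sorted() compares only via __lt__
measured = Counted.count

# Information-theoretic lower bound: log2(n!) ~ n lg n - 1.4427 n + ...
bound = math.log2(math.factorial(n))
print(f"measured = {measured}, log2(n!) = {bound:.1f}")
```

Running this over many sizes and input patterns (sorted, reversed, random, "just over a power of 2") and plotting `measured - n*lg(n)` against $n$ would expose the linear-term constant $K$ that the commit message and the bottom-up mergesort paper discuss.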