Linux 核心專題: ksort

# Linux 核心專題: ksort contributed by < [tintinjian12999](https://github.com/tintinjian12999) > > [ksort](https://github.com/sysprog21/ksort) ### Quick sort 在此介紹 quick sort 的基本作法 quick sort 主要分為兩個部份，選取 pivot 與 partition，選定一個 pivot 作為比較的基準點，而後進行 partition 將數列分為小於 pivot 等於 pivot 和大於 pivot 三個部份，並且針對分割後的子數列重複進行直到數列裡元素個數為 1 時則排序完成。具體程式碼大致如下 ```c void iqsort0(int *a, int n) { int i, j; if (n <= 1) return; for (i = 1, j = 0; i < n; i++) if (a[i] < a[0]) swap(++j, i, a); swap(0, j, a); iqsort0(a, j); iqsort0(a+j+1, n-j-1); } ``` 用以下例子作示範，假設選取第一個元素為 pivot。 ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct1 j-> struct0 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct1 j-> struct1 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct2 j-> struct1 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct2 j-> struct2 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct3 j-> struct2 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct4 j-> struct2 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``````graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct4 j-> struct2 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "2"]; struct4 [label= "5"]; struct5 [label= "7"]; i->struct4 j-> struct3 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "2"]; struct4 [label= "5"]; struct5 [label= "7"]; i->struct5 j-> struct3 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` 最後將 pviot 與 j 交換 ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "2"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "4"]; struct4 [label= "5"]; struct5 [label= "7"]; i->struct5 j-> struct3 { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` 切割成兩個子數列執行 quick sort 直到子數列元素數量為 1 ，此時數列便排序完成。 quick sort 的最差情況發生在每次 pivot 的選擇皆為數列最大或最小值時，此時劃分出的兩個子數列一個為空數列，另一個為剩下的所有元素，且每次分割都需走訪數列一次，因此此時的時間複雜度為 $O(n ^ 2)$，而最佳情況為每次都選取到數列的中位數，此時劃分出的兩個子數列長度相等，共 $lgn$次劃分，乘上每次分割走訪數列一次最佳情況的時間複雜度為 O(nlgn)，因此為了避免最差情況的發生會使用數列裡任幾個數的中位數或隨機選取一個數當作 pivot 。 ### ksort 效能測試 Kernal Mode 與 User Mode 的時間計算分別可以透過 [ktime](https://hackmd.io/@sysprog/linux2024-integration/%2F%40sysprog%2Flinux2024-integration-a#%E6%A0%B8%E5%BF%83%E6%A8%A1%E5%BC%8F%E7%9A%84%E6%99%82%E9%96%93%E6%B8%AC%E9%87%8F) 和 [clock_gettime](https://man7.org/linux/man-pages/man3/clock_gettime.3.html) 完成。 > commit [a2f2575](https://github.com/tintinjian12999/ksort/commit/a2f2575029a7b62e661161c8fef99a34409997d7) 透過 [taskset](https://hackmd.io/@sysprog/linux2024-integration/%2F%40sysprog%2Flinux2024-integration-a#%E6%8E%92%E9%99%A4%E5%B9%B2%E6%93%BE%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E5%9B%A0%E7%B4%A0) 將程式指定在 cpu #0 執行，並將 element 的數量從 1000 以 500 為間隔遞增到 20000，且資料分佈為隨機分佈。得到的結果為 ![runtime](https://hackmd.io/_uploads/rkwrqwJrC.png) 參考 [去除極端值](https://hackmd.io/@sysprog/linux2024-integration/%2F%40sysprog%2Flinux2024-integration-a#%E7%94%A8%E7%B5%B1%E8%A8%88%E6%89%8B%E6%B3%95%E5%8E%BB%E9%99%A4%E6%A5%B5%E7%AB%AF%E5%80%BC) 使用統計手法取 100 次的結果並去除極端值的結果為 ![runtime_com](https://hackmd.io/_uploads/ry6fCo88A.png) 使用 $A nlg2(n) + Bn$ 進行曲線擬合後的結果為 $27.3 nlg2(n) + 2.08 \times 10^5$ 為 $O(nlgn)$ 的演算法 ![Figure_1](https://hackmd.io/_uploads/BJE0CiLLR.png) ### 閱讀 <[Engineering a Sort Function](https://cs.fit.edu/~pkc/classes/writing/papers/bentley93engineering.pdf)> [ksort](https://github.com/sysprog21/ksort) 使用的程式碼基於 [Engineering a Sort Function](https://cs.fit.edu/~pkc/classes/writing/papers/bentley93engineering.pdf) 這篇文章探討的 qsort ，其對 qsort 排序做了以下討論 #### 在 n < 7 時使用 insertion sort 由於在少量部份有序的資料分佈時插入排序法的比較次數會低於快速排序法，以 [1,2,3,4,5,10,9,8,7,6] 這個資料分佈為例，插入排序法需要使用前半段 : $5$次加上後半段 : $\frac{(5 - 1)5}{2} = 10$次共 20 次比較而快速排序法最少也需使用 $10lg10 = 33$次比較，因此在少量且部份有序的資料中使用插入排序法比使用快速排序法合適，在此論文的程式碼中以 n < 7 為界，當資料量小於 7 或快速排序法將資料量切割到 7 以下時會轉為使用插入排序法。 ```c if (n < 7) { for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es) for (pl = pm; pl > (char *) a && CMP(thunk, pl - es, pl) > 0; pl -= es) q_swap(pl, pl - es); return; } ``` 以下測試將資料量由 3 遞增到 19，並以 3 個元素為週期作階梯狀的遞增如 [0,1,2,0,1,2] 分別使用單純的插入排序或快速排序作測試結果如下 ![runtime](https://hackmd.io/_uploads/rkDd83hBR.png) 可以看到在 n < 19 前插入排序法的速度皆快於快速排序法。 #### Pivot 若 pivot 的選擇正好為資料的中位數，此時比較次數為最佳情況 $nlgn$ ，通常會將選擇 pivot 的手法為隨機選取，如此一來的平均比較次數為 $1.386nlgn$ ，為了進一步減少前方的常數，能夠隨機選取三個數並取其中位數，這麼作能夠將平均比較次數降為 $1.188nlgn$ ，但其會增加交換所需的時間，因此對於數量小的數列不適合，而對於數量更大的數列，能夠進一步取樣三次，每次取樣取三個元素計算其中位數得出三組中位數，並以三組中位數的中位數作分割，此作法能夠進一步降低平均比較次數，但相較隨機取三個數的中位數會增加至多 12 次的比較次數。中位數的尋找使用 ```c static inline char *med3(char *a, char *b, char *c, cmp_t *cmp, __attribute__((unused)) void *thunk) { return CMP(thunk, a, b) < 0 ? (CMP(thunk, b, c) < 0 ? b : (CMP(thunk, a, c) < 0 ? c : a)) : (CMP(thunk, b, c) > 0 ? b : (CMP(thunk, a, c) < 0 ? a : c)); } ``` 按照以下的決策樹尋找三個數的中位數 ![2024-06-23 01-40-31 的螢幕擷圖](https://hackmd.io/_uploads/B1Z1pYNLA.png) 若要尋找三個中位數的中位數 ```c pm = (char *) a + (n / 2) * es; if (n > 7) { pl = (char *) a; pn = (char *) a + (n - 1) * es; if (n > 40) { d = (n / 8) * es; pl = med3(pl, pl + d, pl + 2 * d, cmp, thunk); pm = med3(pm - d, pm, pm + d, cmp, thunk); pn = med3(pn - 2 * d, pn - d, pn, cmp, thunk); } pm = med3(pl, pm, pn, cmp, thunk); } ``` 可以看出在 `ksort` 的實作中，若 n <= 7 則直接以數列中間的元素作為 pivot，而當 7 < n <= 40 時則取頭, 中間, 尾三個元素的中位數作為 pivot，當數列的元素多於 40 個時則用分別以頭, 中間, 尾三個元素為選擇點，各取其和距離 d (此處 d 為 n / 8) 的兩個元素共三個元素的中位數，再取三個結果的中位數作為 pivot 透過設定全域變數 `cmp_cnt` 並在調用比較函式時使其加 1 可以得到不同資料數量時對應的比較次數 ```c //kernel space sort_mod.c static size_t cmp_cnt = 0; static int num_compare(const void *a, const void *b) { cmp_cnt++; return (*(int *) a - *(int *) b); } static ssize_t sort_write(struct file *file, const char *buf, size_t size, loff_t *offset) { return cmp_cnt; } //userspace com_cnt = write(fd, msg, strlen(msg)); ``` 以下針對不同資料分別為 rand, sawtooth, stagger, shuffle, plateau 的情況對三種不同的 pivot 選擇策略進行比較次數的量測，對於資料的切換可以透過終端機命令 ``` sudo ./user qsort [rand/sawtooth/stagger/shuffle] ``` 做切換 > commit [978d1d9](https://github.com/tintinjian12999/ksort/commit/978d1d91c7d0e75b105065dab7219e3f4d74081d) ##### rand ```c inbuf[i] = rand() % n; ``` 數據分佈如下 ![data_dis](https://hackmd.io/_uploads/Sy8ej6MU0.png) 比較次數 ![comparison](https://hackmd.io/_uploads/BJjJ5t480.png) ##### sawtooth ```c inbuf[i] = i % 5; ``` 數據分佈如下 ![data_dis](https://hackmd.io/_uploads/HJd5cTG8C.png) 比較次數 ![comparison](https://hackmd.io/_uploads/B1B45FEIR.png) ##### stagger ```c inbuf[i] = (i * 100 + i) % n; ``` 數據分佈如下 ![data_dis](https://hackmd.io/_uploads/SJjMipML0.png) 比較次數 ![comparison](https://hackmd.io/_uploads/Hk5IcK4LA.png) ##### shuffle ```c inbuf[i] = (rand() % 2) ? (j += 2) : (k += 2); ``` 數據分佈如下 ![data_dis](https://hackmd.io/_uploads/rkYEjazLA.png) 比較次數 ![comparison](https://hackmd.io/_uploads/S1uXsjL8A.png) ##### plateau ```c inbuf[i] = ((i * 100 + i) % n < 500) ? 0 : 500; ``` 數據分佈如下 ![data_dis](https://hackmd.io/_uploads/ryRSAazIA.png) 比較次數 ![comparison](https://hackmd.io/_uploads/rkNviYN8R.png) 總的來看取三組中位數的中位數表現比較穩定且在大多數情況下比較次數小於另外兩種方法。 #### Partition 為了加速原始 quick sort 分割的速度，其中一種方法是利用兩個指標分別從頭和尾進行掃描，使數列分割成下圖所示的結果 ![2024-06-23 22-51-19 的螢幕擷圖](https://hackmd.io/_uploads/SyG2I2rUA.png) 具體作法為先將 pivot 與數列的第一個元素交換，後分別從第二個元素(以 i 為指標)和最後一個元素 (以 j 為指標)，接著往後移動 i 直到其指向的元素大於 pivot 並往前移動 j 直到其指向的元素小於 pivot 為止，而後交換兩者指向的元素，如此重複直到 j 超越 i 也就是 j < i，最後再將 pivot 與 j 指標的位址交換便能將數列分割成小於等於, 等於, 大於等於三個部份。以下圖為例 ```graphviz digraph structs { node[shape=plaintext]; node[shape=box]; struct0 [label= "4"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` 假設選定 pivot 為 3，首先將 3 與第一個元素 4 做交換 ```graphviz digraph structs { node[shape=plaintext]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "4"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` 選定 i, j 並分別掃描 ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "4"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct1; j->struct5; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "4"]; struct3 [label= "5"]; struct4 [label= "2"]; struct5 [label= "7"]; i->struct2; j->struct4; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` 交換 ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "2"]; struct3 [label= "5"]; struct4 [label= "4"]; struct5 [label= "7"]; i->struct2; j->struct4; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "2"]; struct3 [label= "5"]; struct4 [label= "4"]; struct5 [label= "7"]; i->struct3; j->struct3; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "2"]; struct3 [label= "5"]; struct4 [label= "4"]; struct5 [label= "7"]; i->struct3; j->struct2; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` 交換 pivot 與 j 指向的元素 ```graphviz digraph structs { node[shape=plaintext]; i [label = "i"]; j [label = "j"]; node[shape=box]; struct0 [label= "2"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "5"]; struct4 [label= "4"]; struct5 [label= "7"]; i->struct3; j->struct2; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 } } ``` 如此便完成分割。但這種分割方法在遇到大量相同資料的時候會花費較多的時間，因為其不能很好區分出等於 pivot 與小於大於 pivot 的元素，因此 ksort 實際上使用的是四個指標將數組如下分割成等於, 小於, 大於, 等於 pivot 四個區域，並在最後將等於的區域交換到數列中央 ![2024-06-23 23-56-57 的螢幕擷圖](https://hackmd.io/_uploads/HkYZL6B8R.png) 具體作法如下首先同樣將 pivot 交換到數列開頭，有四個指標，a 與 b 一開始起點為數列的起始點，接著移動 b ，過程中若遇到和 pivot 相同的元素則將其和 a 指向的元素作交換，交換的同時增加 a ，c 與 d 也是作相同的操作，只是交換後是將 d 作遞減，直到 d < c，爾後將與 pivot 相等的部份交換到數列中央。用以下數列作示範，假設 pivot 為 3 ```graphviz digraph structs { node[shape=plaintext]; a [label = "a"]; b [label = "b"]; d [label = "d"]; c [label = "c"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "8"]; struct2 [label= "2"]; struct3 [label= "7"]; struct4 [label= "3"]; struct5 [label= "3"]; struct6 [label= "1"]; struct7 [label= "2"]; a->struct0; b->struct0; d->struct7; c->struct7; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 struct5 -> struct6 struct6 -> struct7 } } ``` ```graphviz digraph structs { node[shape=plaintext]; a [label = "a"]; b [label = "b"]; d [label = "d"]; c [label = "c"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "8"]; struct2 [label= "2"]; struct3 [label= "7"]; struct4 [label= "3"]; struct5 [label= "3"]; struct6 [label= "1"]; struct7 [label= "2"]; a->struct0; b->struct1; d->struct7; c->struct6; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 struct5 -> struct6 struct6 -> struct7 } } ``` ```graphviz digraph structs { node[shape=plaintext]; a [label = "a"]; b [label = "b"]; d [label = "d"]; c [label = "c"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "2"]; struct3 [label= "7"]; struct4 [label= "3"]; struct5 [label= "3"]; struct6 [label= "8"]; struct7 [label= "2"]; a->struct0; b->struct2; d->struct7; c->struct5; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 struct5 -> struct6 struct6 -> struct7 } } ``` ```graphviz digraph structs { node[shape=plaintext]; a [label = "a"]; b [label = "b"]; d [label = "d"]; c [label = "c"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "2"]; struct3 [label= "7"]; struct4 [label= "3"]; struct5 [label= "2"]; struct6 [label= "8"]; struct7 [label= "3"]; a->struct0; b->struct3; d->struct7; c->struct5; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 struct5 -> struct6 struct6 -> struct7 } } ``` ```graphviz digraph structs { node[shape=plaintext]; a [label = "a"]; b [label = "b"]; d [label = "d"]; c [label = "c"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "1"]; struct2 [label= "2"]; struct3 [label= "2"]; struct4 [label= "3"]; struct5 [label= "7"]; struct6 [label= "8"]; struct7 [label= "3"]; a->struct0; b->struct4; d->struct6; c->struct4; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 struct5 -> struct6 struct6 -> struct7 } } ``` ```graphviz digraph structs { node[shape=plaintext]; a [label = "a"]; b [label = "b"]; d [label = "d"]; c [label = "c"]; node[shape=box]; struct0 [label= "3"]; struct1 [label= "3"]; struct2 [label= "2"]; struct3 [label= "2"]; struct4 [label= "1"]; struct5 [label= "7"]; struct6 [label= "8"]; struct7 [label= "3"]; a->struct1; b->struct5; d->struct6; c->struct4; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 struct5 -> struct6 struct6 -> struct7 } } ``` ```graphviz digraph structs { node[shape=plaintext]; a [label = "a"]; b [label = "b"]; d [label = "d"]; c [label = "c"]; node[shape=box]; struct0 [label= "2"]; struct1 [label= "2"]; struct2 [label= "1"]; struct3 [label= "3"]; struct4 [label= "3"]; struct5 [label= "3"]; struct6 [label= "7"]; struct7 [label= "8"]; a->struct1; b->struct5; d->struct6; c->struct4; { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 struct4 -> struct5 struct5 -> struct6 struct6 -> struct7 } } ``` 如此分割完成程式碼實作如下 ```c q_swap(a, pm); pa = pb = (char *) a + es; pc = pd = (char *) a + (n - 1) * es; for (;;) { while (pb <= pc && (r = CMP(thunk, pb, a)) <= 0) { if (r == 0) { if (swap_cnt == 0) swap_cnt = 1; q_swap(pa, pb); pa += es; } pb += es; } while (pb <= pc && (r = CMP(thunk, pc, a)) >= 0) { if (r == 0) { if (swap_cnt == 0) swap_cnt = 1; q_swap(pc, pd); pd -= es; } pc -= es; } if (pb > pc) break; q_swap(pb, pc); swap_cnt = 1; pb += es; pc -= es; } pn = (char *) a + n * es; r = min(pa - (char *) a, pb - pa); vecswap(a, pb - r, r); r = min(pd - pc, pn - pd - (long) es); vecswap(pb, pn - r, r); ``` #### 進一步結合 quick sort 與 insertion sort 除了在 n < 7 時使用 insertion sort 以外，為了進一步避免 quick sort 的最差情況，ksort 的實作會在數列快排序好時試著使用 insertion sort，在前一上面 partition 的實作程式碼中可以看到當發生元素交換時會將 `swap_cnt` 設為 1，這代表 `swap_cnt` 若為 0 的話表示原本的數列已經被分為小於 pivot, 等於 pivot, 大於 pivot 三區了，因此這裡猜測此時數列應該已經接近排序完成而會試著使用 insertion sort。 ```c if (swap_cnt == 0) { /* Switch to insertion sort */ r = 1 + n / 4; /* n >= 7, so r >= 2 */ for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es) for (pl = pm; pl > (char *) a && CMP(thunk, pl - es, pl) > 0; pl -= es) { q_swap(pl, pl - es); if (++swap_cnt > r) goto nevermind; } return; } ``` 但同時也會檢查是否真是如此，這裡會在發生交換時遞增 swap_cnt 若超過一個臨界則繼續對現有的數列作 partition。 ```c nl = (pb - pa) / es; nr = (pd - pc) / es; if (nl > 100 && nr > 100) { struct qsort *q = kmalloc(sizeof(struct qsort), GFP_KERNEL); init_qsort(q, a, nl, c); queue_work(workqueue, &q->w); } else if (nl > 0) { qs->a = a; qs->n = nl; /* The struct qsort is used for recursive call, so don't free it in * this iteration. */ do_free = false; qsort_algo(w); } if (nr > 0) { a = pn - nr * es; n = nr; goto top; } ``` 當需要繼續 partition 時若元素的數量過多則會創建一個新的任務並加入到 work_queue 中。其餘小於 pivot 的數列會繼續往下遞迴而大於 pivot 的數列會回到函式一開始的部份重新作 partition ，這麼作的目的是為了防止遞迴的深度過深。 ### 不同資料的執行時間測量五種不同資料分佈對核心執行時間的影響 ![runtime](https://hackmd.io/_uploads/r1JPuiILC.png) 可以看到 qsort 對於 sawtooth 和 plateau 這類有大量重複元素的資料分佈效果較好 ### 整合 lib/sort.c #### lib/sort.c 為了使用 lib/sort.c 作為對照組，在將 lib/sort.c 引入 ksort > [commit](https://github.com/tintinjian12999/ksort/commit/6d2cce5cbbbd670603ec1747be3f8eb88f03617c) 同樣隨機資料的執行結果如下 ![runtime](https://hackmd.io/_uploads/SyQ28z5HR.png) #### 改變資料分佈 ##### sawtooth ![runtime_com](https://hackmd.io/_uploads/SkpqFjUIC.png) 時間複雜度:$2.79 nlg2(n) + 7.65 \times 10^4$ ##### stagger ![runtime_com](https://hackmd.io/_uploads/H1b6KiULC.png) 時間複雜度:$25.8 nlg2(n) + 2.69 \times 10^5$ ##### shuffle ![runtime_com](https://hackmd.io/_uploads/ryrlqiII0.png) 時間複雜度:$29 nlg2(n) + 2.06 \times 10^5$ ##### plateau ![runtime_com](https://hackmd.io/_uploads/SyaZqsIU0.png) 時間複雜度:$1.19 nlg2(n) + 5.4 \times 10^4$ 同樣能夠發現 qsort 在有大量重複元素的資料分佈上較 lib/sort.c 具有優勢，並且執行時間皆為 O(nlgn)。