第 4, 5, 6 週課堂問答簡記

--- tags: linux2023 --- # 第 4, 5, 6 週課堂問答簡記 ## yeiogy123 ### [第四週測驗題](https://hackmd.io/@sysprog/linux2023-quiz4) 提到的精簡紅黑樹，為何在沒有親代節點的狀況可以正常執行? 在 rb.h 中 ```cpp x_prefix##path_entry_t path[RB_MAX_DEPTH]; x_prefix##path_entry_t *pathp; ``` 的 path array 可以儲存了原本 node 所指向的 left/ right child 回家請計算空間複雜度 : O(N) ## YangYeh-PD ### 測試 Linux 核心的紅黑樹在[第四週測驗題](https://hackmd.io/@sysprog/linux2023-quiz4)中，考慮以下程式碼 ```cpp template <typename T> double getDepth(T tree, T null = static_cast<T>(0)) { auto depth(doGetGepth(tree, null, 0U)); return depth.first ? double(depth.second) / depth.first : 0; } ``` :::warning 揣摩紅黑樹 $\Theta$ 的高度計算方式。 ::: 上面的 `template` 在做什麼？它在創造一個 **function template**。在這個程式碼，`getDepth` 函式在 `main` 才會被呼叫(如下列程式碼)，而當中所傳遞的參數類型則是在 `main` 裡才被==決定==的，因此我們利用 function template，就可以增加 `getDepth` 的使用彈性。 ```cpp int main(int argc, char *argv[]) { tree_t tree; tree_new(&tree); ... cout << " RB depth: " << getDepth(tree.root) << endl; ``` 我們發現 `getDepth` 裡面又呼叫了 `doGetGepth` 函式，其中的實作如下: ```cpp std::pair<unsigned int, unsigned long long> doGetGepth(node_t *tree, node_t *null, unsigned int level) { if (tree == null) return std::make_pair(0U, 0ULL); auto left(doGetGepth(rbtn_left_get(node_t, link, tree), null, level + 1)), right(doGetGepth(rbtn_right_get(node_t, link, tree), null, level + 1)); return {1U + left.first + right.first, left.second + right.second + level}; } ``` 請問這個函式在做什麼事情?找出紅黑樹的高。但是我們可以實際執行看看，我們發現結果既然包含小數點... ``` RB depth: 18.8082 ``` :::info 補充 : 藉由這個測試，我們可以看出在 Linux OS 與 WSL 執行程式碼的效率差異。當資料量越大的時候，執行效率差越多。以下是輸入 $10^6$ 資料點的對比結果。 **On Linux** ``` tree_insert time: 0.421294 seconds RB depth: 18.8082 tree_search time: 0.471369 seconds alternative tree_search time: 0.544996 seconds tree_remove time: 0.486769 seconds ``` **WSL** ``` tree_insert time: 0.718512 seconds RB depth: 18.6851 tree_search time: 0.621801 seconds alternative tree_search time: 0.737633 seconds tree_remove time: 0.776088 seconds ``` ::: 首先第一個 `if` 在做什麼事? 先判斷該樹是否有任何節點。如果沒有，則直接 return。那第二個呢?從這我們可以知道它是透過遞迴關係做什麼? 它利用兩個遞迴分別走訪左子樹與右子樹的節點，並計算最深的高。 ### 證明 AVL Tree 的高 :::success AVL Tree 的樹高屬於 $1.44 \times O[log_2(n)]$。 ::: 想要證明它，我們需要先證明以下的引理 $(lemma)$。 :::success 在高度為$\ h$ 的 AVL Tree 當中，最少擁有 $F_h$ 個節點。其中 $F_n$ 為[費波那契數列](https://zh.wikipedia.org/wiki/%E6%96%90%E6%B3%A2%E9%82%A3%E5%A5%91%E6%95%B0)。 > 對照 [The Fibonacci number of Fibonacci trees and a related family of polynomial recurrence systems](http://math.sun.ac.za/swagner/fibotrees.pdf) ::: $proof\ of\ lemma:$ 1. 當樹高 $h = 0$ 時，樹的節點 $n_{h=0}$ 至少為 $0$ 個，因此$$n_{h=0}=0 \ge F(0)=0 \implies h=0$$ 成立。 2. 當樹高 $h = 1$ 時，樹的節點 $n_{h=1}$ 至少為 $2$ 個，因此$$n_{h=1}=2 \ge F(1)=1 \implies h=1$$ 成立。 3. 我們假設當樹高為 $h-1$ 與 $h-2$ 時，上述等式成立，即$$\left\{ \begin{array}{c} n_{h-1} \ge F(h-1) \\ n_{h-2} \ge F(h-2) \\ \end{array} \right.$$ 根據 AVL Tree 的定義，若以根為參考點，則==左子樹 $T_l$ 與右子樹 $T_r$ 之間高的差不能超過 $1$==。當左子樹比較高，有 $h-1$ 層，那右子樹最少有 $h-2$ 層。 ```graphviz digraph AVL { node[shape = circle width = 0.5]; root[label = "root"]; node[shape = triangle height = 2] left[label = "h - 1"]; node[shape = triangle height = 1.8] right[label = "h - 2"]; root -> left root -> right } ``` 因此，當 AVL Tree 的樹高為 $h$ 時，總節點數量$$n_h=1+n_{h-1}+n_{h-2} \ge 1+F(h-1)+F(h-2) = 1+F(h) \ge F(h)$$根據[數學歸納法](https://zh.wikipedia.org/zh-tw/%E6%95%B0%E5%AD%A6%E5%BD%92%E7%BA%B3%E6%B3%95)，我們可以證明在高度為$\ h$ 的 AVL Tree 當中，最少擁有 $F(h)$ 個節點。$\blacksquare$ $proof:$ 根據費波那契數列的數學特性，我們有以下結果 $$F_{n+2} \ge \phi^n,\ \ for\ \ n\ \ge 0 \\Where\ \phi\ is\ gloden\ ratio\ 1.618 $$ 因此，樹的總節點 $$\begin{split}\ n_h &\ge 1+F(h-1)+F(h-2) \\&\ge 1+\phi^{h-3}+\phi^{h-4} \ge \phi^{h-3} \\\implies\ log_{\phi}(n_h) &\ge h-3 \end{split}$$根據對數的==換底公式==$$\begin{split}\implies\ \dfrac{log_2(n_h)}{log_2(\phi)} &\ge h-3 \\\implies\ h &\le 3+\dfrac{log_2(n_h)}{log_2(\phi)} \\&\approx 3+1.44\ log_2(n_h) \in 1.44\times O\ [log_2(n_h)] \ \blacksquare\end{split} $$ :::warning TODO :: 證明紅黑樹的高為 $2\times O\ [log_2(n)]$。 ::: ## sherrygottalent 紅黑樹特性： - 根節點為黑色 - 紅色節點的子節點必為黑色的 - 所有 leaf(葉子節點) 必為黑色的 - 根節點到任一葉子節點，中間經過的黑色節點數量是一樣的那紅黑樹左樹跟右樹可能不平衡嘛，可以差到多少 from AVL tree，左右樹平衡係數相差到 2 的時候，會旋轉以得到平衡 ## tintinjian12999 ### 將 Bubble sort 轉換為 Insertion Sort Bubble sort ```clike void q_sort(struct list_head *head) { if (head == NULL || list_empty(head)) return; struct list_head *current, *tail; tail = head; element_t *current_node, *next_node; while (tail != head->next) { current = head->next; while (current->next != tail) { current_node = list_entry(current, element_t, list); next_node = list_entry(current->next, element_t, list); if (strcmp(current_node->value, next_node->value) > 0) list_move(current, current->next); else current = current->next; } tail = current; } } ``` Insertion sort ```clike void q_sort(struct list_head *head) { if (head == NULL || list_empty(head)) return; struct list_head *current, *target, *temp; target = head->next->next; element_t *current_node, *target_node; while (target != head) { temp = target->next; target_node = list_entry(target, element_t, list); current = head->next; while (current != target) { current_node = list_entry(current, element_t, list); if (strcmp(target_node->value, current_node->value) < 0) { list_move(target, current->prev); break; } else current = current->next; } target = temp; } } ``` ## Jerejere0808 分配 bn 的記憶體空間之前### 預先計算儲存 $F(n)$ 所需的空間以下節錄[2023 年 Linux 核心設計/實作課程作業 —— fibdrv Linux 核心模組的解說](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-b) 倘若一開始就知道該配置多少空間給 Fibonacci 運算，就能避免空間浪費，或在計算過程中重複呼叫 `krealloc`。首先，我們可用 [Binet 公式](https://en.wikipedia.org/wiki/Fibonacci_number)計算任意 Fibonacci 數列中第 $n$ 個數 $$Fn= \frac{(\phi^n)-(1-\phi)^n}{\sqrt5}$$ $$\phi = \frac{1+\sqrt5}{2} (golden \ ratio)$$ 上式的近似值: $$Fn= \frac{\phi^n}{\sqrt5} \ (since\ (1-\phi)^n\ is\ very\ small\ for\ large\ n)$$ 知道近似值後，我們可使以 10 為底的對數計算其位數。具體如下： $$ Digits = \log_{10}(\frac{\phi^n}{\sqrt5}) \\ Digits = n\log_{10}\phi - (\frac{\log_{10}5}{2}) \\ $$ 當 $n$ 為足夠大的正整數時， $$ F(n)\approx0.4472135955\times1.61803398875^n $$ 將近似值取 2 為底的對數，可得到需要使用的位元數 $$ log_2(0.4472135955\times1.61803398875^n)\approx-1.160964 + n\times0.694242 $$ 再除以 32 可得到需要使用的 `uint32` 數量 $$ (-1.160964 + n\times0.694242)\div32 = -0.036280125 + n\times0.0216950625 $$ Linux 核心無法使用 (正確說法是，不能直接使用) 浮點數運算，因此將算式乘以 $10^{10}$，亦即: $$\lfloor\frac{-362801250 + n\times216950625}{10^{10}}\rfloor+1 $$ 此算式的結果就是該事先配置的 `uint32` 數量。參考的 C 程式: ```c #define BUFFSIZE (8 * sizeof(int) * LENGTH / 3 + 2) #define LOGPHI (0.20898764025) #define LOGSQRT5 (0.34948500216) int main() { int offset = 100; /* TODO: try test something bigger than the limit */ int fd = open(FIB_DEV, O_RDWR); if (fd < 0) { perror("Failed to open character device"); exit(1); } float digits; int size; for (int i = 0; i <= offset; i++) { // calculate how many digits are needed for fib(i) // digits needed for fib(n) = n*LOG(Phi) - (LOG √5) digits = i * LOGPHI - LOGSQRT5; float buf_size = floor(digits / 9); size = (int) buf_size; char *buf = malloc(sizeof(char) * (BUFFSIZE * size)); lseek(fd, i, SEEK_SET); long long time1 = read(fd, buf, 0); long long time2 = read(fd, buf, 1); printf("%d %lld %s\n", i, time1, buf); free(buf); } ``` 觀察計算 fibonacci 的時間會發現會有震盪的奇況每過一段時間執行時間就會往上跳一小段是因為 cache miss，若要排除這種 cache 對執行時間所造成的影響可以藉由降低 realloc 次數，因為 realloc 可能導致需要將原本的資料搬到更大的記憶體空間而新的空間並不是在原本的附近，造成原本 cache 裡的資料需要被更新。 ## chun61205 ### 使用 `uint128_t` 最多可以表示到第幾個 fibonacci 數？參考 [yanjiew1](https://hackmd.io/v4itYIvpToq0rICnuRAvLQ?both) 透過公式解可直接計算第 $n$ 個費氏數： $$ F(n)= \frac{(\phi^n)-(1-\phi)^n}{\sqrt5} \\ \phi \text{ (golden ratio)} = \frac{1+\sqrt5}{2} \approx 1.61803398875 $$ 當 n 足夠大時，可得到約略值: $$ F(n) \approx \frac{\phi^n}{\sqrt5} \approx 0.4472135955 \times 1.61803398875^n $$ 取得以 2 為底的對數 $$ \log F(n) \approx -1.160964 + n\times0.694242 $$ 這裡的 $logF(n)$ 正好表示需要多少 bits 才能完全表達 $F(n)$ 。 $$ logF(93) \approx 63.4 $$ 因此會造成 overflow 。若是代入 $logF(n)=128$ ，則 $n\approx186$ 大概可以表示到 $F(186)$ ## GaberPlaysGame 用迭代法實作 Mergesort：給定一個長度為 `n` 的陣列 `A[n]`，並建立一個相同長度的陣列 `temp[n]`。 ```c mergeSort(int A[], int temp[], int left, int right) { for (int m = 1; m <= right-left; m *= 2) { for (int i = left; i < right; i += 2*m) { int bound = min(i + 2*m - 1, right); merge(A, temp, i, (i+m-1), bound); } } } ``` mergeSort 函式利用兩層 for 迴圈實作。 - 第一層迴圈處理每次進行合併的範圍長度，由 1 開始以 2 的指數增加，直至長度 m 超過陣列的範圍 n。 - 第二層迴圈處理每次在 A[n] 內的遍歷，以第一層迴圈產出的 m 為基礎，每次執行 2m 長度的合併。 - `bound` 變數取 right (A[n] 陣列右邊界) 與本次遍歷的次陣列右邊界 (i+2m-1) 的最小值，來避免對超出陣列外的數值進行取值造成錯誤。 ```c merge(int A[], int temp[], int left, int mid, int right) { // i 與 j 表示兩個次陣列在 A[] 內的起始點 int i = left, j = mid + 1; // k 表示 temp 陣列的起始點 int k = left; // 進行迴圈直到兩個次陣列有一個已經到底 while (i <= mid && j <= right) { // 比較兩個次陣列數值並放入 temp 陣列 A[i] < A[j] ? temp[k++] = A[i++] : temp[k++] = A[j++] } // 如果較左的次陣列還沒到底，將剩餘的內容放入 temp while (i <= mid) { temp[k++] = A[i++]; } // 執行上面操作後剩下右邊的陣列可能還有值，但一定是已經排序的，所以不用管 // 把排序好的陣列 temp[] 重新貼回 A[] for (i = left; i < j; i++) A[i] = temp[i]; } ``` ## Thegoatistasty [quiz2 測驗2](https://hackmd.io/@pigwei/S1ax6He12#%E6%B8%AC%E9%A9%97-2) 最後一行如果不使用 modulus 可能造成 overflow > ans = (i | (ans << len)) % M; 要如何在不使用大數運算的情形下，支援 n 在 int (32 bits) 的範圍，同時避免 overflow ? 分析： 1. 因為一個 bit 可為 0 或 1，所以 32 bits 的正整數最多需要 62 + 60 + ... + 2 = 992 個 bits (第一個 bit 只會是 0)，也就是 124 bytes 才能表示結果 2. 可以在拿到 n 的時候就知道一共需要幾個 bits 表示答案 *根據分析 2 可知道一開始要宣告多少空間儲存答案* -------------- 參考資料：https://zhuanlan.zhihu.com/p/29768999 解釋如下 (假設已拿到 ans 數列)：其實就是一直 mod 10 然後除以 10 (文章裡為了手算方便才改為除以 5)，計算時改為 2 進位方式即可 ``` 以 1234 (binary:10011010010) 當例子 1234 = 123 * 10 + 4 123 = 12 * 10 + 3 12 = 1 * 10 + 2 1 = 0 * 10 + 1 在 2 進位就是 10011010010 = 1111011 * 1010 + 100 1111011 = 1100 * 1010 + 11 1100 = 1 * 1010 + 10 1 = 0 * 1010 + 1 ``` 如此一來就不需使用大數運算了 ## tab0822 ```c uint64_t next_pow2(uint64_t x) { x = -1+x<<2; b = __builtin_popcount(x); b--; for(i=0;i<b;i++) { x = x & (x-1); } return x; } ``` ## fennecJ 想請教教授為何說 [quiz2 測驗2](https://hackmd.io/@pigwei/S1ax6He12#%E6%B8%AC%E9%A9%97-2) 中的 ```c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ long ans = 0; for (int i = 1; i <= n; i++) { if (!(i & (i-1))) len++; ans = (i | (ans << len)) % M; } return ans; } ``` 其中 ```c ans = (i | (ans << len)) % M; ``` 會造成 overflow ? 你把 $2^{31}$ 代入 i 看看會發生什麼事？ $2^{31}$ 代入 i，此時 len = 32，但它仍然不會發生 overflow 不是嗎？因為 ans 在前一次的迭代中有 mod (1e9 + 7)，因此 ans 的最大值只會是 (1e9 + 6) ，而 (1e9 + 6) 的二進位為 $$ 0011\_1011\_1001\_1010\_1100\_1010\_0000\_0110 $$ 也就是說它最多只會佔用 30 個 bits，因此 ```c ans = (i | (ans << len)) % M; ``` 在 $i = 2^{31}$ 時會變為 ```graphviz digraph structs { node[shape=record, width = 2.5, height = .75 ]; struct1 [label="<f0> ans[61:32]|<f1> i[31:0]"]; } ``` 即 ans 佔用的位置為第 bit 32 ~ bit 61 ，而 i 佔用的位置是 bit 0 ~ bit 31 ，但 ans 本身有 64 bits 的空間，應該不會發生 overflow 才是。沒錯，但是不會發生 overflow 的原因是因為我們把 %M 放在 for loop 中，也就是在中間就做 modulo 了，但我想問的其實是如果全部做完之後才做 modulo 的有沒有辦法避免 overflow 。 ## chiwawachiwawa ### setjmp 非零的回傳當 setjmp() 返回非 0 的時候(通常情況下為指定過的值，用以確認出錯的位置)，也就是當生異常的情況下(通常情況下利用了 longjmp() 這個函式)，可以提醒我們來進行一些額外的處理，當 setjmp() 返回 0，則代表一切安好依照核心提供的 "setjmp.h" 我近日會研讀這周的 quiz 來探討更深的情況，這邊我先提供一些很簡單的案例。 ```c #include <stdio.h> #include <setjmp.h> jmp_buf jump_buffer; void error_handler() { printf("An error has occurred.\n"); } void do_something() { // Simulate an error condition int error_occurred = 1; if (error_occurred) { longjmp(jump_buffer, 1); } printf("Do something...\n"); } int main() { // Set the jump point if (setjmp(jump_buffer)) { // Jumped back here due to an error error_handler(); } else { // Normal execution path do_something(); } return 0; } ``` 這個例子中，我們使用 setjmp() 函式來設置一個跳轉點，然後在 do_something() 函式中模擬一個錯誤情況，如果發生了錯誤，就使用 longjmp() 函式跳轉回 setjmp() 函式呼叫的位置。 ## Thegoatistasty ### 為什麼要使用 `qsort_r` 而非 `qsort` ? 因為他們共同使用到比較的函式，會有牽涉 reentrancy 的問題(不是 race condition)。所以要透過事先宣告更多的變數，讓不同執行序可以使用不同變數。