2026-03-03/10 問答簡記

# 2026-03-03/10 問答簡記 :::info :information_source: 注意須知 1. 在下方填入你的 GitHub 帳號 (務必確保有效)、題目，和你的答覆及相關討論 2. 流程: 問 → 答 → 複問 → 總結回答 → 課後檢討補充並發送信件，完成此循環才算完成測驗 > 課後檢討的內容需要更新在本頁面，應提及答覆提問所需要的 C 語言規格描述、Linux 核心原始程式碼、推論、數學分析、可完整執行的程式碼 (放於 GitHub/gist，在此列出超連結和關鍵部分)、實驗、測試，和對應的討論 > 信件標題: `Linux2026/Week2: 法定姓名`，寄送到 `jserv.tw@gmail.com`，信件內文註明你的 GitHub 帳號並簡述自身在課程的付出狀況，以及對於期末專題的期許和規劃 > 信件內容不要有任何客套話，也不該提「您」或者無助於溝通、假裝有禮貌的詞彙 $\to$ 真正的「禮貌」是把事情做好，且互利共惠 > 可將問答過程加入[第一份作業](https://hackmd.io/@sysprog/linux2026-warmup)中 3. 務必以第一手材料 (教材、規格書、Linux 核心原始程式碼及其 git log，以及你自己的實驗) 為主 4. 關於期初測驗，當你確認充分進行第一份和第二份作業後 (這是前提)，請發信到上方第 2 點提及的電子郵件信箱，標題是 `Linux2026/Eval: 法定姓名`，並闡述你在四月可補考的時段 ::: :::success 速查單 * Linux 核心風格的鏈結串列: [list.h](https://github.com/sysprog21/lab0-c/blob/master/list.h) * Linux 核心的系統負載統計: [loadavg.c](https://github.com/torvalds/linux/blob/master/kernel/sched/loadavg.c) * Linux 核心原始程式碼的符號檢索: [LXR](https://elixir.bootlin.com) ::: ## What I cannot create, I do not understand ![Feynman](https://hackmd.io/_uploads/rJXW03XYWg.png) 美國理論物理學家費曼 (Richard Feynman) 在辦公室黑板書寫以下話語: * What I cannot create, I do not understand. (我所不能創造的東西，我就不瞭解) * Know how to solve every problem that has been solved. (知道如何解出每個已被解過的問題) ## 畢業前就該證明自己能解決工程難題當舉薦與任用充斥流弊，個人能力難以被客觀辨識，於是以科舉所代表的學歷制度，成為相對可檢驗、可比較的衡量機制。當高等教育普及，學位不再稀缺，文憑所承載的訊號強度隨之下降，社會開始轉而重視作品、專案經驗與可量化的實績。當 GenAI 能在極短時間內大量產出程式碼、設計稿與文章，作品本身的稀缺性再次被削弱，單純「產出」已不足以構成差異化優勢。在校生若僅滿足於取得一紙文憑，或僅止於準時完成課程指定作業，實則仍停留在上個世代評量框架之中。更前瞻的思維應該是：在畢業之前，如何已經能對真實世界的問題給出可驗證的解法？如何讓他人不必透過學歷推測能力，而能直接從你的貢獻與影響看見實力？如今參與開放原始碼協作正是有效的途徑。開放原始碼社群以實際程式碼與公開審查為核心，所有討論、修正與改進皆可被追溯。這種透明機制，使能力不再依賴頭銜背書，而是透過問題定位、架構設計、效能分析與長期維護等實質貢獻來體現。它同時要求溝通能力、跨文化協作與對品質的自我約束，這些特質往往比單一作品更具辨識度。 > 有效「溝通」變得更關鍵文憑或許仍是起點，但不應是終點。學習如何學習。 ## Meta 台北開缺 > [Meta Careers](https://www.metacareers.com/jobsearch?offices[0]=Taipei%2C%20Taiwan) Software Engineer, OS * 2+ years' Software Engineering experience in the following: device driver development, embedded systems, or operating systems * 2+ years’ experience working on systems software in a large-scale C/C++ code base * Experience with Software Development processes including: source control, bug tracking, and design documentation. * Experience in hardware bring up using interfaces like ADC, GPIO, SPI, I2C, etc * Experience in one or more of the following areas: SoC BSP/Board Support Package, Operating Systems, Android OS, RTOS/Zephyr, Bootloader, Power Management, Linux, Graphics and Display Drivers, MCU (Microcontroller) ## Qwen 研發團隊出走和人才的新定義 > [阿里 AI 核心林俊暘突辭職！曾警告中美差距股價重挫逾 4%](https://hk.finance.yahoo.com/news/%E9%98%BF%E9%87%8Cai%E6%A0%B8%E5%BF%83%E6%9E%97%E4%BF%8A%E6%9A%98%E7%AA%81%E8%BE%AD%E8%81%B7-%E6%9B%BE%E8%AD%A6%E5%91%8A%E4%B8%AD%E7%BE%8E%E5%B7%AE%E8%B7%9D-%E8%82%A1%E5%83%B9%E9%87%8D%E6%8C%AB%E9%80%BE4-074006033.html) [林俊暘](https://baike.baidu.com/item/%E6%9E%97%E4%BF%8A%E6%97%B8/67148281)的履歷相當特別，碩士畢業於北京大學外國語學院，2019 年加入阿里巴巴後，他參與多個關鍵模型的研究與工程實作，包括 OFA 與 Chinese CLIP 等多模態專案。隨著通義千問 (Qwen) 計畫在 2022 年啟動，他逐漸成為關鍵技術負責人，並帶領團隊推動 Qwen 系列模型的發展與開源。在大模型競賽迅速升溫的背景下，團隊持續推進模型能力，從長時序強化學習到跨領域資料平衡，再到多語言能力的提升，逐步擴展至更廣泛的多模態與具身智能方向。對一名技術領導者而言，短時間內承擔如此高密度的研發與產品責任，正是大模型時代的典型縮影。中國大型科技企業向來重視平台與商業體系，開放原始碼通常並非其長期策略核心。阿里過去確實推出過多個開放原始碼專案，但整體而言，公司並未持續建立起一個穩定且具有全球影響力的開源品牌。相比之下，千問團隊在大模型浪潮中迅速崛起，反而使開放模型成為其在公司內部建立技術影響力的重要途徑。藉由開放原始碼模型吸引開發者社群與產業關注，團隊不僅擴大技術影響力，也在公司內部形成更高的戰略地位。對任何大型企業而言，最終仍必須回到清晰的商業閉環。當模型品牌、開發者生態與社群影響力逐漸形成，這些投入如何轉化為可持續的商業價值，始終是一個無法迴避的課題。無論是依託阿里雲的 AI 平台能力，還是藉由千問產品建立直接面向使用者的應用場景，都需要在開放與商業化之間取得長期平衡。 > 延伸閱讀: [千問核心人才大地震，阿里大模型要涼？](https://x.com/BiteyeCN/status/2029399969564901843) ## Donald Knuth 教授的驚嘆 Hacker News 上有一則[引起廣泛討論的事件](https://news.ycombinator.com/item?id=47230710)，與 Donald Knuth 及大型語言模型在數學探索有關。事情的背景是 Knuth 在撰寫《[The Art of Computer Programming](https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming)》(TAOCP) 新卷時，遇到一個與有向圖 Hamiltonian cycle 分解相關的組合數學問題。考慮一個具有 $m^3$ 個頂點的有向圖。每個頂點以三個座標 $(i,j,k)$ 表示，其中 $0 \le i,j,k < m$。從每個頂點出發有三條有向邊，分別指向 $(i+1,j,k)$、$(i,j+1,k)$、$(i,j,k+1)$，其中加法採用模 $m$ 的運算。因此整個圖共有 $m^3$ 個頂點與 $3m^3$ 條邊。問題是是否可以將所有邊恰好分解為三條 Hamiltonian cycle，使得每條 cycle 都遍歷所有 $m^3$ 個頂點一次，而三條 cycle 的邊集合互不重疊並共同覆蓋整個圖。 Knuth 能夠構造出 $m=3$ 的情況，但無法找到對所有 $m$ 的一般性構造。他推測對奇數 $m$ 應該存在解，但缺乏證明。Knuth 的同事 Filip Stappers 以程式搜尋的方式驗證到至少 $m \le 16$ 都存在解，然而仍無法歸納出結構性的規律。 Stappers 隨後將問題交給 Claude Opus 4.6，並要求系統在每次實驗後都將思考與實驗過程記錄到研究筆記檔案 `plan.md` 中。Claude 的探索過程大致包含三個階段。首先重新將問題形式化，例如將每個頂點對應到一個排列，藉此描述三條 cycle 在圖上的選邊方式。接著嘗試多種直覺性構造，例如線性函數、深度優先搜尋、以及由二維蛇形路徑延伸到三維的方式，但這些方法都在邊界處產生衝突而失敗。最後在分析搜尋結果時，Claude 注意到一個關鍵量 $s = (i+j+k) \bmod m$，這個量可以將整個圖分成多個結構相似的層，並形成一種可稱為 fiber decomposition 的分解方式。透過這種結構化觀察，Claude 最終找到一組簡潔的構造規則，可以對多個奇數 $m$ 生成三條 Hamiltonian cycle 的分解，並以程式驗證 $m=3,5,7,9,11$ 等案例均成立。 Knuth 隨後對該構造進行完整的數學分析與形式化證明，並將結果整理為論文 [Claude’s Cycles](https://cs.stanford.edu/~knuth/papers/claude-cycles.pdf)。論文開頭寫道： > Shock! Shock! I learned yesterday that an open problem I'd been working on for several weeks had just been solved by Claude Opus 4.6. 這句話表達他的驚訝，也意味著他需要重新評估自己對 GenAI 的看法。分析後，Knuth 發現 Claude 找到的構造並非唯一，而是屬於一個更大的解族。他辨識出大量具有類似結構的分解方式，並對這些結構給出系統性的描述。換言之，AI 的角色是在探索過程中找到一個關鍵構造，而後續的理論整理與證明仍由人類數學家完成。之所以受到關注，不僅因 AI 找到一個 combinatorics 構造，更因為整個探索過程呈現出類似研究工作的模式。系統先重新建模問題，嘗試不同假設，從失敗案例中抽取規律，最後形成可以泛化的構造。對許多研究者而言，這更像是一種輔助研究的工具，而不是單純的暴力搜尋或符號推理系統。 Knuth 在文章結尾還做一個輕鬆的雙關。他提到 Claude Shannon 是資訊理論的奠基者，而如今另一個名為 Claude 的系統在離散數學問題上帶來新的進展，彷彿 Shannon 的名字再次出現在資訊科學的前瞻發展。 --- ## ascodeasice > 鏈結串列在 Linux 核心中用在哪些地方 Linux 核心會使用 intrusive linked list，將鏈結串列嵌入 process 或其他 struct，讓 Linked list 可以重複使用在不同的物件。例如 Linux 核心中的 [sched.h](https://github.com/torvalds/linux/blob/v5.15/include/linux/sched.h) ```c struct sched_entity { /* For load-balancing: */ struct load_weight load; struct rb_node run_node; struct list_head group_node; unsigned int on_rq; ... }; ``` :::danger 除了列出程式碼，你真正該回答的議題是「為何如此」。強化你的論述，尤其考慮在面試場合中，該如何精準回答，且能反映出自己對於 Linux 核心的投入。 ::: Linux核心會使用鏈結串列管理需要動態加入、移除的資料，並且節點可以配置在非連續記憶體中。例如 hash table、LRU cache，或者是 wait queue 都會使用鏈結串列。例如在 hash table 中，會使用鏈結串列管理 bucket 的資料。而在 Linux 核心中會使用 intrusive linked list，讓定義好的 `list_head` 可以嵌入其他 struct 中,這樣可以重複使用`list_head` 的 API，並且不需要分開管理物件與節點的記憶體。但是 [Linux 核心文件](https://docs.kernel.org/core-api/list.html#introduction)中也有提到，鏈結串列在效能上，不一定是最好的資料結構，由於記憶體不連續導致 data locality 不佳，因此在重視效能的情況並不是好選擇。 > 為什麼 process 要用鏈結串列儲存資訊因為 Linux 核心會大量建立與刪除 process，需要分配與釋放記憶體，使用鏈結串列可以方便插入、刪除等。 :::danger 還有其他考量。 ::: Linux 核心使用 `task_struct` 記錄 process 資訊，而這個物件會使用鏈結串列表示 task 的 `children` 與 `sibling`，以及其他 task 關係。要使用鏈結串列的原因是因為這些關係是動態的，會隨著 fork、exit 等系統呼叫持續變動，因此需要方便插入、刪除、走訪的資料結構，並且鏈結串列的節點不需要連續配置在記憶體中，因此適合描述這些關係。 > 參考以下 struct，設計一個用來排序鏈結串列的 sort 函式型別 ```c typedef struct list_item { int value; struct list_item *next; } list_item_t; typedef struct { list_item_t *head; } list_t; ``` ```c typedef int (*list_cmp_func_t)(const list_item_t *, const list_item_t *); void sort(list_item_t **head, list_cmp_func_t cmp); ``` > 為什麼使用 `list_t *` 作為 `sort` 函式的參數型別會比使用 `list_item_t **` 更好設計成 `void sort(list_item_t *list, list_cmp_func_t cmp);`，可以使用 `list->head` 操作鏈結串列，而不需要操作間接指標，使程式更容易理解。 > comparator 的功能是什麼、回傳值會有哪幾種情況？ comparator 是用來比較兩筆資料大小並回傳結果的函式，回傳值可能會有正整數、0、負整數，分別表示第一筆資料大於、等於、小於第二筆資料的情況。 --- 補充：在某些排序函式中，comparator 可以修改為只回傳 <=0 與 >0 兩種值，用來表示 a 是否應該排在 b 後面（>0 則表示需要排在 b 後面），這樣有機會減少運算次數。 > 為什麼 `sort` 函式要有 comparator 作為參數？為了能重複使用 `sort` 函式的程式碼。將 comparator 作為 function pointer 傳入 `sort` 函式後，可以讓呼叫者自行定義要進行 ascending、descending 排序，或是使用其他比較方式。 :::danger 對應的執行時期成本是什麼？如何量化？ ::: 使用 `cmp` 參數的 `sort`，與內嵌比較程式碼的 `sort` 相比，會多出需要呼叫 `cmp` 函式的成本。成本主要來自於： 1. 將 return address push 到 stack、將兩個參數儲存至暫存器 2. 在 `cmp` 結束時跳轉到 return address 這些操作的 instruction 可以使用多出的 instruction count 量化並估計成本，但是實際使用幾個 instruction 取決於 ABI，在這個特定範例中，每次呼叫 `cmp` 會多出 4 個 instruction： 1. 將 parameter 1 儲存到暫存器 2. 將 parameter 2 儲存到暫存器 3. 跳躍到 cmp 的地址、將 return address 推到 stack (`call` instruction) 4. cmp 結束後，pop return address 並跳躍回 return address (`ret` instruction) 在這個範例中，與不使用 `cmp` 參數比較，可粗略估計每次呼叫 `sort` 總共會多出 $4\times \text{compare count}$ 個 instruction。廣義地說，使用 `cmp` 參數的 `sort`，相較於內嵌比較程式碼的 sort，其額外成本可近似視為與比較次數成正比。 > 如果我們想對鏈結串列做 descending 排序，應該如何設計 comparator 的型別與函式？ ```c typedef int (*list_cmp_func_t)(void *, const list_item_t *, const list_item_t *); int cmp_descending(const list_item_t *a, const list_item_t *b){ // NOTE: might overflow/underflow return b->value - a->value; } ``` > 目前實作中的 comparator 會遇到什麼問題？因為 value 的型別為 32-bit signed int，目前的 comparator 在 a 與 b 的 value 為一正一負時，可能會造成 integer overflow 或 underflow，因此不能直接使用減法運算。重新思考 descending 排序的回傳值： | 條件 | 回傳值 | | ----- | ------ | | a < b | 正整數 | | a = b | 0 | | a > b | 負整數 | 可以寫出 ```c int cmp_descending(const list_item_t *a, const list_item_t *b){ if(a->value < b->value) return 1 if(a->value > b->value) return -1 return 0; } ``` 但是此實作有 branch，可以進一步精簡為 branchless 的實作： ```c int cmp_descending(const list_item_t *a, const list_item_t *b){ return (a->value < b->value) - (b->value < a->value); } ``` 列出所有情況驗證： | 條件 | 回傳值 | | ----- | ------ | | a < b | 1-0=1 | | a = b | 0-0=0 | | a > b | 0-1=-1 | > 你會怎麼實作 merge sort？我會使用遞迴實作 merge sort，將過程分成拆分節點為左右兩半，與合併兩個步驟。問答中有提到 merge sort 並沒有一定將資料分割成兩半進行合併，而 Linux 核心中的 `list_sort` 考量到 cache friendly 與比較次數，選擇使用 2:1 比例的鏈結串列進行合併。 > 你的 merge sort 的遞迴式是什麼？（與程式實作） merge sort 的遞迴式為： - 如果只有零或一個節點：回傳輸入節點 - 如果有多個節點，回傳 `merge(sort(左半節點),sort(右半節點))` 參考[你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list)與 [Merge Sort 與它的變化](https://hackmd.io/@lambert-wu/list-merge-sort#%E9%81%9E%E8%BF%B4%E7%89%88) ```c list_item_t *merge(list_item_t *a, list_item_t *b, list_cmp_func_t cmp) { list_item_t *head = NULL, **cur = &head; while (a && b) { list_item_t **node = cmp(a, b) <= 0 ? &a : &b; // ensure stability *cur = *node; cur = &(*cur)->next; *node = (*node)->next; } *cur = (list_item_t *)((uintptr_t)a | (uintptr_t)b); return head; } /* helper function for sort */ list_item_t *sort_list(list_item_t *head, list_cmp_func_t cmp) { if (!head || !head->next) { return head; } list_item_t *slow = head, *fast = head->next; while (fast && fast->next) { slow = slow->next; fast = fast->next->next; } // 此時 slow 在偶數長度的前半、奇數長度的中間 list_item_t *mid = slow->next; // 確保 mid 在偶數長度的中間、奇數長度的後半 slow->next = NULL; // 斷開 slow 和 mid list_item_t *left = sort_list(head, cmp); list_item_t *right = sort_list(mid, cmp); return merge(left, right, cmp); } // - stable // - O(n log n) void sort(list_t *list, list_cmp_func_t cmp) { list->head = sort_list(list->head, cmp); } ``` :::danger 指出程式碼的改進機會並落實 ::: 我們可以將目前的遞迴 merge sort 修改為 iterative bottom-up merge sort。改寫的主要理由是避免遞迴呼叫的 stack 開銷，而迭代版本的 stack 使用為常數，在 Linux Kernel 中的行為更可預測。只修改 `sort_list` 函式，並且新增 `split` 函式。 `split` 函式接受一個 list head 與長度 n，將 head 的前 n 個節點切割出來，並回傳剩餘節點的開頭。如果節點數量不足 n 個，則回傳 NULL。 `sort_list` 的邏輯如下： 1. 將鏈結串列視為多個已排序的 run，初始時每個節點是長度為 1 的 run 2. 每輪迴圈中，以當前 run size 為單位，用 `split` 切出相鄰的兩個 run 並 merge，處理完整條鏈結串列後將 run size 加倍 3. 當 run size 大於等於鏈結串列長度時，表示已合併為單一 run，回傳結果實作上使用 dummy 節點作為串接合併結果的起點，並且使用 `tail` 指標追蹤當前合併結果的尾端，讓下一次 merge 的結果能正確接續。 ```c /* Cut off n nodes from head; return the remainder */ static list_item_t *split(list_item_t *head, int n) { // suppose n>=1 while (--n && head->next) head = head->next; list_item_t *rest = head->next; head->next = NULL; return rest; } /* helper function for sort */ list_item_t *sort_list(list_item_t *head, list_cmp_func_t cmp) { if (!head || !head->next) return head; int len = 0; for (list_item_t *p = head; p; p = p->next) len++; list_item_t dummy = {.value = 0, .next = head}; for (int width = 1; width < len; width <<= 1) { list_item_t **tail = &dummy.next; list_item_t *cur = dummy.next; while (cur) { list_item_t *left = cur; list_item_t *right = split(left, width); // 將右側的鏈結串列也切割出 width 個節點，並且使用 cur 追蹤下一次的左側鏈結串列 cur = right ? split(right, width) : NULL; // tail 會將上一輪合併的結尾，和這一輪合併的開頭連接起來 *tail = merge(left, right, cmp); while (*tail) tail = &(*tail)->next; } } return dummy.next; } ``` > 我們該如何驗證這個程式是對的、merge sort 有哪些特性？已知 merge sort 演算法是正確的，則我們需要驗證程式是否和遞迴版本的 pseudocode 相同，以及證明實作的正確性。參考《Introduction to Algorithms》 2.3.1，遞迴版 merge sort 的 pseudocode 為 ```text MERGE-SORT(A,p,r) if p < r then q <- floor((p+r)/2) MERGE-SORT(A,p,q) MERGE-SORT(A,q+1,r) MERGE(A,p,q,r) ``` 由於零個與一個節點的情形相當於 $p\ge r$，所以不進行操作是正確的行為。所以需要證明 1. 使用快慢指標切分鏈結串列，是否正確的切成左右兩半考慮長度 $n\ge 2$ 為奇數與偶數的情況（$p=1$）： - $n=2k+1$，`slow` 會是第 $\lfloor \frac{n+1}{2} \rfloor$ 個節點，`mid` 會是第 $\lfloor \frac{n+1}{2} \rfloor+1$ 個節點 - $n=2k$，`slow` 會是 $\lfloor \frac{n+1}{2} \rfloor$ 個節點，`mid` 會是第 $\lfloor \frac{n+1}{2} \rfloor+1$ 個節點因此 `mid` 為第 $q+1$ 個節點，與 pseudocode 相同 2. `merge` 函式是否正確的將兩個鏈結串列合併程式會將 `*cur` 持續更新為 `a` 與 `b` 中 key 較小的節點，直到其中一個鏈結串列沒有節點，再將剩下的節點接在後面，因此可以確保合併後的鏈結串列是正確排序過的。 3. `merge` 函式是否有確保排序的穩定性 (stability) 在 `list_item_t **node = cmp(a, b) <= 0 ? &a : &b;` 中，在合併左側的 a 與右側的 b 時，如果兩者的 key 大小相同，則會選擇使用 a 的值，可以確保 `merge` 的穩定性，確保排序是穩定的。 merge sort 有以下兩種特性： - 是一種 stable sort - worst case 時間複雜度為 $O(n\log n)$ > 為什麼 stable 是 sorting 中一個重要的特性？ - 可能影響演算法的正確性，例如 radix sort 依賴 subroutine 的穩定性。 - 可以保留資料原本的相對順序，讓每次 sort 的結果 deterministic，例如在多欄位排序時，可以避免第一次排序的結果被第二次打亂。 :::danger 參照 Linux 核心的 git log，討論 Linux 核心開發者曾沒有謹慎處理 stable sort 特性，而導致錯誤的案例。這樣實務的經驗值得在面試場合提及。 ::: Commit `3b4309546b48` - ALSA: hda: Fix headset detection failure due to unstable sort > ALSA: hda: Fix headset detection failure due to unstable sort > > The auto_parser assumed sort() was stable, but the kernel's sort() uses > heapsort, which has never been stable. After commit 0e02ca29a563 > ("lib/sort: optimize heapsort with double-pop variation"), the order of > equal elements changed, causing the headset to fail to work. > > Fix the issue by recording the original order of elements before > sorting and using it as a tiebreaker for equal elements in the > comparison function. > > Fixes: b9030a005d58 ("ALSA: hda - Use standard sort function in hda_auto_parser.c") ALSA 的 auto_parser 在實作時假設 `lib/sort()` 具備 stable sort 的特性，但實際上 Linux 核心的 `sort()` 採用 heapsort，本來就不是 stable 的排序。問題來自這兩個 commit： - Commit `b9030a005d58` 將原本 `hda_auto_parser.c` 中的 bubble sort 重構為呼叫 `lib/sort()`。由於 bubble sort 本身是 stable 的，而當時的 heapsort 實作恰好不改變相對順序，因此重構後未立即發生問題 - Commit `0e02ca29a563` 對 heapsort 進行 double-pop 最佳化後，相等元素的相對順序發生改變，導致 headset 偵測失敗修正方式為： - 在 `struct auto_pin_cfg_item` 中新增 `order` 欄位，於排序前記錄每個元素原本的相對順序 - 修改 comparator，在比較值相等時改以 `order` 作為 tiebreaker，藉此在不依賴排序演算法本身穩定性的前提下，維持原有的相對順序 > 如何使用程式驗證程式的時間複雜度是 $O(n\log n)$？可以透過修改 comparator，統計對於不同輸入大小，排序函式呼叫 comparator 的次數，然後計算演算法的時間複雜度。 > 程式操作數量與時間複雜度的數學關係是什麼？參考《Introduction to Algorithms》第二、三章，分析操作數量與時間複雜度首先需要有一些假設。假設有一個 random-access machine (RAM) model，指令會依序執行，並且 arithmetic、data movement、control 指令都會各自花費固定長度的時間執行。為了使演算法的分析 machine-independent，我們忽略實際執行時間，假設第 $i$ 行 pseudocode 會花費 $c_i$ 的時間，並且計算執行的次數，定義總運行時間 (total running time) $T(n)$ 為 $c_i$ 乘上每行執行次數的總和。而分析時，可以進一步忽略 $c_i$ 之間的差異，將每一行定義為運行相同時間。因此最終得出的 $T(n)$ 相當於 (執行次數 \* 基本操作數量)。; 而比起演算法花費的時間，我們更關注演算法時間的 order of growth，因此我們可以只保留多項式 $T(n)$ 的首項，例如 $an^2 +bn+c$ 只保留 $an^2$。並且由於對於大量輸入，常數會遠比 order of growth 還要更小，因此會改用 asymptotic notation 表示演算法的時間複雜度。例如： $$ \begin{align} O(g(n))=\{f(n): \text{ there exist positive constants c and } n_0 \text{ such that } \\ 0\le f(n) \le cg(n) for all n\ge n_1 \} \end{align} $$ 例如 insertion sort 的 pseudocode 在 worst case 總共有 $an^2+bn+c$ 個操作，則可以找到 $c,n_0$ 使得 $an^2+bn+c=O(n^2)$，因此 insertion sort 在 worst case 的時間複雜度為 $O(n^2)$ --- ## PinkNekoFist > Quick sort 的策略是什麼選一個 Pivot，然後將鏈結串列的所有元素對比 Pivot 的大小分成兩部分，再對兩部分作 Quick sort，然後重複遞迴直到鏈結串列大小為 1。 :::danger 用更精準的說法 ::: > 該如何選擇 Pivot 選最左邊的元素作為 Pivot，優點為方便實作，且不需要額外的時間搜尋 pivot，worst case 時間複雜度為 $O(log(n^2))$。或是使用 median-of-three 或隨機選取等方式，減少 worst case 發生的機率。 :::danger 提供數學分析 ::: 以下為上課時實作的程式碼。 ```c typedef struct list_item { int value; struct list_item *next; } list_item_t; typedef struct { list_item_t *head; list_item_t *tail; } list_t; // Properties: // - unstable void sort(list_t *list, list_cmp_func_t cmp) { if (!list || !list->head) return; list_item_t *pivot = list->head; list_t *left_list, *right_list; /* split list to left and right */ list_item_t *curr = head->next; while (curr) { if (cmp(curr, pivot) > 0) { left_list->tail->next = curr; left_list->tail = curr; } else { right_list->tail->next = curr; right_list->tail = curr; } curr = curr->next; } left_list->tail->next = NULL; right_list->tail->next = NULL; sort(left_list, cmp); sort(right_list, cmp); left_list->tail->next = pivot; pivot->next = right_list->head; list->head = left_list->head; list->tail - right_list->tail; } // 考慮以下鏈結串列： // 1' 2 9 4 3 3' // 選 1 作為 pivot，比 pivot 小的鏈結串列長度為 0 (worst case) // *以上程式碼有許多問題，在下方有修正後的程式碼 // 1. 未對 left_list, right_list 初始化, 存取 left_list->tail left_list->tail->next 時會發生 segmentation fault // 2. 未對遞迴結束設定條件 // 3. 當 pivot 為最大或最小值時，left_list 或 right_list 必定有一為 NULL ``` 以下是修正後的程式碼 ```c typedef struct list_item { int value; struct list_item *next; } list_item_t; typedef struct { list_item_t *head; list_item_t *tail; } list_t; void sort(list_t *list, list_cmp_func_t cmp) { if (!list || !list->head) return; list_item_t *pivot = list->head; list_t *left_list = malloc(sizeof(list_t)); list_t *right_list = malloc(sizeof(list_t)); left_list->tail = NULL; right_list->tail = NULL; /* split list to left and right */ list_item_t *curr = list->head->next; while (curr) { if (cmp(curr, pivot) > 0) { if (!left_list->tail) left_list->head = curr; else left_list->tail->next = curr; left_list->tail = curr; } else { if (!right_list->tail) right_list->head = curr; else right_list->tail->next = curr; right_list->tail = curr; } curr = curr->next; } if (left_list->tail) left_list->tail->next = NULL; else left_list->head = NULL; if (right_list->tail) right_list->tail->next = NULL; else right_list->head = NULL; sort(left_list, cmp); sort(right_list, cmp); if (left_list->head) { list->head = left_list->head; left_list->tail->next = pivot; } else list->head = pivot; if (right_list->head) { pivot->next = right_list->head; list->tail = right_list->tail; } else { pivot->next = NULL; list->tail = pivot; } free(left_list); free(right_list); } ``` 以上程式需同時維護 `head` 以及 `tail`，為了檢查 NULL pointer 變得十分攏長，因此以下使用「間接指標」技巧，並移除 `tail` 縮短程式碼。 ```c typedef struct list_item { int value; struct list_item *next; } list_item_t; typedef struct { list_item_t *head; } list_t; void sort(list_t *list, list_cmp_func_t cmp) { if (!list || !list->head) return; list_item_t *pivot = list->head; list_t left_list, right_list; list_item_t **indir_left = &left_list.head; list_item_t **indir_right = &right_list.head; list_item_t *curr = pivot->next; while (curr) { if (cmp(curr, pivot) > 0) { *indir_left = curr; indir_left = &(*indir_left)->next; } else { *indir_right = curr; indir_right = &(*indir_right)->next; } curr = curr->next; } *indir_left = NULL; *indir_right = NULL; sort(&left_list, cmp); sort(&right_list, cmp); if (!left_list.head) { list->head = pivot; list->head->next = right_list.head; } else { list->head = left_list.head; curr = left_list.head; while(curr->next) { curr = curr->next; } curr->next = pivot; pivot->next = right_list.head; } } ``` :::danger 針對特定的操作，提出可重用的程式碼 (inline function 和 macro)，並重新審視 Linux 核心的 `<linux/list.h>`，看哪些函式可用 ::: 以上程式已簡潔許多，但還有多處可改進，例如在將 `left_list` 和 `right_list` 連接時，必須走訪整個 `left_list` 才能找到 `left_list` 的尾端，造成額外的時間開銷。 > 關於走訪 `left_list` 的時間開銷，需留意到先前已走訪過，因此 cache 可以有效減少時間開銷，需要實驗詳細的分析。 :::danger 說好的實驗呢？ ::: 為了快速拿到 `left_list` 的尾端，可以改為使用 circular doubly linked list，已在 $O(1)$ 的時間複雜度拿到尾端。以下為使用 circular doubly linked list 的實作程式碼。 ```c typedef struct list_item { int value; struct list_item *next; struct list_item *prev; } list_item_t; typedef struct { list_item_t *head; } list_t; static void list_append(list_t *list, list_item_t *item) { if (list->head == NULL) { list->head = item; item->next = item; item->prev = item; } else { item->prev = list->head->prev; item->next = list->head; item->prev->next = item; item->next->prev = item; } } void sort(list_t *list, int (*cmp)(list_item_t *, list_item_t *)) { if (!list || !list->head || list->head->next == list->head) { return; } list_item_t *pivot = list->head; list_t left_list = {0}, right_list = {0}; list_item_t *curr = pivot->next; while (curr != pivot) { list_item_t *curr_next = curr->next; if (cmp(curr, pivot) < 0) { list_append(&left_list, curr); } else { list_append(&right_list, curr); } curr = curr_next; } sort(&left_list, cmp); sort(&right_list, cmp); if (left_list.head) { list->head = left_list.head; } list_append(&list, pivot); if (right_list.head) { list->head->prev = right_list.head->prev; list->head->prev->next = list->head; pivot->next = right_list.head; pivot->next->prev = pivot; } } ``` - [ ] quick sort 為什麼 unstable，該如何修改成 stable ，為什麼要 stable quick sort 並不保證穩定性，穩定性指的是對於任兩個元素大小相同時，排序前後相對位置不變，quick sort 會將與 pivot 大小相同的元素統一分到 pivot 的左或右，導致相對位置的變化。但在上方的實作的 quick sort 中皆為 stable，因為選擇最左邊的元素作為 pivot，並將相同的元素分往右邊，因此相同大小元素相對位置不會改變。為保證 stable，可以選擇相同大小中位於最左/最右的元素，並將其餘元素分往右/左，以保證相對位置的不變。或透過紀錄 pivot 索引值，判斷相同大小元素該分往左或右。 - [ ] 如何驗證 stable 驗證 stable 只需確認在元素在大小與 pivot 相同時的行為，確保所有相同元素的相對位置不變。 - [ ] 遞迴的缺點，在 Linux 核心中為什麼不是遞迴，不用遞迴如何實作？每次 <s>遞回</s> 都會產生一個 stack frame，stack fram 包含 parameters, return address, local variables。以上方程式作為範例，stack frame 的大小為 $2 \times 8 + 8 + 5 \times 8 = 64 \; \mathrm{bytes}$ 根據 [kernel.org](https://www.kernel.org/doc/html/next/x86/kernel-stacks.html)，在 kernel 中 stack size 的大小為 `THEAD_SIZE = 2 * PAGE_SIZE`，在 x86-64 上，實際 stack 大小為 8KiB，在呼叫遞回 128 次後會導致 stack overflow。不用遞回可以使用 stack 及 while 取代遞回。 :::danger 為何 Linux 核心採用如此小的 stack 空間？ ::: ```c typedef struct list_item { int value; struct list_item *next; struct list_item *prev; } list_item_t; typedef struct { list_item_t *head; } list_t; static void list_append(list_t *list, list_item_t *item) { if (list->head == NULL) { list->head = item; item->next = item; item->prev = item; } else { item->prev = list->head->prev; item->next = list->head; item->prev->next = item; item->next->prev = item; } } void sort(list_t *list, int (*cmp)(list_item_t *, list_item_t *)) { if (!list || !list->head || list->head->next == list->head) { return; } size_t stack_cap = 1024; list_item_t **stack = malloc(stack_cap * sizeof(list_item_t *)); if (!stack) return; int top = -1; stack[++top] = list->head; list_t sorted_list = {NULL}; while (top >= 0) { list_item_t *curr_head = stack[top--]; if (curr_head->next == curr_head) { list_append(&sorted_list, curr_head); } else { list_item_t *pivot = curr_head; list_t left_list = {NULL}, right_list = {NULL}; list_item_t *curr = pivot->next; while (curr != curr_head) { if (cmp(curr, pivot) < 0) { list_append(&left_list, curr); } else { list_append(&right_list, curr); } curr = curr_next; } pivot->next = pivot; pivot->prev = pivot; if (top + 3 >= stack_cap) { stack_cap *= 2; list_item_t **new_stack = realloc(stack, stack_cap * sizeof(list_item_t *)); stack = new_stack; } if (right_list.head) stack[++top] = right_list.head; stack[++top] = pivot; if (left_list.head) stack[++top] = left_list.head; } } free(stack); list->head = sorted_list.head; } ``` --- ## [Eason0729](https://github.com/Eason0729/) > 找到 linked list 中的元素你的答覆 ```c // find target in list // return NULL if target is NULL or list is NULL // return pointer to NULL if not found list_item_t** find(list_t* l, list_item_t* target){ if (target==NULL) return NULL; if(l==NULL) return NULL; list_item_t** indirect_head=&l->head; while(*indirect_head!=NULL){ if((*indirect_head)->value==target->value) break; indirect_head=&(*indirect_head)->next; } return indirect_head; } ``` :::danger 比較使用間接指標與否，在編譯器最佳化開啟的狀況，產生的組合語言是否有行為上的差異？ ::: > 刪除 linkedlist 中的元素你的答覆 ```c // delete // return true if success bool delete(list_t* l, list_item_t* target, void(* destructor)(/* 有加 const 和沒加的差別 */ const void *)) { list_item_t** to_delete = find(l, target); if (to_delete==NULL) return false; if (*to_delete==NULL) return false; list_item_t* next = (*to_delete)->next; destructor(*to_delete); *to_delete=next; return true; } ``` > 實驗：有加 const 和沒加的差別呼叫函數的差別： ```c int test_one(const char* p) { extern void free_plain(void*); free_plain(p); return 0; } ``` ```c int test_two(char* p) { extern void free_const(const void*); free_const(p); return 0; } ``` :::danger 注意用語 ::: <s>函數實做</s> 的差別： ```c extern int external_read(int* arr, int index); extern void external_write(int* arr, int index, int val); int add_array(int* arr, int n) { int sum = 0; for (int i = 0; i < n; i++) { sum += arr[i]; } return sum; } int add_array_const(const int* arr, int n) { int sum = 0; for (int i = 0; i < n; i++) { sum += arr[i]; } return sum; } void modify_array(int* arr, int index) { int val = external_read(arr, index); external_write(arr, index, val + 1); } void read_array(int* const arr, int index) { int val = external_read(arr, index); external_write(arr, index, val); } ``` 發現只有 compiler warning 的差別 ``` warning_test.c:7:16: warning: passing argument 1 of ‘free_plain’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] 7 | free_plain(p); ``` > https://hackmd.io/@sysprog/c-pointer#%E9%87%9D%E5%B0%8D%E6%8C%87%E6%A8%99%E7%9A%84%E4%BF%AE%E9%A3%BE-qualifier > - `const char* p` 表示指標指向的內容為 const > - `char * const p` 表示指標為 const :::danger 從 C 語言規格書來解讀以上實驗 ::: --- ## [illdoCc](https://github.com/illdoCc) > 下列的 min function 要怎麼加上 underflow 及 overflow 的判斷式，並且以 branchless 的方法實作你的答覆 `gcc -S` ```c int32_t min(int32_t a, int32_t b) { int32_t diff = a - b; return b + (diff & (diff >> 31)); } // a - b < INT32_MIN ``` 參照 C99 規格書 6.5.7.5 提到 right shift 的結果是 implementation defined: >The result of `E1 >> E2` is `E1` right-shifted `E2` bit positions. If `E1` has an unsigned type or if `E1` has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of `E1 / 2^E2`. If `E1` has a signed type and a negative value, the resulting value is implementation-defined. 而 [Re: right shift of signed type - GCC](https://gcc.gnu.org/legacy-ml/gcc/2000-04/msg00152.html) 的討論得知，GCC 當中的 right shift 是使用算術位移: >An arithmetic left shift is equivalent to a logical left shift. For right shifts, if the type is signed, then we use an arithmetic shift; if the type is unsigned then we use a logical. 因此若 `diff >= 0`，`diff >> 31` 的值為 `0x00000000`；而若 `diff < 0`，`diff >> 31` 的值為 `0xFFFFFFFF`。結合來看，若 `a > b`，`diff & (diff >> 31)` 的值為 `0`；`a < b`，`diff & (diff >> 31)` 的值為 `a - b`，因此 `b + (a - b)` 的值即為 `a`。 :::danger 參照 C 語言規格書，來解讀上述行為 ::: 以下是不用到 if else 的操作方式，不過這樣的 code 很沒有品味，可讀性差，並且使用到 && 跟 || 也不能算 branchless。 ```c= int32_t min(int32_t a, int32_t b) { int32_t diff = a - b; // 利用該變數檢查是否有 overflow 的情形(underflow 也算在內) int32_t overflow_mask = 0; // underflow overflow_mask = a < 0 && b > 0 && diff > 0; // overflow overflow_mask = overflow_mask || (a > 0 && b < 0 && diff < 0); // 先 left shift 再 right shift，把 1(0x00000001) 變為 -1(0xFFFFFFFF)，0 不變 overflow_mask <<= 31; overflow_mask >>= 31; return overflow_mask & (a - (diff & (diff >> 31))) | ~overflow_mask & (b + (diff & (diff >> 31))); } ``` 檢視上述程式，我認為第 14 行 overflow_mask 用法是正確的，因此需重新考量如何在 underflow、overflow 情形發生時，把 `overflow_mask` 用單純的 bitwise operator 設為 0xFFFFFFFF；而在一般情形則是 0x00000000。和 ascodeasice 同學討論後，發現完全可以將上述的 logical operator 改為 bitwise operator 的寫法: ```c int32_t min(int32_t a, int32_t b) { int32_t diff = a - b; int32_t underflow, overflow, mask; underflow = (a >> 31) & ~(b >> 31) & ~(diff >> 31); //對應到上述第 7 行 overflow = ~(a >> 31) & (b >> 31) & (diff >> 31); // 對應到上述第 9 行後半段 mask = underflow | overflow; // 對應到上述第 9 行前半段 return mask & (a - (diff & (diff >> 31))) | ~mask & (b + (diff & (diff >> 31))); } ``` :::danger 改寫為更簡潔的形式 ::: 上述程式碼略為冗長，重新觀察 underflow 跟 overflow 這兩種情況，會發現 `a` 和 `b、diff` 皆為異號，而 `b` 跟 `diff` 皆同號： - underflow: `a < 0` 且 `b > 0` 且 `diff > 0` - overflow: `a > 0` 且 `b < 0` 且 `diff < 0` 因此就可利用這個特性，用 xor 一次把這兩種情形考慮進來，不用額外區分： ```c mask = ((a >> 31) ^ (diff >> 31)) & ~((b >> 31) ^ (diff >> 31)); ``` 可進一步簡化為： ```c mask = ((a ^ b) & (a ^ diff)) >> 31; ``` 最後一行的 `return mask & (a - (diff & (diff >> 31))) | ~mask & (b + (diff & (diff >> 31)));` 也可簡化為 `return (b + (diff & (mask ^ (diff >> 31))));`，以下為實際程式: ```c int32_t min(int32_t a, int32_t b) { int32_t diff = a - b, mask; mask = ((a ^ b) & (a ^ diff)) >> 31; return (b + (diff & (mask ^ (diff >> 31)))); } ``` > 在運行 1000 萬次 context switch 並測量其耗費時間時，期間可能會有其他程式產生的中斷，導致測量出的時間不準確，要用哪種統計概念才有辦法統計出確切時間? 使用經驗分布函數(empirical CDF)，統計出 context switch 在各個耗費時間下的比例。若想得知理想情況下的 context switch 占用時間，可以查看最左端，沒被其他雜訊干擾的花費時間。 :::danger 提出數學模型和實際在 Linux 環境可測量的方案，並予以施行 ::: 令 $T_{cs}$ 為 context switch 的實際運行時間，$T_{noise}$ 為其他程式產生的中斷所造成的干擾，可得出總測量時間的公式為 $T_{measure} = T_{cs} + T_{noise}$，其中 $T_{noise} \ge 0$，因此可推導出 $T_{measure} \ge T_{cs}$。當進行 $N=10,000,000$ 次的測量時，會得到集合 $\{T_1, T_2, ..., T_N\}$，將集合中的時間繪製於二維系統的 $x$ 軸，資料筆數繪製於 $y$ 軸，即可得到 empirical CDF。因為 $T_{noise}$ 是一個大於等於 $0$ 的隨機變數，在沒有發生中斷的理想情況下，$T_{measure}=T_{cs}$。所以當樣本數 $N$ 夠大時，取 empirical CDF 當中的最小值，就會逼近於 $T_{cs}$。在 Linux 環境下，要測量 context switch 可以建立兩個 process，讓他們互相傳遞一個 byte 的資訊，製造出 context switch，並利用 `perf` 去測量每次 context switch 的時間，此時就可以得到 $T_{measure}$，也就可以畫出 empirical CDF。 [gist](https://gist.github.com/illdoCc/e8f2f836429f9a8bf4ae993c63f25caf) 當中是產生 context switch 的程式。下圖是利用 `sudo perf record -e sched:sched_switch ./file`，並將其輸出的數值畫出來的 empirical CDF: ![ecdf_plot](https://hackmd.io/_uploads/S1lXYWtjZe.png) 其中，最小值也就是最接近理想的 context switch 的時間為 1235 ns。 --- ## [sakinu](https://github.com/sakinu/) > 下列程式在做什麼，有沒有哪裡寫得不好或可以精簡？ ```c= void remove_if(list_t *l, int value) { list_item_t **p = &l->head; while (*p) { if ((*p)->value == value) { list_item_t *del = *p; *p = (*p)->next; free(del); } else { p = &(*p)->next; } } } ``` 這段程式利用指標的指標依序走訪以 `l->head` 為第一個 node 的鏈結串列，並將 value 為目標值的 node 刪除。此程式無法透過指標的指標減少需要的特判，因為第 7 行和第 10 行代表的是在應對兩種不同的狀況時所需的兩種不同操作。 :::danger 改進你的漢語表達，避免「無法再更優雅的精簡」這種無助於溝通的說法。 ::: 目前想到有可能不好的地方是 1. 這個函式 while 的條件，導致若鏈結串列是環狀的，則存在會導致無窮迴圈的 case（只要存在某個 node 的 value 不是要刪除的目標 value） 2. remove_if 這個函式名稱不夠好，第一是無法看出 if 的條件，第二是沒有預留給其他條件的名稱，例如刪除所有 value 小於目標 value 的 node 或刪除 value > 目標 value 的 node。解決方法是更好的命名方式或用 function pointer 來定義 comparator function。 > 有品味的程式是什麼？ :::danger 程式碼呢？ ::: linus 在 [此影片](https://www.youtube.com/watch?v=o8NPllzkFhE) 中有用兩段程式簡單示範有無品味的差別 ```c= // Bed taste remove_list_entry(entry) { prev = NULL; walk = head; // Walk the list while (walk != entry) { prev = walk; walk = walk->next; } // Remove the entry by updating the // head or the previous entry if (!prev) head = entry->next; else prev->next = entry->next; } ``` ```c // Good taste void remove_list_node(List *list, Node *target) { // The "indirect" pointer points to the *address* // of the thing we'll update. Node **indirect = &list->head; // Walk the list, looking for the thing that // points to the node we want to remove. while (*indirect != target) indirect = &(*indirect)->next; *indirect = target->next; } ``` 兩份程式的最大差異是，後者利用指標的指標，減少了前者在 12~15 行需要特判的狀況 linus 在影片後面還有提到： ```sometimes you can see a problem in a different way and rewrite it so that special case goes away and becomes normal case and that's good code.``` 而有品味的人： ```good taste is really seeming the big patterns and kind of instinctively knowing what's the right way to do things.``` 總的來說： --- ## [I3IRI3IR](https://github.com/I3IRI3IR) >求 linked list 中第 $k$ 大的節點一開始題目保証了各個節點的值兩兩相異，在問答時老師提到可以用 heap，於是我想到可以在走訪整個鍊結串列的同時維護一個大小為 $k$ 的 min heap，若是目前的節點小於這個 min heap 目前的最小值，則表示這個節點不可能為第 $k$ 大的節點(已經有至少 $k$ 個節點比他大了)，那麼就直接跳過他，否則就把他加進 min heap，然後維持 min heap 內的節點數為 $k$ 個，如此當我們完成走訪後，min heap ~~彈出~~ pop 的第一個節點就是第 $k$ 大的節點。 :::danger 使用精準的詞彙，避免用「彈出」這類含糊的說法 ::: 但是如果使用了上述版本的實作的話，在最差的情況下我們在走訪每個節點的時候都需要進行 heap_push()，那時間複雜度便是 $O(nlogk)$。如果我們捨棄 min heap，而是想到用快速排序中分組的想法，每次選定一個 pivot，接著把結點們分組後，我們不需要真的進行整個鍊結串列的快速排序，只需要關注有第 $k$ 大的節點的那一邊即可，乍看之下看似需要進行 $logn$ 次分組，從而得到時間複雜度為 $O(nlogn)$ 的結論，但其實每次分組後需要繼續分組的節點也會變少，實際上的複雜度是 $O(n)$。要做的比較次數可以用每個節點被選作 pivot 的機率乘上他被選中後需要的操作數，表示為 $$ T(n) \leq n + \frac{1}{n}\Sigma_{i=0}^{n-1}T(max(i,n-1-i)) $$ 為了使用歸納法來證明，我們假設 $T(m) \leq \alpha m$ 在 $m<n$ 的情況下皆成立。以下分別討論 $n$ 為奇數或偶數的情況 $n$ 為偶數，令 $n=2m$ $\Sigma_{i=0}^{2m-1}T(max(i,n-1-i))\leq\Sigma_{i=0}^{2m-1}max(i,n-1-i)$ (根據歸納假設) $\Sigma_{i=0}^{2m-1}max(i,m-1-i)=2\Sigma_{i=0}^{m-1}(2m-1-i)=2\Sigma_{j=m}^{2m-1}j=2\times\frac{(m+(2m-1))m}{2}=3m^2-m$ $3m^2-m=\frac{3}{4}n^2+\frac{n}{2}$ $n$ 為奇數，令 $n=2m+1$ 經由類似的計算可得 $\Sigma_{i=0}^{2m}T(max(i,n-i))=\frac{3}{4}n^2+\frac{n}{2}+\frac{1}{4}$ 回到原本的式子 $T(n) \leq n + \frac{1}{n}\Sigma_{i=0}^{n-1}T(max(i,n-1-i))$ 可以用一個更高的上界來估計 $T(n) \leq n + \frac{1}{n}\times\frac{3\alpha}{4}n^2$ 選取 $\alpha > 4$ 即可，故得證。實作前我覺得，我們好像可以一直使用快速排序來解決這種找第 $k$ 大的問題，但實作之後我發現可能要端看問題發生的場景決定，若是這種找第 $k$ 大的問題會蒐集一段時間再一次解完，min heap 的解法就會有效率很多，因為我們相當於離線在解決這些問題，可以解最大的 $k$，我們會得到 $k$ 以內的問題的答案，之後只要一直 pop 就好了，而 quick select 各個問題之間沒有關聯，對於每個 $k$ 都需要解決一次問題；或是我們查詢的 $k$ 總是固定的，但是節點會不斷增加時，一開始建好的 min heap 之後可以一直用，又或是 $k$ 非常小或非常大時，不需要遞迴的做法可能常數較小使 $O(nlogk)$ 表現好過 $O(n)$。 :::danger 確認上述是否為 optimal，為此需要更多證明，先建立對應的數學模型。 ::: 不確定這邊的數學模型的意思，也許是希望先假設一些基礎條件？那麼以下的討論建立在所有的比較操作需要的時間一樣，並且只關注於各次獨立的查詢的平均期望比較次數(各次查詢之間的結果不能保存)。首先根據之前的回答，我們已經可以知道 $O(n)$ 具體上可以做到期望 $4n$ 以下，但是要證明是否已經是最佳策略或接近最佳策略，我打算先捨棄上一個寬鬆的上界估計，然後估計這個問題的理論下界，如果兩者足夠接近，則我們提到的快速選擇算法就是一個可以算是 optimal 的策略。捨棄估計把遞迴式完整地展開的話，可以表示為 $F(n,k)=(n+1)+\frac{1}{n}(\Sigma_{j=0}^{n-2}F(n-1-j,k-1-j)+\Sigma_{j=k}^{n-1}F(j,k))$ [min heap 版本實作](https://github.com/I3IRI3IR/find_kth_node/blob/main/minheap.c) [quick select 版本實作](https://github.com/I3IRI3IR/find_kth_node/blob/main/quickslelect.c) ## x807x ```c #include <stdint.h> #include <stdio.h> struct packet { uint32_t len; uint8_t data[1]; }; int main() { printf("%ld\n", sizeof(struct packetg)); } // flexible array member ``` C99 sizeof() is an operator 4 + 8 (pointer) ## rota1001 為什麼網路封包不一定是完整的？怎麼定義一個網路封包是由解析者決定的，譬如說在傳輸層可以以 TCP 或 UDP 的封包格式去將一段資料解析成一個封包。而對於更底層的協定來說，上層的協定是沒有意義的，對它來說只是一段資料而已，所以根本不需要去理會一段連續的資料是否有截斷上層的封包。 ## deantee ### About Type Punning and Strict Aliasing C99 §6.5p7 requires that an object stored value be accessed only through an lvalue of a compatible type, a corresponding signed/unsigned type, an aggregate containing such a type, or a *character type*. `unsigned char` qualifies under that description, so it may legally alias any object. `memcpy` is safe becuase its implementation accesses bytes through a character-typed lvalues, never through a lvalue of the wrong effective type. :::danger Illustrate with C standard. ::: ```clike #include <stdint.h> #include <stdio.h> #include <string.h> _Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits for complete punning"); /* UB: violating strict aliasing */ uint32_t pun_via_union(float f) { union { float f; uint32_t u; } pun; pun.f = f; return pun.u; } /* safe */ uint32_t pun_via_memcpy(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; } /* UB: violating strict aliasing */ uint32_t pun_via_pointer_cast(float f) { return *(uint32_t*)&f; } int main(void) { const float f = 3.14f; /* UB: coincidentally correct here */ printf("%u\n", pun_via_union(f)); printf("%u\n", pun_via_memcpy(f)); /* UB: coincidentally correct here */ printf("%u\n", pun_via_pointer_cast(f)); } ``` Here's a diff of assemblies generated at `-O0` and at `-O2`: ```diff ... pun_via_union: -.LFB0: +.LFB11: .cfi_startproc - push rbp - .cfi_def_cfa_offset 16 - .cfi_offset 6, -16 - mov rbp, rsp - .cfi_def_cfa_register 6 - movss DWORD PTR -20[rbp], xmm0 - movss xmm0, DWORD PTR -20[rbp] - movss DWORD PTR -4[rbp], xmm0 - mov eax, DWORD PTR -4[rbp] - pop rbp - .cfi_def_cfa 7, 8 + movd eax, xmm0 ret .cfi_endproc ... pun_via_memcpy: -.LFB1: +.LFB16: .cfi_startproc - push rbp - .cfi_def_cfa_offset 16 - .cfi_offset 6, -16 - mov rbp, rsp - .cfi_def_cfa_register 6 - sub rsp, 32 - movss DWORD PTR -20[rbp], xmm0 - mov rax, QWORD PTR fs:40 - mov QWORD PTR -8[rbp], rax - xor eax, eax - mov eax, DWORD PTR -20[rbp] - mov DWORD PTR -12[rbp], eax - mov eax, DWORD PTR -12[rbp] - mov rdx, QWORD PTR -8[rbp] - sub rdx, QWORD PTR fs:40 - je .L5 - call __stack_chk_fail@PLT -.L5: - leave - .cfi_def_cfa 7, 8 + movd eax, xmm0 ret .cfi_endproc ... pun_via_pointer_cast: -.LFB2: +.LFB18: .cfi_startproc - push rbp - .cfi_def_cfa_offset 16 - .cfi_offset 6, -16 - mov rbp, rsp - .cfi_def_cfa_register 6 - movss DWORD PTR -4[rbp], xmm0 - lea rax, -4[rbp] - mov eax, DWORD PTR [rax] - pop rbp - .cfi_def_cfa 7, 8 + movd eax, xmm0 ret .cfi_endproc ... ``` At `-O0`, the three functions differ. At `-O2`, all three reduced to `movd eax, xmm0`. They all became the same thing after optimization, meaning no UB with compiling with compiler optimization. This should be a coincidence, only the `pun_via_memcpy` is guaranteed by the standard. :::danger Check assembly instructions generated with and without compiler optimizations. ::: ### About RMW (Read, Modify, Write） Both read and write operation induce segmentation when accessing `0` (`NULL`) Memory operations like `memset` and `memcpy` are operating on raw bytes via `unsigned char` ```clike #include <stddef.h> /* a naive implementation of `memset` from `string.h` */ void memset_naive(void* dst, unsigned char val, size_t size) { unsigned char* ptr = dst; while (size--) { *ptr++ = val; } } /* a naive implementation of `memcpy` from `string.h` */ void memcpy_naive(void* dst, const void* src, size_t size) { const unsigned char* ptr_src = src; unsigned char* ptr_dst = dst; while (size--) { /* operator precedence: postfix increment (modify) > dereference (read) > * assignment (write) */ *ptr_dst++ = *ptr_src++; } } ``` ### About MISRA-C MISRA-C is a separate set of C standards from ISO C standards ### About Circular Linked List - Dummy: avoid handling edge cases where the list is empty - Circular/Doubly: to access the last item of the list in `O(1)` by accessing the "previous" pointer from the head - Applications of these kind of linked lists in Linux kernel? Queues, schedulers. ### References // C99 6.5 §7 的 allowed access --- ## 反思飛蛾撲火的原理影片〈[我們仍未知道飛蛾為何撲火](https://youtu.be/89hK5DhZpt4)〉回顧近年生物學與生物力學研究並指出，昆蟲圍繞燈光飛行的行為，並非單純的趨光或導航錯誤，而與飛行姿態控制機制受到光源干擾有關。這些研究多建立於高速攝影、三維軌跡追蹤以及姿態分析等手法之上，使研究者能同時觀察昆蟲的飛行路徑與身體姿態。過去的觀測方法通常只能記錄昆蟲的平面飛行軌跡，因此理論多集中於「導航錯誤」的解釋。近年研究則開始分析昆蟲在空中的三維姿態，例如俯仰角(pitch)、橫滾角 (roll) 與偏航角 (yaw)。昆蟲在燈光附近往往出現翻滾、側傾與倒飛等姿態變化，而非僅僅沿某條曲線接近光源。正是這些姿態資料，使傳統論點逐漸受到挑戰。在傳統解釋中，最常被引用的概念稱為 [transverse orientation](https://en.wikipedia.org/wiki/Transverse_orientation)，可理解為「利用遠方光源維持固定方位角」。夜行昆蟲可能利用月光或星光作為方向參考。由於這些天體距離極遠，照射到地面的光線近似平行。若昆蟲在飛行時始終保持與光線固定夾角，則其飛行方向理論上會保持直線。在此假設下，人造光源被視為造成導航錯誤的因素。燈泡或火焰距離極近，其光線呈放射狀。若昆蟲仍試圖與光線保持固定角度，則在極座標系中其路徑會滿足 $$ r = a e^{b\theta} $$ 其中 $r$ 表示昆蟲與光源之間的距離，$\theta$ 為旋轉角度，這正是對數螺線，也稱等角螺線。數學上可證明，若運動方向與指向光源的方向保持固定角度，軌跡必然為此類螺旋，並逐漸收斂到光源位置。由於這個模型在幾何與動力學上完全一致，因此長期被視為合理解釋。 ``` 平行光（月光） → → → → → → → → → → → moth \ \ α \ ------------------------------ 直線飛行點光源（燈泡） * /| / | / | / α | / | o-----+ o : 昆蟲位置 * : 光源軌跡（俯視圖） * / / / o \ \ \ \ 逐漸收斂到光源 ``` 然而高速攝影與三維追蹤資料顯示，昆蟲在燈光附近的飛行並不呈現等角螺線。實際軌跡通常高度不規則，包含翻滾、急轉與短暫停滯。研究也發現，昆蟲對遠距離光源的反應有限，例如數公尺到十公尺之外時多數昆蟲並不顯著接近光源。這些觀測結果使研究者重新檢視飛行姿態本身，而不僅僅是平面軌跡。新的研究指出，關鍵機制與所謂的背光反應 (dorsal light response) 有關。許多飛行昆蟲具有一種基本姿態控制策略，即讓背部朝向最亮的方向。在自然環境中，夜晚最亮的區域通常是天空，因此此反射行為能協助昆蟲維持身體直立並穩定飛行。 ``` 天空 ↑ │ 光 │ (背) ── / \ / \ ← 翅膀 ↑ Lift │ │ ● 昆蟲 ↓ Gravity 背部朝向天空時，Lift ≈ −Gravity，昆蟲可維持穩定飛行 ``` 從控制理論角度看，天空光場相當於一個視覺姿態參考系。當光源位於上方時，昆蟲的飛行控制系統會將背部朝向光源，使升力向量保持大致向上。然而在人造照明環境中，局部強光會破壞原本的光場結構。昆蟲會將側方或下方的燈泡誤判為天空方向，於是持續調整姿態，使背部朝向該光源。當燈光位於側方時，昆蟲會產生側傾或翻滾。由於翅膀產生的升力大致垂直於背部方向，姿態偏轉會使升力向量偏離垂直方向。一部分升力轉化為水平分量，使昆蟲圍繞光源運動。從力學觀點看，這個水平分量提供類似向心力的效果，使昆蟲形成環繞飛行。其運動方式與飛機側傾轉彎的原理相似。 ``` * 光源 → (背) ── / \ / \ ↗ Lift / / ● ↓ Gravity ``` > 升力被分解為 - 垂直分量: 維持高度 - 水平分量: 形成向心力若昆蟲從燈光下方飛過，情況可能更加不穩定。為使背部朝向光源，昆蟲可能逐漸後仰甚至翻轉。當身體接近上下顛倒時，翅膀產生的升力方向也會隨之翻轉。升力不再抵消重力，而可能與重力同向。此時飛行穩定性迅速惡化，昆蟲容易失速並墜落。這種姿態失控常被高速攝影觀察到。 ``` * 燈光 ↓ 背 \ / \ / \/ ● ↓ Lift ↓ Gravity ``` > Lift 與 Gravity 同向，失去抵消重力的能力，致使昆蟲快速墜落為理解此機制，還需要考慮昆蟲感測系統的限制。大型動物可以依賴前庭系統中的加速度感受器辨識重力方向，但昆蟲體型極小。飛行時翅膀拍動與氣流擾動會產生頻繁加速度變化。這些瞬時加速度所產生的慣性力可能與重力同量級，使重力訊號難以穩定辨識。若以工程術語描述，可說重力訊號的訊噪比相當低。因此昆蟲多依賴視覺線索，例如天空亮度分布或偏振光圖樣來判定姿態方向。這種策略在自然環境中非常可靠，但在人造照明普及的環境中容易受到干擾。該機制與航空事故中著名的[空間迷向](https://en.wikipedia.org/wiki/Spatial_disorientation)現象高度相似。空間迷向指飛行員在缺乏可靠視覺參考時，身體感知系統對姿態產生錯誤判斷。其根源同樣來自於重力與加速度之間的物理混淆。人類的前庭系統依靠內耳中的耳石與半規管感測加速度。然而對內耳而言，重力與加速度在物理上並不可區分。根據等效原理，身體感受到的是兩者的合力向量。當飛機進行強烈加速或轉彎時，慣性力可能掩蓋原本的重力方向，使人體直覺判斷失效。在夜間或雲霧中飛行時，飛行員往往失去地平線等視覺參考。此時大腦只能依賴已受干擾的前庭系統，於是容易產生錯覺。例如在緩慢傾斜轉彎時，若轉動速度低於內耳的感知閾值，飛行員可能感覺自己仍在平飛。當高度逐漸下降時，本能反應可能是拉起機頭而非修正傾角，最終導致螺旋下降。另一常見錯覺發生於強烈加速時：當飛機快速前進，耳石會因慣性向後移動，大腦可能將此解讀為機頭正在急速抬升。飛行員若依循直覺向前壓低機頭，反而可能使飛機迅速下降。這種錯覺在夜間起飛或海面上空飛行時尤為危險。 > 延伸閱讀: [30 年內奪走 10 條生命！F-16 再失事，飛官失聯前喊的「空間迷向」是什麼？](https://www.cw.com.tw/article/5139282) 從物理與生理機制看，飛蛾撲火與空間迷向其實是同一問題的不同尺度版本。兩者皆源於生物感測系統在極端環境中受到干擾，導致姿態判斷出現錯誤。昆蟲依賴光場作為姿態參考，人類依賴內耳與視覺地平線，但在特殊環境下兩者都可能被欺騙。差異在於，人類可藉由儀器修正感知錯誤。航空訓練中其一要點是在缺乏視覺參考時，信任姿態儀等飛行儀表，而非依賴身體直覺。這種方法相當於以工程裝置取代生物感測器，使飛行控制回到可靠的物理參考系。飛蛾撲火與空間迷向揭示同一個演化限制。生物的感測與控制系統是在自然環境中逐步形成，當環境條件發生劇烈改變時，這些系統可能出現錯誤。昆蟲面對人工光源，人類面對高速飛行與高加速度，其本質皆是生物感知系統與現代人工環境之間的落差。理解該機制不僅解釋昆蟲行為，也具有實際應用價值。例如夜間照明設計若能減少向上或側向散射的光線，並控制光束方向與波長分布，可降低對昆蟲的干擾。這些研究顯示，對生物感測機制的理解，能為工程設計與生態保護提供重要的科學基礎。 ## [patata0717](https://github.com/patata0717) Question: When will Quicksort has worst-case complexity? How to deal with it? Answer: reverse-ordered, nearly sorted, all same value :::danger 提供數學分析和證明 ::: QuickSort 遞迴式[已訂正]： $$T(n) = T(k) + T(n - k - 1) + O(n)$$ Worst case 發生在每次都是 $$T(n) = T(n-1) + T(0) + O(n)$$ $$T(n) = O(n) + O(n-1) + O(n-2)... + O(1)$$ Which is O(n^2) 而nearly sorted, reverse-ordered 如果 pivot 的選擇是最後一個index(或第一個)的話，都會造成 partition 會分成1和 n - 1，造成 worst case。而 all same value 的情況下，無論(≦, >)或(≧, <)，都會分至同一邊，導致 partition 的結果是1和 n - 1。 Program: input list list, detect which scenerio we should use quicksort, and switch to other sorting algorithm(insertion sort). [complete code](https://codeshare.io/G6owkm) My wrapper ```c void Analyze_list_for_sorting(char **list, int n) { // 1. Check size if (n <= 16) { InsertionSort(list, n); return; } // 2. Check sortedness by counting out-of-order pairs int out_of_order_count = 0; for (int i = 0; i < n - 1; i++) { if (strcmp(list[i], list[i + 1]) > 0) { out_of_order_count++; } } // print out of order count printf("Out of order pairs: %d out of %d, %d%%\n", out_of_order_count, n, (out_of_order_count * 100) / n); if (out_of_order_count < n / 10 ) { InsertionSort(list, n); } else if (out_of_order_count > 9 * n / 10) { InsertionSortRev(list, n); } else { QuickSort(list, 0, n - 1); } } ``` My wrapper is to traverse the list 1 time, and count reversed-order cases (a[i] > a[i+1]), if out-of-order is bigger than 90%, means the list tend to be in reverse-ordered, perform InsertionSortRev(), if it is smaller than 10%, perform InsertionSort(), otherwise, perform QuickSort(), the modified Quicksort improve in bost worst case(reverse-ordered, nearly-sorted) - Experiment results testcase size: 2560 vocabularies sorting for 1000 times monitor CPU time | | Randomly- | Nearly- | Reversed- | |--- |---|---|---| | Quick | 0.000235487 | 0.000478914 | 0.00628214 | | Insert | 0.00458313 | 0.000148434 | 0.00864014 | | Combined | 0.000259702 | 0.000168939 | 2.59891e-05 | | Combined using | Quick | Insert | InsertRev | | Out-of-order | 50% | 2% | 99% | | Improved? | - | Yes | Yes | We can see that, the improved Quicksort perform better in both special case. - Mathemetical prove[已訂正] [ref for the recursion equation](https://kopu.chat/%E5%BF%AB%E9%80%9F%E6%8E%92%E5%BA%8F-quick-sort/#:~:text=T(n)%20%3D%20c*n%20%2B%20T(n%2D1)%20%2B%20T(0)%2C%20c*n%20%E7%82%BA%E5%88%87%E5%89%B2%E6%99%82%E9%96%93) 我希望 out-of-order 這個指標能衡量insertion sort(selection sort)和 QuickSort 的時間。並決定何時該用 insertion sort。但實際上 out-of-order 並不能影響 QuickSort 和 insertion sort。比如[1 3 5 7 9 2 4 6 8 10]，這個 sequence 的 out-of-order 數是1(10%)，但在 QuickSort 每次都選最後一個 pivot 的情況，並不會是 worst case。同樣，insertion sort 也不會是O(n)。我做的測資僅是隨機挑幾個數組左右交換，並無考慮 out-of-order 可以是上面的狀況，所以才會得出此結論。 :::danger 不嚴謹，用統計模型來分析 ::: Comments from jserv: We should achieve log n to detect unpleasant case for Quicksort, using introsort(quicksort + heapsort), which is the STL Sort(). Modification: From STL implementation, IntroSort, depth limit is $2*log_2(N)$, which is 23 in our case, if depth is reach the limit. switch to heap sort. ```c void IntroSort(std::vector<int>& arr, int left, int right, int depthLimit) { // Base case: if pointers cross, stop recursion if (left >= right) { return; } int size = right - left + 1; // 1. Array is small enough: use Insertion Sort for optimal constant time if (size <= 16) { InsertionSort(arr, left, right); return; } // 2. Recursion is too deep: switch to Heap Sort to guarantee O(N log N) if (depthLimit == 0) { HeapSort(arr, left, right); return; } // 3. QuickSort logic: partition the array int pivotIndex = Partition(arr, left, right); // 4. Recurse on both halves, strictly decrementing the depth limit IntroSort(arr, left, pivotIndex - 1, depthLimit - 1); IntroSort(arr, pivotIndex + 1, right, depthLimit - 1); } ``` testcase size: 2560 vocabularies sorting for 1000 times monitor CPU time | | Randomly- | Nearly- | Reversed- | |--- |---|---|---| | Quick | 0.000235487 | 0.000478914 | 0.00628214 | | Insert | 0.00458313 | 0.000148434 | 0.00864014 | | Combined | 0.000259702 | 0.000168939 | 2.59891e-05 | | Intro | 0.000316616 | 0.000259627 | 0.000280069 | We can see that, IntroSort didn't affected by both worst case, achieve slightly slower than Quicksort for randomly-distributed data. Quiestion: When mergesort has worst-case complexity? Answer: The order of data affects the efficiency. Because mergesort compare 2 list, A and B, starting from head of the list, if A = [1, 2, 3, 4], B = [5, 6, 7, 8], it will go through A[0], A[1], A[2], A[3], and done, cost lenght-of-A steps, if A = [1, 3, 5, 7], B = [2, 4, 6, 8], it will go through A[0], B[0], A[1], B[1], A[2], B[2], A[3], totoal A+B-1. So we should make the data as "interlaced" as possible. Thus we can achieve A+B-1 for all comparison, thus N - 2^(depth-1) for each level, total (N - 1) + (N - 2) + (N - 4)...+(N - 2^(depth-1), depth is of O(log N), so O(N log N) - number of merges. Best case is (N/2) for each level, still O(N log N) Worstcase: N log N - N Bestcase: N log (1/2 * N) - Tim sort: If the data is interlaced, still merge 1-by-1, if the data is in-ordered, using binary search. 除了在教科書出現的經典排序演算法，如 merge sort 和 quick sort，在諸多軟體會採用截長補短的策略，融合兩項或更多排序演算法，這樣的案例包含 [Timsort](https://en.wikipedia.org/wiki/Timsort) 和 [Introsort](https://en.wikipedia.org/wiki/Introsort)。 [Introsort](https://en.wikipedia.org/wiki/Introsort) 可看做 quick sort 的強化，遞迴分割陣列，區間越來越短，數字也幾乎排序好。對於幾乎已經排序好的短區間，quick sort 表現不佳，於是切換為 heapsort。分水嶺通常設定成 $\log{N^2} = 2 \times log{N}$，`N` 是輸入資料的長度。之前學員已嘗試實作 [Introsort](https://en.wikipedia.org/wiki/Introsort)，可見: * [ccs100203](https://hackmd.io/@cccccs100203/linux2021-quiz1) * [hankluo6](https://hackmd.io/@hankluo6/2021q1quiz1) 以下是參考的 [Introsort](https://en.wikipedia.org/wiki/Introsort) 實作: ```c /** * introsort is comprised of: * - Quicksort with explicit stack instead of recursion, and ignoring small * partitions * - Binary heapsort with Floyd's optimization, for stack depth > 2log2(n) * - Final shellsort pass on skipped small partitions (small gaps only) */ #include <limits.h> #include <stdint.h> #include <string.h> typedef int (*comparator_t)(const void *, const void *); typedef struct { char *low, *high; } stack_node_t; #define idx(x) (x) * data_size /* manual indexing */ #define STACK_SIZE (sizeof(size_t) * CHAR_BIT) /* size constant */ static inline int __log2(size_t x) { return 63 - __builtin_clzll(x); } #define WORD_ALIGNED 1 static inline void swap(void *restrict _a, void *restrict _b, size_t data_size) { #if WORD_ALIGNED long *a = (long *) _a, *b = (long *) _b; while ((data_size--) / sizeof(long) > 0) { long tmp = *a; *a++ = *b; *b++ = tmp; } #else char *a = (char *) _a, *b = (char *) _b; while (data_size-- > 0) { char tmp = *a; *a++ = *b; *b++ = tmp; } #endif } void sort(void *_array, size_t length, size_t data_size, comparator_t comparator) { if (length == 0) return; char *array = (char *) _array; const size_t max_thresh = data_size << 2; const int max_depth = __log2(length) << 1; /* Temporary storage used by both heapsort and shellsort */ char tmp[data_size]; if (length > 16) { char *low = array, *high = array + idx(length - 1); stack_node_t stack[STACK_SIZE] = {0}; stack_node_t *top = stack + 1; int depth = 0; while (stack < top) { /* Exceeded max depth: do heapsort on this partition */ if (depth > max_depth) { size_t part_length = (size_t)((high - low) / data_size) - 1; if (part_length > 0) { size_t i, j, k = part_length >> 1; /* heapification */ do { i = k; j = (i << 1) + 2; memcpy(tmp, low + idx(i), data_size); while (j <= part_length) { if (j < part_length) j += (comparator(low + idx(j), low + idx(j + 1)) < 0); if (comparator(low + idx(j), tmp) <= 0) break; memcpy(low + idx(i), low + idx(j), data_size); i = j; j = (i << 1) + 2; } memcpy(low + idx(i), tmp, data_size); } while (k-- > 0); /* heapsort */ do { i = part_length; j = 0; memcpy(tmp, low + idx(part_length), data_size); /* Floyd's optimization: * Not checking low[j] <= tmp saves nlog2(n) comparisons */ while (j < part_length) { if (j < part_length - 1) j += (comparator(low + idx(j), low + idx(j + 1)) < 0); memcpy(low + idx(i), low + idx(j), data_size); i = j; j = (i << 1) + 2; } /* Compensate for Floyd's optimization by sifting down * tmp. This adds O(n) comparisons and moves. */ while (i > 1) { j = (i - 2) >> 1; if (comparator(tmp, low + idx(j)) <= 0) break; memcpy(low + idx(i), low + idx(j), data_size); i = j; } memcpy(low + idx(i), tmp, data_size); } while (part_length-- > 0); } /* pop next partition from stack */ --top; --depth; low = top->low; high = top->high; continue; } /* 3-way "Dutch national flag" partition */ char *mid = low + data_size * ((high - low) / data_size >> 1); if (comparator(mid, low) < 0) swap(mid, low, data_size); if (comparator(mid, high) > 0) swap(mid, high, data_size); else goto skip; if (comparator(mid, low) < 0) swap(mid, low, data_size); skip:; char *left = low + data_size, *right = high - data_size; /* sort this partition */ do { while (comparator(left, mid) < 0) left += data_size; while (comparator(mid, right) < 0) right -= data_size; if (left < right) { swap(left, right, data_size); if (mid == left) mid = right; else if (mid == right) mid = left; left += data_size, right -= data_size; } else if (left == right) { left += data_size, right -= data_size; break; } } while (left <= right); /* Prepare the next iteration * Push larger partition and sort the other; unless one or both * smaller than threshold, then leave it to the final shellsort. * Both below threshold */ if ((size_t)(right - low) <= max_thresh && (size_t)(high - left) <= max_thresh) { --top; --depth; low = top->low; high = top->high; } /* Left below threshold */ else if ((size_t)(right - low) <= max_thresh && (size_t)(high - left) > max_thresh) { low = left; } /* Right below threshold */ else if ((size_t)(right - low) > max_thresh && (size_t)(high - left) <= max_thresh) { high = right; } else { /* Push big left, sort smaller right */ if ((right - low) > (high - left)) { top->low = low, top->high = right; low = left; ++top; ++depth; } else { /* Push big right, sort smaller left */ top->low = left, top->high = high; high = right; ++top; ++depth; } } } } /* Clean up the leftovers with shellsort. * Already mostly sorted; use only small gaps. */ const size_t gaps[2] = {1ul, 4ul}; int i = 1; do { if (length < 4) continue; for (size_t j = gaps[i], k = j; j < length; k = ++j) { memcpy(tmp, array + idx(k), data_size); while (k >= gaps[i] && comparator(array + idx(k - gaps[i]), tmp) > 0) { memcpy(array + idx(k), array + idx(k - gaps[i]), data_size); k -= gaps[i]; } memcpy(array + idx(k), tmp, data_size); } } while (i-- > 0); } ``` [Introsort](https://en.wikipedia.org/wiki/Introsort) 又可進一步擴充為 [Pattern Defeating Quicksort](https://github.com/orlp/pdqsort)，可簡稱為 pdqsort，縮減比較的次數，進行細部改良，可參見論文 [Pattern-defeating Quicksort](https://drive.google.com/file/d/0B1-vl-dPgKm_T0Fxeno1a0lGT0E/view)。 ![](https://i.imgur.com/Ol3xr7j.png) [swenson/sort](https://github.com/swenson/sort) 彙整多項排序實作，並提供多種情境的效能評估，參考執行輸出: ``` Qsort 100000 x86_64 109 9216431.2 ns/op Binary_insertion_sort 100000 x86_64 1 1020912000.0 ns/op Bitonic_sort 100000 x86_64 2 994466000.0 ns/op Quick_sort 100000 x86_64 185 5419470.3 ns/op Merge_sort 100000 x86_64 151 6638079.5 ns/op Heap_sort 100000 x86_64 164 6099676.8 ns/op Shell_sort 100000 x86_64 113 8874380.5 ns/op Tim_sort 100000 x86_64 135 7445088.9 ns/op Merge_sort_in_place 100000 x86_64 158 6372259.5 ns/op Grail_sort 100000 x86_64 122 8244967.2 ns/op Sqrt_sort 100000 x86_64 144 6978180.6 ns/op Rec_stable_sort 100000 x86_64 36 27847444.4 ns/op Grail_sort_dyn_buffer 100000 x86_64 139 7201316.5 ns/op ``` 延伸閱讀: * CppCon 2019: Andrei Alexandrescu "[Speed Is Found In The Minds of People](https://youtu.be/FJJTYQYB1JQ)" - [簡報檔案](https://github.com/CppCon/CppCon2019/blob/master/Presentations/speed_is_found_in_the_minds_of_people/speed_is_found_in_the_minds_of_people__andrei_alexandrescu__cppcon_2019.pdf) > In all likelihood, sorting is one of the most researched classes of algorithms. It is a fundamental task in Computer Science, both on its own and as a step in other algorithms. Efficient algorithms for sorting and searching are now taught in core undergraduate classes. Are they at their best, or is there more blood to squeeze from that stone? This talk will explore a few less known – but more allegro! – variants of classic sorting algorithms. And as they say, the road matters more than the destination. Along the way, we'll encounter many wondrous surprises and we'll learn how to cope with the puzzling behavior of modern complex architectures. --- ## [Shaoen-Lin](https://github.com/Shaoen-Lin?tab=repositories) ### Q1：`void *` 引入的意義是什麼？在 ANSI C (C89) 標準化之前，C 語言沒有 `void *` 的。當時的開發者面臨著一個問題：如何表示一個「通用指標」（Generic Pointer）根據 C99 6.3.2.3 Pointers 的歷史脈絡說明，在 C89 之前，C 語言社群約定俗成使用 `char *` 來充當通用指標（例如早期的 malloc 和 memcpy 都是回傳/接收 `char *`）。但這帶來了嚴重的語意混淆與漏洞包含： * `char *` 明確代表「指向字元或字串的指標」，用它來傳遞 struct 或 int 的記憶體位址在語意上並不合理。 * 由於 char 的大小為 1 byte，編譯器允許對 char * 直接進行 Dereference 與指標運算（例如：ptr++）。這意味著當傳遞一個通用記憶體區塊時，編譯器無法阻止你意外地存取或偏移它，容易造成嚴重的記憶體越界或非預期的行為。為了解決這個問題，ANSI C 委員會引入了 `void` 與 `void *`。根據 C99 6.2.5 > The void type comprises an empty set of values; it is an incomplete type that cannot be completed. 因為 void 是一種「永遠無法被補全的不完整型別」，這賦予了 `void *` 兩項關鍵特性： 1. 禁止解指標（Dereferencing）：因為編譯器不知道 void 的大小，所以 *ptr 在標準 C 中是違法的。 2. 禁止指標運算（Pointer Arithmetic）：因為缺乏 sizeof(void)，ptr++ 在標準 C 中也是未定義或違法的。 :::danger 你沒有回覆最關鍵的問題，C 語言標準化的過程中，為何引入 `void`？參閱規格書。 ::: 接者我們來探討下方的問題： ### Q2：為什麼下面這段程式會編譯的過？會印出什麼東西？又是什麼意思？ ```c #include <stdio.h> int main(int num) { void* ptr = 0; ptr++; printf("%x %x\n", ptr, *(char *)ptr); return 0; } ``` #### 輸出結果：可編譯無法執行 #### 解釋：在 [GCC extension](https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Pointer-Arithmetic.html#Pointer-Arithmetic) 中，`sizeof(void *) = 1`，因此 `ptr++` 代表的是把 ptr 的位址 + 1 (byte)。根據 C99 6.3.2.3 > An integer constant expression with the value 0, or such an expression cast to type void , is called a null pointer constant. 所以一開始 `void* ptr = 0` 其實等於寫了 `void* ptr = (void *)0`。接者又根據 [Wikipedia - Segmentation fault](https://en.wikipedia.org/wiki/Segmentation_fault) > Dereferencing a null pointer, which results in undefined behavior, will usually cause a segmentation fault. This is because a null pointer cannot be a valid memory address. 說明為什麼我們始終可以成功編譯的出程式碼但是沒辦法去成功執行的問題 - 就是不斷的去存取一個 Null pointer. #### 實驗： ```c (gdb) break main Breakpoint 1 at 0x1158: file demo.c, line 5. (gdb) run Starting program: /home/phlab/Desktop/Practice/demo Breakpoint 1, main (num=1) at demo.c:5 5 void* ptr = 0; (gdb) print ptr $1 = (void *) 0x7fffffffd698 (gdb) step 8 ptr++; (gdb) print ptr $2 = (void *) 0x0 (gdb) x/1bx ptr 0x0: Cannot access memory at address 0x0 (gdb) step 11 printf("%x %x\n", ptr, (char *)ptr); (gdb) x/1bx ptr 0x1: Cannot access memory at address 0x1 ``` 上述實驗證明了解釋的正確性，但因為它顯示了 Cannot access memory at address 0x0 和 Cannot access memory at address 0x1，所以此時我們來觀察一下記憶體的內容。 ```c (gdb) info proc mappings process 8082 Mapped address spaces: Start Addr End Addr Size Offset Perms objfile 0x555555554000 0x555555555000 0x1000 0x0 r--p /home/phlab/Desktop/Practice/demo 0x555555555000 0x555555556000 0x1000 0x1000 r-xp /home/phlab/Desktop/Practice/demo 0x555555556000 0x555555557000 0x1000 0x2000 r--p /home/phlab/Desktop/Practice/demo 0x555555557000 0x555555558000 0x1000 0x2000 r--p /home/phlab/Desktop/Practice/demo 0x555555558000 0x555555559000 0x1000 0x3000 rw-p /home/phlab/Desktop/Practice/demo 0x7ffff7c00000 0x7ffff7c28000 0x28000 0x0 r--p /usr/lib/x86_64-linux-gnu/libc.so.6 0x7ffff7c28000 0x7ffff7db0000 0x188000 0x28000 r-xp /usr/lib/x86_64-linux-gnu/libc.so.6 0x7ffff7db0000 0x7ffff7dff000 0x4f000 0x1b0000 r--p /usr/lib/x86_64-linux-gnu/libc.so.6 0x7ffff7dff000 0x7ffff7e03000 0x4000 0x1fe000 r--p /usr/lib/x86_64-linux-gnu/libc.so.6 0x7ffff7e03000 0x7ffff7e05000 0x2000 0x202000 rw-p /usr/lib/x86_64-linux-gnu/libc.so.6 0x7ffff7e05000 0x7ffff7e12000 0xd000 0x0 rw-p 0x7ffff7fa7000 0x7ffff7faa000 0x3000 0x0 rw-p 0x7ffff7fbb000 0x7ffff7fbd000 0x2000 0x0 rw-p 0x7ffff7fbd000 0x7ffff7fc1000 0x4000 0x0 r--p [vvar] 0x7ffff7fc1000 0x7ffff7fc3000 0x2000 0x0 r--p [vvar_vclock] 0x7ffff7fc3000 0x7ffff7fc5000 0x2000 0x0 r-xp [vdso] --Type <RET> for more, q to quit, c to continue without paging--q Quit ``` 可以看到作業系統的第一個映射區段從 0x555555554000 開始, 因此從位址 0x0 到 0x555555554000 之間的區域是 Unmapped （未映射）的。任何對此區間的 Dereference 都會觸發 CPU 的異常，進而由 OS 發送 [SIGSEGV（SIGnal SEGmentation Violation）](https://zh.wikipedia.org/zh-tw/SIGSEGV)。 :::danger 改用 GDB 實驗並分析記憶體內容 ::: ### Q3：為什麼他會編譯的過？會印出什麼東西？ ```c #include <stdio.h> int main(int num) { float x = -1; void *ptr = &x; printf("%c\n", *(char *)ptr); return 0; } ``` #### 輸出結果： ```c PS C:\Users\User\Desktop\Linux> ./temp.exe PS C:\Users\User\Desktop\Linux> ``` #### 解釋：在 C 語言中，float 佔用 4 bytes。根據 IEEE 754 的單精度浮點數格式，-1.0 轉換成二進位後，再表示成十六進位，其值為： `0xBF800000`。由於我的電腦是 x86 架構，因此採用 Little-Endian 的方式將資料存入記憶體，`0xBF800000` 存進記憶體的順序變成這樣： ``` 第 1 個 byte：00 第 2 個 byte：00 第 3 個 byte：80 第 4 個 byte：BF ``` 接著，`*(char *)ptr` 代表對這個位置去進行強制轉型，又 char 永遠只佔用 1 個 byte，因此此程式等同於去**讀取 ptr 所指向位址中的第一個 byte**。也就是讀到 `0x00`。當程式嘗試將 `0x00` 以 ASCII 字元的形式輸出時，在 ASCII 字元表中，數值 0 代表的是 NUL（Null character），這是一種不可見的控制字元。這時如果設計一個實驗： ```c #include <stdio.h> int main(int num) { float x = -1; unsigned char *ptr = (unsigned char *)&x; printf("Byte 0: %02x\n", *(ptr + 0)); printf("Byte 1: %02x\n", *(ptr + 1)); printf("Byte 2: %02x\n", *(ptr + 2)); printf("Byte 3: %02x\n", *(ptr + 3)); return 0; } ``` 輸出為： ```c Byte 0: 00 Byte 1: 00 Byte 2: 80 Byte 3: bf ``` 此實驗證明以上文獻正確無誤。 ### Q4： #### 1. incompatible cast 在 C 語言中，Incompatible Cast（不相容的轉型/轉換）指的是在編譯階段，程式碼試圖將兩種底層意義完全不同、且記憶體佈局無法直接對應的資料型別進行混用或賦值。根據 C99 6.2.7 >Two types have compatible type if their types are the same. >If both are completed types, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types, and such that if one member of a corresponding pair is declared with a name, the other member is declared with the same name 如上述 Q2，一個『浮點數指標 float*』直接指派給一個『void *』，在編譯器的眼中，這是兩種不相容（Incompatible）的物種。如果我們沒有明確加上強制轉型符號 ()，編譯器會基於安全防護機制，拋出 incompatible pointer type 等警告或錯誤。但若加上 () 進行強制轉型後，去讀取或修改那個位址裡面的值，此情形則觸犯了「Strict Aliasing Rule」。 #### 2. printf limitation：不能用 printf 來測試時能怎麼測試程式？有時候我們不能用 printf，可能是因為印出資料會拖慢時間導致 Bug 消失（類似 [Heisenbug](https://en.wikipedia.org/wiki/Heisenbug)）或是寫作業系統核心（當時標準函式庫根本還沒載入）等問題而不能使用 printf 幫我們 Debug。因此我們只能使用其他的手段進行除錯： 1. [Debugger](https://en.wikipedia.org/wiki/Debugger)(GDB)：當程式執行時，除錯器允許開發者在不修改任何原始碼的情況下，動態介入並透視程式的記憶體與變數狀態。透過設立「中斷點（Breakpoints）」，我們可以讓程式在特定的行數暫停，接著利用「單步執行」觀察邏輯流向，或是透過「觀察點（Watchpoints）」監聽特定位址的數值何時被意外修改。這種方式不僅不會像 printf 那樣大量消耗 I/O 資源而干擾程式的執行時序，還能讓我們在程式錯誤時，透過回溯 Call Stack 快速定位出問題發生的源頭。 2. [Unit Testing](https://en.wikipedia.org/wiki/Unit_testing)：它的核心精神是將龐大複雜的程式碼，隔離並拆解成最微小的「可測試單元」，然後另外撰寫獨立的測試程式，以嚴格驗證這些單元在面對各種正常輸入、邊界條件與錯誤格式時，回傳的結果是否符合預期。導入單元測試後，開發者只要一鍵就能瞬間執行成千上百個測試案例，這能確保未來修改程式碼時，只要跑過之前的測試新程式碼就不會意外破壞舊有功能。 3. [Assertion](https://en.wikipedia.org/wiki/Assertion_(software_development))：是一種將「絕對成立的條件」直接寫入程式碼中的設計。平常只要資料狀態符合預期，就不會產生任何輸出與干擾；但只要運作狀態一出現違規（例如傳入的指標為空，或是陣列索引越界），斷言機制就會立刻強制中斷程式，並在終端機印出錯誤發生的具體檔案名稱與行號。「Fail-Fast」的機制，能有效避免錯誤的變數狀態繼續往下傳遞，演變成更難追蹤的深層當機。 #### 3. `**ptr;` real usage 在 C 語言中，因為參數傳遞預設是「pass-by-value」，如果只傳遞單一指標 *ptr，函數內部只會拿到該指標的「副本」。如果想要在函數內部改變指標本身的指向，就必須傳遞該指標的位址，也就是指標的指標 `**ptr`。在 Linux kernel 中，最典型的例子就是：字串處理函數 `strsep` (位於 [lib/string.c](https://github.com/torvalds/linux/blob/master/lib/string.c) 中) strsep 的功能是「切割字串」。當它在字串中找到分隔符號時，會將分隔符號替換成字串結束字元 `\0`，並且將原本的指標向後移動，指向下一個字串的開頭。為了做到「更新呼叫者原本的指標」，它的第一個參數就必須接收 `char **s`。 ```c /** * strsep - Split a string into tokens * @s: The string to be searched * @ct: The characters to search for * * strsep() updates @s to point after the token, ready for the next call. */ char *strsep(char **s, const char *ct) { char *sbegin = *s; char *end; if (sbegin == NULL) return NULL; end = strpbrk(sbegin, ct); if (end) *end++ = '\0'; *s = end; return sbegin; } ``` 這個程式碼的重點不外乎是 `*s = end;`，透過解開指標的指標 *s，直接把呼叫者的原本指標更新為 end。這樣一來，呼叫者在迴圈中下次再呼叫 strsep 時，指標就已經自動推進，從剩下的字串開始找起了。 ## yushiuan9499 > 為什麼 linked list 比較適合 mergesort，而不是 quicksort？ - quicksort 在最差條件下會退化成 $O(N^2)$ - 從兩種 sort 在以下描述的性質可以發現， merge sort 的比較次數比 quicksort 少。雖然在陣列上 merge sort 會有「不利 cache」、「多次的資料搬動」的缺點，但是在 linked list 上此缺點就會變得不明顯。 > mergesort 和 quicksort 的執行成本的機率分佈，例如交換 / 比較的最多 / 平均次數。 - [ ] mergesort 的比較：最差是 $N\log_2{N} - N$ ，最佳是 $(N/2)\log_2{N}$ **上界證明** 令 $F(N)$ 表示 merge sort $N$ 個元素的最多比較次數。在合併出長度為 $N$ 的陣列時，當最大和第二大的元素在來自不同分塊時，此時會有最多比較次數 $N-1$ 次。因此可以列式 $$ F(N) = \begin{cases} 0, & \text{if}~N = 1 \\ N-1 + F(\lfloor \frac{N}{2} \rfloor) + F(\lceil \frac{N}{2} \rceil), & \text{if}~N > 1 \end{cases} $$ 接著列出前 6 項， | N | F(N) | | - | ---- | | 1 | 0 | | 2 | 1 | | 3 | 3 | | 4 | 5 | | 5 | 8 | | 6 | 11 | 可以發現有以下關係 $$f(N) = f(N-1) + \lceil \log_2{N} \rceil$$ 可以藉由證明 $N$ 是偶數時會與前後兩項有以上關係來證明。 - 它的前項因為 $N$ 是偶數，所以可知 $F(N) = N - 1 + F(\frac{N}{2}) + F(\frac{N}{2})$ $F(N-1) = N - 2 + F(\frac{N}{2} - 1) + F(\frac{N}{2})$ 故 $F(N) = F(N-1) + 1 + F(\frac{N}{2}) - F(\frac{N}{2} - 1) = F(N-1) + 1 + \lceil \log_2{N/2} \rceil = F(N-1) + \lceil \log_2{N} \rceil$ - 它的後項因為 $N$ 是偶數，所以可知 $f(N) = N - 1 + f(\frac{N}{2}) + f(\frac{N}{2})$ $f(N+1) = N - 2 + f(\frac{N}{2} + 1) + f(\frac{N}{2})$ 故 $f(N+1) = f(N) + 1 + f(\frac{N}{2} + 1) - f(\frac{N}{2}) = f(N-1) + 1 + \lceil \log_2{(N+2)/2} \rceil$ 因為 $N$ 是偶數，因此 $\lceil \log_2{(N+2)} \rceil = \lceil \log_2{(N+1)} \rceil$ 。 $f(N+1) = f(N-1) + \lceil \log_2{(N+1)} \rceil$ 接下來就是求 $\sum_{k = 2}^{N} \lceil \log_2{N} \rceil = N \lceil \log_2{N} \rceil - 2^{\lceil \log_2{N} \rceil} + 1$ 故上界是 $N \lceil \log_2{N} \rceil - 2^{\lceil \log_2{N} \rceil} + 1 \sim N \log{N} - N$ **下界證明** 令 $f(N)$ 表示 merge sort $N$ 個元素的最少比較次數。在合併出長度為 $N$ 的陣列時，當較小的元素都來自較小的分塊時，此時會有最少比較次數 $\lfloor N/2 \rfloor$ 次。已知當 $N = 1$ 時， $f(N) = 0$ ，符合下界 $(\lfloor N/2 \rfloor)\log_2{N}$ 若 $N = \lfloor \frac{k}{2} \rfloor$ 和 $N = \lceil \frac{k}{2} \rceil$ 時都滿足下界 $(\lfloor N/2 \rfloor)\log_2{N}$ ，當 $N = k$ 時 $$ f(N) = \lfloor \frac{k}{2} \rfloor + f(\lfloor \frac{k}{2} \rfloor) + f(\lceil \frac{k}{2} \rceil) >= \lfloor \frac{k}{2} \rfloor + \lfloor \frac{k}{2} \rfloor\log_2{\lfloor \frac{k}{2} \rfloor} = \lfloor \frac{k}{2} \rfloor(\log_2{\lfloor \frac{k}{2} \rfloor} + 1)$$ $$ f(N) = \lfloor \frac{N}{2} \rfloor\log_2{N} $$ 故下界成立。 - mergesort 的移動：在陣列上，最差、最佳、平均是 $2N\log_2{N}$ 不論是何種狀況， merge sort 都要來回搬運 $N$ 個元素。又因為 merge sort 會遞迴 $\lceil \log_2{N} \rceil$ ，每一層的每一個分塊都來回搬運 $2N$ 次，所以一定是 $2N\log_2{N}$ 。 - quicksort 的比較：最差是 $(N^2 - N)/2$ ，最佳是 $N\log{N} - 2N$，平均是 $1.39 N\log{N} - N$ **上界證明** 當每次選到的 pivot 是所有元素的最小或最大時，此時會分割出最不均勻的 $0$ 和 $N-1$ 。因此可以得到 $$ F(N) = N - 1 + F(N-1) $$ $$ F(N) = \sum_{k = 1}^{N} (k - 1) = (N^2 - N)/2 $$ **平均證明** 另 $E(N)$ 表示 quicksort $N$ 個元素的期望比較次數。因為每次選到的 pivot 會均勻的來自排序後的不同 rank ，所以 $$ E(N) = N - 1 + \frac{1}{N} \sum_{k = 0}^{N-1} (E(k) + E(N - k - 1)) $$ 兩邊同乘 $N$ $$ NE(N) = N(N - 1) + \sum_{k = 0}^{N-1} (E(k) + E(N - k - 1)) $$ 再將 $NE(N)$ 減去 $(N-1)E(N-1)$ ，得 $$ NE(N) - (N-1)E(N-1) = N(N-1) - (N-1)(N-2) + 2E(N-1) $$ 移項整理後得 $$ \frac{E(N)}{N+1} = \frac{E(N-1)}{N} + 2[\frac{1}{N+1} - \frac{1}{N(N+1)}] $$ 考慮到 $1/N(N+1) = 1/N - 1/(N+1)$，所以不斷迭代後得到 $$ \frac{E(N)}{N+1} = 2\sum_{k = 3}^{N+1}\frac{1}{k} + (\frac{2}{N+1} - 1) $$ 其中 $\sum_{k = 3}^{N+1}\frac{1}{k} \sim \ln{(N+1)}$ 故 $$ E(N) \sim 2N\ln{(N)} - N = 1.39 N \log_2{N} - N $$ **下界證明** 當 pivot 能夠將當前的元素等分時，會得到很像 merge sort 上界的公式。 $$ f(N) = \begin{cases} 0, & \text{if}~N \le 1 \\ N-1 + f(\lfloor \frac{N-1}{2} \rfloor) + f(\lceil \frac{N-1}{2} \rceil), & \text{if}~N > 1 \end{cases} $$ 同理， $$ f(N) = f(N-1) + \lfloor \log_2{N} \rfloor $$ 故下界是 $(N+1)\lfloor \log_2{N} \rfloor - 2^{\lfloor \log_2{N} + 1 \rfloor} + 2 \sim N\log_2{N} - 2N$ - quicksort 的交換：最佳是 $0$ ，平均是 $\frac{0.69 N}{3} \log_2{N} + \frac{1}{4}N$ ，上界猜測是 $frac{N}{2} \log_2{N}$ 註：因為 quicksort 的 partition 有很多種，所以我選擇分析 g++ 用的方法 ```c /* 這是簡化版的 */ int *__unguarded_partition(int *__first, int *__last, int *__pivot) { while (true) { while (*__first < *__pivot) ++__first; --__last; while (*__pivot < *__last) --__last; if (__first >= __last) return __first; swap(__first, __last); ++__first; } } ``` **下界證明** 只要陣列一開始就有序，就不用交換。故下界為 0 。 **平均** 證明是參考 [The Quicksort algorithm and related topics](https://arxiv.org/pdf/1503.02504)。令 $k$ 表示 pivot 在當前區間的 rank ，則每個元素大於 pivot 的機率是 $\frac{N - k}{N - 1}$ ，原本佔據前 $k-1$ 個位置的元素交換數量的期望值是 $\frac{N - k}{N - 1} (k-1)$ 。因為交換是兩兩相對，因此這就是單次交換數量的期望值。對於所有可能的 $k$ 交換數量的期望值是 $\frac{1}{N}\sum_{k = 1}^{N} \frac{N - k}{N - 1} (k-1) + 1$ ， $1$ 是把 pivot 換到中間，得到的值是 $\frac{6}{N} + \frac{2}{3}$ 。故可以得到遞迴關係式 $$ E(N) = \frac{6}{N} + \frac{2}{3} + \frac{1}{N} \sum_{k = 0}^{k = N - 1} (E(k) + E(N-k-1)) $$ 可以注意到這與交換次數的公式很像，因此同理 $$ N E(N) = (N+1)E(N) + \frac{2N + 3}{6} $$ 再同理得到 $$ E(N) = \frac{1}{3} \ln{N} + \frac{1}{4} N $$ **上界** 因為最多交換次數取決於 partition 後較小的塊，這個值越大代表切的越均勻，因此我還沒想到該如何計算。根據我用 DP 模擬出來的結果，發現上界幾乎都發生在 partition 均勻的時候。 ```c for (int i = 2; i < 64; i++) { int from = -1; for (int j = 0; j <= i / 2; j++) { if (j + dp[j] + dp[i - j - 1] >= dp[i]) { dp[i] = j + dp[j] + dp[i - j - 1]; from = j; } } printf("dp[%d] = %lu, from = %d\n", i, dp[i], from); } ``` 假設這個規律對所有數量的排序為真，那麼根據 merge sort 的下界證明， quick sort 的交換次數的上界同樣也會是 $\frac{N}{2} \log_2{N}$ > mergesort 為什麼對 cache 不友善？ merge sort 在每層遞迴的排序中，會先把當前陣列上的資料按照大小移動到暫存用的陣列，最後在移動回來，亦即來回存取多個地方，這樣會降低 locality。假如當前要排序的資料的記憶體位置是 $X$ ，暫存用的陣列位於 $Y$ ，那麼在搬移的過程中，會需要先讀取一處的資料，再寫入到另一邊，兩者差距 $X-Y$ 。因為記憶體上要有空間擺放其中一個陣列，因此這個值一定是大於當前要排序的區間的長度。相較之下， quick sort 是直接在陣列上交換，有較好的 locality。假如當前要排序的區間是 $[l,r]$ ，那麼交換時就一定是讀取和寫入在這個區間的記憶體，因此存取之間的最遠距離就是 $r-l$ ，這個值一定小於當前要排序區間的長度。 :::danger 需要更精準的描述 ::: ### 實驗 [程式碼](https://gist.github.com/yushiuan9499/0ecf65560ef033d9b0259b5cbd54d4fa) 讓 merge sort 和 quick sort 對隨機的資料排序的時間如下。以下皆是在 `-O2` 的環境下執行 | len | msort | qsort | | :-: | :-----: | :-----: | | 5e6 | 0.466 s | 0.399 s | | 1e7 | 0.967 s | 0.820 s | | 1e8 | 10.972 s| 9.261 s | 三種長度下 quicksort 的執行時間都較短。在 `cmp` 當中插入計數器計算比較次數。 | len | msort | qsort | $1.39 N \log_2{N} - N$ | | :-: | :-----: | :-----: | :----: | | 5e6 | 1.051e8 | 1.40e8 | 1.492e8 | | 1e7 | 2.201e8 | 2.971e8 | 3.124e8 | | 1e8 | 2.147e8 | 3.42e9 | 3.584e8 | 三種長度下 merge sort 的比較次數較少。比較在 1e7 和 1e8 的情況下 quick sort 和 merge sort 的 cache-misses 。 merge sort 1e7 ``` <not counted> cpu_atom/cache-references/ (0.00%) 16,461,235 cpu_core/cache-references/ <not counted> cpu_atom/cache-misses/ (0.00%) 5,388,430 cpu_core/cache-misses/ # 32.73% of all cache refs ``` quick sort 1e7 ``` 1,950,099 cpu_atom/cache-references/ (0.11%) 6,303,664 cpu_core/cache-references/ (99.89%) 78,955 cpu_atom/cache-misses/ # 4.05% of all cache refs (0.11%) 2,136,008 cpu_core/cache-misses/ # 33.89% of all cache refs (99.89%) ``` merge sort 1e8 ``` 93,743,053 cpu_atom/cache-references/ (0.04%) 242,768,421 cpu_core/cache-references/ (99.96%) 73,026,926 cpu_atom/cache-misses/ # 77.90% of all cache refs (0.04%) 88,601,667 cpu_core/cache-misses/ # 36.50% of all cache refs (99.96%) ``` quick sort 1e8 ``` 78,142,867 cpu_atom/cache-references/ (0.23%) 95,455,523 cpu_core/cache-references/ (99.77%) 34,747,834 cpu_atom/cache-misses/ # 44.47% of all cache refs (0.23%) 35,353,544 cpu_core/cache-misses/ # 37.04% of all cache refs (99.77%) ``` 上面的結果可以看出， quick sort 的 cache-misses 少於 merge sort 的 cache-misses ，原因應該就是 cache locality 。而且 merge sort 存取 cache 的次數也比較多，原因是因為 merge sort 的移動次數比 quick sort 的交換次數多。這解釋了為什麼 merge sort 的比較次數較少，但是執行速度卻比較慢。再來比較串列的版本讓 merge sort 和 quick sort 對隨機的資料排序的時間如下。以下皆是在 `-O2` 的環境下執行 | len | msort | qsort | | :-: | :-----: | :-----: | | 1e6 | 0.296 s | 0.512 s | | 1e7 | 6.115 s | 12.809 s | 這次就是 merge sort 的速度比 quick sort 快兩倍，且差距比陣列的還明顯。這支持 merge sort 有更少的比較次數，又在 linked list 下所有演算法都很難利用 cache locality 的論點。 --- ## Patriciaath999 **問：最小的數在2補數系統做 abs(int_min) 時輸出是什麼？** 根據 C 語言規格書 (6.2.6.2p6,7)定義所有有號數皆為二補數，如下圖： ![image](https://hackmd.io/_uploads/H10uo39c-e.png) 根據 C 語言規格書 (5.2.5.3.2p3)定義 abs(int_min) 是未定義行為： ![image](https://hackmd.io/_uploads/By-uu655-e.png) ```c #include <stdint.h> int32_t abs(int32_t x) { int32_t mask = (x >> 31); return (x + mask) ^ mask; } ``` 此時 x = 10000000000000000000000000000000， mask 在左移 31 位時會複製符號位變成：11111111111111111111111111111111 = -1 根據 C 語言規格書 (C23, 6.5.8p5) 關於有號數負數位移的定義如下圖： ![image](https://hackmd.io/_uploads/rJjAU3q9Wg.png) (x + mask) 運算造成 overflow 根據C 語言規格書 (C23, 6.5.1p5) 此為未定義行為： ![image](https://hackmd.io/_uploads/Ska6Vpqc-x.png) 但在硬體的設計上，他只會保留最後 32 bits 所以輸出 (x + mask) = 01111111111111111111111111111111 = 2147483647， (x + mask) 跟 mask 做 XOR = 10000000000000000000000000000000，所以會輸出 -2147483648。 :::danger 參照 C 語言規格書 (C23)，探討其針對二補數系統和 INT32_MIN 的相關定義，來說明上述行為 ::: **問：為什麼選用2補數而不是1補數？** 在加法器設計上有優勢，不需要再考慮 carry in，而且1補數不符合 Abelian group ，違反 inverse element ，會有產生-0的可能，所以不屬於 Abelian group ，會讓 ALU 的運行變成不可預測(有兩答案)。 ## grawis > https://github.com/torvalds/linux/blob/master/lib/list_sort.c ### Q1 > merge sort worst case 分析比較合併次數如何創造最糟情況不論是 1. Top-Down Merge Sort * 找中點對半切分 * 遞迴排序 left/right * 直到切分為每個元素為size 1 list後merge 2. Bottom-Up Merge Sort（Linux kernel中使用） * 把每個元素當作 size = 1 的 sorted list * 兩兩大小相等時merge (count 的 binary bit實作) 其最壞情況的時間複雜度分析是相同的。 #### Merge 的比較次數當合併兩個已排序的 list * list A 長度 = a * list B 長度 = b 最壞情況下需要比較 $a + b - 1$ 次，因此 merge 一次的成本為 $O(a + b)$。 #### 每一層 merge 的成本 Merge sort 每一層都要合併整個 array 的所有元素 Ex. $n = 8$ ``` 1 1 1 1 1 1 1 1 ↓ merge 2 2 2 2 ↓ merge 4 4 ↓ merge 8 ``` 每一層總共處理的元素數量都是 n，因此每一層的時間複雜度為 $O(n)$ #### merge 的層數無論是 Top-Down 還是 Bottom-Up 做法，最終都會把 size=1 的 list 合併成 size=n 的 list。由於每次合併大小會乘 2，因此需要 $log_2 n$ 層因此 worst case 總時間為 $$ T(n) = n + n + n + \dots \; (\log_2 n \text{ 層}) $$ $$ T(n) = O(n \log_2 n) $$ ### Q2 > 改寫 list_sort.c的merge pointer to pointer改一層pointer 驗證比較兩者行為品味 ```c static struct list_head *merge(void *priv, list_cmp_func_t cmp, struct list_head *a, struct list_head *b) { struct list_head *head, **tail = &head; u8 count = 0; for (;;) { /* if equal, take 'a' -- important for sort stability */ if (cmp(priv, a, b) <= 0) { *tail = a; tail = &a->next; a = a->next; if (!a) { *tail = b; break; } } else { *tail = b; tail = &b->next; b = b->next; if (!b) { *tail = a; break; } } } return head; } ``` 以下為修改為單層指標的 merge 函式寫法 ```c static struct list_head *merge(void *priv, list_cmp_func_t cmp, struct list_head *a, struct list_head *b) { struct list_head *head; struct list_head *tail; struct list_head *picked; /* first node */ if (cmp(priv, a, b) <= 0) { head = a; a = a->next; } else { head = b; b = b->next; } tail = head; for (;;) { /* a or b end*/ if (!a) { tail->next = b; break; } if (!b) { tail->next = a; break; } /* if equal, take 'a' -- important for sort stability */ if (cmp(priv, a, b) <= 0) { picked = a; a = a->next; } else { picked = b; b = b->next; } tail->next = picked; tail = picked; } return head; } ``` #### 兩者行為比較：第一個節點的判定 * 單層 pointer 在單層 pointer 版本裡，tail 代表的是：目前 merged list 的最後一個節點。而一開始 merged list 還是空的時候，根本沒有最後一個節點，所以不能直接寫： ```c tail->next = picked; ``` 因此單層 pointer 寫法一定要先做初始化 ```c if (cmp(priv, a, b) <= 0) { head = a; a = a->next; } else { head = b; b = b->next; } tail = head; ``` 手動選出第一個節點，建立 head 與 tail * pointer-to-pointer 寫法 pointer-to-pointer 寫法裡的 tail 並不是尾巴節點，而是：下一個要被填入的 pointer 欄位位置。一開始的 ```c tail = &head; ``` 就表示「第一個節點應填入 head」因此不論現在要接的是第一個節點（接到 head），或是後續節點（接到某個節點的 next）都能寫成 ```c *tail = picked; tail = &picked->next; ``` 可以用同一套寫法同時處理 head 初始化與後續串接。 #### 程式碼品味藉由pointer-to-pointer 操作，第一個節點的特例判斷就不存在，仍能運用一致的寫法達到原本的功能。因為把 first-node case 消掉了，所以這類 bug surface 會縮小，只需維持一種 invariant、一種規則思考： > tail always points to the place where the next node should be attached. 較不容易因未來修改(如改成 circular list merge )而破壞 head/tail 初始化邏輯導致兩種 case 行為不一致。 :::danger 務必依循[第一份作業](https://hackmd.io/@sysprog/linux2026-warmup)規範的書寫方式，留意中英文間用一個半形空白字元區隔 :::

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.