2024q1 Homework5 (assessment)

# 2024q1 Homework5 (assessment) contributed by < `LULser0204` > ## 測驗題改進與提問 ## 閱讀〈因為自動飲料機而延畢的那一年〉的啟發作者在周圍同學選擇念研究所或是工作的時候，選擇了風險更高且前途未卜的路險，整個計畫的開發紀錄給我一些啟發。 > 「學生往往都是先求有不求好，焊點沒焊牢、虛焊假焊的一堆。簡單的電路還好，但是遇到複雜的電路就弄得亂七八糟，電線不牢一拔就掉、以為焊好了結果發現到處都是短路、小小一塊洞洞板弄得到處都是銲錫」首先是解決現實的問題，跟在學校課程中的作業、實驗是完全不同的。在課堂上碰到的事情抱持的心態都是能動就好，但現實中往往要考慮到很多因素，如果寫出來的東西沒人用，這種程度不能稱作為會寫程式。 > 「你不能現在就放棄，要是現在就放棄的話，你這輩子日後遇到這種等級的困難，就只會想逃避而已。」 >「你最大的問題在太害怕失敗了，既然都已經決定要延畢做飲料機了，那就要好好做，才不會辜負當初自己的期望」這兩句話深有感觸，我也很害怕失敗。想做好一件事，但又會想東想西，怕時間砸下去沒辦法在 deadline 前做出來，或者製作過程不無預期，碰到點問題，網路上沒有解法就選擇放推，到最後什麼都沒做好，自身也沒有任何成長。回顧前五周 linux 核心實作這堂課，前面的實作對我來說真的是一大考驗，相比於過去上的課，這堂課的強度相當高。還有因為可以參考其他人的進度，可以看到許多修這門課的同學完成度相當之高，我都會陷入自我懷疑明明我也花了不少時間學習，但總是落後他人一節。常常被上述情緒影響，我應該修正態度，那些人是因為過去付出了足夠的努力才能做出來，而且和老師說的一樣「缺什麼就補什麼」，好好補足自己基礎/實作能力與數學理論的基礎在繼續前進。 > 飲料機的程式愷宏是寫不出來的，之所以能在一個月內完成，是因為我在大學期間就寫過好幾個網站了。愷宏能和工廠溝通、設計出可用的零件，是因為他在大一就在跑工廠做東西了。紘銘能輕易的設計出飲料機的電路，是因為他曾花了很多時間在電子電路課上頭，做了很多習題，纏著教授把每個疑惑都搞懂才罷休。 > 這個世界比任何人都殘酷，也比任何人都公平，犧牲了多少就會得到多少。飲料機現在能做到的事，就只是付出犧牲的結果罷了。想想自己在大學時期的學習態度，很多時候都是蒙混過去而沒學到什麼東西，基礎更是不用說。導致現在讀碩士連看論文都有問題，得花很多時間來補齊過去所欠下的技術債。 ### 課程心得在學習本課程後，我發現自己對於 C 語言和基礎學科的掌握度相當差勁，統計、線性代數、機率，連最基本的英文都有問題。所以我在這堂課絕大多數的時間都是花在彌補過去大學四年欠下的 **技術債**，好讓我可以看懂教材。而這些科目當初修課都取得不錯的成績，但要拿來使用卻拿不出手，得仰賴 Google 、 ChatGpt 尋求「速成」，但這種事隨便抓個人也辦的到，那為什麼要我呢 ? 然後要怎麼確認從這兩個管道獲得的資訊是不是正確的呢 ? **還是得回歸對規格書的重視，回去閱讀第一手材料**。作業一 lab0 還勉強能跟得上這堂課的節奏，但後面就開始落後很多。觀察到作業區其他同學的作業，甚至有些人開始對 Linux 核心作出貢獻，反觀自己還在努力補償技術債。面對日漸累積起來未完成的作業，每周的課堂教材，還有實驗室的進度，更難過的是本來實驗室同學一起修後來退選剩我一人，一度讓我思考是否該退場。但看到好幾篇找到工作的心得文都有提到這堂課，加上文章提到的 > 「你不能現在就放棄，要是現在就放棄的話，你這輩子日後遇到這種等級的困難，就只會想逃避而已。」 >「你最大的問題在太害怕失敗了，既然都已經決定要延畢做飲料機了，那就要好好做，才不會辜負當初自己的期望」既然都選下去，而且都過了 7 周了，乾脆就走到最後，盡量去完成每一個作業。這堂課真的是我截至目前為止，求學生涯中最硬的課，也是最實用的課。每次花一定的時間都能學習到新的東西，像是 git/github、makefile的運用、linked list/bitwise等，每次都能感受到自己慢慢再變強，讓我回想到當初準備考研究所的那幾個月，通過每一次重複刷考古->檢討->刷考古...，讓自己學得更扎實。 ## 研讀第 1 到第 6 週「課程教材」和 CS:APP #### CS:APP ##### 第一章 > 第一章主要在說明為什麼要理解計算機系統運作的原理 > 複習作業系統的知識， process 、 thread 、 I/O device...等大多數時候不太需要擔心這些事，但有時候可能非常重要，例如控制一個火箭、核電廠，你不希望正向的推力成為負向或類似的東西，所以就需要理解臨界情況。以最基礎的 hello 程式生命週期為例，會分別經過 preprocesseor -> Compiler -> Assembler -> Linker 這四個階段。 :::spoiler ```graphviz digraph G { node [shape=box]; // Set the shape of the nodes graph [rankdir=LR] "hello.c\nSource program\n(text)" -> "Pre-processor\n(cpp)\nhello.i\nModified source program\n(text)"; "Pre-processor\n(cpp)\nhello.i\nModified source program\n(text)" -> "Compiler\n(cc1)\nhello.s\nAssembly program\n(text)"; "Compiler\n(cc1)\nhello.s\nAssembly program\n(text)" -> "Assembler\n(as)\nhello.o\nRelocatable object programs\n(binary)"; "Assembler\n(as)\nhello.o\nRelocatable object programs\n(binary)" -> "Linker\n(ld)\nhello\nExecutable object program\n(binary)"; "printf.o\nRelocatable object program\n(binary)" -> "Linker\n(ld)\nhello\nExecutable object program\n(binary)"; } ``` ::: 疑問: 我觀看課堂PPT 舉 Memory Referencing Bug Example ```c typedef struct { int a[2]; double d; } struct_t; double fun(int i) { volatile struct_t s; s.d = 3.14; s.a[i] = 1073741824; /* Possibly out of bounds */ return s.d; } ``` volatile 不知道是什麼意思，去翻看規格書以及網路其他人的解釋 > An access to an object through the use of an lvalue of volatile-qualified type is a volatile access .A volatile access to an object, modifying a file, or calling a function that does any of those operations are all side effects volatile 英文意思是可變的。告訴編譯器變數的值可能隨時更改，編譯器在附近找到的程式碼不會執行任何操作。(因為編譯器會進行最佳化，如果變數的值更新的話，將出現 inconsistent 的現場) 通常用於 : Memory-mapped I/O devices 、 `sig_atomic_t` variables in signal handlers :::danger C 語言規格書談及 violate 時，並未指定虛擬記憶體。 ::: ##### 第二章 > Bits,Bytes,and Integers ###### 數值表示 >Bits 之所以在電腦科學領域廣為使用，是因為在數字世界中可以採取吉他方式的模擬訊號對其進行量化前面主要討論的有布林代數常見的: Bit-level operation : AND 、 OR 、 NOT 、XOR Logic operation : && 、 || 、 ! (念作 bang ) Shift operation : left shift ( x << y )、right shift( x >> y )，其中 right shift 還有分 logical shift (左邊補0)、Arithmetic shift(右移後，補上 most significant bit ) 接著，開始討論數值表示，有分無號數和有號數: 無號數表示法少了正負號，而有號數的格式則有 ( 正負號加上大小、一補數、二補數 ) 其中，比較無號數和二補數差別 | | 可表示範圍 | 數值 | | ------ |:------------------------ | ---- | | 無號數 | 0 ~ $2^{n}-1$ | $\sum_{i=0}^{n-1} Xi * 2^i$ | | 二補數 | $2^{n-1}-1$ ~ $-2^{n-1}$ | $Xw-1*2^{n-1}$-$\sum_{i=0}^{w-2} Xi * 2^i$ | 例如 5 個位元{ 10110 }，二補數計算方式是( -16+4+2 ) = -10，無號數則為(16+4+2)=22 二補數最大值和最小值可以用簡單的方法去思考如: max(8+4+2+1)=15 | min(-16+0+0+0+0)=-16 ## 簡述想投入的專案 <s>完整地完成每一次作業</s> https://hackmd.io/@sysprog/concurrency >to do 紀錄問題或心得 ### 概念: >mutex 和 semaphore semaphore :主要操作為 signal 或者 wait 。可用於從 interrupt service routine (ISR) 向 task 發出信號。 (可在完全不依賴硬體的情況下去實作，只是效率低) 例如:有三條thread A,B,C 。當 thread A 取得資源後，B 便會 waiting ，反之亦然。而 thread C 負責從第三方去實現 unlock 而 mutex 則是有 ownership 的概念，上述的操作不被允許。因為 mutex 所有權在 A ，但 C 想要去解鎖，然後未取得資源的 thread 會 deadlock :::warning > process 和 thread 差別 ? 參考:[Linux 核心設計: 不僅是個執行單元的 Process](https://hackmd.io/@sysprog/linux-process#Linux-%E8%A8%AD%E8%A8%88%E7%9A%84-trade-off-%E5%92%8C-evolution) > 取決於作業系統怎麼看待它。前者是目的，後者是手段，Process 本質上是 Processor 的虛擬化， Thread 又可稱作 Light-weight process > 在教科書上 Process 是 OS 進行資源分配最基本的單位(包含 Code Section、Data Section、OS Resources ) ; Thread 是 OS 進行 CPU Scheduling 運算最基本的單位，無法獨立存在，存在於 Process 中，包含( Stack、Register Set、Program Counter ) > 然而 Linux 並沒有一個明確定義 Thread 或 Process 。在特定狀況下，甚至可以互相轉換 ::: :::warning 什麼是 weak memory model ? >參考:[並行程式設計: Atomics 操作](https://hackmd.io/@sysprog/concurrency-atomics?utm_source=preview-mode&utm_medium=rec#%E9%BB%9E%E9%A1%8C) >不是說能力的 weak，而是「順序」的 weak，可詮釋為「身段的柔軟」) 的 memory order，從而使整體的存取效率得以提升 >在保持「data dependency ordering」的條件下進行重新排序 ::: :::warning 什麼是 side effect ? >參考 [side effect wiki](https://en.wikipedia.org/wiki/Side_effect_(computer_science)) >[你所不知道的 C 語言：函式呼叫篇](https://hackmd.io/@sysprog/c-function) >In computer science, an operation, function or expression is said to have a side effect if it has any observable effect other than its primary effect of reading the value of its arguments and returning a value to the invoker of the operation. 例如:修改或讀取非局部變數、提出錯誤或例外、執行I/O操作等，這些副作用會使的程式的行為依賴執行的順序 ::: :::warning 查看 [pthread.h](https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread.h.html) 中 PThread 的 api ，它的 attr 具體來說是什麼? 例如:int pthread_attr_destroy(pthread_attr_t *); >查詢[官方的文件](https://hpc-tutorials.llnl.gov/posix/pthreads_api/) 沒看到具體的定義 >後來查詢 [stackoverflow](https://stackoverflow.com/questions/4252005/what-is-the-attribute-of-a-pthread-mutex) 底下有人回覆原來是放在 [SEE ALSO](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/pthread.h.html#tag_13_35) 部分除了預設的 "NULL" 屬性，還有 * the type (deadlocking, deadlock-detecting, recursive, etc). * the robustness (what happens when you acquire a mutex and the original owner died while possessing it). * the process-shared attribute (for sharing a mutex across process boundaries). * the protocol (how a thread behaves in terms of priority when a higher-priority thread wants the mutex). * the priority ceiling (the priority at which the critical section will run, a way of preventing priority inversion). ::: :::warning Coroutine 是什麼 > 查詢 [wiki](https://en.wikipedia.org/wiki/Coroutine) + [圖片](https://www.csl.mtu.edu/cs4411.ck/www/NOTES/non-local-goto/coroutine.html) > There is no single precise definition of coroutine.In 1980 Christopher D. Marlin[4] summarized two widely-acknowledged fundamental characteristics of a coroutine: 1. the values of data local to a coroutine persist between successive calls 2. the execution of a coroutine is suspended as control leaves it, only to carry on where it left off when control re-enters the coroutine at some later stage. Coroutine 可看作 light-weight threads 。它允許在多個入口點之間進行切換，並在暫停時保存其狀態。和 thread 差別在 thread 可以通過排序演算法不同去搶佔 CPU ::: :::warning memory barrier 和 critical section 的差別 ? > 查詢 [wiki](https://en.wikipedia.org/wiki/Memory_barrier)和 [geeksforgeeks](https://www.geeksforgeeks.org/g-fact-70/) > memory barrier : In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. critical section : a critical section refers to a segment of code that is executed by multiple concurrent threads or processes, and which accesses shared resources. 我的理解是這兩個目的不同，前者是確保記憶體操作的順序，後者是保護共享資源。 ::: pthread_barrier_wait 查詢[官方手冊](https://pubs.opengroup.org/onlinepubs/007904975/functions/pthread_barrier_wait.html) 他是說 synchronize at a barrier ，主要在應用層面，影響多個 thread 的同步執行 :::warning 在[這段程式碼](https://hackmd.io/@sysprog/concurrency/%2F%40sysprog%2Fconcurrency-atomics#wait-free-amp-lock-free)中 __attribure__ 是什麼? ``` typedef struct { struct { /** Ring producer status. */ uint32_t watermark; /**< Maximum items before EDQUOT. */ uint32_t size; /**< Size of ring. */ uint32_t mask; /**< Mask (size - 1) of ring. */ volatile uint32_t head, tail; /**< Producer head and tail. */ } prod __attribute__((__aligned__(CACHE_LINE_SIZE))); struct { /** Ring consumer status. */ uint32_t size; /**< Size of the ring. */ uint32_t mask; /**< Mask (size - 1) of ring. */ volatile uint32_t head, tail; /**< Consumer head and tail. */ } cons __attribute__((__aligned__(CACHE_LINE_SIZE))); void *ring[] __attribute__((__aligned__(CACHE_LINE_SIZE))); } ringbuf_t; ``` > 參考 [GCC手冊](https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html) 還有 [你所不知道的 C 語言：技巧篇](https://hackmd.io/@sysprog/c-trick) > In GNU C and C++, you can use function attributes to specify certain function properties that may help the compiler optimize calls or check code more carefully for correctness. __attribute__ 為 GCC extension 的定義函數屬性，可以讓編譯器在編譯時做些特殊處理或是檢查動作。在這裡就是強制變數按照指定的方式對齊 ::: :::warning 還有這段[程式碼](https://hackmd.io/@sysprog/concurrency/%2F%40sysprog%2Fconcurrency-atomics#%E8%83%8C%E5%BE%8C%E4%BD%9C%E7%A5%9F%E7%9A%84-cache)中 `#pragma pack(1)` 是什麼意思? > 參考 C99 section 6.10.6 Pragma directive 和 [microsoft learn](https://learn.microsoft.com/zh-tw/cpp/preprocessor/pack?view=msvc-170) > Apreprocessing directive ofthe form `#pragma pp-tokensopt new-line` where the preprocessing token STDC does not immediately follow pragma in the directive (prior to anymacro replacement)152) causes the implementation to behave inan implementation-defined manner. The behavior might cause translation to fail or cause the translator or the resulting program to behave inanon-conforming manner. Any suchpragma that is not recognized by the implementation is ignored pragma 會指定特定的編譯器功能，在這裡 #pragma pack(1) 作用是命令編譯器將 struct 內的成員以 1-byte 方式對齊儲存 ::: :::warning 為什麼 lock-free 實作 doubly-linked list 要做 mark ? > 因為 CAS 只能保護一個整數，但是 linked list 是雙向的舉紅黑樹 atomic 操作只能保證一個操作是完成的。做檢查的時候可能已經發生旋轉 > 參考這篇論文:[A Pragmatic Implementation of Non-Blocking Linked Lists](https://www.cl.cam.ac.uk/research/srg/netos/papers/2001-caslists.pdf) >We say that a node is logically deleted after the rst stage and that it is physically deleted after the second. A marked eld may still be traversed but takes a numerically distinct value from its previous unmarked state; the structure of the list is retained while signalling concurrent insertions to avoid introducing new nodes immediately after those that are logically deleted. 在刪除一個節點時，先 mark 預計刪除的節點，使得該節點在邏輯上被標記為已刪除。邏輯刪除後，節點仍然存在該鏈結串列中，但在 search 或 insert 時，可以判斷並且跳過該節點。接著再物理上的刪除該節點，可以保證在多執行緒的前提下，不會因為 reordering 而做出預料之外的操作。 >以下是嘗試將其改成 doubly-linked list ，每個節點新增一個 prev 的指針，然後在新增或是刪除的時候也需要去檢查 *prev 內容是否保持一致 > [commit ebba39d](https://github.com/LULser0204/concurrent-ll/commit/ebba39dfa3b71e4ab39016fec51c2c5e107f03cd) ::: :::spoiler ./scripts/test_correctness.sh Use 8 as the number of cores. If not correct, change it in scripts/config Testing: out/test-lock Expected size: 1030 Actual size: 1030 Expected size: 36 Actual size: 36 Expected size: 20 Actual size: 20 Testing: out/test-lockfree Expected size: 1024 Actual size: 1024 Expected size: 31 Actual size: 31 Expected size: 15 Actual size: 15 ::: :::warning Linux 是如何實作 mutex 的? 缺點是什麼 >傳統 mutex 在切換 locked 以及 unlocked 是需要使用 system call 去確認是否有在等待 lock 的執行緒。但這樣頻繁的切換 CPU 模式會增加系統的開銷而 futex 是上述問題的解法之一，設計目標是將大部分操作放在使用者層級空間完成，只有在必要時才進入核心空間 ::: ## TODO: 研讀[實作輕量級的 Mutex Lock](/@sysprog/concurrency-mutex)和[建立 PThread 相容實作](/@sysprog/concurrency-thread-package)，重現實驗並紀錄問題 :::warning 我嘗試在 [exameple](https://github.com/sysprog21/concurrent-programs/blob/master/mutex/example/main.c) 的makefile 跳出 ``` Running test_pthread … FATAL: ThreadSanitizer: unexpected memory mapping 0x5c0c629fb000-0x5c0c629fc000 Running test_linux … FATAL: ThreadSanitizer: unexpected memory mapping 0x5ea429899000-0x5ea42989a000 make: *** [Makefile:27: check] Error 66 ``` ::: 這個問題是 ThreadSanitizer (TSan) 出現了意外的記憶體映射。 [參考這個網站](https://github.com/google/sanitizers/issues/806)，我一開始認為是我的 GCC 版本問題，但更新後還是碰到相同的問題。後來查到另外一篇 [stackoverflow](https://stackoverflow.com/questions/77850769/fatal-threadsanitizer-unexpected-memory-mapping-when-running-on-linux-kernels) 的文章，只要執行 `sudo sysctl vm.mmap_rnd_bits=28` 就可以正確執行了。因為 ThreadSanitizer 需要對程式的記憶體操作進行精確的追蹤和檢測。如果記憶體地址的隨機化程度太高，可能會導致 ThreadSanitizer 無法正確預測和管理記憶體映射，通過 `vm.mmap_rnd_bits=28` 減少了地址的隨機性。 TODO: 在[實作輕量級的 Mutex Lock](/@sysprog/concurrency-mutex)的程式碼基礎，引入 MutexShootout TODO: 將[實作輕量級的 Mutex Lock](/@sysprog/concurrency-mutex)的成果整合到[建立 PThread 相容實作](/@sysprog/concurrency-thread-package)