NTOU Operating System - Lab Scheduler

# NTOU Operating System - Lab Scheduler ###### tags: `CS-course`, `OS2022` ---- ## Abstract 此次作業會說明 User-mode thread 以及介紹 Context Switch 實際會以何種形式實作，並簡介過程中須要注意的細節。最後，同學須在 userspace 以 coroutine 實作出自己的 schedule 演算法。 :::success **補充資訊** 恐龍書只有提及 CFS 的部份資訊，可以參考以下連結來得到更多關於此 schedule 的演算法： * [Linux CFS and task group](https://mechpen.github.io/posts/2020-04-27-cfs-group/index.html) * [Linux schedulers – overview continued CFS, BFS, Deadline, MuQSS, etc. What’s new in process scheduling?](https://students.mimuw.edu.pl/ZSO/Wyklady/15_CPUschedulers2/ProcessScheduling2.pdf) * [Linux CFS Simulator](https://github.com/sysprog21/linux-cfs-sim) * [Bachelor’s Thesis In Computing Science Linux CPU Schedulers: CFS and MuQSS Comparison](https://www.diva-portal.org/smash/get/diva2:1566002/FULLTEXT01.pdf) ::: ## Context Switch 以 [Wikipedia](https://en.wikipedia.org/wiki/Context_switch) 的描述，說明 Context Switch 大致是指儲存當前行程或是執行緒 (task) 的相關狀態，並載入之前所儲存的狀態，並接續執行： > **In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state.** This allows multiple processes to share a single central processing unit (CPU), and is an essential feature of a multitasking operating system. > > The precise meaning of the phrase "context switch" varies. **In a multitasking context, it refers to the process of storing the system state for one task, so that task can be paused and another task resumed.** A context switch can also occur as the result of an interrupt, such as when a task needs to access disk storage, freeing up CPU time for other tasks. Some operating systems also require a context switch to move between user mode and kernel mode tasks. The process of context switching can have a negative impact on system performance. 在現代作業系統中因各項技術的完善， Context Switch 在現今不只單單處理 task 中 stack 、heap 等資訊，還需要考量如何處理 register、TLB flushing 、swap、file descriptor、I/O permission 等。並且還須考量相關資料是否會被其他 CPU 存取，是否會造成 [concurrency](https://hackmd.io/@sysprog/Skh_AaVix#Toward-Concurrency) bugs 、會不會被 [interrupt](https://lwn.net/Articles/146861/) 等，一些關於資源管理以及 task preempt 等議題。例如，在 Linux Kernel Pre 1.0 ，以 80386/i386 為例，只須處理 FPU 的狀態，關於 register 等 task state 可藉由硬體特性自動處理。 ![](https://www.maizure.org/projects/evolution_x86_context_switch_linux/early_linux_context_switch_flow.png) > [Evolution of the x86 context switch in Linux](https://www.maizure.org/projects/evolution_x86_context_switch_linux/) 但在現代，必須考量到如上所列出的諸多議題外，也要考量不同硬體架構的特性。 ![](https://www.maizure.org/projects/evolution_x86_context_switch_linux/linux41467_context_switch_flow.png) > [Evolution of the x86 context switch in Linux](https://www.maizure.org/projects/evolution_x86_context_switch_linux/) 而且，對於 context switch 的執行進入點，interrupt，也有諸多議題需要考量。詳細可請參考 [Linux 核心設計: 中斷處理和現代架構考量](https://hackmd.io/@sysprog/linux-interrupt)。還有，當前 task 處於 user mode 或是 kernel mode 等，也會有不同的資源管理方式。例如，TLB flushing 的成本在多處理器下需要以 [IPI](https://en.wikipedia.org/wiki/Inter-processor_interrupt) 等方式進形 [coherence](https://www.usenix.org/conference/atc17/technical-sessions/presentation/amit) ，此會造成在執行成本過高，但如 Kernel space 的資訊，就可利用 [lazy TLB](https://forum.osdev.org/viewtopic.php?f=15&t=23569) 等技術，減少切換時的成本。但這些都是以作業系統的 kernel、硬體等方面來探討。如果是以 userspace 來模擬 context switch ，雖然不需要處理 register state 、stack memory 等以外硬體方面(interrupt、I/O 等)的事情，但需要考量不同面向的議題，如變數儲存形式以及執行效率等。以 C 語言為例，物件的儲存期限分成 static、automatic、thread、allocated storage duration 。並且，也須考量變數的 linkage 的型態為何。詳細請參考 [Storage-class specifiers](https://en.cppreference.com/w/c/language/storage_duration) 或翻閱規格書。若在 user space 進行 context switch ，需要考量 stack memory 的處理。以下為 cppreference 的描述： > **Stack memory, storage duration** > automatic storage duration. The storage is allocated when the block in which the object was declared is entered and deallocated when it is exited by any means (goto, return, reaching the end). One exception is the VLAs; their storage is allocated when the declaration is executed, not on block entry, and deallocated when the declaration goes out of scope, not when the block is exited (since C99). If the block is entered recursively, a new allocation is performed for every recursion level. All function parameters and non-static block-scope objects have this storage duration, as well as compound literals used at block scope. ## Coroutine Coroutine 是指在 userspace 中，建立 subroutine (user-mode task) 的形式，於 user-mode 自行切換。此技術於 1958 年由 Melvin Conway 發明。在此引述 [Wikipidia](https://en.wikipedia.org/wiki/Coroutine) 的描述： > Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed. Coroutines are well-suited for implementing familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes. 除了上述所列的實用情境外，也有被運用於 server accept 、Virtual Machine 等。而關於 preemptive multitasking ，請不要侷限於定義上，在其他實作中，Coroutine 也可以藉由 [signal](https://man7.org/linux/man-pages/man7/signal.7.html) 等相關操作實現 [preemptive](https://www.ptt.cc/bbs/C_and_CPP/M.1584751926.A.0D6.html) 。在分類上， Coroutine 分成 stackful 和 stackless 兩種。一般來說 stackful 可以自行切換 subroutine ，而 stackless 需要提供 `suspend` 和 `resume` 等功能自行手動切換。然而， stackful 需要有分配記憶體空間儲存 stack 等資訊，在 context switch 上會比 stackless 沒有效率（需要考量複製現存資訊、cache locality 等議題）。在 [C++20](https://en.cppreference.com/w/cpp/language/coroutines) ，C++ 提供了 coroutine 的功能，而在 2014 年時，有一份關於 coroutine 新增至 C++ Standard Library 的報告，可以參考：[Stackful Coroutines and Stackless Resumable Functions](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4232.pdf)； 2018 年的報告： [Working Draft, C++ Extensions for Coroutines](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4775.pdf)；也可看此部落格所寫的文章：[Stackless vs. Stackful Coroutines](https://blog.varunramesh.net/posts/stackless-vs-stackful-coroutines/)。另外，其他實作例子也有，Virtual Machine - QEMU 的 [coroutine](https://www.youtube.com/watch?v=2gNszk7OjIY) [[1]](https://www.qemu.org/docs/master/devel/block-coroutine-wrapper.html)[[2]](http://blog.vmsplice.net/2014/01/coroutines-in-qemu-basics.html) 以及 Meta 的 folly 中所使用的 [coro](https://github.com/facebook/folly/tree/c2c5b28d3273a24bd4fe5393b3fb728cfe541284/folly/experimental/coro) 。甚至 LLVM 或者說 Clang 編譯器也有提供 [builtin coroutine support](https://clang.llvm.org/docs/LanguageExtensions.html#c-coroutines-support-builtins) 。 :::info **額外資訊** 這邊有幾個關於 C++ 的 coroutine 介紹。 * [CppCon 2016: Gor Nishanov “C++ Coroutines: Under the covers"](https://www.youtube.com/watch?v=8C8NnE1Dg4A) * [CppCon 2014: Gor Nishanov "await 2.0: Stackless Resumable Functions"](https://www.youtube.com/watch?v=KUhSjfSbINE) * [C++20’s Coroutines for Beginners - Andreas Fertig - CppCon 2022](https://www.youtube.com/watch?v=8sEe-4tig_A) * [Deciphering Coroutines - A Visual Approach - Andreas Weis - CppCon 2022](https://www.youtube.com/watch?v=J7fYddslH0Q) ::: 而以其他非 C/C++ 的語言來說，由 Google 開發的 [Golang](https://en.wikipedia.org/wiki/Go_(programming_language)) 也有 [Goroutine](https://go.dev/tour/concurrency/1) 。除此之外，Google 也有對 [Linux Kernel](https://lwn.net/Articles/863386/) 提出 [User-managed concurrency groups](https://lwn.net/Articles/879398/) ，相當於 user-mode thread 的 patch 。只是，此實作方式是由作業系統幫助來進行 context switch 。回到正題， Coroutine 當中的 Context Switch 實作方式，可以藉由 [setjmp](https://en.cppreference.com/w/c/program/setjmp) / [longjmp](https://en.cppreference.com/w/c/program/longjmp) 、[組合語言](https://github.com/hnes/libaco/blob/master/acosw.S) 或是 Unix 的 [ucontext](https://pubs.opengroup.org/onlinepubs/7908799/xsh/ucontext.h.html) 實作。各有利弊需要考量運用情境，例如最間單的 setjmp / longjmp ，可以在 cppreference 的相關說明可以看到，對於變數的型態有相關的限制： > Upon return to the scope of setjmp, all accessible objects, floating-point status flags, and other components of the abstract machine have the same values as they had when longjmp was executed, **except for the non-[volatile](https://en.cppreference.com/w/c/language/volatile) local variables** in the function containing the invocation of setjmp, whose values are indeterminate if they have been changed since the setjmp invocation. 而 [volatile](https://lwn.net/Articles/233479/) 會造成編譯器無法對其變數進行最佳化，而導致執行效率無法提昇。並且，如果是要在 signal handler 以 setjmp / longjmp 實作，則須考量 glibc 等函式庫所作的 security 防範。可見 [CS170: Project 2 - User Mode Thread Library (20% of project score)](https://sites.cs.ucsb.edu/~chris/teaching/cs170/projects/proj2.html)。 ## Questions 在得知實際作業系統是如何實行 Context Switch 並看過 userspace 是如何實作 Coroutine (User-mode thread) 後，請在下列問題當中**擇一**回答。 1. 設計自己的 schedule 演算法，但**不能與 [Coroutine-in-C](https://github.com/linD026/Coroutine-in-C) 中的 `CR_DEFAULT` 和 `CR_FIFO` 兩個演算法相同**。於 [Coroutine-in-C](https://github.com/linD026/Coroutine-in-C) 當中實作出來，自行驗證。請把演算法的設計、實作的函式 (你所添增的函式) 與驗證程式碼、過程與結果，依序記錄下來，以 pdf 形式繳交。如果可以，請把改動過的專案連結 (GitHub 、GitLab 等含有 Git messages 紀錄的形式，但請不要以雲端硬碟提供)，也放置在 pdf 當中。 2. 利用上述的資訊，以 [Git](https://git-scm.com/book/en/v2) 建立專案，自行實作 **Stackful Coroutine** ，並說明運用到的原理、 schedule 的設計、實作方式以及驗證程式、過程與結果，依序記錄下來，以 pdf 形式繳交。專案當中需要有版本管理的紀錄，亦即，請不要只有一次 commit 或是 commit 內容過於空泛，如 `Update the code` 、 `RTFSC` 等說明。若不懂如何撰寫 commit ，請參閱此 [How to Write a Git Commit Message](https://cbea.ms/git-commit/)。並且請在 pdf 當中提供能夠下載此專案的方式，如 GitHub 、GitLab、自建 Server 並給予專案連結等。 3. 請研究 [Coroutine-in-C](https://github.com/linD026/Coroutine-in-C) 的實作方式並加以解釋實作設計，以 pdf 形式繳交。**但以此形式繳交所得的分數會相對較低**。 - 解釋內容可以抽象的描述資料管理、到每個 task 是如何被執行以管理、 schedule 又是如何註冊。並不要求逐行註解程式碼，畫圖加點描述也可。 - 如果要以逐行註解程式碼，請不要過於空泛。希望不會出現如下圖所示。 ![](https://i.imgur.com/of8QPZ4.jpg) ![](https://i.imgur.com/QW6gjtS.jpg) 各組交一份，檔名請以 `{幾班}-{第幾組}-os2022-sched-{問題編號}.pdf` 命名，於檔案中列出所有組員以及心得，繳交至 TronClass 。所有 pdf 當中的程式碼以截圖形式呈現。檔案模板順序以下為例： ``` 檔案名稱：<{幾班}-{第幾組}-os2022-sched-{問題編號}.pdf> -------------------------------------------------- 檔案內容標題順序： 1. 班級，組別，組員。 2. 問題編號，問題回答。 3. 心得。 ``` :::success **附註** 推薦在 [類 Unix](https://en.wikipedia.org/wiki/Unix-like) 環境中進行此作業，如 Linux ([WSL](https://learn.microsoft.com/en-us/windows/wsl/install), Ubuntu, Archlinux) 、FreeBSD 或是 MacOS 等。關於如何編譯與執行 [Coroutine-in-C](https://github.com/linD026/Coroutine-in-C)，請參閱 [GCC and Make Compiling, Linking and Building C/C++ Applications](https://www3.ntu.edu.sg/home/ehchua/programming/cpp/gcc_make.html)。 --- **補充說明** 此題要求設計排程演算法並加入到對應專案當中。具體上，是把演算法實作後，在 src/sched.c:sched_init() 函式中新增對應排程器。其中，相關對應接口可以從現有兩個演算法得知。以 CR_DEFAULT 為例，其定義[在此](https://github.com/linD026/Coroutine-in-C/blob/2ef4c990218972913d5bcedbd6fffa4d5ca200cb/src/coroutine.h#L21-L26)。可以注意到在個排程器演算法下方有 CR_SCHED_MASK 這個 macro ，此為設定排程器時的 flag 檢測，可見的 [src/coroutine.c:coroutine_create()](https://github.com/linD026/Coroutine-in-C/blob/2ef4c990218972913d5bcedbd6fffa4d5ca200cb/src/coroutine.c#L18-L19) 函式。要注意 flag 的值是以 2 的倍數漸增，其對應到 bit-wise 的操作。排程器的實作，需要有得到下一個執行的 task 以及把還沒完成的 task 放回到排程器的資料結構，以 CR_DEFAULT 為例，為紅黑樹。而當執行 coroutine 的函式是為於 [src/coroutine.c:coroutine_start()](https://github.com/linD026/Coroutine-in-C/blob/main/src/coroutine.c#L57) 。而如果想不到還有哪些簡易的演算法可以實作，在此提供一些方向。例如， Shortest Job First (SJF) 、 random 、 LIFO 和 FILO 等。 --- 若有任何疑問，請先於 TronClass 助教討論區查看是否有相似討論，若無則在此討論區留言發問。恕不接受私訊。 :::

Read more

Linux 核心 KCSAN

Linux 核心 Read-Copy Update 筆記整理

Linux 核心採納 Rust 的狀況

Making Dynamic Page Coalescing Effective on Virtualized Clouds. EuroSys '23