執行人: chun61205
專題解說錄影
第七章的實驗或許可改用 schedgraph 來進行?
Referring to the article written by linhoward0522
, set up the environment of schedgraph.
To set up the environment of opam
, the command should be
instead.
After generating a workload, use the following command to get the statistic result:
Take the experiment of generating differnt workloads for example, the outcoming grapu will be like
In the graph, lines in different colors means different processes. Process of purple line is the process with nice value , process of brown line is the process with nice value , and process of brown line is the process with nice value .
However, I found out that schedgraph
does not count vruntime
, so I can not calculate the ratio .
改進《Demystifying the Linux CPU Scheduler》,確保所有參照的程式碼皆適用於 Linux v5.15 (LTS),包含 ftrace 在內的實驗。內容使用英文書寫。
紀錄閱讀過程中的問題,連同自己的推論彙整在本頁面,適時與授課教師討論。
提及頁碼時,要附上電子書最後更新日期,範例:
P.16 @ March 21
johnnylord
你所不知道的 C 語言: linked list 和非連續記憶體
Why Linux kernel doubly-linked list is not just a simple linked list?!
include/linux/list.h
No need to consider edge cases
Here is the linked list diagram in linux kernel.
Observing the diagram above, we can see that the head node does not involve any data. In other words, it has nothing to do with the data in nodes.
Take __list_del
for example,
If a singly-linked list is used, user needs to consider the situation that if the head node is deleted. However, in the case above, there is no need to consider the edge case.
Polymorphism
Simply putting, polymorphism allows you to treat objects of different subclasses as if they were objects of the parent class, enabling you to store them in a common list and call their specific functions in a consistent manner.
Suppose we want to create a structure named employee
and several positions inheriting it. The implementation is like below:
struct supervisor
and struct business
inherit employee
and we put all inherited objects of a parent into a list.
Poison
Here is the function list_del
in include/linux/list.h.
In __list_del_entry
, the previous node and the next node of entry
will be connected to delete entry
. However if entry
can still be accessed, entry->next
and entry->prev
can still be accessed by entry
too, which is not the result that we expect. Therefore, in this case LIST_POISON1
and LIST_POISON2
are used to identify that the pointer is not accessible anymore.
According to the patch c9cf55285e87ac423c45d9efca750d3f50234d10,
Localize poison values into one header file for better documentation and easier/quicker debugging and so that the same values won't be used for multiple purposes.
poisons are designed to improve code traceability.
How does linux kernel ensure that the position LIST_POISON1
and LIST_POISON2
will not be accessed?
In my observation, linux kernel just check manually to ensure the address will not be access.
For example, in /lib/list_debug.c
However, there is still a question in my mind. If we can allocate an address like LIST_POISON1
or LIST_POISON2
, is that means that the newly allocated address is useless?
Why do linux kernel define two macros for list poison instead of just one?
I guess it is for traceability. Users can use two macro to trace specificly.
pid_chain
和 pid_list
? (P.19 @ May 30)TODO: 對照原始程式碼
include/linux/pid.h
Pid Namespace 详解- 泰晓科技
struct pid and struct upid - Linux - Kernel - LinuxQuestions.org
kernel/pid.c
kernel/fork.c
include/linux/sched.h
92476d7fc0326a409ab1d3864a04093a6be9aca7
e8cfbc245e24887e3c30235f71e9e9405e0cfc39
struct pid
and pid_t
pid_t
is an int
data type used to store the process id.pid
is an data structure storing process data.
attach_pid
in pid.c
pid
pid
is an data structure storing process data. A struct pid
structure represents a specific number of pid. A struct pid
may contain serveral types of pid, whereas these types may not be related directly and may contain completely different tasks.
pid_type
pid_type
is used to specify the type of pid such as PIDTYPE_PID
or PIDTYPE_TGID
.struct pid_namespace
retval
retval
is used to stored the error type.struct upid
struct hlist tasks
It is never possible for two tasks (inside one NS) to have the same PID. However, the ProcessGroup ID and the Session ID can be shared by multiple tasks.
PIDTYPE_MAX
enum pid_type
defines types of pid. In this case PIDTYPE_MAX
is , which means the number of pid types. In include/linux/pid.h
, pid_links
is defined like
We can see that there are PIDTYPE_MAX
pid hash tables.attach_pid
In kernel/fork.c
, copy_process
creates a new process as a copy of the old one. In copy_process
, attach_pid
will be called several times to add different types of a pid with specific number to the hash table list.kernel/fork.c
中, copy_process
會複製舊的 process 來創建一個新的 process 。在 copy_process
中, attach_pid
會被呼叫數次,用來將同一個 pid number 的不同 type 的 pid 插入到特定 type 的 hash table 。struct task_struct
, pid_links
pid_chain
and pid_list
are removed?
pid_list
pid_list
is removed is metioned in the patch 92476d7fc0326a409ab1d3864a04093a6be9aca7.pid_list
simplifies the code, reduces the need for 4 pid hash tables, and makes the code more capable.pid_list
contains a type of pid. When a pid is created, list node of each type will not be allocated immediately, instead, only when attaching, the node will be allocated.pid_chain
pid_chain
is removed is metioned in the patch e8cfbc245e24887e3c30235f71e9e9405e0cfc39.
pidhash is no longer required as all the information can be looked up from idr tree.
Where idr stands for "ID Radix" tree. idr is a data structure that is used for managing mappings from integer IDs to pointers.
struct task_struct
) ,但是在舊版的 kernel ,所有資訊都會儲存在這裡,如此設計的考量是什麼? (P.24 第一段第三行 @ March 21)TODO: 對照 git log / git blame include/linux/sched.h
Per-Entity Load Tracking (PELT)
PELT is a mechanism within the used for tracking the load or the utilization of CPU per-cpu runqueue.
Where is the cared record of the current time slot and is the decay factor, which controls the decaying process decribed before.
Why doesn't the Linux kernel differentiate between the ready state and running state?
The reason is stated in chapter 2.2.3.
Since each core has its own set of registers and cache, this means that it is simpler to restore a task to the core it previoulsy ran on. This is known as processor affinity: the task has an affinity for running on the same core it was swapped out from.
In summary, combining ready state and running state reduce the chance of swapping tasks out of the process. In consquence, kernel doesn't need to restore the task.
Applying Machine Learning Techniques to Improve Linux Process Scheduling
PELT ==> load 的測量
已解決Image Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
已解決Image Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
逐一檢視書中程式碼列表,確保所有參照的程式碼皆適用於 Linux v5.15+,若遇到程式碼的出入 (可能來自 Linux 核心符號的命名變更),紀錄在此並嘗試從 lxr 找出可能的變遷原因,並附上自己的推論。
重現書中關於 ftrace 和 tracepoint 相關的實驗,確保在 Linux v5.15+ 得以正確運作,若發現出入,嘗試變更程式碼和實驗步驟。
struct pid
已變更 (P.19 @ May 30)Linux v5.15.112 的實作:
移除 pid_chain
和 pid_list
,並使用 2 個 struct hlist_head
。
list_entry
已移動到 include/linux/list.h (P.18 @ March 21)已解決Image Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
container_of
已移動到 include/linux/container_of.h (P.21 @ May 30)In Linux v5.16 ,container_of
macro has been split out from include/linux/kernel.h
.
The reason can been seen in the patch:
For time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.
Note, there are a lot of headers and modules that include kernel.h
solely for one of these macros and this allows to unburden compiler for
the twisted inclusion paths and to make new code cleaner in the future.
container_of
is quite frequently used, so the linux kernel split it out include/linux/kernel.h
to unburden compiler for the twisted inclusion paths and to make new code cleaner.
研究如何設計實驗
offsetof
已變更 (P.22 @ May 30)已解決Image Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Linux 核心專題: CPU 排程器研究
rostedt/trace-cmd: utilities for Linux ftrace - GitHub
In the experiment, we will see the variation of virtual runtime. In order the observe it, we need to isolate a cpu first so that the tasks can run on it.
Modify /etc/default/grub
:
Update the configuration:
Reboot:
Check CPU affinity:
trace-cmd
According to 《Demystifying the Linux CPU Scheduler》, prepare two configurations.
For same nice,
For different nice,
Run the experiments
實際執行之後發現無法執行 start.py
,因此這裡暫時使用 stress-ng
來執行實驗。
stress-ng
Use the following command to install stress-ng
.
Use the following command to check if stress-ng
has been successfully installed.
Generating workload
nice -n
: Specify nice valuetaskset -c
: Specify cpu core--cpu
: Specify task numbersTracing the tasks
-e sched
: Scheduling events onlyGetting the result
Preparing a program to plot
Observing the result graph
In this graph, we can find out that tasks with same nice value has the same ratio of . In addition, as we set the nice value to , the ratio will be .
Generating workload
Tracing the tasks
Getting the result
Observing the result graph
PID 10175
: nice -5PID 10181
: nice 0PID 10184
: nice 5In this experiment, we can find out that the task with lower nice value has more runtime.
Accroding to the chapter 7,
which fit the result.
重現第 7 章 scheduler events 實驗,利用 7.4 節提供的程式碼,追蹤排程器行為,應當搭配 Cgroups 驗證 Linux 核心對資源管理的行為。