linux2021: RoyWFHuang

# linux2021: RoyWFHuang ###### tags: `linux_class` > [2021 年暑期「Linux 核心」課程](https://hackmd.io/@sysprog/linux2021-summer/) > [2021 年暑期「Linux 核心」課程先決測驗題](https://hackmd.io/@sysprog/linux2021-summer-quiz) ## 題目 α ```clike= #include <stdint.h> #define __DECLARE_ROTATE(bits, type) \ static inline type rotl##bits(const type v, int c) \ { \ const int mask = (bits) - (1); \ c &= mask; \ \ return (v << c) | (LLL); \ } \ \ static inline type rotr##bits(const type v, int c) \ { \ const int mask = (bits) - (1); \ c &= mask; \ \ return (v >> c) | (RRR); \ } #define DECLARE_ROTATE(bits) __DECLARE_ROTATE(bits, uint##bits##_t) ``` ### 其中一種可能答案: LLL = v >> ((-c) & mask) RRR = v << ((-c) & mask) ### 討論: 此題看到後, 原本第一個想法是 LLL = v >> (bits - c) LLL = v << (bits - c) 但後來查閱 linux kerenl 內部 [/include/linux/bitops.h](https://elixir.bootlin.com/linux/latest/source/include/linux/bitops.h)似乎可以直接用 bitwise 直接解決 bits - c 其實和 c 互為補數關係也就是在 n (0 ~ n-1) 進位循環系統中```(n - c ) + c) = n === 0``` 所以意義上 bits - c = -c 但 -c 必須要在 n 進位系統中, 所以用 mask將之限制在 n 中所以就得出了 (-c & mask) 的結論 ### 延伸問題: #### 1.舉出 Linux 核心原始程式碼裡頭 bit rotation 的案例並說明此問題我直接聯想到密碼學裡面 value 和 key 運算後, 進行旋轉然後在和 key運算的方式, 透過[bootlin](https://elixir.bootlin.com/linux/latest/source) 反查 [ror](https://elixir.bootlin.com/linux/latest/C/ident/ror64) 的使用位置得到 sha512演算法中會使用到. 也才讓我回想起, 以前學 sha演算法的時候, 他的 rotate 計算. #### 2.x86_64 指令集具備 rotr 和 rotl 指令，上述 C 程式碼經過編譯器最佳化 (例如使用 gcc) 後，能否運用到這二個指令呢？ ```clike= // main program int main() { uint8_t rtol = rand()%256; rtol = rotl8(rtol, 4); printf("%x\n", rtol); return 0; } ``` 在 [compiler explorer](https://godbolt.org/) 中, 把此程式放入, 並使用 -Os的最佳化, 出現下面的程式碼 ```asm .LC0: .string "%x\n" main: push rax call rand mov ecx, 256 mov edi, OFFSET FLAT:.LC0 cdq idiv ecx xor eax, eax rol dl, 4 movzx esi, dl call printf xor eax, eax pop rdx ret ``` 改用 -O3 的最佳化, 出現下面的程式碼 ```asm .LC0: .string "%x\n" main: sub rsp, 8 call rand mov edi, OFFSET FLAT:.LC0 rol al, 4 movzx esi, al xor eax, eax call printf xor eax, eax add rsp, 8 ret ``` 似乎都會直接使用到 rol 的 x86指令集 ## 題目β ```clike= #include <stdint.h> static inline uintptr_t align_up(uintptr_t sz, size_t alignment) { uintptr_t mask = alignment - 1; if ((alignment & mask) == 0) { /* power of two? */ return MMM; } return (((sz + mask) / alignment) * alignment); } ``` ### 答案 MMM = (((sz) + (mask)) & ~(mask)) ### 討論 MMM = (sz & (~mask)) + ((((sz ^ mask) & mask) + 1) & (mask + 1) ^ (mask + 1)) 看完 [/include/uapi/linux/const.h](https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/const.h#L32)才發現, 自己把事情想的太複雜了. ```clike= (((x) + (mask)) & ~(mask)) ``` 很簡單的作法, 只有剛好對齊的時候, 是不用多一個page去存, 所以只要把現在的 address + mask, 只要不是剛好對齊, 都會進位然後再把尾數用 ~(mask)去除, 就可以得到答案了. ### 延伸問題 #### 1.說明上述程式碼的運作原理同上述, 只有剛好對齊的時候, 是不用多一個page去存, 所以只要把現在的 address + mask, 只要不是剛好對齊, 都會進位然後再把尾數用 ~(mask)去除, 就可以得到答案了. #### 2.在 Linux 核心原始程式碼找出類似 align_up 的程式碼，並舉例說明其用法 [/include/uapi/linux/const.h](https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/const.h#L32) 直覺想到的是, 記憶體配置的時候, 在計算alignment配置量, 比方說, ==malloc(9)==, 在以32bit 處理器架構下, 一次fetch的量為32bit, 所以 malloc(9) 就要多配置一個, 甚至 strcut 各個element的管理, 都會用到對齊的方式, 方便加速存取資料. 參考 kernel 中, 有相當多地方使用到[align](https://elixir.bootlin.com/linux/latest/C/ident/ALIGN), 包括檔案系統, 記憶體管理, 各家晶片廠提供的 [driver/net] 1. [fs/ext4](https://elixir.bootlin.com/linux/latest/source/fs/ext4/file.c#L343) 中有使用到, 當資料寫回 disk中, 依照 disk blk進行 alignment 寫入, 目的是加速資料的寫入, 若沒有對齊, 則變成要分兩次寫入 disk, 增加io的時間, 也造成讀取上, 需要分別讀取兩次才能組成完整資料的時間. 2. [/mm/memblock.c](https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L1943)初始化記憶體. ## 題目γ ```clike= #include <stdio.h> #include <unistd.h> int main(void) { for (int i = 0; i < NNN; i++) { fork(); printf("-"); } fflush(stdout); return 0; } ``` ### 答案 NNN = 12 ### 討論這題真的是純粹把程式丟下去跑, 第一次跑10 出來的次數剛好是 10240, 所以大膽的猜測應該是 2^NNN *10 + M = 49152, 而這個最接近的數字就是 12. 實際原理如下 [fork 連 io_buffer中的資料都會 copy](https://coolshell.cn/articles/7965.html) fork 會把 buff 中的 data也給 copy出來, 所以在最後 fflush的時候, 就會把 child process的 io buffer的資料給印出, ### 延伸問題 #### 解釋上述程式碼輸出 - 字元數量的原理 ## 題目 δ ### 答案 AAA = queue->last->next = node; BBB = node->value; CCC = queue->first = new_header; ### 討論 (TODO...) ### 延伸問題 (TODO...) #### 1.解釋上述程式碼運作原理並指出實作缺失 #### 2.以 lock-free 程式設計改寫上述程式碼 ## 題目 ϵ ### 答案 (TODO...) XXX = x + 1 YYY = *ip = mp->hs[i]; ### 討論 (TODO...) 這題其實我沒作答, 直接放棄 [ref](https://github.com/silentbicycle/mpool/blob/master/mpool.c) ### 延伸問題 (TODO...) #### 1.解釋上述程式碼運作原理 #### 2.提出效能和功能的改進策略，並予以實作 ## 題目 δ ### 答案 III = .fd = fds[1], .events = POLLOUT JJJ = .fd = fds[0], .events = POLLIN ### 討論 (TODO...) ### 延伸問題 (TODO...) #### 1.解釋上述程式碼運作原理 #### 2.以 epoll 系統呼叫改寫程式碼，並設計實驗來驗證 proxy 程式碼的效率 ## 作業一 ch.5 ch6 ftrace