linux2023-summer-quiz

# linux2023: 819george ## 測驗 α - 1 [Incomplete S-Tree](https://gist.github.com/jserv/4dfaf78cf516cc20f4bc55ce388c195d) 解釋上述程式碼運作原理 S-Tree 的平衡機制與 AVL tree 較為相像利用 hint 來評估是否需要做 rotate ，然而 st_node 的 hint 的計算方式為其左右節點 hint 最大值 +1 並且只有在被呼叫到 st_update 時才會被更新。 ## 測驗 β - 1 ```cpp #include <stdint.h> static inline uintptr_t align_up(uintptr_t sz, size_t alignment) { uintptr_t mask = alignment - 1; if ((alignment & mask) == 0) { /* power of two? */ return ((sz + mask) & ~mask); } return (((sz + mask) / alignment) * alignment); } ``` 說明上述程式碼的運作原理 - 一般情況下會用 `(數字+大小-1)/大小` (去掉小數) 再 `*大小` 做到向上對齊。 - 2 的冪用 bit operation 加速 :::danger 「加速」的依據何在？實驗並觀察機械碼指令行為。 :notes: jserv ::: ### 位元運算加速 :::info 利用組合語言觀察 ::: 用 `gcc -S -fverbose-asm align_up.c` 編譯 - `-fverbose-asm` C語言變數的名稱作為組合語言裡的註解 align_up.c ```c= #include <stdint.h> #include <stddef.h> int main(){ int sz, alignment; uintptr_t mask = alignment - 1; uintptr_t bit,ans; if ((alignment & mask) == 0) { /* power of two? */ bit = ((sz + mask) & ~mask); } ans = (((sz + mask) / alignment) * alignment); return 0; } ``` align_up.s ```s # align_up.c:12: bit = ((sz + mask) & ~mask); movq -24(%rbp), %rdx # sz, tmp94 movq -32(%rbp), %rax # mask, tmp95 addq %rax, %rdx # tmp95, _2 # align_up.c:12: bit = ((sz + mask) & ~mask); movq -32(%rbp), %rax # mask, tmp96 notq %rax # _3 # align_up.c:12: bit = ((sz + mask) & ~mask); andq %rdx, %rax # _2, tmp97 movq %rax, -16(%rbp) # tmp97, bit .L2: # align_up.c:14: ans = (((sz + mask) / alignment) * alignment); movq -24(%rbp), %rdx # sz, tmp98 movq -32(%rbp), %rax # mask, tmp99 addq %rdx, %rax # tmp98, _4 # align_up.c:14: ans = (((sz + mask) / alignment) * alignment); movl $0, %edx #, tmp101 divq -40(%rbp) # alignment movq %rax, %rdx # tmp100, _5 # align_up.c:14: ans = (((sz + mask) / alignment) * alignment); movq -40(%rbp), %rax # alignment, tmp103 imulq %rdx, %rax # _5, tmp102 movq %rax, -8(%rbp) # tmp102, ans ``` 組語後綴 * "movb"（byte 8 位元） * "movw"（word 16 位元） * "movl"（long 32 位元） * "movq"（quadword 四字 64 位元）可以看到用 bit operation 少了兩次的 mov ，有效減少指令次數，節省時間。 ## 測驗 β - 2 在 Linux 核心原始程式碼找出類似 align_up 的程式碼，並舉例說明其用法 > bootlin ```/tools/testing/selftests/kvm/include/test_util.h``` ```c /* Aligns x up to the next multiple of size. Size must be a power of 2. */ static inline uint64_t align_up(uint64_t x, uint64_t size) { uint64_t mask = size - 1; TEST_ASSERT(size != 0 && !(size & (size - 1)), "size not a power of 2: %lu", size); return ((x + mask) & ~mask); } ``` 將數值向上對齊到指定的倍數 :::danger 解說「如何應用」，不是機械性張貼程式碼，推理和驗證！ :notes: jserv ::: ### 概念 & mask 取補數可以把低位去掉，達到進位效果。 1. 假設要對齊的 size = 4。 1. mask = 3 = 0b0011。 1. ~mask = 0b1100。 1. size 以下位元都變成 0。舉例觀察表格 x MSB，直接把 x 和 mask 補數取交集會向下對齊。向上對齊要再 +mask ，把最高位向上提升一位。 | x | 0b x | x MSB | x+mask | 0b (x+mask) | | --- | ---- | ---- | ------ | ---------- | | 5 | 0101 | 0100 | 8 | 1000 | | 6 | 0110 | 0100 | 9 | 1001 | | 7 | 0111 | 0100 | 10 | 1010 | | 8 | 1000 | 1000 | 11 | 1011 | ### 案例 ::: info 查找 bootlin `/arch/hexagon/lib/memcpy.S` ::: Qualcomm hexagon 是一款 DSP 處理器，用在 audio, imaging embedded vision, video and computationally intensive applications。 - 其中的 memcpy.c 根據八位元對齊做優化 > For blocks less than 16 bytes a byte by byte copy is performed. For 8byte alignments, and length multiples, a dword copy is performed up to 96bytes 直接向下對齊 ```c void * memcpy(char * ptr_out, char * ptr_in, int len) { u8 offset; ... offset = ((int) ptr_in) & 7; ptr8_in = (s64 *) &ptr_in[-offset]; //read in the aligned pointers ``` 讓 ptr_in 根據長度 len 複製到 ptr_out ，長度 0 ~ 32 位元內都可以. > library function for memcpy where length bytes are copied from ptr_in to ptr_out. ptr_out is returned unchanged. Allows any combination of alignment on input and output pointers and length from 0 to 2^32-1 舉例 | 變數 | 數字 | 0b | |:----------------:|:----:|:---------:| | ptr_in | 0x13 | 0001 1011 | | offset | 3 | 0000 0011 | | &ptr_in[-offset] | 0x10 | 0001 1000 | ## 測驗 γ - 1 [qsort-mt (incomple)](https://gist.github.com/jserv/38bca4ebfea1c434e1dfc15337f80eeb) 解釋上述程式碼運作原理 ```c struct qsort *qs, *qs2 //暫存當前的 thread struct common c //c.pool 放所有的 thread ``` `qsort_mt` - 參數: 排序的陣列、元素數量、元素大小、比較函數、最大線程數和分割線程的最小元素數量 `qsort_algo()` - 個數小於7用插入排序，大於用 quick sort。 - Advanced Quick Sort (Hybrid Algorithm) `qsort_thread` - 等所有執行緒空閒，狀態改成工作，開始排序。 - 等到所有都空閒，讓此執行緒可以回到 again 重新排序。 ## reference [2023 年暑期 Linux 核心課程第 1 次測驗/作業檢討](https://hackmd.io/@sysprog/SkmKiSfh2) [課前測驗參考解答: Q1](https://hackmd.io/@sysprog/bitwise-reverse)

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.