contributed by < Julian-Chu >
linux2021
單個陣列元素可以紀錄的最大值爲 0xFFFFFFFF(unsigned 32-bit int), overflow 會進位到下一個陣列元素, 注意 index 0 為最小位數
以陣列長度爲三的 big number 進位範例:
值得注意的是 bn_from_int
的部分, 直接傳入 uint64 初始化, 因為 uint64 的位元數爲 uint32 的兩倍, 所以單個 bn 的陣列元素不能完整紀錄資料, 需要利用位元計算把資料分別記錄到兩個陣列元素中, 流程如下
針對 little or big endian 更改為下面程式碼加入判斷
原理: 由較長資料型別轉型為短資料型別會保留記憶體位置較低的位元
示意如下:
關於 big-endian 和 little-endian 的判斷改為條件編譯即可,即善用 __BYTE_ORDER
巨集。
jserv
應用 __BYTE_ORDER
的版本
bn_to_str
由於每個陣列元素可以紀錄的最大值爲 4,294,967,295 (0xFFFFFFFF)以十六進位或是二進位爲單位做進位計算, 並不是十進位, 無法靠單純循序走訪陣列元素印出十進位表示。
限定每個陣列元素最大值爲 1,000,000,000 可以較為簡單的印出十進位, 但對空間的利用效率不佳
bn_dec
在需要退位的情況, 需要退位的陣列元素爲零, 減一之後會出現類似 uint32 溢位的情況, 反而比減一之前的值還要高(res>tmp), 所以走訪陣列元素做減法運算, 直到該元素減法後的值比原先的小, 代表退位結束。
bn_add
注意進位的單位是 MAX_VAL, 同時利用 & 運算計算 uint32 範圍內要保留的值
bn 加減法還是會遇到溢位的問題, 新增回傳錯誤碼作爲檢查或是有更好的方式?
Just do it!
jserv
bn_mul
利用了兩個 uint32 相乘之值, 不會超過 uint64 的情況, 將 uint32 轉為 uint64 做乘法運算 (intermeidate), 在根據相乘時的 index 對結果做對應的位移處理
UTYPE_TMP intermediate = a->array[i]*(UTYPE_TMP) b->array[j];
uint32 轉 uint64 的乘法相關規則如下
C11 6.3.1.1
- The rank oflong long intshall be greater than the rank oflong int, whichshall be greater than the rank ofint, which shall be greater than the rank ofshortint, which shall be greater than the rank ofsigned char.
- The rank of any unsigned integer type shall equal the rank of the correspondingsigned integer type, if any.
C11 6.3.1.8 Usual arithmetic conversions
Otherwise, if both operands have signed integer types or both have unsignedinteger types, the operand with the type of lesser integer conversion rank isconverted to the type of the operand with greater rank
list and hlist
linux 對雙向鏈表的實作有兩種類型
先閱讀以上註解, 由於在 hashtable 的 head 只需要 next, 如果採用一般的 list_head, 同時會需要紀錄 next 與 prev, 導致佔用的記憶體空間變成兩倍, 所以設計了 hlist_head 作為 hashtable 使用的 bucket 節省空間, 這就出現了一個問題, 如果使用 list_head 作為節點的話, first 節點的prev 需要額外判斷跟處理, 這邊 Linux 核心設計了一個 hlist_node, 這個設計中 next 是一般指針, 而特別的是 pprev 是指針的指針, 用以指向前一個節點的 next, 這樣就可以把 hlist_head 的 first 當作 hlist_node 的 next 類比操作
來看一下 hlist delete
的例子
以刪除 node1 為例,
hlist_add_head
h->first = n;
n->pprev = &h->first;
q1: list.h 跟 hashtable.h 看起來都不是 thread-safe, Linux 核心怎麼解決同步問題?
map_init
: 較為單純, 依據給定的 bits 分配對應的記憶體空間, 別忘記對 first 初始化 NULL, 避免對誤判空 bucket 跟錯誤的記憶體位置操作
map_deinit
: 特別注意需要針對每個非空的 bucket 走訪所有節點釋放記憶體空間
由於 hash table 不能有重複, 所以快速尋找是否已經有同一個key 存在 table 裡面是很非常重要的
針對 hash table 的 hash function (區別於 SHA 或是 MD5 , 主要是將數值映射到對應的 bucket) , 可以從以下幾個觀點考量:
從 The Art of Computer Programming vol3. 6.4, Dr.Knuth 提到兩種 hash function (bit shift 也是常見的, 不過沒有在這邊提到)
書中描述 multiplicative hashing, 可以等價於 mod, 且在多數的 CPU 乘法都比除法快( 特例是 mod (power of 2) 可以轉爲位元運算), 此外對於非隨機性的值像是 "part1 part2 part3 …" 可以減少碰撞機率
Our multiplicative hash functions were derived from [Knuth], p. 513ff. The theory posits that machine multiplication by a large number that is likely to cause overflow is the same as finding the modulus by a different number.
利用 gold ratio 做 multiplicative hashing 的稱為 Fibonacci hashing(也是書中提出來的例子), 被上述的 paper 驗證過
而根據 ref 裡面一篇討論 Fibonacci hashing 的 blog, 作者認為 power of two binary and 雖然可以有較快的速度, 但是由於會捨棄掉 low bits , 在沒有留意的狀況可能會造成較高 hash collision
新節點插入 hlist_head.first, 需要將原本的 first 節點接到新節點的後面, 同時將新節點的 pprev 接回 head.first
透過 find_key 取得結果
相關術語: cooperative multi-tasking, nonpreemptive multi-tasking, user-level threads, green threads, coroutine
歧義: fiber 在不同系統上可能可以有不同的定義, 像是 user thread, 這邊採用英文 Wikipedia 上的定義, 非搶占式/協作式的 thread, 由於是非搶佔式的所以目前的 thread 沒有停止工作, 其他 thread 也不會競爭工作跟資源, 在這種情況下 synchronization 的問題就會大大減少
clone
說明)與 quiz4: thread pool 的 apply 相似, 傳入 function pointer 與對應的參數以建立 fiber ( 對比 worker in thread pool), 但 stack 的分配跟指定須由我們處理(請見 clone
)
clone
:
These system calls create a new ("child") process, in a manner similar to fork(2).
By contrast with fork(2), these system calls provide more precise control over what pieces of execution context are shared between the calling process and the child process. For example, using these system calls, the caller can control whether or not the two processes share the virtual address space, the table of file descriptors, and the table of signal handlers. These system calls also allow the new child process to be placed in separate namespaces(7).
When the child process is created with the clone() wrapper function, it commences execution by calling the function pointed to by the argument fn. (This differs from fork(2), where execution continues in the child from the point of the fork(2) call.) The arg argument is passed as the argument of the function fn.
When the fn(arg) function returns, the child process terminates. The integer returned by fn is the exit status for the child process. The child process may also terminate explicitly by calling exit(2) or after receiving a fatal signal.
The stack argument specifies the location of the stack used by the child process. **Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process. The calling process must therefore set up memory space for the child stack and pass a pointer to this space to clone(). Stacks grow downward on all processors that run Linux (except the HP PA processors), so stack usually points to the topmost address of the memory space set up for the child stack. Note that clone() does not provide a means whereby the caller can inform the kernel of the size of the stack area.
另外是 flag mask 的部分, 用來設定 parent/child processes 間的共享資源( e.g. signal, memory space, file etc.)
SIGCHLD
The child termination signal
When the child process terminates, a signal may be sent to the parent. The termination signal is specified in the low byte of flags (clone()) or incl_args.exit_signal
(clone3()). If this signal is specified as anything other than SIGCHLD, then the parent process must specify the__WALL
or__WCLONE
options when waiting for the child with wait(2). If no signal (i.e., zero) is specified, then the parent process is not signaled when the child terminates.
CLONE_SIGHAND
CLONE_SIGHAND
(since Linux 2.0)
IfCLONE_SIGHAND
is set, the calling process and the child process share the same table of signal handlers. If the calling process or child process calls sigaction(2) to change the behavior associated with a signal, the behavior is changed in the other process as well. However, the calling process and child processes still have distinct signal masks and sets of pending signals. So, one of them may block or unblock signals using sigprocmask(2) without affecting the other process.If
CLONE_SIGHAND
is not set, the child process inherits a copy of the signal handlers of the calling process at the time of the clone call. Calls to sigaction(2) performed later by one of the processes have no effect on the other process.Since Linux 2.6.0, the flags mask must also include
CLONE_VM
ifCLONE_SIGHAND
is specified
CLONE_VM
CLONE_VM
(since Linux 2.0)
IfCLONE_VM
is set, the calling process and the child process run in the same memory space. In particular, memory writes performed by the calling process or by the child process are also visible in the other process. Moreover, any memory mapping or unmapping performed with mmap(2) or munmap(2) by the child or calling process also affects the other process.If
CLONE_VM
is not set, the child process runs in a separate copy of the memory space of the calling process at the time of the clone call. Memory writes or file mappings/unmappings performed by one of the processes do not affect the other, as with fork(2).If the
CLONE_VM
flag is specified and theCLONE_VFORK
flag is not specified, then any alternate signal stack that was established by sigaltstack(2) is cleared in the child process.
由於 fiber 為非搶佔式的 thread, 必須由目前的 thread/process 放棄控制權, 讓其他人可以拿到資源, 需要yield 讓出目前的執行, 從序列中拿出下一個工作, 並把目前的工作放到序列的最後, 等待下次的執行。所以 fiber 切換時機有兩種情況:
sched_yield
sched_yield() causes the calling thread to relinquish the CPU.
The thread is moved to the end of the queue for its static
priority and a new thread gets to run.In the Linux implementation, sched_yield() always succeeds.
If the calling thread is the only thread in the highest priority
list at that time, it will continue to run after a call to
sched_yield().POSIX systems on which sched_yield() is available define
_POSIX_PRIORITY_SCHEDULING in <unistd.h>.Strategic calls to sched_yield() can improve performance by giving other threads or processes a chance to run when (heavily) contended resources (e.g., mutexes) have been released by the caller. Avoid calling sched_yield() unnecessarily or
inappropriately (e.g., when resources needed by other schedulable
threads are still held by the caller), since doing so will result
in unnecessary context switches, which will degrade system
performance.
pid 在這段程式碼有兩種作用
wait
RETURN VALUE top
wait(): on success, returns the process ID of the terminated
child; on failure, -1 is returned.
stdio.h(含printf) 都是 thread-safe, 猜測是 stdout buffer 的問題, 嘗試用 fflush(stdout) 仍出現 interleaveing。
解法: 移除 clone 內的 flag CLONE_VM, 使兩個 fiber 無法看到對方的記憶體空間, 還在思考原因爲何…
q1: 雖然可以靠區分記憶體空間解決, 但是依然沒想到為什麼共用記憶體空間會產生這樣的問題?
ans: 思考上的誤區, 參考講座 Linux 核心設計: 不僅是個執行單元的 Process , 移除範例程式碼中 clone 的 CLONE_VM flag , 會建立獨立記憶體空間的 cild process 不會互相影響, 設定 CLONE_VM 即為共用記憶體空間的 lightweight process(thread), 即為多執行緒共用資源的問題, 所以解決方式:
sudo taskset -c 7 ./fiber.o
綁定 CPU 變成單執行緒摘自 man 2 clone
輸出:
#define _GNU_SOURCE
#include <sched.h>
注意要將 _GNU_SOURCE
放置於 #include <sched.h>
之前。
jserv
謝謝老師, 我把這兩個放置在 fiber.h 導致編譯錯誤, 經提點後把這兩行加入至 main.c, 已排除警告
Julian