contributed by <linD026>
Tags: linux2021, jserv
`list_sort` still uses a modified merge sort. Can we reason about and list the considerations behind that choice?

This function inserts a `node_t` node in front of the first node of `*list`, then makes `*list` point to the new front of the linked list. In `node_t->next = *list;` we can see that `list` must be dereferenced to reach the saved head node.
This function first finds the `struct __node next;` field of the last node of the `*left` linked list, then links it to `right`.
This function uses a pointer to a pointer to avoid the extra handling for a NULL head, as well as the need to return a value that a single pointer's limited scope would otherwise impose.
The first node of `*list` is taken out as `pivot` and detached from the original list, serving as the basis for partitioning `*list` into the `left` and `right` lists. Note that `pivot` can be detached safely because it points to the front of `*list`, so no node points to it; in addition, its `struct __node next;` field is set to NULL.
Nodes in `*list` smaller than `pivot` are added to `left` with `list_add_node_t`, the rest to `right`. `quicksort` then recurses on each part. Since `quicksort` recurses on `left` first, the smaller values finish sorting first; in addition, `list_concat` replaces the first NULL encountered in `*left` with `right`, so the NULLs between sublists are absorbed and no break points arise. Finally, `list_concat` chains `left`, `pivot`, and `right` in order, completing the sort.
EXAMPLE: we get
NULL > 1 > 2 > 3 > NULL > 5 > 6 > 7 > NULL
=> 1 > 2 > 3 > 5 > 6 > 7
as the sorted order.
Read about tail recursion and determine whether the quick sort implementation above can be improved in performance through tail call optimization.

In short, after optimization a tail call can reuse the same stack space as much as possible, simplifying the bookkeeping and improving both execution speed and space usage.
I ran a comparison:

The loop version with its hand-rolled stack clearly cannot match the performance of the other two versions, which the compiler has optimized.
Reference
According to the random(3) Linux man page:

The random() function uses a nonlinear additive feedback random number generator

and

The srandom() function sets its argument as the seed for a new sequence of pseudo-random integers to be returned by random(). These sequences are repeatable by calling srandom() with the same seed value. If no seed value is provided, the random() function is automatically seeded with a value of 1.

together with the comments and definitions in glibc's random.c, we can see that random() uses a different formula depending on the size of the state information we hold (the break value). This can be confirmed there.
However, the state information can also be replaced through other functions:

The comment below then tells us that the information we hold is in array form:

Line 71 shows that the first element of the state information array records which kind of formula is used.

The initial type is as follows:

The initialization of the state information is tied to srandom():

state[i] = (16807 * state[i - 1]) % 2147483647

(a linear congruential generator). `*result` is the value the function finally outputs, and it is produced from `*fptr` and `*rptr`. As for how the type gets changed, we have to start from initstate() and setstate().
With these concepts in place, we can read the relevant description straight from the Linux Programmer's Manual:

- The initstate() function allows a state array state to be initialized for use by random(). The size of the state array n is used by initstate() to decide how sophisticated a random number generator it should use—the larger the state array, the better the random numbers will be.
- The setstate() function changes the state array used by the random() function.

So the size n of the array passed to initstate() decides which type is used. The same check appears in __initstate_r, the implementation behind initstate:

Interestingly, at line 271 __initstate_r has already called `__srandom_r (seed, buf);` for you.

Finally, we can also see from the man page:

This is because the implementations of these functions are already wrapped in `__libc_lock_lock (lock);` and `__libc_lock_unlock (lock);`.
I used `srandom(time(NULL));` and wrapped another PRNG, a lagged Fibonacci generator, to recover the kind of random numbers I wanted.
A problem remains, though: if the executable is run in quick succession, the seed taken from `time()` repeats, producing the same situation (duplicates) as before. So the problem caused by `random()` is not fundamentally solved.
After that, I also took the address of `*argv` from `int main(int argc, char **argv)` as another source of randomness. That roughly completes the PRNG I had in mind, though it still feels slightly short of perfect; I will revise it if new ideas come up.
TODO: point out how the ASLR mechanism above operates, and explore its entropy
The goal of Address Space Layout Randomization is to introduce randomness into addresses used by a given task. This will make a class of exploit techniques fail with a quantifiable probability and also allow their detection since failed attempts will most likely crash the attacked task.
From this we know ASLR is a protective measure against the attacks that become easy when a program's start addresses are too predictable.
Address Space Layout Randomization (ASLR) is a class of computer security defense techniques designed to reduce the impact of buffer-overflow vulnerabilities. It adds a random offset to the virtual memory layout of each program, making it harder for an attacker to predict the target memory address that they wish the vulnerable program to return to. If the attacker overwrites the return address with a bad memory address, the probability of a successful exploit decreases.
For details, such as a comparison across operating systems, see ASLR: How Robust is the Randomness?.
To help understand the ideas behind ASLR, let's look at an example task and its address space: we made a copy of /bin/cat into /tmp/cat then disabled all PaX features on it and executed "/tmp/cat /proc/self/maps". The [x] marks are not part of the original output, we use them to refer to the various lines in the explanation (note that the VMMIRROR document contains more examples with various PaX features active).
[1] 08048000-0804a000 R+Xp 00000000 00:0b 812 /tmp/cat
[2] 0804a000-0804b000 RW+p 00002000 00:0b 812 /tmp/cat
[3] 40000000-40015000 R+Xp 00000000 03:07 110818 /lib/ld-2.2.5.so
[4] 40015000-40016000 RW+p 00014000 03:07 110818 /lib/ld-2.2.5.so
[5] 4001e000-40143000 R+Xp 00000000 03:07 106687 /lib/libc-2.2.5.so
[6] 40143000-40149000 RW+p 00125000 03:07 106687 /lib/libc-2.2.5.so
[7] 40149000-4014d000 RW+p 00000000 00:00 0
[8] bfffe000-c0000000 RWXp fffff000 00:00 0

As we can see, /tmp/cat is a dynamically linked ELF executable, its address space contains several file mappings.
[1] and [2] correspond to the loadable ELF segments of /tmp/cat containing code and data (both initialized and uninitialized), respectively.
[3] and [4] represent the dynamic linker whereas [5], [6] and [7] are the segments of the C runtime library ([7] holds its uninitialized data that is big enough to not fit into the last page of [6]).
[8] is the stack which grows downwards.
There are other mappings as well that this simple example does not show us: the brk() managed heap that would directly follow [2] and various anonymous and file mappings that the task can create via mmap() and would be placed between [7] and [8] (unless an explicit mapping address outside this region was requested using the MAP_FIXED flag).
For our purposes all these possible mappings can be split into three groups:
- [1], [2] and the brk() managed heap following them,
- [3]-[7] and all the other mappings created by mmap(),
- [8], the stack.

The mappings in the first and last groups are established during execve() and do not move (only their size can change) whereas the mappings in the second group may come and go during the lifetime of the task. Since the base addresses used to map each group are not related to each other, we can apply different amount of randomization to each. This also has the benefit that whenever a given attack technique needs advance knowledge of addresses from more than one group, the attacker will likely have to guess or brute force all entropies at once which further reduces the chances of success.
grsecurity also gives a detailed probabilistic account of ASLR; see ASLR 1. Design for details. Here I quote only the part on the protection formed by combining ASLR with crash detection.
On the other hand however the defense side has quite some control over the value of x: whenever an attack attempt makes a wrong guess on the randomized bits, the attacked application will go into a state that will likely result in a crash and hence becomes detectable by the kernel. It is therefore a good strategy to use a crash detection and reaction mechanism together with ASLR (PaX itself contains no such mechanism).
The last set of side effects of ASLR is address space fragmentation and entropy pool exhaustion. Since randomization shifts entire ranges of memory, it will also randomly change the gaps between them (which were constant before). This in turn will change the maximum size of memory mappings that will fit in there and applications expecting to be able to create them will fail. Finally, ASLR increases the consumption of the system's entropy pool since every task creation (through the execve() system call) requires some bits of randomness to determine the new address space layout. Depending on the system's threat model however a given implementation can relax the requirements for the quality of this entropy. In particular, if only remote attacks are considered, then ASLR does not need cryptographically secure random bits as a remote attacker cannot observe them (or if he can, he does not need to care about ASLR at all).
According to the Linux man page for ldd:

For each dependency, ldd displays the location of the matching object and the (hexadecimal) address at which it is loaded. (The linux-vdso and ld-linux shared dependencies are special; see vdso(7) and ld.so(8).)

we can see that the shared objects end up at different locations.

Note: kernel.randomize_va_space = 2 means full randomization.

We can also run `cat /proc/sys/kernel/random/entropy_avail` to see how many entropy bits the machine currently holds (below 4096).
This is the RNG provided by the Linux kernel; the related functions are documented in the random(7) and random(4) Linux manual pages.

urandom and getrandom:
The random number generator gathers environmental noise from device drivers and other sources into an entropy pool. The generator also keeps an estimate of the number of bits of noise in the entropy pool. From this entropy pool, random numbers are created.
When read, the /dev/urandom device returns random bytes using a pseudorandom number generator seeded from the entropy pool. Reads from this device do not block (i.e., the CPU is not yielded), but can incur an appreciable delay when requesting large amounts of data.
When read during early boot time, /dev/urandom may return data prior to the entropy pool being initialized. If this is of concern in your application, use getrandom(2) or /dev/random instead.
Note: // Linux's random device uses a cryptographically secure pseudo-random number generator
Reference
Related reading

Notes:
The information `quicksort` keeps on the function call stack is reconstructed as needed inside a single function. The value popped in `partition = stack[top--];` is a single node. As in `quicksort`, the linked list is processed into `left`, `pivot`, and `right`. The case where `left` or `right` is NULL is handled; otherwise the `else` branch pops the stack until `top` points to an item that is not a single node.

The linux-list assignment describes:
The Linux kernel also has a linked list implementation internally, but it is a circular doubly-linked list; linux-list imitates the kernel's implementation in simplified form...
The kernel implementation can be consulted at the following URLs:

The implementation is centered on merge sort. If I recall correctly, a recording of the "Linux Kernel Design" lecture series mentioned that merge sort was chosen because it is stable and its average, best, and worst cases are all O(n log n), which is why merge sort (a stable sort) is used rather than quick sort (an unstable sort) with its O(n^2) worst case.
list_qsort

The `list_head` structure forms a doubly linked list, with the `prev` pointer pointing to the previous node and `next` to the next. The `listitem` structure stores the target node's value `i` and the `list` it belongs to. Clearly every operation is wrapped, which matches the usual expectations for large projects built for multi-person maintenance and development: isolating each operation keeps the API consistent even when the code underneath is modified.
Explaining each wrapper one by one would make this far too long, so I first list the underlying source of each operation, then express the complete behavior as pseudocode.

The wrapped low-level details below are listed top-down, in the order `list_qsort` uses them:

Differences between `list_qsort` and `quicksort`: from the `list_qsort` pseudocode it is clear that the two functions behave much the same; the difference is whether the code for each operation is wrapped up as an object.

Reference
Related reading

To go straight to the implementation proposed for "4: think about an efficient linked list sorting algorithm", see Tree sort.
Notes:

A Comparative Study Of Linked List Sorting Algorithms examines six sorting algorithms; by the paper's own numbers, the tree sort described under Doubly Linked List Sorting Algorithms is the fastest.
| n     | D-Bub | S-Bub | Select | Msort | Qsort | Tree  |
|-------|-------|-------|--------|-------|-------|-------|
| 100   | 0.22  | 0.12  | 0.10   | 0.08  | 0.07  | 0.05  |
| 200   | 0.98  | 0.54  | 0.44   | 0.15  | 0.13  | 0.10  |
| 300   | 2.20  | 1.22  | 0.76   | 0.23  | 0.22  | 0.19  |
| 400   | 4.18  | 2.42  | 1.44   | 0.32  | 0.30  | 0.21  |
| 500   | 6.38  | 3.74  | 2.18   | 0.42  | 0.37  | 0.29  |
| 600   | 10.22 | 6.48  | 4.06   | 0.53  | 0.51  | 0.40  |
| 700   | 15.38 | 10.10 | 6.46   | 0.69  | 0.57  | 0.43  |
| 800   | 21.20 | 14.82 | 9.68   | 0.76  | 0.69  | 0.51  |
| 900   | 28.34 | 20.20 | 13.62  | 0.88  | 0.79  | 0.61  |
| 1000  | 36.58 | 26.14 | 17.88  | 1.01  | 0.89  | 0.69  |
| 2000  | 159   | 127   | 93     | 2.75  | 2.00  | 1.38  |
| 3000  | 379   | 302   | 220    | 3.38  | 3.38  | 2.88  |
| 4000  | 693   | 549   | 401    | 5.50  | 4.12  | 4.12  |
| 5000  | 1104  | 867   | 643    | 6.00  | 6.88  | 5.50  |
| 6000  | 1763  | 1395  | 1082   | 9.00  | 8.88  | 6.38  |
| 7000  | 3037  | 2604  | 2169   | 12.38 | 11.00 | 9.62  |
| 8000  | 4449  | 3850  | 3252   | 13.75 | 11.62 | 10.25 |
| 9000  | 5515  | 4630  | 3917   | 16.38 | 14.38 | 12.25 |
| 10000 | 6591  | 5509  | 4619   | 19.25 | 16.50 | 12.25 |

Tables 1 and 2: Running Time for n = 100 to 1000 and 2000 to 10000
The paper explains how the tree sort proceeds:

Therefore, we can use these fields to build a binary search tree and reconstruct a sorted doubly linked list as the binary search tree is traversed with inorder.

Although the paper does not elaborate on how the binary search tree is constructed, it does show how the conversion is carried out.
Figure 2 : Reconstruct a List from a Tree
Figure 2 shows a modified recursive inorder traversal of a binary search tree. Two static variables, head and tail, are set to NULL before calling Traverse(). Function Traverse() receives the current root pointer. If it is not NULL, the left subtree is traversed. Then, the pointer to root's right subtree is saved to work, the root is appended to the end of the doubly linked list with head and tail pointers head and tail, and finally the right subtree pointed to by work is traversed.
Note that the pointer to the right subtree must be saved before the root is appended to the doubly linked list since macro APPEND_NODE destroys prev and next.
As for how to build the binary search tree, see splay trees, or refer to the red-black tree in the Linux kernel.

This question has since been resolved; see the errata for details.
Here a question arises: if these are all O(n log n) sorting algorithms, why at 10000 items is quick sort more than 4 seconds behind tree sort, and merge sort as much as 7 seconds behind?
The paper also describes the author's test environment:
All of these six algorithms were coded in ANSI C and SWAP(), APPEND() and JOIN() are C macros rather than functions except for the sediment sort whose swap function is taken directly from Carraway's paper.2 For those who love C++, these macros and variable parameters can easily be changed to inline functions and aliases, respectively. Each sorting algorithm is repeated several times sorting the same set of input to minimize timing error and the average elapsed time is recorded. The clock() function is used to retrieve the elapsed time between the start and the end of a sorting algorithm, excluding data generation and all other operations. Note that clock() returns the number of clock ticks rather than the number of seconds.
Moreover, since clock() returns elapsed time rather than user time (i.e., the CPU time used by a user program), this test is performed under MSDOS rather than Windows and Unix to minimize the multitasking effect. The machine used for this test is an Intel 66mhz 486DX2 IBM PC compatible and the compiler is Watcom C/C++ Version 10.0 with compiler options set to /oneatx/zp4/4/fp3 as suggested by Watcom for maximum efficiency.
The paper also provides the bubble sort, merge sort, and quick sort code; it is not reproduced here, see A comparative study of linked list sorting algorithms.
The paper also explains why the tree sort built on a binary search tree is O(n log n):
As is well-known, the complexity of binary search tree insertion is O(n^2), since in a binary search tree, except for one leaf, all nodes could have exactly one child and in this case the tree reduces to a linked list. However, if the input data are random, the resulting binary search tree could be reasonably balanced and the complexity would be approximately O(n log n).
For why it is O(n^2), see stackoverflow - Run time for inserting a binary search tree is n^2?.
We can therefore infer that the tree sort times analyzed in the paper do not include the time consumed by `Traverse`.

At that point, my own calculation of the tree sort's time complexity came out differently, briefly because of the extra `traverse` term I included. Of course, the author may have had other considerations for the stated bound that I did not know about or overlooked.

This question was resolved (2021/3/4, 12:48): my calculation was wrong. See the errata for details.
Setting aside tree sort, which has not yet been explored, the focus here remains the quick sort analysis.
Figure 3: Quick Sort
Figure 3 shows the basic concept, where macro APPEND() appends the first argument to the tail of a singly linked list whose head and tail are defined by the second and third arguments. On return, the first argument will be modified so that it points to the next node of the list.
Macro JOIN() appends the list whose head and tail are defined by the third and fourth arguments to the list whose head and tail are defined by the first and second arguments.
In the paper, the author classifies quick sort under 4 Singly Linked List Sorting Algorithms; its unstable nature is also touched on briefly:
For quick sort, Hoare's original algorithm [3] cannot be used since this algorithm “burns a candle from both ends”. Nico Lomuto's algorithm as described in Bentley [1] could be a better candidate for our study since it keeps two forward scanning pointers. However, since quick sort is not stable (Sedgewick [6]), it is not included. Instead, an algorithm which was originally designed to make quick sort stable and to handle equal keys is selected for this study. This algorithm was first proposed by Motzkin [5] and then analyzed by Wegner [7]. In fact, Wegner showed that on average this algorithm is of order , where n is the number of keys in an input linked list in which each key occurs m times.
It is true that tree sort is O(n log n + n) as stated above, but when converting to Big O notation, by definition:

Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation. The letter O is used because the growth rate of a function is also referred to as the order of the function. A description of a function in terms of big O notation usually only provides an upper bound on the growth rate of the function. Associated with big O notation are several related notations, using the symbols o, Ω, ω, and Θ, to describe other kinds of bounds on asymptotic growth rates.

tree sort comes out as O(n log n). So the paper's classification of it as O(n log n) is correct.
This tree sort uses a red-black tree as its balanced binary search tree.

I adapted the Linux kernel's `rbtree.h` and `rb_tree_augmented.h`, simplifying them to get the functionality I wanted. The original plan was to implement the cached or augmented version, but understanding the detailed operations will take a while longer, so I produced the most simplified version. The kernel implementation seems to be centered on `rb_tree_augmented.h`, since it can also achieve the effect of `rbtree.h`.

I have implemented a compact red-black tree before; see:
tree-sort.c
Building with the `make` rules bundled with linux-list produces warnings that `dummy_rotate` is a NULL function. The warnings could be ignored, but with this makefile the build fails because of `cc1: all warnings being treated as errors`. Since `dummy_rotate` is necessary, I changed the checking approach: print the results and cross-check with a standalone valgrind run.

Printed output (values size = 256):

valgrind (values size = 10000):
Below is the performance analysis program for quick sort and tree sort:

Comparing the two, tree sort is clearly more efficient than quick sort. We can infer that the cost quick sort accumulates in splitting, comparing, and merging exceeds that of converting the list into a balanced binary search tree and traversing it to rebuild the order. My suspicion is that the stack operations quick sort relies on consume too many resources, making it expensive to run.
Reference
Considerations
stackoverflow - What is a “cache-friendly” code?
The idea is that most of the executing code will be hitting a small set of variables often, and the rest (a much larger set of variables) infrequently. If the processor can't find the data in L1 cache, then it looks in L2 cache. If not there, then L3 cache, and if not there, main memory. Each of these "misses" is expensive in time.
Reference
Related reading
After week 3's quiz3 on fbstring, and hankluo6's remark about the cache usage of list_sort and tree_sort, it occurred to me that the way the rbtree stores its nodes could be optimized. Going back carefully through the rbtree implementation in the Linux kernel, I found that the node memory has long since been optimized. Here is what I found:

In hankluo6's and jserv's c_map, the parent link is represented as `struct c_map_node *up` or `struct __node *up`.
In the Linux kernel, however, there is only:

The parent member is absent, and color is declared as unsigned long. According to the explanation in stackoverflow - Red black node's struct alignment in linux kernel, the kernel uses `__attribute__((aligned(sizeof(long))))` to store the color in the last two bits of the parent pointer.

The principle, quoting the comment by Don Hall on that question:
I think i get it: the pointer save a node's address, which is multiple of 4(100) because of alignment of 4, so the value of a pointer will always like 100, 1000, 1100, 10100100. and is always end with 00. thanks a lot! – Don Hall
So alignment makes memory accesses line up, which normally improves CPU read efficiency; the Linux kernel's red-black tree additionally exploits the fact that the last two bits of an rb_node's memory address are then always zero, and uses those two bits to record whether the node is black or red.
So I borrowed the Linux kernel's trick and changed how hankluo6's node_t (c_map) stores the color, and set up the following macros so that parent and color can be read separately:
Below is the space each node takes up:

As with fbstring in week 3's quiz3, this squeezes out nearly all the usable space.

Because the original c_map would conflict with the modified functions if they ran together, and reworking it would take a lot of time, for now I only do a rougher performance comparison. For the complete code, please see c_map_bit.c and type.h; thank you.
Reference
Further reading