list_sort
分析RinHizakura
laneser
Risheng
freshLiver
首行
// SPDX-License-Identifier: GPL-2.0
參考 GNU 通用公共許可證
Where exactly is the file linux/kernel.h?
The
linux/kernel.h
header which gets used for module builds is the header which is part of the kernel source. When modules are built in the kernel source tree, that’s the version which is used.
__attribute__((nonnull(2,3,4)))
其中2、3、4代表第二、三、四個引數不能為空
__attribute__
可以設置函數屬性(Function Attribute)、變量屬性(Variable Attribute)和類型屬性(Type Attribute)
其中函數屬性(Function Attribute)中講到
函數屬性可以幫助開發者把一些特性添加到函數聲明中,從而可以使編譯器在錯誤檢查方面的功能更強大。__attribute__機制也很容易同非GNU應用程序做到兼容之功效。
參考 9.40 attribute((nonnull)) function attribute
This function attribute specifies function parameters that are not supposed to be null pointers. This enables the compiler to generate a warning on encountering such a parameter.
__attribute__((nonnull[(arg-index, ...)]))
Where[(arg-index, ...)]
denotes an optional argument index list.
If no argument index list is specified, all pointer arguments are marked as nonnull.
list_sort.c
中一共有三個函式,分別為 merge
,merge_final
以及 list_sort
merge
__attribute__((nonnull(2,3,4)))
static struct list_head *merge(void *priv, list_cmp_func_t cmp,
struct list_head *a, struct list_head *b)
{
struct list_head *head, **tail = &head;
for (;;) {
/* if equal, take 'a' -- important for sort stability */
if (cmp(priv, a, b) <= 0) {
*tail = a;
tail = &a->next;
a = a->next;
if (!a) {
*tail = b;
break;
}
} else {
*tail = b;
tail = &b->next;
b = b->next;
if (!b) {
*tail = a;
break;
}
}
}
return head;
}
首先設定2,3,4的引數不能是 NULL
在 list_sort.h 中查找相關 typedef
參考 [C語言] function pointer的應用[三]: 使用 typdef 來定義函數指標以增加程式的可讀性
在 list_sort.h
中查找 list_cmp_func_t
相關定義
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_LIST_SORT_H
#define _LINUX_LIST_SORT_H
#include <linux/types.h>
struct list_head;
typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
const struct list_head *, const struct list_head *);
__attribute__((nonnull(2,3)))
void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
#endif
發現 cmp
是一個函式指標,回傳 int
型態,有三個參數,分別為 void*
以及兩個 const struct list_head*
,由 __attribute__((nonnull(2,3)))
定義第二個以及第三個參數不為null
根據程式註解
The comparison function @cmp must return > 0 if @a should sort after
@b ("@a > @b" if you want an ascending sort), and <= 0 if @a should sort before @b or their original order should be preserved. It is always called with the element that came first in the input in @a, and list_sort is a stable sort, so it is not necessary to distinguish the @a < @b and @a == @b cases.
cmp
return > 0 (升冪排列a排在b後面)cmp
return <= 0 (升冪排列a排在b前面) 或 原本的排序保留list_sort
是一個stable sort, 不必區分小於或等於0的分別merge_final
Combine final list merge with restoration of standard doubly-linked list structure. This approach duplicates code from merge(), but runs faster than the tidier alternatives of either a separate final prev-link restoration pass, or maintaining the prev links throughout.
此函式恢復雙向鍊結串列的鍊結結構,且由 merge()
而來,但是運算較在最終回複 prev
的鍊結或是持續維持 prev
的鍊結還要快
__attribute__((nonnull(2,3,4,5)))
static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
struct list_head *a, struct list_head *b)
{
struct list_head *tail = head;
u8 count = 0;
for (;;) {
/* if equal, take 'a' -- important for sort stability */
if (cmp(priv, a, b) <= 0) {
tail->next = a;
a->prev = tail;
tail = a;
a = a->next;
if (!a)
break;
} else {
tail->next = b;
b->prev = tail;
tail = b;
b = b->next;
if (!b) {
b = a;
break;
}
}
}
/* Finish linking remainder of list b on to tail */
tail->next = b;
do {
/*
* If the merge is highly unbalanced (e.g. the input is
* already sorted), this loop may run many iterations.
* Continue callbacks to the client even though no
* element comparison is needed, so the client's cmp()
* routine can invoke cond_resched() periodically.
*/
if (unlikely(!++count))
cmp(priv, b, b);
b->prev = tail;
tail = b;
b = b->next;
} while (b);
/* And the final links to make a circular doubly-linked list */
tail->next = head;
head->prev = tail;
}
在註解中提到如果是非常不平衡的 merge ,像是輸入已經排序好了,這個迴圈會經過非常多次的迭代儘管根本不需要,所以這裡提到 cmp()
可以週期性的調用 cond_resched()
參考laneser同學提供的連結[Linux Kernel慢慢學]likely and unlikely macro
在Linux kernel 2.6之後提供了likely, unlikely 這兩個macro,他們被定義在/include/linux/compiler.h中
# define likely(x) __builtin_expect(!!(x), 1) # define unlikely(x) __builtin_expect(!!(x), 0)
其中 __built_expect()
是gcc的內建function,用來將branch的相關資訊提供給compiler,告訴compiler設計者期望的比較結果,以便對程式碼改變分支順序來進行優化,也就是說我告訴它這個 if
發生的的機率是多少,它會幫我優化
參考Other Built-in Functions Provided by GCC
而 !!(x)
這樣的寫法是因為
!!(x)
就無法確保值一定是0或1likely
macro表示這段敘述(x)為 true
的機率比較大(比較常發生),告訴compiler將 x==true
的對應執行程式碼接在判斷後面unlikely
macro表示這段敘述(x)為 true
的機率比較小(不常發生),告訴compiler將 x==false
的對應執行程式碼接在判斷後面list_sort
此段感謝RinHizakura的筆記
This mergesort is as eager as possible while always performing at least 2:1 balanced merges. Given two pending sublists of size 2^k, they are merged to a size-2^(k+1) list as soon as we have 2^k following elements.
這邊提到要保持兩個被merge的list至少是2:1,這可以降低比較的次數。一般的fully-eager bottom-up mergesort只要出現兩個大小為
參考閱讀
[RESEND PATCH v2 5/5] lib/list_sort: Optimize number of calls to comparison function
Thus, it will avoid cache thrashing as long as
只要這2:1的list (
The merging is controlled by "count", the number of elements in the pending lists. This is beautifully simple code, but rather subtle.
Each time we increment "count", we set one bit (bit k) and clear bits k-1 .. 0. Each time this happens (except the very first time for each bit, when count increments to
This merge happens exactly when the count reaches an odd multiple of
每次 count 達到
After this happens twice, we have created two lists of size
在發生兩次 merge 之後 產生兩個
* The number of pending lists of size 2^k is determined by the
* state of bit k of "count" plus two extra pieces of information:
*
* - The state of bit k-1 (when k == 0, consider bit -1 always set), and
* - Whether the higher-order bits are zero or non-zero (i.e.
* is count >= 2^(k+1)).
*
* There are six states we distinguish. "x" represents some arbitrary
* bits, and "y" represents some arbitrary non-zero bits:
* 0: 00x: 0 pending of size 2^k; x pending of sizes < 2^k
* 1: 01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
* 2: x10x: 0 pending of size 2^k; 2^k + x pending of sizes < 2^k
* 3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
* 4: y00x: 1 pending of size 2^k; 2^k + x pending of sizes < 2^k
* 5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
* (merge and loop back to state 2)
*
* We gain lists of size 2^k in the 2->3 and 4->5 transitions (because
* bit k-1 is set while the more significant bits are non-zero) and
* merge them away in the 5->2 transition. Note in particular that just
* before the 5->2 transition, all lower-order bits are 11 (state 3),
* so there is one list of each smaller size.
*
* When we reach the end of the input, we merge all the pending
* lists, from smallest to largest. If you work through cases 2 to
* 5 above, you can see that the number of elements we merge with a list
* of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to
* 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).
結構體
struct list_head *list = head->next, *pending = NULL;
size_t count = 0; /* Count of pending */
if (list == head->prev) /* Zero or one elements */
return;
/* Convert to a null-terminated singly-linked list. */
head->prev->next = NULL;
do {
size_t bits;
struct list_head **tail = &pending;
/* Find the least-significant clear bit in count */
for (bits = count; bits & 1; bits >>= 1)
tail = &(*tail)->prev;
/* Do the indicated merge */
if (likely(bits)) {
struct list_head *a = *tail, *b = a->prev;
a = merge(priv, cmp, b, a);
/* Install the merged result in place of the inputs */
a->prev = b->prev;
*tail = a;
}
/* Move one element from input list to pending */
list->prev = pending;
pending = list;
list = list->next;
pending->next = NULL;
count++;
} while (list);
查找 likely
定義
compiler.h
// first
# ifndef likely
# define likely(x) (__branch_check__(x, 1, __builtin_constant_p(x)))
# endif
# ifndef unlikely
# define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
# endif
// second
# define likely(x) __builtin_expect(!!(x), 1)
# define unlikely(x) __builtin_expect(!!(x), 0)
由 compiler.h
中的第 15、16 行決定該要定義哪一種
#if defined(CONFIG_TRACE_BRANCH_PROFILING) \
&& !defined(DISABLE_BRANCH_PROFILING) && !defined(__CHECKER__)
第二種定義可以參考本篇前面的章節,現在來分析第一種定義
兩個巨集函式 __branch_check__()
以及 __builtin_const_p()
在第 23 行提到定義
#define __branch_check__(x, expect, is_constant) ({ \
long ______r; \
static struct ftrace_likely_data \
__aligned(4) \
__section("_ftrace_annotated_branch") \
______f = { \
.data.func = __func__, \
.data.file = __FILE__, \
.data.line = __LINE__, \
}; \
______r = __builtin_expect(!!(x), expect); \
ftrace_likely_update(&______f, ______r, \
expect, is_constant); \
______r; \
})
在這之中又隱含兩個巨集,可以在 compiler_attributes.h 找到定義
#define __aligned(x) __attribute__((__aligned__(x)))
#define __section(section) __attribute__((__section__(section)))
接著參考 C 之 attribute 用法
大致有六個參數值可以被設定,即:aligned, packed, transparent_union, unused, deprecated 和 may_alias 。
在使用__attribute__
參數時,你也可以在參數的前後都加上__
(兩個下劃線),例如,使用__aligned__
而不是aligned
,這樣,你就可以在相應的頭文件裏使用它而不用關心頭文件裏是否有重名的宏定義。
該屬性設定一個指定大小的對齊格式(以字節爲單位)
要了解Linux Kernel代碼的分段信息,需要了解一下gcc的
__attribute__
的編繹屬性,__attribute__
主要用於改變所聲明或定義的函數或 數據的特性,它有很多子項,用於改變作用對象的特性。比如對函數,noline將禁止進行內聯擴展、noreturn表示沒有返回值、pure表明函數除 返回值外,不會通過其它(如全局變量、指針)對函數外部產生任何影響。但這裏我們比較感興趣的是對代碼段起作用子項section。
Syntax
__attribute__((section("section_name")))
其作用是將作用的函數或數據放入指定名爲 "section_name" 輸入段。
int var __attribute__((section(".xdata"))) = 0;
這樣定義的變量 var
將被放入名爲 .xdata
的輸入段,(注意: __attribute__
這種用法中的括號好像很嚴格,這裏的幾個括號好象一個也不能少。)
輸入段和輸出段是相對於要生成最終的elf或binary時的Link過程說的,Link過程的輸入大都是由源代碼編繹生成的目標文件.o,那麼這些.o 文件中包含的段相對link過程來說就是輸入段
line 4
to line 9
__func__
6.50 Function Names as Strings
static const char __func__[] = "function-name";
__FILE__
原始檔名
__LINE__
所在行數
3.7.1 Standard Predefined Macros
line 11
__builtin_expect
可以查找本篇前段說明
line 12
ftrace_likely_update(&______f, ______r, \
expect, is_constant);
定義於 linux/kernel/trace/trace_branch.c Line 205
本節程式碼行數提醒 line xx
依照原始碼行數定義
首先研讀註解
/*
* Data structure invariants:
* - All lists are singly linked and null-terminated; prev
* pointers are not maintained.
* - pending is a prev-linked "list of lists" of sorted
* sublists awaiting further merging.
* - Each of the sorted sublists is power-of-two in size.
* - Sublists are sorted by size and age, smallest & newest at front.
* - There are zero to two sublists of each size.
* - A pair of pending sublists are merged as soon as the number
* of following pending elements equals their size (i.e.
* each time count reaches an odd multiple of that size).
* That ensures each later final merge will be at worst 2:1.
* - Each round consists of:
* - Merging the two sublists selected by the highest bit
* which flips when count is incremented, and
* - Adding an element from the input as a size-1 sublist.
*/
NULL
且 prev
pointer 沒有維持pending
這個 list 是 prev-linked
的排序過後等待 merging 的子 lists list of lists
斷開環狀鍊結
struct list_head *list = head->next, *pending = NULL;
size_t count = 0; /* Count of pending */
if (list == head->prev) /* Zero or one elements */
return;
/* Convert to a null-terminated singly-linked list. */
head->prev->next = NULL;
掃過整個 lists
for
迴圈用以找出最 tail
中最左不為 0 的 bitsif
迴圈在 bits
MSB 為 0 ( MSB 不為 0 時必進 for
迴圈) 時且在其他位置有非 0 值時進入,代表有合併要進行do-while
迴圈之後其實最後的兩個 element 還是 unsorted 的do {
size_t bits;
struct list_head **tail = &pending;
/* Find the least-significant clear bit in count */
for (bits = count; bits & 1; bits >>= 1)
tail = &(*tail)->prev;
/* Do the indicated merge */
if (likely(bits)) {
struct list_head *a = *tail, *b = a->prev;
a = merge(priv, cmp, b, a);
/* Install the merged result in place of the inputs */
a->prev = b->prev;
*tail = a;
}
/* Move one element from input list to pending */
list->prev = pending;
pending = list;
list = list->next;
pending->next = NULL;
count++;
} while (list);
最後尾端的 element 進行 merge (過程中會 sort)
/* End of input; merge together all the pending lists. */
list = pending;
pending = pending->prev;
for (;;) {
struct list_head *next = pending->prev;
if (!next)
break;
list = merge(priv, cmp, pending, list);
pending = next;
}
最後把兩個 sublists 合併起來
/* The final merge, rebuilding prev links */
merge_final(priv, cmp, head, pending, list);
最後一行
EXPORT_SYMBOL(list_sort);
EXPORT_SYMBOL標籤內定義的函數或者符號對全部內核代碼公開,不用修改內核代碼就可以在您的內核模塊中直接調用,即使用EXPORT_SYMBOL可以將一個函數以符號的方式導出給其他模塊使用。
以及 Linux驅動開發——EXPORT_SYMBOL的使用
list_sort.c
假設輸入的 linked list 為(已將環狀結構斷開)
根據程式註解 prev pointers are not maintained.
為了讓圖解結構清細,接下來除了重要的 prev
指標之外不再繪製 prev
的指向,且 next
指向下一個節點在圖上會指向 prev
,實則指向該 node 的位置
上述輸入排序完應為
開始分析
在 line 187
中定義
struct list_head *list = head->next, *pending = NULL;
在 line 216
中定義,可知在 do-while
迴圈中,每進入一次新的 do
,tail
會追著 pending
struct list_head **tail = &pending;
可以得到
進入 do-while
迴圈
bits
會追著 count
定義
第一次迭代時
count = 0
不會進入 for
迴圈 也不會進入 if(likely(bits))
迴圈
接著進行 list
和 pending
的推移
/* Move one element from input list to pending */
list->prev = pending;
pending = list;
list = list->next;
pending->next = NULL;
count++;
到下一 do
的 for
迴圈之前
可以得到
count = 1
接著下來的每一次迭代作圖基本上都以 do
以後 for
以前 作為作圖的分界,若有不一樣的操作則詳細繪製
count = 1
進入 for
迴圈,執行
for (bits = count; bits & 1; bits >>= 1)
tail = &(*tail)->prev;
沒有進入 if(likely(bits))
,執行接下來的程式碼
到下一個 do
之後 for
之前,可得
count=2
count=2
不會進入 for
迴圈,進入 if(likely(bits))
迴圈
由line 232
定義
struct list_head *a = *tail, *b = a->prev;
進入 merge
a = merge(priv, cmp, b, a);
/* Install the merged result in place of the inputs */
a->prev = b->prev;
*tail = a;
進行 list
和 pending
的推移
count=3
count=3
進入 for
迴圈 (迭代兩圈)
進行 list
和 pending
的推移
count=4,且 list= NULL
跳出 do-while
迴圈
此時發現原本的 linked list 變成由 prev
連接的 linked list 且 node4
以及 node5
還沒有進行排序
執行最後節點的排序
/* End of input; merge together all the pending lists. */
list = pending;
pending = pending->prev;
進入 for(;;)
迴圈
for (;;) {
struct list_head *next = pending->prev;
if (!next)
break;
list = merge(priv, cmp, pending, list);
pending = next;
}
由上圖,經過 for(;;)
變為下圖
由line 247
list = merge(priv, cmp, pending, list);
將 list
移至 pending
的位置,且對 node4
及 node5
排序
由line 248
pending = next;
將 pending
移至 node3
得位置
且在執行下一次迭代時就會經過
struct list_head *next = pending->prev;
進而 break
迴圈
此時可以發現有兩個 sublists
list
指向 node4
指向 node5
pending
指向 node3
指向 node2
最後執行
merge_final(priv, cmp, head, pending, list);
得到
完成排序!
拜讀 kdnvt 的筆記之後發現了自己不足的部份,並嘗試理解同學們提到的部份
文中提到目前的 merge sort 有