2022q1 Homework1 (quiz1)

contributed by < SmallHanley >

linux2022-quiz1

測驗 `1`

解釋上述程式碼運作原理

首先是資料結構的部分：

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };
typedef struct { int bits; struct hlist_head *ht; } map_t;

struct hash_key {
    int key;
    void *data;
    struct hlist_node node;
};

struct hlist_node 這個結構體包含指標 next ，用來指向下一個節點，以及指標的指標 prev，指向前一個節點的 next 指標，可以在接到 hlist_head 時不用額外判斷，避免分支。

這裡不懂開發者選擇使用雙向鏈結串列的考量，查看 history 最早出現 hash_node 是在 v2.5.64，github commit history 沒有這麼舊的資料，ChangeLog-2.5.64 也沒有查到有用的資訊。

struct hlist_head 結構體則是 hash table 各個 bucket 的 head，其中 first 指標指向第一個 struct hlist_node 節點。

而 map_t 是 hash table，bits 代表 bucket 的多寡 (以多少位元表示)，因此 bucket 的數量為 2 的冪。ht 則是一個陣列，用來存放每個 bucket 的 head。

struct hash_key 結構體是用來包裝每個 key-value pair，key 會根據 hash function 找到對應的 bucket，該 bucket 的 struct hlist_head 會串接所有映射到這個 bucket 的 struct hlist_node，示意圖如下：

map_init()
根據傳進來的 bits 來決定 hash table 的大小，bits 代表 bucket 的多寡 (以多少位元表示)，因此 hash table 的 size 為 2 的冪。配置 map_t 以及 map->ht 的空間，並將所有 entry 的 first 指標指向 NULL。
hash()
即 hash function 的所在，這裡使用 magic number 0x61C88647，看後續分析 linux kernel 相關實作會不會探討，這裡先跳過。
find_key()
首先根據 key 的值，透過 hash function 映射到 hash table 的其中一個 bucket，並用 for 迴圈走訪每一個 struct hlist_node 結構體，如果該結構體的 key 是我們要找的，則回傳該結構體，若找不到則回傳 NULL。
map_get()
先呼叫 find_key() 取得目標結構體，並回傳其 data。
map_add()
插入新的 key-data pair 到 hash table，如果該 key 已經存在，則不做任何事情。
map_deinit()
釋放 hash table 相關的資源。

研讀 Linux 核心原始程式碼 include/linux/hashtable.h 及對應的文件 How does the kernel implements Hashtables?，解釋 hash table 的設計和實作手法，並留意到 tools/include/linux/hash.h 的 GOLDEN_RATIO_PRIME，探討其實作考量

光是 hash table 的 define 就有分三種：

#define DEFINE_HASHTABLE(name, bits)						\
	struct hlist_head name[1 << (bits)] =					\
			{ [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }

#define DEFINE_READ_MOSTLY_HASHTABLE(name, bits)				\
	struct hlist_head name[1 << (bits)] __read_mostly =			\
			{ [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }

#define DECLARE_HASHTABLE(name, bits)                                   	\
	struct hlist_head name[1 << (bits)]

中間多了 __read_mostly，追蹤相關程式碼，發現 __read_mostly 的定義是被分散在不同處理器架構的 cache.h (arch/xxx/include/asm/cache.h)，其中，有部分架構定義如下，包含 arm、arm64、ia64、mips、parisc、powerpc、s390、sh、sparc、x86：

#define __read_mostly __section(".data..read_mostly")

其中 __section 又被定義在 include/linux/compiler_attributes.h 中，在 GNU C and C++ 中，可以用來描述函式的特定性質 (參見 Declaring Attributes of Functions)。而這裡使用到的是 section 這個 attribute (參見 Function-Attributes-section)，概念上是將一段程式放置在指定的 special section，推測跟 cache 的相關優化有關。

hashtable.h

Linux hashtable.h 有 include linux/rculist.h，可以參考 linux Documentation rcu.rst、listRCU.rst。

重點整理：

RCU (read-copy update) 的概念是將 destructive operation 分成兩個部分，第一是將要 destroy 的資料隱藏不讓其他 reader 看到，第二是實際執行 destruction 的動作。而兩個部分中間有一段時間稱作 "grace period"，grace period 需要足夠長使得所有 reader drop the reference。
RCU 在鏈結串列的操作之一，流程上是將節點從鏈結串列中移除，等待 grace period，然後釋放對應空間。
使用情境：待補充
- Read-mostly list: Deferred Destruction
- Read-Side Action Taken Outside of Lock: No In-Place Updates
- Handling In-Place Updates
- Eliminating Stale Data
- Skipping Stale Objects

hash_init()

將 hash table 初始化。
從上面示意圖可以知道 hash table 是一個 struct hlist_head 組成的陣列，每個 struct hlist_head 是映射到同一個 bucket 的鏈結串列的 head。將每個結構體的成員 first 初始化成 NULL。

// linux/list.h
#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)

// linux/hashtable.h
static inline void __hash_init(struct hlist_head *ht, unsigned int sz)
{
	unsigned int i;

	for (i = 0; i < sz; i++)
		INIT_HLIST_HEAD(&ht[i]);
}

#define hash_init(hashtable) __hash_init(hashtable, HASH_SIZE(hashtable))

hash_add()

// linux/hashtable.h
#define hash_add(hashtable, node, key)						\
	hlist_add_head(node, &hashtable[hash_min(key, HASH_BITS(hashtable))])

// linux/list.h
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;
	WRITE_ONCE(n->next, first);
	if (first)
		WRITE_ONCE(first->pprev, &n->next);
	WRITE_ONCE(h->first, n);
	WRITE_ONCE(n->pprev, &h->first);
}

commit c54a274

Also add various WRITE_ONCE() in __hlist_del(), hlist_add_head()
and hlist_add_before()/hlist_add_behind() to pair with
the READ_ONCE().

hash.h

在 tools/include/linux/hash.h 可以發現一個巨集 GOLDEN_RATIO_PRIME，

/*
 * Knuth recommends primes in approximately golden ratio to the maximum
 * integer representable by a machine word for multiplicative hashing.
 * Chuck Lever verified the effectiveness of this technique:
 * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
 *
 * These primes are chosen to be bit-sparse, that is operations on
 * them can use shifts and additions instead of multiplications for
 * machines where multiplications are slow.
 */
#if BITS_PER_LONG == 32
/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define GOLDEN_RATIO_PRIME 0x9e370001UL
#elif BITS_PER_LONG == 64
/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define GOLDEN_RATIO_PRIME 0x9e37fffffffc0001UL
#else
#error Define GOLDEN_RATIO_PRIME for your wordsize.
#endif

在 commit 689de1d 後改成以下實作版本：

/*
 * This hash multiplies the input by a large odd number and takes the
 * high bits.  Since multiplication propagates changes to the most
 * significant end only, it is essential that the high bits of the
 * product be used for the hash value.
 *
 * Chuck Lever verified the effectiveness of this technique:
 * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
 *
 * Although a random odd number will do, it turns out that the golden
 * ratio phi = (sqrt(5)-1)/2, or its negative, has particularly nice
 * properties.  (See Knuth vol 3, section 6.4, exercise 9.)
 *
 * These are the negative, (1 - phi) = phi**2 = (3 - sqrt(5))/2,
 * which is very slightly easier to multiply by and makes no
 * difference to the hash distribution.
 */
#define GOLDEN_RATIO_32 0x61C88647
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

其中有討論到有關 GOLDEN_RATIO_PRIME 的實作考量，出自於 The Art of Computer Programming (Knuth vol 3, section 6.4, exercise 9) 這本書，以下取自這本書的片段。

Theorem S. Let

θ

be any irrational number. When the points {

θ

}, {

2 θ

}, …, {

n θ

} are placed in the line segment [0 … 1], then

n + 1

line segments formed have at most three different lengths. Moreover, the next point {

(n + 1) θ

} will fall in one of the largest existing segments.
(用 Three-gap theorem 推導。)

{

n θ

} 代表

n θ

的小數部分。

之所以要取小數部分，原理是這個分割問題可以 model 成 Three-gap theorem。line segment [0 … 1] 對應到 Three-gap theorem 中周長為 1 的圓，

n θ

的整數部分相當於繞了圓多少圈 (即在 line segment 循環多少次)，而小數部分相當於新的點坐落於圓上的哪一個位置 (即 line segment 中新分割點的位置)。

有關 Theorem S 的證明在 exercise 8

任意數乘上這兩個數再取其高位元，可呈現好的數值分布。實際的 hash function (舉 32-bit 為例) 是定義在 tools/include/linux/hash.h：













#ifndef HAVE_ARCH__HASH_32
#define __hash_32 __hash_32_generic
#endif
static inline u32 __hash_32_generic(u32 val)
{
	return val * GOLDEN_RATIO_32;
}

static inline u32 hash_32(u32 val, unsigned int bits)
{
	/* High bits are more random, so use them. */
	return __hash_32(val) >> (32 - bits);
}

上述的實作是將 0 ~ 1 的浮點數以 32-bit 無號整數表示。第 6 行將欲進行 hash 的數值 val 乘上 GOLDEN_RATIO_32 ，第 12 行是取最高的 bits 個位元。對應上述 Fibonacci Hashing 理論，val 對應到

n

，GOLDEN_RATIO_32 為

ϕ^{- 1}

以 32-bit 無號整數表示，兩者相乘，因為會產生溢位 (一次溢位相當於 Three-gap theorem 中繞了圓一圈)，相乘的結果為 {

n ϕ^{- 1}

} 以 32-bit 無號整數表示，取其高位元，則相當於取小數前幾位數。

實驗

實際對比新、舊版本的 GOLDEN_RATIO_PRIME 實作，新版的實作可以讓 hash function 結果的分佈更為均勻。以下分別測試 0 到 1000、10000 映射到 hash table 不同 locations 的次數，hash table 的大小為 1024 (10 bits)：

Reference

並行程式設計: Atomics 操作
 6.46 When is a Volatile Object Accessed?
Documentation/memory-barriers
Golden_ratio
Fibonacci_number#Relation_to_the_golden_ratio

2022q1 Homework1 (quiz1)

測驗 1

hashtable.h

hash_init()

hash_add()

hash.h

實驗

Reference

Read more

2022q1 Homework2 (quiz2)

Lab1: RV32I Simulator

BMQ 排程器研究

2022q1 Homework4 (quiz4)

測驗 `1`