2022q1 Homework1 (quiz1)

contributed by < tinyynoob >

作業要求

測驗 1

本題主要是在闡述 Linux kernel 當中 hash table 的實作，觀察

struct hlist_head {
    struct hlist_node *first;
};

struct hlist_node {
    struct hlist_node *next, **pprev;
};

可知 table 結構大致如下

每格都連接著一條 linked list (或 NULL )，以 heads[2] 為例

每個 hlist_node 上的 entry 結構定義為：

struct hash_key {
    int key;
    void *data;
    struct hlist_node node;
};

其中不同節點的 key 應不同，即 key 的唯一性需要被保證，否則後續在 find_key() 可能會出問題。

typedef struct { int bits; struct hlist_head *ht; } map_t;

#define MAP_HASH_SIZE(bits) (1 << bits)

table size (圖中的數字 4 ) 是由 MAP_HASH_SIZE() macro 來取得，其值為輸入乘以 2 ，並會紀錄於 map_t 的 bits 成員。

為何 bits 不使用 size_t 型別？
此外，為何會選用 bits 做為變數名稱？是否有什麼特殊意涵？

取名 bits 主要是搭配 capacity 的計算，無論是 size_t 或 ssize_t 都不影響理解。

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

map_init()

map_t *map_init(int bits) {
    map_t *map = malloc(sizeof(map_t));
    if (!map)
        return NULL;

    map->bits = bits;
    map->ht = malloc(sizeof(struct hlist_head) * MAP_HASH_SIZE(map->bits));
    if (map->ht) {
        for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++)
            (map->ht)[i].first = NULL;
    } else {
        free(map);
        map = NULL;
    }
    return map;
}

該函式的功能為動態配置出 table 結構，並依序將每個 hlist_head 的 first 成員都初始化為 NULL 。

find_key()









static struct hash_key *find_key(map_t *map, int key) {
    struct hlist_head *head = &(map->ht)[hash(key, map->bits)];
    for (struct hlist_node *p = head->first; p; p = p->next) {
        struct hash_key *kn = container_of(p, struct hash_key, node);
        if (kn->key == key)
            return kn;
    }
    return NULL;
}

函式的第 2 行使用 hash function 快速前往資料所在的 hlist ，接著 3 到 7 行走訪該 hlist ，一個一個比對是否為我們要尋找的節點，若找不到則回傳 NULL 。

以上功能正確運作的一個必要條件是 key 的唯一性，否則可能會找到 key 值相同的另一個節點，使得後續取出的 data 不如預期。

這裡頭所使用的 hash function 定義為

#define GOLDEN_RATIO_32 0x61C88647
static inline unsigned int hash(unsigned int val, unsigned int bits) {
    /* High bits are more random, so use them. */
    return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

目前不太清楚為何選用 golden ratio 為乘數。

這裡使用了 Fibonacci hashing ，根據 wikipedia ，這樣的 hash 方法可以使結果均勻分配在 table space 中。

或許可以用程式測看看它的分佈

至於前面的 macro define 怎麼來呢？首先， golden ratio 在數學上為

φ = (\sqrt{5} + 1) / 2 \approx 1.618033989

，我們可以計算出其倒數

φ^{- 1} = (\sqrt{5} - 1) / 2 \approx 0.618033989

恰好是原本的小數部份。

linux/hash.h 的註解中引註了一篇文章 Linux Kernel Hash Table Behavior: Analysis and Improvements ，在其 5.1 節給出

$(\sqrt{5} - 1) / 2 = 0.618033989$

$2654435761 / 4294967296 = 0.618033987$

這個 4294967296 顯然是 32 位無號數的最大值加 1 ，也就是

2^{32}

。

這代表我們將輸入的 val 乘上

φ^{- 1}

幾乎等同於乘上 2654435761 並右移 32 ，然後我們需要左移 bits 位以充分利用 hashtable 的 size 。

然而 hash.h 裡定義的 golden_ratio_32 是 0x61C88647 也就是 1640531527 ，並不是文章中的 2654435761 ，由於 0x61C88647 的二進制最高位是 0 ，所以我以為這個差異不是來自有無號轉換。
查詢了網路上的討論，得到答案說這其實還是跟有號數、無號數有關，我整理如下

signed	unsigned	binary	hex
1640531527	1640531527	01100001110010001000011001000111	0x61C88647
-1640531527	2654435769	10011110001101110111100110111001	0x9E3779B9

所以兩種相乘的答案雖然會以截然不同的面貌存在記憶體中，但就二補數的電腦數值系統而言，它們所表達的數「值」是相同，只差了 sign ，也因此並不影響 hash 的 distribution 。

另外雖然 2654435769 並不是文章中給出的 2654435761 ，但是它更接近於我們期望的 0.618033989 ：

(gdb) p (double)2654435769/4294967296
$2 = 0.6180339886341244
(gdb) p (double)2654435761/4294967296
$3 = 0.61803398677147925

map_get()

void *map_get(map_t *map, int key)
{
    struct hash_key *kn = find_key(map, key);
    return kn ? kn->data : NULL;
}

該函式去 map 中尋找是否有符合 key 值的 entry ，若有便將該 entry 的 data 讀出來。

map_add()

















void map_add(map_t *map, int key, void *data)
{
    struct hash_key *kn = find_key(map, key);
    if (kn)
        return;

    kn = malloc(sizeof(struct hash_key));
    kn->key = key, kn->data = data;

    struct hlist_head *h = &map->ht[hash(key, map->bits)];
    struct hlist_node *n = &kn->node, *first = h->first;
    AAA;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    BBB;
}

首先 3 至 7 行直接到 map 當中查找，若已存在該 key 則不增添新節點，反之進入新增節點的程序。

以下舉例說明新增節點的過程：

Suppose

hash (key) = 1

假設 [1] 原為

現在我們希望將 kn 所指到的節點放進這條 list ，最便利的方法是直接塞在最前方

圖中黑色的連線已由程式碼第 14 及 15 行完成，因此另外兩條藍色的連線即為我們在 AAA 和 BBB 需要達成的功能。

map_deinit()




























void map_deinit(map_t *map)
{
    if (!map)
        return;

    for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) {
        struct hlist_head *head = &map->ht[i];
        for (struct hlist_node *p = head->first; p;) {
            struct hash_key *kn = container_of(p, struct hash_key, node);
            struct hlist_node *n = p;
            p = p->next;

            if (!n->pprev) /* unhashed */
                goto bail;

            struct hlist_node *next = n->next, **pprev = n->pprev;
            *pprev = next;
            if (next)
                next->pprev = pprev;
            n->next = NULL, n->pprev = NULL;

        bail:
            free(kn->data);
            free(kn);
        }
    }
    free(map);
}

map_deinit() 的功能為刪除掉整個 hash table ，其運作流程大致如下圖

接著圖解內層迴圈的行為，假設程式在某個時候執行到第 8 行

在程式執行到第 21 行時會變成

百思不解什麼情況下會出現第 13 行的 condition ，或許與多執行緒有關？

看 Linux 核心原始程式碼 list.h 對於 hlist_unhashed 的註解:
/**
 * hlist_unhashed - Has node been removed from list and reinitialized?
 * @h: Node to be checked
 *
 * Not that not all removal functions will leave a node in unhashed
 * state.  For example, hlist_nulls_del_init_rcu() does leave the
 * node in unhashed state, but hlist_nulls_del() does not.
 */
然後再翻閱核心文件關於 RCU 對於 reader 的行為

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

twoSum()

























int *twoSum(int *nums, int numsSize, int target, int *returnSize)
{
    map_t *map = map_init(10);
    *returnSize = 0;
    int *ret = malloc(sizeof(int) * 2);
    if (!ret)
        goto bail;

    for (int i = 0; i < numsSize; i++) {
        int *p = map_get(map, target - nums[i]);
        if (p) { /* found */
            ret[0] = i, ret[1] = *p;
            *returnSize = 2;
            break;
        }

        p = malloc(sizeof(int));
        *p = i;
        map_add(map, nums[i], p);
    }

bail:
    map_deinit(map);
    return ret;
}

程式運作邏輯可大致理解如下：

成功配對出 target 值則達成目標，否則將現值加入 hash table 再繼續往下尋找。

此處使用 hashtable 帶來的主要好處是加快搜尋舊資料的速度，避免一一確認帶來的

O (n)

低效。

探討 kernel 中的 hashtable 實作

本題範例與 Linux kernel 中所使用的 hash 結構相同，最大的區別在於 kernel 中有為了多使用者並行讀寫的情境設計了另一組 API ，同時在此種情境下的 hashtable 被稱為 rcu enabled hashtable 。

RCU 的主要概念是將更新動作拆分成移除與再生，原因在於在並行的情境下，若更新在 reader 使用資料時發生，則可能會導致 reader 讀到更新一半的資料，發生非預期的後果； RCU 機制使用 read-side critical section ， reader 只可能讀到完全舊的資料抑或是完全新的資料，如此便不致發生讀取錯誤。

What is RCU? – “Read, Copy, Update”

測驗 2















struct ListNode *deleteDuplicates(struct ListNode *head)
{
    if (!head)
        return NULL;

    if (COND1) {
        /* Remove all duplicate numbers */
        while (COND2)
            head = head->next;
        return deleteDuplicates(head->next);
    }

    head->next = deleteDuplicates(head->next);
    return head;
}

在遞迴的實作中，程式先確認 head 與 head->next 是否 duplicate ，若有則進入 while 迴圈刪除這組 duplicate ，此外就繼續遞迴下去。

例如：

經過 while 後會變成

由於第 10 行的遞迴參數是 head->next ，因此它會對 b 呼叫遞迴，如此一來便移除了所有重複的 a 。

COND1 及 COND2 都是 duplicate 的判斷式

    head->next && head->val == head->next->val

撰寫非遞迴版本

struct ListNode* deleteDuplicates(struct ListNode* head){
    if (!head)
        return NULL;
    
    for (struct ListNode **it = &head; *it;) {
        if((*it)->next && (*it)->val == (*it)->next->val) {
            while((*it)->next && (*it)->val == (*it)->next->val)
                *it = (*it)->next;
            *it = (*it)->next;
        } else {
            it = &(*it)->next;
        }
    }
    return head;
}

測驗 3

LRUCache

這支程式主要是用 linked list 來實現 cache 的運作，並使用 mod 運算來映射對應的 cache 位置。

cache 的本體結構宣告為

typedef struct {
    int capacity, count;
    struct list_head dhead, hheads[];
} LRUCache;

cache 的 creation 如下










LRUCache *lRUCacheCreate(int capacity)
{   
    LRUCache *obj = malloc(sizeof(*obj) + capacity * sizeof(struct list_head));
    obj->count = 0;
    obj->capacity = capacity;
    INIT_LIST_HEAD(&obj->dhead);
    for (int i = 0; i < capacity; i++)
        INIT_LIST_HEAD(&obj->hheads[i]);
    return obj;
}

大致上就是配置空間和初始化，唯一需要留意的是第 3 行的語法可以依據 capacity 的值動態配置 hheads[] 的大小。

每個載有資料的節點有結構：

typedef struct {
    int key, value;
    struct list_head hlink, dlink;
} LRUNode;

比較特別的是節點有 hlink 與 dlink 兩個用於 linked list 的成員，稍後會解釋為何如此。

CacheGet()












int lRUCacheGet(LRUCache *obj, int key)
{
    LRUNode *lru;
    int hash = key % obj->capacity;
    MMM2 (lru, &obj->hheads[hash], hlink) {
        if (lru->key == key) {
            list_move(&lru->dlink, &obj->dhead);
            return lru->value;
        }
    }
    return -1;
}

第 4 行由輸入 key 值對應 cache 的位置。

第 5 行開始明顯是個用來尋找 node 的迴圈，觀察參數的數量與型別，可以推測 MM2 應為 list_for_each_entry() 。

程式在 obj->hheads[hash] list 中尋找，找到之後的行為是回傳 value 並將 dlink 移到 obj->dhead 前方，如果找不到則回傳 -1 。我們可以發現 obj->hheads[hash] 這條 list 都沒有被改變。

得知 obj->hheads[] 結構是 array of struct list_head ，每一個 element 拖著一條 list ，其中的 node 藉由 hlink 成員與對應的 list 互動。所有被放到 cache 的資料都會以節點形式存在 obj->hheads[] 的某處，且可以用近乎

O (1)

的速度快速被找到。

For some variable

hash

這裡用的都是 list.h 裡一般的 list ，所以 prev 全部是 direct pointer 。

除了 hheads[] 以外，我們的 cache 額外 maintain 了一條 list dhead 。由於 obj->dhead 這條 list 在每次節點被 access 時都會更動，我們可以猜想它就是用來指示 recently used 的標的。因為資料每次被 access 就會被移到最前方，所以節點位置越靠後表示越久沒被 access 。而每個節點是用 dlink 來與 obj->dhead 串列互動， hlink 與 dlink 各自獨立、互不干擾。這就像是一筆資料有兩個分身，一個在 hheads[] 中，另一個在 dhead 中。

Example for obj->capacity=3 :

此外值得注意的是第 7 行，藉由 lru->dlink 的方式快速找到目標節點在 dhead 中的分身，我們就不必再於 dhead 走訪一次。

CachePut()

以上圖為例，當又有一筆資料要放進 cache 的時候，節點 a 的兩個分身都會被踢出，因為它在 dhead 這條串列的尾巴，如此功能便是在 CachePut() 實現。

























void lRUCachePut(LRUCache *obj, int key, int value)
{
    LRUNode *lru;
    int hash = key % obj->capacity;
    MMM3 (lru, &obj->hheads[hash], hlink) {
        if (lru->key == key) {
            list_move(&lru->dlink, &obj->dhead);
            lru->value = value;
            return;
        }
    }

    if (obj->count == obj->capacity) {
        lru = MMM4(&obj->dhead, LRUNode, dlink);
        list_del(&lru->dlink);
        list_del(&lru->hlink);
    } else {
        lru = malloc(sizeof(LRUNode));
        obj->count++;
    }
    lru->key = key;
    list_add(&lru->dlink, &obj->dhead);
    list_add(&lru->hlink, &obj->hheads[hash]);
    lru->value = value;
}

此函式概略運作邏輯如圖：

第 5 行依據 key 尋找節點是否已存在，因此 MM3 顯然也是 list_for_each_entry() 。

根據流程圖可以分成 3 個分支討論

(yes)
(no, yes)
(no, no)

(yes)

如果有找到的話就更新 dlink (recently used) ，並修改節點的 value 值。

(no, yes)

使用 obj->count 來計數是否已達 cache 容量上限。

在空間已滿的情況下，我們的 policy 是移除 least recently used 的節點，也就是 dhead 的尾節點。

為了避免 free() 之後又馬上 malloc() ，我們可以回收利用舊節點空間。首先我們得要找到舊節點，取得尾節點可以使用 API list_last_entry() ，於是我們解出了 MM4 。

接著得將節點從對應的 hhead[] 和 dhead 中移出，填入新資料之後依據新的 key 重新串入 hheads[hash] 並移到 dhead 最前方。

(no, no)

這個分支也相對簡單，就是為資料建構新節點，並更新 hheads[hash], dhead 及 count 。

CacheFree()

void lRUCacheFree(LRUCache *obj)
{       
    LRUNode *lru, *n;
    MMM1 (lru, n, &obj->dhead, dlink) {
        list_del(&lru->dlink);
        free(lru);
    }
    free(obj); 
}

將 obj 清空的函式， MM1 為 list_for_each_entry_safe() ，此 API 有註解

iterate over list entries and allow deletes

注意到因為所有進到 cache 節點必定有在 dhead 的分身，所以我們不需要去走訪 hheads[] 陣列，只要清空 dhead 串列就可以釋放完所有的 LRUNode 。

且因 hheads[] 是隨著 obj 動態配置而來，所以我們不應該執行

    free(hheads);

只要呼叫

    free(obj);

便能一勞永逸。

缺點探討

在 hlink 所屬的串列上，我們會進行的操作就是

走訪
新增
走訪後移除

並沒有特別需要 access 尾節點的需求，所以我認為我們沒有必要使用環狀串列，取而代之可以使用 kernel 中 hlist 風格 (如測驗 1 ) 的串列，它更便利於從 list 中移除節點。

而 dlink 所屬的串列則因頭尾操作的需要，維持 kernel list 風格。

LRU in Linux kernel

對應本題，我找到了 linux/lru_cache.h 。

其中定義了兩個資料結構

struct lc_element {
	struct hlist_node colision;
	struct list_head list;		 /* LRU list or free list */
	unsigned refcnt;
	/* back "pointer" into lc_cache->element[index],
	 * for paranoia, and for "lc_element_to_index" */
	unsigned lc_index;
	/* if we want to track a larger set of objects,
	 * it needs to become an architecture independent u64 */
	unsigned lc_number;
	/* special label when on free list */
#define LC_FREE (~0U)

	/* for pending changes */
	unsigned lc_new_number;
};

這裡的 colision 成員相當於我們的 hlink ，使用了 hlist 的結構，與我的想法相符； list 成員相當於 dlink 。

lc 應是 lru_cache 的縮寫。

不確定 element 是怎麼存放資料的，是用內嵌結構或是用 unsigned 儲存指標訊息

struct lru_cache {
	/* the least recently used item is kept at lru->prev */
	struct list_head lru;
	struct list_head free;
	struct list_head in_use;
	struct list_head to_be_changed;

	/* the pre-created kmem cache to allocate the objects from */
	struct kmem_cache *lc_cache;

	/* size of tracked objects, used to memset(,0,) them in lc_reset */
	size_t element_size;
	/* offset of struct lc_element member in the tracked object */
	size_t element_off;

	/* number of elements (indices) */
	unsigned int nr_elements;
	/* Arbitrary limit on maximum tracked objects. Practical limit is much
	 * lower due to allocation failures, probably. For typical use cases,
	 * nr_elements should be a few thousand at most.
	 * This also limits the maximum value of lc_element.lc_index, allowing the
	 * 8 high bits of .lc_index to be overloaded with flags in the future. */
#define LC_MAX_ACTIVE	(1<<24)

	/* allow to accumulate a few (index:label) changes,
	 * but no more than max_pending_changes */
	unsigned int max_pending_changes;
	/* number of elements currently on to_be_changed list */
	unsigned int pending_changes;

	/* statistics */
	unsigned used; /* number of elements currently on in_use list */
	unsigned long hits, misses, starving, locked, changed;

	/* see below: flag-bits for lru_cache */
	unsigned long flags;


	void  *lc_private;
	const char *name;

	/* nr_elements there */
	struct hlist_head *lc_slot;
	struct lc_element **lc_element;
};

在 lru_cache.h 中，節點通常不會真的被刪除，而是不斷被移到 lru_cache 裡的各個不同串列，見這段註解

 * .list is on one of three lists:
 *  in_use: currently in use (refcnt > 0, lc_number != LC_FREE)
 *     lru: unused but ready to be reused or recycled
 *          (lc_refcnt == 0, lc_number != LC_FREE),
 *    free: unused but ready to be recycled
 *          (lc_refcnt == 0, lc_number == LC_FREE),
 *
 * an element is said to be "in the active set",
 * if either on "in_use" or "lru", i.e. lc_number != LC_FREE.

藉此可以推測 refcnt 的功能是紀錄該 element 的 user 數量，原意可能是 reference counter 。若沒有 user 使用則會被移到 lru 或 free 。

為了了解這些函式，得再前往 lru_cache.c

lc_get 的實作函式

static struct lc_element *__lc_get(struct lru_cache *lc, unsigned int enr, unsigned int flags)
{
	struct lc_element *e;

	PARANOIA_ENTRY();
	if (lc->flags & LC_STARVING) {
		++lc->starving;
		RETURN(NULL);
	}

	e = __lc_find(lc, enr, 1);
	/* if lc_new_number != lc_number,
	 * this enr is currently being pulled in already,
	 * and will be available once the pending transaction
	 * has been committed. */
	if (e) {
		if (e->lc_new_number != e->lc_number) {
			/* It has been found above, but on the "to_be_changed"
			 * list, not yet committed.  Don't pull it in twice,
			 * wait for the transaction, then try again...
			 */
			if (!(flags & LC_GET_MAY_USE_UNCOMMITTED))
				RETURN(NULL);
			/* ... unless the caller is aware of the implications,
			 * probably preparing a cumulative transaction. */
			++e->refcnt;
			++lc->hits;
			RETURN(e);
		}
		/* else: lc_new_number == lc_number; a real hit. */
		++lc->hits;
		if (e->refcnt++ == 0)
			lc->used++;
		list_move(&e->list, &lc->in_use); /* Not evictable... */
		RETURN(e);
	}
	/* e == NULL */

	++lc->misses;
	if (!(flags & LC_GET_MAY_CHANGE))
		RETURN(NULL);

	/* To avoid races with lc_try_lock(), first, mark us dirty
	 * (using test_and_set_bit, as it implies memory barriers), ... */
	test_and_set_bit(__LC_DIRTY, &lc->flags);

	/* ... only then check if it is locked anyways. If lc_unlock clears
	 * the dirty bit again, that's not a problem, we will come here again.
	 */
	if (test_bit(__LC_LOCKED, &lc->flags)) {
		++lc->locked;
		RETURN(NULL);
	}

	/* In case there is nothing available and we can not kick out
	 * the LRU element, we have to wait ...
	 */
	if (!lc_unused_element_available(lc)) {
		__set_bit(__LC_STARVING, &lc->flags);
		RETURN(NULL);
	}

	/* It was not present in the active set.  We are going to recycle an
	 * unused (or even "free") element, but we won't accumulate more than
	 * max_pending_changes changes.  */
	if (lc->pending_changes >= lc->max_pending_changes)
		RETURN(NULL);

	e = lc_prepare_for_change(lc, enr);
	BUG_ON(!e);

	clear_bit(__LC_STARVING, &lc->flags);
	BUG_ON(++e->refcnt != 1);
	lc->used++;
	lc->pending_changes++;

	RETURN(e);
}