Try   HackMD

2021q1 Homework6 (quiz6)

contributed by < Julian-Chu >

tags: linux2021

GitHub

Part 1: big number

data structure

/* how large the underlying array size should be */
#define UNIT_SIZE 4

/* These are dedicated to UNIT_SIZE == 4 */
#define UTYPE uint32_t
#define UTYPE_TMP uint64_t
#define FORMAT_STRING "%.08x"
#define MAX_VAL ((UTYPE_TMP) 0xFFFFFFFF)

#define BN_ARRAY_SIZE (128 / UNIT_SIZE) /* size of big-numbers in bytes */

/* bn consists of array of TYPE */
struct bn { UTYPE array[BN_ARRAY_SIZE]; };

單個陣列元素可以紀錄的最大值爲 0xFFFFFFFF(unsigned 32-bit int), overflow 會進位到下一個陣列元素, 注意 index 0 為最小位數

以陣列長度爲三的 big number 進位範例:

[0FFFFFFFF  0x00000000 0x00000000 ]+1 = 
[0X00000000  0x00000001  0x00000000]

[0FFFFFFFF  0xFFFFFFFF 0x00000001 ]+1 = 
[0X00000000  0x00000000  0x00000002]

init

// 一般陣列的初始化
static inline void bn_init(struct bn *n) {
    for (int i = 0; i < BN_ARRAY_SIZE; ++i)
        n->array[i] = 0;
}

// 從 uint64 直接初始化
static inline void bn_from_int(struct bn *n, UTYPE_TMP i) {
    bn_init(n);

    /* FIXME: what if machine is not little-endian? */
    n->array[0] = i;
    /* bit-shift with U64 operands to force 64-bit results */
    UTYPE_TMP tmp = i >> 32;
    n->array[1] = tmp;
}

值得注意的是 bn_from_int 的部分, 直接傳入 uint64 初始化, 因為 uint64 的位元數爲 uint32 的兩倍, 所以單個 bn 的陣列元素不能完整紀錄資料, 需要利用位元計算把資料分別記錄到兩個陣列元素中, 流程如下

(讀取到 CPU 之後, 非記憶體空間)
       i = 0x12345678 87654321
array[0] =          0x87654321 
   i>>32 = 0x00000000 12345678
array[1] =          0x12345678
array = [0x87654321, 0x12345678, 0, ......]

針對 little or big endian 更改為下面程式碼加入判斷

static inline void bn_from_int(struct bn *n, UTYPE_TMP i) {
    bn_init(n);

    /* FIXME: what if machine is not little-endian? */
    uint32_t x = 1;
    if(*(char*)(&x) == 1){
        n->array[0] = i;
        /* bit-shift with U64 operands to force 64-bit results */
        UTYPE_TMP tmp = i >> 32;
        n->array[1] = tmp;
    }else{
        n->array[1] = i;
        /* bit-shift with U64 operands to force 64-bit results */
        UTYPE_TMP tmp = i >> 32;
        n->array[0] = tmp;
    }
}

原理: 由較長資料型別轉型為短資料型別會保留記憶體位置較低的位元
示意如下:

uint32 x = 10
(記憶體位置      low --> high     )
little endian: 0x0A 0x00 0x00 0x00
big endian:    0x00 0x00 0x00 0x0A
轉型成 char
little endian: 0x0A
big endian:    0x00

關於 big-endian 和 little-endian 的判斷改為條件編譯即可,即善用 __BYTE_ORDER 巨集。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

應用 __BYTE_ORDER 的版本

static inline void bn_from_int(struct bn *n, UTYPE_TMP i) {    
    bn_init(n);    
    
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__    
        n->array[0] = i;    
        UTYPE_TMP tmp = i >> 32;    
        n->array[1] = tmp;    
#else    
        n->array[1] = i;    
        UTYPE_TMP tmp = i >> 32;    
        n->array[0] = tmp;    
#endif    
} 

印出 string

bn_to_str

while ((j >= 0) && (nbytes > (i + 1))) { sprintf(&str[i], FORMAT_STRING, n->array[j]); i += (2 * UNIT_SIZE); /* step UNIT_SIZE hex-byte(s) forward in the string */ j -= 1; /* step one element back in the array */ }

由於每個陣列元素可以紀錄的最大值爲 4,294,967,295 (0xFFFFFFFF)以十六進位或是二進位爲單位做進位計算, 並不是十進位, 無法靠單純循序走訪陣列元素印出十進位表示。

限定每個陣列元素最大值爲 1,000,000,000 可以較為簡單的印出十進位, 但對空間的利用效率不佳

加減1運算

bn_dec

/* Decrement: subtract 1 from n */
static void bn_dec(struct bn *n) {
    for (int i = 0; i < BN_ARRAY_SIZE; ++i) {
        UTYPE tmp = n->array[i];
        UTYPE res = tmp - 1;
        n->array[i] = res;

        if(!(res>tmp))break; //COND;
    }
}
一般情況不需要退位
array = [0x00000002, 0x00000001, ......] 
-1
array = [0x00000001, 0x00000001, ......] 

需要退位的情況
array = [0x00000000, 0x00000001, ......]
-1
array = [0xFFFFFFFF, 0x00000000, ......]

在需要退位的情況, 需要退位的陣列元素爲零, 減一之後會出現類似 uint32 溢位的情況, 反而比減一之前的值還要高(res>tmp), 所以走訪陣列元素做減法運算, 直到該元素減法後的值比原先的小, 代表退位結束。

bn_add

static void bn_add(struct bn *a, struct bn *b, struct bn *c) {
    int carry = 0;
    for (int i = 0; i < BN_ARRAY_SIZE; ++i) {
        UTYPE_TMP tmp = (UTYPE_TMP) a->array[i] + b->array[i] + carry;
        carry = (tmp > MAX_VAL);
        c->array[i] = (tmp & MAX_VAL);
    }
}

注意進位的單位是 MAX_VAL, 同時利用 & 運算計算 uint32 範圍內要保留的值

bn 加減法還是會遇到溢位的問題, 新增回傳錯誤碼作爲檢查或是有更好的方式?

Just do it!

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

乘法運算

bn_mul

#define UTYPE uint32_t
#define UTYPE_TMP uint64_t
static void bn_mul(struct bn *a, struct bn *b, struct bn *c) {
    struct bn row, tmp;
    bn_init(c);

    for (int i = 0; i < BN_ARRAY_SIZE; ++i) {
        bn_init(&row);

        for (int j = 0; j < BN_ARRAY_SIZE; ++j) {
            if (i + j < BN_ARRAY_SIZE) {
                bn_init(&tmp);
                UTYPE_TMP intermediate = a->array[i]*(UTYPE_TMP) b->array[j]; // III;
                bn_from_int(&tmp, intermediate);
                lshift_unit(&tmp, i + j);
                bn_add(&tmp, &row, &row);
            }
        }
        bn_add(c, &row, c);
    }
}

static inline void lshift_unit(struct bn *a, int n_units) {
    int i;
    /* Shift whole units */
    for (i = (BN_ARRAY_SIZE - 1); i >= n_units; --i)
        a->array[i] = a->array[i - n_units];
    /* Zero pad shifted units */
    for (; i >= 0; --i)
        a->array[i] = 0;
}

利用了兩個 uint32 相乘之值, 不會超過 uint64 的情況, 將 uint32 轉為 uint64 做乘法運算 (intermeidate), 在根據相乘時的 index 對結果做對應的位移處理

UTYPE_TMP intermediate = a->array[i]*(UTYPE_TMP) b->array[j];
uint32 轉 uint64 的乘法相關規則如下

C11 6.3.1.1

  • The rank oflong long intshall be greater than the rank oflong int, whichshall be greater than the rank ofint, which shall be greater than the rank ofshortint, which shall be greater than the rank ofsigned char.
  • The rank of any unsigned integer type shall equal the rank of the correspondingsigned integer type, if any.

C11 6.3.1.8 Usual arithmetic conversions

Otherwise, if both operands have signed integer types or both have unsignedinteger types, the operand with the type of lesser integer conversion rank isconverted to the type of the operand with greater rank


Part 2: hash table

data structure

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };
typedef struct { int bits; struct hlist_head *ht; } map_t;

struct hash_key {
    int key;
    void *data;
    struct hlist_node node;
};

#define MAP_HASH_SIZE(bits) (1 << bits)
  • 應用到 quiz2 測驗 1 提到的在 Linux 核心中 通用型 doubly-linked list 的實作技巧
  • hash table 在 quiz2 測驗 4 的 string interning 也有用到(搭配單向鏈表, 不同於此題的雙向鏈表)
  • bits 爲 hash table bucket 數量的 power of two, 類似的實作可見 quiz3: xs in heap 以及 quiz5: vector in heap
  • pprev 其實可以從本次實作移除, 設想是為了給未實作的 map_delete/map_remove 使用
    pprev 爲 hlist_head 跟 hlist_node 設計上的一環, 並非 設計給 hashmap, 請見list and hlist
  • 應對不同資料, 使用 void *data

list and hlist

linux 對雙向鏈表的實作有兩種類型

  • list_head
  • hlist_head + hlist_head
struct list_head {
	struct list_head *next, *prev;
};

struct hlist_head {
	struct hlist_node *first;
};

struct hlist_node {
	struct hlist_node *next, **pprev;
};

/*
 * Double linked lists with a single pointer list head.
 * Mostly useful for hash tables where the two pointer list head is
 * too wasteful.
 * You lose the ability to access the tail in O(1).
 */

先閱讀以上註解, 由於在 hashtable 的 head 只需要 next, 如果採用一般的 list_head, 同時會需要紀錄 next 與 prev, 導致佔用的記憶體空間變成兩倍, 所以設計了 hlist_head 作為 hashtable 使用的 bucket 節省空間, 這就出現了一個問題, 如果使用 list_head 作為節點的話, first 節點的prev 需要額外判斷跟處理, 這邊 Linux 核心設計了一個 hlist_node, 這個設計中 next 是一般指針, 而特別的是 pprev 是指針的指針, 用以指向前一個節點的 next, 這樣就可以把 hlist_head 的 first 當作 hlist_node 的 next 類比操作







structs



head

hlist_head

first



node1

hlist_node

pprev

next



head:first->node1:w





node1:pprev->head:s





node2

hlist_node

pprev

next



node1:next->node2:w





node2:pprev->node1:s





來看一下 hlist delete 的例子

static inline void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	WRITE_ONCE(*pprev, next);
	if (next)
		next->pprev = pprev;
}

以刪除 node1 為例,

//next 指向 node2
struct hlist_node *next = n->next; 
// *pprev 指向 first(node1)
struct hlist_node **pprev = n->pprev;






structs



head

head

first



node1

node1

pprev

next



head:first->node1:w





node1:pprev->head:s





node2

node2

pprev

next



node1:next->node2:w





node2:pprev->node1:s





// 利用 WRITE_ONCE 直接對 *pprev 賦值 node2
// 現在 *pprev 指向 first(node2)
WRITE_ONCE(*pprev, next);






structs



head

head

first



node2

node2

pprev

next



head:s->node2:w





node1

node1

pprev

next



node1:pprev->head:e





node1:next->node2:e





node2:pprev->node1:e





    // node2 存在的情況m, 把 node2 的 pprev 更新
	if (next)
		next->pprev = pprev;






structs



head

head

first



node2

node2

pprev

next



head:first->node2:w





node1

node1

pprev

next



node1:pprev->head:w





node1:next->node2:n





node2:pprev->head:s





hlist_add_head

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;
	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}






structs



head

head

first



node2

node2

pprev

next



head:first->node2:w





node1

node1

pprev

next



node2:pprev->head:s





// first 指向 node2 
n->next = first;






structs



head

head

first



node2

node2

pprev

next



head:first->node2:w





node1

node1

pprev

next



node1:next->node2





node2:pprev->head:s





if (first)
    first->pprev = &n->next;






structs



head

head

first



node2

node2

pprev

next



head:first->node2:w





node1

node1

pprev

next



node1:next->node2:hlist_node





node2:pprev->node1:e





​h->first = n;






structs



head

head

first



node1

node1

pprev

next



head:first->node1:w





node2

node2

pprev

next



node1:next->node2:hlist_node





node2:pprev->node1:s





​n->pprev = &h->first;






structs



head

head

first



node1

node1

pprev

next



head:first->node1:w





node1:pprev->head:s





node2

node2

pprev

next



node1:next->node2:hlist_node





node2:pprev->node1:s





  • pointer to pointer 的理解還需加強
  • 節省 hashtable 記憶體空間設計的巧思

q1: list.h 跟 hashtable.h 看起來都不是 thread-safe, Linux 核心怎麼解決同步問題?

initialization and finalization

map_init : 較為單純, 依據給定的 bits 分配對應的記憶體空間, 別忘記對 first 初始化 NULL, 避免對誤判空 bucket 跟錯誤的記憶體位置操作

map_deinit: 特別注意需要針對每個非空的 bucket 走訪所有節點釋放記憶體空間

void map_deinit(map_t *map)
{
    if (!map)
        return;

    for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) {
        struct hlist_head *head = &map->ht[i];
        //p 不爲 NULL 的情況需要釋放記憶體, NULL 的情況, bucket 已空
        for (struct hlist_node *p = head->first; p;) {
            struct hash_key *kn = container_of(p, struct hash_key, node);
            struct hlist_node *n = p;
            p = p->next;

            if (!n->pprev) /* unhashed */
                goto bail;

            struct hlist_node *next = n->next, **pprev = n->pprev;
            *pprev = next;
            if (next)
                next->pprev = pprev;
            n->next = NULL, n->pprev = NULL;

        bail:
            free(kn->data);
            free(kn);
        }
    }
    free(map);
}

hash and find key

由於 hash table 不能有重複, 所以快速尋找是否已經有同一個key 存在 table 裡面是很非常重要的

#define GOLDEN_RATIO_32 0x61C88647
static inline unsigned int hash(unsigned int val, unsigned int bits) {
    /* High bits are more random, so use them. */
    return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

// hash function 找到對應的 bucket
// 在走訪節點找到對應的 key
static struct hash_key *find_key(map_t *map, int key) {
    struct hlist_head *head = &(map->ht)[hash(key, map->bits)];
    for (struct hlist_node *p = head->first; p; p = p->next) {
        struct hash_key *kn = container_of(p, struct hash_key, node);
        if (kn->key == key)
            return kn;
    }
    return NULL;
}

針對 hash table 的 hash function (區別於 SHA 或是 MD5 , 主要是將數值映射到對應的 bucket) , 可以從以下幾個觀點考量:

  • hash collision 的機率(雜湊值的隨機性), 影響桶內的鏈表的長度 => 走訪的效率
  • bucket 的使用率 => 空間使用率
  • hash function 本身的計算效率
  • 無法從雜湊值回推原值(加解密用, 應該不是 hash table 的考量點)

從 The Art of Computer Programming vol3. 6.4, Dr.Knuth 提到兩種 hash function (bit shift 也是常見的, 不過沒有在這邊提到)

  • based on division
  • based on multiplication(multiplicative hashing)

書中描述 multiplicative hashing, 可以等價於 mod, 且在多數的 CPU 乘法都比除法快( 特例是 mod (power of 2) 可以轉爲位元運算), 此外對於非隨機性的值像是 "part1 part2 part3 " 可以減少碰撞機率

hash.h 註解的 paper 也提到

Our multiplicative hash functions were derived from [Knuth], p. 513ff. The theory posits that machine multiplication by a large number that is likely to cause overflow is the same as finding the modulus by a different number.

利用 gold ratio 做 multiplicative hashing 的稱為 Fibonacci hashing(也是書中提出來的例子), 被上述的 paper 驗證過

而根據 ref 裡面一篇討論 Fibonacci hashing 的 blog, 作者認為 power of two binary and 雖然可以有較快的速度, 但是由於會捨棄掉 low bits , 在沒有留意的狀況可能會造成較高 hash collision

add key

void map_add(map_t *map, int key, void *data)
{    
    struct hash_key *kn = find_key(map, key);    
    if (kn)    
        return;        
    
    kn = malloc(sizeof(struct hash_key));    
    kn->key = key, kn->data = data;    
    
    struct hlist_head *h = &map->ht[hash(key, map->bits)];    
    struct hlist_node *n = &kn->node, *first = h->first;    
    n->next = first;    
    if (first)    
        first->pprev = &n->next;    
    h->first = n;    
    n->pprev = &h->first;    
}  

新節點插入 hlist_head.first, 需要將原本的 first 節點接到新節點的後面, 同時將新節點的 pprev 接回 head.first

get key

透過 find_key 取得結果

ref:


Part 3: fiber

相關術語: cooperative multi-tasking, nonpreemptive multi-tasking, user-level threads, green threads, coroutine

歧義: fiber 在不同系統上可能可以有不同的定義, 像是 user thread, 這邊採用英文 Wikipedia 上的定義, 非搶占式/協作式的 thread, 由於是非搶佔式的所以目前的 thread 沒有停止工作, 其他 thread 也不會競爭工作跟資源, 在這種情況下 synchronization 的問題就會大大減少

data structure

/* The maximum number of fibers that can be active at once. */
#define MAX_FIBERS 10

/* The size of the stack for each fiber. */
#define FIBER_STACK (1024 * 1024)

typedef struct {
    pid_t pid;   /* The pid of the child thread as returned by clone */
    void *stack; /* The stack pointer */
} fiber_t;

/* The fiber "queue" */
static fiber_t fiber_list[MAX_FIBERS];

/* The pid of the parent process */
static pid_t parent;

/* The number of active fibers */
static int num_fibers = 0;
  • 需要紀錄 parent id 跟 child process id, 判斷目前的 process
  • 需要自行分配 stack (參照 clone 說明)

init

void fiber_init()
{
    for (int i = 0; i < MAX_FIBERS; ++i)
        fiber_list[i].pid = 0, fiber_list[i].stack = 0;
    parent = getpid();
}

建立 fiber

struct fiber_args {
    void (*func)(void);
};

static int fiber_start(void *arg)
{
    struct fiber_args *args = (struct fiber_args *) arg;
    void (*func)() = args->func;
    free(args);

    func();
    return 0;
}

/* Creates a new fiber, running the func that is passed as an argument. */
int fiber_spawn(void (*func)(void))
{
    if (num_fibers == MAX_FIBERS)
        return FIBER_MAXFIBERS;

    if ((fiber_list[num_fibers].stack = malloc(FIBER_STACK)) == 0)
        return FIBER_MALLOC_ERROR;

    struct fiber_args *args;
    if ((args = malloc(sizeof(*args))) == 0) {
        free(fiber_list[num_fibers].stack);
        return FIBER_MALLOC_ERROR;
    }
    args->func = func;

    fiber_list[num_fibers].pid = clone(
        fiber_start, (char *)fiber_list[num_fibers].stack + FIBER_STACK,
        SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, args);
    if (fiber_list[num_fibers].pid == -1) {
        free(fiber_list[num_fibers].stack);
        free(args);
        return FIBER_CLONE_ERROR;
    }

    num_fibers++;
    return FIBER_NOERROR;
}

與 quiz4: thread pool 的 apply 相似, 傳入 function pointer 與對應的參數以建立 fiber ( 對比 worker in thread pool), 但 stack 的分配跟指定須由我們處理(請見 clone)
clone:

  • 與 fork 相同,都是建立新的 child process, 但是可以提供更多的共享資源設定
  • fork 是從 fork 開始的點建立新的 process, clone 需要傳入 function pointer 對, function pointer 建立新的 process, 這點上與 pthread 相近
  • 由於 clone 出來的 child process 可能會與 parent process 共用記憶體空間, 所以需要自己建立 stack 空間後傳入位置(需要自己 malloc 跟 free, 類似 heap, 因為生命週期的關係與一般函數呼叫會自行建立跟釋放的 stack 不同), 且注意 stack 的使用是由高記憶體往低擴張

These system calls create a new ("child") process, in a manner similar to fork(2).

By contrast with fork(2), these system calls provide more precise control over what pieces of execution context are shared between the calling process and the child process. For example, using these system calls, the caller can control whether or not the two processes share the virtual address space, the table of file descriptors, and the table of signal handlers. These system calls also allow the new child process to be placed in separate namespaces(7).

When the child process is created with the clone() wrapper function, it commences execution by calling the function pointed to by the argument fn. (This differs from fork(2), where execution continues in the child from the point of the fork(2) call.) The arg argument is passed as the argument of the function fn.

When the fn(arg) function returns, the child process terminates. The integer returned by fn is the exit status for the child process. The child process may also terminate explicitly by calling exit(2) or after receiving a fatal signal.

The stack argument specifies the location of the stack used by the child process. **Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process. The calling process must therefore set up memory space for the child stack and pass a pointer to this space to clone(). Stacks grow downward on all processors that run Linux (except the HP PA processors), so stack usually points to the topmost address of the memory space set up for the child stack. Note that clone() does not provide a means whereby the caller can inform the kernel of the size of the stack area.

另外是 flag mask 的部分, 用來設定 parent/child processes 間的共享資源( e.g. signal, memory space, file etc.)

SIGCHLD

The child termination signal
When the child process terminates, a signal may be sent to the parent. The termination signal is specified in the low byte of flags (clone()) or in cl_args.exit_signal (clone3()). If this signal is specified as anything other than SIGCHLD, then the parent process must specify the __WALL or __WCLONE options when waiting for the child with wait(2). If no signal (i.e., zero) is specified, then the parent process is not signaled when the child terminates.

CLONE_SIGHAND

CLONE_SIGHAND (since Linux 2.0)
If CLONE_SIGHAND is set, the calling process and the child process share the same table of signal handlers. If the calling process or child process calls sigaction(2) to change the behavior associated with a signal, the behavior is changed in the other process as well. However, the calling process and child processes still have distinct signal masks and sets of pending signals. So, one of them may block or unblock signals using sigprocmask(2) without affecting the other process.

If CLONE_SIGHAND is not set, the child process inherits a copy of the signal handlers of the calling process at the time of the clone call. Calls to sigaction(2) performed later by one of the processes have no effect on the other process.

Since Linux 2.6.0, the flags mask must also include
CLONE_VM if CLONE_SIGHAND is specified

CLONE_VM

CLONE_VM (since Linux 2.0)
If CLONE_VM is set, the calling process and the child process run in the same memory space. In particular, memory writes performed by the calling process or by the child process are also visible in the other process. Moreover, any memory mapping or unmapping performed with mmap(2) or munmap(2) by the child or calling process also affects the other process.

If CLONE_VM is not set, the child process runs in a separate copy of the memory space of the calling process at the time of the clone call. Memory writes or file mappings/unmappings performed by one of the processes do not affect the other, as with fork(2).

If the CLONE_VM flag is specified and the CLONE_VFORK flag is not specified, then any alternate signal stack that was established by sigaltstack(2) is cleared in the child process.

fiber 切換工作

/* Yield control to another execution context */
void fiber_yield()
{
    /* move the current process to the end of the process queue. */
    sched_yield();
}

由於 fiber 為非搶佔式的 thread, 必須由目前的 thread/process 放棄控制權, 讓其他人可以拿到資源, 需要yield 讓出目前的執行, 從序列中拿出下一個工作, 並把目前的工作放到序列的最後, 等待下次的執行。所以 fiber 切換時機有兩種情況:

  1. 呼叫 yield 讓出執行權
  2. 工作結束

sched_yield

sched_yield() causes the calling thread to relinquish the CPU.
The thread is moved to the end of the queue for its static
priority
and a new thread gets to run.

In the Linux implementation, sched_yield() always succeeds.

If the calling thread is the only thread in the highest priority
list at that time, it will continue to run after a call to
sched_yield().

POSIX systems on which sched_yield() is available define
_POSIX_PRIORITY_SCHEDULING in <unistd.h>.

Strategic calls to sched_yield() can improve performance by giving other threads or processes a chance to run when (heavily) contended resources (e.g., mutexes) have been released by the caller. Avoid calling sched_yield() unnecessarily or
inappropriately (e.g., when resources needed by other schedulable
threads are still held by the caller), since doing so will result
in unnecessary context switches
, which will degrade system
performance.

等待 fiber 執行結束

/* Execute the fibers until they all quit. */
int fiber_wait_all()
{
    /* Check to see if we are in a fiber, since we do not get signals in the
     * child threads
     */
    pid_t pid = getpid();
    if (pid != parent)
        return FIBER_INFIBER;

    /* Wait for the fibers to quit, then free the stacks */
    while (num_fibers > 0) {
        if ((pid = wait(0)) == -1)
            exit(1);

        /* Find the fiber, free the stack, and swap it with the last one */
        for (int i = 0; i < num_fibers; ++i) {
            if (fiber_list[i].pid == pid) {
                free(fiber_list[i].stack);
                if (i != --num_fibers)
                    fiber_list[i] = fiber_list[num_fibers];
                break;
            }
        }
    }

    return FIBER_NOERROR;
}

pid 在這段程式碼有兩種作用

  • 檢查目前程式是否在 parent process, 是的話就開始檢查是否 fiber 工作結束
  • 代表終止的 child process/fiber id (由 wait 回傳), 用以釋放已終止的 fiber 所使用的記憶體空間

wait

RETURN VALUE top
wait(): on success, returns the process ID of the terminated
child; on failure, -1 is returned.


interleaving

Fib(0) = 0
Fib(1) = 1
Fib(1 = 1
2) = 1
Fib(1 = 1
Fib(3) = 2
2 * 4) = 2 2 * 4) = 2 = 4
Fib(5) = 5
3 * 3 = 9
Fib(6) = 8
4 * 4 = 16
Fib(7) = 13
5 * 5 = 25
Fib(8) = 21
6 * 6 = 36
6 * 6 = 36
Fib(9) = 34
49
49
Fib(10) = 55
8 * 8 = 64
Fib(11) = 89
9 * 9 = 81
Fib(12) = 144
Fib(13) = 233
Fib(14) = 377

stdio.h(含printf) 都是 thread-safe, 猜測是 stdout buffer 的問題, 嘗試用 fflush(stdout) 仍出現 interleaveing。

解法: 移除 clone 內的 flag CLONE_VM, 使兩個 fiber 無法看到對方的記憶體空間, 還在思考原因爲何

q1: 雖然可以靠區分記憶體空間解決, 但是依然沒想到為什麼共用記憶體空間會產生這樣的問題?

ans: 思考上的誤區, 參考講座 Linux 核心設計: 不僅是個執行單元的 Process , 移除範例程式碼中 clone 的 CLONE_VM flag , 會建立獨立記憶體空間的 cild process 不會互相影響, 設定 CLONE_VM 即為共用記憶體空間的 lightweight process(thread), 即為多執行緒共用資源的問題, 所以解決方式:

  • 設定不共用記憶體空間(移除CLONE_VM), 建立獨立空間的 child process
  • 當作多執行緒的同步處理:
    • 使用 multiple producer - single consumer 來解決, 由單一worker 印出訊息
    • 使用 pthread_mutex
    • 不使用 pthread_mutex, 改用 futex
  • 利用 sudo taskset -c 7 ./fiber.o 綁定 CPU 變成單執行緒
  • pthread_mutex 底層是用 futex 實作
  • pthread_mutex 不需要搭配 pthread_create 使用

編譯錯誤警告排除

cc  -o fiber.o -g main.c
In file included from main.c:2:
fiber.h: In function ‘fiber_spawn’:
fiber.h:80:34: warning: implicit declaration of function ‘clone’; did you mean ‘close’? [-Wimplicit-function-declaration]
   80 |     fiber_list[num_fibers].pid = clone(
      |                                  ^~~~~
      |                                  close
fiber.h:82:19: error: ‘CLONE_FS’ undeclared (first use in this function)
   82 |         SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, args);
      |                   ^~~~~~~~
fiber.h:82:19: note: each undeclared identifier is reported only once for each function it appears in
fiber.h:82:30: error: ‘CLONE_FILES’ undeclared (first use in this function)
   82 |         SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, args);
      |                              ^~~~~~~~~~~
fiber.h:82:44: error: ‘CLONE_SIGHAND’ undeclared (first use in this function)
   82 |         SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, args);
      |                                            ^~~~~~~~~~~~~
fiber.h:82:60: error: ‘CLONE_VM’ undeclared (first use in this function)
   82 |         SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, args);
      |                                                            ^~~~~~~~
make: *** [Makefile:2: all] Error 1

摘自 man 2 clone 輸出:

#define _GNU_SOURCE
#include <sched.h>

注意要將 _GNU_SOURCE 放置於 #include <sched.h> 之前。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

謝謝老師, 我把這兩個放置在 fiber.h 導致編譯錯誤, 經提點後把這兩行加入至 main.c, 已排除警告

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Julian

ref:
Fibers, Oh My!
Quantifying The Cost of Context Switch