Linux Fibonacci Driver

contributed by < Tonr01 >

開發環境

gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

架構：                   x86_64
CPU 作業模式：            32-bit, 64-bit
Address sizes:          39 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                 16
On-line CPU(s) list:    0-15
供應商識別號：            GenuineIntel
Model name:             11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
    CPU 家族：           6
    型號：               141
    每核心執行緒數：       2
    每通訊端核心數：       8
    Socket(s):          1
    製程：               1
    CPU max MHz:        4600.0000
    CPU min MHz:        800.0000
    BogoMIPS:           4608.00

開發環境準備

自從 Linux 核心 4.4 版以來，Ubuntu Linux 預設開啟 EFI_SECURE_BOOT_SIG_ENFORCE，這使得核心模組需要適度的簽章才可掛載進入 Linux 核心，為了後續測試的便利，我們需要將 UEFI Secure Boot 的功能關閉，請見 Why do I get “Required key not available” when install 3rd party kernel modules or after a kernel upgrade?

首先先關閉 UEFI Secure Boot

$ sudo apt install mokutil
$ sudo mokutil --disable-validation

檢查 Linux 核心版本

$ uname -r
5.19.0-35-generic

安裝 linux-headers 套件

$ sudo apt install linux-headers-`uname -r`

確認 linux-headers 套件已正確安裝於開發環境，預期得到以下輸出:

$ dpkg -L linux-headers-5.4.0-66-generic | grep "/lib/modules"
/lib/modules
/lib/modules/5.19.0-35-generic
/lib/modules/5.19.0-35-generic/build

檢驗目前的使用者身份

$ whoami
tonr01

檢驗測試過程所需 sudo

$ sudo whoami
root

編譯並測試

$ make check

預期會看到綠色的 Passed [-] 字樣，隨後是

f(93) fail
input: 7540113804746346429
expected: 12200160415121876738

在 vscode 中 `fibdrv.c` 的標頭檔錯誤

直接將檔案從 clone 後，用 vscode 開啟，會有標頭檔錯誤，這裡參考 Shiritai 的方法。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

首先先確認 Linux 核心版本。

$ uname -r
5.19.0-35-generic

再確認 Linux kernel header，確認其安裝的路徑，可移動至 /usr/src 下查看。

$ ls /usr/src | grep linux-headers
linux-headers-5.19.0-32-generic
linux-headers-5.19.0-35-generic

再確認 gcc version

$ ls /usr/lib/gcc/x86_64-linux-gnu/
11

最後 IntelliSense 需要調整 C/C++ Extension 設定檔，在 vscode 界面按下 f1，輸入 Edit configuration (UI)，就能編輯 IntelliSense 組態，詳細改動放在 commit。

改動完，就無 error message 了

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

費氏數列一般式

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

費氏數列公式解

這個「公式解」一般認為由 Jacques P. M. Binet 在 1843 年發現，仍可追溯得更早。習慣上我們仍稱之為 Binet 等式 (Binet’s Formula)

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

大多數的迷思認為這是最快的解法，為

O (1)

的公式解，但對於電腦的離散本質會有幾個問題

根號運算表達困難，計算完 sqrt(5) 之後，只能用一個近似的值來表達結果，其誤差會在 n 值變大時慢慢放大。
根號、除法、次方對於程式來說是高成本的

可運用 GMP 或 MPFR 這這兩個 GNU 函式庫是所謂的「無限」位數的整數跟「無限」精確度的浮點數，可參考 Fibonacci 快速解程式碼與公式解程式碼實作

Q-matrix

主要透過矩陣轉換之方法，我們把原本之遞迴式轉換到矩陣表示，並透過矩陣次方之Divide and Conquer Method 來做加速，解說短片: The Fibonacci Q-matrix。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Fast-Doubling

Fast Doubling 為 Q-matrix 的改良版，主要再細分成兩種情況， n 為奇數或偶數的情況，可以降低遞迴次數，

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Fast Doubling 實作

作法

這裡參考 Fast Doubling 的手法，先用 63 - __builtin_clzl(k) 去找出不包含 leading 0s 的真正數字，再用 bit-mask 的方式取檢驗目前 bit 為0或1。

以 k = 50 為例，其二進位表示為

110010_{2}

00.....00 110010
          ^
          6
          
count = 63 - __builtin_clzl(k) = 63 - 58 = 5
mask = 1 << count = 100000

round	k	mask	k & mask	odd/even
1	110010	100000	100000	odd
2	110010	010000	010000	odd
3	110010	001000	000000	even
4	110010	000100	000000	even
5	110010	000010	000010	odd
6	110010	000001	000000	even

程式碼

/* Fast Doubling with clz */
static uint64_t fib_sequence(uint64_t k)
{   
    uint64_t f[2] = {0, 1};
    uint64_t count = 63 - __builtin_clz(k);

    for (unsigned int mask = 1 << count; mask; mask >>= 1) {
        uint64_t a = f[0] * (2 * f[1] - f[0]);   //F(2k+1) = F(k+1)^2+F(k)^2
        uint64_t b = f[0] * f[0] + f[1] * f[1];  //F(2k)   = F(k)[2F(k+1)−F(k)]

        if ((mask & k)) {
            f[0] = b;
            f[1] = a + b;
        } else {
            f[0] = a;
            f[1] = b;
        }
    }
    return f[0];
}

測試結果

可以輸出正確到 Fib(92)

Reading from /dev/fibonacci at offset 89, returned the sequence 1779979416004714189.
Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120.
Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309.
Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429.
Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429.
Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429.

測試效能差距

這裡引入 ktime_t 來測量輸出上一次 fib 的執行時間，先使用老師的參考範例來測試，因為 fib_write 未使用，所以使用這個函式計算時間。

/* ktime_t to calculate fib time */
static ktime_t kt;

static uint64_t fib_time_proxy(uint64_t k, int n)
{
    kt = ktime_get();
    uint64_t result = fib_sequence(k, n);
    kt = ktime_sub(ktime_get(), kt);

    return result;
}

/* Calculate the runtime */
static ssize_t fib_write(struct file *file,
                         const char *buf,
                         size_t size,
                         loff_t *offset)
{
    fib_time_proxy(*offset, n);
    return ktime_to_ns(kt);
}

再不影響原本的 client.c ，這裡多加入了一個新的使用者模式，叫做 runtime.c ，主要用來輸出時間的資料，並修改 Makefile 來執行程式，詳細改動在 commit，將輸出的時間資料用 gnuplot 去繪製圖片。

後續改善

static uint64_t fib_sequence(uint64_t k, int n)
{
    switch (n) {
    case 0:
        return fib_sequence_dp(k);
        break;
    default:
        return fib_sequence_fast_doubling(k);
        break;
    }
}

將 fib_sequence 改善成可以切換不同的實作。

static ssize_t fib_write(struct file *file,
                         const char *buf,
                         size_t size,
                         loff_t *offset)
{
    switch (size) {
    case 0:
        fib_time_proxy(*offset, 0); // Fib_sequence with dynamic programming
        break;
    default:
        fib_time_proxy(*offset, 1); // Fast Doubling with clz
        break;
    }
    return ktime_to_ns(kt);
}

並將 fib_write 改善成可以同時計算及輸出其 runtime。

/* Output the runtime */
    for (int i = 0; i <= offset; i++) {
        lseek(fd, i, SEEK_SET);
        long long sz1 = write(fd, write_buf, 0);
        long long sz2 = write(fd, write_buf, 1);
        printf("%d %lld %lld\n", i, sz1, sz2);
    }

在 runtime.c 中將兩種實作的 runtime 輸出到 txt 檔，方便圖的繪製，詳細改動在 commit。

引入大數運算並研究

參考初步支援大數運算，這邊引入 KYG-yaya573142 的實作來研究。

結構體 `bn`

/* number[size - 1] = msb, number[0] = lsb */
typedef struct _bn {
    unsigned int *number;
    unsigned int size;
    int sign;
} bn;

首先是結構體的部份，number 指向數值的部份，在後續使用會以陣列的方式存放數值，size 則是大數的記憶體大小，也就是陣列的大小，陣列的大小為 number[size] ，sign 存放的是數值的正負。

`bn_clz`

static int bn_clz(const bn *src)
{
    int cnt = 0;
    for (int i = src->size - 1; i >= 0; i--) {
        if (src->number[i]) {
            // prevent undefined behavior when src = 0
            cnt += __builtin_clz(src->number[i]);
            return cnt;
        } else {
            cnt += 32;
        }
    }
    return cnt;
}

大數版本的 clz ，主要用來計算 leading zero 的個數。

`bn_to_string`

char *bn_to_string(bn src)
{
    // log10(x) = log2(x) / log2(10) ~= log2(x) / 3.322
    size_t len = (8 * sizeof(int) * src.size) / 3 + 2 + src.sign;
    char *s = kmalloc(len, GFP_KERNEL);
    char *p = s;

    memset(s, '0', len - 1);
    s[len - 1] = '\0';

    for (int i = src.size - 1; i >= 0; i--) {
        for (unsigned int d = 1U << 31; d; d >>= 1) {
            /* binary -> decimal string */
            int carry = !!(d & src.number[i]);
            for (int j = len - 2; j >= 0; j--) {
                s[j] += s[j] - '0' + carry; // double it
                carry = (s[j] > '9');
                if (carry)
                    s[j] -= 10;
            }
        }
    }
    // skip leading zero
    while (p[0] == '0' && p[1] != '\0') { 
        p++;
    }
    if (src.sign)
        *(--p) = '-';
    memmove(s, p, strlen(p) + 1);
    return s;
}

因大數無法以數值的形式表示，故需要將其轉成字串，len 存放大數的 bit 個數，迴圈將二進位字串轉換成十進位字串。

`bn_add`, `bn_sub`

void bn_add(const bn *a, const bn *b, bn *c)
{
    if (a->sign == b->sign) { // both positive or negative
        bn_do_add(a, b, c);
        c->sign = a->sign;
    } else { // different sign
        if (a->sign)  // let a > 0, b < 0
            SWAP(a, b);
        int cmp = bn_cmp(a, b);
        if (cmp > 0) {
            /* |a| > |b| and b < 0, hence c = a - |b| */
            bn_do_sub(a, b, c);
            c->sign = 0;
        } else if (cmp < 0) {
            /* |a| < |b| and b < 0, hence c = -(|b| - |a|) */
            bn_do_sub(b, a, c);
            c->sign = 1;
        } else {
            /* |a| == |b| */
            bn_resize(c, 1);
            c->number[0] = 0;
            c->sign = 0;
        }
    }
}

加法的部份，首先要先判斷兩個大數的正負是否相同，相同的話直接加起來即可，若正負不同時，因為大數的結構體是將數值跟正負號的部份分開存放，所以要先判斷兩個數值的大小，做相減的動作，其結果在加上正負號。

static void bn_do_add(const bn *a, const bn *b, bn *c)
{
    // max digits = max(sizeof(a) + sizeof(b)) + 1
    int d = MAX(bn_msb(a), bn_msb(b)) + 1;
    d = DIV_ROUNDUP(d, 32) + !d;
    bn_resize(c, d); // round up, min size = 1

    unsigned long long int carry = 0;
    for (int i = 0; i < c->size; i++) {
        unsigned int tmp1 = (i < a->size) ? a->number[i] : 0;
        unsigned int tmp2 = (i < b->size) ? b->number[i] : 0;
        carry += (unsigned long long int) tmp1 + tmp2;
        c->number[i] = carry;
        carry >>= 32;
    }

    if (!c->number[c->size - 1] && c->size > 1)
        bn_resize(c, c->size - 1);
}

這裡用 DIV_ROUNDUP 來計算 c 的所需大小，其大小會滿足所需大小而不過於太多，使用 8 bytes 大小的 carry 來實行兩個 4 bytes 項目的加法來避免 overflow，因為每個 array 大小為 4 bytes，所以當相加完，carry 扣除右邊 4 bytes ，其餘都是進位的值，所以每回合 carry>>32 。

static void bn_do_sub(const bn *a, const bn *b, bn *c)
{
    // max digits = max(sizeof(a) + sizeof(b))
    int d = MAX(a->size, b->size);
    bn_resize(c, d);

    long long int carry = 0;
    for (int i = 0; i < c->size; i++) {
        unsigned int tmp1 = (i < a->size) ? a->number[i] : 0;
        unsigned int tmp2 = (i < b->size) ? b->number[i] : 0;
        carry = (long long int) tmp1 - tmp2 - carry;
        if (carry < 0) {
            c->number[i] = carry + (1LL << 32);
            carry = 1;
        } else {
            c->number[i] = carry;
            carry = 0;
        }
    }
    
    d = bn_clz(c) / 32;
    if (d == c->size)
        --d;
    bn_resize(c, c->size - d);
}

因為減法不會有 overflow 的問題，所以不須用 DIV_ROUNDUP ，這裡是做無號數減法，carry 與加法的不同，這裡是作為向前一位的借位，當相減的值小於 0 ，才須向前一位借位，最後在調整 c 的所需大小，將不需要的 0 去掉。

/* c = a - b 
 * Note: work for c == a or c == b
 */
void bn_sub(const bn *a, const bn *b, bn *c)
{
    /* xor the sign bit of b and let bn_add handle it */
    bn tmp = *b;
    tmp.sign ^= 1; // a - b = a + (-b)
    bn_add(a, &tmp, c);
}

再來是減法的部份，因為

a - b = a + (- b)

，所以先將負號給 b ，但是不能改變 b 的值，所以宣告 tmp 給予改變後的 b 值，最後將 a 與 tmp 做加法即可。

`bn_mult`

void bn_mult(const bn *a, const bn *b, bn *c)
{
    // max digits = sizeof(a) + sizeof(b))
    int d = bn_msb(a) + bn_msb(b);
    d = DIV_ROUNDUP(d, 32) + !d;  // round up, min size = 1
    bn *tmp;
    /* make it work properly when c == a or c == b */
    if (c == a || c == b) {
        tmp = c;  // save c
        c = bn_alloc(d);
    } else {
        tmp = NULL;
        bn_resize(c, d);
    }

    for (int i = 0; i < a->size; i++) {
        for (int j = 0; j < b->size; j++) {
            unsigned long long int carry = 0;
            carry = (unsigned long long int) a->number[i] * b->number[j];
            bn_mult_add(c, i + j, carry);
        }
    }
    c->sign = a->sign ^ b->sign;

    if (tmp) {
        bn_cpy(tmp, c);  // restore c
        bn_free(c);
    }
}

乘法採用簡單的 long multiplication ，若 c == a || c == b，就必須配置記憶體來儲存計算結果，避免 a 與 b 在計算途中就被改變。

static void bn_mult_add(bn *c, int offset, unsigned long long int x)
{
    unsigned long long int carry = 0;
    for (int i = offset; i < c->size; i++) {
        carry += c->number[i] + (x & 0xFFFFFFFF);
        c->number[i] = carry;
        carry >>= 32;
        x >>= 32;
        if (!x && !carry)  // done
            return;
    }
}

bn_mult_add 主要處理乘法運算時，每回合的計算結果累加。

`bn_lshift`, `bn_rshift`

void bn_lshift(const bn *src, size_t shift, bn *dest)
{
    size_t z = bn_clz(src);
    shift %= 32;  // only handle shift within 32 bits atm
    if (!shift)
        return;

    if (shift > z) {
        bn_resize(dest, src->size + 1);
    } else {
        bn_resize(dest, src->size);
    }
    /* bit shift */
    for (int i = src->size - 1; i > 0; i--)
        dest->number[i] =
            src->number[i] << shift | src->number[i - 1] >> (32 - shift);
    dest->number[0] = src->number[0] << shift;
}

主要做大數左移，這裡只考慮最多位移 31 bits 的情形，當 shift 大於 leading zero 個數時，會超出原本的大小限制，所以須將 size + 1，在重新配置記憶體大小。
dest->number[0] = src->number[0] << shift; 這行就是因為當超出原本大小時，可能會有 overflow 的問題，所以須將多出來的部份在放到最一開始新增的位置。

void bn_rshift(bn *src, size_t shift)
{
    size_t z = 32 - bn_clz(src);
    shift %= 32;  // only handle shift within 32 bits atm
    if (!shift)
        return;

    /* bit shift */
    for (int i = 0; i < (src->size - 1); i++)
        src->number[i] = src->number[i] >> shift | src->number[i + 1]
                                                       << (32 - shift);
    src->number[src->size - 1] >>= shift;

    if (shift >= z && src->size > 1)
        bn_resize(src, src->size - 1);
}

與左移類似，只考慮最多位移 31 bits 的情形，當 shift 大於 leading zero 個數時，則表示原本的 number[0] 的數值以全部右移到下一個陣列，故須重新調整大小，扣除不需要的空間。

Linux Fibonacci Driver

開發環境

開發環境準備

在 vscode 中 `fibdrv.c` 的標頭檔錯誤

參考相關資料

研讀費氏數列相關材料

費氏數列一般式

費氏數列公式解

Q-matrix

Fast-Doubling

Fast Doubling 實作

作法

程式碼

測試結果

測試效能差距

後續改善

引入大數運算並研究

結構體 `bn`

`bn_clz`

`bn_to_string`

`bn_add`, `bn_sub`

`bn_mult`

`bn_lshift`, `bn_rshift`

Linux Fibonacci Driver

開發環境

開發環境準備

在 vscode 中 fibdrv.c 的標頭檔錯誤

參考相關資料

研讀費氏數列相關材料

費氏數列一般式

費氏數列公式解

Q-matrix

Fast-Doubling

Fast Doubling 實作

作法

程式碼

測試結果

測試效能差距

後續改善

引入大數運算並研究

結構體 bn

bn_clz

bn_to_string

bn_add, bn_sub

bn_mult

bn_lshift, bn_rshift

Read more

Linux Linked-List Queue

2023q1 Linux核心設計-問題紀錄

2023q1 Homework6 (quiz5)

2023q1 Homework5 (Assessment)

在 vscode 中 `fibdrv.c` 的標頭檔錯誤

結構體 `bn`

`bn_clz`

`bn_to_string`

`bn_add`, `bn_sub`

`bn_mult`

`bn_lshift`, `bn_rshift`