2022q1 Homework3 (fibdrv)

contributed by < 2020leon >

作業要求

自我檢查清單

研讀上述 Linux 效能分析的提示描述，在自己的實體電腦運作 GNU/Linux，做好必要的設定和準備工作
$\to$ 從中也該理解為何不希望在虛擬機器中進行實驗;
研讀上述費氏數列相關材料 (包含論文)，摘錄關鍵手法，並思考 clz / ctz 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
複習 C 語言數值系統和 bitwise operation，思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
研讀 KYG-yaya573142 的報告，指出針對大數運算，有哪些加速運算和縮減記憶體操作成本的舉措？
lsmod 的輸出結果有一欄名為 Used by，這是 "each module's use count and a list of referring modules"，但如何實作出來呢？模組間的相依性和實際使用次數 (reference counting) 在 Linux 核心如何追蹤呢？

搭配閱讀 The Linux driver implementer’s API guide » Driver Basics
注意到 fibdrv.c 存在著 DEFINE_MUTEX, mutex_trylock, mutex_init, mutex_unlock, mutex_destroy 等字樣，什麼場景中會需要呢？撰寫多執行緒的 userspace 程式來測試，觀察 Linux 核心模組若沒用到 mutex，到底會發生什麼問題。嘗試撰寫使用 POSIX Thread 的程式碼來確認。

作業要求

回答上述自我檢查清單的所有問題，需要附上對應的參考資料和必要的程式碼，以第一手材料 (包含自己設計的實驗) 為佳
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
如果你在 2022 年 3 月 1 日前，已從 GitHub sysprog21/fibdrv 進行 fork，請依據 Alternatives to forking into the same account 一文，對舊的 repository 做對應處置，然後重新 fork
在 GitHub 上 fork fibdrv，目標是修正 Fibonacci 數計算的正確性 (現有實作存在缺陷，請指出)，隨後改善 fibdrv 計算 Fibinacci 數列的執行效率，過程中需要量化執行時間，可在 Linux 核心模組和使用層級去測量
- 原本的程式碼只能列出到
  $F i b o n a c c i (100)$ 而且部分輸出是錯誤的數值，請修改程式碼，列出後方更多數值 (注意: 檢查正確性和數值系統的使用)
- 務必研讀上述 Linux 效能分析的提示的描述，降低效能分析過程中的干擾;
- 在 Linux 核心模組中，可用 ktime 系列的 API;
- 在 userspace 可用 clock_gettime 相關 API;
- 善用統計模型，除去極端數值，過程中應詳述你的手法
- 分別用 gnuplot 製圖，分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷，單位需要用 us 或 ns (自行斟酌)
- 嘗試解讀上述實驗的時間分佈，特別是隨著 Fibonacci 數列增長後，對於 Linux 核心的影響。可修改 VFS 函式，允許透過寫入 /dev/fibonacci 裝置檔案來變更不同的 Fibonacci 求值的實作程式碼。
  Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
用 printk 固然方便，但其執行時間易受各種因素而不穩定，除了透過統計來觀察，也可改用核心的 sysfs 介面，後者可讓我們自行宣告 show 和 store 介面來提供讀寫操作，可參見 Sample kobject implementation (注意: 切換到對應的 Linux 核心版本)。
逐步最佳化 Fibonacci 的執行時間，引入費氏數列分析提到的策略，並善用 clz / ffs 一類的指令 (Linux 核心有對應的包裝函式)，過程中都要充分量化
嘗試研讀 bignum (fibdrv 分支) 的實作，理解其中的技巧並與你的實作進行效能比較，探討個中值得取鏡之處。
- 原理和分析可見 KYG-yaya573142 的報告
當
$F i b (n)$ 隨著
$n$ 越來越大時，由 Linux 核心傳遞字串形式的十進位就成為效能瓶頸，研讀〈Integer Encoding Algorithm 筆記〉所提及的策略，在 Linux 核心模組先對數值進行編碼和壓縮，以二進位方式傳遞到使用者層級後，再由應用程式予以顯示
$F i b (n)$ 數值
- 儘量降低由核心傳遞到應用程式的資料量
- 確保應用程式不需要過多的計算量才可重現 Fibonacci 數列

環境

$ uname -a
Linux leon-ubuntu-20 5.13.0-35-generic #40~20.04.1-Ubuntu SMP Mon Mar 7 09:18:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version     
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lscpu           
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   36 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           58
Model name:                      Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
Stepping:                        9
CPU MHz:                         1700.000
CPU max MHz:                     3800.0000
CPU min MHz:                     1600.0000
BogoMIPS:                        6784.21
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-3

開發紀錄

修正可變長度陣列（ Variable-Length Array, VLA ）

原程式碼如下。

static long long fib_sequence(long long k)
{
    /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
    long long f[k + 2];

    f[0] = 0;
    f[1] = 1;

    for (int i = 2; i <= k; i++) {
        f[i] = f[i - 1] + f[i - 2];
    }

    return f[k];
}

其中宣告處可見 long long f[k + 2] ，其雖合乎 C99 標準，但其實有更好的實做可以避免 VLA 。修正後如下。

static long long fib_sequence(long long k)
{
    long long a = 0, b = 1;
    if (k <= 1)
        return k;
    for (long long i = k; i > 1; i--) {
        k = a + b;
        a = b;
        b = k;
    }
    return k;
}

利用 fast doubling 實做

直接以費波那契數的定義實做，時間複雜度為

O (n)

，若改以 fast doubling 的方式則可使時間複雜度降為

O (l o g n)

。

根據作業要求所給出的虛擬碼實做 fast doubling ，首次嘗試如下。

#define clz(x) __builtin_clzll(x)

static long long fib_sequence(long long k)
{
    long long a = 0, b = 1;
    int k_clz = clz(k);

    if (k <= 1)
        return k;
    k <<= k_clz;
    for (int i = sizeof(long long) * 8; i > k_clz; i--, k <<= 1) {
        long long t1 = a * (2 * b - a);
        long long t2 = b * b + a * a;
        a = t1;
        b = t2;
        if (k & (long long) ~(~0ULL >> 1)) {
            t1 = a + b;
            a = b;
            b = t1;
        }
    }
    return a;
}

在以上嘗試中，將 k 最高位的 1 對齊 MSB ，並逐次將 k 左移偵測最高位以實做 fast doubling 。

後來發現更為簡化的實做方式。

#define clz(x) __builtin_clzll(x)

static long long fib_sequence(long long k)
{
    if (k <= 1)
        return k;
    long long a = 0, b = 1;
    for (int mask = 1 << (sizeof(long long) * 8 - 1 - clz(k)); mask > 0;
         mask >>= 1) {
        long long t1 = a * (2 * b - a);
        b = b * b + a * a;
        a = t1;
        if (k & mask) {
            t1 = a + b;
            a = b;
            b = t1;
        }
    }
    return a;
}

以上實做中，不再將 k 進行位移。反之，以 mask 變數作為遮罩，逐次將 mask 右移以對 k 進行檢查。

最後則是為未來大數運算做準備。由於目前的程式碼在 fib_read 函式中直接回傳所算得之費波那契數，然而此機制無法應付數值超過 ssize_t 所能表示之最大值的情況。因此回歸 read 函式之用法，將所得之數放進 fib_read 函式的 buf 參數中。

原 fib_read 實做如下。

/* calculate the fibonacci number at given offset */
static ssize_t fib_read(struct file *file,
                        char *buf,
                        size_t size,
                        loff_t *offset)
{
    return (ssize_t) fib_sequence(*offset);
}

先修改 fib_read 。當 buf 的大小不足以放入 long long 整數時，回傳 -1 。

/* calculate the fibonacci number at given offset */
static ssize_t fib_read(struct file *file,
                        char *buf,
                        size_t size,
                        loff_t *offset)
{
    if (size >= sizeof(long long)) {
        *(long long *) buf = fib_sequence(*offset);
        return (ssize_t) size;
    }
    return -1;
}

再修改 client.c 關於 read 的部份。

 int main()
 {
     long long sz;
 
-    char buf[1];
+    char buf[sizeof(long long)];
     ...
     for (int i = 0; i <= offset; i++) {
         lseek(fd, i, SEEK_SET);
-        sz = read(fd, buf, 1);
+        sz = read(fd, buf, sizeof(buf));
         printf("Reading from " FIB_DEV
                " at offset %d, returned the sequence "
                "%lld.\n",
-               i, sz);
+               i, *(long long *) buf);
     }
 
     for (int i = offset; i >= 0; i--) {
         lseek(fd, i, SEEK_SET);
-        sz = read(fd, buf, 1);
+        sz = read(fd, buf, sizeof(buf));
         printf("Reading from " FIB_DEV
                " at offset %d, returned the sequence "
                "%lld.\n",
-               i, sz);
+               i, *(long long *) buf);
     }
 
     close(fd);
     return 0;
 }

實做大數

在此實做固定長度有號大數。新增 bignum.h ，定義 struct bignum 。

#define BN_ARRAY_SIZE 7

/* Little-endian */
struct bignum {
    uint32_t num[BN_ARRAY_SIZE];
    int32_t num_and_sign;
};

struct bignum 以 little endian 的方式儲存數值：最低位位於 num[0] ，最高位位於 num_and_sign 。在此選擇以 32 位元整數作為成員型態的原因是考慮在進行運算時可以 64 位元的變數儲存會溢位的數值。將最高位獨立為 int32_t 型態可避免在實做時過度轉型。此結構的數值範圍為

[- 2^{255}, 2^{255} - 1]

。

接下來實做基本大數運算於 bignum.c 。函式原型如下。

/* bignum = 0 */
void bignum_init(struct bignum *bignum);
void bignum_from_int(struct bignum *bignum, int64_t i);
void bignum_to_dec(const struct bignum *bignum, char *str, size_t size);

/* c = a + b */
void bignum_add(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* c = a - b */
void bignum_sub(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* c = a * b */
void bignum_mul(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* c = a / b */
void bignum_div(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* b = -a */
void bignum_neg(const struct bignum *a, struct bignum *b);
/* b = |a| */
void bignum_abs(const struct bignum *a, struct bignum *b);
/* b = a << 1 */
void bignum_shl1(const struct bignum *a, struct bignum *b, uint32_t lsb);
/* b = a >> 1 */
void bignum_shr1(const struct bignum *a, struct bignum *b, uint32_t msb);

實做後，需修改 Makefile 以將 bignum 導入模組及 client 中。

 ...
-TARGET_MODULE := fibdrv
+TARGET_MODULE := fibdrvmodule
 ...
+$(TARGET_MODULE)-objs := fibdrv.o bignum.o
 ...
-client: client.c
+client: client.c bignum.c
	$(CC) -o $@ $^

在編譯前，需了解到在 user mode 和 kernel mode 所引入的標頭檔不盡相同。可以藉由檢查 __KERNEL__ 巨集是否有被定義來決定所引入的標頭檔。如下。

#ifdef __KERNEL__
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#else
#include <stdio.h>
#include <string.h>
#endif

接著以 fast doubling 和大數實做費波那契數，最後稍稍修改 fib_read 及 client.c 。

static void fib_sequence(long long k, struct bignum *result)
{
    if (!result)
        return;
    if (k <= 1) {
        bignum_from_int(result, k);
        return;
    }
    struct bignum a, b;
    bignum_from_int(&a, 0);
    bignum_from_int(&b, 1);
    for (int mask = 1 << (sizeof(long long) * 8 - 1 - clz(k)); mask > 0;
         mask >>= 1) {
        struct bignum t;
        bignum_shl1(&b, &t, 0);
        bignum_sub(&t, &a, &t);
        bignum_mul(&t, &a, &t);

        bignum_mul(&b, &b, &b);
        bignum_mul(&a, &a, &a);
        bignum_add(&a, &b, &b);

        a = t;
        if (k & mask) {
            bignum_add(&a, &b, &t);
            a = b;
            b = t;
        }
    }
    *result = a;
}

最後執行 make check 發現無法通過測試，原因為 scripts/expected.txt 中大於 92 的費波那契數為錯誤的。修改後方可通過測試。

 Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120.
 Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309.
 Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429.
+Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738.
+Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167.
+Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905.
+Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072.
+Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977.
+Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049.
+Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026.
+Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075.
+Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075.
+Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026.
+Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049.
+Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977.
+Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072.
+Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905.
+Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167.
+Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738.
 Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429.
 Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309.
 Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120.

效能分析

根據作業要求，將可能會影響效能的因素排除。

# Disable ASLR
sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"

# Disable inline turbo
sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"

# Scaling govenor
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    sudo sh -c "echo performance > ${i}"
done

接著再次改寫程式碼，將多個需比較的內容寫入 driver 中，並以 enum 來決定所要執行的程式碼區段。

#define FIBDRV_MODE_SIZE 5
enum fibdrv_mode {
    FIBDRV_BIGNUM_FAST,
    FIBDRV_BIGNUM_ORIG,
    FIBDRV_LL_FAST,
    FIBDRV_LL_ORIG,
    FIBDRV_TIME
};

改寫 fib_write ，使之成為使用者改變 driver mode 的函式。

static ssize_t fib_write(struct file *file,
                         const char *buf,
                         size_t size,
                         loff_t *offset)
{
    if (size == sizeof(mode) && *(enum fibdrv_mode *) buf < FIBDRV_MODE_SIZE) {
        if (copy_from_user(&mode, buf, sizeof(mode)) != 0)
            return -1;
    } else
        mode = FIBDRV_BIGNUM_FAST;
    return 1;
}

接著在 driver 中安插計時相關程式碼，所得之時間單位為奈秒。

ktime_t kt = ktime_get();
// codes
duration = ktime_sub(ktime_get(), kt); // duration is a global var

並在 user mode 中也安插計時相關程式碼。

struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
// code
clock_gettime(CLOCK_MONOTONIC, &t1);
// to nanosecond
long long ut = (long long) (t1.tv_sec * 1e9 + t1.tv_nsec) -
                       (t0.tv_sec * 1e9 + t0.tv_nsec);

最後利用 gnuplot 將資料轉換為折線圖。

第一張圖為利用 bignum fast doubling 所得之數據。

第二張及第三張圖為根據不同的演算法所得之時間數據。

2022q1 Homework3 (fibdrv)

作業要求

自我檢查清單

作業要求

環境

開發紀錄

修正可變長度陣列（ Variable-Length Array, VLA ）

利用 fast doubling 實做

實做大數

效能分析

Read more

2022q1 Homework1 (lab0)

2022q1 Project (cpumemory)

2022q1 Homework3 (quiz3)

2022q1 Homework2 (quiz2)