Try   HackMD

2022q1 Homework3 (fibdrv)

contributed by < 2020leon >

作業要求

自我檢查清單

  • 研讀上述 Linux 效能分析的提示 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作
    從中也該理解為何不希望在虛擬機器中進行實驗;
  • 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 clz / ctz 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
  • 複習 C 語言 數值系統bitwise operation,思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
  • 研讀 KYG-yaya573142 的報告,指出針對大數運算,有哪些加速運算和縮減記憶體操作成本的舉措?
  • lsmod 的輸出結果有一欄名為 Used by,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 (reference counting) 在 Linux 核心如何追蹤呢?

    搭配閱讀 The Linux driver implementer’s API guide » Driver Basics

  • 注意到 fibdrv.c 存在著 DEFINE_MUTEX, mutex_trylock, mutex_init, mutex_unlock, mutex_destroy 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 POSIX Thread 的程式碼來確認。

作業要求

  • 回答上述自我檢查清單的所有問題,需要附上對應的參考資料和必要的程式碼,以第一手材料 (包含自己設計的實驗) 為佳

    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    如果你在 2022 年 3 月 1 日前,已從 GitHub sysprog21/fibdrv 進行 fork,請依據 Alternatives to forking into the same account 一文,對舊的 repository 做對應處置,然後重新 fork

  • 在 GitHub 上 fork fibdrv,目標是修正 Fibonacci 數計算的正確性 (現有實作存在缺陷,請指出),隨後改善 fibdrv 計算 Fibinacci 數列的執行效率,過程中需要量化執行時間,可在 Linux 核心模組和使用層級去測量
    • 原本的程式碼只能列出到
      Fibonacci(100)
      而且部分輸出是錯誤的數值,請修改程式碼,列出後方更多數值 (注意: 檢查正確性和數值系統的使用)
    • 務必研讀上述 Linux 效能分析的提示 的描述,降低效能分析過程中的干擾;
    • 在 Linux 核心模組中,可用 ktime 系列的 API;
    • 在 userspace 可用 clock_gettime 相關 API;
    • 善用統計模型,除去極端數值,過程中應詳述你的手法
    • 分別用 gnuplot 製圖,分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷,單位需要用 us 或 ns (自行斟酌)
    • 嘗試解讀上述實驗的時間分佈,特別是隨著 Fibonacci 數列增長後,對於 Linux 核心的影響。可修改 VFS 函式,允許透過寫入 /dev/fibonacci 裝置檔案來變更不同的 Fibonacci 求值的實作程式碼。
      Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
  • 用 printk 固然方便,但其執行時間易受各種因素而不穩定,除了透過統計來觀察,也可改用核心的 sysfs 介面,後者可讓我們自行宣告 show 和 store 介面來提供讀寫操作,可參見 Sample kobject implementation (注意: 切換到對應的 Linux 核心版本)。
  • 逐步最佳化 Fibonacci 的執行時間,引入費氏數列分析 提到的策略,並善用 clz / ffs 一類的指令 (Linux 核心有對應的包裝函式),過程中都要充分量化
  • 嘗試研讀 bignum (fibdrv 分支) 的實作,理解其中的技巧並與你的實作進行效能比較,探討個中值得取鏡之處。
  • Fib(n)
    隨著
    n
    越來越大時,由 Linux 核心傳遞字串形式的十進位就成為效能瓶頸,研讀〈Integer Encoding Algorithm 筆記〉所提及的策略,在 Linux 核心模組先對數值進行編碼和壓縮,以二進位方式傳遞到使用者層級後,再由應用程式予以顯示
    Fib(n)
    數值
    • 儘量降低由核心傳遞到應用程式的資料量
    • 確保應用程式不需要過多的計算量才可重現 Fibonacci 數列

環境

$ uname -a
Linux leon-ubuntu-20 5.13.0-35-generic #40~20.04.1-Ubuntu SMP Mon Mar 7 09:18:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version     
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lscpu           
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   36 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           58
Model name:                      Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
Stepping:                        9
CPU MHz:                         1700.000
CPU max MHz:                     3800.0000
CPU min MHz:                     1600.0000
BogoMIPS:                        6784.21
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-3

開發紀錄

修正可變長度陣列( Variable-Length Array, VLA )

原程式碼如下。

static long long fib_sequence(long long k)
{
    /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
    long long f[k + 2];

    f[0] = 0;
    f[1] = 1;

    for (int i = 2; i <= k; i++) {
        f[i] = f[i - 1] + f[i - 2];
    }

    return f[k];
}

其中宣告處可見 long long f[k + 2] ,其雖合乎 C99 標準,但其實有更好的實做可以避免 VLA 。修正後如下。

static long long fib_sequence(long long k)
{
    long long a = 0, b = 1;
    if (k <= 1)
        return k;
    for (long long i = k; i > 1; i--) {
        k = a + b;
        a = b;
        b = k;
    }
    return k;
}

利用 fast doubling 實做

直接以費波那契數的定義實做,時間複雜度為

O(n) ,若改以 fast doubling 的方式則可使時間複雜度降為
O(logn)

根據作業要求所給出的虛擬碼實做 fast doubling ,首次嘗試如下。

#define clz(x) __builtin_clzll(x)

static long long fib_sequence(long long k)
{
    long long a = 0, b = 1;
    int k_clz = clz(k);

    if (k <= 1)
        return k;
    k <<= k_clz;
    for (int i = sizeof(long long) * 8; i > k_clz; i--, k <<= 1) {
        long long t1 = a * (2 * b - a);
        long long t2 = b * b + a * a;
        a = t1;
        b = t2;
        if (k & (long long) ~(~0ULL >> 1)) {
            t1 = a + b;
            a = b;
            b = t1;
        }
    }
    return a;
}

在以上嘗試中,將 k 最高位的 1 對齊 MSB ,並逐次將 k 左移偵測最高位以實做 fast doubling 。

後來發現更為簡化的實做方式。

#define clz(x) __builtin_clzll(x)

static long long fib_sequence(long long k)
{
    if (k <= 1)
        return k;
    long long a = 0, b = 1;
    for (int mask = 1 << (sizeof(long long) * 8 - 1 - clz(k)); mask > 0;
         mask >>= 1) {
        long long t1 = a * (2 * b - a);
        b = b * b + a * a;
        a = t1;
        if (k & mask) {
            t1 = a + b;
            a = b;
            b = t1;
        }
    }
    return a;
}

以上實做中,不再將 k 進行位移。反之,以 mask 變數作為遮罩,逐次將 mask 右移以對 k 進行檢查。

最後則是為未來大數運算做準備。由於目前的程式碼在 fib_read 函式中直接回傳所算得之費波那契數,然而此機制無法應付數值超過 ssize_t 所能表示之最大值的情況。因此回歸 read 函式之用法,將所得之數放進 fib_read 函式的 buf 參數中。

fib_read 實做如下。

/* calculate the fibonacci number at given offset */
static ssize_t fib_read(struct file *file,
                        char *buf,
                        size_t size,
                        loff_t *offset)
{
    return (ssize_t) fib_sequence(*offset);
}

先修改 fib_read 。當 buf 的大小不足以放入 long long 整數時,回傳 -1

/* calculate the fibonacci number at given offset */
static ssize_t fib_read(struct file *file,
                        char *buf,
                        size_t size,
                        loff_t *offset)
{
    if (size >= sizeof(long long)) {
        *(long long *) buf = fib_sequence(*offset);
        return (ssize_t) size;
    }
    return -1;
}

再修改 client.c 關於 read 的部份。

 int main()
 {
     long long sz;
 
-    char buf[1];
+    char buf[sizeof(long long)];
     ...
     for (int i = 0; i <= offset; i++) {
         lseek(fd, i, SEEK_SET);
-        sz = read(fd, buf, 1);
+        sz = read(fd, buf, sizeof(buf));
         printf("Reading from " FIB_DEV
                " at offset %d, returned the sequence "
                "%lld.\n",
-               i, sz);
+               i, *(long long *) buf);
     }
 
     for (int i = offset; i >= 0; i--) {
         lseek(fd, i, SEEK_SET);
-        sz = read(fd, buf, 1);
+        sz = read(fd, buf, sizeof(buf));
         printf("Reading from " FIB_DEV
                " at offset %d, returned the sequence "
                "%lld.\n",
-               i, sz);
+               i, *(long long *) buf);
     }
 
     close(fd);
     return 0;
 }

實做大數

在此實做固定長度有號大數。新增 bignum.h ,定義 struct bignum

#define BN_ARRAY_SIZE 7

/* Little-endian */
struct bignum {
    uint32_t num[BN_ARRAY_SIZE];
    int32_t num_and_sign;
};

struct bignum 以 little endian 的方式儲存數值:最低位位於 num[0] ,最高位位於 num_and_sign 。在此選擇以 32 位元整數作為成員型態的原因是考慮在進行運算時可以 64 位元的變數儲存會溢位的數值。將最高位獨立為 int32_t 型態可避免在實做時過度轉型。此結構的數值範圍為

[2255,22551]

接下來實做基本大數運算於 bignum.c 。函式原型如下。

/* bignum = 0 */
void bignum_init(struct bignum *bignum);
void bignum_from_int(struct bignum *bignum, int64_t i);
void bignum_to_dec(const struct bignum *bignum, char *str, size_t size);

/* c = a + b */
void bignum_add(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* c = a - b */
void bignum_sub(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* c = a * b */
void bignum_mul(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* c = a / b */
void bignum_div(const struct bignum *a,
                const struct bignum *b,
                struct bignum *c);
/* b = -a */
void bignum_neg(const struct bignum *a, struct bignum *b);
/* b = |a| */
void bignum_abs(const struct bignum *a, struct bignum *b);
/* b = a << 1 */
void bignum_shl1(const struct bignum *a, struct bignum *b, uint32_t lsb);
/* b = a >> 1 */
void bignum_shr1(const struct bignum *a, struct bignum *b, uint32_t msb);

實做後,需修改 Makefile 以將 bignum 導入模組及 client 中。

 ...
-TARGET_MODULE := fibdrv
+TARGET_MODULE := fibdrvmodule
 ...
+$(TARGET_MODULE)-objs := fibdrv.o bignum.o
 ...
-client: client.c
+client: client.c bignum.c
	$(CC) -o $@ $^

在編譯前,需了解到在 user mode 和 kernel mode 所引入的標頭檔不盡相同。可以藉由檢查 __KERNEL__ 巨集是否有被定義來決定所引入的標頭檔。如下。

#ifdef __KERNEL__
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#else
#include <stdio.h>
#include <string.h>
#endif

接著以 fast doubling 和大數實做費波那契數,最後稍稍修改 fib_readclient.c

static void fib_sequence(long long k, struct bignum *result)
{
    if (!result)
        return;
    if (k <= 1) {
        bignum_from_int(result, k);
        return;
    }
    struct bignum a, b;
    bignum_from_int(&a, 0);
    bignum_from_int(&b, 1);
    for (int mask = 1 << (sizeof(long long) * 8 - 1 - clz(k)); mask > 0;
         mask >>= 1) {
        struct bignum t;
        bignum_shl1(&b, &t, 0);
        bignum_sub(&t, &a, &t);
        bignum_mul(&t, &a, &t);

        bignum_mul(&b, &b, &b);
        bignum_mul(&a, &a, &a);
        bignum_add(&a, &b, &b);

        a = t;
        if (k & mask) {
            bignum_add(&a, &b, &t);
            a = b;
            b = t;
        }
    }
    *result = a;
}

最後執行 make check 發現無法通過測試,原因為 scripts/expected.txt 中大於 92 的費波那契數為錯誤的。修改後方可通過測試。

 Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120.
 Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309.
 Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429.
-Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429.
+Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738.
+Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167.
+Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905.
+Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072.
+Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977.
+Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049.
+Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026.
+Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075.
+Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075.
+Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026.
+Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049.
+Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977.
+Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072.
+Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905.
+Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167.
+Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738.
 Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429.
 Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309.
 Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120.

效能分析

根據作業要求,將可能會影響效能的因素排除。

# Disable ASLR
sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"

# Disable inline turbo
sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"

# Scaling govenor
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    sudo sh -c "echo performance > ${i}"
done

接著再次改寫程式碼,將多個需比較的內容寫入 driver 中,並以 enum 來決定所要執行的程式碼區段。

#define FIBDRV_MODE_SIZE 5
enum fibdrv_mode {
    FIBDRV_BIGNUM_FAST,
    FIBDRV_BIGNUM_ORIG,
    FIBDRV_LL_FAST,
    FIBDRV_LL_ORIG,
    FIBDRV_TIME
};

改寫 fib_write ,使之成為使用者改變 driver mode 的函式。

static ssize_t fib_write(struct file *file,
                         const char *buf,
                         size_t size,
                         loff_t *offset)
{
    if (size == sizeof(mode) && *(enum fibdrv_mode *) buf < FIBDRV_MODE_SIZE) {
        if (copy_from_user(&mode, buf, sizeof(mode)) != 0)
            return -1;
    } else
        mode = FIBDRV_BIGNUM_FAST;
    return 1;
}

接著在 driver 中安插計時相關程式碼,所得之時間單位為奈秒。

ktime_t kt = ktime_get();
// codes
duration = ktime_sub(ktime_get(), kt); // duration is a global var

並在 user mode 中也安插計時相關程式碼。

struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
// code
clock_gettime(CLOCK_MONOTONIC, &t1);
// to nanosecond
long long ut = (long long) (t1.tv_sec * 1e9 + t1.tv_nsec) -
                       (t0.tv_sec * 1e9 + t0.tv_nsec);

最後利用 gnuplot 將資料轉換為折線圖。

第一張圖為利用 bignum fast doubling 所得之數據。

第二張及第三張圖為根據不同的演算法所得之時間數據。