Try  HackMD Logo HackMD

2022q1 Homework3 (fibdrv)

contributed by <huang-me>

排除干擾效能分析的因素

  • 抑制 address space layout randomization (ASLR)

    ​​​​$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"
    
  • 設定 scaling_governor 為 performance。準備以下 shell script,檔名為 performance.sh:

    ​​​​for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    ​​​​do
    ​​​​    echo performance > ${i}
    ​​​​done
    

    之後再用 $ sudo sh performance.sh 執行

  • 針對 Intel 處理器,關閉 turbo mode

    ​​​​$ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
    

Fix variable-length array in fib_sequence

static long long fib_sequence(long long k)
{
    /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
    long long f[2] = {0, 1};

    for (int i = 2; i <= k; i++) {
        f[1] += f[0];
        f[0] = f[1] - f[0];
    }

    return k==0 ? 0 : f[1];
}
  • 因為永遠只需要前兩項的值,因此其實不需要這麼多記憶體位置記錄所有之前計算的值。

fast doubling

利用 F(n) 以及 F(n+1) 直接計算 F(2n) 之值。

F(2k)=F(k)[2F(k+1)F(k)]F(2k+1)=F(k+1)2+F(k)2

static long long fib_sequence(long long k)
{
    if (k < 2) /* F(0) = 0, F(1) = 1 */
        return k;

    long long f[2] = {0, 1};

    for (unsigned int i = 1U << 31; i; i >>= 1) {
        /* F(2k) = F(k) * [ 2 * F(k+1) – F(k) ] */
        long long k1 = f[0] * (f[1] * 2 - f[0]);
        /* F(2k+1) = F(k)^2 + F(k+1)^2 */
        long long k2 = f[0] * f[0] + f[1] * f[1];
        if (k & i) {
            f[0] = k2;      /* F(2k+1) */
            f[1] = k1 + k2; /* F(2k+2) */
        } else {
            f[0] = k1; /* F(2k) */
            f[1] = k2; /* F(2k+1) */
        }
    }
    return f[0];
}
  • 原本使用 iterative 的方式計算 Fibonacci number 的時間複雜度爲 O(n) 的時間複雜度。
  • 使用 fast doubling 因為無論 n 的大小都只需要做計算32次,因此只需要 O(1) 的時間複雜度。

效能分析

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

滿足前面推論的 O(n) 以及 O(1),不過因為需要 shift 太多次,所以在 F(40) 以前並沒有任何的優勢。

fast doubling with clz

static long long fib_fast_clz(long long k)
{
    if (k < 2) /* F(0) = 0, F(1) = 1 */
        return k;

    long long f[2] = {0, 1};

    for (unsigned int i = 1U << (32 - __builtin_clz(k)); i; i >>= 1) {
        /* F(2k) = F(k) * [ 2 * F(k+1) – F(k) ] */
        long long k1 = f[0] * (f[1] * 2 - f[0]);
        /* F(2k+1) = F(k)^2 + F(k+1)^2 */
        long long k2 = f[0] * f[0] + f[1] * f[1];
        if (k & i) {
            f[0] = k2;      /* F(2k+1) */
            f[1] = k1 + k2; /* F(2k+2) */
        } else {
            f[0] = k1; /* F(2k) */
            f[1] = k2; /* F(2k+1) */
        }
    }

    return f[0];
}

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

因為大幅減少了前面無意義的 shift,也大幅的減少了計算所需的時間。