Try   HackMD

2022q1 Homework3 (fibdrv)

contributed by < oucs638 >

Assignment
oucs638/fibdrv

Development Environment

Compiler
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Hardware
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           158
Model name:                      Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Stepping:                        13
CPU MHz:                         2600.000
CPU max MHz:                     4500.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5199.98
Virtualization:                  VT-x
L1d cache:                       192 KiB
L1i cache:                       192 KiB
L2 cache:                        1.5 MiB
L3 cache:                        12 MiB
NUMA node0 CPU(s):               0-11

Linux kernel version
$ uname -r
5.13.0-30-generic
Install linux-headers package.
$ sudo apt install linux-headers-`uname -r`
$ dpkg -L linux-headers-5.13.0-30-generic | grep "/lib/modules"
/lib/modules
/lib/modules/5.13.0-30-generic
/lib/modules/5.13.0-30-generic/build
Confirm user identity
$ whoami
oucs638
$ sudo whoami
root
Install tools
$ sudo apt install util-linux strace gnuplot-nox

Exclude factors that affect performance analysis

  • Isolate a CPU from the kernel scheduler.
$ sudo vim /etc/default/grub
...
// modify GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX="isolcous=11"
  • Update grub and reboot.
$ sudo update-grub
$ reboot
  • Check whether the CPU was successfully isolated.
$ taskset -cp 1
pid 1's current affinity list: 0-10
  • Restrain address space layout randomization (ASLR)
$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"
  • Create a shell script named performance.sh to set scaling_governor as performance.
// performance.sh
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    echo performance > ${i}
done

// execute script
$ sudo sh performance.sh
  • For Intel processors, turn off turbo mode.
$ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"

Use class clz/ctz instructions to improve Fibonacci number operations.

  • Definition of Fibonacci number:
    {F0=0,F1=1Fn=Fn1+Fn2,n2
  • Fibonacci number can be expressed as:
    Fn=[FnFn1]=[1110][Fn1Fn2]
  • Implement in C code:
int F(int n)
{
    if (n == 0 || n == 1)
        return n;
    return F(n - 1) + F(n - 2);
}
  • Assuming n = 6, the result of the program executing the function calls are as follows:






G



1

F(6)



2

F(4)



1->2





3

F(5)



1->3





4

F(2)



2->4





5

F(3)



2->5





6

F(3)



3->6





7

F(4)



3->7





8

F(0)



4->8





9

F(1)



4->9





10

F(1)



5->10





11

F(2)



5->11





12

F(1)



6->12





13

F(2)



6->13





14

F(2)



7->14





15

F(3)



7->15





16

F(0)



11->16





17

F(1)



11->17





18

F(0)



13->18





19

F(1)



13->19





20

F(0)



14->20





21

F(1)



14->21





22

F(1)



15->22





23

F(2)



15->23





24

F(0)



23->24





25

F(1)



23->25





  • Obviously there are a lot of unnecessary repeated function calls.
  • Refer to Calculating Fibonacci Numbers by Fast Doubling, the Fibonacci number can be redefined as:
    {F0=0,F1=1,F2=1F2n+1=Fn+12+Fn2F2n=Fn(2Fn+1Fn)n0
  • Implement in C code:
int F(int n)
{
    if (n == 0 || n == 1)
        return n;
    if (n == 2)
        return 1;

    int k = 0;
    if (n % 2) {
        k = (n - 1) / 2;
        return F(k) * F(k) + F(k + 1) * F(k + 1);
    } else {
        k = n / 2;
        return F(k) * (2 * F(k + 1) - F(k));
    }
}
  • Assuming n = 6, the result of the program executing the function calls are as follows:






G



1

F(6)



2

F(3)



1->2





3

F(4)



1->3





4

F(1)



2->4





5

F(2)



2->5





6

F(2)



3->6





7

F(3)



3->7





8

F(1)



7->8





9

F(2)



7->9





  • The Fast Doubling method can significantly reduce thr function calls required for program execution.
  • Because computers represent numbers in binary, we can start from MSB to determine which operation to perform.
    • Find 0 : Calculate
      F2n
      and
      F2n+1
      .
    • Find 1 : Calculate
      F2n
      and
      F2n+1
      first, then calculate
      F2n+2
      .
  • But the actual number would not have lead 0s, so we can use class ctz instructions to count how many lead 0s and skip them.

Development Progress Records

  • Test.
$ make check
make -C /lib/modules/5.13.0-30-generic/build M=/home/oucs638/GitHub/fibdrv modules
make[1]: Entering directory '/usr/src/linux-headers-5.13.0-30-generic'
make[1]: Leaving directory '/usr/src/linux-headers-5.13.0-30-generic'
make unload
make[1]: Entering directory '/home/oucs638/GitHub/fibdrv'
sudo rmmod fibdrv || true >/dev/null
rmmod: ERROR: Module fibdrv is not currently loaded
make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv'
make load
make[1]: Entering directory '/home/oucs638/GitHub/fibdrv'
sudo insmod fibdrv.ko
make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv'
sudo ./client > out
make unload
make[1]: Entering directory '/home/oucs638/GitHub/fibdrv'
sudo rmmod fibdrv || true >/dev/null
make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv'
 Passed [-]
f(93) fail
input: 7540113804746346429
expected: 12200160415121876738

  • Observe the resulting fibdrv.ko module.
$ modinfo fibdrv.ko
filename:       fibdrv.ko
version:        0.1
description:    Fibonacci engine driver
author:         National Cheng Kung University, Taiwan
license:        Dual MIT/GPL
srcversion:     9A01E3671A116ADA9F2BB0A
depends:        
retpoline:      Y
name:           fibdrv
vermagic:       5.13.0-30-generic SMP mod_unload modversions 
  • Observe the behavior of the fibdrv.ko after the linux kernel is mounted.
$ sudo insmod fibdrv.ko
$ ls -l /dev/fibonacci
crw------- 1 root root 510, 0 Mar  7 01:50 /dev/fibonacci
$ cat /sys/class/fibonacci/fibonacci/dev
510:0
$ cat /sys/module/fibdrv/version
0.1
$ lsmod | grep fibdrv
fibdrv                 16384  0
$ cat /sys/module/fibdrv/refcnt
0

Calculate the Fibonacci number after the 93rd item (inclusive)

  • Refer to assignment, attempt to introduce big number that operate on string-number.
  • Use Small/Short String Optimization.
  • Refer to the example code provided in the assignment.
    • The example code only use string-number addition to calculate the Fibonacci number.
    • Expect to add a string-number subtraction and multiplication based on the addition-related code of the example code.
    • With string-number substraction and multiplication, fast doubling can be intoduced to calculate Fibonacci numbers.
    • But in this part, the Fibonacci number will be calculated by addtion first, and fast doubling will be introduced in the later part.
  • Because C99 variable-length array(VLA) is not allowed in Linux kernel, so I use kmalloc to allocate memory space for the array dynamically.
...
/home/oucs638/GitHub/fibdrv/fibdrv.c: In function ‘fib_sequence’:
/home/oucs638/GitHub/fibdrv/fibdrv.c:70:5: warning: ISO C90 forbids variable length array ‘f’ [-Wvla]
   70 |     bn f[k + 2];
      |     ^~
/home/oucs638/GitHub/fibdrv/fibdrv.c: In function ‘bn_add’:
/home/oucs638/GitHub/fibdrv/fibdrv.c:263:5: warning: ISO C90 forbids variable length array ‘buf’ [-Wvla]
  263 |     char buf[size_a + 2];
      |     ^~~~
...
  • After calculate the Fibonacci number, the allocated memory space should be freed.
static int fib_sequence(long long k, char __user *buf)
{
    /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
    // bn f[k + 2];
    bn *f = (bn *) kmalloc((k + 2) * sizeof(bn), GFP_KERNEL);

    f[0] = *bn_tmp("0");
    f[1] = *bn_tmp("1");

    for (int i = 2; i <= k; i++)
        bn_add(&f[i - 1], &f[i - 2], &f[i]);

    int res = bn_size(&f[k]);
    if (copy_to_user(buf, bn_data(&f[k]), res))
        return -EFAULT;

    for (int i = 0; i <= k; i++)
        bn_free(&f[i]);

    return res;

}
  • String-number addition will reverse the string first, then add digit one by one from the least significant digit to the most significant digit.
static void bn_add(bn *a, bn *b, bn *out)
{
    char *data_a, *data_b;
    size_t size_a, size_b;
    int i, s, c = 0;

    if (bn_size(a) < bn_size(b))
        __swap((void *) &a, (void *) &b, sizeof(void *));

    data_a = bn_data(a);
    data_b = bn_data(b);
    size_a = bn_size(a);
    size_b = bn_size(b);
    reverse_str(data_a, size_a);
    reverse_str(data_b, size_b);

    // char buf[size_a + 2];
    char *buf = (char *) kmalloc((size_a + 2), GFP_KERNEL);

    for (i = 0; i < size_b; i++) {
        s = (data_a[i] - '0') + (data_b[i] - '0') + c;
        buf[i] = '0' + s % 10;
        c = s / 10;
    }

    for (i = size_b; i < size_a; i++) {
        s = (data_a[i] - '0') + c;
        buf[i] = '0' + s % 10;
        c = s / 10;
    }

    if (c)
        buf[i++] = '0' + c;

    buf[i] = 0;
    reverse_str(buf, i);
    reverse_str(data_a, size_a);
    reverse_str(data_b, size_b);

    if (out)
        *out = *bn_tmp(buf);
}

Calculating Fibonacci Numbers by Fast Doubling (big number)

  • I found it challenging to overcome positive-negtive sign problem issues when implementing "fasting doubling" with string numbers.
  • It needs a lot of reallocating memory space to implement big-number operations with string number.
  • Refer to eecheng87, rebuilt the big number operation.
    • Preset a space large enough for Fibonacci number value.
  • As previous part, the Fibonacci number can be represented as:
    {F0=0,F1=1,F2=1F2n+1=Fn+12+Fn2F2n=Fn(2Fn+1Fn)n0
  • Expressed as functions:
    {f(0)=0=f(k)f(1)=1=f(k+1)f(2k)=f(k)[2f(k+1)f(k)]=2f(k)f(k+1)f(k)f(k)f(2k+1)=f(k)2+f(k+1)2=f(k)f(k)+f(k+1)f(k+1)k0
  • Assume:
    {t(0)=f(k)f(k)t(1)=f(k+1)f(k+1)f(2)=2f(k)f(k+1)
  • Because computers represent numbers in binary, we can start from MSB to determine which operation to perform.
    • Find 1 : next
      k
      is odd.
      {next    k=2k+1f(k)=f(2k+1)=t(0)+t(1)f(k+1)=f(2k+2)=f(2k)+f(2k+1)=t(2)t(0)+t(0)+t(1)=t(2)+t(1)
    • Find 0 : next
      k
      is even.
      {next    k=2kf(k)=f(2k)=t(2)t(0)f(k+1)=f(2k+1)=t(0)+t(1)
  • But the actual number would not have lead 0s, so we can use class ctz instructions to count how many lead 0s and skip them.
    • Use __builtin_clzll(k) for typeof(k) = unsigned long long
static bn fib_sequence(unsigned long long k)
{
    bn f[2];
    bn_init(&f[0], 0);  // f(k)
    bn_init(&f[1], 1);  // f(k + 1)

    if (k < 2)
        return f[k];

    // t(0) = f(k) * f(k)
    // t(1) = f(k + 1) * f(k + 1)
    // t(2) = 2 * f(k) * f(k + 1)
    bn t[3];

    // fast doubling
    // f(2k) = f(k) * [2 * f(k + 1) - f(k)]
    //       = 2 * f(k) * f(k + 1) - f(k) * f(k)
    //       = t(2) - t(0)
    // f(2k + 1) = f(k) ^ 2 + f(k + 1) ^ 2
    //           = f(k) * f(k) + f(k + 1) * f(k + 1)
    //           = t(0) + t(1)
    for (unsigned long long i = 1U << (63 - __builtin_clzll(k)); i; i >>= 1) {
        bn_mul(&f[0], &f[0], &t[0]);  // t(0)
        bn_mul(&f[1], &f[1], &t[1]);  // t(1)
        bn_mul(&f[0], &f[1], &t[2]);
        bn_add(&t[2], &t[2], &t[2]);  // t(2)
        if (i & k) {
            // next k = 2k + 1
            // f(k) = f(2k + 1) = t(0) + t(1)
            bn_add(&t[0], &t[1], &f[0]);
            // f(k + 1) = f(2k + 2) = f(2k) + f(2k + 1)
            //          = t(2) - t(0) + t(0) + t(1)
            //          = t(2) + t(1)
            bn_add(&t[2], &t[1], &f[1]);
        } else {
            // next k = 2k
            // f(k) = f(2k)
            bn_sub(&t[2], &t[0], &f[0]);
            // f(k + 1) = f(2k + 1)
            bn_add(&t[0], &t[1], &f[1]);
        }
    }
    // return f(k)
    return f[0];
}

Analyze

  • Use big number.
  • Use builtin int128.
  • Found that if used builtin int128, the execution time would have unpredictable peaks.