2020q3 Homework3 (quiz3)

# 2020q3 Homework3 (quiz3) contributed by < [`WeiCheng14159`](https://github.com/WeiCheng14159) > ###### tags: `sysprog2020` ## Test 1 ### Working Principle ```c=1 int asr_i(signed int m, unsigned int n) { const int logical = (((int) -1) >> 1) > 0; unsigned int fixu = -(logical & (m < 0)); int fix = *(int *) &fixu; return (m >> n) | (fix ^ (fix >> n)); } ``` * The `logical` variable means is * `1` if the platform does logical right shift. `(0xFFFFFFFF >> 1) => 0x7FFFFFFF` * `0` if it does arithmatic right shift. `(0xFFFFFFFF >> 1) => 0xFFFFFFFF` * If `m` is negative (MSB is 1) and `logical` is 1 (the platform does logical right shift) then * `fixu = -1`, which will be cast to `0xFFFFFFFF` because `fixu` is `unsigned int` * `fixu = 0` if otherwise * `fix` will cast `fixu` to `signed int` * `fix=-1` if `fixu=0xFFFFFFFF` * `fix=0` if `fixu=0` * In case of `m` is negative and the platform does logical right shift, the result `m>>n` will do bit-wise OR with `(fix ^ (fix >> n))` * If `fix=-1` then `(fix ^ (fix >> n)) = 0xFFFFFFFF ^ (0xFFFFFFFF >> n) = 0xFF...0` * If `fix=0` then `(fix ^ (fix >> n)) = 0` * The bit-wise OR operation with `(fix ^ (fix >> n))` will replace the leading zero bits of `m >> n` with ones. * | | Logical Right Shift | Arithmetic Right Shift | |-------------------|---------------------|------------------------| | m is **Positive** | fix=0 | fix=0 | | m is **Negative** | fix=-1 | fix=0 | * `OP1 = >> 1` * `OP2 = m < 0` ### More discussion 練習實作其他資料寬度的 ASR，可參照 C 語言：前置處理器應用篇撰寫通用的巨集以強化程式碼的共用程度; ## Test 2 ### Working Principle ```c=1 bool isPowerOfFour(int num) { return num > 0 && (num & (num - 1))==0 && !(__builtin_ctz(num) & 0x1); } ``` * The function should return true if the signed integer is the power of four. * e.g. `num` = {$4^0 = 1, 4^1=4, 4^2=16,...$} * If converted to binary: * 01 = 0000_0000_0000_0000_0000_0000_0000_0001 * 04 = 0000_0000_0000_0000_0000_0000_0000_0100 * 16 = 0000_0000_0000_0000_0000_0000_0001_0000 * This function should return true if **ALL** the conditions listed below are satisfied: * `num > 0` * `(num & (num - 1))==0` * If num is converted to binary, it should consist of exactly one `1` and thirty-one `0` * `!(__builtin_ctz(num) & 0x1)` * The number of trailing `0` bits should be **even**. * `OPQ = & 0x1` ### More discussion 改寫上述程式碼，提供等價功能的不同實作，儘量降低 branch 的數量; 練習 LeetCode 1009. Complement of Base 10 Integer 和 41. First Missing Positive，應善用 clz; 研讀 2017 年修課學生報告，理解 clz 的實作方式，並舉出 Exponential Golomb coding 的案例說明，需要有對應的 C 原始程式碼和測試報告; x-compressor 是個以 Golomb-Rice coding 為基礎的資料壓縮器，實作中也用到 clz/ctz 指令，可參見 Selecting the Golomb Parameter in Rice Coding。 ## Test 3 ### Working Principle ```c=1 int numberOfSteps (int num) { return num ? __builtin_popcount(num) + 31 - __builtin_clz(num) : 0; } ``` * In the LeetCode Problem "1342. Number of Steps to Reduce a Number to Zero", a number is divide by 2 if it's even, otherwise, subtract by 1 if it's odd. * Odd number: minus 1 * Even number: right shift 1 bit * Function `numberOfSteps` should return the number of steps to reduce an input to zero. * Imaging flipping a zero bit to one between two ones, the number of steps required to reduce it to zero will increase by 1. Thus, `AAA = __builtin_popcount(num)`. * e.g. 0000_1001 -> 0000_1101 * 31 - `__builtin_clz(num)` represents the steps required to right shift the number. * `AAA = __builtin_popcount(num)` * `BBB = __builtin_clz(num)` ### More discussion 改寫上述程式碼，提供等價功能的不同實作並解說; 提示: Bit Twiddling Hacks 中提及許多 bitwise operation 的使用，如 bit inverse、abs 等等，是極好的參考材料。避免用到編譯器的擴充功能，只用 C99 特徵及 bitwise 運算改寫 LeetCode 1342. Number of Steps to Reduce a Number to Zero，實作出 branchless 解法，儘量縮減程式碼行數和分支數量; ## Test 4 ### Working Principle ```c=1 #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); return u << shift; } ``` ```c=3 if (!u || !v) return u | v; ``` In case `u` or `v` equals to zero, return the bit-wise OR result. Which is the number not equal to zero. ```c=4 int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } ``` `shift` variable stores the number 2's as common factor between `u` and `v` ```c=8 while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); ``` In case one number is even and another is odd, the common factor between the two will be odd. Therefore, we could further safely remove the 2 factor from the even number when computing GCD. Next, we practice the classic Euclidean Algorithm to find the GCD between `u` and `v` * `XXX = v` * `YYY = u << shift` ### More discussion 在 x86_64 上透過 __builtin_ctz 改寫 GCD，分析對效能的提升; ## Test 5 ### Working Principle Given an array `bitmap` of size `bitmapsize`. The function below compress a bitmap array by **bitmap index**. The index of a bit whose value is `1` will be stored in the array `out`. ```c=1 #include <stddef.h> size_t naive(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; for (size_t k = 0; k < bitmapsize; ++k) { uint64_t bitset = bitmap[k]; size_t p = k * 64; for (int i = 0; i < 64; i++) { if ((bitset >> i) & 0x1) out[pos++] = p + i; } } return pos; } ``` * `t = (bitset & -bitset)` generates a bit mask that marks the least significant non-zero bit in `bitset`. * i.e. a = `001`, -a = `111`, a & (-a) = `001` * i.e. a = `010`, -a = `110`, a & (-a) = `010` * i.e. a = `000`, -a = `000`, a & (-a) = `000` * `r` is the number of tailing zeros in `bitset`, with `k` and `r` we can store the index in array `out`. * `bitset ^= t;` then flip the least significant non-zero bit from `1` to `0`. The loop continues until all `1` in bitset are flipped. ```c=1 #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = (bitset & -bitset); int r = __builtin_ctz(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` ### More discussion 設計實驗，檢驗 clz 改寫的程式碼相較原本的實作有多少改進？應考慮到不同的 bitmap density; 思考進一步的改進空間;