# 2020q3 Homework3 (quiz3)
contributed by < [`WeiCheng14159`](https://github.com/WeiCheng14159) >
###### tags: `sysprog2020`
## Test 1
### Working Principle
```c=1
int asr_i(signed int m, unsigned int n)
{
const int logical = (((int) -1) >> 1) > 0;
unsigned int fixu = -(logical & (m < 0));
int fix = *(int *) &fixu;
return (m >> n) | (fix ^ (fix >> n));
}
```
* The `logical` variable means is
* `1` if the platform does logical right shift. `(0xFFFFFFFF >> 1) => 0x7FFFFFFF`
* `0` if it does arithmatic right shift. `(0xFFFFFFFF >> 1) => 0xFFFFFFFF`
* If `m` is negative (MSB is 1) and `logical` is 1 (the platform does logical right shift) then
* `fixu = -1`, which will be cast to `0xFFFFFFFF` because `fixu` is `unsigned int`
* `fixu = 0` if otherwise
* `fix` will cast `fixu` to `signed int`
* `fix=-1` if `fixu=0xFFFFFFFF`
* `fix=0` if `fixu=0`
* In case of `m` is negative and the platform does logical right shift, the result `m>>n` will do bit-wise OR with `(fix ^ (fix >> n))`
* If `fix=-1` then `(fix ^ (fix >> n)) = 0xFFFFFFFF ^ (0xFFFFFFFF >> n) = 0xFF...0`
* If `fix=0` then `(fix ^ (fix >> n)) = 0`
* The bit-wise OR operation with `(fix ^ (fix >> n))` will replace the leading zero bits of `m >> n` with ones.
* | | Logical Right Shift | Arithmetic Right Shift |
|-------------------|---------------------|------------------------|
| m is **Positive** | fix=0 | fix=0 |
| m is **Negative** | fix=-1 | fix=0 |
* `OP1 = >> 1`
* `OP2 = m < 0`
### More discussion
練習實作其他資料寬度的 ASR,可參照 C 語言:前置處理器應用篇 撰寫通用的巨集以強化程式碼的共用程度;
## Test 2
### Working Principle
```c=1
bool isPowerOfFour(int num)
{
return num > 0 && (num & (num - 1))==0 &&
!(__builtin_ctz(num) & 0x1);
}
```
* The function should return true if the signed integer is the power of four.
* e.g. `num` = {$4^0 = 1, 4^1=4, 4^2=16,...$}
* If converted to binary:
* 01 = 0000_0000_0000_0000_0000_0000_0000_0001
* 04 = 0000_0000_0000_0000_0000_0000_0000_0100
* 16 = 0000_0000_0000_0000_0000_0000_0001_0000
* This function should return true if **ALL** the conditions listed below are satisfied:
* `num > 0`
* `(num & (num - 1))==0`
* If num is converted to binary, it should consist of exactly one `1` and thirty-one `0`
* `!(__builtin_ctz(num) & 0x1)`
* The number of trailing `0` bits should be **even**.
* `OPQ = & 0x1`
### More discussion
改寫上述程式碼,提供等價功能的不同實作,儘量降低 branch 的數量;
練習 LeetCode 1009. Complement of Base 10 Integer 和 41. First Missing Positive,應善用 clz;
研讀 2017 年修課學生報告,理解 clz 的實作方式,並舉出 Exponential Golomb coding 的案例說明,需要有對應的 C 原始程式碼和測試報告;
x-compressor 是個以 Golomb-Rice coding 為基礎的資料壓縮器,實作中也用到 clz/ctz 指令,可參見 Selecting the Golomb Parameter in Rice Coding。
## Test 3
### Working Principle
```c=1
int numberOfSteps (int num)
{
return num ? __builtin_popcount(num) + 31 - __builtin_clz(num) : 0;
}
```
* In the LeetCode Problem "1342. Number of Steps to Reduce a Number to Zero", a number is divide by 2 if it's even, otherwise, subtract by 1 if it's odd.
* Odd number: minus 1
* Even number: right shift 1 bit
* Function `numberOfSteps` should return the number of steps to reduce an input to zero.
* Imaging flipping a zero bit to one between two ones, the number of steps required to reduce it to zero will increase by 1. Thus, `AAA = __builtin_popcount(num)`.
* e.g. 0000_1001 -> 0000_1101
* 31 - `__builtin_clz(num)` represents the steps required to right shift the number.
* `AAA = __builtin_popcount(num)`
* `BBB = __builtin_clz(num)`
### More discussion
改寫上述程式碼,提供等價功能的不同實作並解說;
提示: Bit Twiddling Hacks 中提及許多 bitwise operation 的使用,如 bit inverse、abs 等等,是極好的參考材料。
避免用到編譯器的擴充功能,只用 C99 特徵及 bitwise 運算改寫 LeetCode 1342. Number of Steps to Reduce a Number to Zero,實作出 branchless 解法,儘量縮減程式碼行數和分支數量;
## Test 4
### Working Principle
```c=1
#include <stdint.h>
uint64_t gcd64(uint64_t u, uint64_t v) {
if (!u || !v) return u | v;
int shift;
for (shift = 0; !((u | v) & 1); shift++) {
u /= 2, v /= 2;
}
while (!(u & 1))
u /= 2;
do {
while (!(v & 1))
v /= 2;
if (u < v) {
v -= u;
} else {
uint64_t t = u - v;
u = v;
v = t;
}
} while (v);
return u << shift;
}
```
```c=3
if (!u || !v) return u | v;
```
In case `u` or `v` equals to zero, return the bit-wise OR result. Which is the number not equal to zero.
```c=4
int shift;
for (shift = 0; !((u | v) & 1); shift++) {
u /= 2, v /= 2;
}
```
`shift` variable stores the number 2's as common factor between `u` and `v`
```c=8
while (!(u & 1))
u /= 2;
do {
while (!(v & 1))
v /= 2;
if (u < v) {
v -= u;
} else {
uint64_t t = u - v;
u = v;
v = t;
}
} while (v);
```
In case one number is even and another is odd, the common factor between the two will be odd. Therefore, we could further safely remove the 2 factor from the even number when computing GCD.
Next, we practice the classic Euclidean Algorithm to find the GCD between `u` and `v`
* `XXX = v`
* `YYY = u << shift`
### More discussion
在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升;
## Test 5
### Working Principle
Given an array `bitmap` of size `bitmapsize`. The function below compress a bitmap array by **bitmap index**. The index of a bit whose value is `1` will be stored in the array `out`.
```c=1
#include <stddef.h>
size_t naive(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
size_t pos = 0;
for (size_t k = 0; k < bitmapsize; ++k) {
uint64_t bitset = bitmap[k];
size_t p = k * 64;
for (int i = 0; i < 64; i++) {
if ((bitset >> i) & 0x1)
out[pos++] = p + i;
}
}
return pos;
}
```
* `t = (bitset & -bitset)` generates a bit mask that marks the least significant non-zero bit in `bitset`.
* i.e. a = `001`, -a = `111`, a & (-a) = `001`
* i.e. a = `010`, -a = `110`, a & (-a) = `010`
* i.e. a = `000`, -a = `000`, a & (-a) = `000`
* `r` is the number of tailing zeros in `bitset`, with `k` and `r` we can store the index in array `out`.
* `bitset ^= t;` then flip the least significant non-zero bit from `1` to `0`. The loop continues until all `1` in bitset are flipped.
```c=1
#include <stddef.h>
size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
size_t pos = 0;
uint64_t bitset;
for (size_t k = 0; k < bitmapsize; ++k) {
bitset = bitmap[k];
while (bitset != 0) {
uint64_t t = (bitset & -bitset);
int r = __builtin_ctz(bitset);
out[pos++] = k * 64 + r;
bitset ^= t;
}
}
return pos;
}
```
### More discussion
設計實驗,檢驗 clz 改寫的程式碼相較原本的實作有多少改進?應考慮到不同的 bitmap density;
思考進一步的改進空間;