--- tags: linux2022 --- # Quiz 2 contributed by <`linchi199yeh`> ## 測驗`1` [Average](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-1) ```c #include <stdint.h> // this implementation has risk of overflow uint32_t average(uint32_t a, uint32_t b) { return (a + b) / 2; } // to use this implementation we must know which integer is larger uint32_t average(uint32_t low, uint32_t high) { return low + (high - low) / 2; } ``` An improved version ```c #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a >> 1) + (b >> 1) + (EXP1); } ``` The right shift for both `a` and `b` we can regard it as integer division of `a` and `b`. So the last term we should check `a` and `b`'s parity we should only add 1 when both of them are odd. We can do that by just checking the last bit of both variables. EXP1 = `a & b & 1` We can further reduce the sum operation: ```c uint32_t average(uint32_t a, uint32_t b) { return (EXP2) + ((EXP3) >> 1); } ``` The implementation thought derives from a [adder](https://en.wikipedia.org/wiki/Adder_(electronics)) ![](https://i.imgur.com/MHgCJgt.png) Where `a ^ b` is the sum and `a & b` is the carry. We can formulate `a + b` as `((a & b) << 1) + (a ^ b)` To average both terms we divide the sum by 2 or shifting it to the right by 1 which yields: `((a & b) << 1 + (a ^ b)) >> 1` simplifying the term we get: `(a & b) + (a ^ b) >> 1` Therefore : EXP2 = `a & b` EXP3 = `a ^ b` ### Dive in Further :::success 比較上述實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀 ::: ==to get the assembly code run `gcc -S -O average.c -o average.s`== -S flag is to compile the c file into assembly as an output -O flag is to enable [optimization](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) Quick review on [assembly code](https://www.godevtool.com/GoasmHelp/newass.htm) **First version of average** :::spoiler C code ```c uint32_t average_v1(uint32_t a, uint32_t b) { return (a + b) / 2; } ``` ::: leal([load effective address](https://pdos.csail.mit.edu/6.828/2008/readings/i386/LEA.htm)) lea used as [adder](https://stackoverflow.com/questions/12869637/trouble-understanding-assembly-command-load-effective-address) ```assembly= average_v1: .LFB0: .cfi_startproc endbr64 leal (%rdi,%rsi), %eax shrl %eax ret .cfi_endproc ``` We can see that only 2 instructions(5 and 6) are required to calculate the average **Second version of average** :::spoiler C code ```c uint32_t average_v2(uint32_t a, uint32_t b) { return low + (high - low) / 2; } ``` ::: ```assembly= average_v2: .LFB1: .cfi_startproc endbr64 movl %esi, %eax subl %edi, %eax shrl %eax addl %edi, %eax ret ``` **Third version of average** :::spoiler C code ```c uint32_t average_v3(uint32_t a, uint32_t b) { return (a >> 1) + (b >> 1) + (a & b & 1); } ``` ::: ```assembly= average_v3: .LFB2: .cfi_startproc endbr64 movl %edi, %eax shrl %eax movl %esi, %edx shrl %edx addl %edx, %eax andl %esi, %edi andl $1, %edi addl %edi, %eax ret ``` **Last version of average** :::spoiler C code ```c uint32_t average_v3(uint32_t a, uint32_t b) { return ((a ^ b) + ((a & b ) >> 1)); } ``` ::: ```assembly= average_v4: .LFB3: .cfi_startproc endbr64 movl %edi, %eax andl %esi, %eax shrl %eax xorl %esi, %edi addl %edi, %eax ret ``` The last version takes 3 instructions less than the 3rd one. It's not so obvious, because in the c code we only use 2 operations less than the previous version. ### Linux average.h I've just captured the relevant part ```c= static inline void ewma_##name##_add(struct ewma_##name *e, \ unsigned long val) \ { \ unsigned long internal = READ_ONCE(e->internal); \ unsigned long weight_rcp = ilog2(_weight_rcp); \ unsigned long precision = _precision; \ \ BUILD_BUG_ON(!__builtin_constant_p(_precision)); \ BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp)); \ BUILD_BUG_ON((_precision) > 30); \ BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp); \ \ WRITE_ONCE(e->internal, internal ? \ (((internal << weight_rcp) - internal) + \ (val << precision)) >> weight_rcp : \ (val << precision)); \ } ``` The if in line 13 is in case there is no value it is just initialized, so return the newest value. Above code have specified that `weight_rcp` must be a power of 2 for efficiency. I spent a bit of time understandint lines `14-15`: The equation is $$ average = (average(1-\frac{1}{w}) + val(\frac{1}{w})) $$ and above implementation is better understood if putting it this way. $$ average = (average(\frac{w-1}{w}) + val(\frac{1}{w})) $$ The question is doesn't the value overflow when adding up? --- ## 測驗`2`[Max](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-2) ```c #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((EXP4) & -(EXP5)); } ``` EXP4 = `a ^ b` EXP5 = `a <= b` ### Thought process :::info Must check [XOR tricks](https://florian.github.io/xor-trick/) first. ::: - Given the hint that `EXP4` must contain a **boolean operator (&/|/^)** and `EXP5` must contain a **comparison operator** between `a` and `b` - And knowing that `x ^ 0 = x`, `x ^ x = 0`, and the commutativity law <br> `x ^ y = y ^ x` - Using the above mention laws `x ^ y ^ x = x ^ x ^ y = 0 ^ y = y` So considering the case that `a > b` is true: 1. `a ^ ((EXP4) & -(EXP5)) ` must return `a` 2. So we want the expression to result into `a ^ 0` 3. We can achieve that by simply letting `EXP5` to be ` a <= b` 4. This yields ` a ^ ((EXP4) & -(a <= b)` = `a ^ (EXP4 & -0)` = `a ^ 0` = `a` Now, considering that `a <= b` is true: 1. The expression reduces to: <br> `a ^ ((EXP4) & -(a <= b))` = `a ^ ((EXP4) & -1)` = `a ^ (EXP4)` <br>(`-1` is `0xfffffff` which is all ones in binary) 2. So `a ^ (EXP4)` must return `b` 3. EXP4 must be `a ^ b` this case ## 測驗`3` [GDC](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-3) ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; while (v) { uint64_t t = v; v = u % v; u = t; } return u; } ``` Below is also an implementation of GDC but without using the modulo operation ```c= #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (COND); return RET; } ``` COND = `v` RET = `u<<shift` ### Analisis First have a look at [GDC Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm) **worked example part** 1. Lines `6-8` calculates the GDC of multiples 2 it can be done faster using right shiftings. 2. After we got all the multiples of 2, we can now divide `u` and `v` until there are no multiples of 2 left.(lines `9-13`) 3. Then we perform the euclidean algorithm until `v` is 0. So COND must be `v` 4. And the return value is $$u\times2^{shift} $$ 5. `u<<shift` ### Implementation usint __builtin_ctz() ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift_v = __builtin_ctz(v); int shift_u = __builtin_ctz(u); int shift = min(u,v); u >>= shift_u, v >>= shift_v; do { if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v != 0); return u << shift; } ``` :::warning As in wikipedia pointed out this implementation without modulo would take many iterations if `v` is much greater than`u` or the other way around. e.g. When `v = gdc * n` and `u = gdc ` Then the algorithm would have to perform n times. I doubt how often values `v` and `u` differs by a lot and how fast would the modulo version run compared to this one. ::: ## 測驗`4` [Bitset](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-4) ```c #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = EXP6; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` we can get the position of the last non-zero bit by operating `x & -x` let` x = 01011100` so`-x = 10100100` `x & -x = 00000100` clearing out all the bits except the last non-zero bit EXP6 = `bitset & -bitset` ## 測驗`5`[Fraction to recurring decimal](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-5) ```c #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "list.h" struct rem_node { int key; int index; struct list_head link; }; static int find(struct list_head *heads, int size, int key) { struct rem_node *node; int hash = key % size; list_for_each_entry (node, &heads[hash], link) { if (key == node->key) return node->index; } return -1; } char *fractionToDecimal(int numerator, int denominator) { int size = 1024; char *result = malloc(size); char *p = result; if (denominator == 0) { result[0] = '\0'; return result; } if (numerator == 0) { result[0] = '0'; result[1] = '\0'; return result; } /* using long long type make sure there has no integer overflow */ long long n = numerator; long long d = denominator; /* deal with negtive cases */ if (n < 0) n = -n; if (d < 0) d = -d; bool sign = (float) numerator / denominator >= 0; if (!sign) *p++ = '-'; long long remainder = n % d; long long division = n / d; sprintf(p, "%ld", division > 0 ? (long) division : (long) -division); if (remainder == 0) return result; p = result + strlen(result); *p++ = '.'; /* Using a map to record all of reminders and their position. * if the reminder appeared before, which means the repeated loop begin, */ char *decimal = malloc(size); memset(decimal, 0, size); char *q = decimal; size = 1333; struct list_head *heads = malloc(size * sizeof(*heads)); for (int i = 0; i < size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; remainder; i++) { int pos = find(heads, size, remainder); if (pos >= 0) { while (PPP > 0) *p++ = *decimal++; *p++ = '('; while (*decimal != '\0') *p++ = *decimal++; *p++ = ')'; *p = '\0'; return result; } struct rem_node *node = malloc(sizeof(*node)); node->key = remainder; node->index = i; MMM(&node->link, EEE); *q++ = (remainder * 10) / d + '0'; remainder = (remainder * 10) % d; } strcpy(p, decimal); return result; } ``` ### Thought process `MMM(&node->link, EEE)` is when remainder has not appeared before so we need to add it into hash map. So `MMM` must be `list_add`, `EEE` address to the list and its the index where it's hashed `&heads[remainder % size]` ## 測驗`6`[__alignof__](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-6) ```c /* * ALIGNOF - get the alignment of a type * @t: the type to test * * This returns a safe alignment for the given type. */ #define ALIGNOF(t) \ ((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X) ``` M = `_h` X = `0` [related article](https://embetronicx.com/tutorials/p_language/c/understanding-of-container_of-macro-in-linux-kernel/) ### Thought process This seems like a similar implementation to offsetof and sizeof. - We can regard the function as a substraction of two variables casted as character pointer. - `(char *)() - (char *)()` - `(&((struct { char c; t _h; } *)0)->M)` is just a implementation of `offsetof` so `M` is the member of `type t` which is `_h` - Now we need to substract the offset out the character c that precedes it so `X` by a common choice is `0` ## 測驗`7`[FizzBuzz](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-7) ```c static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char **argv) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << KK1) << KK2; char fmt[9]; strncpy(fmt, &"FizzBuzz%u"[(9 >> div5) >> (KK3)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` ## Reference <`kevinshieh0225`> [quiz2](https://hackmd.io/@Masamaloka/linux2022-quiz2)