owned this note
owned this note
Published
Linked with GitHub
---
tags: linux2022
---
# Quiz 2
contributed by <`linchi199yeh`>
## 測驗`1` [Average](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-1)
```c
#include <stdint.h>
// this implementation has risk of overflow
uint32_t average(uint32_t a, uint32_t b)
{
return (a + b) / 2;
}
// to use this implementation we must know which integer is larger
uint32_t average(uint32_t low, uint32_t high)
{
return low + (high - low) / 2;
}
```
An improved version
```c
#include <stdint.h>
uint32_t average(uint32_t a, uint32_t b)
{
return (a >> 1) + (b >> 1) + (EXP1);
}
```
The right shift for both `a` and `b` we can regard it as integer division of `a` and `b`.
So the last term we should check `a` and `b`'s parity we should only add 1 when both of them are odd.
We can do that by just checking the last bit of both variables.
EXP1 = `a & b & 1`
We can further reduce the sum operation:
```c
uint32_t average(uint32_t a, uint32_t b)
{
return (EXP2) + ((EXP3) >> 1);
}
```
The implementation thought derives from a [adder](https://en.wikipedia.org/wiki/Adder_(electronics))
![](https://i.imgur.com/MHgCJgt.png)
Where `a ^ b` is the sum and `a & b` is the carry.
We can formulate `a + b` as `((a & b) << 1) + (a ^ b)`
To average both terms we divide the sum by 2 or shifting it to the right by 1 which yields:
`((a & b) << 1 + (a ^ b)) >> 1`
simplifying the term we get:
`(a & b) + (a ^ b) >> 1`
Therefore :
EXP2 = `a & b`
EXP3 = `a ^ b`
### Dive in Further
:::success
比較上述實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀
:::
==to get the assembly code run `gcc -S -O average.c -o average.s`==
-S flag is to compile the c file into assembly as an output
-O flag is to enable [optimization](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)
Quick review on [assembly code](https://www.godevtool.com/GoasmHelp/newass.htm)
**First version of average**
:::spoiler C code
```c
uint32_t average_v1(uint32_t a, uint32_t b)
{
return (a + b) / 2;
}
```
:::
leal([load effective address](https://pdos.csail.mit.edu/6.828/2008/readings/i386/LEA.htm))
lea used as [adder](https://stackoverflow.com/questions/12869637/trouble-understanding-assembly-command-load-effective-address)
```assembly=
average_v1:
.LFB0:
.cfi_startproc
endbr64
leal (%rdi,%rsi), %eax
shrl %eax
ret
.cfi_endproc
```
We can see that only 2 instructions(5 and 6) are required to calculate the average
**Second version of average**
:::spoiler C code
```c
uint32_t average_v2(uint32_t a, uint32_t b)
{
return low + (high - low) / 2;
}
```
:::
```assembly=
average_v2:
.LFB1:
.cfi_startproc
endbr64
movl %esi, %eax
subl %edi, %eax
shrl %eax
addl %edi, %eax
ret
```
**Third version of average**
:::spoiler C code
```c
uint32_t average_v3(uint32_t a, uint32_t b)
{
return (a >> 1) + (b >> 1) + (a & b & 1);
}
```
:::
```assembly=
average_v3:
.LFB2:
.cfi_startproc
endbr64
movl %edi, %eax
shrl %eax
movl %esi, %edx
shrl %edx
addl %edx, %eax
andl %esi, %edi
andl $1, %edi
addl %edi, %eax
ret
```
**Last version of average**
:::spoiler C code
```c
uint32_t average_v3(uint32_t a, uint32_t b)
{
return ((a ^ b) + ((a & b ) >> 1));
}
```
:::
```assembly=
average_v4:
.LFB3:
.cfi_startproc
endbr64
movl %edi, %eax
andl %esi, %eax
shrl %eax
xorl %esi, %edi
addl %edi, %eax
ret
```
The last version takes 3 instructions less than the 3rd one. It's not so obvious, because in the c code we only use 2 operations less than the previous version.
### Linux average.h
I've just captured the relevant part
```c=
static inline void ewma_##name##_add(struct ewma_##name *e, \
unsigned long val) \
{ \
unsigned long internal = READ_ONCE(e->internal); \
unsigned long weight_rcp = ilog2(_weight_rcp); \
unsigned long precision = _precision; \
\
BUILD_BUG_ON(!__builtin_constant_p(_precision)); \
BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp)); \
BUILD_BUG_ON((_precision) > 30); \
BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp); \
\
WRITE_ONCE(e->internal, internal ? \
(((internal << weight_rcp) - internal) + \
(val << precision)) >> weight_rcp : \
(val << precision)); \
}
```
The if in line 13 is in case there is no value it is just initialized, so return the newest value.
Above code have specified that `weight_rcp` must be a power of 2 for efficiency.
I spent a bit of time understandint lines `14-15`:
The equation is
$$
average = (average(1-\frac{1}{w}) + val(\frac{1}{w}))
$$
and above implementation is better understood if putting it this way.
$$
average = (average(\frac{w-1}{w}) + val(\frac{1}{w}))
$$
The question is doesn't the value overflow when adding up?
---
## 測驗`2`[Max](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-2)
```c
#include <stdint.h>
uint32_t max(uint32_t a, uint32_t b)
{
return a ^ ((EXP4) & -(EXP5));
}
```
EXP4 = `a ^ b`
EXP5 = `a <= b`
### Thought process
:::info
Must check [XOR tricks](https://florian.github.io/xor-trick/) first.
:::
- Given the hint that `EXP4` must contain a **boolean operator (&/|/^)** and `EXP5` must contain a **comparison operator** between `a` and `b`
- And knowing that `x ^ 0 = x`, `x ^ x = 0`, and the commutativity law <br> `x ^ y = y ^ x`
- Using the above mention laws `x ^ y ^ x = x ^ x ^ y = 0 ^ y = y`
So considering the case that `a > b` is true:
1. `a ^ ((EXP4) & -(EXP5)) ` must return `a`
2. So we want the expression to result into `a ^ 0`
3. We can achieve that by simply letting `EXP5` to be ` a <= b`
4. This yields ` a ^ ((EXP4) & -(a <= b)` = `a ^ (EXP4 & -0)` = `a ^ 0` = `a`
Now, considering that `a <= b` is true:
1. The expression reduces to: <br> `a ^ ((EXP4) & -(a <= b))` = `a ^ ((EXP4) & -1)` = `a ^ (EXP4)` <br>(`-1` is `0xfffffff` which is all ones in binary)
2. So `a ^ (EXP4)` must return `b`
3. EXP4 must be `a ^ b` this case
## 測驗`3` [GDC](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-3)
```c
#include <stdint.h>
uint64_t gcd64(uint64_t u, uint64_t v)
{
if (!u || !v) return u | v;
while (v) {
uint64_t t = v;
v = u % v;
u = t;
}
return u;
}
```
Below is also an implementation of GDC but without using the modulo operation
```c=
#include <stdint.h>
uint64_t gcd64(uint64_t u, uint64_t v)
{
if (!u || !v) return u | v;
int shift;
for (shift = 0; !((u | v) & 1); shift++) {
u /= 2, v /= 2;
}
while (!(u & 1))
u /= 2;
do {
while (!(v & 1))
v /= 2;
if (u < v) {
v -= u;
} else {
uint64_t t = u - v;
u = v;
v = t;
}
} while (COND);
return RET;
}
```
COND = `v`
RET = `u<<shift`
### Analisis
First have a look at [GDC Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm) **worked example part**
1. Lines `6-8` calculates the GDC of multiples 2 it can be done faster using right shiftings.
2. After we got all the multiples of 2, we can now divide `u` and `v` until there are no multiples of 2 left.(lines `9-13`)
3. Then we perform the euclidean algorithm until `v` is 0. So COND must be `v`
4. And the return value is $$u\times2^{shift} $$
5. `u<<shift`
### Implementation usint __builtin_ctz()
```c
#include <stdint.h>
uint64_t gcd64(uint64_t u, uint64_t v)
{
if (!u || !v) return u | v;
int shift_v = __builtin_ctz(v);
int shift_u = __builtin_ctz(u);
int shift = min(u,v);
u >>= shift_u, v >>= shift_v;
do {
if (u < v) {
v -= u;
} else {
uint64_t t = u - v;
u = v;
v = t;
}
} while (v != 0);
return u << shift;
}
```
:::warning
As in wikipedia pointed out this implementation without modulo would take many iterations if `v` is much greater than`u` or the other way around.
e.g.
When `v = gdc * n` and `u = gdc `
Then the algorithm would have to perform n times.
I doubt how often values `v` and `u` differs by a lot and how fast would the modulo version run compared to this one.
:::
## 測驗`4` [Bitset](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-4)
```c
#include <stddef.h>
size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
size_t pos = 0;
uint64_t bitset;
for (size_t k = 0; k < bitmapsize; ++k) {
bitset = bitmap[k];
while (bitset != 0) {
uint64_t t = EXP6;
int r = __builtin_ctzll(bitset);
out[pos++] = k * 64 + r;
bitset ^= t;
}
}
return pos;
}
```
we can get the position of the last non-zero bit by operating `x & -x`
let` x = 01011100`
so`-x = 10100100`
`x & -x = 00000100` clearing out all the bits except the last non-zero bit
EXP6 = `bitset & -bitset`
## 測驗`5`[Fraction to recurring decimal](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-5)
```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "list.h"
struct rem_node {
int key;
int index;
struct list_head link;
};
static int find(struct list_head *heads, int size, int key)
{
struct rem_node *node;
int hash = key % size;
list_for_each_entry (node, &heads[hash], link) {
if (key == node->key)
return node->index;
}
return -1;
}
char *fractionToDecimal(int numerator, int denominator)
{
int size = 1024;
char *result = malloc(size);
char *p = result;
if (denominator == 0) {
result[0] = '\0';
return result;
}
if (numerator == 0) {
result[0] = '0';
result[1] = '\0';
return result;
}
/* using long long type make sure there has no integer overflow */
long long n = numerator;
long long d = denominator;
/* deal with negtive cases */
if (n < 0)
n = -n;
if (d < 0)
d = -d;
bool sign = (float) numerator / denominator >= 0;
if (!sign)
*p++ = '-';
long long remainder = n % d;
long long division = n / d;
sprintf(p, "%ld", division > 0 ? (long) division : (long) -division);
if (remainder == 0)
return result;
p = result + strlen(result);
*p++ = '.';
/* Using a map to record all of reminders and their position.
* if the reminder appeared before, which means the repeated loop begin,
*/
char *decimal = malloc(size);
memset(decimal, 0, size);
char *q = decimal;
size = 1333;
struct list_head *heads = malloc(size * sizeof(*heads));
for (int i = 0; i < size; i++)
INIT_LIST_HEAD(&heads[i]);
for (int i = 0; remainder; i++) {
int pos = find(heads, size, remainder);
if (pos >= 0) {
while (PPP > 0)
*p++ = *decimal++;
*p++ = '(';
while (*decimal != '\0')
*p++ = *decimal++;
*p++ = ')';
*p = '\0';
return result;
}
struct rem_node *node = malloc(sizeof(*node));
node->key = remainder;
node->index = i;
MMM(&node->link, EEE);
*q++ = (remainder * 10) / d + '0';
remainder = (remainder * 10) % d;
}
strcpy(p, decimal);
return result;
}
```
### Thought process
`MMM(&node->link, EEE)` is when remainder has not appeared before so we need to add it into hash map.
So `MMM` must be `list_add`, `EEE` address to the list and its the index where it's hashed `&heads[remainder % size]`
## 測驗`6`[__alignof__](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-6)
```c
/*
* ALIGNOF - get the alignment of a type
* @t: the type to test
*
* This returns a safe alignment for the given type.
*/
#define ALIGNOF(t) \
((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X)
```
M = `_h`
X = `0`
[related article](https://embetronicx.com/tutorials/p_language/c/understanding-of-container_of-macro-in-linux-kernel/)
### Thought process
This seems like a similar implementation to offsetof and sizeof.
- We can regard the function as a substraction of two variables casted as character pointer.
- `(char *)() - (char *)()`
- `(&((struct { char c; t _h; } *)0)->M)` is just a implementation of `offsetof` so `M` is the member of `type t` which is `_h`
- Now we need to substract the offset out the character c that precedes it so `X` by a common choice is `0`
## 測驗`7`[FizzBuzz](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-7)
```c
static inline bool is_divisible(uint32_t n, uint64_t M)
{
return n * M <= M - 1;
}
static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1;
static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1;
int main(int argc, char **argv)
{
for (size_t i = 1; i <= 100; i++) {
uint8_t div3 = is_divisible(i, M3);
uint8_t div5 = is_divisible(i, M5);
unsigned int length = (2 << KK1) << KK2;
char fmt[9];
strncpy(fmt, &"FizzBuzz%u"[(9 >> div5) >> (KK3)], length);
fmt[length] = '\0';
printf(fmt, i);
printf("\n");
}
return 0;
}
```
## Reference
<`kevinshieh0225`> [quiz2](https://hackmd.io/@Masamaloka/linux2022-quiz2)