cantfindagoodname
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # 2022q1 Homework3 (quiz3) contributed by < [cantfindagoodname](https://github.com/cantfindagoodname) > > [題目](https://hackmd.io/@sysprog/linux2022-quiz3) ## 實驗環境 ```shell $ cat /proc/version Linux version 5.13.0-30-generic (buildd@lcy02-amd64-003) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #33~20.04.1-Ubuntu SMP Mon Feb 7 14:25:10 UTC 2022 $ gcc --version gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 ``` ## 測驗1 ```c /* GENMASK(h, l) */ /* h l */ /* 76543210 */ /* 01110000 */ ``` Fill in `LEFT`, `RIGHT`: ```c #define GENMASK(h, l) \ (((~0UL) >> (LEFT)) & ((~0UL) >> (l) << (RIGHT))) ``` From `man operator` : ```shell Operator Associativity << >> left to right & left to right ``` `~0UL` would produce a stream of set bits, which length depends on length of `long`. > [ISO/IEC 9899[6.5.7:5]](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) > The result of $E1$ >> $E2$ is E1 right-shifted $E2$ bit positions. If $E1$ has an unsigned type or if $E1$ has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of $E1 / 2^{E2}$. `(~0UL) >> LEFT` would clear the left most `LEFT` bits. Resulting `000...011...`, where number of clear bit would be `LEFT`. `(~0UL) >> (l)` would discard the right most `l` bits, let the result be `r`. > [ISO/IEC 9899[6.5.7:4]](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) > The result of $E1$ << $E2$ is $E1$ left-shifted $E2$ bit positions; vacated bits are filled with zeros. `r << RIGHT` would clear the right-most `RIGHT` bits. The macro should be doing the operation as followed : ```c /* h l */ /* 01111111 */ /* & 11111000 */ ``` `LEFT` : `63 - h` ( Given 64-bit architecture ) `RIGHT`: `l` However, as stated from above, the vacated bits for right shift are filled with zero. `(((~0UL) >> (63 - h)) & ((~0UL) << (l)))` should yield the same result. ### 延伸問題 #### 解釋上述程式碼運作原理 ```c #define __GENMASK(h, l) \ (((~UL(0)) - (UL(1) << (l)) + 1) & \ (~UL(0) >> (BITS_PER_LONG - 1 - (h)))) #define GENMASK(h, l) \ (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) ``` `GENMASK_INPUT_CHECK` uses `build_bug.h` library to force compilation error. `(BITS_PER_LONG - 1 - (h))` is equivalent to `63 - h`, `BITS_PER_LONG` would not assume the word size as a constant, it instead takes the word size that is known at compile time. `((~UL(0)) - (UL(1) << (l)) + 1)` uses the property that the first clear bit would clear all prior set bit(s) when add by 1, i.e. `0bXX0111 + 1 = 0bXX1000`. It generates the less significant bit mask. `UL(X)` instead of `XUL` as 1UL and 0UL are hard-coded, therefore they are not available for assembly. [Source](https://github.com/torvalds/linux/commit/95b980d62d52c4c1768ee719e8db3efe27ef52b2) #### 舉出 Linux 核心原始程式碼中 GENMASK 巨集和 include/linux/bitfield.h 的應用案例 `GENMASK` is very popular when programming in lower level, typically used when consecutive bits are used as flags. e.g. [i2c](https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/i2c/busses/i2c-fsi.c) ## 測驗2 ```c struct foo; struct fd { struct foo *foo; unsigned int flags; }; enum { FOO_DEFAULT = 0, FOO_ACTION, FOO_UNLOCK, } FOO_FLAGS; static inline struct fd to_fd(unsigned long v) { return (struct fd){EXP1, v & 3}; } ``` > [align_down](https://www.boost.org/doc/libs/1_78_0/doc/html/align/reference.html#align.reference.functions.align_down) > Requires : alignment shall be power of 2 > [ISO/IEC9899:3.2](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) > **alignment** > requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address The smallest addressable unit of memory is 1 byte, when align for 4 bytes, the address would be multiple of 4. `EXP` : `v - v % 4` We are required to not use `%` (modulo) operator. For power of 2, we can replace modulo with `&` mask. `EXP` : `v & ~0x3` The compiler would give a warning message `makes pointer from integer without a cast`. We can cast the `intptr_t` value to `struct foo *` to dodge the warning. `EXP` : `(struct foo *)(v & ~0x3)` ### 延伸問題 #### 在 Linux 核心原始程式碼找出類似技巧的實作 Any `intptr_t` reference more or less manipulates address as an integer type. e.g. in [kernel.h](https://github.com/torvalds/linux/blob/master/include/linux/kernel.h), there is a macro `u64_to_user_ptr(x)` dedicated to using this technique. The program also uses compound literals, it allows us to use initializer list to create an unnamed object. e.g. in [list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h), `LIST_HEAD_INIT(name)` would initialize the head of list with compound literals. The implementation of `ALIGN_DOWN` can be found in [linux/align.h](https://github.com/torvalds/linux/blob/master/include/linux/align.h) `ALIGN_DOWN` is referenced in several files, a documented use of `ALIGN_DOWN` would be [fs/btrfs/send.c](https://github.com/torvalds/linux/blob/master/fs/btrfs/send.c), where it states that trying to clone a block would corrupt the data in destination file. ## 測驗3 ```c x = (x >> 4) | (x << 4); x = ((x & 0xCC) >> 2) | (EXP2); x = ((x & 0xAA) >> 1) | (EXP3); ``` `EXP2` : `((x & 0x33) << 2)` `EXP3` : `((x & 0x55) << 1)` ```c x = ((x & 0xF0) >> 4) | ((x & 0x0F) << 4); /* AAAA BBBB => BBBB AAAA */ x = ((x & 0xCC) >> 2) | ((x & 0x33) << 2); /* AABB CCDD => BBAA DDCC */ x = ((x & 0xAA) >> 1) | ((x & 0x55) << 1); /* ABCD EFGH => BADC FEHG */ ``` Take a look at each index : |Dec|Bin| |:-:|:-:| |0|000| |1|000| |2|010| |3|011| |4|100| |5|101| |6|110| |7|111| Index `i` would be moved to `(N-1) - i`, which will be equilvalent to `~i`, as `N` is power of 2. Take `6 (110)` as an example, `~6` would be `001`, which means index 6 would do `>> 4`, `>> 2`, then `<< 1`. As for the mask encoding, it simply encodes which bit is set. i.e. take `0xCC` and `0x33` as example, they are `11001100` and `00110011` respectively. `11001100` would mask for `2`,`3`,`6`, and`7`, where all their second bit are set. Which means they would later shifted by `0b10`, which is why they do `>> 2`. This uses the divide and conquer technique. It splits the data by half every time it does a swap. i.e. ``` A X X X|X X X X A X|X X A|X A ``` ### 延伸問題 #### 在 Linux 核心原始程式碼找出類似技巧的實作並解說其應用場景 Linux kernel would apply this technique when doing conversion between Big Endian and Little Endian. The macros are defined in [byteorder/generic.h](https://github.com/torvalds/linux/blob/master/include/linux/byteorder/generic.h) and the implementation could be seen in [swab.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/swab.h) Aside referencing the macros in conversion of endianness in interface between two hardware structure such as utility file [endian.h](https://github.com/torvalds/linux/blob/master/tools/include/tools/endian.h), and network interface [ipv4/esp4.c](https://github.com/torvalds/linux/blob/master/net/ipv4/esp4.c), encoding algorithm like secure hash [SHA-256](https://github.com/ilbers/linux/blob/master/arch/x86/purgatory/sha256.c) would also reference the macro to reverse bytes. ## 測驗4 ```c #include <assert.h> #define _foreach_no_nullval(i, p, arr) \ assert((i) >= sizeof(arr) / sizeof(arr[0]) || (p)) #define foreach_int(i, ...) \ for (unsigned _foreach_i = (((i) = ((int[]){__VA_ARGS__})[0]), 0); \ _foreach_i < sizeof((int[]){__VA_ARGS__}) / sizeof(int); \ (i) = ((int[]){__VA_ARGS__, 0})[EXP4]) #define foreach_ptr(i, ...) \ for (unsigned _foreach_i = \ (((i) = (void *) ((typeof(i)[]){__VA_ARGS__})[0]), 0); \ (i); (i) = (void *) ((typeof(i)[]){__VA_ARGS__, \ NULL})[EXP5], \ _foreach_no_nullval(_foreach_i, i, \ ((const void *[]){__VA_ARGS__}))) ``` From [gcc documentation](https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html) > Variadic Macros > A macro can be declared to accept a variable number of arguments much as a function can. The preprocessor would expand `__VA_ARGS__` to argument(s) in `...` (ellipsis) operator. From [ISO/IEC9899:6.5.17](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) > Comma operator > The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value. Then take a look at `foreach_int` ```c /* Declaration part of for statement */ unsigned _foreach_i = (((i) = ((int[]){__VA_ARGS__})[0]), 0) ``` `_foreach_i` is an unsigned integer type, which would be initialized as 0 `i` is dereferencing a pointer to integer array, which points to first element in `__VA_ARGS__` ```c /* In controlling expression of for statement */ _foreach_i < sizeof((int[]){__VA_ARGS__}) / sizeof(int) ``` The LHS indicates `_foreach_i` would be the iterator. The RHS would be identical to `sizeof (int *) / sizeof(int)`, which indicates the number of elements. ```c /* Final expression of for statement */ (i) = ((int[]){__VA_ARGS__, 0})[EXP4] ``` The statement would let `i` access the next element to the one it is currently accessing, which would be `[_foreach_i + 1]`. Right before the statement is executed in the first iteration, the loop body would let `i` access the 0th element, and `_foreach_i` is `0`. Hence, `EXP4` should be a prefix `++` operator. By then, the loop would terminates when `_foreach_i == sizeof((int[]){__VA_ARGS__}) / sizeof(int)`, and would not cause out-of-bound memory access. `EXP4` : `++_foreach_i` The macro `foreach_ptr` is very similar to `foreach_int`, except it would pass pointer instead. `EXP5` : `++_foreach_i` One difference is the macro would check if the pointer is `NULL` ```c #define _foreach_no_nullval(i, p, arr) \ assert((i) >= sizeof(arr) / sizeof(arr[0]) || (p)) ``` From [ISO/IEC9899:6.5.14/4](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) > Logical OR operator > If the first operand compares unequal to 0, the second operand is not evaluated. The first argument is `_foreach_i` assert would not give warning before `_foreach_i` reaches the last element.`p` `NULL` check would not be evaluated. Once `_foreach_i` reaches the last element, the compiler would handle cases when the element isn't `NULL`. ## 測驗5 `EXP6` : `dvs << shift` `EXP7` : `dvd -= dvs << shift` ```c unsigned int dvd = dividend; if (dividend < 0) { signal *= -1; dvd = ~dvd + 1; } unsigned int dvs = divisor; if (divisor < 0) { signal *= -1; dvs = ~dvs + 1; } ``` The first part of the program would ensure the division is operated for two positive numbers. `signal` is recorded to let the program follow the rule for sign of multiplication. ```c int shift = 0; while (dvd > (dvs << shift)) shift++; unsigned int res = 0; while (dvd >= dvs) { while (dvd < (dvs << shift)) shift--; res |= (unsigned int) 1 << shift; dvd -= dvs << shift; } ``` The remaining part of would be doing long division in binary. The progarm would align msb of `dvd` and `dvs` then subtract using `shift` (quotient is set after subtraction). ### 延伸問題 Improvement 1 : ```c while (dvd > (dvs << shift)) shift++; ``` The above program would align both `dvd` and `dvs` by looping each binary digits, use `clz` would be faster. $O(digit)\rightarrow O(1)$ ```c shift = __builtin_clz(dvs) - __builtin_clz(dvd); ``` The while loop would do at most once. (Could be replaced with `if`) ![](https://i.imgur.com/WGqZngP.png) Improvement 2 : Apply hardware binary division (Reference : [Computer Organization and Design MIPS EDITION .6E](https://www.tenlong.com.tw/products/9780128201091)) ![](https://i.imgur.com/ulWGqhe.png) The algorithm is same as before, however, `shift` is replaced by bitmask (assume 32-bit) and shift by 1. ![](https://i.imgur.com/YlpPqVQ.png) The result won't be faster in best case, however, it is more consistenly lower than the previous version (improvent 1). ### 延伸問題2 Fixed Point Arithmetic When the precision is known, fixed point arithmetic would be better than floating point to not make the value "float". This ensures the value would be correct every calculation, and the arithmetic is faster than floating point arithmetic. The idea is as shown : ```c val = n << precision /* get value */ val >> precision . val & mask ``` One example could be seen at [quiz2/1.EMWA](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-1) ## 測驗6 variables : ```c int pointsSize; /* @args : number of points */ int *dup_cnts = malloc(pointsSize * sizeof(int)); int *hori_cnts = malloc(pointsSize * sizeof(int)); int *vert_cnts = malloc(pointsSize * sizeof(int)); int slope_size = pointsSize * pointsSize / 2 + 133; int *slope_cnts = malloc(slope_size * sizeof(int)); ``` From the name (and its implementation), `$(N)##_cnts` would mean number of points that shares some property `dup` : same coordinates, (including self) `hori`: same x-coordinates `vert`: same y-coordinates We can deduce `slope` is the remaining points, seperated by their slope/gradient. :::info the theoretical value for number of slope would be $\dfrac{{pointsSize}*{(pointsSize-1)}}{2}$, not including inverse for slope : (x,y) shares the same line with (y,x). Not sure why `+133` is needed ::: The last part of program simply takes the list with largest number : ```c int max_num = 0; for (i = 0; i < pointsSize; i++) { if (hori_cnts[i] + 1 > max_num) max_num = hori_cnts[i] + 1; if (vert_cnts[i] + 1 > max_num) max_num = vert_cnts[i] + 1; } for (i = 0; i < slope_size; i++) { if (slope_cnts[i] > max_num) max_num = slope_cnts[i]; } ``` The code to find group points with slope : ```c /* i and j is every combination of two independent input points */ if (points[i].x != points[j].x && points[i].y != points[j].y) { /* find slope as irreducible fraction */ int dx = points[j].x - points[i].x; int dy = points[j].y - points[i].y; int tmp = gcd(dx, dy); dx /= tmp; dy /= tmp; /* Start of code segment */ int hash = dx * dy - 1333 * (dx + dy); if (hash < 0) hash = -hash; hash %= slope_size; if (can_insert(&heads[hash], i, j)) { struct point_node *pn = malloc(sizeof(*pn)); pn->p1 = i; pn->p2 = j; EXP9; } } /* can_insert */ static bool can_insert(struct list_head *head, int p1, int p2) { struct point_node *pn; list_for_each_entry (pn, head, link) return EXP8; return true; } ``` `EXP9` in the branch `can_insert`, it should be inserting into the list `EXP9` : `list_add(&pn->link, &head[hash])` Without knowing how the hash is calculated, assume `heads[hash]` each be list of points on the same straight line. `can_insert` is very likely to be a function to prevent recounting line segments. Take a straight line for example : ```c /* a--b--c */ /* 0 1 2 */ ``` The current program would tranverse from front to back with : ```c for (int i = 0;i < sz;++i) /* should be i < sz - 1 */ for (int j = i + 1;j < sz;++j) /* ... */ ``` which means, the line segment `a-b`, `a-c`, and `b-c`. In which `a-c` and `b-c` is repeated. `can_insert` prevents this by counting points with same slope with 1 point. `EXP8` : `pn->p1 == p1` After finding every slopes, the program include duplicates : ```c for (i = 0; i < slope_size; i++) { int index = -1; struct point_node *pn; list_for_each_entry (pn, &heads[i], link) { index = pn->p1; slope_cnts[i]++; } if (index >= 0) slope_cnts[i] += dup_cnts[index]; } ``` ### 延伸問題 which is the implementation of hash function could cause false positive errors. ```c printf("%d\n", maxPoints( (struct Point []){ [0].x=0,[0].y=0, [1].x=1,[1].y=1, [2].x=12,[2].y=19, [3].x=24,[3].y=38, [4].x=36,[4].y=57, [5].x=40,[5].y=12, [6].x=5,[6].y=6, [7].x=6,[7].y=7, [8].x=7,[8].y=8, [9].x=8,[9].y=9, }, 10)); ``` ![](https://i.imgur.com/Ivg1w1F.png) The function above would produce an output `5`, as `(dx,dy)` = `(1,1)` and `(12,19)` when `slope_count` = 10 would produce hash 103, they both would pass point `(0,0)`, hence the output is `5` instead of `4` (straight line are grouped together) Note : mint colored line starts with `(0,0)`, blue colored line intersects x-axis at `(0,1)` Hash map makes time complexity goes from $O(n^3)$ to $O(n^2)$, rough estimate : n - tranverse point, n - tranverse point to compare, n (to 1 as hash) check slope list = $O(n^2)$ To satisfy space requirement, there should be time trade-off. Change hash table to linked-list and the space complexity is immediately lowered if not worst-case, but the time complexity would goes up a dimension with the random access property from hash table lost. We could potentially make the program more efficient by tackling the algorithm There are few ways to confirm three points are collinear, such as : find line equation (slope being one of the method), calculate angle (180 degree or perpendicular bisector) (angle in computer program would have low precision, unless apply special formulas) (same with slope but potentially time save), area = 0 (determinant) ($O(1)$ time complexity in calculation but has large hidden constant, $O(n^3) + O(1)$ vs $O(n^3) + O(logn)$) ## 測驗7 From the description, we need to find : > the minimum number of bits required to store an unsigned 32-bit value without any leading zero bits Which the function returns output similar to : ```c 32 - __builtin_clz(v) ``` With the similar structure repeated few times ```c m = (v > 0xFFFU) << 4; /* ... */ m = (v > 0xFFU) << 3 /* ... */ m = (v > 0xFU) << 2 ``` We can expect similar result below : ```c int ilog32(uint32_t v) { int ret = v > 0; int m = (v > 0xFFFFU) << 4; v >>= m; ret |= m; m = (v > 0xFFU) << 3; v >>= m; ret |= m; m = (v > 0xFU) << 2; v >>= m; ret |= m; m = (v > 0x3) << 1; v >>= m; ret |= m; ret += v > 1; return ret; } ``` The program would use technique similar to binary search, in which it divides the value by half and determine which portion (upper or lower) the data is in. As an example : ```c m = (v > 0xFFFFU) << 4; v >>= m; ``` v would be an unsigned 32-bit integer, hence the maximum value would be `0xFFFFFFFF` `v > 0xFFFFU` would check if the most significant half of v has value. If so, at least (16 + 1) bits is needed to store the number. e.g. ```c /* * base-16 * XXXX YYYY * XXXX is not clear * YYYY must be stored (16-bits) */ (v > 0xFFFFU) << 4; 1 << 4; // 16 ``` Hence, we got `(v > 0xFFFFU) << 4`, which should be `(16)` ```c m = (v > 0xFFU) << 3; v >>= m; ``` The following part of the program would do the same, as the program already did `v >>= m`, it would check if 8 additional bits should be present to store the number, in addition to last calculation. There should be 1 additional bit for each check ```c /* v = 0x10000 */ m = (v > 0xFFFFU) << 4; // 16 but at least 17 bits are needed ``` The program would add the bit in the beginning : ```c int ret = v > 0; ``` All `ret |= m ` would just add `m` to `ret`, for all those additional bits, however, as m must be power of 2, bitwise OR `|` operator is used instead. To make the program more readable and maintainable,`ret |= m` could be replaced by `ret += m` ### 延伸問題 As stated earlier, `ilog32` did the same thing as `32 - __builtin_clz(v)` > Tested with ```c int main() { for (uint32_t i = 0;i < UINT_MAX;++i){ printf("%d - Result : %d, Expected : %d\n", i, ilog32(i), 32 - __builtin_clz(i)); fflush(stdout); assert(ilog32(i) == 32 - __builtin_clz(i)); printf("\x1b[1F\x1b[2K"); } printf("ilog32 is no different to 32 - clz\n"); } ``` The code can be found in [bitops.h](https://github.com/torvalds/linux/blob/master/arch/arc/include/asm/bitops.h) ```c /* * fls = Find Last Set in word * @result: [1-32] * fls(1) = 1, fls(0x80000000) = 32, fls(0) = 0 */ static inline __attribute__ ((const)) int fls(unsigned int x) { if (__builtin_constant_p(x)) return constant_fls(x); return 32 - clz(x); } ``` --- The idea of deBrujin is to find a hash function that generates unique output for each set bit. $h(x) = (x * deBrujin) >> (n - lgn)$ ```c /* 8-bit value */ /* n = 8, lgn = 3 */ /* 1 2 3 4 5 6 7 8 */ ``` ${ >> (n - lgn)}$ ensures we get $lgn$ bits possible outputs [deBrujin sequence](https://en.wikipedia.org/wiki/De_Bruijn_sequence), $B(n, lgn)$ makes sure every single output is an unique output. This satisfies all three requirement of hash function > we need > - the hash table to be small > - the hash function to be easily computable > - the hash function to produce no collisions :::info The [paper](http://supertech.csail.mit.edu/papers/debruijn.pdf) gives an example to do `ffs()` (find first set) with `x & -x`, `fls()` could be done using similar technique ::: ## 測驗8 The point is to replace poniter operation with pointer to pointer operation ```c /* Original version */ if (d < p->data) { q = p; p = p->left; } else { q = p; p = p->right; } /* Modified version */ if (d < (*p)->data) p = &(*p)->left; else p = &(*p)->right; ``` ```c /* Original version */ while (r->left) { q = r; r = r->left; } /* Modified version */ while ((*r)->left) r = (*r)->left; ``` tested with : ```c tree insert(tree &t, int d) { if (t == nullptr) return new tnode(d); if (d < t->data) t->left = insert(t->left, d); else t->right = insert(t->right, d); return t; } /* lvr inorder */ void print(tree &t) { if (t == nullptr) return; if (t->left) print(t->left); std::cout << t->data; if (t->right) print(t->right); } ``` `EXP12` : `&(*p)->left` `EXP13` : `&(*p)->right` `EXP14` : `&(*r)->left` :::info Even though `*p = (*p)->left` looks indentical to `&(*p)->left`, they are completely different. Difference : > original ![](https://i.imgur.com/0ztnVx1.png) > `*p = (*p)->left` ![](https://i.imgur.com/QolNTWc.png) > `p = &(*p)->left` ![](https://i.imgur.com/Jgj8sQ7.png) ::: ### 延伸問題 [Static B-Tree](https://en.algorithmica.org/hpc/data-structures/s-tree/) // TODO : Revision 2,3 Tree, 2,3,4 Tree (B-Tree), B+ Tree > Reference > [Fundamentals of Data Structures in C, 2/e](https://www.tenlong.com.tw/products/9780929306407) ## 測驗9 Alignment macro that rounds up would maps the least multiple of alignment greater than or equal to the argument. i.e. $Align(x) = min\{n \in \mathbb{Z}\ |\ {ALIGNMENT}*n\ge x\}$ ```c /* maximum alignment needed for any type on this platform, rounded up to a power of two */ #define MAX_ALIGNMENT 16 /* Given a size, round up to the next multiple of sizeof(void *) */ #define ROUND_UP_TO_ALIGNMENT_SIZE(x) \ (((x) + MAX_ALIGNMENT - MMM) & ~(NNN)) ``` The alignment is 16, which means we would need to have a bitmask `1111`. The mask could be calculated by doing `MAX_ALIGNMENT - 1` A very straight forward way of rounding up would be : ```c x & ~(MAX_ALIGNMENT - 1) + (MAX_ALIGNMENT ^ -(x > 0)) ``` Where the lower order bits are cleared to align with the alignment, then round up the result if the argument is non-zero. The test should use the same logic, but with a different approach. The bitwise AND operator `&` should be clearing the lower order bits to align the input, hence : `NNN` : `MAX_ALIGNMENT - 1` Whats remaining would be rounding up if the argument is non-zero. The lower order bits would later be cleared therefore only the most significant carry bit matters. ```c x + MAX_ALIGNMENT - MMM ``` should result in a value that is between `MAX_ALIGNMENT` and `MAX_ALIGNMENT << 1`, if `x` is non-zero, and is `MAX_ALIGNMENT - 1` if `x` is zero. And so, `MMM` : `1` ### 延伸問題 round down would be very straight forward, which is ```c #define ROUND_DOWN_TO_ALIGNMENT_SIZE(x) \ ((x) % MAX_ALIGNMENT) ``` Conveniently, `MAX_ALIGNMENT` is power of 2, hence the macro could be redefined as: ```c #define ROUND_DOWN_TO_ALIGNMENT_SIZE(x) \ ((x) & (MAX_ALIGNMENT - 1)) ``` --- Not a program from Linux Kernel but still applying the idea : In [2022q1 fibdrv](https://hackmd.io/@sysprog/linux2022-fibdrv), we are required to write a program to do arithmetic on big number that surpass the 64-bit range of `unsigned long` in LP64 Data Model. One common approach is to encode the value as a character array then define a new function to do arithmetic operation. In those function, code segment to tranverse the array would be used extensively. In modern CPU architecture, a character is normally 1 byte wide and the width of data bus (1 word) goes up to 8 bytes wide. Everytime we do an operation on a character, the data bus of CPU is not fully utilized, hence we could do the following optimization : ```c int l = strlen(s); switch (l && 0x3) { for (; l > 0; l -= 4) { case 0: /* do something */ case 3: /* do something */ case 2: /* do something */ case 1: /* do something */ } } ``` Even though the implementation above would make use of the 32-bit wide bus, there would be repeating code in `/* do something */` and `/* do something */` is called multiple times. It is wise to explicitly cast the character to integer. ```c int l = 0; uint32_t v; while((v - 0x01010101) & ~v & (0x80808080)) { v = *(uint32_t *)s; /* do something */ s += 4; // move pointer } if (((v) & 0xFF) == '\0') /* do something on s */; if (((v >> 8) & 0xFF) == '\0') /* do something on s, s+1 */; if (((v >> 16) & 0xFF) == '\0') /* do something on s, s+1, s+2 */; ``` Even though both of the implementations are almost identical to each other, with the latter one being more efficient in theory ( less `/* do something */` call ). In reality, even though the latter program sometimes outperforms the previous implenentation, often more than not it suffers from severe performance degration. Upon further investigation, I found out that this is caused operation on unaligned memory blocks. ![](https://i.imgur.com/e8xNvHl.png) As shown as figure, `SP` means the 4 bytes accessed in previous iteration, `SC` means 4 bytes accessed in current iteration, and `SN` means 4 bytes accessed in next iteration. The program would have performance issue as `s += 4` sometimes makes the program access memory accross multiple blocks. This is evident when the following code is added : ```c while (&s != ROUND_DOWN_TO_ALIGNMENT_SIZE(&s)) { /* do something */ ++s; } ``` ## 測驗10 ```c #define DIVIDE_ROUND_CLOSEST(x, divisor) \ ({ \ typeof(x) __x = x; \ typeof(divisor) __d = divisor; \ (((typeof(x)) -1) > 0 || ((typeof(divisor)) -1) > 0 || \ (((__x) > 0) == ((__d) > 0))) \ ? ((RRR) / (__d)) \ : ((SSS) / (__d)); \ }) ``` `(typeof(x)) -1) > 0 || ((typeof(divisor)) -1) > 0` clearly indicates that both types are unsigned. Combined with `(__x) > 0) == ((__d) > 0)` we can deduce the following : `((RRR) / (__d))` if both `x` and `divisor` are positive, `((SSS) / (__d))` if `x` is negative, ( `divisor` negtive undefined from description) Take `3` as divisor : |${\ \ x\ \ }$|$f(x,3)$| |:-:|:-:| |1|0| |2|1| |3|<font color="#F00">**1**</font>| |4|1| |5|2| |6|<font color="#F00">**2**</font>| |7|2| |8|3| |9|<font color="#F00">**3**</font>| round nearest would mean $dvd/dvs$ is closer to ${\ n*dvs\ }$ or ${\ (n+1)*dvs\ }$ i.e. `5` is closer to `6` or `3`, `5` is at upper portion or lower portion of `4.5` (`(3+6)/2`) As C truncate the result of division, this could be found by adding `dvd` by `dvs / 2`. Hence, `RRR` : `(__x) + (__dvs >> 1)` When dividend is negative, the magnitude of number would be increased by subtracting. Hence, `SSS` : `(__x) - (__dvs >> 1)` ## 測驗11 `fls` would do find last set (zero-indexed) as the maximum value is 63 It will match bit-mask to check from first half (similar to test7, but opposite direction) ```c unsigned long i_sqrt(unsigned long x) { unsigned long b, m, y = 0; if (x <= 1) return x; m = 1UL << (fls(x) & ~1UL); while (m) { b = y + m; XXX; if (x >= b) { YYY; y += m; } ZZZ; } return y; } ``` `i_sqrt` is equilvalent to `floor(sqrt(x))` ```c m = 1UL << (fls(x) & ~1UL) ``` It is not obvious what this statement does so I design a simple experiment to see what this statement would do. ```c for (unsigned int i = 1; i ;i <<= 1) printf("%10u %10lu\n", i, 1UL << (fls(i) & ~1UL)); ``` output : ```c 1 1 2 1 4 4 8 4 16 16 32 16 64 64 128 64 256 256 512 256 1024 1024 2048 1024 4096 4096 8192 4096 16384 16384 32768 16384 65536 65536 131072 65536 262144 262144 524288 262144 1048576 1048576 2097152 1048576 4194304 4194304 8388608 4194304 16777216 16777216 33554432 16777216 67108864 67108864 134217728 67108864 268435456 268435456 536870912 268435456 1073741824 1073741824 2147483648 1073741824 ``` We can see that the result is a sequence of $2^{2n}$, repeating twice every time. `(fls(x) & ~1UL)` is clear last bit of fls(x), which is the reason why every two of the output will yield the same result, as the sequence of `fls` would go in odd-even pair. The sequence for $2^{2n}$ is chosen as we can easily find its square root as shown : $\sqrt{2^{2n}} = 2^{\frac{2n}{2}} = 2^n$ ```c m = 1UL << (fls(x) & ~1UL); while (m) { b = y + m; XXX; if (x >= b) { YYY; y += m; } ZZZ; } ``` The condition of the while loop is `m`, and `m` is not modified originally. Hence, if the loop is following good coding practice, `m` should be modified in `ZZZ` `m` must be one possible output of `1UL << (fls(x) & ~1UL)` to maintain loop invariant. `m` would initially be upper bound of `sqrt(x)`, as shown : ```c printf("%-16s %-16s\n", "n", "n^2"); for (unsigned int i = 0;i < 17;++i) { print_bin16(i); print_bin16(i * i); printf("%10lu\n", 1UL << (fls(i * i) & ~1UL)); } ``` Output : ```c n n^2 0000000000000000 0000000000000000 1 0000000000000001 0000000000000001 1 0000000000000010 0000000000000100 4 0000000000000011 0000000000001001 4 0000000000000100 0000000000010000 16 0000000000000101 0000000000011001 16 0000000000000110 0000000000100100 16 0000000000000111 0000000000110001 16 0000000000001000 0000000001000000 64 0000000000001001 0000000001010001 64 0000000000001010 0000000001100100 64 0000000000001011 0000000001111001 64 0000000000001100 0000000010010000 64 0000000000001101 0000000010101001 64 0000000000001110 0000000011000100 64 0000000000001111 0000000011100001 64 0000000000010000 0000000100000000 256 ``` Observation : ```c n n^2 0000000000000011 0000000000001001 4 0000000000000100 0000000000010000 16 /* ... */ 0000000000000111 0000000000110001 16 0000000000001000 0000000001000000 64 ``` every time `n` needs one more bit to represent the number, `n^2` would need two. Resulting in `fls(2^n) & ~1UL` By every loop `m` would be divided by $2^2$, hence, `ZZZ` : `m >>= 2` How the algorithm works : Reference : [二進位系統的平方根](https://hackmd.io/@sysprog/sqrt#%E4%BA%8C%E9%80%B2%E4%BD%8D%E7%B3%BB%E7%B5%B1%E7%9A%84%E5%B9%B3%E6%96%B9%E6%A0%B9), [How to calculate a square root without a calculator](https://www.homeschoolmath.net/teaching/square-root-algorithm.php), [Why the square root algorithm works](https://www.homeschoolmath.net/teaching/sqr-algorithm-why-works.php)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully