# 2019q3Homework2 (quiz2) contributed by < `kksweet8845` > ## Test `1` There are three operations named op1, op2 and op3, respectively. And 1 for `* or /`, 0 for `+ or -`, so there are eight combinations,that is: ```cpp= 000 // op1, op2, op3 001 010 011 100 101 110 111 ``` Answer: ```cpp static int calc(int op1, int op2, int op3) { op1 = op_to_prio(op1); op2 = op_to_prio(op2); op3 = op_to_prio(op3); bool p1 = (op1 & 0x0F) == 0; // = 1 for * or / bool p2 = (op2 & 0x0F) == 0; // else = 0 bool p3 = (op3 & 0x0F) == 0; // (4 + 4 + 4 + 4) or (4 / 4 / 4 / 4) if ((p1 == p2) && (p2 == p3)) return operate(op3, operate(op2, operate(op1, 4, 4), 4), 4); else if( p3 == 1 && p1 == 0 && (p1 == p2) ) // 001 return operate(op1, operate(op2, operate(op3, 4, 4), 4), 4); else if( p2 == 1 && p1 == 0 && (p1 == p3) ) // 010 return operate(op1, operate(op3, operate(op2, 4, 4), 4), 4); else if( p1 == 0 && p2 == 1 && (p2 == p3)) // 011 return operate(op2, operate(op1, operate(op3, 4, 4), 4), 4); else if( p1 == 1 && p2 == 0 && (p2 == p3)) // 100 return operate(op3, operate(op2, operate(op1, 4, 4), 4), 4); else if( p2 == 0 && p1 == 1 && (p1 == p3)) // 101 return operate(op2, operate(op3, operate(op1, 4, 4), 4), 4); else if( p3 == 0 && p1 == 1 && (p1 == p2)) // 110 return operate(op3, operate(op2, operate(op1, 4, 4), 4), 4); return 0; } ``` ### Description The main concept of this question is to compute all possible combination and test it whether it is matched to the number or not. - We can observe that there is three nested for-loop except the most outer one in main function. It will traverse all the operation, `op1, op2 and op3`, into the expression. Such as `4 + 4 - 4 * 4`, `4 - 4 - 4 -4` or `4 * 4 - 4 / 4`. Each one represents a operand, and each operand has four possible symbols, `+ - * /`. - The program just need to traverse all the possible answer so that we can find the only one solution or no solution. ```cpp 4 op1 4 op2 4 op3 4; // there is possible combination // 4 + 4 + 4 + 4 = 16 // 4 + 4 - 4 / 4 = 7 // 4 ... ``` And each operand is represented by number. And using a `operate(int op, int a, int b)` to convert the number into a real expression. - In 4thought, `opType1` and `opType2` represent + and -. And `opType3` and `opType4` represent + and -. - Using a function named `operate` to realize computation. ```cpp operate(0, 4, 4) // 4 + 4 operate(1, 4, 4) // 4 - 4 opearte(16, 4, 4) // 4 * 4 operate(32, 4, 4) // 4 / 4 ``` Then, we just need to concern that op1, op2, op3 are set 1,`(+,-)` or set 2,`(*, /)`. Thus, we need to example all possible combination of two sets. There are 8 number of combinations because of three ops with 2 possible set. That is, ```cpp= 000 // op1, op2, op3 001 010 011 100 101 110 111 ``` Then, we can finally write plenty of if-else statement to finish this question. ### The distribution of 4thought ![](https://i.imgur.com/EPxrwUb.png) It is interesting that the distribution of 4thought is similiar to a normal distribution. ### A method to promote the performance Constructing a table in advanced is a better idea to promote the performance. The original idea uses the brutal method to traverse all the combinations of four operators then check whether the combination is matched with the specified value. In the middle of traversing all combinations, it waste a lot of resource to run the repeated combination each time. Maybe, we can store the value what we found, then next time, we don't need to traverse all the combinations in order to find what we already have found before. Reduce the complexity of finding answer to $O(1)$. I used the link-list as my table and created two struct type. One named `header_ele_t` which store the info of the number produced from 4thought, `num` and how many method to produce the number,`len`. The other named `numeric_ele_t`, which store the info of operators. ```cpp typedef struct NELE{ struct NELE *next; int op1, op2, op3; } numeric_ele_t; typedef struct HELE { numeric_ele_t *header; int num; int len; } header_ele_t; ``` Create a list of header which max is 256 and min is -60, that is, the number of header is 316+1 which a additional one is zero. We can see that the time consuming with table compares with the original one is faster. ![](https://i.imgur.com/vSl05Ow.png) The cycles and instructions are also fewer than the original one. ![](https://i.imgur.com/Vz4mlcw.png) ## Test `2` ```cpp #include <stdbool.h> bool fit_bits(int x, int n) { /* Write your code here */ int move = 32 + (~n + 1); x = !(x^(x<<move)>>move); return (bool) x; } ``` ### Finding the similiar code `Still working on it...` ## Test `3` ```cpp #include <stdbool.h> bool is_leq(int x, int y) { int s; s = x + (~y + 1); s = (s == 0) || s>>31; return (bool) s; } ``` ### Finding the similiar code `Still working on it...` ## Test `4` ```cpp mini_str mini_strcat(mini_str str1, mini_str str2) { // Shift str2 along by str1Len characters to move it into position. unsigned str1Len = mini_strlen(str1); str2.integer <<= str1Len * 8; // FIXME: Assumes little-endian. /* Write your code here */ str1.integer = str1.integer | str2.integer; return str1; } ``` ### 1. Optimization `Still working on it...` ### 2. Application `Still working on it...` ### 3. Big endian ```cpp mini_str mini_strcat(mini_str str1, mini_str str2) { // Shift str2 along by str1Len characters to move it into position. unsigned str2Len = mini_strlen(str2); str1.integer <<= str2Len * 8; // FIXME: Assumes little-endian. /* Write your code here */ str1.integer = str1.integer | str2.integer; return str1; } ``` ## Test `5` ```cpp int popcnt_naive(int n) { int count = 0; while (n) { if (n & 1) ++count; n = (unsigned) n >> 1; } return count; } ``` ## Test `6` ```cpp int popcnt_branchless(int n) { int count = 0; while (n) { count += n&1; n = n >> 1; } return count; } ``` ### The optimization of queries of databse Consider an information system such as a database where the relevant information is presented in terms of `1` and the non-relevant information in terms of `0`. In this case, the bit-counting can be used to respond to the user queries. Popcount was used for SQL run optimization, when joining bitmaps to determine the best evaluation order, and to implement function count-trailing-zeros(), which are used heavily in the binary GCD. [paper](http://intjit.org/cms/journal/volume/9/1/91_1.pdf) ### In linux kernel ```cpp static inline int fls(unsigned int t) { unsigned long x = t & 0xffffffffu; if (!x) return 0; x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; return ia64_popcnt(x); } ``` - This is a method to find the last set bit(most significant). Returns 0 for x==0 and bits are numberd form 1...32. - The idea of this is to find the last set bit and make the bit smaller than the last set bit all becoming set bits. Finally,we just need to use popcnt instr to calculate the number of set bit before the last set bit including last set bit. - `Example` ```cpp x = 0x00cf0234; // 0000 0000 1100 1111 0000 0010 0011 0100 x |= x>>1; /* Extend the last set bit to two set bits, * That is, 0000 0000 "11"10 1111 1000 0001 0001*/ x |= x>>2; /* Extend the two last set bits to four last set bits. * That is, 0000 0000 "1111" 1111 1110 0001 1001*/ x |= x>>4; /* 4 to 8 * 0000 0000 "1111 1111" 1111 1111 1101*/ x |= x>>8; /* 8 to 16 * 0000 0000 "1111 1111 1111 1111" 1111 1111 */ x |= x>>16; /* 16 to 32 * 0000 0000 "1111 1111 1111 1111 1111 1111"*/ return is64_popcnt(x); // 24 ``` It guarantees that there is no zero existing after executing the operation above. ## Test `7` ## Test `8` ## Test `9` ```cpp #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while(v); return u<<shift; } ``` ## Test `10` ```cpp= uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; /*Find the trailing zero both in u and v.*/ int shift = __builtin_ctzll(u |v); u >>= __builtin_ctzll(u); while (v) { v >>= __builtin_ctzll(v); /*The operation of find GCD using Euclidean algorithm*/ if (u < v) { v -= u; } else { uint64_t t = u - v; u = v, v = t; } } return u<<shift; } ``` The method to find the GCD which make me always come up with Euclidean Algorithm, which is a celver of SOP to find the GCD. - Line 2 : Check either u or v is zero or not. - Line 4 : Finding the Greatest common divisor which is power of 2. - Line 5 : Dividing the u to a number which has no common power of 2 divisor with v. - Line 6 : Applying the algorithm of Euclidean. - Line 7 : Always check the v has no common power of 2 divisor with u. - Line 9~14 : The method of Euclidean Algorithm - Line 16 : Before return the common divisor, multipling the common power of 2 divisor. ### Performance analysis I compare the `loop` version and `ctz` version to check the execution time. - `loop` version ```cpp uint64_t gcd64_loop(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); return u<<shift; } ``` - ctz version ```cpp uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift = __builtin_ctzll(u |v); u >>= __builtin_ctzll(u); while (v) { v >>= __builtin_ctzll(v); if (u < v) { v -= u; } else { uint64_t t = u - v; u = v, v = t; } } return u<<shift; } ``` And, u and v are all defined with `(1 << shift)`, which `shift` is from 0 to 63. ($u == v$) ![](https://i.imgur.com/oSyGYFp.png) And, u and v are defined $u >> v , u \propto 1/v$. ![](https://i.imgur.com/UhEksrQ.png) Finally, totally random u and v with rand() function. The performance is slightly faster than the loop version. ![](https://i.imgur.com/pLOxk7y.png) . ```cpp u = rand()%(1<<shift)+1; v = rand()%(1<<shift)+1; ``` ## Test 11 ```cpp int cst_memcmp(const void *m1, const void *m2, size_t n) { const uint8_t *pm1 = (const uint8_t *) m1 + n; const uint8_t *pm2 = (const uint8_t *) m2 + n; int res = 0; if (n) { do { int diff = *--pm1 - *--pm2; res = res | diff; } while (pm1 != m1); } return ((res - 1) >> 8) + (res >> 8) + 1; } ``` ### Principle The constant-time method is to traverse all the content of bianries each time. Instead of returning if finding the different between two binaries content, store the info when it finds the different, finally return the value. - The time complexity is $O(n)$. - Why we can't use a table ? - Because of the cache-misses, it may delay if we found that the part of table is not in cache, so CPU needs to fetch the info from memory. It may make the time non-consistent. - Suppose the probability of cache misses is 0.02 and the clock cycles of waiting for fetching content from memory needs 200 cycles while fetching from cache only needs 2 cycles. - Suppose we want to compare the binary of length of 100. - The clock cycles with cache-misses $100 \ \times 0.02\ \times 200 + 100 \ \times 0.98 \times 2 = 596 \ cycles$ - The clock cycles without cache-misses $100 \ \times 2 = 200 \ cycles$ - It is obvious that the time of `memcmp` by using table is not consistent. - How to validate the correctness of program and constant-time? `Still working on it.` ## Test 12 ```cpp static inline list_t *list_pop(list_t *list) { /* Write your code here */ list_t *prev = list->prev; list_t *this = list; prev->next = list->next; list->next->prev = prev; } ``` ### Principle - `list_init(list_t*)` - Let a empty list point itself which satisfying the basic definition of circular link-list. - `list_push(list_t* list, list_t* entry)` - `*prev` , which points to the end of list. - `*entry` ,which let the new block point to the head and the end of list with `*next` and `*prev`, respectively. - `list_remove(list_t* entry)` - Remove the `*entry` which let the `entry->next->prev` point to the `entry->prev` and `entry->prev->next` point to the `entry->prev`. - `list_pop(list* list)` - The concept is like `list_push` but with contrary direction. ### Linux kernel of doubly link-list [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) - `__list_add()` I found a code which perform the list_add which insert a new entry between two kown consecutive entries. ```cpp static inline void __list_add(struct list_head *new, struct list_head *prev, struct list_head *next) { if (!__list_add_valid(new, prev, next)) return; next->prev = new; new->next = next; new->prev = prev; WRITE_ONCE(prev->next, new); } ``` And some more general function with `__list_add()`. - `list_add()` ```cpp static inline void list_add(struct list_head *new, struct list_head *head) { __list_add(new, head, head->next); } ``` The code above will insert a new entry after the specified head. This is good for implementing stacks. --- - `list_add_tail()` ```cpp static inline void list_add_tail(struct list_head *new, struct list_head *head) { __list_add(new, head->prev, head); } ``` This code will insert a new entry before the specified head. This is usefule for implementing queues. --- - `__list_del()` ```cpp static inline void __list_del(struct list_head * prev, struct list_head * next) { next->prev = prev; WRITE_ONCE(prev->next, next); } ``` The above code will delete a list entry by making the prev/next entries point to each other. --- - `list_replace()` ```cpp tatic inline void list_replace(struct list_head *old, struct list_head *new) { new->next = old->next; new->next->prev = new; new->prev = old->prev; new->prev->next = new; } ``` Replace old entry by new one. --- - `list_swap()` ```cpp static inline void list_swap(struct list_head *entry1, struct list_head *entry2) { struct list_head *pos = entry2->prev; list_del(entry2); list_replace(entry1, entry2); if (pos == entry1) pos = entry2; list_add(entry1, pos); } ``` Replace entry1 with entry2 and re-add entry1 at entry2's position ## Test 13 ```cpp void bit_array_next_permutation(BIT_ARRAY *bitarr) { /* Write your code here */ word_t ini = (word_t)*(bitarr->words); word_t next; while( (next = _next_permutation(*(bitarr->words))) != ini ){ *(bitarr->words) = next; bit_array_print(bitarr, stdout); fprintf(stdout, "\n"); } } ``` ### The application of `bit array` - `Priority queues` - Bit arrays are used for priority queues, where the bit at index k is set if and only if k is in the queue; this data structure is used, for example, by the Linux kernel, and benefits strongly from a find-first-zero operation in hardware. ![](https://i.imgur.com/PRLUXzm.png) - `Boolm filter` - A probabilistic set data structure that can store large sets in a small space in exchange for a small probability of error. It is also possible to build probabilistic hash tables based on bit arrays that accept either false positives or false negatives. ### Linux kenel - `bitmap.h` [/include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) `Still working on it...`