2019q3 Homework2 (quiz2)

# 2019q3 Homework2 (quiz2) contributed by < `Ting199708` > ### 第 1 題 :::info 考慮下方檔案 4thought.c 是 ACM-ICPC 題目 4 thought 的一個解法，假設程式的輸入符合 4 thought 的描述，請補完程式碼： ```c #include <stdbool.h> #include <stdio.h> enum { opType1 = 0x1 << 0, opType2 = 0x1 << 1, opType3 = 0x1 << 4, opType4 = 0x1 << 5, }; static int operate(int op, int a, int b) { switch (op) { case opType1: return a + b; case opType2: return a - b; case opType3: return a * b; case opType4: return (int) a / b; } return 0; } static char op_to_char(int op) { return "+-*/?"[op - 1]; } static int op_to_prio(int op) { return ((int[]){opType1, opType2, opType3, opType4, -1})[op - 1]; } static int calc(int op1, int op2, int op3) { op1 = op_to_prio(op1); op2 = op_to_prio(op2); op3 = op_to_prio(op3); bool p1 = (op1 & 0x0F) == 0; // = 1 for * or / bool p2 = (op2 & 0x0F) == 0; // else = 0 bool p3 = (op3 & 0x0F) == 0; // (4 + 4 + 4 + 4) or (4 / 4 / 4 / 4) if ((p1 == p2) && (p2 == p3)) return operate(op3, operate(op2, operate(op1, 4, 4), 4), 4); /* Write your code here */ return 0; } int main(void) { int n; scanf("%d", &n); int sol[n]; for (int i = 0; i < n; i++) scanf("%d", &sol[i]); bool validSolution = false; for (int i = 0; i < n; i++) { for (int op1 = 4; op1 > 0; op1--) { for (int op2 = 4; op2 > 0; op2--) { for (int op3 = 4; op3 > 0; op3--) { int sol_checked = calc(op1, op2, op3); if (sol_checked == sol[i]) { validSolution = true; char op1char = op_to_char(op1); char op2char = op_to_char(op2); char op3char = op_to_char(op3); printf("4 %c 4 %c 4 %c 4 = %d\n", op1char, op2char, op3char, sol[i]); op1 = -1; op2 = -1; op3 = -1; break; } } } } if (!validSolution) printf("no solution\n"); validSolution = false; } return 0; } ``` ::: ==分析與想法== 此題目最主要在處理數學四則運算時的先後順序問題，對此，我將計算分為四大類型： ``` 4 op1 4 op2 4 op3 4 type1: (op1->op2->op3) + + + x x x x + + x x + type2: (op3->op1->op2) + + x type3: (op2->op3->op1) + x + + x x type4: (op1->op3->op2) x + x ``` 因此， static int calc() 可實作如下： ```cpp static int calc(int op1, int op2, int op3) { op1 = op_to_prio(op1); op2 = op_to_prio(op2); op3 = op_to_prio(op3); bool p1 = (op1 & 0x0F) == 0; // = 1 for * or / bool p2 = (op2 & 0x0F) == 0; // else = 0 bool p3 = (op3 & 0x0F) == 0; // (4 + 4 + 4 + 4) or (4 / 4 / 4 / 4) if ((p1 == p2) && (p2 == p3)) return operate(op3, operate(op2, operate(op1, 4, 4), 4), 4); if (p1 && !p3) return operate(op3, operate(op2, operate(op1, 4, 4), 4), 4); if (p3 && !p1 & !p2) return operate(op2, operate(op1, operate(op3, 4, 4), 4), 4); if (p2 && !p1) return operate(op1, operate(op3, operate(op2, 4, 4), 4), 4); if (p1 && p3 && !p2) return operate(op2, operate(op1, 4, 4), operate(op3, 4, 4)); return 0; } ``` #### 延伸問題 :::success 1. 解釋程式運作的原理和推敲背後的思路 2. 探討 4 thought 組合出來的數值分佈，並且透過數論解釋 3. 提出得以改善上述程式碼執行效率的方案，著手分析和實作 ::: 1. 此程式透過遍歷所有組合來與輸入數字進行比對，藉此判斷此數字是否可以利用 4 thought 來表示，另外，核心計算部分則用到了數學中的先乘除後加減的概念來進行 case 的分類並計算。 2. 再來討論 4 thought 的數值分布，下表為其所有可能 | 算式 | 結果 | | --- | --- | | 4 + 4 + 4 + 4 | 16 | | 4 * 4 + 4 + 4 | 24 | | 4 + 4 * 4 + 4 | 24 | | 4 + 4 + 4 * 4 | 24 | | 4 * 4 * 4 + 4 | 68 | | 4 * 4 + 4 * 4 | 32 | | 4 + 4 * 4 * 4 | 68 | | 4 * 4 * 4 * 4 | 256 | 分析其結果，有以下幾點發現： (1) 透過 4 thought 運算的結果均為 4 的倍數 (2) 其運算中有多個重複運算的情形發生，如 4 * 4 * 4 + 4 及 4 + 4 * 4 * 4 :::warning 減法運算該如何處理？ :notes: jserv ::: 3. 根據上點分析結果，我認為此程式可以優化如下主程式部分 (其餘部分不變)： ```c int main(void) { int n; scanf("%d", &n); int sol[n]; for (int i = 0; i < n; i++) scanf("%d", &sol[i]); bool validSolution = false; for (int i = 0; i < n; i++) { if ((sol[i] % 4) != 0) { for (int op1 = 4; op1 > 0; op1--) { for (int op2 = 4; op2 > 0; op2--) { for (int op3 = 4; op3 > 0; op3--) { int sol_checked = calc(op1, op2, op3); if (sol_checked == sol[i]) { validSolution = true; char op1char = op_to_char(op1); char op2char = op_to_char(op2); char op3char = op_to_char(op3); printf("4 %c 4 %c 4 %c 4 = %d\n", op1char, op2char, op3char, sol[i]); op1 = -1; op2 = -1; op3 = -1; break; } } } } } if (!validSolution) printf("no solution\n"); validSolution = false; } return 0; } ``` 如此，可大幅減少不必要的迴圈運算 --- ### 第 2 題 :::info 考慮以下程式碼 (fitbits.c) 可檢驗輸入的整數 x 是否可用 n 個位元來表示，例如 (x = 4, n = 9) 要回傳 true, 當 (x = 4, n = 2) 回傳 false。 ```c #include <stdbool.h> bool fit_bits(int x, int n) { /* Write your code here */ return (bool) x; } ``` ::: ==分析與想法== 看到此題我第一個想法是透過算術右移的方法來判斷是否足位，當足位時，則 x >> n = 0 ，否則 x >> n != 0 因此，可實作此程式如下： ```cpp #include <stdbool.h> bool fit_bits(int x, int n) { x = ((x >> n) == 0); return (bool) x; } ``` #### 延伸問題 :::success 在重視資訊安全的專案中，找出類似用法的程式碼，予以解說並進行相關 information leaks 的實驗 ::: --- ### 第 3 題 :::info 考慮以下程式碼 (is-less-equal.c) 可檢驗輸入的整數 x 和 y，是否存在 x<=y 的關係。例如 (x = 4, n = 4) 要回傳 true, 當 (x = 14, n = 9) 回傳 false。 ```c #include <stdbool.h> bool is_leq(int x, int y) { int s; /* Write your code here */ return (bool) s; } ``` ::: ==分析與想法== 此問題要比較出兩數的大小，我的想法是透過其相減後的 sign-bit 來進行判斷，若 $x <= y$，則 (x - y) 的 sign-bit 為 0，否則為 1 但根據題目，此題不能運用 `-` 來做計算，所以要利用二補數的方式來達成相減運算，即 `x + (~y + 1)` 因此，可實作此程式如下： ```cpp #include <stdbool.h> bool is_leq(int x, int y) { int s; s = (x + (~y + 1)) >> 31; s = (s == 0); return (bool) s; } ``` #### 延伸題目 :::success 在重視資訊安全的專案中，找出類似用法的程式碼，予以解說並進行相關 information leaks 的實驗 ::: --- ### 第 4題 :::info 考慮一種針對短字串的最佳化操作，假設字串總是小於等於 8 bytes，標的硬體是像 x86_64 這樣的 64-bit 架構而且是 little endian，於是我們可實作類似以下的程式碼 (ministr.c): ```c #include <stdint.h> #include <stdio.h> #include <string.h> typedef union { uint64_t integer; char array[8]; } mini_str; static unsigned BitScanReverse(uint64_t x) { return 63 - __builtin_clzll(x); } /** * Find the length of the given mini_str. * @param str string to find length of * @return length of the given string */ unsigned mini_strlen(mini_str str) { // Special case for empty string. if (str.integer == 0) return 0; // Otherwise find first non-zero bit (which will be in the first non-zero // byte), and find which byte it is in. // FIXME: Assumes little-endian. unsigned msb = BitScanReverse(str.integer); return msb / 8 + 1; } /** * Create a new mini_str with length 0. * @return newly created mini_str */ mini_str mini_str_new(void) { // Create string of all null bytes. mini_str str = {.integer = 0}; return str; } /** * Append str2 to the end of str1 and return the reult. * @param str1 first string * @param str2 second string * @return combined string */ mini_str mini_strcat(mini_str str1, mini_str str2) { // Shift str2 along by str1Len characters to move it into position. unsigned str1Len = mini_strlen(str1); str2.integer <<= str1Len * 8; // FIXME: Assumes little-endian. /* Write your code here */ return str1; } #define mini_str_to_c(mini_str) ((const char *) (mini_str).array) #define mini_str_to_cNoConst(mini_str) ((char *) (mini_str).array) /** * Create a mini_str from a standard C character array. * @param cstr Null-terminated C-string to use as input * @return newly created mini_str */ mini_str mini_str_from_c(const char *cstr) { // Create empty string. mini_str mini_str = mini_str_new(); // Copy string. strncpy(mini_str_to_cNoConst(mini_str), cstr, 7); return mini_str; } int main(int argc, char **argv) { mini_str all = mini_str_from_c("All "); mini_str red = mini_str_from_c("red"); mini_str cat = mini_strcat(all, red); printf("%s\n", mini_str_to_c(cat)); return 0; } ``` ::: ==分析與想法== 首先分析 main() 主程式，透過了`mini_str_from_c` 將 "All " 及 "red"，轉換成了 mini_str 的形式因為程式的目標為輸出 All red，即是將此二字串相加起來，因此我們可以利用將一字串的 integer ，左移再加上另一字串的 integer 達成目的實作程式如下： ```c mini_str mini_strcat(mini_str str1, mini_str str2) { // Shift str2 along by str1Len characters to move it into position. unsigned str1Len = mini_strlen(str1); str2.integer <<= str1Len * 8; // FIXME: Assumes little-endian. str1.integer += str2.integer; return str1; } ``` #### 延伸題目 :::success 1. 指出這樣針對短字串的最佳化效益，並嘗試量化 2. 什麼樣的情境會出現大量的短字串？請舉例並分析 3. 程式碼該如何修改，才能適用 big/little-endian 呢？ 4. 考慮到現代的處理器架構支援 SIMD，得以一次處理 128-bit, 256-bit, 甚至是 512-bit，請評估這樣最佳化策略的可用性，應當有對應的實驗 ::: 2. 機場編碼、國碼域名...等皆用到了大量的短字串，利用這些短字串我們可以利用其可以轉換成數字表示來更方便的在程式中進行處理 --- ### 第 5 題 :::info population count 簡稱 popcount 或叫 sideways sum，是計算數值的二進位表示中，有多少位元是 1，在一些場合下很有用，例如計算 0-1 稀疏矩陣 (sparse matrix)或 bit array 中非 0 元素個數、計算兩個字串的 Hamming distance。Intel 在 2008 年 11 月 Nehalem 架構的處理器 Core i7 引入 SSE4.2 指令集，其中就有 CRC32 和 POPCNT 指令，POPCNT 可處理 16-bit, 32-bit, 64-bit 整數。 GCC 提供對應的 builtin function: __builtin_popcount(x): x 總共有幾個 1。使用示範： ```c int x = 5328; // 00000000000000000001010011010000 printf("%d\n", __builtin_popcount(x)); // 5 ``` 以下是個存在實作缺陷的版本： ```c int popcnt_naive(int n) { int count = 0; while (n) { if (n & 1) ++count; n = n >> 1; } return count; } ``` 呼叫 `popcnt_naive(-1)` 時，會造成無窮迴圈，請指出錯誤所在，並且重寫為正確的版本。 ::: ==分析與想法== 首先分析此題若 n = -1 時造成無限迴圈的錯誤情形。上述程式利用與 1 做 AND 運算再右移的方式來計算 1 的 bit 數，且停止條件為 n=0，此方法在正數時正常運作，但一旦 n<0 ，因為其 sign-bit 為 1 ，因此當右移完所有 bit 後， n 的數值為 -1，而非預期的 0，要解決此錯誤，可將程式改寫如下： ```cpp int popcnt_naive(int n) { int count = 0; while (n) { if (n & 1) ++count; n = n >> 1; n &= 0x7FFFFFFF; } return count; } ``` 即將 n 右移後強制將最高位元設為 0 ，如此便可避免上述情形發生。 --- ### 第 6 題 :::info 延伸測驗 5，實作 branchless 的 popcnt 並附上對應的程式原理解說。 ::: ==分析與想法== 若前題程式要使用 branchless 來實作，可利用以下方法： ```c int popcnt_naive(int n) { int count = 0; while (n) { n = (n & (n-1)); count ++; } return count; } ``` 分析上述程式，每當進入一次 while 迴圈， n & (n-1) 會將 n 最右邊的 1 清除，直到 n=0 跳出迴圈，因此便可計算出 n 當中 1 的 bit 數。以下試舉一範例 ``` 設 n = 5 = 0000 0101 第一次進入迴圈後 n = (n & (n-1)) = 00000101 & 00000110 = 00000100，可以發現最右邊的 1 被巧妙地清除了，且 count = 1 。第二次進入迴圈， n = 00000011 & 00000100 = 00000000 ， count 被設為 2 因為 n = 0，因此跳出迴圈，回傳 count 結束此程式 ``` #### 延伸題目 :::success 1. 指出 `popcnt` 的應用場景; 2. 在 Linux 核心程式碼找出具體用法並解析; ::: --- ### 第 7 題 :::info 考慮到以下程式 (alloc.c) 是 aligned_alloc 的一種簡易實作: ```c #include <stdlib.h> // Number of bytes used for storing the aligned pointer offset. // up to 64KB alignment, a size which is already unlikely to be // used for alignment. typedef uint16_t offset_t; #define PTR_OFFSET_SIZE sizeof(offset_t) #define align_up(num, align) \ (((num) + ((align) - 1)) & ~((align) - 1)) void *aligned_malloc(size_t align, size_t size) { void *ptr = NULL; // size must be a power of two. if (!((align & (align - 1)) == 0)) return ptr; if (align && size) { // allocate extra bytes to meet the alignment uint32_t header_size = PTR_OFFSET_SIZE + (align - 1); void *p = malloc(size + header_size); /* Write your code here */ } return ptr; } ``` 其作用是配置針對 align 個 bytes 對齊的記憶體空間，可對照閱讀 Introduction & Allocators 以掌握原理。你應該要實作 aligned_malloc 函式中標註 /* Write your code here */ 之後的程式碼。除了撰寫程式，你應該提供對應的程式碼註解。注意: 輸入的 align 應該要是 2N (power of 2)，否則就回傳 NULL。 ::: #### 延伸題目 :::success 1. 解釋程式運作的原理和推敲背後的思路 2. 在開放原始碼的專案中，找尋類似的程式碼，解說並量化具體效益 ::: --- ### 第 8 題 :::info 延伸測驗 7，實作 `aligned_free`，其原型宣告如下: ```c void aligned_free(void *ptr); ``` ::: ==分析與想法== 本程式實作如下： ```c void aligned_free(void *ptr) { free(ptr); } ``` 當中的 free 函式在 C99 規格書 (P.313-314) 解釋如下： > The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by the calloc, malloc, or realloc function, or if the space has been deallocated by a call to free or realloc, the behavior is undefined. --- ### 第 9 題 :::info 考慮以下 64-bit GCD (greatest common divisor, 最大公因數) 求值函式: ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; while (v) { uint64_t t = v; v = u % v; u = t; } return u; } ``` 改寫為以下等價實作: ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (/* Write your code here */); return /* Write your code here */; } ``` ::: ==分析與想法== 在解這題時我是透過帶入數字來觀察程式的運行，最後發現上下兩程式所運用的概念其實非常相似，但是實作方法不同首先，透過下方這個迴圈可將 u 和 v 的 2 的次冪的因數找出來方便後方運算 ```c for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } ``` 再來，透過減法的方式，找出 u 及 v 其他的因數，當 v=0 時，即可找出兩者剩餘的公因數並存在 u 中，最後，透過算術位移的方法將第一步找出的 2 的次冪之因數項乘回結果實作如下： ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v != 0); return (u << shift); } ``` --- ### 第 10 題 :::info 承測驗 9, 透過 gcc 內建的 __builtin_ctz (Returns the number of trailing 0-bits in x, starting at the least significant bit position) 改寫程式碼如下: ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift = __builtin_ctzll(/* Write your code here */); u >>= __builtin_ctzll(u); while (v) { v >>= __builtin_ctzll(v); if (u < v) { /* Write your code here */ } else { uint64_t t = u - v; u = v, v = t; } } return /* Write your code here */; } ``` ::: ==分析與想法== 此題用的概念跟上題很像，只是取得 shift 的方式利用了 __builtin_ctz 函式，因此可以實作程式碼如下： ```cpp #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift = __builtin_ctzll(u | v); u >>= __builtin_ctzll(u); while (v) { v >>= __builtin_ctzll(v); if (u < v) { v-=u; } else { uint64_t t = u - v; u = v, v = t; } } return (u << shift); } ``` #### 延伸問題 :::success 解釋上述程式程式運作原理，以及在 x86_64 上透過 __builtin_ctz 改寫 GCD 對效能的提升 ::: 透過了 gcc 內建的 __builtin_ctz 函式可以減少迴圈的利用，所以可以加速在計算 GCD 時的效率。 --- ### 第 11 題 :::info 考慮到 memcmp 一種實作如下: (行為和 ISO C90 有出入) ```c #include <stdint.h> #include <stddef.h> int memcmp(const uint8_t *m1, const uint8_t *m1, size_t n) { for (size_t i = 0; i < n; ++i ) { int diff = m1[i] - m2[i]; if (diff != 0) return (diff < 0) ? -1 : +1; } return 0; } ``` 我們可能因此承受 information leakage 的風險，於是著手避免使用 conditional branching 一類的指令，從而避免 side-channel attack。為了避免 conditional branch 指令的出現，我們可將 (res > 0) - (res < 0) 替換為 ((res - 1) >> 8) + (res >> 8) + 1。隨後我們實作下方功能等價但避免 branch 的 cst_memcmp: ```c #include <stdint.h> #include <stddef.h> int cst_memcmp(const void *m1, const void *m2, size_t n) { const uint8_t *pm1 = (const uint8_t *) m1 + n; const uint8_t *pm2 = (const uint8_t *) m2 + n; int res = 0; if (n) { do { int diff = *--pm1 - *--pm2; /* Write your code here */ } while (pm1 != m1); } return ((res - 1) >> 8) + (res >> 8) + 1; } ``` ::: #### 延伸問題 :::success 1. 解釋上述程式的原理，需要從機率統計的觀點分析; * 為何不能用事先計算好的表格呢？ (提示: cache 的影響) * 如何驗證程式正確以及 constant-time 呢？ 2. 在 Linux 核心找到這類 constant-time 的操作程式碼，予以解說和設計實驗; ::: --- ### 第 12 題 :::info 給定一個 circular linked list 實作如下: (檔案 list.h) ```c typedef struct __list_t { struct __list_t *prev, *next; } list_t; /* * Initialize a list to empty. Because these are circular lists, an "empty" * list is an entry where both links point to itself. This makes insertion * and removal simpler because they do not need any branches. */ static inline void list_init(list_t *list) { list->prev = list; list->next = list; } /* * Append the provided entry to the end of the list. This assumes the entry * is not in a list already because it overwrites the linked list pointers. */ static inline void list_push(list_t *list, list_t *entry) { list_t *prev = list->prev; entry->prev = prev; entry->next = list; prev->next = entry; list->prev = entry; } /* * Remove the provided entry from whichever list it is currently in. This * assumes that the entry is in a list. You do not need to provide the list * because the lists are circular, so the list's pointers will automatically * be updated if the first or last entries are removed. */ static inline void list_remove(list_t *entry) { list_t *prev = entry->prev; list_t *next = entry->next; prev->next = next; next->prev = prev; } /* * Remove and return the first entry in the list or NULL if the list is empty. */ static inline list_t *list_pop(list_t *list) { /* Write your code here */ } ``` 請依循程式註解的描述，參照 list_push, 實作可正確運作的 list_pop。作答時需要一併包含原本函式內容。除了撰寫程式，你應該提供對應的程式碼註解。注意: 應善用 list_remove 和已實作的函式。 ::: #### 延伸問題 :::success 1. 解釋上述程式的原理和技巧; 2. 在 Linux 核心找到這類的操作程式碼; ::: --- ### 第 13 題 #### 題目 :::info 考慮一個 bit array 的實作 (bit-array.c) 如下: ```c #include <errno.h> #include <inttypes.h> #include <stdarg.h> #include <stdio.h> #include <stdlib.h> #define MAX(a, b) (((a) >= (b)) ? (a) : (b)) #define trailing_zeros(x) \ ((x) ? (__typeof(x)) __builtin_ctzll(x) : (__typeof(x)) sizeof(x) * 8) #define leading_zeros(x) \ ((x) ? (__typeof(x)) __builtin_clzll(x) : (__typeof(x)) sizeof(x) * 8) // 64 bit words typedef uint64_t word_t, word_addr_t, bit_index_t; typedef struct { word_t *words; bit_index_t num_of_bits; // Number of words used -- this is just round_up(num_of_bits / 64) // if num_of_bits == 0, this is 0 word_addr_t num_of_words; // For more efficient allocation we use realloc only to double size -- // not for adding every word. Initial size is INIT_CAPACITY_WORDS. word_addr_t capacity_in_words; } BIT_ARRAY; #define roundup_bits2words64(bits) (((bits) + 63) / 64) // Round a number up to the nearest number that is a power of two #define roundup2pow(x) (1UL << (64 - leading_zeros(x))) // Bit array (bitset) #define _TYPESHIFT(arr, word, shift) \ ((__typeof(*(arr)))((__typeof(*(arr)))(word) << (shift))) #define bitsetX_wrd(wrdbits, pos) ((pos) / (wrdbits)) #define bitsetX_idx(wrdbits, pos) ((pos) % (wrdbits)) #define bitset_wrd(arr, pos) bitsetX_wrd(sizeof(*(arr)) * 8, pos) #define bitset_idx(arr, pos) bitsetX_idx(sizeof(*(arr)) * 8, pos) #define bitset64_wrd(pos) ((pos) >> 6) #define bitset64_idx(pos) ((pos) &63) #define bitmask(nbits, type) \ ((nbits) ? ~(type) 0 >> (sizeof(type) * 8 - (nbits)) : (type) 0) #define bitmask32(nbits) bitmask(nbits, uint32_t) #define bitmask64(nbits) bitmask(nbits, uint64_t) #define bitset2_get(arr, wrd, idx) (((arr)[wrd] >> (idx)) & 0x1) #define bitset2_set(arr, wrd, idx) ((arr)[wrd] |= _TYPESHIFT(arr, 1, idx)) #define bitset_op(func, arr, pos) \ func(arr, bitset_wrd(arr, pos), bitset_idx(arr, pos)) #define bitset_op2(func, arr, pos, bit) \ func(arr, bitset_wrd(arr, pos), bitset_idx(arr, pos), bit) #define bitset_get(arr, pos) bitset_op(bitset2_get, arr, pos) #define bitset_set(arr, pos) bitset_op(bitset2_set, arr, pos) #define bit_array_get(arr, i) bitset_get((arr)->words, i) #define bit_array_set(arr, i) bitset_set((arr)->words, i) #define POPCOUNT(x) (unsigned) __builtin_popcountll(x) // word of all 1s #define WORD_MAX (~(word_t) 0) // If cannot allocate memory, set errno to ENOMEM, return NULL BIT_ARRAY *bit_array_alloc(BIT_ARRAY *bitarr, bit_index_t nbits) { bitarr->num_of_bits = nbits; bitarr->num_of_words = roundup_bits2words64(nbits); bitarr->capacity_in_words = MAX(8, roundup2pow(bitarr->num_of_words)); bitarr->words = (word_t *) calloc(bitarr->capacity_in_words, sizeof(word_t)); if (bitarr->words == NULL) { errno = ENOMEM; return NULL; } return bitarr; } // If cannot allocate memory, set errno to ENOMEM, return NULL BIT_ARRAY *bit_array_create(bit_index_t nbits) { BIT_ARRAY *bitarr = (BIT_ARRAY *) malloc(sizeof(BIT_ARRAY)); if (bitarr == NULL || bit_array_alloc(bitarr, nbits) == NULL) { if (bitarr != NULL) free(bitarr); errno = ENOMEM; return NULL; } return bitarr; } // Print this array to a file stream. Prints '0's and '1'. Doesn't print // newline. void bit_array_print(const BIT_ARRAY *bitarr, FILE *fout) { for (bit_index_t i = 0; i < bitarr->num_of_bits; i++) fprintf(fout, "%c", bit_array_get(bitarr, i) ? '1' : '0'); } // set a bit (to 1) at position b void bit_array_set_bit(BIT_ARRAY *bitarr, bit_index_t b) { bit_array_set(bitarr, b); } // Set multiple bits at once. // e.g. set bits 1, 20 & 31: bit_array_set_bits(bitarr, 3, 1,20,31); void bit_array_set_bits(BIT_ARRAY *bitarr, size_t n, ...) { va_list argptr; va_start(argptr, n); for (size_t i = 0; i < n; i++) { unsigned int bit_index = va_arg(argptr, unsigned int); bit_array_set_bit(bitarr, bit_index); } va_end(argptr); } static word_t _next_permutation(word_t v) { // http://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation word_t t = v | (v - 1); // t gets v's least significant 0 bits set to 1 // Next set to 1 the most significant bit to change, // set to 0 the least significant ones, and add the necessary 1 bits. return (t + 1) | (((~t & (t + 1)) - 1) >> (trailing_zeros(v) + 1)); } // Get the next permutation of an array with a fixed size and given number of // bits set. Also known as next lexicographic permutation. // Given a bit array find the next lexicographic orginisation of the bits // Number of possible combinations given by (size choose bits_set) i.e. nCk // 00011 -> 00101 -> 00110 -> 01001 -> 01010 -> // 01100 -> 10001 -> 10010 -> 10100 -> 11000 -> 00011 (back to start) void bit_array_next_permutation(BIT_ARRAY *bitarr) { /* Write your code here */ } int main(void) { BIT_ARRAY *bitarr = bit_array_create(10); bit_array_print(bitarr, stdout); fputc('\n', stdout); bit_array_set_bits(bitarr, 3, 1, 2, 5); bit_array_print(bitarr, stdout); fputc('\n', stdout); return 0; } ``` 其中函式 bit_array_next_permutation 可將指定的 bit array 所有排列組合予以列舉 (nCk)，請依據程式碼註解，提供對應的實作，並且標注必要的註解。 ::: #### 延伸問題 :::success 1. 解釋 bit array 的應用場景 2. 在 Linux 核心找到這類的操作程式碼，並予以解釋及分析 3. 提供改善 bit array 效率的機制並評估 ::: ---