Wei Cheng
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    1
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2020q3 Homework2 (quiz2) contributed by < [WeiCheng14159](https://github.com/WeiCheng14159) > ###### tags: `sysprog2020` ## Test1 ### 運作原理 ```c=1 while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); if (payload & MMM) return false; i += 8; } ``` * 如果尚未比較的片段大於或等於一個 word (假設 word size = 8 bytes = 64 bits) 則將一次將一個 word 大小的資料拷貝進 payload 變數 * 對於 payload 變數中的每一個 byte ,如果值 >= 128 (0x80) 則不屬於 ascii * 使用 bit-wise 操作,對於每一個 byte 使用 0x80 為 bit mask 做 bit-wise AND 則可以偵測 byte 之值是否大於 128 * 因為有 8 個 byte 故 0x80 重複 8 次,為 0x8080808080808080 * 剩餘不滿一個 word 的資料則一次一個 byte 讀取判斷 * `MMM=0x8080808080808080` ### 延伸問題 * `為何用到 memcpy 呢?` * `給予一個已知長度的字串,檢測裡頭是否包含有效的英文大小寫字母` * 用到第五題的觀念 * [程式碼](https://github.com/supernovaremnant/quiz2/blob/master/test1_ext1.cc) ```c=1 #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) // detect the occurence of english character bool valid_eng(const char s[], size_t n) { for (size_t j = 0; j < n; j++) { if ((s[j] < 'A' || s[j] > 'Z') && (s[j] < 'a' || s[j] > 'z')) return false; } return true; } bool valid_eng_vector(const char s[], size_t n) { size_t k = n / 8; for (size_t i = 0; i < k; i++, s += 8) { uint64_t *chunk = (uint64_t *)s; uint64_t A = *chunk + PACKED_BYTE(128 - 'A' + 0); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' + -1); uint64_t a = *chunk + PACKED_BYTE(128 - 'a' + 0); uint64_t z = *chunk + PACKED_BYTE(128 - 'z' + -1); uint64_t res = (((A ^ Z) | (a ^ z)) & PACKED_BYTE(0x80)); // printf("res = %lx\n", res); if (res != PACKED_BYTE(0x80)) return false; } k = n % 8; if (k) return valid_eng(s, k); return true; } ``` * `考慮常見的英文標點符號,判斷輸入字串是否為有效的字元集,避免逐一字元比對` * 假設英文標點符號包含從 `!` 到 `~` 所有字元 * [程式碼](https://github.com/supernovaremnant/quiz2/blob/master/test1_ext2.cc) ```c=1 // detect the occurence of english character + punctuation bool normal_eng(const char s[], size_t n) { for (size_t j = 0; j < n; j++) { if (s[j] < '!' || s[j] > '~') return false; } return true; } bool normal_eng_vector(const char s[], size_t n) { size_t k = n / 8; for (size_t i = 0; i < k; i++, s += 8) { uint64_t *chunk = (uint64_t *)s; uint64_t lower = *chunk + PACKED_BYTE(128 - '!' + 0); uint64_t upper = *chunk + PACKED_BYTE(128 - '~' + -1); uint64_t res = ((lower ^ upper) & PACKED_BYTE(0x80)); // printf("res = %lx\n", res); if (res != PACKED_BYTE(0x80)) return false; } k = n % 8; if (k) return normal_eng(s, k); return true; } ``` ## Test2 ### 運作原理 ```c=1 uint8_t hexchar2val(uint8_t in) { const uint8_t letter = in & MASK; const uint8_t shift = (letter >> AAA) | (letter >> BBB); return (in + shift) & 0xf; } ``` 列出使用到的 ascii 字元 | char | hex | bin | char | hex | bin | char | hex | bin | |------|------|-----------|------|------|-----------|------|------|-----------| | 0 | 0x30 | 0011_0000 | A | 0x41 | 0100_0001 | a | 0x61 | 0110_0001 | | 1 | 0x31 | 0011_0001 | B | 0x42 | 0100_0010 | b | 0x62 | 0110_0010 | | 2 | 0x32 | 0011_0010 | C | 0x43 | 0100_0011 | c | 0x63 | 0110_0011 | | 3 | 0x33 | 0011_0011 | D | 0x44 | 0100_0100 | d | 0x64 | 0110_0100 | | 4 | 0x34 | 0011_0100 | E | 0x45 | 0100_0101 | e | 0x65 | 0110_0101 | | 5 | 0x35 | 0011_0101 | F | 0x46 | 0100_0110 | f | 0x66 | 0110_0110 | | 6 | 0x36 | 0011_0110 | | | | | | | | 7 | 0x37 | 0011_0111 | | | | | | | | 8 | 0x38 | 0011_1000 | | | | | | | | 9 | 0x39 | 0011_1001 | | | | | | | * 只考慮 `0-9` 的話,直接對最低 4 bits 作 bit mask (也就是最後一行的 `0xf` ),就可以直接得到答案,故推測輸入為數字時,`shift` 之值為零 * 接著考慮字母 `F = 0100_0110` 的最低四個 bits ,加上 shift 之後取最低的 4 bits 要輸出 `15=0000_1111` ,故推測當輸入為 `F = 0100_0110` 或 `f = 0110_0110` 時,shift 之值為 `0000_1001` * shift 之值會因為輸入為數字或字母而有所不同,故推測 MASK 的作用是區別數字跟字母(從變數名稱也看得出來) * 觀察數字跟字母的二元表示法,發現 `MASK` 等於 `0100_0000` 或 `0001_0000` 都可以區分兩者 * 數字的第7個位元都是0,字母的第7個位元都是1 * 數字的第5個位元都是1,字母的第5個位元都是0 * 使用 `MASK = 0001_0000` 需要再多做一次反轉 * 取 `MASK=0100_0000=0x40` 則 `shift >> 3 || shift >> 6` 可以湊出 `shift=0000_1001` * `MASK=0x40, AAA=3, BBB=6` ### 延伸問題 * `將 hexchar2val 擴充為允許 "0xFF", "0xCAFEBABE", "0x8BADF00D" 等字串輸入,且輸出對應的十進位數值` ## Test3 ### 運作原理 ```c=1 const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(XXX) / (D) + 1)) /* compute (n mod d) given precomputed M */ uint32_t quickmod(uint32_t n) { uint64_t quotient = ((__uint128_t) M * n) >> 64; return n - quotient * D; } ``` * n 模除 3 的數學推導如下: \begin{align} n \% 3 &= n - \left \lfloor {\frac{n}{3}} \right\rfloor \times 3 \\ &= n - \left \lfloor n \times \frac{ \frac{2^{64}}{3} }{2^{64}} \right\rfloor \times 3\\ &= n - \left \lfloor \left( n \times \frac{2^{64}}{3} \right) \times \frac{1}{2^{64}} \right \rfloor \times 3\\ &= n - ((n \times M) >> 64) \times 3 \\ \end{align} * 其中 $\frac{2^{64}}{3}$ 對應到 `#define M ((uint64_t)(UINT64_C(XXX) / (D) + 1))` 無條件捨去的除法運算再補一對應到"取上界" , 故 $M = \left\lceil \frac{2^{64}}{3} \right \rceil$,`XXX` 約等於 $2^{64}$ * `XXX = 0xFFFFFFFFFFFFFFFF` ### 延伸問題 * 由 Facebook 公司所維護的 jemalloc 是個高效能的記憶體配置器 (memory allocator,即 malloc 和 free 函式對應的實作),特別在多核多執行緒的環境有優異表現,在其原始程式碼 [include/jemalloc/internal/div.h](https://github.com/jemalloc/jemalloc/blob/dev/include/jemalloc/internal/div.h) 和 [src/div.c](https://github.com/jemalloc/jemalloc/blob/dev/src/div.c) 也運用類似的技巧,請閱讀原始程式碼,並抽離快速除法實作為獨立的程式碼,說明運作原理,也該撰寫測試程式,比較透過處理器的整數除法指令及本技巧的效能差距 * 原理:`jemalloc` 使用的方法跟前一個問題使用的方法相似,已知 $\frac{n}{d} = n \times \frac{\frac{2^k}{d}}{2^k}$ 但是 k 要取多少才能滿足我們要的精確度呢? [src/div.c](https://github.com/jemalloc/jemalloc/blob/dev/src/div.c) 提供了證明 * 假設 $n, d, \frac{n}{d}$ 為整數 \begin{align} \frac{n}{d} &= p, \ p \in Z \\ &= \left \lfloor \left\lceil \frac{2^k}{d}\right\rceil \times \frac{n}{2^k}\right \rfloor , \ \forall k \in N\\ &= \left \lfloor \frac{2^k + r}{d} \times \frac{n}{2^k}\right \rfloor (\because r = - 2^k \mod d)\\ &= \left \lfloor \frac{n}{d} + \frac{r}{d} \times \frac{n}{2^k} \right \rfloor \\ &= \frac{n}{d} + \left \lfloor \frac{r}{d} \times \frac{n}{2^k} \right \rfloor (\because \frac{n}{d} = p \in Z) \\ \end{align} * 證明 $\left\lceil \frac{2^k}{d}\right\rceil = \frac{2^k + r}{d}$ * $\left\lceil \frac{2^k}{d}\right\rceil = \frac{2^k + d-(2^k \mod d)}{d} = \frac{2^k + r}{d} =\frac{2^k + (-2^k \mod d)}{d}$ * $pf: -2^k \mod d= d-(2^k \mod d)$ \begin{align} \Rightarrow & 2^k \mod d = r \\ 2^k &= pd + r \\ -2^k &= (-p)d + -r \\ -2^k &= (-p-1)d + d-r \\ -2^k &= p'd + (d-r) \\ \Rightarrow & (-2^k \mod d ) = d-r\\ \end{align} * $\frac{n}{d} = \frac{n}{d} + \left \lfloor \frac{r}{d} \times \frac{n}{2^k} \right \rfloor$ 成立的條件為 $\frac{r}{d} \times \frac{n}{2^k} < 1$ * 因為 $r = 2^k \mod d$,所以 $\frac{r}{d} <1$ * 如果 $\frac{n}{2^k} < 1$ 則 $\frac{r}{d} \times \frac{n}{2^k} < 1$ 恆成立 * 假設 n 的範圍是 unsigned int 則 k 取 32 可以保證 $\frac{n}{2^k} <1$ * 測試環境: * 使用 lab01 dudect 的時間測量方法 ```c=1 #include <stdint.h> // http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-executio$ static inline int64_t cpucycles(void) { #if defined(__i386__) || defined(__x86_64__) unsigned int hi, lo; __asm__ volatile("rdtsc\n\t" : "=a"(lo), "=d"(hi)); return ((int64_t) lo) | (((int64_t) hi) << 32); #else #error Unsupported Architecture #endif } ``` * 將 jemalloc 的快速除法與普通除法做比較 * 編譯器最佳化程度由 `O0` 到 `O3` 都做測試 * 測量 10000 次畫出時間的散布圖, 發現 jemalloc 的快速除法確實比較快 * 測試結果: * 編譯器旗標 `-O0` * ![](https://i.imgur.com/Dk5PAOJ.png) * 編譯器旗標 `-O1` * ![](https://i.imgur.com/j4nUhde.png) * 編譯器旗標 `-O2` * ![](https://i.imgur.com/EcIwnYM.png) * 編譯器旗標 `-O3` * ![](https://i.imgur.com/MXaXXMq.png) ## Test4 ### 運作原理 ```c=1 const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(XXX) / (D) + 1)) bool divisible(uint32_t n) { return n * M <= YYY; } ``` * 參考 [Faster remainders when the divisor is a constant: beating compilers and libdivide ](https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/) * 因為 \begin{align} \left \lfloor {\frac{n}{3}} \right\rfloor &= (n \times M) >> 64 \\ M &= \left\lceil \frac{2^{64}}{3} \right \rceil \\ &= \left\lfloor \frac{2^{64}}{3} \right\rfloor + 1\\ \end{align} * $(n \times M) >> 64$ 是 $\frac{n}{3}$ 的整數值,$n \times M$ 是 $\frac{n}{3}$ 的小數部份放大 $2^{64}$ 倍(照理來說會被右移 64 bits 後捨棄,但是保留小數部份能夠幫助餘數的運算),$M - 1 = \left\lfloor \frac{2^{64}}{3} \right\rfloor$ 是 $\frac{1}{3}$ 的小數部份放大 $2^{64}$ 倍 (將 n=1 帶入算是即可看出) * 比較 $\frac{n}{3}$ 和 $\frac{1}{3}$ 的小數部份,若 $n \times M \leq M - 1$ 則 n 可以被 3 整除 * `YYY=M-1` ## Test5 ### 運作原理 * byte-by-byte 方法: ```c=1 #include <ctype.h> #include <stddef.h> /* in-place implementation for converting all characters into lowercase. */ void strlower(char *s, size_t n) { for (size_t j = 0; j < n; j++) { if (s[j] >= 'A' && s[j] <= 'Z') s[j] ^= 1 << 5; else if ((unsigned) s[j] >= '\x7f') /* extended ASCII */ s[j] = tolower(s[j]); } } ``` * Vectorized 方法: ```c=1 #include <ctype.h> #include <stddef.h> #include <stdint.h> #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) /* vectorized implementation for in-place strlower */ void strlower_vector(char *s, size_t n) { size_t k = n / 8; for (size_t i = 0; i < k; i++, s += 8) { uint64_t *chunk = (uint64_t *) s; if ((*chunk & PACKED_BYTE(VV1)) == 0) { /* is ASCII? */ uint64_t A = *chunk + PACKED_BYTE(128 - 'A' + VV2); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' + VV3); uint64_t mask = ((A ^ Z) & PACKED_BYTE(VV4)) >> 2; *chunk ^= mask; } else strlower(s, 8); } k = n % 8; if (k) strlower(s, k); } ``` * 延續第一題的想法,判斷一個 byte 是否為 ascii 的方法為對 0x80 做 bit-wise AND 又 `PACKED_BYTE(0x80)` 會產生 `0x8080808080808080` 為針對每一個 byte 的偵測碼 ```c=12 uint64_t A = *chunk + PACKED_BYTE(128 - 'A' + VV2); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' + VV3); uint64_t mask = ((A ^ Z) & PACKED_BYTE(VV4)) >> 2; ``` * `128 - 'A' + VV2` 被用來計算 `'A'` 到 128 的距離,令 `c` 為輸入的一個 byte (已知 c 為 ascii,`c < 128` ) 如果 `c + 128 - 'A' >= 128 (0x80) (1000_0000)` 則 `c - 'A' >= 0, c >= 'A'` 。`c + 128 - 'A'` 的 ==MSB 為 1 則 c >= 'A'== * `128 - 'Z' + VV3` 被用來計算 `'Z'` 到 128 的距離,令 `c` 為輸入的一個 byte (已知 c 為 ascii,`c < 128` ) 如果 `c + 128 - 'Z' >= 129 (0x81) (1000_0001)` 則 c 不是 ascii ,因為這樣不好用 bit-wise 做計算,故我們在不等式的兩邊減一,`c + 128 - 'Z' - 1 >= 128 (0x80) (1000_0000)`,如果 `c + 128 - 'Z' - 1` 的 ==MSB 是 1 則 c > 'Z',MSB 是 0 則 c <= 'Z'== * 綜合以上兩個判斷條件,如果 `c + 128 - 'A'` 的 MSB 為 1 且 `c + 128 - 'Z' - 1` 的 MSB 為 0 則 c 滿足 `c >= 'A' 且 c <= 'Z'` c 是大寫字母 * 故使用 `A^Z` bit-wise XOR 來判斷上述邏輯,若 `c >= 'A' 且 c <= 'Z'` 則 c 被轉換成 `0x80=1000_0000` 反之 `c < 'A' 或 c > 'Z'` 則 c 被轉換成 `0x00=0000_0000` * `VV4=0x80` 被當作 mask 和 `(A^Z)` 做 bit-wise AND,接著 `0x80` 向右 shift 2 bit 變成 `0x20 (0010_0000)` 作為大小寫轉換的 bit flipping mask * `'A'=0x41` 翻轉右邊數來第六個 bit 變成 `'a'=0x61` ,`*chunk` 對 `0x20` 作 bit-wise XOR 將大寫轉換成小寫 * `VV1=0x80, VV2=0, VV3=-1, VV4=0x80` ### 延伸問題 * `將前述測試程式 main 函式的 char str[] 更換為 char *str,會發生什麼事?請參閱 C 語言規格書,說明 string literal 的行為` ## Test 6 ### 運作原理 ```c=1 int singleNumber(int *nums, int numsSize) { int lower = 0, higher = 0; for (int i = 0; i < numsSize; i++) { lower ^= nums[i]; lower &= KKK; higher ^= nums[i]; higher &= JJJ; } return lower; } ``` * 參考這篇 LeetCode 的討論 [Detailed explanation and generalization of the bitwise operation method for single numbers](https://leetcode.com/problems/single-number-ii/discuss/43295/Detailed-explanation-and-generalization-of-the-bitwise-operation-method-for-single-numbers) * 先討論只有一個 bit 的情況 * 假設輸入 i 只有一個 bit * 令 $X = \{x_m, x_{m-1}, ..., x_1\}$ 是一個 m-bit 計數器負責計算 i = 1 的次數 * 令 $K$ 是一個 1-bit mask * 假設 i = {1, 0, 1, 0, 1},依序輸入五個數字,希望 m-bit 計數器能夠計算 i=1 的次數 * i=0 時 `X ^= i` X 對 i 做 bit-wise XOR 後 $X = \{0,0,...,0\}$ 符合我們的預期 * i=1 時 `X ^= i` X 對 i 做 bit-wise XOR 後 $X = \{1,1,...,1\}$ ,但我們想要的計數器應該是 $X = \{0,0,...,1\}$ 還需要加上額外的運算才能符合我們的預期 * 分析 $x_j = 1$ 的條件是 $x_{j-1} = x_{j-2} = ... = x_0 = 1$ 且 $i=1$ ,這時候 $x_j$ 才需要進位,所以得到下面的公式 * $x_1 = i \oplus x_1$ * $x_2 = x_2 \oplus (i \cdot x_1)$ * $x_3 = x_3 \oplus (i \cdot x_1 \cdot x_2)$ * $x_j = x_j \oplus (i \cdot x_1 \cdot x_2 \cdot ... \cdot x_{j-1})$ * 實際使用公式計算輸入 i = {1, 0, 1, 0, 1} 的情形: | | x5 | x4 | x3 | x2 | x1 | |-----|----|----|----|----|----| | i=1 | 0 | 0 | 0 | 0 | 1 | | i=0 | 0 | 0 | 0 | 0 | 1 | | i=1 | 0 | 0 | 0 | 1 | 0 | | i=0 | 0 | 0 | 0 | 1 | 0 | | i=1 | 0 | 0 | 0 | 1 | 1 | * 又 m-bit 計數器會一直數下去,但是我們只在意計數器數到 3 的次數就好,故需要使用 bit mask 來限制計數器 * mask 設計的想法是當計數器等於 3 時,mask 要把計數器的每一個 bit 遮成零,其他的條件下 mask 不會干擾計數器的運作(等於1) * 我們用 T 表示次數 $T = \{0, 0, 0, 1, 1\}$ * 遮罩對應的是 bit-wise AND 的運算 * $x_1 = k_1 \cdot x_1$ * $x_2 = k_2 \cdot x_2$ * $x_3 = k_3 \cdot x_3$ * $x_j = k_j \cdot x_j$ * mask K 的計算方法如下 \begin{align} K &= \neg (y_m \land y_{m-1} \land ... \land y_1) \\ &y_j = \begin{cases} x_j, & t_j = 1 \\ \neg x_j, & t_j = 0 \\ \end{cases} \end{align} * 如果 $T = \{0, 0, 0, 1, 1\}$ 則 \begin{align} K &= \neg (\neg x_5 \land \neg x_4 \land \neg x_3 \land x_2 \land x_1) \\ &= x_5 \lor x_4 \lor x_3 \lor \neg x_2 \lor \neg x_1 \\ \end{align} 代表只有在計數器的值等於 $X=\{0, 0, 0, 1, 1\}$ 時 $K=0$, 計數器對 mask 做 bit-wise AND 之後計數器的值會被歸零,其他的情況下計數器不會被影響 * pseudo code 如下: ```c=1 int nums[5] = {1, 0, 1, 0, 1}; for (int i : nums ) { xm ^= (xm-1 & ... & x1 & i); xm-1 ^= (xm-2 & ... & x1 & i); ..... x1 ^= i; mask = ~(y1 & y2 & ... & ym) where yj = xj if kj = 1, and yj = ~xj if kj = 0 (j = 1 to m) xm &= mask; ...... x1 &= mask; } ``` * 接著我們要從 1-bit 的情況推廣到 32-bit 的情況 * 看到這個題目,直覺的反應是想要用 32 個 32 bit 計數器,計算每一個 bit 中出現 1 的次數,最後比較每一個 bit 中 1 出現的數目就可以知道哪一個數字指出現一次。但是這樣就不能善用 bit-wise operation 的優勢 * 使用一個 2 bit 計數器就能數到 3 次(00, 01, 10, 11),所以希望用 32 個 2 bit 計數器,每一個計數器被用來統計每一個 bit 中 1 出現的數目。 * 又因為 bit-wise operation 對每一個 bit 的操作都是獨立的,所以我們可以把這 32 個 2 bit 計數器分開來,變成 2 個 32 bit 的變數。這兩個變數在我們的程式碼裡頭被命名為 `lower` 和 `higher` 分別表示每一個 bit 的計數器的低位和高位 * 具體的示意圖如下:![](https://i.imgur.com/BZfGbW4.png) * ```graphviz digraph g{ // node node [ shape=rect, fillcolor=skyblue, style=filled ]; dot [label="..."]; // edge (hidden) b32->b31->b30->dot->b3->b2->b1 [style=invis, arrowhead=none]; // cluster_int subgraph cluster_int{ label="32-bit integer"; { rank=same b32 b31 b30 dot b3 b2 b1}; } subgraph counter{ // node node [fillcolor=green, shape=record]; _1_1 [label="1st"]; _2_1 [label="1st"]; _3_1 [label="1st"]; _dot_1 [label="..."]; _30_1 [label="1st"]; _31_1 [label="1st"]; _32_1 [label="1st"]; node [fillcolor=forestgreen, shape=record]; _1_2 [label="2nd"]; _2_2 [label="2nd"]; _3_2 [label="2nd"]; _dot_2 [label="..."]; _30_2 [label="2nd"]; _31_2 [label="2nd"]; _32_2 [label="2nd"]; // edge _1_2->_1_1->b1; _2_2->_2_1->b2; _3_2->_3_1->b3; _dot_2->_dot_1->dot[style=invis]; _30_2->_30_1->b30; _31_2->_31_1->b31; _32_2->_32_1->b32; } // lower subgraph cluster_l { label="lower"; {rank=same _1_1 _2_1 _3_1 _dot_1 _30_1 _31_1 _32_1} } // higher subgraph cluster_h { label="higher"; {rank=same _1_2 _2_2 _3_2 _dot_2 _30_2 _31_2 _32_2} } } ``` * 照上述邏輯的 32-bit 版本的實做程式碼如下 * ```c=1 int singleNumber(int *nums, int numsSize) { int lower = 0, higher = 0, mask = 0; for (int i = 0; i < numsSize; i++) { higher ^= lower & nums[i]; lower ^= nums[i]; mask = ~(lower & higher); higher &= mask; lower &= mask; } return lower; } ``` * `higher` `lower` 分別表示 2-bit 計數器的 lower, higher bit * 但是這和課堂上提供的版本有差異,接下來會推導兩者其實是相同的 * 對於 lower 的推導 ( higher同理 ) \begin{align} mask &= \neg ({ lower } \land { higher }) \\ lower &= lower \land mask \\ &= lower \land \neg ({ lower } \land { higher }) \\ &= lower \land ( \neg lower \lor \neg higher) \\ &= (lower \land \neg lower) \lor (lower \land \neg higher) \\ &= lower \land \neg higher \\ \end{align} * 所以計算 mask 之後再計算 lower 跟 mask 的 bit-wise AND 等價於 lower 對 ~higher 做 bit-wise AND * 所以課堂上的程式碼是精簡之後的版本,只使用到較少的運算 * `KKK=~higher` `JJJ=~lower` ### 延伸問題 * `LeetCode 執行結果正確不超時` ![](https://i.imgur.com/bA4QQZt.png) * `將重複 3 次改為改為重複 4 次或 5 次,並找出可容易推廣到多次的程式碼;` * 題目變更如下:Given a non-empty array of integers, every element appears ==r times== except for one, which appears exactly once. Find that single one. * 推廣到 r 次的[程式碼](https://github.com/supernovaremnant/quiz2/blob/master/test6.cc) 如下 ```c=1 int singleNumber(vector<U_INT> &nums, U_INT r) { // N is the size of counter U_INT N = 32 - __builtin_clz(r + 1); // array of counters vector<U_INT> cnt_arr(N); for (int i = 0; i < nums.size(); i++) { // compute the value of all counter // x_j = x_j ^ (x_j-1 & x_j-2 & ... & x_0 & nums[i]) for (int j = N - 1; j >= 0; j--) { U_INT tmp = nums[i]; for (int z = j - 1; z >= 0; z--) { tmp &= cnt_arr[z]; } cnt_arr[j] ^= tmp; } // compute the value of mask U_INT mask = 0xffffffff; U_INT one = 0x1; for (U_INT j = 0; j < N; j++) { if ((one & r) != 0) mask &= cnt_arr[j]; else mask &= (~cnt_arr[j]); one = one << 1; } mask = ~mask; // bit-wise & mask for (int j = N - 1; j >= 0; j--) { cnt_arr[j] &= mask; } } return cnt_arr[0]; } ```

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully