JimmyLiu0530
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2020q3 Homework (quiz2) contributed by <`JimmyLiu530`> ###### tags: `進階電腦系統理論與實作` [題目網址](https://hackmd.io/@sysprog/2020-quiz2#%E6%B8%AC%E9%A9%97-1) :::info 此作業範圍: [數值系統](https://hackmd.io/@sysprog/c-numerics) 及 [bitwise](https://hackmd.io/@sysprog/c-bitwise) 操作 ::: ## 測驗 1: ASCII 字元判斷 欲將以下用來判斷指定的記憶體範圍內是否全是有效的 `ASCII 字元`的函式 ```c= #include <stddef.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; for (int i = 0; i < size; i++) if (str[i] & 0x80) /* i.e. (unsigned) str[i] >= 128 */ return false; return true; } ``` 在64位元的架構下,改寫成**一次比對`64位元`的資料**來提高效率。如下: ```c= #include <stdbool.h> #include <stddef.h> #include <stdint.h> #include <string.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; int i = 0; while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); if (payload & MMM) return false; i += 8; } while (i < size) { if (str[i] & 0x80) return false; i++; } return true; } ``` ### 程式說明 到目前為止,ASCII共定義128個字元(從第0~127號),若用2進位來表示 00000000 ~ 01111111 就是 ASCII 可表示的字元範圍。我們可以觀察出這些有效字元轉成2進位後,MSB 都會是`0`,所以用`& 10000000(0x80)`去對每個字元做檢測,若結果 >0,表示無效位元。 因為我們想要一次比對 `64` 位元(8 bytes),因此原本用來檢測的 `0x80` 要擴展成 `0x8080808080808080`,所以上述程式中,MMM應填入 `0x8080808080808080`。 最後第17~21行用來檢測最後那些`不足 8 bytes` 的部分。 ### 延伸問題 1. `memcpy` 這邊之所以用 `memcpy` 是為了讓一次比對的這 8 bytes 對齊,也就是放在同一個 word 中,好讓 cpu 一次讀取完成。 :::warning NOTE: 當資料來源和欲寫入的目的記憶體區塊有部分重疊的話,應改用 [`memmove`](https://www.tutorialspoint.com/c_standard_library/c_function_memmove.htm) ::: 2. 將上述技巧應用於「給予一個已知長度的字串,檢測裡頭是否包含有效的英文大小寫字母」 ```c= #include <stdbool.h> #include <stddef.h> #include <stdint.h> #include <ctype.h> #include <string.h> #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) bool contain_letters(const char str[], size_t size) { if (size == 0) return false; size_t n = size/8; for (size_t i = 0; i < n; i++, str+=8){ uint64_t *chunk = (uint64_t *) str; if((*chunk & PACKED_BYTE(0x80)) == 0){ /* is valid ASCII */ uint64_t A = *chunk + PACKED_BYTE(128 - 'A'); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' -1); uint64_t uppercase = (A ^ Z) & PACKED_BYTE(0x80); uint64_t a = *chunk + PACKED_BYTE(128 - 'a'); uint64_t z = *chunk + PACKED_BYTE(128 - 'z' -1); uint64_t lowercase = (a ^ z) & PACKED_BYTE(0x80); if ((uppercase | lowercase) & PACKED_BYTE(0x80)) return true; } } n = size % 8; for(size_t i = 0; i < n; i++){ if((str[i] >= 65 && str[i] <= 90) || (str[i] >= 97 && str[i] <= 122) ) return true; } return false; } ``` - 利用測驗5的`PACKED_BYTE` 及 只有大寫字母的 `(A ^ Z) & PACKED_BYTE(0x80)` 這項會有 0x80 的特性去做改寫。 - 新增對小寫字母的判斷 (21~23行) - 若(uppercase | lowercase)其中有包含0x80,則為真 3. 承 (2),考慮常見的英文標點符號,判斷輸入字串是否為有效的字元集,避免逐一字元比對 ```c= // TODO ``` ----------------------------------------------------------- ## 測驗2: 16進位字元轉數值 開發解析器 (parser) 時,常會將十六進位表示的字串 (hexadecimal string) 轉為數值,例如 `'0xF'` (大寫 F 字元) 和 `'0xf'` (小寫 f 字元) 都該轉換為 15。考慮以下不需要分支 (branchless) 的實作: ```c= uint8_t hexchar2val(uint8_t in) { const uint8_t letter = in & MASK; const uint8_t shift = (letter >> AAA) | (letter >> BBB); return (in + shift) & 0xf; } ``` 已知 `in` 一定隸屬於 `'0'`, `'1'`, `'2'`, …, `'9'` 及 `'a'`, `'b'`, `'c'`, …, `'f'` (小寫) 及 `'A'`, `'B'`, `'C'`, …, `'F'` (大寫) 這些字元。預期 `hexchar2val('F')` 和 `hexchar2val('f')` 回傳 `15`,且 `hexchar2val('0')` 回傳 `0`,其他輸入都有對應的數值。 以下摘自 ASCII 表格 * `'0'`, `'1'`, `'2'`, … , `'9'` 對應到 `0x30`, `0x31`, `0x32`, … ,`0x39` * `'a'`, `'b'`, `'c'`, …, `'f'` (小寫) 對應到 `0x61`, `0x62`, `0x63`, …, `0x66` * `'A'`, `'B'`, `'C'`, …, `'F'` 對應到 `0x41`, `0x42`, `0x43`, …, `0x46` ### 程式說明 首先我們可以從變數的命名看出一些端倪,`in & MASK` 的結果為 `letter`,代表這邊想區分出 `數字` 跟 `字母`。再觀察我們會發現數字的 ASCII 都是`0x3X`, a~f小寫字母都是`0x6X`, A~F大寫字母都是`0x4X`,換成2進位表示比較: | in | hex | binary | | ---: | --- | ---- | | 數字 | 0x3X |0**0**11 _ _ _ _ | | 大寫字母 | 0x4X | 0**1**00 _ _ _ _ | | 小寫字母 | 0x6X | 0**1**10 _ _ _ _ | 發現數字跟字母的第4跟第6個 bit 剛好會相反,但為了要表現出 "如果是字母,letter 才會有值",所以用第6個 bit 來區分而不是第4個,因此 `MASK` 取 `01000000` = `0x40`。 接下來假設 `in` 是數字,則 letter = `0x0`,不管 `AAA` 跟 `BBB` 是什麼,shift = `0x0`,最後`(in + shift) & 0xf` 都會得到 `in` 對應的數字,也就是說我們無法從 數字 來推敲出 `AAA` 跟 `BBB`,所以得從 字母 下手。 假設 `in` = 'A' = 0x41 = 0100 _0001_ 轉成數值會是 10 = 0000 _1010_,因為最後一行 ( in + shift ) & 0xf 的 `& 0xf`,所以我們只要看**最低位的4個 bit** 就好。又 0001 (in) + _ _ _ _ = 1010 (輸出數值),很明顯 _ _ _ _會是 1001,所以 `shift` = 0000 **1001**。又因為 `in` 是字母,所以 `letter` = 0100 0000,`shift` 可透過 (letter >> `3`) | (letter >> `6`) 得到,因此 `AAA = 3`, `BBB = 6`。 > 為什麼只透過 `in = 'A'` 的推論就可以直接斷定地說 `AAA = 3`, `BBB = 6`? 難道 `in` 換成 B, C, D, b, c, d也會是一樣的結果嗎? > > 是的! 因為`'A'` 與 `'a'` 最低位的4個 bit 皆為 0001,且若把 `shift` 視為 `in` 跟 `其對應的數值` 最低位的4個 bit 的差值,就會發現 `shift` 是不變的! 舉例來說如果 `in`從 A 換成 C, `in` 的值增加2 (從0x41到0x43),但其實其對應的值也會增加2 (從10到12),所以他們的差值是不變的。 | var | hex | binary | decimal | ---: | --- | ---- | --- | | in(='A') | 0x41 |0100 **0001** | 65 | | (in+shift) & 0xf | 0x0A | 0000 **1010** | 10 | | shift | 0x09 | 0000 ==1001== | 9 | | letter | 0x40 | 0100 0000 | 64 | ### 延伸問題 將 hexchar2val 擴充為允許 "0xFF", "0xCAFEBABE", "0x8BADF00D" 等字串輸入,且輸出對應的十進位數值 ```c= #include <assert.h> uint64_t hexstr2val(const char str[], size_t size) { if(size == 0) return 0; assert(size <= 18); assert(str[0] == '0' && (str[1] == 'x' || str[1] == 'X')); uint64_t output = 0; for (size_t i = 2; i < size; i++){ uint8_t tmp = hexchar2val(str[i]); output = (output << 4) | tmp; } return output; } ``` - 只允許字串長度最多`18 bytes`(含0x)且開頭只能是 `0X` 或 `0x` - 針對字串中的字元去呼叫 `hexchar2val` - 將結果先存入一個 64 bits 的變數,等整個字串轉換完成再輸出 ----------------------------------------------------------- ## 測驗3: 快速mod運算 除法運算的成本往往高於加法和乘法,於是改進的策略被引入,其中一種策略是將除法改為乘法運算。 ```c= const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(XXX) / (D) + 1)) /* compute (n mod d) given precomputed M */ uint32_t quickmod(uint32_t n) { uint64_t quotient = ((__uint128_t) M * n) >> 64; return n - quotient * D; } ``` 其中 `__uint128_t` 是 GCC extension,用以表示 128 位元的無號數值。 預期 quickmod(5) 會得到 2, quickmod(55) 得到 1, quickmod(555) 得到 0 (555 是 3 的倍數)。 ### 程式說明 依據題目給的提示 $\dfrac{n}{D} = n*\dfrac{\dfrac{2^N}{D}}{2^N}$,以及程式第6行可看出 `N = 64` 並且 M = $\dfrac{2^N}{D}$= $\dfrac{2^{64}}{3}$,但因為 $\dfrac{2^{64}}{3}$ 無法用整數表達,因此我們要先估算,並將差值補償回去。選項中最接近 $\dfrac{2^{64}}{3}$ 的就是 $\dfrac{2^{64}-1}{3}+1$,所以分子那項就是`0xFFFFFFFFFFFFFFFF`,因此 `XXX = 0xFFFFFFFFFFFFFFFF`。 接下來我們要驗證其結果的正確性。欲驗證 $\dfrac{n}{D}$ = $\dfrac{M*n}{2^{64}}$,其中 M = $\dfrac{2^{64}-1}{D}+1$。 右式 = $\dfrac{M*n}{2^{64}}$ = $\dfrac{{(\dfrac{2^{64}-1}{D}+1})*n}{2^{64}}$ = $\dfrac{{(2^{64}-1+D})*n}{D*2^{64}}$ = $\dfrac{2^{64}*n-n+D*n}{D*2^{64}}$ = $\dfrac{n}{D}+\dfrac{n(D-1)}{D*2^{64}}$,又因為 n <= $2^{32}-1$ < $2^{64}$,所以$\dfrac{n(D-1)}{D*2^{64}}$ < 1,因此最後結果存入 `quotient` 時,小於 1 的部分會被忽略,就會得到`右式 = 左式`的結果。因此,當我們能得到正確的商數,就能保證`餘數(=n - quotient * D)`正確。 ### 延伸問題 比較透過處理器的整數除法指令及本技巧的效能差距 // TODO ----------------------------------------------------------- ## 測驗4: 是否被指定的除數整除 延伸測驗 3,我們想判斷某個數值能否被指定的除數所整除,在 D 和 M 都沿用的狀況下,程式碼如下: ```c= bool divisible(uint32_t n) { return n * M <= YYY; } ``` ### 程式說明 > 參考`tsengsam` 大大的作法,不僅清楚也易懂! 根據第三題得知 M = $\dfrac{2^{64}-1}{3}+1$,若寫成16進位型式,`M = 0x5555555555555556`。 已知 D = 3,因此我們可以將所有正整數分成三類 - `3k`, `3k+1`, `3k+2`,其中 k 屬於正整數。 當 `n = 3k`, $n*M$ = $3k*M$ = $3k*0x555...56$ = $3k*(0x555...5+1)$ = $k(0xfff...f+3)$ = $2k$ ==(重點!!因為 `0xfff...f` 再加3會 overflow 因此變成 2)== 當 `n = 3k+1`, $n*M$ = $(3k+1)*M$ = $3k*0x555...56 + M$ = $2k+M$ 當 `n = 3k+2`, $n*M$ = $(3k+2)*M$ = $3k*0x555...56 + 2M$ = $2k+2M$ 整除跟不可整除就差在有沒有 M ,所以當`n * M <= M-1`時,n 一定能被 D=3 整除。 :::info Note: 有沒有辦法證明到 對於所有的 D 都適用? > 令 D = k,則 $M = \dfrac{2^{64}-1}{k}+1$,並且可將所有正整數分成 k 類: km, km+1, km+2,...,km+(k-1) > > 當 n = km, $n*M$ = $km*\dfrac{2^{64}-1}{k}+1$ = $m*2^{64}-m+km$,因為 $2^{64}$在64位元的無號整數中會是`0x0`,所以$m*2^{64}-m+km$ = $(k-1)*m$ > 當 n = km+1, $n*M$ = $(km+1)*M$ = $km*M + M$ = $(k-1)m+M$ > 當 n = km+2, $n*M$ = $(km+2)*M$ = $km*M + 2M$ = $(k-1)m+2M$ ... > 當 n = km+(k-1), $n*M$ = $(km+(k-1))*M$ = $km*M + (k-1)M$ = $(k-1)m + (k-1)M$ 因此,由上述證明可見不管 D 帶多少都會成立 ::: ----------------------------------------------------------- ## 測驗5: 字串英文字母全改為小寫 考慮 `strlower` 函式的作用是將指定的字串 (或部分字串) 的英文字母全改為小寫,其 in-place 的實作如下: ```c= #include <ctype.h> #include <stddef.h> /* in-place implementation for converting all characters into lowercase. */ void strlower(char *s, size_t n) { for (size_t j = 0; j < n; j++) { if (s[j] >= 'A' && s[j] <= 'Z') s[j] ^= 1 << 5; else if ((unsigned) s[j] >= '\x7f') /* extended ASCII */ s[j] = tolower(s[j]); } } ``` 在 64 位元處理器架構 (以下針對 `x86_64`, little endian),我們可引入向量化 (vectorized) 的手段,避免一次比對一個字元,而是直接操作一整個 word (對 `x86_64` 來說,就是 64 bits,或 8 個位元組)。沿用上述的 `strlower` 函式,我們用這樣的思路實作向量化的 `strlower`,程式碼列表如下: ```c= #include <ctype.h> #include <stddef.h> #include <stdint.h> #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) /* vectorized implementation for in-place strlower */ void strlower_vector(char *s, size_t n) { size_t k = n / 8; for (size_t i = 0; i < k; i++, s += 8) { uint64_t *chunk = (uint64_t *) s; if ((*chunk & PACKED_BYTE(VV1)) == 0) { /* is ASCII? */ uint64_t A = *chunk + PACKED_BYTE(128 - 'A' + VV2); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' + VV3); uint64_t mask = ((A ^ Z) & PACKED_BYTE(VV4)) >> 2; *chunk ^= mask; } else strlower(s, 8); } k = n % 8; if (k) strlower(s, k); } ``` 對應的測試程式碼如下: ```c= #include <stdio.h> #include <string.h> int main() { /* quote from https://www.gutenberg.org/files/74/74-h/74-h.htm */ char str[] = "This eBook is for the use of anyone anywhere at no cost and with \ almost no restrictions whatsoever. You may copy it, give it away or \ re-use it under the terms of the Project Gutenberg License included \ with this eBook or online at www.gutenberg.net"; int n = strlen(str); strlower_vector(str, n); puts(str); } ``` 參考執行輸出: > this ebook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. you may copy it, give it away or re-use it under the terms of the project gutenberg license included with this ebook or online at `www.gutenberg.net` ### 程式說明 `PACKED_BYTE(b)` 的作用是將傳入的 b, (e.g. 0x80), 擴展成64位元的無號整數 (e.g. 0x8080...80) 程式第13行用來判斷是否為 ASCII 字元,這我們在測驗1已經做過了,所以知道 PACKED_BYTE(VV1) = 0x8080808080808080 => (((uint64_t)(VV1) & (0xff)) * 0x0101010101010101u) = 0x8080808080808080 =>`VV1 = 0x80` 接下來,觀察 `strlower` 的第8行發現相對應的大小寫字母只差 00100000(0x20),因此我們可以推論 `mask = 0x2020...20`,又 `mask = ((A ^ Z) & PACKED_BYTE(VV4)) >> 2;`,所以 (A ^ Z) & PACKED_BYTE(VV4) = 0x8080...80 => `VV4 = 0x80`。此外,根據`mask = ((A ^ Z) & PACKED_BYTE(VV4)) >> 2;`我們也得知 (A ^ Z) = 1 _ _ _ _ _ _ _ 1 _ _ _ _ _ _ _ ... 最後,為了比較好理解,賦予`PACKED_BYTE(128 - 'A' + VV2)` 跟 `PACKED_BYTE(128 - 'Z' + VV3)` 意義。至於為什麼要這麼看,我們後面再解釋。 > 將 `PACKED_BYTE(128 - 'A' + VV2)` 看成 使最小的大寫字母 'A' 與其相加 >= 0x80 的最小數; > 將 `PACKED_BYTE(128 - 'Z' + VV3)` 看成 使最大的大寫字母 'Z' 與其相加 < 0x80 的最大數。 > 因此,`'A'` 在 ASCII 的值為0x41,最小數取 0x3f,因為 0x80 - 0x41 = 0x3f,所以PACKED_BYTE (128 - 'A' + VV2) = 0x3f3f...3f,整理最後可以得到`VV2 = 0`,同理,`VV3 = -1`。 為甚麼會這麼看呢? 根據上述第三段的最後一行,(A ^ Z) = 1 _ _ _ _ _ _ _ 1 _ _ _ _ _ _ _ ... 代表 A 跟 Z 只能有一個長 1 _ _ _ _ _ _ _ 1 _ _ _ _ _ _...,而且一定是 A,因為 A > Z。所以當最小的大寫字母 'A' 與 PACKED_BYTE(128 - 'A') 相加都能達到 0x80了,那麼'B', 'C',...,'Z'一定也可以。 相反的,Z 每兩個 byte 的 MSB 就一定要是0,所以當最大的大寫字母 'Z' 與 PACKED_BYTE(128 - 'Z' -1) 相加都不超過 0x80了,那麼'B', 'C',...,'Z'一定也不超過。 ### 延伸問題 將前述測試程式 main 函式的 `char str[]` 更換為 `char *str`,會發生什麼事?請參閱 C 語言規格書,說明 string literal 的行為 參考 [字串(字元陣列 char str[])與字元指標(char* str)的關係](https://hackmd.io/@karasu/string-and-pointer-in-c) ```c= char str[] = "Hello World"; ``` 與 ```c= char *str = "Hello World"; ``` 皆宣告了 str 字串,在 C-style string 的函數皆可使用,但背後的意義卻是不相同。 前者是個 char array, 含12 bytes(含\0),"Hello World" 對於 str 來說是 initializer,將字元一個一個複製進 str 陣列 而後者是個指向 char 的 pointer,由於 "Hello World" 本身就是一個 string literal (在C99標準[6.4.5] 提到),所以 str 指向這個 string literal 值得一看的例子: 1. 只有char *s可以使用 *s++寫法 ```c= #inlcude <stdio.h> int main(void) { char s[] = "Hello World"; for (int i = 0; i < 11; i++){ printf("%c", *s++); } } ``` 會出現 error ```c= error: lvalue required as increment operand ``` 這是因為雖然 s = &s[0],但 s 是"常數",恆等於 &s[0]。而 char *s 為指標指向 s[0],但卻是變數,可以任意改變,因此可用 *s++來更改 pointer 的值 2. Segment fault ```c= #include <stdio.h> int main(void){ char* str = NULL; gets(str); printf("%s\n", str); return 0; } ``` 會出現 error ```c= error: Segment fault (core dumped) ``` 應改成 `char str[] = NULL` ,或是在第4、5行中加入 `str = (char *)malloc(N*sizeof(char));` 才對 ----------------------------------------------------------- ## 測驗6: LeetCode [137.Single number II](https://leetcode.com/problems/single-number-ii/) > Given a non-empty array of integers, every element appears three times except for one, which appears exactly once. Find that single one. Note: Your algorithm should have a linear runtime complexity. Could you implement it without using extra memory? Example 1: Input: [2,2,3,2] Output: 3 Example 2: Input: [0,1,0,1,0,1,99] Output: 99 考慮以下程式碼為 LeetCode 137. Single Number II 的題解: ```c= int singleNumber(int *nums, int numsSize) { int lower = 0, higher = 0; for (int i = 0; i < numsSize; i++) { lower ^= nums[i]; lower &= KKK; higher ^= nums[i]; higher &= JJJ; } return lower; } ``` ### 程式說明 我們一樣賦予 `lower` 跟 `higher` 意義來幫助理解,`lower` 想像成所有之前出現一次的數的集合;`higher`想像成所有之前出現兩次的數的集合,而且這兩個集合交集為空集合。 一開始,先將兩個集合清空。接著,一個一個去檢查 `nums` 陣列的數字: 1. if (`num[i]` 已經在 lower),則移除 `num[i]`,對應到程式 line 5 > 若此數已在 lower,代表之前已出現過一次,包含這次已第二次,因此接下來它不該出現在 lower 中 => 移除 else if (`num[i]` 不在 higher),則將 `num[i]` 加入至 lower,對應到程式 line 6 > 那如果不在 lower,有兩種可能:(1)這次是**第一次**出現 及(2)這次是**第三次**出現 >既然該數不在上一輪的 higher 中,代表這次是第一次出現,所以要將該數加入 lower 因此,`KKK = ~higher` 2. if (`num[i]` 已經在 higher),則移除 `num[i]`,對應到程式line 7 > 若此數已在 higher,代表此數已出現過兩次,包含這次已第三次,因此接下來它不該出現在 higher 中 => 移除 else if (`num[i]` 不在 lower),則將 `num[i]` 加入至 higher,對應到程式 line 8 > 那如果不在 higher,有兩種可能:(1)這次是**第二次**出現 及(2)這次是**第一次**出現 > 既然該數不在這一輪的 lower 中,代表這次是第二次出現,所以要將該數加入 higher 因此,`JJJ = ~lower` 3. 最後,回傳 lower 集合剩下的數字。 :::info 為甚麼 XOR 可以辦到像是模擬出一個集合的操作? 因為 XOR 的運算特性: 自反: `X ^ Y ^ Y = X` 交換律: `X ^ Y = Y ^ X` 結合律: `(X ^ Y) ^ Z = X ^ (Y ^ Z)` 因此,只要有 XOR 兩次(不用連續)相同的變數時,該變數就會消失,即 X ^ Y ^ Z ^ Y = X ^ Z > 我們 down 到 `bit` 的觀點來看會更加清楚: 若該位元出現過一次 1 則會放在 lower;出現兩次會在 higher,三次則都沒有 | | 初始| 3 | 2 | 3 | 3 | |--|--|---|---|---|---| | binary | ----- | 0011 | 0010 | 0011 | 0011 | | lower |0000| 0011 | 0001 | 0000 | **0010** | | higher |0000| 0000 | 0010 | 0001 | 0000 | > 先拿數字跟 lower 一個 bit 一個 bit 比對,若某字元已經出現過一次(即 lower 中該字元為 1),則這次為第二次,所以 lower 中該 bit 應 clear (即 XOR 操作)。若某字元在 lower 中為 0 ,則需檢查該字元在 higher 中是否為 1,若無,則 lower 中該字元 set(即 & ~higher操作)。 ::: ### 延伸問題 1. 請撰寫不同上述的程式碼,並確保在 LeetCode 137. Single Number II 執行結果正確且不超時 ```c= int singleNumber(int* nums, int numsSize){ int one = 0, two = 0, three = 0; for (int i = 0; i < numsSize; ++i){ two |= one & nums[i]; one ^= nums[i]; three = one & two; one &= ~three; two &= ~three; } return one; } ``` ![](https://i.imgur.com/ahxdHmA.png) 這也是 down 到位元的層級去看,one, two, three 分別代表第 i 位元出現過1, 2, 3次的 mask。 假設現在出現一個數字 nums[i],更新 one 的方法就是 `one ^= nums[i]`,而更新 one 的方法就是用上一個狀態的 one 與 nums[i] 做 and 再跟原本的 two 做 or,所以 two 要比 one 更早更新。至於 three 要如何更新就由 `one & two` 決定,目的在於將已經出現三次的位元 clear。最後 one 剩下的值就是只出現一次的。 2. 延伸題意,將重複 3 次改為改為重複 4 次或 5 次,並找出可容易推廣到多次的程式碼 重複5次 => 需要 $\lceil log_25 \rceil$ = $3$ 個 32-bit integers 作為 counter 重複n次 => 需要 $\lceil log_2n \rceil$ 個 32-bit integers 作為 counter ```c= /* 重複5次 */ int singleNumber(int* nums, int numsSize){ int one = 0, two = 0, three = 0, four = 0; for (int i = 0; i < numsSize; ++i){ three ^= one & two & nums[i]; two ^= one & nums[i]; one ^= num[i]; four = one & two; one &= ~four; two &= ~four; two &= ~four; } return one; } ``` 透過比較此程式碼與上面延伸問題1的程式碼,可以清楚的推廣到重複7次、11次、13次...至於像是重複4次,就可以利用重複2次的程式碼實作,重複6次,能用重複3次的程式碼實作,以此類推。[詳細解說](https://leetcode.com/problems/single-number-ii/discuss/43295/Detailed-explanation-and-generalization-of-the-bitwise-operation-method-for-single-numbers)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully