2023q1 Homework2 (quiz2)

--- tags: linux kernel --- # 2023q1 Homework2 (quiz2) contributed by < [hankTaro](https://github.com/hankTaro/quiz2/tree/master) > ## 測驗一 ### 說明與改寫考慮 next_pow2 可針對給定無號 64 位元數值 x，找出最接近且大於等於 2 的冪的值原始程式碼為: ```c uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> AAAA;// AAAA = 8 x |= x >> 16; x |= x >> BBBB;// BBBB = 32 return CCCC;// x + 1 } ``` 由二進位個概念可知，2 的冪必為最高位元後的所有位元皆為 0 所以其程式碼的作用是將最高位元數下的各個 bit 改為 1 ，最後在 `x + 1` 使其進位，便可以得到大於其值最接近的2的冪可將上方的程式碼簡化成 ```c uint64_t next_pow2(uint64_t x) { //x // x = 0010000000000000 x |= x >> 1; // x = 0011000000000000 x |= x >> 2; // x = 0011110000000000 x |= x >> 4; // x = 0011111111000000 x |= x >> 8; // x = 0011111111111111 x |= x >> 16; x |= x >> 32; return x + 1;// x = 0100000000000000 = 2^14 } ``` 而以上函式還可以使用 `__builtin_clz(x)` 簡化 > `int __builtin_clz (unsigned int x)` Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. 此函式可以求出 leading 1 前的 0 個數所以使用此函式的程式碼如下: ```c uint64_t next_pow2(uint64_t x) { int n = __builtin_clz(x); return 1 << (64 - n); } ``` 但上述的程式碼有兩個問題 1. 當 x 為 0 時，n 會未定義 > int \__builtin_clz (unsigned int x): Returns the number of leading 0-bits in x, starting at the most significant bit position. ==If x is 0, the result is undefined.== 2. 當輸入的值為 $2^n$ 時，會求得 $2^{n+1}$ 而非 $2^n$ > 要求: > 找出最接近且大於等於 2 的冪的值以下為更改過後的程式碼: ```c uint64_t next_pow2(uint64_t x) { if (!x) return 1; int n = __builtin_clz(x); // 判別 x 在二進位中是否只有一個位元為 1 if (n + __builtin_ctz(x) == 63) return x; return 1 << (64 - n); } ``` ### 編譯器能否產生 clz 的 x86 指令？可以發現 `bsr` 指令，編譯器有產生對應指令 > [endbr64](https://stackoverflow.com/questions/56905811/what-does-the-endbr64-instruction-actually-do) ``` next_pow2: .LFB23: .cfi_startproc endbr64 movl $1, %eax testq %rdi, %rdi je .L1 bsrl %edi, %edi movl $64, %ecx xorl $31, %edi subl %edi, %ecx sall %cl, %eax cltq .L1: ret .cfi_endproc ``` ### 在 Linux 核心原始程式碼找出類似的使用案例並解釋此程式碼用於求出 [Hamming Weight](https://en.wikipedia.org/wiki/Hamming_weight) ```c //types and constants used in the functions below //uint64_t is an unsigned 64-bit integer variable type (defined in C99 version of C language) const uint64_t m1 = 0x5555555555555555; //binary: 0101... const uint64_t m2 = 0x3333333333333333; //binary: 00110011.. const uint64_t m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones ... const uint64_t m8 = 0x00ff00ff00ff00ff; //binary: 8 zeros, 8 ones ... const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ... const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3... //This is a naive implementation, shown for comparison, //and to help in understanding the better functions. //This algorithm uses 24 arithmetic operations (shift, add, and). int popcount64a(uint64_t x) { x = (x & m1 ) + ((x >> 1) & m1 ); //put count of each 2 bits into those 2 bits x = (x & m2 ) + ((x >> 2) & m2 ); //put count of each 4 bits into those 4 bits x = (x & m4 ) + ((x >> 4) & m4 ); //put count of each 8 bits into those 8 bits x = (x & m8 ) + ((x >> 8) & m8 ); //put count of each 16 bits into those 16 bits x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits return x; } ``` 其概念是先使用 `& 0x5555` 求出其每隔一位的 1 的數量，使用兩個 bit 保存，將兩者數量相加後，隨後用 `& 0x3333` 求出其每隔兩位的 1 的數量，使用四個 bit 保存，將兩者數量相加後，如此循環，直到最終無法再合併 ```graphviz digraph graphname { // label="divide" { node[shape=none,label="x"]; i1 node[shape=none,label="a = (x >> 0) & 0x5555"]; i2 node[shape=none,label="b = (x >> 1) & 0x5555"]; i3 node[shape=none,label="c = a + b"]; i4 node[shape=none,label="d = (c >> 0) & 0x3333"]; i5 node[shape=none,label="e = (c >> 2) & 0x3333"]; i6 node[shape=none,label="f = d + e"]; i7 node[shape=none,label="",style=invis]; i } list_1[label="<f0>01|<f1>10|<f2>11|<f3>00|<f4>10|<f5>11|<f6>10|<f7>10",shape=record, fixedsize=false,width=6] list_2[label="<f0>01|<f1>00|<f2>01|<f3>00|<f4>00|<f5>01|<f6>00|<f7>00",shape=record, fixedsize=false,width=6] list_3[label="<f0>00|<f1>01|<f2>01|<f3>00|<f4>01|<f5>01|<f6>01|<f7>01",shape=record, fixedsize=false,width=6] list_4[label="<f0>01|<f1>01|<f2>10|<f3>00|<f4>01|<f5>10|<f6>01|<f7>01",shape=record, fixedsize=false,width=6] list_5[label="<f0>0001|<f1>0000|<f2>0010|<f3>0001",shape=record, fixedsize=false,width=6] list_6[label="<f0>0001|<f1>0010|<f2>0001|<f3>0001",shape=record, fixedsize=false,width=6] list_7[label="<f0>0011|<f1>0010|<f2>0011|<f3>0010",shape=record, fixedsize=false,width=6] list_1:f0->list_2:f0[style=invis] list_2:f0->list_3:f0[style=invis] list_3:f0->list_4:f0[style=invis] list_4:f0->list_5:f0[style=invis] list_4:f7->list_5:f3[style=invis] list_5:f0->list_6:f0[style=invis] list_6:f0->list_7:f0[style=invis] // list_6:f3->list_7:f1[style=invis] i1->i2[style=invis] i2->i3[style=invis] i3->i4[style=invis] i4->i5[style=invis] i5->i6[style=invis] i6->i7[style=invis] } ``` 而在 Liunx 核心中將其簡化並整理成以下程式碼 ```c unsigned int __sw_hweight16(unsigned int w) { unsigned int res = w - ((w >> 1) & 0x5555); res = (res & 0x3333) + ((res >> 2) & 0x3333); res = (res + (res >> 4)) & 0x0F0F; return (res + (res >> 8)) & 0x00FF; } EXPORT_SYMBOL(__sw_hweight16); ``` :::warning TODO: 搭配 Linux 核心的 `net/mac80211` 和 `net/wireless` 來解釋 Hamming Weight 的使用 :notes: jserv ::: --- ## 測驗二 ### 說明 ```c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ /* use long here as it potentially could overflow for int */ long ans = 0; for (int i = 1; i <= n; i++) { // removing the rightmost set bit // e.g. 100100 -> 100000 // 000001 -> 000000 // 000000 -> 000000 // after removal, if it is 0, then it means it is power of 2 // as all power of 2 only contains 1 set bit // if it is power of 2, we increase the bit length if (!(DDDD)) len++; ans = (i | (EEEE)) % M; } return ans; } ``` 此程式碼用於將 1~n 的十進位數值轉為二進位，並且將個別對應的二進位做串接，求出二進位字串 mod $10^9+7$ 的值在 for 迴圈中要將 1~n 的值放入二進位字串，但因為不同數值 leading 1 的位置不同，所需 shift 的量也不同，所以這裡利用 `x & (x - 1) == 0` 進行判別，並用 `len` 紀錄 shift 所需的量 * C 語言中，`x & (x - 1) == 0` 的數學意義為 * power of two * signed v.s. unsigned 在二進位中，每當增加至 2 的冪，leading 1 就會左移一格，因此每當值為 power of two 時，所需的 shift 量就要增加 ```c for (int i = 1; i <= n; i++) { if (!(i & (i - 1) == 0)) len++; ans = (i | (ans<<len)) % M; } ``` ### 嘗試使用 `__builtin_{clz,ctz,ffs}` 改寫當 n 為 power of 2 時，MSB 到第一個 1 之前 0 的個數，加上 LSB 到第一個 1 之後 0 的個數，對剛好有 63 個 `0` ，也就是 `__builtin_clz(n) + __builtin_ctz(n) == 63` ```c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ /* use long here as it potentially could overflow for int */ long ans = 0; for (int i = 1; i <= n; i++) { if (__builtin_clz(n) + __builtin_ctz(n) == 63) len++; ans = (i | (ans<<len)) % M; } return ans; } ``` ### (待辦) 改進 mod $10^9+7$的運算由於當 len 大於一定程度後，會不停的對 `ans` 乘以 2^len^ >想法: >如果 N 是 $2^4-1$ 的話，那 1 將會被左移$4\times2^3+3\times2^2+2\times2^1$個bit :::warning 參照 [Barrett reduction](https://en.wikipedia.org/wiki/Barrett_reduction) :notes: jserv ::: --- ## 測驗三 ### 說明 > continuation bytes: > Continuation bytes have two most significant bits set to 10. > 在 UTF-8 中，字元可由 1, 2, 3, 4 個位元組構成，由 leading byte 與 continuation bytes 構成 > 其中 110x.xxxx， 1110.xxxx， 1111.0xxx 就是 leading byte > 而 10xx.xxxx 全是 continuation bytes >UTF-8: >有由單個 byte 組成的字元集(與 ASCII 一致) >以及2,3,4 byte 組成的字元集 >用於涵蓋世界上的多種字元 > size_t: > 在32位架構中定義為：typedef unsigned int size_t； > 在64位架構中被定義為：typedef unsigned long size_t； | ASCII | 0xxx.xxxx | | ------- |:--------------------------------------- | | 2 bytes | 110x.xxxx 10xx.xxxx | | 3 bytes | 1110.xxxx 10xx.xxxx 10xx.xxxx | | 4 bytes | 1111.0xxx 10xx.xxxx 10xx.xxxx 10xx.xxxx | 以下的程式是用於計算 continuation bytes 的量 ```c #include <stddef.h> #include <stdint.h> size_t count_utf8(const char *buf, size_t len) { const int8_t *p = (const int8_t *) buf; size_t counter = 0; for (size_t i = 0; i < len; i++) { /* -65 is 0b10111111, anything larger in two-complement's should start * new code point. */ if (p[i] > -65) counter++; } return counter; } ``` > 這裡的 `count` 所代表的是，不包含 continuation bytes 的字元數量這裡是藉由有號數的架構規則，使得所有小於 -65 的數，都不是以 `10` 作為開頭，便以此作為判斷將以上功能程式使用 [SWAR](https://en.wikipedia.org/wiki/SWAR) 實作，增加系統效率，程式碼如下: ```c=1 size_t swar_count_utf8(const char *buf, size_t len) { const uint64_t *qword = (const uint64_t *) buf; const uint64_t *end = qword + len >> 3; size_t count = 0; for (; qword != end; qword++) { const uint64_t t0 = *qword; const uint64_t t1 = ~t0; const uint64_t t2 = t1 & 0x04040404040404040llu; const uint64_t t3 = t2 + t2; const uint64_t t4 = t0 & t3; count += __builtin_popcountll(t4); } count = (1 << AAAA) * (len / 8) - count; count += (len & BBBB) ? count_utf8((const char *) end, len & CCCC) : 0; return count; } ``` > 在迴圈中 `count` 所代表的是，continuation bytes 的字元數量，要計算 continuation bytes 的數量的原因是，要將其數量從總 byte 中減去，以求得一字串的文字數量 > 而在 16 行中左側的 `count` ，是總字串長度減去 continuation bytes 的字元數量由於系統是 64 位元，可以單次處理 64位元，所以單次就一次檢測 64位元，將效率最大化在迴圈中利用 bitmask 技巧， `~bit#6 and bit#7` ，使最高兩位為 `10` 的 byte 才會得到 `0b10000000` 的結果，其餘都為 `0b0`，接著再使用 `popcount()` 獲取 64 bits 中 non-zero bit 的數量，便可求得 continuation bytes 的數量 ```c const uint64_t *qword = (const uint64_t *) buf; const uint64_t *end = qword + len >> 3; ``` 由一開始宣告的資料型態和內容得知， `qword` 一次存取 8 bytes ，第二行求出總字串中共有幾個 8 bytes (`len >> 3` 便是減去無法被 `0b1000 = 8` 整除的 bytes)，將可以完整放入 64 bits 中的 bytes 用迴圈中的 bitmask 方法求 continuation bytes 數量，至於無法被整除的 bytes 最後在統一處理並用 `count_utf8` 計算 continuation bytes 數量雖然程式中並未寫出 `len` 所代表的涵義，但在 `count_utf8` 函式中的 `for (size_t i = 0; i < len; i++)` 可得知 `len` 所代表的是總字串長度，也就是傳入的字串串列中，總共有幾個 bytes ```c count = (1 << 3) * (len / 8) - count; count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0; return count; ``` 最後需要確認未能被 8 bytes 整除的 bytes 中有多少的 continuation bytes `count = (1 << 3) * (len / 8) - count;` 的作用是計算出，可以被整除的部分中的字元數，其中 `(1 << 3) * (len / 8)` 的作用是將最右側的 3 個 bits 設為 0 ，以求得總字串長度中可以被 8 byte 整除的 bytes 數量，再將其減去 continuation bytes 數量第二行則是將無法被整除的部分，使用 `count_utf8` 求出去除 continuation bytes 的字元數，最後將其相加以得到最終結果 ### 在 Linux 核心原始程式碼找出 UTF-8 和 Unicode 相關字串處理的程式碼，探討其原理在 [`unicode.c`](https://github.com/ilbers/linux/blob/747ae0a96f1a78b35c5a3d93ad37a16655e16340/fs/udf/unicode.c) 中，`udf_uni2char_utf8` 是將 [Unicode](https://en.wikipedia.org/wiki/Unicode) 轉換為 [UTF-8](https://en.wikipedia.org/wiki/UTF-8) 的函式，由於 [Unicode](https://en.wikipedia.org/wiki/Unicode) 使用固定的多位元組來保存單個字元，在保存可以用單一位元組表示的字元時，會有空間浪費，所以 [UTF-8](https://en.wikipedia.org/wiki/UTF-8) 用於解決這個問題   根據 Unicode 的編碼範圍，`0~07F` 的編碼需要至少 7 bits 保存，`080~07FF` 則只少需要 11 bits 保存，使用以上規則，依序對應到 UTF8 的 1~4 bytes 構成 ![](https://i.imgur.com/BlPZL6P.png) 得知以上概念便可實作 ```c static int udf_uni2char_utf8(wchar_t uni, unsigned char *out, int boundlen) { int u_len = 0; if (uni >> (boundlen * 4) != 0) return -ENAMETOOLONG; if (boundlen <= 0) return -ENAMETOOLONG; if (uni < 0x80) { out[u_len++] = (unsigned char)uni; } else if (uni < 0x800) { if (boundlen < 2) return -ENAMETOOLONG; out[u_len++] = (unsigned char)(0xc0 | (uni >> 6)); out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f)); } else { if (boundlen < 3) return -ENAMETOOLONG; out[u_len++] = (unsigned char)(0xe0 | (uni >> 12)); out[u_len++] = (unsigned char)(0x80 | ((uni >> 6) & 0x3f)); out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f)); } return u_len; } ``` `boundlen` 代表可接受的 byte 上限，若此字元轉成 UTF8 後所需的位元組超過 `boundlen` 則會報錯程式先判斷 Unicode 編碼範圍，再依照不同的編碼範圍轉換至 UTF8，並將結果存入 `*out` 中 ```c else { if (boundlen < 3) return -ENAMETOOLONG; out[u_len++] = (unsigned char)(0xe0 | (uni >> 12)); out[u_len++] = (unsigned char)(0x80 | ((uni >> 6) & 0x3f)); out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f)); } ``` :::warning 1. 調整程式碼縮排，使用 `lab0-c` 指定的風格。 2. 指出上述程式碼可改進之處，避免「舉燭」。 :notes: jserv ::: :::info 【原文】郢人有遺相國書者，夜書，火不明，因謂持燭者曰：“舉燭。”雲而過書舉燭。舉燭，非書意也，燕相受書而說之，曰：“舉燭者，尚明也，尚明也者，舉賢而任之。”燕相白王，王大說，國以治。治則治矣，非書意也。今世舉學者多似此類。《韓非子·外儲說左上》這則寓言尖鋭地諷刺了一些人隨意穿鑿附會的治學態度。它說明，做學問不能斷章取義，胡亂解釋前人的片言隻語，從中尋求什麼微言大義。同時也告誡人們，對待人事，要具體問題具體分析，不能主觀意斷，以免造成不良的後果。 ::: 其中先將前 4 個 bit 利用 `0xe0 | (uni >> 12)` 放入第一個 byte 的後4位，接著使用 bitmask 將格式完善成 `1110xxxx`，便可完成第一個 byte，接下來是第二個 byte ，使用 `((uni >> 6) & 0x3f))` 去除後 6 個位元，再使用 bitmask 保留6個右側位元，完善成 `10xxxxxx`，最後將後 6 個位元放入 `10xxxxxx` 即可，其餘範圍的 unicode 也依循相同邏輯處理 ### 改進上述程式碼依舊有改進的空間，由於有過多的分支，並且無法支援最新的 unicode ，也就是 21 位元的版本  已知最終 bytes 數會由輸入的最高位的 1-bit 決定，`7 bit` `11 bit` `16 bit` 分別對應輸出 `1 bytes` `2 bytes` `3 bytes` 換而言之，輸出若是 `1 bytes` `2 bytes` `3 bytes`，其對應的 `uni` 右移 8, 12, 17，值必定為 0，所以只要找出映射規則，就可以事先判斷好是否會超過 `boundlen` 使用 $f(x)=ax^2+bx+c$ 帶入，求得函數 $f(x)=\frac{x^2}{2}+\frac{5x}{2}+5$，將其寫入判斷式便可實現但以上運算有次方與除法運算，可使用查表取代運算，以節省計算時間  ```c static const int fuction_table_4[4] = {0, 8, 12, 17}; static int udf_uni2char_utf8(wchar_t uni, unsigned char *out, int boundlen) { int u_len = 0; if (uni >> fuction_table_4[4]) return -ENAMETOOLONG; if (uni < 0x80) { out[u_len++] = (unsigned char)uni; } else if (uni < 0x800) { out[u_len++] = (unsigned char)(0xc0 | (uni >> 6)); out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f)); } else { out[u_len++] = (unsigned char)(0xe0 | (uni >> 12)); out[u_len++] = (unsigned char)(0x80 | ((uni >> 6) & 0x3f)); out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f)); } return u_len; } ``` --- ## 測驗四以下程式碼可判定 16 位元無號整數是否符合特定樣式 (pattern): ```c #include <stdint.h> #include <stdbool.h> bool is_pattern(uint16_t x) { if (!x) return 0; for (; x > 0; x <<= 1) { if (!(x & 0x8000)) return false; } return true; } ``` 觀察以上代碼可得知，MSB是否為 1 ，並且 MSB 與最低位的 1 之間是否有 0 是判別的關鍵由於是無號數，若想脫離迴圈就必須 `x = 0b0` ，但若在 MSB 為 0 ，或是 MSB 到最低位的 1 之間有 0 的話，都會 `return false` 所以符合條件的無號整數必定是以下類型 ``` 1000000000000000 1100000000000000 1110000000000000 . . . 1111111111111111 ``` 由於上方的程式碼處理較大位元的數值時，迴圈次數也會增加，所以可以用以下程式碼改寫: ```c bool is_pattern(uint16_t x) { const uint16_t n = ~x + 1; return (n ^ x) < x; } ``` 先觀察符合條件的無號整數的共通點: > `1` 必定連續 > `0` 必定由最低位往高位延伸便可求出 > `~x` 必定為 `0b0111` 的型態 > `~x + 1` 會進位 > `~x + 1` 必定只會有一個 `1` 換句話說，進行 `~x + 1` 的操作後，就會取得 `x` 最右側的 `1` 這時再進行 `(~x + 1) ^ x` 便等於減去自己最右側的 1 ，這項操作若若用在 `~x + 1` 不只有一個 `1` 的值上時，`XOR` 和必定會使值變大，因為 `~x + 1` 的進位會停止在最靠近LSB的 `1` 上 :::info 為了稱呼方便，往後把【最靠近LSB的 `1`】稱為 **key bit** ::: ``` x = 0b00101110 ^ 這裡 ~x + 1 = 0b11010010 ``` 由於進位無法改變 key bit 後的值，這樣 key bit 後的值，或多或少都會因為 `XOR` 由 `0` 變 `1` ，使得值較原本的值要大，以便我們使用 `(n ^ x) < x` 來判斷回傳值 :::info 雖然理解本程式碼的意義，但對於如何找出 bitmask 的方法依舊困惑，希望理解的人可以留言告知，感謝:pray: ::: ### (待辦)在 Linux 核心原始程式碼找出上述 bitmask 及產生器，探討應用範疇 [Data Structures in the Linux Kernel](https://0xax.gitbooks.io/linux-insides/content/DataStructures/linux-datastructures-3.html) ___