第十組 分組作業共筆 === ###### tags: `sysprog2018` Contributed by <[`kevin110604`](https://github.com/kevin110604), [`ChingCheih`](https://github.com/ChingChieh) > ## 第三週測驗 `1` 考慮以下程式碼: ```C= #include <stdio.h> #include <stdint.h> struct test { unsigned int x : 5; unsigned int y : 5; unsigned int z; }; int main() { struct test t; printf("Offset of z in struct test is %ld\n", (uintptr_t) &t.z - (uintptr_t) &t); return 0; } ``` 在 GNU/Linux x86_64 環境中執行,得到以下輸出: ``` Offset of z in struct test is 4 ``` 倘若將第 10 和第 11 換為以下: ```C=10 printf("Address of t.x is %p", &t.x); ``` 會發生什麼事? ==作答區== * `(a)` 印出類似 `0x7ffd144e8ad4` 的輸出,也就是 `t` 結構物件的位址; * `(b)` 印出類似 `0x7ffd144e8ad4` 的輸出,和 `t` 結構物件差距 4 bytes; * `(c)` 可以印出數值,但這與編譯器實作有關,不能解讀; * `(d)` 編譯失敗,不能這樣宣告結構體的成員; * `(e)` 編譯失敗,不能將指標指向沒有對齊 1 byte 的結構體成員; 參考資料: [Portable Bit Fields in packetC](https://link.springer.com/content/pdf/10.1007/978-1-4302-4159-1_34.pdf) ### 想法 實際將 code 拿去編譯看看: ![](https://i.imgur.com/5BObvxl.png) ### 延伸問題 :::success 延伸問題: 解釋原因,並且找出 C99/C11 規格書對應的描述 ::: 規格書 §6.7.2.1 裡有對 bit-field 做明確的定義: > 8. A member of a structure or union may have any object type other than a variably modified type.^105)^ ==In addition, a member may be declared to consist of a specified number of bits (including a sign bit, if any). Such a member is called a **bit-field**;^106)^== its width is preceded by a colon. 也就是說我們可以在任何 structure 或是 union 裡面宣告 bit-field 變數,變數名字後面接 colon 和 width 。 不過細看下面的註解會發現: > 106)The unary `&` (address-of) operator **cannot** be applied to a bit-field object; thus, there are no pointers to or arrays of bit-field objects. C 語言禁止我們對 bit-field 使用 `&` (address-of)。 其實不只有 `&` (address-of) 不能使用, §6.5.3.4 有寫到連 `sizeof` 也不可以對 bit-field member 使用: > 1. The `sizeof` operator **shall not** be applied to an expression that has function type or an incomplete type, to the parenthesized name of such a type, ==or to an expression that designates a bit-field member.== 為了探究其原因,我們發現規格書 §6.2.6.1 這樣寫著: > 2. ==Except for bit-fields, objects are composed of contiguous sequences of one or more bytes,== the number, order, and encoding of which are either explicitly specified or implementation-defined. 也就是除了 bit-field 以外的 object 都具有連續的記憶體空間,而且是以 byte 為單位。 規格書 §6.7.2.1 下面尚有這段描述: > 10. An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. ==If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.== The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified. 所以 bit-field 是否會連續,要考慮到有關 addressable storage unit 的議題。這個 storage unit 通常是 16 bits, 32 bits 或是 8 bits ,反正是 implementation-defined 。現在假設 storage unit 的大小是 16 bits ,並且有以下 strcture : ```c struct test2 { unsigned int x : 7; unsigned int y : 7; unsigned int z : 10; }; ``` 毫無疑問地, `x` 會被儲存到一個 storage unit 裡; `y` 也會被存到同一個 storage unit 裡,因為還有 9 bits 的空間可以放,而且 C 語言規範保證 `x` `y` 是緊鄰的。但當要儲存 `z` 的時候,這個 storage unit 的空間就不夠了。這時要把 `z` 的 2 個 bits 放在跟 `x` `y` 同一個 storage unit 裡,然後再把剩下的 8 bits 放在另一個 storage unit 裡呢,還是 `z` 的 10 個 bits 全都放在另一個 storage unit 呢,也是 implementation-defined 。 所以為什麼不能對 bit-field 使用 `&` (address-of) ,進而導致沒有 bit-field 的 pointer 呢?一般的變數大小單位都是以 byte 為單位,當我們想要 store/retrieve 一個 pointer 指向的內容時,必須要知道它確切的大小和確切的地址,而 bit-field 不以一般變數的儲存方式儲存,大概是這個原因吧。 --- ## 第三週測驗 `2` 考慮以下程式碼,在 little-endian 硬體上,會返回 `1`,反之,在 big-endian 硬體上會返回 `0`: ```C int main() { union { int a; char b; } c = { .a = K1 }; return c.b == K2; } ``` 補完上述程式碼。 ==作答區== K1 = ? * `(a)` 0 * `(b)` 1 * `(c)` -1 * `(d)` 254 K2 = ? * `(a)` 0 * `(b)` 1 * `(c)` 254 ### 想法 規格書 §6.7.2.1 > a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and ==a union is a type consisting of a sequence of members whose storage overlap.== > ==The size of a union is sufficient to contain the largest of its members.== The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members. 因此這一題 union c 的大小是 `sizeof(int)`,且 c 的 member a 和 member b 是存在同一塊記憶體空間 規格書 §6.3.1.4 > The integer promotions preserve value including sign. As discussed earlier, whether a ==‘‘plain’’ char is treated as signed is implementation-defined.== char 是不是 signed 是 implementation-defined,在對 gcc 來說 char 是 singed 的因此能存的數字範圍是 -128 ~ 127 而 big endian 和 little endian 的差異是記憶體存數字時前者把高位的數字從低位的記憶體位置開始存,而後者是把低位的數字從低位的記憶體位置開始存,如下。 假設要存的數字是 0x12345678(4個byte),存在記憶體地址中的 0xb10 到 0xb13 ``` address : 高 低(0xb13 ~ 0xb10) big endian : 78 56 34 12 little endian: 12 34 56 78 |-----------| int 的範圍 |--| char 的範圍 ``` 因此可以觀察到如果要在 little endian 的硬體上得到回傳值是 1 只需要 int 最低 byte 數字和 char 一樣,而要在 big endian 的硬體上得到回傳值是 0 就是 int 最低 byte 的值和 char 的值不同,因此只有==K1 = 1== 和 ==K2 = 1== 符合。 ### 延伸問題 :::success 延伸問題: 解釋運作原理,並找出類似的程式碼 ::: * 類似程式碼: ```c char method1(){ int x = 1; char *y = (char*)&x; return (*y)+48; } int method2(){ unsigned int i = 1; return *(char*)&i == 1; } ``` --- ## 第三週測驗 `3` 以下程式碼計算 parity bit: ```C= #include <stdint.h> int parity(uint32_t x){ x ^= x >> 1; x ^= x >> 2; x = (x & 0x11111111U) * 0x11111111U; return (x >> P) & 1; } ``` 補完程式碼 ==作答區== P = ? * `(a)` 32 * `(b)` 31 * `(c)` 30 * `(d)` 29 * `(e)` 28 * `(f)` 27 * `(g)` 26 * `(h)` 25 * `(i)` 24 ### 想法 要計算 1 的數量在全部 32 bits 中是奇數個還是偶數個最簡單的方法就是每個 bit 做 XOR 如果結果是 1 就代表有奇數個 1。 * 第3,4行 我們先簡化問題用 8 bit 的 ABCDEFGH 來表示 x 的 8 個 bit, 字母相接代表他們做 XOR 。 ``` x: A B C D E F G H x >> 1: 0 A B C D E F G x ^= x >> 1: A AB BC CD DE EF FG GH x >> 2: 0 0 A AB BC CD DE EF x ^= x >> 2: A AB ABC ABCD BCDE CDEF DEFG EFGH ^ ^ ``` 可以觀察到最後結果的第 0 和 4 bit 包含每四個 bit XOR 的結果,也就是說這兩行把全部需要的資訊每隔四的 bit 存 * 第5,6行 把 `x = (x & 0x11111111U) * 0x11111111U` 拆開來看 ``` x = x & 0x11111111U ``` 因為所需要的資訊已經每 4 個 bit 存好了,所以就把其他 bit 的值都設為 0 確保等等做乘法可以得到我們要的結果 ``` x = x * 0x11111111U ``` 假設 x = 0000 0001 0001 0000 0001 0001 0001 0000 ``` 0000 0001 0001 0000 0001 0001 0001 0000 x 0001 0001 0001 0001 0001 0001 0001 0001 ------------------------------------------ 0000 0001 0001 0000 0001 0001 0001 0000 0001 0001 0000 0001 0001 0001 0000 0001 0000 0001 0001 0001 0000 ... + 0000 ------------------------------------------ ^ ``` 可以觀察到因為乘數 `0x11111111U` 第 0,4,8,...,28 bit 是 1 所以每 4 個 bit 做 XOR 相加的結果會存在 `31~28 bit` ,因此我們可以從第 28 bit的是 0 或 1 看出 parity 的結果是奇數還偶數。 ==因此要 right shift 28 bit 才能得到結果。== ### 延伸問題 :::success 延伸問題: 解釋原理,並且提出更快的實作機制 (提示: 透過 SIMD) ::: * 其他實作機制: 參考 [Bit Twiddling Hacks](https://graphics.stanford.edu/~seander/bithacks.html) ```c= unsigned int v; // word value to compute the parity of v ^= v >> 16; v ^= v >> 8; v ^= v >> 4; v &= 0xf; return (0x6996 >> v) & 1; ``` * 以上連結有提到可以省略2,3行 It may be optimized to work just on bytes in 5 operations by `removing the two lines immediately following "unsigned int v;"`. The method first shifts and XORs the eight nibbles of the 32-bit value together, leaving the result in the lowest nibble of v * 查看 CPU 支援哪些 SIMD 指令集 * Linux 作業系統: `cat /proc/cpuinfo` * 以我的電腦為例 >flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi ==mmx== fxsr ==sse== ==sse2== ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ==ssse3== sdbg ==fma== cx16 xtpr pdcm pcid ==sse4_1== ==sse4_2== x2apic movbe popcnt tsc_deadline_timer aes xsave ==avx== f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 ==avx2== smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d * MacOS 作業系統: `sysctl machdep.cpu` * SIMD 指令查詢: [Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3310) * Intel Streaming SIMD Extension * data type: 1. `float`:__m128 * `__m128 f; // = {float f3, f2, f1, f0}` 2. `double`: __m128d * `__m128d d; // = {double d1, d0}` 3. `int`: __m128i * `__m128i i; // = {int i3, i2, i1, i0}` 了解初步SIMD應用:[連結](https://stackoverflow.com/questions/11872952/simd-the-following-code) ### SIMD 實驗 假設現在有很多筆數據需要算parity bits ,我的測資共有9萬多筆數據,然後比較看看simd會不會比較快。 * 原本的 parity function: ```c int parity(uint32_t x){ x ^= x >> 1; x ^= x >> 2; x = (x & 0x11111111U) * 0x11111111U; return (x >> P) & 1; } ``` * simd 版本的 parity function: ```c __m128i parity_simd(const uint32_t num[]){ uint32_t mask[4] = {1,1,1,1}; __m128i vsum = _mm_set1_epi32(0); __m128i vmask = _mm_set1_epi32(0); vsum = _mm_load_si128((__m128i*)num); vmask = _mm_load_si128((__m128i*)mask); vsum = _mm_xor_si128(vsum,_mm_srai_epi32(vsum,16)); vsum = _mm_xor_si128(vsum,_mm_srai_epi32(vsum,8)); vsum = _mm_xor_si128(vsum,_mm_srai_epi32(vsum,4)); vsum = _mm_xor_si128(vsum,_mm_srai_epi32(vsum,2)); vsum = _mm_xor_si128(vsum,_mm_srai_epi32(vsum,1)); vsum = _mm_and_si128(vsum,vmask); return vsum; } ``` * 結果: * **without** simd: ``` $ sudo perf stat -r 100 -e cache-misses,cache-references,instructions,cycles ./n Performance counter stats for './n' (100 runs): 27,180 cache-misses # 30.782 % of all cache refs ( +- 2.03% ) 88,298 cache-references ( +- 0.76% ) 99,761,105 instructions # 2.93 insn per cycle ( +- 0.00% ) 34,018,115 cycles ( +- 0.24% ) 0.010096362 seconds time elapsed ( +- 1.10% ) ``` * **with** simd: ``` $ sudo perf stat -r 100 -e cache-misses,cache-references,instructions,cycles ./s Performance counter stats for './s' (100 runs): 27,112 cache-misses # 31.016 % of all cache refs ( +- 2.06% ) 87,415 cache-references ( +- 0.72% ) 89,559,505 instructions # 2.81 insn per cycle ( +- 0.00% ) 31,866,758 cycles ( +- 0.15% ) 0.009424136 seconds time elapsed ( +- 0.77% ) ``` 可...可惡 好像只有變快一點點點 --- ## 第三週測驗 `4` 考慮以下程式碼: ```C #include <stdlib.h> #include <string.h> typedef struct rec { char *value; struct rec *next; } record; void insert_sorted(record **r, const char *value) { record *newrec = NULL; while (*r && strcmp(value, (*r)->value) > 0) r = &((F1)->next); newrec = malloc(sizeof(record)); newrec->value = F2(value); newrec->next = F3; F4 = newrec; } ``` 函式 `insert_sorted` 的作用是在 r 所指向已依據數值排序的 linked list 中,安插給定的字串,考慮到 string literal 的生命週期,應該在新建立的節點中,複製給定的字串。 請補完上述程式碼。 ==作答區== F1 = ? * `(a)` r * `(b)` **r * `(c)` *r * `(d)` r->next * `(e)` *r->next F2 = ? * `(a)` 無 * `(b)` strdup * `(c)` strcpy * `(d)` strlen * `(e)` strcmp F3 = ? * `(a)` r * `(b)` **r * `(c)` *r * `(d)` r->next * `(e)` *r->next F4 = ? * `(a)` **r * `(b)` *r * `(c)` r * `(d)` r->next * `(e)` *r->next ### 想法 ```c typedef struct rec { char *value; struct rec *next; } record; ``` `record` 的資料結構如圖: ```graphviz digraph block_ele { node[shape=record] ele1 [label="<value> value|<next> next"] } ``` ```c= void insert_sorted(record **r, const char *value) { record *newrec = NULL; while (*r && strcmp(value, (*r)->value) > 0) r = &((F1)->next); newrec = malloc(sizeof(record)); newrec->value = F2(value); newrec->next = F3; F4 = newrec; } ``` 因為 `r` 是指向 `record` 指標的指標,所以 `*r` 才是指向 `record` 的指標。執行完第3行後,整個函式的概念大致上如下圖所示: ```graphviz digraph block_ele { node [shape=record]; rankdir=LR r [label="r", style=filled, fillcolor=yellow] head [label="head", style=filled, fillcolor=cyan] newrec [label="newrec", style=filled, fillcolor=cyan] NULL [shape=plaintext] ele0 [label="{<value> value_0|<next> }"] ele1 [label="{<value> value_1|<next> }"] period [label="......", shape=plaintext] ele2 [label="{<value> value_n-1|<next> }"] elen [label="{<value> value_n|<next> \\ }"] newrec -> NULL r -> head -> ele0:w ele0:next:c -> ele1:w [tailclip=false, arrowtail=dot, dir=both] ele1:next:c -> period:w [tailclip=false, arrowtail=dot, dir=both] period:e -> ele2:w ele2:next:c -> elen:w [tailclip=false, arrowtail=dot, dir=both] } ``` ```c=4 while (*r && strcmp(value, (*r)->value) > 0) r = &((F1)->next); ``` 迴圈的目的是在檢查傳入的字串 `value` 是否大於目前 `*r` 指向的節點裡的字串。如果大於, `*r` 就指向下一個節點;沒有大於的話就跳出迴圈。 因為 `*r` 的型態是指向 `record` 的指標,所以 F1 應該是 `*r` (這樣他指向的地方才有 `next` 這個 member )。又因為 `r` 的型態是指向 `record` 指標的指標,所以將 `(*r)->next` assign 給 `r` 之前要先取址。 所以,迴圈執行第一次的結果是(假設迴圈有執行): ```graphviz digraph block_ele { node [shape=record]; rankdir=LR r [label="r", style=filled, fillcolor=yellow] head [label="head", style=filled, fillcolor=cyan] newrec [label="newrec", style=filled, fillcolor=cyan] ele0 [label="{<value> value_0|<next> }"] ele1 [label="{<value> value_1|<next> }"] period [label="......", shape=plaintext] ele2 [label="{<value> value_n-1|<next> }"] elen [label="{<value> value_n|<next> \\ }"] NULL [shape=plaintext] newrec -> NULL head -> ele0:w r -> ele0:next ele0:next:c -> ele1:w [tailclip=false, arrowtail=dot, dir=both] ele1:next:c -> period:w [tailclip=false, arrowtail=dot, dir=both] period:e -> ele2:w ele2:next:c -> elen:w [tailclip=false, arrowtail=dot, dir=both] } ``` 迴圈執行第二次的結果是(假設迴圈有執行): ```graphviz digraph block_ele { node [shape=record]; rankdir=LR r [label="r", style=filled, fillcolor=yellow] head [label="head", style=filled, fillcolor=cyan] newrec [label="newrec", style=filled, fillcolor=cyan] ele0 [label="{<value> value_0|<next> }"] ele1 [label="{<value> value_1|<next> }"] period [label="......", shape=plaintext] ele2 [label="{<value> value_n-1|<next> }"] elen [label="{<value> value_n|<next> \\ }"] NULL [shape=plaintext] newrec -> NULL head -> ele0:w r -> ele1:next ele0:next:c -> ele1:w [tailclip=false, arrowtail=dot, dir=both] ele1:next:c -> period:w [tailclip=false, arrowtail=dot, dir=both] period:e -> ele2:w ele2:next:c -> elen:w [tailclip=false, arrowtail=dot, dir=both] } ``` 以此類推,直到跳出迴圈。 ```c=6 newrec = malloc(sizeof(record)); ``` 這裡分配空間給將要加入的新節點。 ```graphviz digraph block_ele { node [shape=record]; rankdir=LR newrec [label="newrec", style=filled, fillcolor=cyan] new_ele [label="{<value> value_new|<next> \\ }", style=filled, fillcolor=orange] newrec -> new_ele } ``` ```c=7 newrec->value = F2(value); ``` 複製給定的字串。F2 要使用 `strdup` 的原因是他有分配空間的效果。如果想要使用 `strcpy` 的話要先 `malloc` 才行。 ```c=8 newrec->next = F3; F4 = newrec; ``` 這兩步很明顯是在把新節點跟 list 接起來。 在離開迴圈的時候, `r` 指向的那個指標,是指向那一個「不大於」傳入的字串的節點的指標(也就是圖中的第 i-1 個節點的 `next` )。 ```graphviz digraph block_ele { node [shape=record]; rankdir=LR r [label="r", style=filled, fillcolor=yellow] head [label="head", style=filled, fillcolor=cyan] newrec [label="newrec", style=filled, fillcolor=cyan] ele0 [label="{<value> value_0|<next> }"] period [label="......", shape=plaintext] period2 [label="......", shape=plaintext] elei_1 [label="{<value> value_i-1|<next> }"] elei [label="{<value> value_i|<next> }"] elen [label="{<value> value_n|<next> \\ }"] new_ele [label="{<value> value_new|<next> \\ }", style=filled, fillcolor=orange] newrec -> new_ele r -> elei_1:next head -> ele0:w ele0:next:c -> period [tailclip=false, arrowtail=dot, dir=both] period:e -> elei_1:w elei_1:next:c -> elei:w [tailclip=false, arrowtail=dot, dir=both] elei:next:c -> period2 [tailclip=false, arrowtail=dot, dir=both] period2 -> elen:w } ``` 所以可以推測正確的程式碼應該是 `newrec->next = *r;` ```graphviz digraph block_ele { node [shape=record]; rankdir=LR r [label="r", style=filled, fillcolor=yellow] head [label="head", style=filled, fillcolor=cyan] newrec [label="newrec", style=filled, fillcolor=cyan] ele0 [label="{<value> value_0|<next> }"] period [label="......", shape=plaintext] period2 [label="......", shape=plaintext] elei_1 [label="{<value> value_i-1|<next> }"] elei [label="{<value> value_i|<next> }"] elen [label="{<value> value_n|<next> \\ }"] new_ele [label="{<value> value_new|<next> }", style=filled, fillcolor=orange] newrec -> new_ele r -> elei_1:next head -> ele0:w ele0:next:c -> period [tailclip=false, arrowtail=dot, dir=both] period:e -> elei_1:w elei_1:next:c -> elei:w [tailclip=false, arrowtail=dot, dir=both] elei:next:c -> period2 [tailclip=false, arrowtail=dot, dir=both] new_ele:next:c -> elei:w [tailclip=false, arrowtail=dot, dir=both, color=red] period2 -> elen:w } ``` 以及 `*r = newrec;` ```graphviz digraph block_ele { node [shape=record]; rankdir=LR r [label="r", style=filled, fillcolor=yellow] head [label="head", style=filled, fillcolor=cyan] newrec [label="newrec", style=filled, fillcolor=cyan] ele0 [label="{<value> value_0|<next> }"] period [label="......", shape=plaintext] period2 [label="......", shape=plaintext] elei_1 [label="{<value> value_i-1|<next> }"] elei [label="{<value> value_i|<next> }"] elen [label="{<value> value_n|<next> \\ }"] new_ele [label="{<value> value_new|<next> }", style=filled, fillcolor=orange] newrec -> new_ele r -> elei_1:next head -> ele0:w ele0:next:c -> period [tailclip=false, arrowtail=dot, dir=both] period:e -> elei_1:w elei:next:c -> period2 [tailclip=false, arrowtail=dot, dir=both] new_ele:next:c -> elei:w [tailclip=false, arrowtail=dot, dir=both, color=red] elei_1:next:c -> new_ele:w [tailclip=false, arrowtail=dot, dir=both, color=red] period2 -> elen:w } ``` ### 延伸問題 :::success 延伸問題: 解釋運作原理,並新增刪除和搜尋節點的程式,附上完整的測試計畫 ::: * 刪除節點 ```c void delete_sorted(record **node, const char *value) { record *curr, *prev; for (curr = *node, prev = NULL; curr != NULL && strcmp(value, curr->value); prev = curr, curr = curr->next) ; if (curr == NULL) return; if (prev == NULL) *node = (*node)->next; else prev->next = curr->next; free(curr); } ``` * 搜尋節點 ```c record *search_sorted(record *node, const char *value) { while (node) { if (!strcmp(value, node->value)) { return node; } node = node->next; } return NULL; } ``` * 參考 [sysprog21/linux-list](https://github.com/sysprog21/linux-list) 的方式實作的[完整的測試計畫](https://github.com/ChingChieh/listTest)