2018q3 Homework5 (bits)

# 2018q3 Homework5 (bits) contributed by < `butastur-rtos` > * 背景知識 * 作業要求 * 實驗環境 ## 背景知識 * [__bitrev32](https://github.com/torvalds/linux/blob/master/include/linux/bitrev.h#L26) * [clz 應用](https://hackmd.io/s/Bk-uxCYxz#) * [bit-reverse](https://hackmd.io/s/ByzoiggIb#%E4%BD%BF%E7%94%A8%E5%88%B0%E7%9A%84%E5%A0%B4%E5%90%88-Linux-Kernel) * [Reversing Bits and Bytes](http://bsmartt13.github.io/blog/2012/01/05/reversing-bits-and-bytes/) * [Bit Reversal on Uniprocessors](http://www.hpl.hp.com/techreports/93/HPL-93-89.pdf) * [Most Efficient Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C](https://stackoverflow.com/questions/746171/most-efficient-algorithm-for-bit-reversal-from-msb-lsb-to-lsb-msb-in-c) * [EFLAGS Register](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.29012) * [Bit Twiddling Hacks](http://graphics.stanford.edu/~seander/bithacks.html) ### [__bitrev32](https://github.com/torvalds/linux/blob/master/include/linux/bitrev.h#L26) 先從 8 bits 的反轉開始 [source](https://github.com/torvalds/linux/blob/master/include/linux/bitrev.h#L15) ```clike extern u8 const byte_rev_table[256]; static inline u8 __bitrev8(u8 byte) { return byte_rev_table[byte]; } ``` ### [EFLAGS Register](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.29012) * OF (bit 11): Overflow flag [source](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.4188) :::info Overflow flag 屬於 Status Flags，在 [Intel® 64 and IA-32 Architectures Software Developer’s Manual](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.4188) 裡的定義如下： * Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic. 也就是說當一個正整數大到無法 fit in the destination operand，或是一個負整小到無法 fit in the destination operand 的時候，==這個 flag 會設成 1，反之則為 0== ::: ## [作業要求](https://hackmd.io/s/r14wRqUjQ#) * fork 以及補完 [datalab](https://github.com/sysprog21/datalab) 欠缺的程式碼，並通過自動測試 * 確保儘可能少的指令 * 修改 bits.c * 避免用分支 (branch) * 觀察 bits.s 的 x86 (IA32) 指令 * 設計時間複雜度為 O(1) 的實作 * 選出其中 7 個位元操作的函式，詳細解釋其原理 * 舉出真實世界中的應用案例，並搭配有效的測試程式碼 * 探討讓 datalab 得以 64-bit friendly ### fork 以及補完 [datalab](https://github.com/sysprog21/datalab) 欠缺的程式碼，並通過自動測試 * 修改 bits.c :::info 以下是禁止的行為： * Use any control constructs such as if, do, while, for, switch, etc. * Define or use any macros. * Define any additional functions in this file. * Call any functions. * Use any other operations, such as &&, ||, -, or ?: * Use any form of casting. * Use any data type other than int. This implies that you cannot use arrays, structs, or unions. ::: 先執行 `make check` 看看，發生以下的 error ```shell $ make check gcc -O0 -g -Wall -Werror -m32 -lm -o btest bits.c btest.c decl.c tests.c In file included from btest.c:17:0: /usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: 沒有此一檔案或目錄 #include <bits/libc-header-start.h> ^~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. In file included from decl.c:1:0: /usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: 沒有此一檔案或目錄 #include <bits/libc-header-start.h> ^~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/limits.h:194:0, from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/syslimits.h:7, from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/limits.h:34, from tests.c:3: /usr/include/limits.h:26:10: fatal error: bits/libc-header-start.h: 沒有此一檔案或目錄 #include <bits/libc-header-start.h> ^~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. Makefile:13: recipe for target 'btest' failed make: *** [btest] Error 1 ``` ```shell sudo apt-get install gcc-multilib g++-multilib ``` 安裝以上的 package 後就可以解決了。再 `make check` 一次 ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 0/228 ``` ### 修改 bits.c * 注意 Max ops * `&` 可以用 `~` `|` 來取代 * bitOr * oddBits * negate * tmin * tmax * bang ```shell 0101 & 0001 ------ 0001 如果用 ~ | 來取代的話，先用 ~ ~(0101): 1010 ~(0001): 1110 再用 | 1010 | 1110 ------ 1110 ``` 最後用 ~ ~(1110): 0001 ```clike int bitAnd(int x, int y) { return ~(~x | ~ y); } ``` ``` ...Gives 42[0x2a]. Should be 0[0x0] Total points: 1/228 ``` :::warning :question: > 這一題其實是 try and error 所 try 出來的，所以，是不是有什麼典故導致 `&` 可以用 `~` `|` 來取代 ? > [name=butastur-rtos][time=Fri, Oct 26, 2018 4:18 AM] ::: :::info > 這個在電機系邏輯設計（不知道資工系哪堂課有教）有教到， > 叫做 [De Morgan's laws](https://zh.wikipedia.org/wiki/%E5%BE%B7%E6%91%A9%E6%A0%B9%E5%AE%9A%E5%BE%8B) > [name=chenishi][time=Thur, Nov 1, 2018 3:15 PM] ::: #### bitOr * `|` 可以用 `~` `&` 來取代如果是 `|` ```shell 0101 | 0001 ------ 0101 ``` 如果用 `~` `&` 來取代，先用 ~ ~(0101): 1010 ~(0001): 1110 再用 & ```shell 1010 & 1110 ------ 1010 ``` 最後再用 `~` ~(1010): 0101 code 會是這樣 ```clike int bitOr(int x, int y) { return ~(~x & ~y); } ``` ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 2/228 ``` Merge for C99 / C11 6.5.7/4, 6.5.7/5 ```shell git pull upstream master remote: Enumerating objects: 4, done. remote: Counting objects: 100% (4/4), done. remote: Compressing objects: 100% (2/2), done. remote: Total 4 (delta 2), reused 4 (delta 2), pack-reused 0 Unpacking objects: 100% (4/4), done. From https://github.com/sysprog21/datalab * branch master -> FETCH_HEAD * [new branch] master -> upstream/master Merge made by the 'recursive' strategy. tests.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) ``` #### oddBits * 把 bits 1, 3, 5, 7, ..., 31 設為 1 bits 1, 3 設為 1: 0xA bits 1, 3, 5, 7, ..., 31 設為 1: 0xAAAAAAAA ```clike int oddBits(void) { return 0xAAAAAAAA; } ``` ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 4/228 ``` #### [negate](https://github.com/butastur-rtos/datalab/blob/master/bits.c#L970) * return -x ```clike int negate(int x) { return ~x + 1; } ``` ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 6/228 ``` #### tmin ```shell 4 bits tmin: -8 tmin: 1000 tmin: 0x8 1 << 3 8 bits tmin: -128 tmin: 1000 0000 tmin: 0x80 1 << 7 32 bits tmin: 0x80000000 1 << 31 ``` ```clike int tmin(void) { return (1 << 31); } ``` #### tmax ```shell 4 bits tmax: 7 tmax: 0111 ~(1 << 3) 32 bits tmax: ~(1<<31) ``` ```clike int tmax(void) { return ~(1 << 31); } ``` ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 8/228 ``` #### bitNor * ~(x|y) using only ~ and & * Rating: 1 ```clike int bitNor(int x, int y) { return (~x & ~y); } ``` ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 9/228 ``` #### bitXor ```shell A ^ B = (A & ~B) | (~A & B) = ~(~(A & ~B) & ~ (~A & B)) ``` ```clike int bitXor(int x, int y) { return ~(~(x & ~y) & ~(~x & y)); } ``` ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 10/228 ``` #### bang [C99 規格書, 6.5.3.3/5 Unary arithmetic operators](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf#%5B%7B%22num%22%3A180%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C-27%2C816%2Cnull%5D) :::info The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E). ::: 用 ~ & ^ | + << >> 來判斷一個 int 的值不等於 0 <hr> ## 目前進度 ```shell ...Gives 42[0x2a]. Should be 0[0x0] Total points: 10/228 ``` ### 避免用分支 (branch) * `?:` 也是一種 branch ### 觀察 bits.s 的 x86 (IA32) 指令 * [Intel® 64 and IA-32 architectures software developer's manual volume 2D: Instruction set reference](https://software.intel.com/sites/default/files/managed/7c/f1/334569-sdm-vol-2d.pdf) * add * neg * negate ```shell $ gcc -m32 -S -o bits.s bits.c ``` #### add #### neg * NEG – Two's Complement Negation * [page 36, Intel® 64 and IA-32 architectures software developer's manual volume 2D: Instruction set reference](https://software.intel.com/sites/default/files/managed/7c/f1/334569-sdm-vol-2d.pdf) #### negate * IA32 的 instruction * 用了幾個 cycle ```clike int negate(int x) { return ~x + 1; } ``` ```clike negate: .LFB65: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 call __x86.get_pc_thunk.ax addl $_GLOBAL_OFFSET_TABLE_, %eax movl 8(%ebp), %eax negl %eax popl %ebp .cfi_restore 5 .cfi_def_cfa 4, 4 ret .cfi_endproc ``` ## 實驗環境 CPU: Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz Memory: 4 GB ```shell $ free -m total used free shared buff/cache available Mem: 3822 1366 1601 112 853 2110 置換： 2047 0 2047 $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz stepping : 9 microcode : 0x8e cpu MHz : 500.409 cache size : 3072 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5424.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz stepping : 9 microcode : 0x8e cpu MHz : 500.036 cache size : 3072 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5424.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz stepping : 9 microcode : 0x8e cpu MHz : 500.009 cache size : 3072 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5424.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 142 model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz stepping : 9 microcode : 0x8e cpu MHz : 500.025 cache size : 3072 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5424.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: ``` :::danger 持續更新中... > [name=butastur-rtos][time=Tue, Oct 30, 2018 5:03 AM] :::