# 2018q3 Homework5 (bits)
contributed by < `butastur-rtos` >
* 背景知識
* 作業要求
* 實驗環境
## 背景知識
* [__bitrev32](https://github.com/torvalds/linux/blob/master/include/linux/bitrev.h#L26)
* [clz 應用](https://hackmd.io/s/Bk-uxCYxz#)
* [bit-reverse](https://hackmd.io/s/ByzoiggIb#%E4%BD%BF%E7%94%A8%E5%88%B0%E7%9A%84%E5%A0%B4%E5%90%88-Linux-Kernel)
* [Reversing Bits and Bytes](http://bsmartt13.github.io/blog/2012/01/05/reversing-bits-and-bytes/)
* [Bit Reversal on Uniprocessors](http://www.hpl.hp.com/techreports/93/HPL-93-89.pdf)
* [Most Efficient Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C](https://stackoverflow.com/questions/746171/most-efficient-algorithm-for-bit-reversal-from-msb-lsb-to-lsb-msb-in-c)
* [EFLAGS Register](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.29012)
* [Bit Twiddling Hacks](http://graphics.stanford.edu/~seander/bithacks.html)
### [__bitrev32](https://github.com/torvalds/linux/blob/master/include/linux/bitrev.h#L26)
先從 8 bits 的反轉開始
[source](https://github.com/torvalds/linux/blob/master/include/linux/bitrev.h#L15)
```clike
extern u8 const byte_rev_table[256];
static inline u8 __bitrev8(u8 byte)
{
return byte_rev_table[byte];
}
```
### [EFLAGS Register](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.29012)
* OF (bit 11): Overflow flag
[source](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.4188)
:::info
Overflow flag 屬於 Status Flags,在 [Intel® 64 and IA-32 Architectures Software Developer’s Manual](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf#G5.4188) 裡的定義如下:
* Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic.
也就是說當一個正整數大到無法 fit in the destination operand,或是一個負整小到無法 fit in the destination operand 的時候,==這個 flag 會設成 1,反之則為 0==
:::
## [作業要求](https://hackmd.io/s/r14wRqUjQ#)
* fork 以及補完 [datalab](https://github.com/sysprog21/datalab) 欠缺的程式碼,並通過自動測試
* 確保儘可能少的指令
* 修改 bits.c
* 避免用分支 (branch)
* 觀察 bits.s 的 x86 (IA32) 指令
* 設計時間複雜度為 O(1) 的實作
* 選出其中 7 個位元操作的函式,詳細解釋其原理
* 舉出真實世界中的應用案例,並搭配有效的測試程式碼
* 探討讓 datalab 得以 64-bit friendly
### fork 以及補完 [datalab](https://github.com/sysprog21/datalab) 欠缺的程式碼,並通過自動測試
* 修改 bits.c
:::info
以下是禁止的行為:
* Use any control constructs such as if, do, while, for, switch, etc.
* Define or use any macros.
* Define any additional functions in this file.
* Call any functions.
* Use any other operations, such as &&, ||, -, or ?:
* Use any form of casting.
* Use any data type other than int. This implies that you cannot use arrays, structs, or unions.
:::
先執行 `make check` 看看,發生以下的 error
```shell
$ make check
gcc -O0 -g -Wall -Werror -m32 -lm -o btest bits.c btest.c decl.c tests.c
In file included from btest.c:17:0:
/usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: 沒有此一檔案或目錄
#include <bits/libc-header-start.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from decl.c:1:0:
/usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: 沒有此一檔案或目錄
#include <bits/libc-header-start.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/limits.h:194:0,
from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/syslimits.h:7,
from /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed/limits.h:34,
from tests.c:3:
/usr/include/limits.h:26:10: fatal error: bits/libc-header-start.h: 沒有此一檔案或目錄
#include <bits/libc-header-start.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:13: recipe for target 'btest' failed
make: *** [btest] Error 1
```
```shell
sudo apt-get install gcc-multilib g++-multilib
```
安裝以上的 package 後就可以解決了。
再 `make check` 一次
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 0/228
```
### 修改 bits.c
* 注意 Max ops
* `&` 可以用 `~` `|` 來取代
* bitOr
* oddBits
* negate
* tmin
* tmax
* bang
```shell
0101
& 0001
------
0001
如果用 ~ | 來取代的話,先用 ~
~(0101): 1010
~(0001): 1110
再用 |
1010
| 1110
------
1110
```
最後用 ~
~(1110): 0001
```clike
int bitAnd(int x, int y)
{
return ~(~x | ~ y);
}
```
```
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 1/228
```
:::warning
:question:
> 這一題其實是 try and error 所 try 出來的,所以,是不是有什麼典故導致 `&` 可以用 `~` `|` 來取代 ?
> [name=butastur-rtos][time=Fri, Oct 26, 2018 4:18 AM]
:::
:::info
> 這個在電機系邏輯設計(不知道資工系哪堂課有教)有教到,
> 叫做 [De Morgan's laws](https://zh.wikipedia.org/wiki/%E5%BE%B7%E6%91%A9%E6%A0%B9%E5%AE%9A%E5%BE%8B)
> [name=chenishi][time=Thur, Nov 1, 2018 3:15 PM]
:::
#### bitOr
* `|` 可以用 `~` `&` 來取代
如果是 `|`
```shell
0101
| 0001
------
0101
```
如果用 `~` `&` 來取代,先用 ~
~(0101): 1010
~(0001): 1110
再用 &
```shell
1010
& 1110
------
1010
```
最後再用 `~`
~(1010): 0101
code 會是這樣
```clike
int bitOr(int x, int y)
{
return ~(~x & ~y);
}
```
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 2/228
```
Merge for C99 / C11 6.5.7/4, 6.5.7/5
```shell
git pull upstream master
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 2), reused 4 (delta 2), pack-reused 0
Unpacking objects: 100% (4/4), done.
From https://github.com/sysprog21/datalab
* branch master -> FETCH_HEAD
* [new branch] master -> upstream/master
Merge made by the 'recursive' strategy.
tests.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
```
#### oddBits
* 把 bits 1, 3, 5, 7, ..., 31 設為 1
bits 1, 3 設為 1: 0xA
bits 1, 3, 5, 7, ..., 31 設為 1: 0xAAAAAAAA
```clike
int oddBits(void)
{
return 0xAAAAAAAA;
}
```
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 4/228
```
#### [negate](https://github.com/butastur-rtos/datalab/blob/master/bits.c#L970)
* return -x
```clike
int negate(int x)
{
return ~x + 1;
}
```
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 6/228
```
#### tmin
```shell
4 bits
tmin: -8
tmin: 1000
tmin: 0x8
1 << 3
8 bits
tmin: -128
tmin: 1000 0000
tmin: 0x80
1 << 7
32 bits
tmin: 0x80000000
1 << 31
```
```clike
int tmin(void)
{
return (1 << 31);
}
```
#### tmax
```shell
4 bits
tmax: 7
tmax: 0111
~(1 << 3)
32 bits
tmax: ~(1<<31)
```
```clike
int tmax(void)
{
return ~(1 << 31);
}
```
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 8/228
```
#### bitNor
* ~(x|y) using only ~ and &
* Rating: 1
```clike
int bitNor(int x, int y)
{
return (~x & ~y);
}
```
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 9/228
```
#### bitXor
```shell
A ^ B = (A & ~B) | (~A & B) = ~(~(A & ~B) & ~ (~A & B))
```
```clike
int bitXor(int x, int y)
{
return ~(~(x & ~y) & ~(~x & y));
}
```
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 10/228
```
#### bang
[C99 規格書, 6.5.3.3/5 Unary arithmetic operators](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf#%5B%7B%22num%22%3A180%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C-27%2C816%2Cnull%5D)
:::info
The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0,
1 if the value of its operand compares equal to 0.
The result has type int. The expression !E is equivalent to (0==E).
:::
用 ~ & ^ | + << >> 來判斷一個 int 的值不等於 0
<hr>
## 目前進度
```shell
...Gives 42[0x2a]. Should be 0[0x0]
Total points: 10/228
```
### 避免用分支 (branch)
* `?:` 也是一種 branch
### 觀察 bits.s 的 x86 (IA32) 指令
* [Intel® 64 and IA-32 architectures software developer's manual volume 2D: Instruction set reference](https://software.intel.com/sites/default/files/managed/7c/f1/334569-sdm-vol-2d.pdf)
* add
* neg
* negate
```shell
$ gcc -m32 -S -o bits.s bits.c
```
#### add
#### neg
* NEG – Two's Complement Negation
* [page 36, Intel® 64 and IA-32 architectures software developer's manual volume 2D: Instruction set reference](https://software.intel.com/sites/default/files/managed/7c/f1/334569-sdm-vol-2d.pdf)
#### negate
* IA32 的 instruction
* 用了幾個 cycle
```clike
int negate(int x)
{
return ~x + 1;
}
```
```clike
negate:
.LFB65:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
call __x86.get_pc_thunk.ax
addl $_GLOBAL_OFFSET_TABLE_, %eax
movl 8(%ebp), %eax
negl %eax
popl %ebp
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
```
## 實驗環境
CPU: Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz
Memory: 4 GB
```shell
$ free -m
total used free shared buff/cache available
Mem: 3822 1366 1601 112 853 2110
置換: 2047 0 2047
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz
stepping : 9
microcode : 0x8e
cpu MHz : 500.409
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5424.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz
stepping : 9
microcode : 0x8e
cpu MHz : 500.036
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5424.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz
stepping : 9
microcode : 0x8e
cpu MHz : 500.009
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5424.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i3-7130U CPU @ 2.70GHz
stepping : 9
microcode : 0x8e
cpu MHz : 500.025
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5424.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```
:::danger
持續更新中...
> [name=butastur-rtos][time=Tue, Oct 30, 2018 5:03 AM]
:::