2018q3 Homework5 (bits)

# 2018q3 Homework5 (bits) Contributed by < `0xff07` > # 環境 ## 核心 ``` Linux 4.18.0-10-generic ``` ## 編譯器 ``` gcc (Ubuntu 8.2.0-7ubuntu1) 8.2.0 ``` ## Distribution ``` $ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.10 DISTRIB_CODENAME=cosmic DISTRIB_DESCRIPTION="Ubuntu 18.10" ``` ## `lscpu` ``` Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 70 Model name: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Stepping: 1 CPU MHz: 873.104 CPU max MHz: 3700.0000 CPU min MHz: 800.0000 BogoMIPS: 4988.76 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K L4 cache: 131072K NUMA node0 CPU(s): 0-7 ``` 不確定是否是使用 18.10 的關係，使用作業說明給定的安裝方法安裝 32 位元開發套件時，會出現以下問題： ``` $ sudo apt-get install libc6-dev:i386 gcc:i386 Reading package lists... Done Building dependency tree Reading state information... Done Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: gcc:i386 : Depends: cpp:i386 (= 4:8.2.0-1ubuntu1) but it is not going to be installed Depends: gcc-8:i386 (>= 8.2.0-4~) but it is not going to be installed E: Unable to correct problems, you have held broken packages. ``` 但使用： ```bash $ sudo apt install libc6-dev-i386 ``` 後，就可以順利使用 GCC 的 `-m32` 選項編譯了。 # C 語言相關規格 ## Bitwise shift operators (6.5.7) 編譯時會出現 cppcheck 抱怨把有號數位移 31 位元是未定義行為。如果用就算是把 `<< 31` 改寫成 `( << 31) << 1`，cppcheck 也會抱怨。查了一下規格書發現以下內容： >**6.5.7 Bitwise shift operators** > > 3. The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined. > > 4. The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros.If E1 has an unsigned type, the value of the result is E1 × 2 ^E2^ , reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 * 2^E2^ is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. 這邊僅有說明另外，CMU 的[ Software Engineering Institute 當中](https://wiki.sei.cmu.edu/confluence/display/c/INT13-C.+Use+bitwise+operators+only+on+unsigned+operands)有提到相關問題。另外對於右移，規格書也有定義： > 5. The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity, 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined. 看起來好一點，至少有 implementation defined。去查看看 gcc 的[標準](https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#index-bit-shifting)： > Similarly, you use the right-shift operator >> to shift its first operand’s bits to the right. Bits shifted off the right side are discarded; new bits added on the left side are usually 0, but if the first operand is a signed negative value, then the added bits will be either 0 *or* whatever value was previously in the leftmost bit position. 接著查到[這個討論](http://fcamel-life.blogspot.com/2011/10/arithmetic-right-shift.html)。裡面提到 gcc 使用 arithmetic shift。但裡面引用的[手冊內容](https://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Integers-implementation.html#Integers-implementation)是 gcc-4.3.3 的(目前我的電腦上是使用 8.2.0)。不過幸好兩者規定看起來是[一樣](https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Integers-implementation.html#Integers-implementation)的。正在思考能否使用 `1U << 31` ，即使用無號數的左移來避免位定義行為的窘境。但正在查閱會不會牽扯到 [promotion](https://wiki.sei.cmu.edu/confluence/display/c/INT34-C.+Do+not+shift+an+expression+by+a+negative+number+of+bits+or+by+greater+than+or+equal+to+the+number+of+bits+that+exist+in+the+operand) 的問題。 ## Logical Negation (6.5.3.3) > **6.5.3.3 Unary arithmetic operators** > 5. The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E). > 這是一個好消息：假定 `x` 不是 0，`!x` 會變成 0，`!!x` 會變成 1。 # 規定 ## 整數 ``` Each "Expr" is an expression using ONLY the following: 1. Integer constants 0 through 255 (0xFF), inclusive. You are not allowed to use big constants such as 0xffffffff. 2. Function arguments and local variables (no global variables). 3. Unary integer operations ! ~ 4. Binary integer operations & ^ | + << >> Some of the problems restrict the set of allowed operators even further. Each "Expr" may consist of multiple operators. You are not restricted to one operator per line. You are expressly forbidden to: 1. Use any control constructs such as if, do, while, for, switch, etc. 2. Define or use any macros. 3. Define any additional functions in this file. 4. Call any functions. 5. Use any other operations, such as &&, ||, -, or ?: 6. Use any form of casting. 7. Use any data type other than int. This implies that you cannot use arrays, structs, or unions. You may assume that your machine: 1. Uses 2s complement, 32-bit representations of integers. 2. Performs right shifts arithmetically. 3. Has unpredictable behavior when shifting if the shift amount is less than 0 or greater than 31. ``` ## 浮點數相對較鬆，暫時沒附。 # 策略 ## De Morgan's Law 有一些題目需要使用一個邏輯運算做出另外一個邏輯運算。這時考慮使用 De Morgan's Law 調整 `AND`, `OR` 的使用。如： ``` /* * bitAnd - x&y using only ~ and | * Example: bitAnd(6, 5) = 4 * Legal ops: ~ | * Max ops: 8 * Rating: 1 */ ``` 可以使用： $$ \begin{align} A \land B &= (\neg \neg A \land \neg \neg B)\newline &= \neg (\neg A \lor \neg B) \end{align} $$ 因此： ```c int bitAnd(int x, int y) { /* De Morgan's Law */ return ~(~x | ~y); } ``` 類似的想法用來實作 `bitNor`, `botOr`, `bitOr`, `bitXor`： ```c= int bitNor(int x, int y) { /* De Morgan's Law */ return (~x & ~y); } int bitOr(int x, int y) { /* De Mogan's Law */ return ~((~x) & (~y)); } int bitOr(int x, int y) { /* De Mogan's Law */ return ~((~x) & (~y)); } int bitXor(int x, int y) { return ~(x & y) & ~(~x & ~y); } ``` 另外 `botMatch` 本質上是用 `XOR`，因此也可以套用: ```c= int bitMatch(int x, int y) { return ~(~(x & y) & ~(~x & ~y)); } ``` ## Divide and Conquer ### 折疊大致上來說是像： ```c= x |= x >> 16; x |= x >> 8; x |= x >> 4; x |= x >> 2; ``` 這種形狀的作法。如： ``` /* * allOddBits - return 1 if all odd-numbered bits in word set to 1 * where bits are numbered from 0 (least significant) to 31 (most significant) * Examples allOddBits(0xFFFFFFFD) = 0, allOddBits(0xAAAAAAAA) = 1 * Legal ops: ! ~ & ^ | + << >> * Max ops: 12 * Rating: 2 */ ``` 可以考慮下面這種形式的作法： ```c= int allOddBits(int x) { x &= x >> 16; x &= x >> 8; x &= x >> 4; x &= x >> 2; return (2 & x) >> 1; } ``` 第一次的時候，得到的 0 ~ 15 位元，分別會是第 (0 , 16) (1, 17) ... (15, 31) 位元 AND 起來的結果。第二次進行的時候，第 0 ~ 7 個位元分別為 (0, 8, 16, 24), (1, 9, 17, 25) ... 的 AND。每進行一次「折半」，間隔就會少一半。最後會得到第 0 個位元位 (0, 2, 4, ... 30) 個位元的 AND，第 1 個位元會是 (1, 3, 5, ... , 31) 個位元的 AND。類似的道理，`allEvenBits` 也可以用類似方式實作。另外，也可以作為一種 parity bit 的實作方法。類似的作法用來實作下面的函數： ```c= int allOddBits(int x) { x &= x >> 16; x &= x >> 8; x &= x >> 4; x &= x >> 2; return (2 & x) >> 1; } int allOddBits(int x) { x &= x >> 16; x &= x >> 8; x &= x >> 4; x &= x >> 2; return (2 & x) >> 1; } int anyEvenBit(int x) { x |= x >> 16; x |= x >> 8; x |= x >> 4; x |= x >> 2; return x & 1; } int anyOddBit(int x) { x |= x >> 16; x |= x >> 8; x |= x >> 4; x |= x >> 2; return (x & 2) >> 1; } int bang(int x) { x |= x >> 16; x |= x >> 8; x |= x >> 4; x |= x >> 2; x |= x >> 1; return (x & 1) ^ 1; } int bitParity(int x) { x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1; return x & 1; } int evenBits(void) { int x = 5; x |= x << 4; x |= x << 8; x |= x << 16; return x; } ``` ### 0x55555555 使用這組 mask： ```c= ... int mask1 = 0x55555555; int mask2 = 0x33333333; int mask4 = 0x0f0f0f0f; int mask8 = 0x00ff00ff; int mask16 = 0x0000ffff; ... ``` 與 merge sort 類似，想辦法把既有的結果合併。以 `bitReverse` 為例：有了兩個一組位元的位元反轉之後，把相鄰兩組各自反過來，就可以得到4個一組為單位的反轉結果。以此遞推下去，就可以得到 8 個一組, 16 個一組，最後 32 個位元的反轉。 ```c= int bitReverse(int x) { int mask1 = 0x55555555; int mask2 = 0x33333333; int mask4 = 0x0f0f0f0f; int mask8 = 0x00ff00ff; int mask16 = 0x0000ffff; x = ((x & mask1) << 1) | ((x >> 1) & mask1); x = ((x & mask2) << 2) | ((x >> 2) & mask2); x = ((x & mask4) << 4) | ((x >> 4) & mask4); x = ((x & mask8) << 8) | ((x >> 8) & mask8); x = ((x & mask16) << 16) | ((x >> 16) & mask16); return x; } ``` 但題目限定不能使用大數，因此需要自己生成： ```c= int bitReverse(int x) { /* mask1 = 0x55555555 */ int mask1 = 0x55 | (0x55 << 8); mask1 = mask1 | (mask1 << 16); /* mask2 = 0x33333333; */ int mask2 = 0x33 | (0x33 << 8); mask2 = mask2 | (mask2 << 16); /* mask4 = 0x0f0f0f0f; */ int mask4 = 0x0f | (0x0f << 8); mask4 = mask4 |(mask4 << 16); /* mask8 = 0x00ff00ff;*/ int mask8 = 0xff; mask8 = mask8 | (mask8 << 16); /* mask16 = 0x0000ffff; */ int mask16 = 0xff | (0xff << 8); /* Divide and conquer */ x = ((x & mask1) << 1) | ((x >> 1) & mask1); x = ((x & mask2) << 2) | ((x >> 2) & mask2); x = ((x & mask4) << 4) | ((x >> 4) & mask4); x = ((x & mask8) << 8) | ((x >> 8) & mask8); x = ((x & mask16) << 16) | ((x >> 16) & mask16); return x; } ``` 類似的道理可以用來成為 `bitCount` 的一種實作方式：以 2 位元為單位，計算每個間隔中有多少 1。這個結果可以用 2 位元的整數表達出來，所以就暫存在原本的空間，因此得到 16 個 2 位元的無號整數統計結果; 接著兩個兩個 2 位元的數字再相加，得到 8 個 4 位元無號整數的統計結果。以此遞推，就可以加總出所有位元有多少個 1。 ```c= int bitCount(int x) { /*mask1 = 0x55555555 */; int mask1 = 0x55 | (0x55 << 8); mask1 = mask1 | (mask1 << 16); /* mask2 = 0x33333333 */; int mask2 = 0x33 | (0x33 << 8); mask2 = mask2 | (mask2 << 16); /* mask4 = 0x0F0F0F0F; */ int mask4 = 0x0F | (0x0F << 8); mask4 = mask4 | (mask4 << 16); /* mask8 = 0x00FF00FF; */ int mask8 = 0xFF | (0xFF << 16); /* mask16 = 0x0000FFFF; */ int mask16 = 0xFF | (0xFF << 8); x = ((x >> 1) & mask1) + (x & mask1); x = ((x >> 2) & mask2) + (x & mask2); x = ((x >> 4) & mask4) + (x & mask4); x = ((x >> 8) & mask8) + (x & mask8); x = ((x >> 16) & mask16) + (x & mask16); return x; } ``` > 寫下面這邊時看註解裡面的內容以為只能使用 `int` 而感到百思不得其解。後來去看 [pdf 說明](http://csapp.cs.cmu.edu/3e/datalab.pdf) 時才發現可以使用任何 integer type。會使用其他整數型態調整以下內容。雖然說這個作法下去測試可以拿到分數，但其實隱藏問題：會踩到 undefined behavior。按照[這篇](https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/)使用 sanitizer 的說明，Makefile 的 CFLAG 當中加入 `-fsanitize=undefined` 的選項： ```Makefile CFLAGS = -O0 -g -Wall -Werror -m32 -fsanitize=undefined ``` 編譯執行之後會發現測試過程中出現問題： ``` bits.c:255:7: runtime error: signed integer overflow: 1431655765 + 1431655765 cannot be represented in type 'int' ``` 問題是在第一次加法中 `int` 的範圍限制會導致溢位： ```c= x = ((x >> 1) & mask1) + (x & mask1); ``` 因為 `int` 終究是有號的，因此不能像前面思考時描述的，把計算結果視為兩個兩個一組的無號數直接相加。因此手動實作加法： ```c= x = (((x >> 1) & mask1) ^ (x & mask1)) | ((((x >> 1) & mask1) & (x & mask1)) << 1); ``` 這樣看起來解決了問題。但其實踩到另外一個 undefined behavior: ``` bits.c:255:82: runtime error: left shift of 1431655765 by 1 places cannot be represented in type 'int' ``` 理由是因為前面規格描述 `E1 << E2` 左移之後的數值，必須要 $E1 \cdot 2^{E2}$ 在原先資料型別的範圍之中。但這邊若給定的輸入第 30 個位元是 1，那麼當 1 在運算過程被移至第 31 位元時，就會立刻踩到(超過整數的 2^31^ - 1)。另外一個細節是這樣會超過給定的運算次數 2 步。 ### Binary Search 碰到連續的位元狀況。比如說 countLeadingZero ```c= int countLeadingZero(int x) { /* mask1 = 0x80000000 */ int mask1 = 1U << 31; /* mask2 = 0xC0000000 */ int mask2 = 0x3 << 30; /* mask4 = 0xF0000000 */ int mask4 = 0xf << 28; /* mask8 = 0xFF000000 */ int mask8 = 0xff << 24; /* mask16 = 0xFFFF0000 */ int mask16 = (0xff | (0xff << 8)) << 16; /* binary search no. of leading zeroes */ int d = 0; int shmnt = 0; shmnt = (!(mask16 & x)) << 4; x = x << shmnt; d = d + shmnt; shmnt = (!(mask8 & x)) << 3; x = x << shmnt; d = d + shmnt; shmnt = (!(mask4 & x)) << 2; x = x << shmnt; d = d + shmnt; shmnt = (!(mask2 & x)) << 1; x = x << shmnt; d = d + shmnt; shmnt = (!(mask1 & x)); x = x << shmnt; d = d + shmnt; d += !(x); return d; } ``` 對 0 的個數進行 binary search。如果有超過 16 就加 16，沒有就不要加(+0)。有了 countLeadingZero 之後，就可以用相同的方法去做 Log2, `greatestBitPos`。 > 跟上面一樣，寫下面這邊的共筆時以為不能用 `int` 以外的其他整數型態。會修正程式與下面的內容。類似地，用 `-fsanitize=undefined` 的選項下去編譯，也會發現這個作法其實有踩到 undefined behavior： ``` bits.c:435:21: runtime error: left shift of 3 by 30 places cannot be represented in type 'int' bits.c:438:21: runtime error: left shift of 15 by 28 places cannot be represented in type 'int' bits.c:441:22: runtime error: left shift of 255 by 24 places cannot be represented in type 'int' bits.c:444:39: runtime error: left shift of 65535 by 16 places cannot be represented in type 'int' bits.c:451:11: runtime error: left shift of negative value -2147483648 bits.c:455:11: runtime error: left shift of negative value -2147483648 bits.c:459:11: runtime error: left shift of negative value -2147483648 bits.c:463:11: runtime error: left shift of negative value -2147483648 bits.c:467:11: runtime error: left shift of negative value -2147483648 ``` 大多數跑出來的問題是： 1. 如「left shift of 3 by 30 places cannot be represented in type 'int'」：前面引用的規格可知道若 `E1 << E2` 時 $E1 \cdot 2^{E2}$ 在原先的資料型態不能表示出來，則行為是未定義行為。 2. 「left shift of negative value」：前面規格亦提到對於負數進行 left shift 是未定義行為。看來仍然有待修改。 ## 除法系列 ```c= int dividePower2(int x, int n) { int sign = !!(x & 0x80000000); sign = ~sign + 1; return (x + (sign & ((1 << n) - 1))) >> n; } ``` 正數時明顯就是直接 >>。但負數的話直接做會有問題(>> 在負數時會往負的地方 rounding)。解法是發現是負數時，多補一個除數給他，這樣一來他往負的地方 rounding 時就會剛好跑到預期的地方。有了這個功能之後，就可以作 `ezThreeFourths`, `multFiveEighths` ## sign - 2'complement 互換 2 補數表示法中，前面的 sign bit 其實跟 $-2^{31}$ 是一樣的意思。用這個想法直接算負數的 mag。不過後來發現這個想法跟一開始的絕對值算法得到差不多的結論： ```c= int signMag2TwosComp(int x) { int mag = 0x7FFFFFFF & x; int sign = !!(0x80000000 & x); int mask = ~sign + 1; return (mag ^ mask) + sign; } ``` 從另外一邊改回來也類似，概念上來說是： ```c= int twosComp2SignMag(int x) { int sign = x & 0x80000000; int mag = sign ? 0x80000000u - (x & 0x7FFFFFFF) : (x & 0x7FFFFFFF); return sign | mag; } ``` 不過試著把他展開就會發現得到跟絕對值的程式很類似的結果： ```c= int twosComp2SignMag(int x) { unsigned int mask = !!((1U << 31) & x); mask = ~mask + 1; return ((mask ^ x) + !!mask) | ((1U << 31) & x); } ``` ## 整數加減 & 比較類這種問題的預期作法是暴力減下去算，觀察 overflow 之後的狀況決定結果。但在有號的整數型別中，溢位是未定義行為，所以可能需要借助 unsigned 的整數型別。然而在混合有號數與無號數運算的過程中，可能會被 integer promotion 陰，所以需要小心處理。 ### unsigned 的規範考慮以下表達： ```c= int x = foo(); unsigned int y = x; ``` 這時後會想知道是否 `y` 與 `x` 的位元表達會完全相同。這種狀況比如說用在 addOK 時，想要暴力加起來看正負有沒有變化，但 signed integer overflow 在 C 語言規範中是未定義行為。然而 `unsigned int` 再超過表達範圍時則保證為取餘數的結果，避開了溢位的未定義行為問題。因此希望可以借助這個特性來進行運算。 unsigned 整數的運算在 C 語言的規格中有保證 modulo 的行為： > 6.2.5 Types > 9. ... A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type. 如果計算結果超過可表達的範圍，以 unsigned int 是 32 位元的狀況下為例，比如說是負數，或是比 0xFFFFFFFFu 大，其結果是該計算結果定義為「該計算結果不斷加/減『該無號型別可表達的最大整數+1』，直到在 unsigned 可以表達的範圍中為止」。而 CS:APP 提到二補數前面的 sign bit 可以視作 $-2^{31}$，而在電腦使用二補數的狀況下，`unsigned int`「可表達的最大整數 + 1」恰好是 0xFFFFFFFFu + 1，即 $2^{32}$。因此，對於 `y` 來說，`y` 的值為： 1. 若 y >= 0，則沒有超出表達範圍的問題。 2. 若 y < 0，因為不在 `unsigned int` 的表達範圍內，所以會進行取餘數運算。因為 LP64 中 `unsigned int` 最大表達值是 $2^{32} - 1$，因此會對 $2^{32} - 1 + 1 = 2^{32}$ 取餘數。故這時結果為： $$ (-2^{31} + x) + 2^{32} = 2^{31} + x $$ 其中 $x$ 是除了 sign bit 外，後面 31 個位元的值。但可以發現這個值在二進位表達中的各位元，跟原先使用二補數表達的 `x` 相同。因此他們有相同的位元表達。 ### isGreater/isLess/isLessOrEqual 由前面討論知道在使用二補數狀況下，`int` 指派給 `unsigned int` 後，兩者的位元表示會一樣。然後借助 `unsigne int` 保證的性質，進行減法後判斷符號： ```c= int isGreater(int x, int y) { unsigned x_ = x; unsigned y_ = y; unsigned res = x_ + ~y_ + 1; unsigned sgnx = (x_ >> 31) & 1; unsigned sgny = (y_ >> 31) & 1; unsigned sgn = (sgnx ^ 1) & (sgny ^ 0); return (sgn | ((!(sgnx ^ sgny)) & !(res >> 31))) & !!(x ^ y); } ``` 同樣的作法可以拿去做 `isLess`, `isLessOrEqual` ## Misc ```c= x = a ? b : c ``` 可以用： ```c= int mask = ~(!a) + 1; x = ((~mask) & b) | (mask & c); ``` 做出來。 # 浮點數這邊開始感受到效能要求嗚嗚嗚。舉例來說： ```c= int floatIsEqual(unsigned uf, unsigned ug) { if ((uf & 0x7FFFFFFF) > 0x7F800000 || (ug & 0x7FFFFFFF) > 0x7F800000) return 0; if (!((uf & 0x7FFFFFFF) ||((ug & 0x7FFFFFFF)))) return 1; return uf == ug; } ``` 測試時會發現 10 秒內無法測完： ``` ERROR: Test floatIsEqual failed. Timed out after 10 secs (probably infinite loop) ``` 乍看之下會很匪夷所思：因為程式中並沒有無窮回圈。但如果修改 `btest.c` ，將裡面時間延長： ```c= #define TIMEOUT_LIMIT 1000 ``` 會發現該實作的計算結果正確： ``` ... 2 2 0 floatIsEqual ... ``` 只是速度慢到被測試程式當成無窮迴圈。但不是很確定為什麼，嘗試把 if 通通移除： ```C= int floatIsEqual(unsigned uf, unsigned ug) { int isNaN = ((uf & 0x7FFFFFFF) > 0x7F800000 || (ug & 0x7FFFFFFF) > 0x7F800000); int areZero = (!((uf & 0x7FFFFFFF) ||((ug & 0x7FFFFFFF)))); return (!isNaN) && (areZero || (uf == ug)); } ``` 仍然會超過 10 秒的測試時間。不是很確定這是因為效能不夠還是發生什麼問題。 ## 兩個性質這兩個是在 CS:APP 裡提到的。但我覺得很有用。 ### Smooth Transition ![](https://i.imgur.com/KVo0qrd.png) exponent 是 0x01 (也就是次數最小的那些數) 的正數，直接把 sign bit 以外的那些位元做 `>>` ，會自動得到除二之後的 denormalize 表示法。因為次數是 1 跟次數是 0 時，次方都是 $2^{1 - bias}$。假定有一個浮點數的表示法代表的數是： $$ \left[1.f\right]_{2} \times 2^{1 - bias} $$ 那麼除二之後，得到的結果為： $$ \left[0.1f\right]_{2} \times 2^{1 - bias} $$ 而上述數字表成浮點數後 (denormalized) 表示法恰好就是右移的結果(實際上右移會砍掉最低位的位元)。 ### 遞增性觀察除了 sign bit 以外的後 31 位元，可以發現絕對值愈大的浮點數，後 31 位元對應的二進位數值也愈大，一路從 0 到 0x7F800000 ($\infty$)。再往上數就會是 NaN。因此可以用以下方法判斷 NaN： ```c= int isNaN = (ug & 0x7FFFFFFF) > 0x7F800000; ``` 1. 把後 31 位視為 unsigned 2. 判斷該值是否比 0x7F800000 大。若超過則為 NaN。 ## floatNegate ```c= unsigned floatNegate(unsigned uf) { int maskD = ~(1U << 31); if ((uf & maskD) > 0x7F800000) return uf; return (1U << 31) ^ uf; } ``` 首先判斷這個數字是否為 NaN，是的話直接回傳，否則改掉 sign bit 後回傳。 ## floatPower2 直接看 exponent 有沒有超過範圍。超過的話回傳無限或 0; 沒超過的話直接加上 bias 並移到 exponent 的欄位。 ```c= unsigned floatPower2(int x) { if (x > 128) return 0x7F800000; if (x < -126) return 0; x = x + 0x7F; return x << 23; } ``` ## floatScale2 3 種狀況： 1. 是 NaN 或無限：回傳原值。 2. 是 Normalized 表示：次數加一後回傳。 3. 是 Denormalize 表示：(sign bit 以外部份)直接左移 1 位元。 ```c= unsigned floatScale2(unsigned uf) { if ((uf & 0x7FFFFFFF) >= 0x7F800000) return uf; unsigned int E = (uf >> 23) & 0xFF; if (E > 0) return ((E + 1) << 23) | (uf & ~0x7F800000); return (uf & 0x80000000) | ((uf & 0x7FFFFFFF) << 1); } ``` ## floatIsLess 按表操課的判斷 NaN, 比正負, 最後比大小。 ```c= int floatIsLess(unsigned uf, unsigned ug) { int maskD = 0x7FFFFFFFu; int maskNaN = 0x7F800000u; unsigned int isNaN = ((maskD & uf) > maskNaN) || ((maskD & ug) > maskNaN); unsigned int areZero = !(maskD & uf) && !(maskD & ug); unsigned int sgnf = (uf >> 31) & 1U; unsigned int sgng = (ug >> 31) & 1U; if (areZero || isNaN) return 0; if (sgng != sgnf) return sgng < sgnf; return (sgnf == 1) ? uf > ug : uf < ug; } ``` 最後一個 return 照理說要改成合法的運算。但先寫其他的嗚嗚。 165/228