2018q1 Homework2 (assessment)

contributed by <blake11235>

重做第一週測驗題

好東西分享，很完整的HackMD功能介紹

測驗1

考慮以下程式碼:






















#include <stdlib.h>
int isMultN(unsigned int n) {
    int odd_c = 0, even_c = 0; /* variables to count odd and even SET bits */
    
    if (n == 0)    // return true if difference is 0.
        return 1;
    if (n == 1)    // return false if the difference is not 0.
        return 0;
    
    while (n) {
        if (n & 1)   // odd bit is SET, increment odd_C
            odd_c++;
        n >>= 1; 
    
        if (n & 1)   // even bit is SET, increment even_c
            even_c++;
        n = n >> 1;
    }
    
    /* Recursive call till you get 0/1 */
    return(isMultN(abs(odd_c - even_c)));
}

其作用為檢查輸入整數是否為 N 的倍數

題目研讀

首先先觀察程式碼

while (n) {
    if (n & 1)   // odd bit is SET, increment odd_C
        odd_c++;
    n >>= 1; 
    if (n & 1)   // even bit is SET, increment even_c
        even_c++;
    n = n >> 1;
}

從這段可以得知他是在計算 n 用 binary 表示的話，奇數位與偶數位的 1 的數目。先是判斷最右邊的那位數，看是否為 1 ，接著向右位移再判斷是否為 1 ，一直操作到 n 為 0 ，代表全部計算完畢。

return(isMultN(abs(odd_c - even_c)));

最後將 odd bit count 和 even bit count 的差值再次輸入函式做遞迴，直到數值為 0 或 1 。也就是最終的判斷依據，是看奇數位 1 的數與偶數位 1 的數兩者要一樣。

既然這段程式碼是依據各 bit 在判斷，那麼就把數值列出來啦~

數值	binary	奇偶位 1 的數的差值
1	0001	1
2	0010	1
3	0011	0
5	0101	2
7	0111	1
11	1011	1
13	1101	1
搭啦~ 答案很明顯的就是 3 啦~
因為只有 3 他的奇位 1 的數目和偶位 1 的數目一樣，所以只能選他了。
若差值是 1 以上的，會再次進入函式遞迴，直到最後的數值是 1 或者 0 。

推導

但事情有那麼順利嗎？會不會有除了 3 以外的質數，他的奇位 1 的數目和偶位 1 的數目也一樣，是不是就爆炸了？那麼我們就來證證看囉！

首先，這個題目讓我想到了一個經典個國中數學題目：如何判斷一個數 3 的倍數？
只需要把每個位數的值都加起來，如果為 3 的倍數的話，那麼此數就是 3 的倍數。
推導方法很簡單，這裡就有點了的講…
那麼就從那個觀念延伸，來這提證明。

假設一個二進位的數值 ABCDEFGH ，代表的是 \(A\cdot2^7+B\cdot2^6+C\cdot2^5+D\cdot2^4+E\cdot2^3+F\cdot2^2+G\cdot2^1+H\cdot2^0 \\=A\cdot128+B\cdot64+C\cdot32+D\cdot16+E\cdot8+F\cdot4+G\cdot2+H\cdot1 \\=(A\cdot129-A)+(B\cdot63+B)+(C\cdot33-C)+(D\cdot15+D)+(E\cdot9-E)+(F\cdot3+F)+(G\cdot3-G)+(H\cdot0+H) \\=(A\cdot129+B\cdot63+C\cdot33+D\cdot15+E\cdot9+F\cdot3+G\cdot3+H\cdot0)+(-A+B-C+D-E+F-G+H)\)

\((A\cdot129+B\cdot63+C\cdot33+D\cdot15+E\cdot9+F\cdot3+G\cdot3+H\cdot0)\) 是 3 的倍數
所以只要判斷\((-A+B-C+D-E+F-G+H)\) 也要是 3 的倍數就好了
可能是 3、6、9 …等等，再把數值代進迴圈，直到最後奇偶位數相差為 0 ，便知道此數是 3 的倍數

其他質數

針對其他質數判斷是否為其倍數，2 和 5 較為簡單，就不多說了，那麼就想針對 7、11、13 做研究
再次回想起國中數學~~

7的倍數：由個數起每三位數字一節，各奇數節的和與偶數節的和相減，其差是7的倍數。
11的倍數：奇數位數字和與偶數位數字和相差為11的倍數。
13的倍數：由個數起每三位數字一節，各奇數節的和與偶數節的和相減，其差是13的倍數。

目標，只使用加減法位移與迴圈

7的倍數

針對一個數於乘於 7 直式乘法，可以觀察到一個有趣現象
\(ABCDEFG\cdot7 = ABCDEFG\cdot0111\)

       A B C D E F G
     A B C D E F G
+  A B C D E F G
-----------------------

\((A), (A+B), (A+B+C), (B+C+D), (C+D+E), (D+E+F), (E+F+G), (F+G), (G)\)
也就是說，把一個數除以 7 ，就是把最後一位數拿去減倒數第三位、倒數第二位和倒數第一位，減完之後數值向右位移，然後回傳給函式繼續進行，直到除了後三位的其他位數都是 0 ，最後再判斷數值是否為 7 。

int isMultN(unsigned int n)
{
    if(n>>3 == 0) {
        return (n==7);//判斷是否餘七
    }
    else{
        int temp;
        temp = (n&1);//取最後一位數值
        n = n - temp - (temp<<1) - (temp<<2);//除於七
        return(isMultN(n>>1));
    }
}

既然都打出程式了，那麼就來比賽一下，看看是否有比 mod 運算還快，不然而外想到的這種作法沒有比較快，內心會很挫折。

所以阿我用了 clock() 來算一下時間，跟 mod 來比賽看看

#include <stdlib.h>
#include <time.h>

static double diff_in_second(struct timespec t1, struct timespec t2)
{
    struct timespec diff;
    if (t2.tv_nsec-t1.tv_nsec < 0) {
        diff.tv_sec  = t2.tv_sec - t1.tv_sec - 1;
        diff.tv_nsec = t2.tv_nsec - t1.tv_nsec + 1000000000;
    } else {
        diff.tv_sec  = t2.tv_sec - t1.tv_sec;
        diff.tv_nsec = t2.tv_nsec - t1.tv_nsec;
    }
    return (diff.tv_sec + diff.tv_nsec / 1000000000.0);
}

int isMultN(unsigned int n)
{
    if(n>>3 == 0) {
        return (n==7);
    }
    else{
        int temp;
        temp = (n&1);
        n = n - temp - (temp<<1) - (temp<<2);
        return(isMultN(n>>1));
    }
}

int main()
{
    unsigned int x;
    scanf("%u", &x);
    struct timespec start1, end1, start2, end2;

    clock_gettime(CLOCK_REALTIME, &start1);
    printf("%s\n", isMultN(x)?"yes":"no");
    clock_gettime(CLOCK_REALTIME, &end1);
    printf("my time: %.10f sec\n\n", diff_in_second(start1, end1));

    clock_gettime(CLOCK_REALTIME, &start2);
    printf("%s\n", (x%7)?"no":"yes");
    clock_gettime(CLOCK_REALTIME, &end2);
    printf("mod time: %.10f sec\n\n", diff_in_second(start2, end2));

    return 0;
}

結論是…
我輸了

1354684745
no
my time:  0.0000273730 sec

no
mod time: 0.0000087500 sec

好吧…先做 11 的倍數，不然做不完啦…

11 的倍數

11 的倍數也可以用類似於 7 的倍數的方法，只是改變了一點數值。

int isMultN(unsigned int n)
{
    if(n>>4 == 0) {                   
        return (n==11);                                     
    }
    else{
        int temp;
        temp = (n&1);
        n = n - temp - (temp<<1) - (temp<<3);
        return(isMultN(n>>1));
    }
}

但是，這個方法他的運算時間，比 mod 慢更多，差到了100倍左右…

6541213
no
my time: 0.0000271370 sec

no
mod time: 0.0000034600 sec

~~想一個比 mod 慢的程式有屁用~~<bye94046165>

13 的倍數

也是可以用一樣的手法但是速度一樣慢…

65
yes
my time: 0.0000257650 sec

yes
mod time: 0.0000036420 sec

所以，強者我室友e94046165看了我的 code 叫我把函式遞迴改成 while() 迴圈，順便嘴我的資料結構…

結果，速度快的幾乎跟 mod 一樣了！！

重做第二週測驗題

測驗1

以類似 CS:APP 3/e 第 80 頁針對 8 位元浮點格式的示例，擴充至單精度浮點數

位表示(b x x b)	指數	小數	值	描述
bin hex hex bin	e, E, 2^E	f, M	V, 十進位
0 00 00000 000	0, -126, 2^-126	0, 0	0, 0.0	零
0 00 00000 001	0, -126, 2^-126	2^-23, 2^-23	2^-149, 0.00..	最小 denormalized
…	…	…	…	denormalized
0 00 11111 111	0, -126, 2^-126	2^-1+2^-2…+2^-23, 2^-1+2^-2…+2^-23	…	最大 denormalized
0 01 00000 000	1, -126, 2^-126	0, 1	2^-126, 0.00…	最小 normalized
0 01 00000 001	1, -126, 2^-126	2^-23, 1+2^-23	2^-126+2¹⁴⁹, 0.00…	normalized
…	…	…	…	normalized
0 fe fffff 111	254, 127, 2¹²⁷	2^-1+2^-2…+2^-23, 1+2^-1+2^-2…+2^-23	…	最大 normalized
0 ff 00000 000	…	…	…	正無限大

在來看題目：

#include <math.h>
static inline float u2f(unsigned int x) { return *(float *) &x; }

float exp2_fp(int x) {
    unsigned int exp /* exponent */, frac /* fraction */;

    /* too small */
    if (x < 2 - pow(2, Y0) - 23) {
        exp = 0;
        frac = 0;
    /* denormalize */
    } else if (x < Y1 - pow(2, Y2)) {
        exp = 0;
        frac = 1 << (unsigned) (x - (2 - pow(2, Y3) - Y4));
    /* normalized */
    } else if (x < pow(2, Y5)) {
        exp = pow(2, Y6) - 1 + x;
        frac = 0;
    /* too large */
    } else {
        exp = Y7;
        frac = 0;
    }

    /* pack exp and frac into 32 bits */
    return u2f((unsigned) exp << 23 | frac);
}

too small
- 在程式碼的這部份當中，代表的是此數值要比最小的 denormalized 還小，也就是 < 2^-149
  2^-149 是由指數部份 2^-126 和小數部份 2^-23 相乘得來，所以(x < 2 - pow(2, 7) - 23)

denormalized
- 此部份代表的是非規格化的數，要符合的話必須小於最大的 normalized 2^-126 ，也就是 exp - Bias = 1 - (2^8-1 - 1)，所以(x < 2 - pow(2, 7))
- frac 的部份，若要表示 2^x 必定只會有一個 1 ，範圍在 100…000 = 2^-1 * 2^-126 = 2^-127 最大，到 000…001 = 2^-23 * 2^-126 = 2^-149 最小
  右移 0 代表的數是 2^-149 ，左移 22 代表的數是 2^-127 ，所以1 << (x - (2 - pow(2, 7) - 23))

normalized
- 要符合規格化的數，必須比 normalized 能表示的最大數再大，也就是比 2¹²⁷ * ( 1 + 2^-1 + 2^-2 + … +2^-23 ) 大，那麼就是 2¹²⁸ 還小，所以(x < pow(2, 7))
- exp 的部份，E = e - Bias = e - (2⁷ - 1)，所以exp = pow(2, 7) - 1 + x
too large
- 根據定義，無窮大的表示方式為 s 11111111 000…000，所以是0xff

延伸題

給定一個單精度的浮點數值，輸出較大且最接近的 2^x 值，需要充分考慮到邊界

原來是針對原本的題目做反向操作阿，研究了一下，就用 bits 來操作吧

首先針對所得到的浮點數做分類：

0
很簡單，輸入是 0 的時候當然就是 0 啦
無限大
也很簡單，當數值為 0xffffffff 就是無限大
denormalized
比較麻煩一點，得先判斷 23~3 位是 0 ，再來判斷後面 frac 的部份，由於題目有要求到輸出較大，所以當考慮 frac 的最左邊的 1 時若其右邊還有 1 ，他的數值就必須在向左移，也就是多 2 倍
normalized
反而更簡單了，先處理 exp 的部份，將數值向左移 23 位出現 exp 的部份，再減去 Bias = 127
frac 的部份，幾乎不用考慮，因為是要輸出較大，所以只要不是全為 0 ，其他都要多乘 2¹

#include <stdlib.h>
#include <math.h>
#include <stdio.h>

unsigned int flo2bin(float f);

int main(){
    float x;                                     
    int ans = 0;
    printf("input a float number：");
    scanf("%f", &x);

    if(!x){                        //0
        printf("0\n");
    }
    else if(x == 0xffffffff){    //infinity
        printf("INFINITY\n");
    }
    else{
        if(x >> 23 == 0){        //denormalized
            x = x << 9;
            while(!(x&80000000)){
                x = x << 1;
                ans++;
            }
            if(x << 1)
                ans = -ans;
            else
                ans = -ans-1;
        }
        else{                    //normalized
            if(x << 9 == 0)     //M=(1+0)=2^0
                ans = (x >> 23) - 127;
            else                //M=(1+f)=2^1
                ans = (x >> 23) - 127 + 1;
        }
        printf("2 e%d", ans);
    }
    return 0;
}

完成了…嗎？

float.c: In function ‘main’:
float.c:23:8: error: invalid operands to binary >> (have ‘float’ and ‘int’)
   if(x >> 23 == 0){

原來， float 不能做 bitwise 的運算阿！_{我真的現在才知道}
于是乎，我想到了一招，用 pointer

先將 float 取位址
將位址內容轉型成 unsigned int* 指標
將unsigned int* 指標取值

意思就是透過轉換位址的型態，將位址傳給 type size 一樣的 unsigned int ，在對它取值，這樣就可以對 unsigned int 做 bitwise 運算了

unsigned int flo2bin(float f){
    unsigned int i = *(unsigned int*)&f;
    return i;
}

最後我再多計算輸出 2^x 與原本浮點數的差值

再啦幹
e94046165

…謝謝…
blake11235

input a float number：0.25
2 e-2
diff: 0.000000

input a float number：0.0035
2 e-8
diff: 0.000406
 
input a float number：5646513.154
2 e23
diff: 2742095.000000

重做第三週測驗題

測驗一

延續從 \(\sqrt{2}\) 的運算談浮點數，假設 double 為 32-bit 寬度，考慮以下符合 IEEE 754 規範的平方根程式，請補上對應數值，使其得以運作：

#include <stdint.h>

/* A union allowing us to convert between a double and two 32-bit integers.
 * Little-endian representation
 */
typedef union {
    double value;
    struct {
        uint32_t lsw;
        uint32_t msw;
    } parts;
} ieee_double_shape_type;

/* Set a double from two 32 bit ints.  */
#define INSERT_WORDS(d, ix0, ix1)       \
    do {                                \
        ieee_double_shape_type iw_u = { \
            .parts.msw = ix0,           \
            .parts.lsw = ix1,           \
        };                              \
        (d) = iw_u.value;               \
    } while (0)

/* Get two 32 bit ints from a double.  */
#define EXTRACT_WORDS(ix0, ix1, d)   \
    do {                             \
        ieee_double_shape_type ew_u; \
        ew_u.value = (d);            \
        (ix0) = ew_u.parts.msw;      \
        (ix1) = ew_u.parts.lsw;      \
    } while (0)

static const double one = 1.0, tiny = 1.0e-300;

double ieee754_sqrt(double x)
{
    double z;
    int32_t sign = 0x80000000;
    uint32_t r, t1, s1, ix1, q1;
    int32_t ix0, s0, q, m, t, i;

    EXTRACT_WORDS(ix0, ix1, x);

    /* take care of INF and NaN */
    if ((ix0 & KK1) == KK2) {
        /* sqrt(NaN) = NaN, sqrt(+INF) = +INF, sqrt(-INF) = sNaN */
        return x * x + x;
    }
    /* take care of zero */
    if (ix0 <= 0) {
        if (((ix0 & (~sign)) | ix1) == 0)
            return x; /* sqrt(+-0) = +-0 */
        if (ix0 < 0)
            return (x - x) / (x - x); /* sqrt(-ve) = sNaN */
    }
    /* normalize x */
    m = (ix0 >> 20);
    if (m == 0) { /* subnormal x */
        while (ix0 == 0) {
            m -= 21;
            ix0 |= (ix1 >> 11);
            ix1 <<= 21;
        }
        for (i = 0; (ix0 & 0x00100000) == 0; i++)
            ix0 <<= 1;
        m -= i - 1;
        ix0 |= (ix1 >> (32 - i));
        ix1 <<= i;
    }
    m -= KK3; /* unbias exponent */
    ix0 = (ix0 & 0x000fffff) | 0x00100000;
    if (m & 1) { /* odd m, double x to make it even */
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
    }
    m >>= 1; /* m = [m/2] */

    /* generate sqrt(x) bit by bit */
    ix0 += ix0 + ((ix1 & sign) >> 31);
    ix1 += ix1;
    q = q1 = s0 = s1 = 0; /* [q,q1] = sqrt(x) */
    r = 0x00200000;       /* r = moving bit from right to left */

    while (r != 0) {
        t = s0 + r;
        if (t <= ix0) {
            s0 = t + r;
            ix0 -= t;
            q += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
        r >>= 1;
    }

    r = sign;
    while (r != 0) {
        t1 = s1 + r;
        t = s0;
        if ((t < ix0) || ((t == ix0) && (t1 <= ix1))) {
            s1 = t1 + r;
            if (((t1 & sign) == sign) && (s1 & sign) == 0)
                s0 += 1;
            ix0 -= t;
            if (ix1 < t1)
                ix0 -= 1;
            ix1 -= t1;
            q1 += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
        r >>= 1;
    }

    /* use floating add to find out rounding direction */
    if ((ix0 | ix1) != 0) {
        z = one - tiny; /* trigger inexact flag */
        if (z >= one) {
            z = one + tiny;
            if (q1 == (uint32_t) 0xffffffff) {
                q1 = 0;
                q += 1;
            } else if (z > one) {
                if (q1 == (uint32_t) KK4)
                    q += 1;
                q1 += 2;
            } else
                q1 += (q1 & 1);
        }
    }
    ix0 = (q >> 1) + 0x3fe00000;
    ix1 = q1 >> 1;
    if ((q & 1) == 1)
        ix1 |= sign;
    ix0 += (m << KK5);
    INSERT_WORDS(z, ix0, ix1);
    return z;
}

乾這題好難，本來想偷偷看有誰有做這題的，結果沒有…
首先先解決那五格填空該填入什麼

/* take care of INF and NaN */
    if ((ix0 & KK1) == KK2) {
        /* sqrt(NaN) = NaN, sqrt(+INF) = +INF, sqrt(-INF) = sNaN */
        return x * x + x;
    }

可以觀察出此段程式碼要判斷是否為 INF 或 NAN ，而判斷標準就是數字 exp 的部份皆為 1 ，因此 KK2 為 0x7ff00000 ， mask 的部份就是取 ix0 的 62~52 位數，所以 KK1 也是 0x7ff00000

m -= KK3; /* unbias exponent */

程式碼後面的註解都說了要 unbias 也就是將偏置移動回來，對 double 雙精度而言 Bias = 2^k-1 - 1 = 1023

/* use floating add to find out rounding direction */
    if ((ix0 | ix1) != 0) {
        z = one - tiny; /* trigger inexact flag */
        if (z >= one) {
            z = one + tiny;
            if (q1 == (uint32_t) 0xffffffff) {
                q1 = 0;
                q += 1;
            } else if (z > one) {
                if (q1 == (uint32_t) KK4)
                    q += 1;
                q1 += 2;
            } else
                q1 += (q1 & 1);
        }
    }

其實這題我不太懂…

ix0 += (m << KK5);

最後這段，是要把 m exponent 放入 ix0 裡面，而針對 double 他的 exp 是在 62~52 位元，所以需要向左位移 51-32+1=20 位

分析程式碼

typedef union {
    double value;
    struct {
        uint32_t lsw;
        uint32_t msw;
    } parts;
} ieee_double_shape_type;

/* Set a double from two 32 bit ints.  */
#define INSERT_WORDS(d, ix0, ix1)       \
    do {                                \
        ieee_double_shape_type iw_u = { \
            .parts.msw = ix0,           \
            .parts.lsw = ix1,           \
        };                              \
        (d) = iw_u.value;               \
    } while (0)

/* Get two 32 bit ints from a double.  */
#define EXTRACT_WORDS(ix0, ix1, d)   \
    do {                             \
        ieee_double_shape_type ew_u; \
        ew_u.value = (d);            \
        (ix0) = ew_u.parts.msw;      \
        (ix1) = ew_u.parts.lsw;      \
    } while (0)

這部份的程式碼，是用 struct 的方式解決了我之前所遇到的問題， float 無法做 bitwise 的運算。宣告一個 union ，裏面包含了一個 double 的值與真正可以操作的 uint32_t 部份。之後再 define EXTRACT_WORDS 和 INSERT_WORDS，讓一個 double 的數值，第 63 位道 32 位能夠輸入到 uint_32 ix0 ， 31 位到 0 位輸入到 uint_32 ix1。

/* take care of INF and NaN */
    if ((ix0 & 0x7ff00000) == 0x7ff00000) {
        /* sqrt(NaN) = NaN, sqrt(+INF) = +INF, sqrt(-INF) = sNaN */
        return x * x + x;
    }
    /* take care of zero */
    if (ix0 <= 0) {
        if (((ix0 & (~sign)) | ix1) == 0)
            return x; /* sqrt(+-0) = +-0 */
        if (ix0 < 0)
         return (x - x) / (x - x); /* sqrt(-ve) = sNaN */
    }

用條件判斷來分別處理 INF 、 NAN 、 ZERO 這幾種狀況
- (ix0 & 0x7ff00000) 是代表 double 的 exp 部份，若皆為 1 則代表 INF or NAN ，為了要使sqrt(NaN) = NaN, sqrt(+INF) = +INF, sqrt(-INF) = sNaN 因此使用 x * x + x 來表示，參考
- 要處理輸入為 0 的部份，則判斷 ix0 的後 31 位及 ix1 的所有位數都要是 0 ， ix0 的第一位是 sign 用來判斷 +0 還是 -0
- 若輸入是小於 0 ，則可以用 0/0 來使 return 為 NAN

這裡不太清楚為何能讓透過 (x - x) / (x - x); 使得 sqrt(-ve) = -NaN ，是否因為產生了 -0/-0　？

/* normalize x */
    m = (ix0 >> 20);
    if (m == 0) { // subnormal x 
        while (ix0 == 0) {
            m -= 21;
            ix0 |= (ix1 >> 11);
            ix1 <<= 21;
        }
        for (i = 0; (ix0 & 0x00100000) == 0; i++)
            ix0 <<= 1;
        m -= i - 1;
        ix0 |= (ix1 >> (32 - i));
        ix1 <<= i;
    }

將非規格化的數規格化。這裡的 m 代表的是 sign exp 的部份，若為 0 則需要先規格化。

    m -= 1023; /* unbias exponent E = e - Bias */
    ix0 = (ix0 & 0x000fffff) | 0x00100000;
    if (m & 1) {
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
    }
    m >>= 1;

針對一個 normalized 的數值可以知道 \(V = (-1)^s\cdot M\cdot 2^E = (1+f)\cdot(e-Bias)\)
- 將 e 減去 1023 之後就可得到真正二的幕次
- 把 ix0 取出 frac 的部份，也就是第 51 到 32 位，之後再加上 2⁰ ，得到 (1+f) = M
- \(V = M\cdot 2^E\) ，若 E 為偶數，則 \(\sqrt{M\cdot 2^E} = 2^{E/2}\cdot\sqrt{M}\) 所以要將 E 化為偶數，所以當 E 為奇數 (m & 1)，將 ix0 和 ix1 的部份乘於 2 ， m 可以直接右移 1 代表除於 2 ，指數部份開根號就完成了。

    /* generate sqrt(x) bit by bit */
    ix0 += ix0 + ((ix1 & sign) >> 31);
    ix1 += ix1;
    q = q1 = s0 = s1 = 0; /* [q,q1] = sqrt(x) */
    r = 0x00200000;       /* r = moving bit from right to left */

    while (r != 0) {
        t = s0 + r;
        if (t <= ix0) {
            s0 = t + r;
            ix0 -= t;
            q += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
        r >>= 1;//r one time / 2
    }
    r = sign;
    while (r != 0) {
        t1 = s1 + r;
        t = s0;
        if ((t < ix0) || ((t == ix0) && (t1 <= ix1))) {
            s1 = t1 + r;
            if (((t1 & sign) == sign) && (s1 & sign) == 0)
                s0 += 1;
            ix0 -= t;
            if (ix1 < t1)
                ix0 -= 1;
            ix1 -= t1;
            q1 += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
        r >>= 1;
    }

原本以為這是用二分法去實現，因為看到了 ix0 和 t 比較大小，所以一直往二分法想，但結果不對，又不可能是十分法，牛頓法也不像，最後找了好久…原來是這個阿。
\((a\cdot2^a+b\cdot2^b+c\cdot2^c+d\cdot2^d...)^2=\\2^{2a}\cdot a^2 \\+(2^{2a-1}\cdot2a+2^{2a-2}\cdot b)\cdot b\\+(2^{2a-2}\cdot2a+2^{2a-3}\cdot 2b +2^{2a-4}\cdot c)\cdot c\\+(2^{2a-3}\cdot2a+2^{2a-4}\cdot 2b +2^{2a-5}\cdot 2c+2^{2a-6}\cdot d)\cdot d\\+ ...\)
分成 ix0 和 ix1 的部份執行，執行結束後可以得到 q 和 q1 兩個 root

/* use floating add to find out rounding direction */
    if ((ix0 | ix1) != 0) {
        z = one - tiny; /* trigger inexact flag */
        if (z >= one) {
            z = one + tiny;
            if (q1 == (uint32_t) 0xffffffff) {
                q1 = 0;
                q += 1;
            } else if (z > one) {
                if (q1 == (uint32_t) 0xfffffffe)
                    q += 1;
                q1 += 2;
            } else
                q1 += (q1 & 1);
        }
    }

這部份是用來捨入的，只是詳細的原因我不太懂…

ix0 = (q >> 1) + 0x3fe00000;
    ix1 = q1 >> 1;
    if ((q & 1) == 1)
        ix1 |= sign;
    ix0 += (m << KK5);
    INSERT_WORDS(z, ix0, ix1);
    return z;

終於到最後了，此時把 q 的部份放進 ix0 並且把之前扣掉的 Bias 加回去，在把 q1 也放進去 ix1，特別注意由於在前面有把 frac 的部份向右移，這裡要全部移回去。最後把 exp m 的部份加到 ix0 裏面就完成了。

float版本

#include <stdint.h>

/* A union allowing us to convert between a double and two 32-bit integers.
 * Little-endian representation
 */
typedef union {
    float value;
    struct {
        uint16_t lsw;
        uint16_t msw;
    } parts;
} ieee_float_shape_type;

/* Set a float from two 16 bit ints.  */
#define INSERT_WORDS(f, ix0, ix1)       \
    do {                                \
        ieee_float_shape_type iw_u = { \
            .parts.msw = ix0,           \
            .parts.lsw = ix1,           \
        };                              \
        (f) = iw_u.value;               \
    } while (0)

/* Get two 16 bit ints from a float.  */
#define EXTRACT_WORDS(ix0, ix1, f)   \
    do {                             \
        ieee_float_shape_type ew_u; \
        ew_u.value = (f);            \
        (ix0) = ew_u.parts.msw;      \
        (ix1) = ew_u.parts.lsw;      \
    } while (0)

static const float one = 1.0, tiny = 1.0e-300;//TODO

double ieee754_sqrt(float x)
{
    float z;
    int16_t sign = 0x8000;
    uint16_t r, t1, s1, ix1, q1;
    int16_t ix0, s0, q, m, t, i;

    EXTRACT_WORDS(ix0, ix1, x);

    /* take care of INF and NaN */
    if ((ix0 & 0x7f80) == 0x7f80) {
        /* sqrt(NaN) = NaN, sqrt(+INF) = +INF, sqrt(-INF) = sNaN */
        return x * x + x;
    }
    /* take care of zero */
    if (ix0 <= 0) {
        if (((ix0 & (~sign)) | ix1) == 0)
            return x; /* sqrt(+-0) = +-0 */
        if (ix0 < 0)
            return (x - x) / (x - x); /* sqrt(-ve) = sNaN */
    }


    // normalize x
    m = (ix0 >> 7);
/*    if (m == 0) { // subnormal x 
        while (ix0 == 0) {
            m -= 21;
            ix0 |= (ix1 >> 11);
            ix1 <<= 21;
        }
        for (i = 0; (ix0 & 0x00100000) == 0; i++)
            ix0 <<= 1;
        m -= i - 1;
        ix0 |= (ix1 >> (32 - i));
        ix1 <<= i;
    }
*/
    m -= 127; /* unbias exponent E = e - Bias */
    ix0 = (ix0 & 0x007f) | 0x0080; /* M = 1 + f = (2^0 == |00100000) + f */
    if (m & 1) { /*if m is odd , double x to make it even , in order to make 2^E--->2^(E/2) , E need to be even      2^(E+1)*M = 2^(E)*M*2 */
        ix0 += ix0 + ((ix1 & sign) >> 15);/* if need to carry in, ix1 & sign will be 1, ix1*2 overflow is OK */
        ix1 += ix1;
    }
    m >>= 1; /* m = [m/2]   2^(E+1)--->2^(E) */

    /* generate sqrt(x) bit by bit */
    ix0 += ix0 + ((ix1 & sign) >> 15);
    ix1 += ix1;
    q = q1 = s0 = s1 = 0; /* [q,q1] = sqrt(x) */
    r = 0x0100;       /* r = moving bit from right to left */

    while (r != 0) {
        t = s0 + r;
        if (t <= ix0) {
            s0 = t + r;
            ix0 -= t;
            q += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 15);
        ix1 += ix1;
        r >>= 1;//r one time / 2
	}
    r = sign;
    while (r != 0) {
        t1 = s1 + r;
        t = s0;
        if ((t < ix0) || ((t == ix0) && (t1 <= ix1))) {
            s1 = t1 + r;
            if (((t1 & sign) == sign) && (s1 & sign) == 0)
                s0 += 1;
            ix0 -= t;
            if (ix1 < t1)
                ix0 -= 1;
            ix1 -= t1;
            q1 += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 15);
        ix1 += ix1;
        r >>= 1;
    }

    /* use floating add to find out rounding direction   beacuse we have a bit that need to be rounding*/
    if ((ix0 | ix1) != 0) {
        z = one - tiny; /* trigger inexact flag */
        if (z >= one) {
            z = one + tiny;
            if (q1 == (uint16_t) 0xffff) {
                q1 = 0;
                q += 1;
            } else if (z > one) {
                if (q1 == (uint16_t) 0xfffe)
                    q += 1;
                q1 += 2;//make q1 == 0
            } else
                q1 += (q1 & 1);
        }
	}
    
    ix0 = (q >> 1) + 0x3f00;//Bias exponent E = e - 1023
    ix1 = q1 >> 1;
    if ((q & 1) == 1)
        ix1 |= sign;
    ix0 += (m << 7);
    INSERT_WORDS(z, ix0, ix1);
    return z;
}

int main(){
	float f;
	scanf("%f", &f);
	printf("%f\n", ieee754_sqrt(f));
}

感覺不難，只需要把數值改成對應 float 格式就好。

綜觀這一題，我想這就是老師所說的好的程式碼，能夠適應任何情況，不需要其他函式庫，透過簡單的 bitwise 操作，沒有複雜的乘除法，還考慮正負零、無限、非數等情況，捨入的部份也考慮到不同的捨入可能方式，這就是一段有價值的程式碼。

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.