2018q1 Homework (quiz3)

tags: `bauuuu1021`,`BroLeaf`

contributed by <bauuuu1021,BroLeaf>
2018q1 第 3 週測驗題

測驗 `1`

延續從

\sqrt{2}

的運算談浮點數，假設 double 為 32-bit 寬度，考慮以下符合 IEEE 754 規範的平方根程式，請補上對應數值，使其得以運作：










































































































































#include <stdint.h>

/* A union allowing us to convert between a double and two 32-bit integers.
 * Little-endian representation
 */
typedef union {
    double value;
    struct {
        uint32_t lsw;
        uint32_t msw;
    } parts;
} ieee_double_shape_type;

/* Set a double from two 32 bit ints.  */
#define INSERT_WORDS(d, ix0, ix1)       \
    do {                                \
        ieee_double_shape_type iw_u = { \
            .parts.msw = ix0,           \
            .parts.lsw = ix1,           \
        };                              \
        (d) = iw_u.value;               \
    } while (0)

/* Get two 32 bit ints from a double.  */
#define EXTRACT_WORDS(ix0, ix1, d)   \
    do {                             \
        ieee_double_shape_type ew_u; \
        ew_u.value = (d);            \
        (ix0) = ew_u.parts.msw;      \
        (ix1) = ew_u.parts.lsw;      \
    } while (0)

static const double one = 1.0, tiny = 1.0e-300;

double ieee754_sqrt(double x)
{
    double z;
    int32_t sign = 0x80000000;
    uint32_t r, t1, s1, ix1, q1;
    int32_t ix0, s0, q, m, t, i;

    EXTRACT_WORDS(ix0, ix1, x);

    /* take care of INF and NaN */
    if ((ix0 & KK1) == KK2) {
        /* sqrt(NaN) = NaN, sqrt(+INF) = +INF, sqrt(-INF) = sNaN */
        return x * x + x;
    }
    /* take care of zero */
    if (ix0 <= 0) {
        if (((ix0 & (~sign)) | ix1) == 0)
            return x; /* sqrt(+-0) = +-0 */
        if (ix0 < 0)
            return (x - x) / (x - x); /* sqrt(-ve) = sNaN */
    }
    /* normalize x */
    m = (ix0 >> 20);
    if (m == 0) { /* subnormal x */
        while (ix0 == 0) {
            m -= 21;
            ix0 |= (ix1 >> 11);
            ix1 <<= 21;
        }
        for (i = 0; (ix0 & 0x00100000) == 0; i++)
            ix0 <<= 1;
        m -= i - 1;
        ix0 |= (ix1 >> (32 - i));
        ix1 <<= i;
    }
    m -= KK3; /* unbias exponent */
    ix0 = (ix0 & 0x000fffff) | 0x00100000;
    if (m & 1) { /* odd m, double x to make it even */
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
    }
    m >>= 1; /* m = [m/2] */

    /* generate sqrt(x) bit by bit */
    ix0 += ix0 + ((ix1 & sign) >> 31);
    ix1 += ix1;
    q = q1 = s0 = s1 = 0; /* [q,q1] = sqrt(x) */
    r = 0x00200000;       /* r = moving bit from right to left */

    while (r != 0) {
        t = s0 + r;
        if (t <= ix0) {
            s0 = t + r;
            ix0 -= t;
            q += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
        r >>= 1;
    }

    r = sign;
    while (r != 0) {
        t1 = s1 + r;
        t = s0;
        if ((t < ix0) || ((t == ix0) && (t1 <= ix1))) {
            s1 = t1 + r;
            if (((t1 & sign) == sign) && (s1 & sign) == 0)
                s0 += 1;
            ix0 -= t;
            if (ix1 < t1)
                ix0 -= 1;
            ix1 -= t1;
            q1 += r;
        }
        ix0 += ix0 + ((ix1 & sign) >> 31);
        ix1 += ix1;
        r >>= 1;
    }

    /* use floating add to find out rounding direction */
    if ((ix0 | ix1) != 0) {
        z = one - tiny; /* trigger inexact flag */
        if (z >= one) {
            z = one + tiny;
            if (q1 == (uint32_t) 0xffffffff) {
                q1 = 0;
                q += 1;
            } else if (z > one) {
                if (q1 == (uint32_t) KK4)
                    q += 1;
                q1 += 2;
            } else
                q1 += (q1 & 1);
        }
    }
    ix0 = (q >> 1) + 0x3fe00000;
    ix1 = q1 >> 1;
    if ((q & 1) == 1)
        ix1 |= sign;
    ix0 += (m << KK5);
    INSERT_WORDS(z, ix0, ix1);
    return z;
}

danglling else

在 Line 26&36 開始的 macro define 中都用 do{…}while(0) 迴圈將內容包起來，查資料後才發現是 kernel 寫作的常用手法。以下舉例說明















#define function(A,B) { \
    if(A) \
        y = A + B; \
    else \
        y = B; \
}

int main() {
    int x = 2, y;
    if (x > 1)
        function(2, 3);
    else
        ;
    return 0;
}

編譯失敗:

define.c:14:5: error: ‘else’ without a previous ‘if’

加上 do…while(0) 即編譯成功

#define function(A, B) \
    do { \
    if (A) \
        y = A + B; \
    else \
        y = B; \
} while (0)

這是因為 macro 宣告時為 {…} ，使用時會在句尾加上 ';'。如果沒有加上 do…while(0) ，在上述程式碼中展開為










int main() {
    int x = 2, y;
    if (x > 1) { \
        if (A) \
            y = A + B; \
        else \
            y = B; \
    };
    else
        ;

在第 16 行有 ';' 就已經將 if 做結尾了，因此下一行的 else 就會出現 without a previous ‘if’ 這種結果

但另外經過實驗發現，如果在 11~13 行加上 { } ，即使沒有 do…while(0) 亦可成功編譯。










int main()
{
    int x = 2, y;
    if (x > 1) {
        function(2,3);
    }
    else
	        ;
    return 0;
}

儘管 dangling else 這種問題應該是 define 時要注意，但如果在使用 macro 時要保險一點可以使用 { } 匡住迴圈

KK1 & KK2
```
/* take care of INF and NaN */
if ((ix0 & KK1) == KK2) {
    /* sqrt(NaN) = NaN, sqrt(+INF) = +INF, sqrt(-INF) = sNaN */
    return x * x + x;
}
```
1. 查閱 newlib 原始程式碼，推敲 EXTRACT_WORDS 巨集的用途
2. 用牛頓法解釋 root square 運作原理，並且充分解釋本程式的設計考量
  Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
  jserv
- 在計算 KK1 及 KK2 前要先計算 EXTRACT_WORDS(ix0, ix1, x); 所產出的 ix0 及 ix1
```
/* Get two 32 bit ints from a double.  */
#define EXTRACT_WORDS(ix0, ix1, d)   \
do {                             \
    ieee_double_shape_type ew_u; \
    ew_u.value = (d);            \
    (ix0) = ew_u.parts.msw;      \
    (ix1) = ew_u.parts.lsw;      \
} while (0)
```
- ieee_double_shape_type 宣告
```
typedef union {
double value;
struct {
    uint32_t lsw;
    uint32_t msw;
} parts;
} ieee_double_shape_type;
```
- 因為 ieee_double_shape_type 是 union 的型態，所有成員~~共享記憶體~~共用同一塊記憶體空間，因此 ix0 ix1 分別為傳入值 d 的最高32位及最低32位。
- 而題目所求的是 INF 及 NaN 的情況，此時 exp 部份全為 1，對 double 來說也就是從 sign bit 以下的 11 位都為 1。
- 用 bitwise 表示即為 : (ix0&0x7ff00000) == (0x7ff00000)，KK1=0x7ff00000，KK2=0x7ff00000
KK3
```
m -= KK3; /* unbias exponent */
```
這邊要將 exp 減去 bias，double 的 bias 為 1023，KK3 = 1023

Square Root of Binary Number Tutorial

Step 1 將欲開根號數從小數點開始向左右兩兩成一組
         
        -----------
       √ 1|01|00|01
       
Step 2 在 A B 處填入合適數值使 A B 相乘不大於被除數
         B                 1
         ------------     -----------
        √ 1|01|00|01     √ 1|01|00|01    (此處被除數為 1) 
       A               1   1
                         ---------------
                           0
        
Step 3 (原除數再加上 A 的計算結果)<<1，再於最低位加入新的 A ，成為新除數，重複 Step 2&3
         1 B
        ---------
       √ 1|01|00|01
-> 1     1
         1
        ---------
   10A     01  
((A+A)<<1)+$new_A$  

         1 0
        ---------
       √ 1|01|00|01
   1     1
         1
        ---------
  100      01
            0
        ---------
  100A      1 00
((100A+A)<<1)+$new_A$

         1  0  0
        ---------
       √ 1|01|00|01
   1     1
         1
        ---------
  100      01
            0
        ---------
  1000      1 00
               0
        ---------
            1 00 01
            
         1  0  0  1
        ---------
       √ 1|01|00|01
   1     1
         1
        ---------
  100      01
            0
        ---------
  1000      1 00
               0
        ---------
            1 00 01
  10001     1 00 01
        ---------
                  0

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →




































/* generate sqrt(x) bit by bit */
ix0 += ix0 + ((ix1 & sign) >> 31);
ix1 += ix1;
q = q1 = s0 = s1 = 0; /* [q,q1] = sqrt(x) */
r = 0x00200000;       /* r = moving bit from right to left */

while (r != 0) {
    t = s0 + r;
    if (t <= ix0) {
        s0 = t + r;
        ix0 -= t;
        q += r;
    }
    ix0 += ix0 + ((ix1 & sign) >> 31);
    ix1 += ix1;
    r >>= 1;
}

r = sign;
while (r != 0) {
    t1 = s1 + r;
    t = s0;
    if ((t < ix0) || ((t == ix0) && (t1 <= ix1))) {
        s1 = t1 + r;
        if (((t1 & sign) == sign) && (s1 & sign) == 0)
            s0 += 1;
        ix0 -= t;
        if (ix1 < t1)
            ix0 -= 1;
        ix1 -= t1;
        q1 += r;
    }
    ix0 += ix0 + ((ix1 & sign) >> 31);
    ix1 += ix1;
    r >>= 1;
}

KK4 & KK5






















/* use floating add to find out rounding direction */
if ((ix0 | ix1) != 0) {
    z = one - tiny; /* trigger inexact flag */
    if (z >= one) {
        z = one + tiny;
        if (q1 == (uint32_t) 0xffffffff) {
            q1 = 0;
            q += 1;
        } else if (z > one) {
            if (q1 == (uint32_t) KK4)
                q += 1;
            q1 += 2;
        } else
            q1 += (q1 & 1);
    }
}
ix0 = (q >> 1) + 0x3fe00000;
ix1 = q1 >> 1;
if ((q & 1) == 1)
    ix1 |= sign;
ix0 += (m << KK5);
INSERT_WORDS(z, ix0, ix1);

這段程式碼在做捨入，先解釋一些變數
- q,q1 組合成 sqrt(x)
```
q = q1 = s0 = s1 = 0; /* [q,q1] = sqrt(x) */
```
- one & tiny 的定義 (double range = -1.7e308~1.7e308)
```
static const double one = 1.0, tiny = 1.0e-300;
```
- m 為做完根號運算得到的答案的 exp

首先看到




if (q1 == (uint32_t) 0xffffffff) {
        q1 = 0;
        q += 1;
    }

當 q1 全位元都是 1 時，向上捨入。即更上面高位的數 q 加一，將 q1 設為 0

再往下看到




else if (z > one) {
    if (q1 == (uint32_t) KK4)
        q += 1;
    q1 += 2;

q1==KK4 時 q1 +2 可以向上捨入使 q+1，這時的 KK4 應為 111…10 的形式，KK4=0xfffffffe

以下程式碼將 m 送回 double 中 exp 該存放的位置(double 52~62 bits，ix0 的 20~30 bits)
```
ix0 += (m << KK5);
```
m 應該向左移 20 位，KK5=20

延伸題目: 解釋上述程式碼何以運作，並且改為 float (單精度) 版本，注意應該用更短的程式碼來實作

double & float 間相異處
- float 不需要拆成 ix0 ix1，但 float 無法做 shift 運算，必須轉成 int
- bias = 2^8-1 -1 = 127
- tiny 改為1e-30，因為 float 可表達的範圍約3.4e-38~e.4e-38
程式碼































































































#include <stdint.h>
/* A union allowing us to convert between a float and a 32-bit integers.*/
typedef union {
    float value;
    uint32_t v_int;
} ieee_float_shape_type;

/* Set a float from a 32 bit int. */
#define  INSERT_WORDS(d, ix0)	        \
    do {                                \
        ieee_float_shape_type iw_u;     \
            iw_u.v_int = (ix0);         \
        (d) = iw_u.value;               \
    } while (0)

/* Get a 32 bit int from a float. */
#define EXTRACT_WORDS(ix0, d)        \
    do {                             \
        ieee_float_shape_type ew_u;  \
        ew_u.value = (d);            \
        (ix0) = ew_u.v_int;          \
    } while (0)

static const float one = 1.0, tiny = 1.0e-30;

float ieee754_sqrt(float x)
{
	float z;
	int32_t sign = 0x80000000;
	uint32_t r;
    int32_t ix0, s0, q, m, t, i;
	
	EXTRACT_WORDS(ix0, x);
	
	/* take care of zero */
    if (ix0 <= 0) {
        if ((ix0 & (~sign)) == 0)
            return x; /* sqrt(+-0) = +-0 */
        if (ix0 < 0)
            return (x - x) / (x - x); /* sqrt(-ve) = sNaN */
    }
	/* take care of +INF and NaN */
    if ((ix0 & 0x7ff00000) == 0x7ff00000) {
        /* sqrt(NaN) = NaN, sqrt(+INF) = +INF,*/
        return x;
    }
	/* normalize x */
    m = (ix0 >> 23);
    if (m == 0) { /* subnormal x */
        for (i = 0; (ix0 & 0x00800000) == 0; i++)
            ix0 <<= 1;
        m -= i - 1;
    }
    m -= 127; /* unbias exponent */
    ix0 = (ix0 & 0x007fffff) | 0x00800000;
    if (m & 1) { /* odd m, double x to make it even */
        ix0 += ix0;
    }
    m >>= 1; /* m = [m/2] */
	
    /* generate sqrt(x) bit by bit */
    ix0 += ix0;
    q = s0 = 0; /* [q] = sqrt(x) */
    r = 0x01000000;       /* r = moving bit from right to left */

    while (r != 0) {
        t = s0 + r;
        if (t <= ix0) {
            s0 = t + r;
            ix0 -= t;
            q += r;
        }
        ix0 += ix0;
        r >>= 1;
    }

	/* use floating add to find out rounding direction */
	if (ix0 != 0) {
        z = one - tiny; /* trigger inexact flag */
        if (z >= one) {
            z = one + tiny;
            if (z > one)
                q += 2;
            else
                q += (q & 1);
        }
    }
	
    ix0 = (q >> 1) + 0x3f000000;
    ix0 += (m << 23);
	
    INSERT_WORDS(z, ix0);

    return z;
}

測驗 `2`

考慮到下方函式 shift_right_arith 和 shift_right_logical 分別表示算術右移和邏輯右移，請嘗試補完程式碼。可由 sizeof(int) * 8 得知整數型態 int 的位元數 w，而位移量 k 的有效範圍在 0 到 w - 1。

#include <stdio.h>
int shift_right_arith(int x, int k) {
    int xsrl = (unsigned) x >> k;
    int w = sizeof(int) << P1;
    int mask = (int) -1 << (P2);
    if (x < 0)
        return xsrl P3 mask;
    return xsrl;
}

unsigned shift_right_logical(unsigned x, int k) {
    unsigned xsra = (int) x >> k;
    int w = sizeof(int) << P4;
    int mask = (int) -1 << (P5);
    return xsra P6 P7;
}

作答區

P1 = ?
P2 = ?
P3 = ? (為某種 operation)
P4 = ?
P5 = ?
P6 = ? (為某種 operation)
P7 = ?

想法
從兩者的 input 就知道是在處理 signed/unsigned int 的右移問題
算術右移處理 signed 邏輯右移處理 unsigned
shift_right_arith
int w = sizeof(int) << P1
首先，根據題目已知 w 是 int 的 bit 數，為 sizeof(int) * 8 ，也就是左移 3
P1 = 3
再來是針對負數的操作，要先知道 signed int 的負數是原本的數去做二補數再 + 1 的，也就是我們期望的負數右移，在最左邊是要補 1 的，而不是預設的 0
這部份就是 mask 的作用了，就是把右移掉的 1 補回最左邊，要補的位數是最左邊數過來 k 位，所以
P2 = w - k
return xsrl P3 mask;
這段就是在問 0 跟 1 做什麼運算出來會是 1所以簡單得出
P3 = |

|X X X X X X X X| w bit
|_ _ _ _ _|X X X| 右移 k bit
|1 1 1 1 1 1 1 1| 把 -1 左移 w - k 個 bit 就可以得到 mask
|1 1 1 1 1|0 0 0| 跟右移後的結果做 or 就能把左邊 k 個 bit mask 掉

shift_right_logical
跟上一題一樣
P4 = 3
P5 = w - k
因為前面有強制轉行成 int 再右移，所以有可能變成負數，而導致右移的部份變成 1
而 P6,P7 為了使最左邊共 w - k 位補零，為了使 1 轉變成零，應該選擇 and 與 ~mask
P6 = &
P7 = ~mask

延伸題目: 在 x86_64 和 Aarch32/Aarch64 找到對應的指令，並說明其作用和限制

x86_64
- Arithmetic shift
  左移是 sal 代表 shift arithmetic left ，右移是 sar 代表 shift arithmetic right
- Logical shift
  左移是 shl 代表 shift logical left，右移是 shr 代表 shift logical right
- 語法
  根據不同的 syntax 會有不同的表達，AT&T Syntax 就是 GCC 所使用的組合語言
  - AT&T Syntax
    sal/sar/shl/shr src, dest
  - intel syntax
    sal/sar/shl/shr dest, src
- 兩者都是把 dest 算數左/右移 src 個 bit ，特性是最後一個被移走的 bit 會在 carry flag 當中，而算術位移還會依照原本的 sign bit 是多少，把空缺位都補上一樣的 0 或 1
Aarch32/Aarch64
- Arithmetic shift
  右移是 ASR
- Logical shift
  左移是 LSL ，右移是 LSR
- 注意這裡並沒有算術左移這個 function ，是因為可以用邏輯左移代替，因為兩者的動作在這裡是一樣的

參考資料

2018q1 Homework2 (作業區)
BroLeaf 共筆內容
TingL7 共筆內容
rex662624 共筆內容
PunchShadow 共筆內容
Floating point division and square root algorithms and implementation in the AMD-K7/sup TM/ microprocessor
√Square Roots
Arithmetic Shift Operations (63)
X86 Assembly/Shift
Intel and AT&T Syntax
arm-assembly-arithmetic-shift-logical-shift

2018q1 Homework (quiz3)

tags: bauuuu1021,BroLeaf

測驗 1

測驗 2

參考資料

tags: `bauuuu1021`,`BroLeaf`

測驗 `1`

測驗 `2`