2024q1 Homework5 (assessment)

contributed by < rain20010126 >

閱讀〈因為自動飲料機而延畢的那一年〉的啟發

資工系的學生不會寫程式，機械系的學生不會做機械

你該學習的不是看到事情要完蛋了就去避免失敗，而是應該學習如何處理與承受失敗，你才能變得比以前更強大。

要是現在就放棄的話，你這輩子日後遇到這種等級的困難，就只會想逃避而已

人不付出犧牲，就得不到任何回報。如果要得到什麼，就必須付出同等的代價，這就是鍊金術的基本原則，等價交換。當時我們深信著，這就是這世界的真理。–––《鋼之鍊金術師》

此時的我就如同作者口中的機械系的學生相同，除了沒辦法設計出一個完整的機構外，文章中的愷宏所需要解決的問題我能解決出來的真的屈指可數，如果我處在他們的團隊中，很不幸的沒有什麼是我能幫上忙的，這是我應該檢討的地方。

在作業中常常令我挫折，在理解每週的考題與教材都需要花費我大量的時間，而作業中的延伸問題需要自己動手寫出程式碼，常常遇到些 error 就會想跳過這個問題，或者在看完題目要求後只會默默地對自己說：這問題太難了，我一定做不出來就直接放棄。我需要更誠實面對自己，沒有人一開始就會，需要一步一腳印慢慢加強自己，如同作者慢慢解決他在實作所遇到的困難一樣，試著解決問題，克服逃避問題的心態。

參照 2023 年期末專題，簡述你想投入的專案 (亦可建立新專案)，至少選出 (或訂出) 二個。

由於是機械背景，相關的先備知識不夠充足，因此我會先把閱讀課程教材以及補齊欠的作業補齊作為主要努力的方向，以下是我對期末專案的想法為以下

依照有興趣作為排序

補齊課程教材以及~~完成未完成的作業~~
並行程式設計
紅黑樹(?)實作改進

前六周學習狀況

homework1: make test 完成95/100，完成 q_shuffle
homework2: 理解 quiz1 和 quiz2 的程式碼
homework4: 理解 quiz3 程式碼

最後包裝成題目的程式碼如以下

#include <stdint.h>
void divmod_10(uint32_t in, uint32_t *div, uint32_t *mod)
{
    uint32_t x = (in | 1) - (in >> 2); /* div = in/10 ==> div = 0.75*in/8 */
    uint32_t q = (x >> 4) + x;
    x = q;
    q = (q >> 8) + x;
    q = (q >> 8) + x;
    q = (q >> 8) + x;
    q = (q >> 8) + x;

    *div = (q >> 3);
    *mod = in - ((q & ~0x7) + (*div << DDDD));   
}

首先 uint32_t x = (in | 1) - (in >> 2) 部分，當 in 為偶數時會需要做 +1 ，

\begin{array}{r} x = \frac{3 (i n + 1)}{4} \end{array}

當 in 為奇數時，

\begin{array}{r} x = \frac{3 i n}{4} \end{array}

接著 uint32_t q = (x >> 4) + x;，當 in 為偶數時，

\begin{array}{r} q = \frac{3 (i n + 1)}{4} + \frac{3 (i n + 1)}{64} = \frac{51 (i n + 1)}{64} \end{array}

當 in 為奇數時，

\begin{array}{r} q = \frac{3 i n}{4} + \frac{3 i n}{64} = \frac{51 i n}{64} \end{array}

此時若是

q < 128

，則 q = (q >> 8) + x; 可以等價為 q = x，也就是 q 不變，假設 in 為奇數，(q >> 3) 會等價於

\frac{51 \times i n}{64} \times \frac{1}{8}

即可取得商數

這邊我有發現若是將 in 設定為偶數 (120) 代入，若是沒有經過 +1 處理的話算出來的商數會錯誤，但不理解詳細原因為何

若是

q > 128

則會透過 q = (q >> 8) + x; ，假設 in 為奇數，可以將

\frac{51 i n}{64} \approx 0.797 i n

逼近至

0.8 i n

最後取餘數程式碼 *mod = in - ((q & ~0x7) + (*div << 1)); 即為 *div * 10 ，其中 q & ~0x7 目的是保留最後三個 bits 以外的位元，等同於 *div << 3

TODO: 改善上方程式碼

IEEE 754, Assume float is 32-bit

// Return x * 10
float float_mul10(float x)
{
    // using bitwise operation; No mul/div

    unsigned uf = *(unsigned *)&x;
    unsigned uf_sgn = (uf >> 31) & 0x01;
    unsigned uf_exp = (uf >> 23) & 0xff;
    
    if(uf_exp == 0){
        unsigned uf_mul2 = (uf << 1) + (uf_sgn << 31);
        unsigned uf_mul8 = (uf << 3) + (uf_sgn << 31);
        float ans = *(float*) &uf_mul8 + *(float*) &uf_mul2;
        return ans;
    } else if(uf_exp != 0xff){
        unsigned uf_exp_2 = uf_exp + 1;
        unsigned uf_exp_8 = uf_exp + 3;
        uf &= 0x807fffff;
        unsigned uf_mul8 = uf + (uf_exp_8 << 23);
        unsigned uf_mul2 = uf + (uf_exp_2 << 23);

        float ans = *(float*) &uf_mul8 + *(float*) &uf_mul2;
        printf("result: %f", ans);
        return ans;
    }
    printf("%f", x);
    return x;
}

誠實面對自己
TODO: 撰寫程式碼、測試，附上分析

實作浮點數乘法

首先由 IEEE 754 可知 32-bit 浮點數表示法如下圖

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

其中 sign 記錄正負號， exponent 用來計算指數的偏移量，計算方式為

2^{e - 127}

，其中

127

為固定偏移值，fraction 部分為計算數字的實際值，計算公式為

\sum_{i = 1}^{23} m_{23 - i} 2^{- i}

，由上述可知，32-bit 浮點數的表示方式為

(- 1)^{Sign Bit} \times (1 + Fraction) \times 2^{Exponent - 127}

實作以上程式碼之前，我初步的想法是將輸入 x 分別乘以

2^{3}

和

2^{1}

做相加，也就是 exponent 分別加上

3

和

2

，因此直接使用較值觀的方法，直接在 exponent 加上對應到

\times 2

和

\times 8

的數值

接著參考浮點數運算和定點數操作, 你所不知道的 C 語言: 浮點數運算了解到除了上述表示方法（Normalized）,還需要考慮到 NaN ,INF 以及 Denormalize 的表示方式如下圖

Denormalize 的表示方式是用於處理極小的數值，此表示方法允許浮點數系統在接近零的範圍內平滑地過渡，避免超出 Normalized 表示方法時能表達的最小數值，使數值直接跳到 0，以此來減少計算中的誤差和 underflow 問題。

Normalized numbers 能表示的最小正數值大約是
$1.17549435 \times 10^{38}$
Denormalized numbers 能表示的最小正數值大約是
$1.4 \times 10^{45}$

因此程式碼部分需要分成三種情況

Normalize: 將浮點數表示方法中的指數部分加 1 (2倍)與指數部分加 3 (8倍) 的結果相加
Denormalize: 可以直接將原數左移一單位得到 2 倍的結果，fraction 部分若是 overflow 會自動變成 exp 的 1，之後再將 sign bit 放回即可。接著將原數左移 3 位即可得到 8 倍的結果，最後將兩結果做相加
遇到 NaN 或 INF 直接回傳

以下為完整程式碼

float float_mul10(float x)
{
    // using bitwise operation; No mul/div

    unsigned uf = *(unsigned *)&x;
    unsigned uf_sgn = (uf >> 31) & 0x01;
    unsigned uf_exp = (uf >> 23) & 0xff;
    
    if(uf_exp == 0){
        unsigned uf_mul2 = (uf << 1) + (uf_sgn << 31);
        unsigned uf_mul8 = (uf << 3) + (uf_sgn << 31);
        float ans = *(float*) &uf_mul8 + *(float*) &uf_mul2;
        return ans;
    } else if(uf_exp != 0xff){
        unsigned uf_exp_2 = uf_exp + 1;
        unsigned uf_exp_8 = uf_exp + 3;
        uf &= 0x807fffff;
        unsigned uf_mul8 = uf + (uf_exp_8 << 23);
        unsigned uf_mul2 = uf + (uf_exp_2 << 23);

        float ans = *(float*) &uf_mul8 + *(float*) &uf_mul2;
        printf("result: %f", ans);
        return ans;
    }
    printf("%f", x);
    return x;
}

測試部分參考 Youri 的作法，設立 10000 筆資料，將自己實作的 float_mul10 和內建的浮點數乘法使用 perf stat 進行比較，並將浮點數的三種不同表示方法分開進行比較

Normalize 測試程式碼

int main()
{
    srand(42);
    float x[NUM_FLOATS];
    float result;
    for (int i = 0; i < NUM_FLOATS; i++) {
        x[i] = generate_random_float(LOWER_BOUND, UPPER_BOUND);
        result = float_mul10(x[i]);
    }
    // for (int i = 0; i < NUM_FLOATS; i++) {
    //     x[i] = generate_random_float(LOWER_BOUND, UPPER_BOUND);
    //     result = x[i] * 10;
    // }
    return 0;
}

float_mul10 結果:

 Performance counter stats for './my_program' (10 runs):

   <not supported>      cache-misses                                                          
   <not supported>      cache-references                                                      
          19186464      instructions                     #    4.10  insn per cycle              ( +-  0.03% )
           4684814      cycles                                                                  ( +-  0.20% )

          0.002462 +- 0.000576 seconds time elapsed  ( +- 23.41% )

內建浮點數乘法的結果:

 Performance counter stats for './my_program' (10 runs):

   <not supported>      cache-misses                                                          
   <not supported>      cache-references                                                      
          13076338      instructions                     #    4.11  insn per cycle              ( +-  0.02% )
           3180906      cycles                                                                  ( +-  0.32% )

          0.001924 +- 0.000400 seconds time elapsed  ( +- 20.81% )

Denormalize 測試程式碼

int main()
{   
    float lower_bound = 1.4012e-45f;
    float upper_bound = 1.1756e-38f;
    srand(42);
    float x[NUM_FLOATS];
    float result;
    for (int i = 0; i < NUM_FLOATS; i++) {
        x[i] = generate_random_float(lower_bound, upper_bound);
        result = float_mul10(x[i]);
    }
    // for (int i = 0; i < NUM_FLOATS; i++) {
    //     x[i] = generate_random_float(LOWER_BOUND, UPPER_BOUND);
    //     result = x[i] * 10;
    // }
    return 0;
}

float_mul10 結果:

 Performance counter stats for './my_program' (10 runs):

   <not supported>      cache-misses                                                          
   <not supported>      cache-references                                                      
          18088039      instructions                     #    4.51  insn per cycle              ( +-  0.02% )
           4008018      cycles                                                                  ( +-  0.35% )

          0.002439 +- 0.000345 seconds time elapsed  ( +- 14.16% )

內建浮點數乘法的結果:

 Performance counter stats for './my_program' (10 runs):

   <not supported>      cache-misses                                                          
   <not supported>      cache-references                                                      
     <not counted>      instructions                                                            ( +- 40.82% )  (0.00%)
     <not counted>      cycles                                                                  ( +- 40.83% )  (0.00%)

          0.003401 +- 0.000373 seconds time elapsed  ( +- 10.97% )

NaN 測試程式碼

int main()
{
    srand(42);
    float x[NUM_FLOATS];
    float result;
    unsigned int a = 0x7F800001;
    for (int i = 0; i < NUM_FLOATS; i++) {
        x[i] = *(float*) &a;
        result = float_mul10(x[i]);
    }
    // for (int i = 0; i < NUM_FLOATS; i++) {
    //     x[i] = *(float*) &a;
    //     result = x[i] * 10;
    // }
    printf("%f\n", result);
    return 0;
}

float_mul10 結果:

   <not supported>      cache-references                                                      
           5150722      instructions                     #    4.68  insn per cycle              ( +- 16.67% )
           1101704      cycles                                                                  ( +- 16.71% )

          0.001399 +- 0.000406 seconds time elapsed  ( +- 29.01% )

浮點數乘法的結果:

 Performance counter stats for './my_program' (10 runs):

   <not supported>      cache-misses                                                          
   <not supported>      cache-references                                                      
           2599171      instructions                     #    2.73  insn per cycle              ( +- 11.11% )
            950576      cycles                                                                  ( +- 11.20% )

          0.001202 +- 0.000196 seconds time elapsed  ( +- 16.31% )

TODO: 找同學同組並確認
TODO: 第 4 周測驗 3 的 XTree ，將其與紅黑樹、 AVL 樹進行比較: 量化分析 (演算法分析: tree height, 數學！)，提出 XTree 的改進方案

XTree 改進與和紅黑樹及 AVL Tree 比較結果

2024q1 Homework5 (assessment)

閱讀〈因為自動飲料機而延畢的那一年〉的啟發

參照 2023 年期末專題，簡述你想投入的專案 (亦可建立新專案)，至少選出 (或訂出) 二個。

前六周學習狀況

實作浮點數乘法

Read more

2024q1 Homework4 (quiz 3 + 4)

linux2024-homework1

2024q1 Homework2 (quiz1+2)