2025-02-{18,20} 課堂問答簡記

# 2025-02-{18,20} 課堂問答簡記 :::success 編輯本頁，紀錄課堂問答內容並留下你的 GitHub 帳號 ::: ## jserv [連鋼琴師都用 Linux](https://wiwi.blog/blog/pianist-using-linux/) > [comment](https://www.facebook.com/JservFans/posts/pfbid0WkNzK3Ag183uHXoKkAwaJAfotRVMLaKcjpSUSU8fvMquYoe6mw24uG8mQyS4RCK1l): 人家是 NiceChord (好和弦)，我們是 NiceCode (好程式) ## devarajabc [課程簡介和注意須知](https://bit.ly/linux2025-intro)第 21 頁提及的程式碼: ```c #include <stdio.h> int main() { float sum = 0.0f; for (int i = 0; i < 10000; i++) sum += i + 1; printf("Sum: %f\n", sum); return 0; } ``` 其執行結果是 `Sum: 50002896.000000`，與 1 + 2 + 3 + ... + 10000 實數加總的預期結果 `50005000` 不同，該如何解釋？以 IEEE 754 標準，針對浮點數表示和對應的位元逐一說明。為什麼會這樣呢？ **因為並不是每個整數都能以單精度浮點數的型態表示！** 舉例來說，將單精度浮點數 `A` 做加法並輸出： ```c float A = 16777216.0; printf("%f \n",A); printf("%f \n",A+1); printf("%f \n",A+2); printf("%f \n",A+3); printf("%f \n",A+4); ``` 會得到錯誤的結果，`16777217` 和 `16777219` 消失了 ```shell 16777216.000000 16777216.000000 16777218.000000 16777220.000000 16777220.000000 ``` Why ? 讓我們來觀察浮點數轉換的過程： ![image](https://hackmd.io/_uploads/rJovVLGcye.png) 整數 $16777217$ 的二進位制表示式為 $1000000000000000000000001_2$(25 位元) $16777217 = 1.000000000000000000000001_2* 2^{24}$ 其中 $000000000000000000000001_2$(24 位元) 會被存入 mantissa field, 不過因為單精度浮點數的 mantissa field 只有 23 位元，所以實際存入的只有 $00000000000000000000000_2$(23 位元) 如此一來，$16777217$ 和 $16777216$ 以單精度浮點數型態來表示都會是 $1.0* 2^{24}$ 難道超過 $16777216$ 的奇數以單精度浮點數來表示就會自動減一嗎？錯！大錯特錯，觀察下面的例子：整數 $16777219$ 的二進位表示式為 $1000000000000000000000011_2$ (25 位元) ，最後一位元的 `1` 被捨棄了，以單精度浮點數來表示就是 $(1.0+2^{-23}) \times 2^{24}=16777218$ 怎麼會這樣呢？為什麼實際輸出的結果是 $16777220$ 呢？ (我不知道，四捨五入的規則是什麼呢？Rounding to even ?) 感謝 `Dennis40816` 的說明在規格書 4.3.1 中提到 `roundTiesToEven`： > The floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered. 無法以浮點數型態表示的數字會以最接近其數值的數來表示，若「最接近的數字」有兩個，則會以 LSB 為 `0` 的數值來表示。注意，此處的 "least significant digit" 指的是 mantissa field 內的 least significant digit. 那問題來了，如何知道「最接近的數字」是多少呢？ ``` 16777216 -> 16777216 16777217 -> 16777216 (-1) 16777218 -> 16777218 16777219 -> 16777220 (+1) 16777220 -> 16777220 16777221 -> 16777220 (-1) 16777222 -> 16777222 16777223 -> 16777224 (+1) ``` 根據輸出的結果，我們可以觀察到： - 整數範圍 [$2^{24}$, $2^{25}-1$] 中只有偶數能以浮點數型態表示 - mantissa field 的 LSB 為 `0` 的數會出現三次 - 以浮點數型態表示的數值和實際數值的差異存在 [0, -1, 0, +1] 的循環接下來我們來討論 26 位元的整數以浮點數型態表示的形況: $33554433$ 以二進位表示為 $10000000000000000000000001_2$ 其 significand 有 25 位，末端兩位元無法表示。換句話說，只有四的倍數能以浮點數型態表示(末兩位為零) ，配合 `roundTiesToEven` ，可以得到下表： ```shell 33554432 -> 33554432 33554433 -> 33554432 (-1) 33554434 -> 33554432 (-2) 33554435 -> 33554436 (+1) 33554436 -> 33554436 33554437 -> 33554436 (-1) 33554438 -> 33554440 (+2) 33554439 -> 33554440 (+1) 33554440 -> 33554440 33554441 -> 33554440 (-1) 33554442 -> 33554440 (-2) 33554443 -> 33554444 (+1) 33554444 -> 33554444 33554445 -> 33554444 (-1) 33554446 -> 33554448 (+2) 33554447 -> 33554448 (+1) 33554448 -> 33554448 33554449 -> 33554448 (-1) 33554450 -> 33554448 (-2) 33554451 -> 33554452 (+1) 33554452 -> 33554452 33554453 -> 33554452 (-1) 33554454 -> 33554456 (+2) 33554455 -> 33554456 (+1) 33554456 -> 33554456 ``` 根據輸出的結果，我們可以觀察到： - 整數範圍 [$2^{25}$, $2^{26}-1$] 中只有四的倍數可以以浮點數形式表示 - 誤差存在循環的情況，即 [0, -1, -2, +1, 0, -1, +2, +1] 既然知道「誤差值」會循環出現，是否可以利用餘數來計算誤差值呢？ ```c int diff24[4] = {0, -1, 0, 1}; int diff25[8] = {0, -1, -2, 1, 0, -1, 2, 1}; ``` 以 `sum%4` 作為 `diff24` 的索引值但考量到 `sum` 本身可能就是被錯誤近似的值採用 `sum%4 + i%4` 作為索引值問題：如果 `sum` 無法表示呢？ ## wx200010 [課程簡介和注意須知](https://bit.ly/linux2025-intro)第 22 頁提及的程式碼: ```c int main() { float sum = 0.0f, corr = 0.0f; /* corrective value for rounding error */ for (int i = 0; i < 10000; i++) { float y = (i + 1) - corr; /* add the correction to specific item */ float t = sum + y; /* bits might be lost */ corr = (t - sum) - y; /* recover lost bits */ sum = t; } printf("Sum: %f\n", sum); return 0; } ``` ### Q1: 英語的 "correction" 和 "corrective" 是什麼意思？(查閱英英字典，得知其用法) <s> [correction(名詞)](https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/correction)：改正；修改；更正。 [corrective(形容詞)](https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/corrective)：改正的，糾正的。 </s> :::warning 漢語在描述概念的分野時，很難有精準的詞彙，因此你務必查閱 English-English dictionary。例如 instruction 和 command 是英語沒有任何共通詞源的單字，但因為漢語分別翻譯為「指令」及「命令」，結果導致一堆人根本無法區分二者，本例可看出現代漢語的侷限。 ::: 由於先前直接查閱中文翻譯實在很難看出這兩者的差異，我重新查閱了其他免費的[英英字典](https://www.oxfordlearnersdictionaries.com/)： * [correction(noun)](https://www.oxfordlearnersdictionaries.com/definition/english/correction_1?q=correction) 1. a change that makes something more accurate than it was before 2. the act or process of correcting something * [corrective(adjective)](https://www.oxfordlearnersdictionaries.com/definition/english/corrective_1?q=corrective) 1. designed to make something right that was wrong before 對於 correction 的英譯，著重於「使某事物比之前更準確的**一項更改**」或「一個能修正某事物的行為或過程」；而 corrective 的英譯，則是「旨在糾正先前錯誤的某事物」，能用於形容「某個用來糾正錯誤的東西」。為了更好地體會這兩者在語意上的不同，我們來回頭看看程式碼註解吧！在程式碼中，有兩行註解使用到這兩項單字，分別是： ```c float sum = 0.0f, corr = 0.0f; /* corrective value for rounding error */ ``` ```c float y = (i + 1) - corr; /* add the correction to specific item */ ``` 整份程式碼在執行過程中會修正浮點數運算因精度不足造成的 rounding error，變數`corr` 的作用便是計算出缺失值，並在後續的計算中將缺失值加回結果，來讓結果更準確。在第一段程式碼旁，原註解是 `/* corrective value for rounding error */`，意思是「用於處理rounding error的修正值」，主要是對變數`corr`的存在意義進行說明。若將原註解改為`/* correction for rounding error */`，意思就會變成「修正 rounding error」，然而這行程式碼只是在宣告變數，並未實際修正rounding error，因此語意上會不太恰當。而在第二段程式碼旁，原註解是`/* add the correction to specific item */`，意思是「對特定項目進行修正」，強調了這一行程式碼的主要目的。若將原註解改成`/* add the corrective value to specific item */`，意思會變成「將修正值加入特定項目」，雖然的確是這行程式碼所執行的操作，但這樣會變成強調「加入某個數值」的操作，而不是「對某個東西進行修正」的主要目的，所以重點會變不一樣。 ### Q2: [Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) 的原理？限制是什麼？由於討論前需要先了解 rounding error 的知識，這邊先簡單說明 rounding error 為何會發生。首先，[單精度浮點數](https://en.wikipedia.org/wiki/Single-precision_floating-point_format)分成 sign (1 bit), exponent (8 bits), fraction (23 bits) 來表示，fraction (23 bits) 用於表示二進位的小數部分，而在小數部分前面會有個隱藏的 1 (即 `1.xxxxxxxxxx...`)。當時我透過這些概念做了份猜想：「一個單精度浮點數能夠儲存的精度可能是由fraction的23 bits+前面隱藏的1 bit掌管」。下面的程式範例則可以讓大家感受看看這份猜想： ```c #include <stdio.h> int main() { int nums[4] = {0x400001, 0x800001, 0x1000001, 0x8000007}; float f; for(int i = 0; i < 4; ++i){ f = nums[i]; printf("nums[%d] = %ld, f = %.2f\n", i, nums[i], f); } } /* output : nums[0] = 4194305, f = 4194305.00 nums[1] = 8388609, f = 8388609.00 nums[2] = 16777217, f = 16777216.00 nums[3] = 134217735, f = 134217728.00 */ ``` 我們發現，整數`0x1000001`與`0x8000007`在轉換成單精度浮點數會有誤差產生。若將這兩個整數改以二進位來觀察，會發現最高位為1的bit位置與最低位1的bit位置間隔皆超過了23個bits，因此若將這些整數轉換成單精度浮點數(float)，會因精度限制(24 bits)導致只能留住從最高位為1開始的24個bits，剩下的那些低位bits則會被丟棄。 ![image](https://hackmd.io/_uploads/S1CcpBEqyl.png) > 註：這張示意圖只是我目前對浮點數轉換上的理解，不確定是不是完全正確的。 > 關於捨去低位 bits 的部分，[這篇文章](https://gist.github.com/kaatinga/cecd8c26f544e270dd2008290818a20c#how-ieee-754-stores-floating-points) 提到 > Decimal places: The exact or closest value of the decimal places of the storage factor. > 另外我自己用 [FloatConverter](https://www.h-schmidt.net/FloatConverter/IEEE754.html) 玩也是一樣的結果（會看後面那段要被捨棄的 bits 第一位是 0 或 1 ，如果 0 就捨去 1 就進位，感覺就是把十進位的四捨五入改成二進位零捨一入xd），也因此 rounding error 變大邊小都有可能，以 1 加到 100000 為例，得出的結果會是 5000050176.000000 ，較正確值 5000050000 還大 [name=moonjam] 回頭來看[課程簡介和注意須知](https://bit.ly/linux2025-intro)第 22 頁的部分程式碼: ```c float sum = 0.0f, corr = 0.0f; /* corrective value for rounding error */ for (int i = 0; i < 10000; i++) { float y = (i + 1) - corr; /* add the correction to specific item */ float t = sum + y; /* bits might be lost */ corr = (t - sum) - y; /* recover lost bits */ sum = t; } ``` `y = (i + 1) - corr` 是這輪要加進結果的值 (第一次發生 rounding error 前 `corr` 皆是 0)，而`t = sum + y;`則是這輪運算後所得到的實際結果值。通常來說，`t`應與我們理想的結果值 `sum + y` 相同，然而在發生 rounding error 時，`t`就會與原先預期的結果值`sum + y`不同。因此透過`corr = (t - sum) - y`的運算，便能將這輪因 rounding error 導致的誤差計算出來，從而能夠校正下次的計算。為什麼 `corr = (t - sum) - y` 能夠計算出這輪的誤差值呢？ `y` 是這輪預期加進結果的值，而 `(t-sum)` 則是這輪實際加進結果的值，將兩者相減後，`corr`便得到這輪因 rounding error 而造成的誤差值了，便能在下次的計算中去校正結果值。要注意的是，若改成 `corr = t - (sum + y)`，會因為`(sum + y)`本身會觸發 rounding error 的關係，導致兩者相減後會恆為0而變得毫無意義，因此將 `t - (sum + y)`改為 `(t - sum) - y` 也是為了避免計算誤差值時又再次發生 rounding error 的窘境。 >就很像是`(L + R) / 2` 可以改寫為`L + (R - L) / 2`，來避免`(L + R)`發生overflow的問題 >數學上等價的算式，對有限的二進位來說可能會有不同的結果。目前的校正方式什麼時候會失敗呢？那便是使`y = (i + 1) - corr`發生rounding error的情況，y的缺失值就會直接消失，造成不準確。例如當`i`不低於`0x1000000`時，`(i + 1)`的部分就可能會開始發生rounding error了，很危險！在使用這種校正方法前，必須要考慮清楚或明確限制輸入值的範圍，否則會無濟於事。 >對[Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm)的閱讀還未完成，因此內容可能會再更新。 $\to$ [Mado](https://github.com/sysprog21/mado): 完全使用定點數避開浮點數運算的進階視窗系統 ## LinMarc1210 Q: 如果在 for 迴圈的最後一輪發生 `corr` 不為 0，那少加的值會不會沒被加進 `sum` > 會，實際測試從 1 加到 9997，使用投影片上的程式碼會算出 49975004.000000 而非正確答案 49975003，然而即便在迴圈外再做 `sum -= corr`，輸出的答案仍然會是 49975004.000000，因為 49975003 無法用 float 表示，應該只能分別輸出 sum 和 corr 或將兩者轉為 int 再相加才能解決吧？ [name=moonjam] ## devarajabc Q: Kahan summation algorithm 要如何處理那些「無法被浮點數型態表示」的數字?，例如 $50005001$ 修改投影片第 22 頁的程式： ```diff -u - float sum = 0.0f, corr = 0.0f; /* corrective value for rounding error */ + float sum = 1.0f, corr = 0.0f; /* corrective value for rounding error */ ``` 預期輸出應為 $50005001$，但實際輸出為 $50005000$ ## moonjam Q: [正版的 IEEE 754 規格書](https://standards.ieee.org/ieee/754/6210/) 好像要錢，有什麼免費且不侵害智慧財產取得的方式嗎？ > 用 "Draft" 來找，例如 [IEEE Standard for Floating-Point Arithmetic](https://www-users.cse.umn.edu/~vinals/tspot_files/phys4041/2020/IEEE%20Standard%20754-2019.pdf) ## halzq 如果想知道為何float32會失準，首先要了解float的十進位的轉換成二進位後的格式。 **首先知道FP32的bits分配方式** * sign: 1 bit * expo: 8 bits (使用 127 作為 bias) * mantissa: 23 bits ![image](https://hackmd.io/_uploads/rkjyoRA5gl.png) [圖片出處](https://www.exxactcorp.com/blog/hpc/what-is-fp64-fp32-fp16) **接著知道這個公式** $$value = (-1)^{sign}\ \times\ 1.mantissa\ \times 2^{exponent-bias}$$ 假設現在有個十進位數字10$$(10)_{10}$$ 轉換成二進位 $$(1010)_{2}$$ 改成類似科學記號的方式 $$1.01 \times\ (2)^{3}$$ 回想剛剛提到的公式 $$value = (-1)^{sign}\ \times\ 1.mantissa\ \times 2^{exponent-127}$$ 可知 $$ sign=0, \ expo=(130)_{10} =(10000010)_{2},\ mantissa=(01)_{2}$$ 放進32bits的排列: $$ 0\ 10000010\ 01000000000000000000000$$ 所以會知道 exponent很像科學記號中次方的角色後面的mantissa才是能決定數字的重點這邊解釋一下另一個重要的角色: **hidden 1** 回想一下，**數字要轉換成科學記號，開頭不就一定會有個1嗎！** 例如$$10100$$ 不會轉換成$$0.101 * 2^{5}$$ 肯定正確是$$1.01 * 2^4$$ 因此就可以透過這種方式讓float32多爭取1位數的precision -> 也就是hidden 1 因hidden 1 加上 mantissa 共24bits $$1111\ 1111\ 1111\ 1111\ 1111\ 1111$$ 所以 $$2^{23+1} - 1 =2^{24}-1= 16777215$$ 可以知道float32中小於等於 16777215 (24bits最大)的正整數一定能被正確計算至於這邊不是寫16777216(25bits)是==從bits的角度==分析的緣故。這就是造成1+2+3.....+10000 != 50005000 的原因。那反過頭來是從哪裡開始出錯的呢？假設想找到$$\frac{n(n+1)}{2} <= 16777216$$ $$n(n+1)<={16777216\times2}$$ $$n \approx \sqrt{16777216\times2} = 5792.61875148$$ 那假設n = 5792 則$$\frac{5792 \times 5793}{2} = 16776528$$ ![Screenshot 2025-07-06 at 23-09-54 Compiler Explorer](https://hackmd.io/_uploads/HJToMzuBgg.png) 驗算n=5793 則$$\frac{5793 \times 5794}{2} = 16782321$$ ![Screenshot 2025-07-06 at 23-14-14 Compiler Explorer](https://hackmd.io/_uploads/rkOkmMurgg.png) 可以看到n=5793時，圖片中顯示16782320確實不符合理論上應該要是16782321！所以如果真的要用float做迴圈運算，這邊最大應該只能加到5792，之後就會開始產生失準了。

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.