2019q3 Homework1 (review)

# 2019q3 Homework1 (review) contributed by < `nckuoverload` > ###### tags: `sysprog2019` ## 作業要求 * 從 [課程簡介和注意須知](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit?usp=sharing), [第 1 週測驗題](https://hackmd.io/@sysprog/HksKVpUIr) 選出你感興趣的 3 個題組進行作答，並且回答附帶的「延伸問題」 * 應當包含 [課程簡介和注意須知](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit?usp=sharing) 的 Page 9 到 Page 16 裡頭的 C 語言程式設計議題 * 比照 [課前測驗參考解答: Q1](https://hackmd.io/s/ByzoiggIb) 和 [對 Linked List 進行氣泡排序](https://hackmd.io/s/BklGm1MhZ) 的模式來撰寫共筆，需要詳細分析自己的想法、參閱的材料 (包含 C 語言規格書的章節)，以及==進行相關實驗== * 閱讀 [第一週課程進度](http://wiki.csie.ncku.edu.tw/sysprog/schedule) 所有材料，記錄所見所聞並提問，若有不能理解的部分，請標註出來。善用 HackMD 的語法 `:::info` 和 `:::` 標註你的提問 :::info 像是這樣標註提問 ::: :::danger 注意共筆書寫的規範： 1. 中英文之間應以空白字元區隔; 2. 偏好的程式碼縮排是 K&R 風格和 4 spaces; :notes: jserv ::: ## 改善精準度根據wiki和等頁面資料，可以快速的理解IEEE-754和浮點數運算會產生的問題。在一些空間受限的情況中，無法改變、擴充資料結構，因此這時候並須以有限的資源方式去改善精準度問題。另外在[浮點數的美麗與哀愁](https://champyen.blogspot.com/2017/04/blog-post.html) 這篇文章中，也有討論到這個議題，內文分別對IEEE-754 Format本身、CPU 間的問題、GPU等去討論。 ### 解決方法解決方法的部份，可以參考自[Estimating Errors In Summing Arrays Of Floating Point Numbers](https://www.reidatcheson.com/statistics/floating%20point/error/2018/01/15/float-sum-errors.html) 中，他分別都提到 Kahan summation, Pairwise summation 演算法，以及他對最壞偏差情況做的實驗。兩個演算法可分別參考wiki 。 ### Kahan summation algorithm 簡單來說，在每次加減完後，對該次的運算偏差作修正。 ```cpp #include <stdio.h> int main() { float sum = 0.0f, corr = 0.0f; /* corrective value for rounding error */ for (int i = 0; i < 10000; i++) { float y = (i + 1) - corr; /* add the correction to specific item */ float t = sum + y; /* bits might be lost */ corr = (t - sum) - y; /* recover lost bits */ sum = t; } printf("Sum: %f\n", sum); // output: 50005000.000000 return 0; } ``` corr變數用來當作每次運算完之後的偏差量，在一開始的運算，浮點數偏差可能為0。但經過多次運算後，逐漸會產生偏差，這時候使用corr來儲存偏差，並且在下一次的運算中彌補。偏差的計算可以參考下列程式碼: ```cpp #include <stdio.h> int main(){ float a = 123456.0; float b = 3.141; printf("the sum is: %f",a+b); printf("the corr is: %f",a+b-a-b); } ``` 運算結果為 ```shell the sum is: 123459.140625 the corr is: -0.000375 ``` ### Pairwise summation 簡單來說，在一定的運算次數N 內，並不會產生偏差，因此運用這種特性，將一連串的浮點數總和運算拆分成多個運算。例如題目中要相加10000次，但一定會出現誤差，因此只要再寫一個`add()`去做遞迴運算，把10000次分成多個小的組數，並且分別將他們計算的結果做相加就可以得到最後沒有誤差的結果，但這種方式需要較多記憶體以及不適合使用在第一次相加就會造成誤差的情況中。在這篇[浮點數float累加誤差解決方式總結]()文章中，第四個方法也算是一種pairwise 的表現，但我認為可以搭配另外的函式去對數值做遞迴。 Pairwise summation： ```cpp float f = 0.1; float sum = 0; for( i=0; i<100; i++) { int sumEachBig=0; for(....k<400....) { int sumEachSmall=0; for(....j<100.....) sumEachSmall += f; sumEachBig+=sumEachSmall; } sum += sumEachBig; } ``` :::danger 缺乏誤差分析 :notes: jserv ::: ### 參考資料 [1] [浮點數的美麗與哀愁](https://champyen.blogspot.com/search/label/linux) [2] [Estimating Errors In Summing Arrays Of Floating Point Numbers](https://www.reidatcheson.com/statistics/floating%20point/error/2018/01/15/float-sum-errors.html) --- ## 測驗 `1` 考慮以下 C 程式的 `align4` 巨集的作用是，針對 4-byte alignment，找到給定地址的 round up alignment address。 ```cpp #include <stdio.h> #define align4(x) (((x) + K) & (-4)) int main(void) { int p = 0x1997; printf("align4(p) is %08x\n", align4(p)); return 0; } ``` 預期程式輸出 `align4(p) is 00001998` ==作答區== `K` = ? ### 作答想法簡單來說，輸入的數字x 必須無條件進位，以4 byte 為一個進位的標準。主要是為了做對齊，因為一次擷取為4 byte ，如果一個int 分別位在兩個區段，這樣要擷取兩次，整體效能會較差，因此需要作對齊。詳細的解說可以參考老師其他的課程內容如[你所不知道的C語言：記憶體管理、對齊及硬體特性](/@sysprog/BkuMDQ9K7)，其中在data alignment 部份有提到完整的應用和解釋。解題的部分，主要就是讓輸入的參數x 將該區段使用完畢，並且跳至下一區段，因此當x=0x1997時，K可以為1~3；但當x=0x1995時，K只能為3。取的是4-byte ，答案為`3`。 ### 延伸題目: 在 Linux 核心原始程式碼找出類似的 alignment 巨集並解釋其作用 - 使用關鍵字: linux kernel alignment code [bootlin](https://elixir.bootlin.com/linux/latest/ident/ALIGN) 這個網站可以透過關鍵字搜尋Linux Kernel Source Code ，因此針對ALIGN直接做搜尋，會搜尋出多個有使用 ALIGN Macro 的程式，這邊以`include/linux/kernel.h` 為例。 [`include/linux/kernel.h`:](https://elixir.bootlin.com/linux/latest/source/include/linux/kernel.h#L33) ```C=33 #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) #define ALIGN_DOWN(x, a) __ALIGN_KERNEL((x) - ((a) - 1), (a)) #define __ALIGN_MASK(x, mask) __ALIGN_KERNEL_MASK((x), (mask)) #define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a))) #define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0) ``` 可以搭配這個[參考資料](http://charette.no-ip.com:81/programming/doxygen/netfilter/iptables_2include_2linux_2kernel_8h.html) ，搜尋__ALIGN_KERNEL_MASK 可以透過這個網站的解析該Macro 的內容。 ```C= #define __ALIGN_KERNEL(x,a) __ALIGN_KERNEL_MASK(x, (typedef(x))(a)-1) #define __ALIGN_KERNEL_MASK(x, mask) (((x)+(mask)) & ~(mask)) ``` 這個部分和測驗中的程式碼相似，測驗中，K 的數字為3，主要是為了以-4~(10)~對齊。如果將測驗的程式碼以kernel.h 中的程式碼呈現，則x為任意int 的變數，a 為-3 (signed int)，測驗的主體部分則是 __ALIGN_KERNEL_MASK()。 ### 參考資料： - [1] [bootlin `include\linux\kernel.h`](https://elixir.bootlin.com/linux/latest/source/include/linux/kernel.h#L33) 主要參考的source code。 - [2] [linux 内核 AGLIN 巨集](http://blog.sina.com.cn/s/blog_1382bb9dd0102wcqw.html) 有簡單的講解ALIGN 巨集的功用，另外還有上下界對齊如 `alignment_down(a, size)` 和`alignment_up(a, size)` 。 - [3] [你所不知道的C語言：記憶體管理、對齊及硬體特性](https://hackmd.io/@sysprog/BkuMDQ9K7?type=view#data-alignment) 有完整的關於對齊的解說。 - [4] [kernel.h File Reference ](http://charette.no-ip.com:81/programming/doxygen/netfilter/iptables_2include_2linux_2kernel_8h.html) 比較完整的定義ALIGN Macro 。 - [5] [align macro kernel](https://stackoverflow.com/questions/13122846/align-macro-kernel) Stackoverflow 上其中一個關於ALIGN 對齊的問答，可以和[2] 擇一選看。 --- ## 測驗 `3` 考慮以下 C 程式碼: ```cpp #include <stdbool.h> bool func(unsigned int x) { return x && ((x & (~x + 1)) == x); } ``` 可改寫為以下等價的程式碼: ```cpp bool func(unsigned int x) { unsigned int unknown = 0; while (x && unknown <= 1) { if ((x & Z) == 1) unknown++; x >>= 1; } return (unknown == 1); } ``` 請補完程式。 ==作答區== `Z` = ? ### 作答想法首先是第一個bitwise 程式的部份，`~x+1` 可以視為x 的補數，因此要思考x 和自己的補數作 AND運算之後還會等於自己的可能有哪些，答案只有 0~3 因此直接將答案帶進去可以發現只有當答案等於0, 1, 4時成立。但當答案等於0時會return False，其餘則是True。再來是把三個數字分別帶入第二個程式碼的部份，第二個程式碼中，while 中主要在用來計算`unsigned int x` 轉成二進位後有幾個1，當超過2個會中止該loop，return 又只return 只有1個1的情況，因此可以判別這個函式主要用來處理輸入的x 是否只有1個1。所以`Z`可以簡單推論為1，用來計算最右邊的位元是否為1。 ### 延伸題目延伸題目內容提到需要找出類似const-time的程式。目前有找到下方兩個參考資料有類似的方法， [A Faster Approach to Count Set Bits in a 32-bit Integer](https://codingforspeed.com/a-faster-approach-to-count-set-bits-in-a-32-bit-integer/) : 主要用來計算有幾個1，和題目的應用有些相似。 - non const-time computing ```cpp inline int bitCount1(int i) { int count = 0; for (int j = 0; j < 32; j ++) { if (i < 0) count ++; i <<= 1; } return count; } ``` - const-time computing ```cpp inline int bitCount2(int i) { i = i - ((i >> 1) & 0x55555555); i = (i & 0x33333333) + ((i >> 2) & 0x33333333); i = (i + (i >> 4)) & 0x0f0f0f0f; i = i + (i >> 8); i = i + (i >> 16); return i & 0x3f; } ``` [Bitwise can avoid loops](https://blogs.sap.com/2012/04/09/bitwise-can-even-avoid-loops/) : 主要判斷，該數字是否為2的倍數。如果是則return True，如果否則 False。 - non const-time computing ```cpp int main() { int a = 256, rem,flag = 0; while (a > 1) { c = a % 2; if (c == 1) { flag = 1; break; } a = a / 2; } if (flag == 1) printf("Not Power of 2"); else printf("Power of 2"); return 0; } ``` - const-time computing ```cpp int main() { int a=256; if((a&(a-1)==0) printf("Power of 2"); else printf("Not Power of 2"); return 0; } ``` --- ##