2016q3 Homework1 (clz)

# 2016q3 Homework1 (clz) contributed by <`workfunction`> :::info 我的github專案庫：[https://github.com/workfunction/clz-tests](https://github.com/workfunction/clz-tests) ::: ## 修改recursive version & Harley’s algorithm ### Harley’s algorithm 老師給的code無法正常運行，所以參考[這篇](http://www.hackersdelight.org/hdcodetxt/nlz.c.txt)的Harley's algorithm with multiply expanded方法，比對後發現是table的值錯了 :::warning **原版table** ```clike= static prog_uint8_t const Table[] = { 0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF, 16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF, 0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF, 0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF, 14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF, 18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13, 26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25, 0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF, }; ``` > 這個table是用來判斷CTZ(最後面有幾個0) ::: 參考後的正確版本 :::success **正確table** ```clike= const char table[64] = {32,31, u,16, u,30, 3, u, 15, u, u, u,29,10, 2, u, u, u,12,14,21, u,19, u, u,28, u,25, u, 9, 1, u, 17, u, 4, u, u, u,11, u, 13,22,20, u,26, u, u,18, 5, u, u,23, u,27, u, 6, u,24, 7, u, 8, u, 0, u}; ``` > u在次table中無用，可另外給定任意數字 ::: ### Recursive版本原版本並不是一個完整的實作，缺少遞迴終止條件與位移量錯誤 :::danger **原版** ```clike= uint8_t clz(uint32_t x) { /* shift upper half down, rest is filled up with 0s */ uint16_t upper = (x >> 16); // mask upper half away uint16_t lower = (x & 0xFFFF); return upper ? clz(upper) : 16 + clz(lower); } ``` 這裡面16應該要隨著進入不同層而變動 16 -> 8 -> 4 -> 2 -> 1 mask的值也要跟著位移 ::: 新增兩個全域變數==count==, ==mask==來紀錄現在到第幾層，並加上終止條件與==count==位移 :::success **修正版本** ```clike= uint32_t count = 16; uint32_t mask = 0xFFFFFFFF; int clz(uint32_t x) { int result; // shift upper half down, rest is filled up with 0s uint16_t upper = (x >> count); // mask upper half away mask >>= count; uint16_t lower = (x & mask); // stopping condition if(count == 1) { return !(x >> 1); } // shifting count and go into recursive count >>= 1; result = upper ? clz(upper) : (count << 1) + clz(lower); count <<= 1; return result; } ``` > 之後會試著簡化最後條件判斷式 ::: ## 驗證與測試 :::info **題外話** - 讓printf可以印出顏色！加上這些macro ```clike= #define NONE "\033[m" #define RED "\033[0;32;31m" #define LIGHT_RED "\033[1;31m" #define GREEN "\033[0;32;32m" #define LIGHT_GREEN "\033[1;32m" #define BLUE "\033[0;32;34m" #define LIGHT_BLUE "\033[1;34m" #define DARY_GRAY "\033[1;30m" #define CYAN "\033[0;36m" #define LIGHT_CYAN "\033[1;36m" #define PURPLE "\033[0;35m" #define LIGHT_PURPLE "\033[1;35m" #define BROWN "\033[0;33m" #define YELLOW "\033[1;33m" #define LIGHT_GRAY "\033[0;37m" #define WHITE "\033[1;37m" ``` 並在printf字串之前加上就可以換顏色了可以參考下方程式碼 ::: > 跟 ptt 可以加顏色一樣XD [name=louielu] ### 驗證 ```clike= int check(uint32_t x, int n) { return ((!x) && n >> 5) || (!(x >> (31 - n) >> 1) && (x >> (31 - n))); } int main(){ uint32_t i = 0; #pragma omp parallel default (shared) private(i) num_threads(omp_get_max_threads()) #pragma omp for schedule(static) for (i = 0 ; i <= 0xFFFFFFFE; i++) { !(check(i, clz(i))) && printf ( LIGHT_RED "Error in %u count %d\n" NONE, i, clz(i)); if((i & (i - 1)) == 0) printf( DARY_GRAY "Check point %u\n" NONE, i); } printf ( LIGHT_GREEN "Check done!\n" NONE ); } ``` 使用openmp平行驗證0~0xFFFFFFFF之間所有數字 ### 測試 :::info **Hardware Info** - Linux Mint 18 - Linux kernel 4.4.0 - Intel core i7-3960x ::: ![result](https://i.imgur.com/JW8hiTK.png) 將結果\*10^6輸出的數字 ```clike= 0 0.838000 1 1.397000 2 0.838000 3 1.118000 4 1.397000 5 1.117000 6 1.117000 7 1.396000 8 0.838000 9 1.118000 10 0.838000 11 1.117000 12 1.117000 13 0.838000 14 0.838000 15 1.117000 16 1.118000 17 1.117000 18 0.838000 ``` 時間結果全部都只在幾個特定的值內！一開始以為可能是計時部份的程式碼有問題，後來找了同學的程式碼測試 ![](https://i.imgur.com/t6rKdyq.png) > c.c [ponsheng](https://hackmd.io/s/rJmy9z_a) 使用不同電腦跑出來的結果卻相當漂亮... :::info **Hardware Info** - Linux Mint 17 - Linux kernel 3.13.0-24 - Intel core i7-4770 ::: ![](https://i.imgur.com/i7DuPkR.png) > 相同的Code在不同電腦有不同結果，初步判斷是硬體計時器精度不足 > 需要設計更多實驗驗證 ### RDTSC 量測 :::warning TODO:[Linux time measurement](http://stackoverflow.com/questions/12392278/measure-time-in-linux-time-vs-clock-vs-getrusage-vs-clock-gettime-vs-gettimeof) :::