Try   HackMD

2016q3 Homework1 (clz)

contributed by <workfunction>

修改recursive version & Harley’s algorithm

Harley’s algorithm

老師給的code無法正常運行,所以參考這篇的Harley's algorithm with multiply expanded方法,比對後發現是table的值錯了

原版table

static prog_uint8_t const Table[] = { 0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF, 16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF, 0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF, 0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF, 14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF, 18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13, 26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25, 0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF, };

這個table是用來判斷CTZ(最後面有幾個0)

參考後的正確版本

正確table

const char table[64] = {32,31, u,16, u,30, 3, u, 15, u, u, u,29,10, 2, u, u, u,12,14,21, u,19, u, u,28, u,25, u, 9, 1, u, 17, u, 4, u, u, u,11, u, 13,22,20, u,26, u, u,18, 5, u, u,23, u,27, u, 6, u,24, 7, u, 8, u, 0, u};

u在次table中無用,可另外給定任意數字

Recursive版本

原版本並不是一個完整的實作,缺少遞迴終止條件與位移量錯誤

原版

uint8_t clz(uint32_t x) { /* shift upper half down, rest is filled up with 0s */ uint16_t upper = (x >> 16); // mask upper half away uint16_t lower = (x & 0xFFFF); return upper ? clz(upper) : 16 + clz(lower); }

這裡面16應該要隨著進入不同層而變動 16 -> 8 -> 4 -> 2 -> 1
mask的值也要跟著位移

新增兩個全域變數count, mask來紀錄現在到第幾層,並加上終止條件與count位移

修正版本

uint32_t count = 16; uint32_t mask = 0xFFFFFFFF; int clz(uint32_t x) { int result; // shift upper half down, rest is filled up with 0s uint16_t upper = (x >> count); // mask upper half away mask >>= count; uint16_t lower = (x & mask); // stopping condition if(count == 1) { return !(x >> 1); } // shifting count and go into recursive count >>= 1; result = upper ? clz(upper) : (count << 1) + clz(lower); count <<= 1; return result; }

之後會試著簡化最後條件判斷式

驗證與測試

題外話 - 讓printf可以印出顏色!
加上這些macro

#define NONE "\033[m" #define RED "\033[0;32;31m" #define LIGHT_RED "\033[1;31m" #define GREEN "\033[0;32;32m" #define LIGHT_GREEN "\033[1;32m" #define BLUE "\033[0;32;34m" #define LIGHT_BLUE "\033[1;34m" #define DARY_GRAY "\033[1;30m" #define CYAN "\033[0;36m" #define LIGHT_CYAN "\033[1;36m" #define PURPLE "\033[0;35m" #define LIGHT_PURPLE "\033[1;35m" #define BROWN "\033[0;33m" #define YELLOW "\033[1;33m" #define LIGHT_GRAY "\033[0;37m" #define WHITE "\033[1;37m"

並在printf字串之前加上就可以換顏色了
可以參考下方程式碼

跟 ptt 可以加顏色一樣XD louielu

驗證

int check(uint32_t x, int n) { return ((!x) && n >> 5) || (!(x >> (31 - n) >> 1) && (x >> (31 - n))); } int main(){ uint32_t i = 0; #pragma omp parallel default (shared) private(i) num_threads(omp_get_max_threads()) #pragma omp for schedule(static) for (i = 0 ; i <= 0xFFFFFFFE; i++) { !(check(i, clz(i))) && printf ( LIGHT_RED "Error in %u count %d\n" NONE, i, clz(i)); if((i & (i - 1)) == 0) printf( DARY_GRAY "Check point %u\n" NONE, i); } printf ( LIGHT_GREEN "Check done!\n" NONE ); }

使用openmp平行驗證0~0xFFFFFFFF之間所有數字

測試

Hardware Info

  • Linux Mint 18
  • Linux kernel 4.4.0
  • Intel core i7-3960x

result
將結果*10^6輸出的數字

0 0.838000 1 1.397000 2 0.838000 3 1.118000 4 1.397000 5 1.117000 6 1.117000 7 1.396000 8 0.838000 9 1.118000 10 0.838000 11 1.117000 12 1.117000 13 0.838000 14 0.838000 15 1.117000 16 1.118000 17 1.117000 18 0.838000

時間結果全部都只在幾個特定的值內!
一開始以為可能是計時部份的程式碼有問題,後來找了同學的程式碼測試

c.c ponsheng

使用不同電腦跑出來的結果卻相當漂亮

Hardware Info

  • Linux Mint 17
  • Linux kernel 3.13.0-24
  • Intel core i7-4770

相同的Code在不同電腦有不同結果,初步判斷是硬體計時器精度不足
需要設計更多實驗驗證

RDTSC 量測