aweimeow
iteration
、binary search
、byte-shift
這三者看不出他們的執行速度差異 QQ閱讀 重新理解數值 裡頭關於 count leading zero (clz) 的描述,設計實驗,比較 32-bit 數值對以下實做的效能差異:
uint8_t clz(uint32_t x)
{
/* shift upper half down, rest is filled up with 0s */
uint16_t upper = (x >> 16);
// mask upper half away
uint16_t lower = (x & 0xFFFF);
return upper ? clz(upper) : 16 + clz(lower);
}
還沒想出老師給的版本如何修正
先自己想一個recursive 一起比較
int clz(uint32_t x) {
return (x)? clz(x>>1)-1 : 32;
}
#define u 99 //unused term
int clz(uint32_t x) {
static char table[64] =
{32,31, u,16, u,30, 3, u, 15, u, u, u,29,10, 2, u,
u, u,12,14,21, u,19, u, u,28, u,25, u, 9, 1, u,
17, u, 4, u, u, u,11, u, 13,22,20, u,26, u, u,18,
5, u, u,23, u,27, u, 6, u,24, 7, u, 8, u, 0, u};
x = x | (x >> 1); // Propagate leftmost
x = x | (x >> 2); // 1-bit to the right.
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
x = (x << 3) - x; // Multiply by 7.
x = (x << 8) - x; // Multiply by 255.
x = (x << 8) - x; // Again.
x = (x << 8) - x; // Again.
return table[x >> 26];
}
驗證該演算法是否正確,要走訪全部的 32-bit 數值
n = clz(num);
return ((!num)&&n>>5)||((!(num>>(31-n)>>1))&&(num>>(31-n)));
/* check first n bits check n+1th bit*/
極端case
n=32 num=0
n=0 num=2^31
ISO/IEC 9899:1999 6.5.7 Bitwise shift operators ¶3
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
#include <time.h>
static double diff_in_second(struct timespec t1, struct timespec t2)
{
struct timespec diff;
if (t2.tv_nsec-t1.tv_nsec < 0) {
diff.tv_sec = t2.tv_sec - t1.tv_sec - 1;
diff.tv_nsec = t2.tv_nsec - t1.tv_nsec + 1000000000;
} else {
diff.tv_sec = t2.tv_sec - t1.tv_sec;
diff.tv_nsec = t2.tv_nsec - t1.tv_nsec;
}
return (diff.tv_sec + diff.tv_nsec / 1000000000.0);
}
int main()
{
struct timespec start, end;
double cpu_time;
clock_gettime(CLOCK_REALTIME, &start);
clz(num);
clock_gettime(CLOCK_REALTIME, &end);
cpu_time = diff_in_second(start, end);
}
考量到檔案大小的問題,我只測了從0到2^20的數字做(ps.recursive不是改老師的)
分佈圖 集中於 0us-0.5us
recursive version
iteration version
binary search technique
byte-shift version
Harley’s algorithm
走訪全部的 32-bit 數值,用上述演算法帶入計算 clz 值,
先驗證正確性,如果演算法不正確,試圖改正
比照 phonebook 和 compute-pi,設計一套 benchmark suite,得以針對所有的 32-bit 數值進行個別 clz 實做效能的分析,並透過 gnuplot 製圖
-O0
來抑制最佳化[ ]找至少 3 個案例,說明 clz 的應用場合