Try   HackMD

2016q3 Homework1 (clz)

contributed by <Jing Zhou>

開發環境

ubuntu 16.04

安裝

  • AVR GCC
    ​sudo apt-get install gcc-avr binutils-avr gdb-avr avr-libc avrdude
    
    • 執行依然產生錯誤
    ​$ make
    ​gcc -c -O0  -std=gnu99 -Wall  clz_function.c -o clz_function.o
    ​clz_function.c:3:26: fatal error: avr/pgmspace.h: No such file or directory
    ​compilation terminated.
    ​Makefile:12: recipe for target 'clz_function.o' failed
    ​make: *** [clz_function.o] Error 1
    

count leading zero

由高到低位元,計算在出現1前0的個數

32位元為例
0x00000006 為 0..0000110,前面0共出現29次

實做clz

方法

  • recursive version (修正後)

    ​uint8_t clz_recursive(uint32_t x){
    ​	uint32_t upper = (x >> 16);
    ​	uint32_t lower = (x & 0x0000FFFF);
    
    ​	return upper ? -16 + clz_recursive(upper) : lower ? -1 + clz_recursive(lower >> 1) : 32;
    ​}
    
  • iteration version

    ​uint8_t clz_iteration(uint32_t x){
    ​	uint8_t n = 32, c = 16;
    ​	do {
    ​		uint32_t y = x >> c;
    ​		if (y) { n -= c; x = y; }
    ​		c >>= 1;
    ​	} while (c);
    ​	return (n - x);
    ​}
    
  • binary search technique

    ​uint8_t clz_binary_search(uint32_t x){
    ​	if (x == 0) return 32;
    ​	uint8_t n = 0;
    ​	if (x <= 0x0000FFFF) { n += 16; x <<= 16; }
    ​	if (x <= 0x00FFFFFF) { n += 8; x <<= 8; }
    ​	if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; }
    ​	if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; }
    ​	if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; }
    ​	return n;
    ​}
    
  • byte-shift version

    ​uint8_t clz_byte_shift(uint32_t x){
    ​	if (x == 0) return 32;
    ​	uint8_t n = 1;
    ​	if ((x >> 16) == 0) { n += 16; x <<= 16; }
    ​	if ((x >> 24) == 0) { n += 8; x <<= 8; }
    ​	if ((x >> 28) == 0) { n += 4; x <<= 4; }
    ​	if ((x >> 30) == 0) { n += 2; x <<= 2; }
    ​	n = n - (x >> 31);
    ​	return n;
    ​}
    
  • Harley’s algorithm (待測)

    
    

驗證正確性



四個方法都正確,最後一排是各個的執行時間(sec)

效能分析

參照compute-pi的方法用clock_gettime()設計benchmark suite,各點為經過10000次loop的時間,發現數字小的話recursive的時間高出許多,整體效能binary search和byte-shift最優

clz運用場合

  • 多媒體的抄大規模信號整合技術,將資料壓縮
    參考 Journal of Very Large Scale Integration Signal Processing Systems for Signal, Image, and Video Technology [Kluwer Academic Publishers, 2000]

  • implement Gosper's loop-detection algorithm
    參考 Gosper, Bill (1972). "Loop detector"

  • The expression 16 − clz(x − 1)/2 is an effective initial guess for computing the square root of a 32-bit integer using Newton's method

參考資料