owned this note
owned this note
Published
Linked with GitHub
# 2016q3 Homework1 (clz)
contributed by <`workfunction`>
:::info
我的github專案庫:[https://github.com/workfunction/clz-tests](https://github.com/workfunction/clz-tests)
:::
## 修改recursive version & Harley’s algorithm
### Harley’s algorithm
老師給的code無法正常運行,所以參考[這篇](http://www.hackersdelight.org/hdcodetxt/nlz.c.txt)的Harley's algorithm with multiply expanded方法,比對後發現是table的值錯了
:::warning
**原版table**
```clike=
static prog_uint8_t const Table[] = {
0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF,
16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF,
0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF,
0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF,
14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF,
18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13,
26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25,
0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF,
};
```
> 這個table是用來判斷CTZ(最後面有幾個0)
:::
參考後的正確版本
:::success
**正確table**
```clike=
const char table[64] =
{32,31, u,16, u,30, 3, u, 15, u, u, u,29,10, 2, u,
u, u,12,14,21, u,19, u, u,28, u,25, u, 9, 1, u,
17, u, 4, u, u, u,11, u, 13,22,20, u,26, u, u,18,
5, u, u,23, u,27, u, 6, u,24, 7, u, 8, u, 0, u};
```
> u在次table中無用,可另外給定任意數字
:::
### Recursive版本
原版本並不是一個完整的實作,缺少遞迴終止條件與位移量錯誤
:::danger
**原版**
```clike=
uint8_t clz(uint32_t x)
{
/* shift upper half down, rest is filled up with 0s */
uint16_t upper = (x >> 16);
// mask upper half away
uint16_t lower = (x & 0xFFFF);
return upper ? clz(upper) : 16 + clz(lower);
}
```
這裡面16應該要隨著進入不同層而變動 16 -> 8 -> 4 -> 2 -> 1
mask的值也要跟著位移
:::
新增兩個全域變數==count==, ==mask==來紀錄現在到第幾層,並加上終止條件與==count==位移
:::success
**修正版本**
```clike=
uint32_t count = 16;
uint32_t mask = 0xFFFFFFFF;
int clz(uint32_t x)
{
int result;
// shift upper half down, rest is filled up with 0s
uint16_t upper = (x >> count);
// mask upper half away
mask >>= count;
uint16_t lower = (x & mask);
// stopping condition
if(count == 1) {
return !(x >> 1);
}
// shifting count and go into recursive
count >>= 1;
result = upper ? clz(upper) : (count << 1) + clz(lower);
count <<= 1;
return result;
}
```
> 之後會試著簡化最後條件判斷式
:::
## 驗證與測試
:::info
**題外話** - 讓printf可以印出顏色!
加上這些macro
```clike=
#define NONE "\033[m"
#define RED "\033[0;32;31m"
#define LIGHT_RED "\033[1;31m"
#define GREEN "\033[0;32;32m"
#define LIGHT_GREEN "\033[1;32m"
#define BLUE "\033[0;32;34m"
#define LIGHT_BLUE "\033[1;34m"
#define DARY_GRAY "\033[1;30m"
#define CYAN "\033[0;36m"
#define LIGHT_CYAN "\033[1;36m"
#define PURPLE "\033[0;35m"
#define LIGHT_PURPLE "\033[1;35m"
#define BROWN "\033[0;33m"
#define YELLOW "\033[1;33m"
#define LIGHT_GRAY "\033[0;37m"
#define WHITE "\033[1;37m"
```
並在printf字串之前加上就可以換顏色了
可以參考下方程式碼
:::
> 跟 ptt 可以加顏色一樣XD [name=louielu]
### 驗證
```clike=
int check(uint32_t x, int n) {
return ((!x) && n >> 5) || (!(x >> (31 - n) >> 1) && (x >> (31 - n)));
}
int main(){
uint32_t i = 0;
#pragma omp parallel default (shared) private(i) num_threads(omp_get_max_threads())
#pragma omp for schedule(static)
for (i = 0 ; i <= 0xFFFFFFFE; i++) {
!(check(i, clz(i))) && printf ( LIGHT_RED "Error in %u count %d\n" NONE, i, clz(i));
if((i & (i - 1)) == 0)
printf( DARY_GRAY "Check point %u\n" NONE, i);
}
printf ( LIGHT_GREEN "Check done!\n" NONE );
}
```
使用openmp平行驗證0~0xFFFFFFFF之間所有數字
### 測試
:::info
**Hardware Info**
- Linux Mint 18
- Linux kernel 4.4.0
- Intel core i7-3960x
:::
![result](https://i.imgur.com/JW8hiTK.png)
將結果\*10^6輸出的數字
```clike=
0 0.838000
1 1.397000
2 0.838000
3 1.118000
4 1.397000
5 1.117000
6 1.117000
7 1.396000
8 0.838000
9 1.118000
10 0.838000
11 1.117000
12 1.117000
13 0.838000
14 0.838000
15 1.117000
16 1.118000
17 1.117000
18 0.838000
```
時間結果全部都只在幾個特定的值內!
一開始以為可能是計時部份的程式碼有問題,後來找了同學的程式碼測試
![](https://i.imgur.com/t6rKdyq.png)
> c.c [ponsheng](https://hackmd.io/s/rJmy9z_a)
使用不同電腦跑出來的結果卻相當漂亮...
:::info
**Hardware Info**
- Linux Mint 17
- Linux kernel 3.13.0-24
- Intel core i7-4770
:::
![](https://i.imgur.com/i7DuPkR.png)
> 相同的Code在不同電腦有不同結果,初步判斷是硬體計時器精度不足
> 需要設計更多實驗驗證
### RDTSC 量測
:::warning
TODO:[Linux time measurement](http://stackoverflow.com/questions/12392278/measure-time-in-linux-time-vs-clock-vs-getrusage-vs-clock-gettime-vs-gettimeof)
:::