2016q3 Homework1 (phonebook)

contributed by <bobsonlin>

開發環境

OS： Ubuntu 16.04.1 LTS
Memory: 12201MB
CPU:
- Name:
  - 4-core Intel® Core™ i5-2520M CPU @ 2.50GHz
- Cache:
  - L1d Cache: 32KB
  - L1i Cache: 32KB
  - L2 Cache: 256KB
  - L3 Cache: 3072KB
- Frequency:
  - CPU max MHz: 3200
  - CPU min MHz: 800

查詢方式：

點左下方的menu -> system tool -> System Profiler and Benchmark
由lscpu指令可得知CPU的最大和最小的工作頻率

案例分析： Phonebook

未優化版本

$ make run

size of entry : 136 bytes
execution time of append() : 0.110389 sec
execution time of findName() : 0.007590 sec

$ perf stat --repeat 5 -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses ./phonebook_orig

Performance counter stats for ’./phonebook_orig’ (5 runs):
2,10730710 cycles ( +-  1.15% ) [65.49%]
2,6152,9155 instructions # 1.24  insns per cycle ( +-  0.18% ) [82.75%]
226,7098 cache-references                                              ( +-  0.32% ) [82.75%]
208,8783 cache-misses # 92.135 % of all cache refs ( +-  1.51% ) [82.75%]
4660,8227 branch-instructions ( +-  1.01% ) [85.59%]
99,5033 branch-misses # 2.13% of all branches ( +-  1.13% ) [83.47%]

0.079568228 seconds time elapsed                                          ( +-  13.06% )

用 perf 找尋熱點：

$ ./phonebook_orig & sudo perf top -p $!

由以上的結果可知，在phonebook的程式中，消耗CPU週期最多的是findName()，

優化版本

修改struct的結構

$ make run

size of entry : 32 bytes
execution time of append() : 0.095677 sec
execution time of findName() : 0.004692 sec

$ perf stat --repeat 5 -e cycles,instructions,cache-references,cache- misses,branch-instructions,branch-misses ./phonebook_orig

 Performance counter stats for ’./phonebook_orig’ (5 runs):
 1,4712,4605 cycles ( +-  0.87% ) [65.26%]
 2,4898,6819 instructions # 1.69  insns per cycle ( +-  0.47% ) [82.80%]
 69,3457 cache-references                                              ( +-  3.35% ) [83.83%]
 38,4917 cache-misses # 55.507 % of all cache refs ( +-  2.35% ) [83.87%]
 4366,7742 branch-instructions ( +-  0.53% ) [83.87%]
 99,4378 branch-misses # 2.28% of all branches ( +-  2.30% ) [84.65%]

 0.060416386 seconds time elapsed                                          ( +-  19.98% )

plot
分析結果

Hash Function 加速查詢

Hash Table 建立：










hashTable *createHashTable(int tableSize){
    int i;
    hashTable *tb;
    tb = (hashTable*)malloc(sizeof(hashTable));
    tb->list = (entry **)malloc(sizeof(entry*)*tableSize);
    for(i = 0;i < tableSize;i++){
        tb->list[i] = NULL;
    }
    return tb; 
}

hash function：









unsigned int hash(char *str)
{
    unsigned int hash_value = 0;
    unsigned int seed = 131;

    while(*str)
        hash_value = hash_value * seed + (*str++);
    return (hash_value % TABLE_SIZE);
}

findName 修改：












entry *findName_HashTable(char lastName[], hashTable *tb)
{
    entry *e;
    unsigned int index = hash(lastName);

    for(e = tb->list[index]; e != NULL; e = e->pNext){
        if (strcasecmp(lastName, e->lastName) == 0)
            return e;
      
    }
    return NULL;
}

append 修改









void append_HashTable(char lastName[], hashTable *tb)
{
    unsigned int index = hash(lastName);
    entry *e;
    e = (entry*)malloc(sizeof(entry));
    strcpy(e->lastName, lastName);
    e->pNext = tb->list[index];
    tb->list[index] = e;
}

performace

 Performance counter stats for './phonebook_opt_HASH' (100 runs):

         32,5297      cache-misses              #   53.204 % of all cache refs      ( +-  0.43% )
         61,1409      cache-references                                              ( +-  0.37% )
     2,3601,3893      instructions              #    1.53  insns per cycle          ( +-  0.02% )
     1,5460,6436      cycles                                                        ( +-  0.36% )

     0.062424603 seconds time elapsed                                          ( +-  3.01% )

plot

修改讀檔方式，降低append()的執行時間
- 解說：
  參考c14006078同學的HackMD，覺得他這個修改很棒，既然從fgets()得來的lastname不影響比對，可以把對字串的處理移至findname()再進行。
- plot

工作進度

[9/25] 灌好OS，熟悉Github、HackMD、Perf、gnuplot的操作
[9/28] 實現Hash function
Now 希望能在gnuplot產生3個資料的直方圖，並仔細的分析結果，也希望可以對hash function做深入的研究。

遇到的問題

cache-references 在網路上查(https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/)是指cache的命中次數，我以為命中次數要愈高愈好，但在成大資工Wiki的Perf介紹中(http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial)卻好像指cache-references降低是件好事？

請回去讀書，計算機組織的課本寫得很清楚。人之所缺乏專業，往往就是活在「腦補」的世界中。 jserv

我google了有關cache的資料，做了以下的整理:

由參考1得知，cache讀取命中稱為cache hit，讀取錯失稱為cache miss，又若將terminal中的cache-misses次數除以cache-references次數，剛好與terminal中顯示的百分比相同，由此推論cache references應不是cache的命中次數，而是CPU向cache要求資料的總次數。

由參考1的儲存系統階層可以得知，向cache取得資料時機為要找的資料不在CPU暫存器時，才會向cache找資料。反向推論，若cache-references次數較少，代表CPU需要的資料可能大多已存在暫存器中。

綜合上方兩者推論，成大資工WiKi的Perf介紹中，利用存取的局部性修改程式，大幅降低cache-references次數，亦降低cache-misses次數，代表CPU所需要的資料大多存在CPU暫存器中，使得執行時間縮短為原本的三分之一。

當我執行make run時，發現append()的執行時間為findName()的15倍，但是進行perf找尋熱點時，findName()卻為熱點，明顯大於append()。想了解執行時間與CPU消耗週期的差異。
當我在執行$ git push時，發生permission denied，經過同學的幫忙，才發現我當時是從老師給的github網址直接clone下來。可以由git remote -v來得知本機的專案從哪裡clone下來，以及將會被push到何處，若欲修改，可先$ git remote #代號再$ git remote add #代號，最後$ git push --set-upstream #代號 master

2016q3 Homework1 (phonebook)

開發環境

案例分析： Phonebook

未優化版本

用 perf 找尋熱點：

優化版本

工作進度

遇到的問題

待完成項目

參考資料

tags: `bobsonlin`

2016q3 Homework1 (phonebook)

開發環境

案例分析： Phonebook

未優化版本

用 perf 找尋熱點：

優化版本

工作進度

遇到的問題

待完成項目

參考資料

tags: bobsonlin

tags: `bobsonlin`