2016q1 Homework 1 (phonebook)

contributed by <HuangWenChen>

Reviewed by `shelly4132`

可以利用字串壓縮法降低資料表現的成本
可利用binary search改寫演算法

開發環境

Description : Ubuntu 16.04.1 LTS
linux kernel version : 4.4.0-38-generic
CPU : AMD A6-4455M APU with Radeon™ HD Graphics
Cache :
- L1d cache : 16K
- L1i cache : 64K
- L2 cache : 2048K

可使用$ lscpu and $ cat /etc/os-release 查看規格

預期目標

準備 GNU/Linux 開發工具
學習使用 Git 與 GitHub
學習效能分析工具
研究軟體最佳化

Cache 相關知識

參考閱讀

There are three categories of cache misses:

Compulsory misses - occur on first reference to a data item.
Capacity misses - occur when the working set exceeds the cache capacity.
Conflict misses - occur when a data item is referenced after the cache line containing the item was evicted.

Git

參考閱讀
- Git 情境劇

將常用的指令先大概看過一遍，上機操做看看。

git init 開始一個新的 Git repo.
git status 確認現在所有檔案的狀況
git clone 複製一個專案
git add 將想要的檔案加入 Stage
git add . 將所有編修過的檔案加入 Stage
git commit 將 Stage 狀態的檔案做 Commit 動作
git pull remote名稱 branch名稱 下載一個遠端的 branch 並合併 pull = fetch + merge

git push 將本地端的 branch 上傳到遠端。

效能分析工具 Perf：Phonebook

參考閱讀
- Linux 效能分析工具: Perf
- kptr_restrict for hiding kernel pointers

完成前置準備安裝

如果要檢測 cache miss event ，需要先取消 kernel pointer 的禁用。
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
從 kptr_restrict for hiding kernel pointers 文章裡看到 "If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs." kptr_restrict 設為零可以%p行為發生沒有偏差

未優化版本

$ make

$ make run

size of entry : 136 bytes
execution time of append() : 0.117969 sec
execution time of findName() : 0.009333 sec

$ perf stat -r 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig

當執行要觀看未優化版本的 cache-misses rate 時，發生了奇怪的事情，預期執行後應該會落在 9x％-8x% 之間，但執行之後沒有看到預期的結果，竟然只有 0.530 %，目前尚未了解原因。

dmidecode 是可以解析 CPU 型號、主機板型號與記憶體相關的型號等等資訊。
$ dmidecode -t cache

Handle 0x0029, DMI type 7, 19 bytes
Cache Information
        Socket Designation: L1 CACHE
        Configuration: Enabled, Not Socketed, Level 1
        Operational Mode: Write Back
        Location: Internal
        Installed Size: 96 kB
        Maximum Size: 96 kB
        Supported SRAM Types:
                Pipeline Burst
        Installed SRAM Type: Pipeline Burst
        Speed: 1 ns
        Error Correction Type: Multi-bit ECC
        System Type: Unified
        Associativity: 2-way Set-associative

發現顯示 L1-cache 是 96kbytes

觀查到 cache-references 非常的大。

size of entry : 136 bytes
execution time of append() : 0.103768 sec
execution time of findName() : 0.009330 sec

Performance counter stats for './phonebook_orig' (100 runs):

           368,925      cache-misses              #    0.530 % of all cacherefs      ( +-  0.80% )  (74.39%)
        69,551,729      cache-references                                             ( +-  0.11% )  (51.84%)
       262,353,690      instructions              #    0.95  insns per cycl          ( +-  0.03% )  (76.04%)
       276,827,399      cycles                                                       ( +-  0.08% )  (74.63%)

       0.134035292 seconds time elapsed                                         ( +-  0.10% )

優化版本

參考閱讀 Programming Small

減少entry空間

從參考資料中，了解到在搜尋的過程只需要搜尋 lastName，其他資料內容尚未使用到，所以可以設計一個 struct 只儲存 lastName 使原本的 struct 大小從 136 bytes → 32 bytes ，讓 L1 cache 多放更多資料。
從約30筆entry到約128筆entry



















typedef struct __PHONE_BOOK_DETAIL {
   char lastName[MAX_LAST_NAME_SIZE];
   char firstName[16];
   char email[16];
   char phone[10];
   char cell[10];
   char addr1[16];
   char addr2[16];
   char city[16];
   char state[2];
   char zip[5];
   struct __PHONE_BOOK_DETAIL *pNext;
} entry_detail;

typedef struct __PHONE_BOOK_ENTRY{
   char lastName[MAX_LAST_NAME_SIZE];
   entry_datail *datail;
   struct __PHONE_BOOK_ENTRY *pNext;
}entry;

將修改完的程式再編譯一次

$ make

$ perf stat -r 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_opt

cache-misses rate 問題依舊尚未了解，從時間上與 cache-misses rate 是有改善。但從 cache-misses rate 看沒有很大的變化。

cache-misses 下降0.2%

size of entry : 32 bytes
execution time of append() : 0.078245 sec
execution time of findName() : 0.009389 sec

Performance counter stats for './phonebook_opt' (100 runs):

           211,003      cache-misses              #    0.330 % of all cache refs      ( +-  0.67% )  (73.75%)
        63,978,495      cache-references                                              ( +-  0.22% )  (51.55%)
       242,944,615      instructions              #    1.10  insns per cycle          ( +-  0.08% )  (77.21%)
       221,482,539      cycles                                                        ( +-  0.12% )  (75.64%)

       0.107024164 seconds time elapsed                                          ( +-  0.14% )

$ ./phonebook_hash_opt & sudo perf top -p $!
將空間減少後時間都花在比對上面

使用hash function方式

參考資料

hash 的方法是將資料透過某種公式(hash function)轉換成 key值，再去bucket中尋找資料。
當沒有overflow的時候，計算hash function的時間: O(1)

而設定 hash table 大小通常會找較大質數，可以讓取出的餘數平均分佈在表中。這裡參考 Programming Small 設定 42373 大小。
從各種字符串Hash函數比較發現 BKDRHash function 效果最好，所以採用此 hash function。

在處理 static hashing Overflow Handling 有兩種最受歡迎的方式:

Open Addressing
Chaining

此方法使用 Chaining












unsigned int BKDRHash(char *str)
{
    unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
    unsigned int hash = 0;

    while (*str)
    {
        hash = hash * seed + (*str++);
    }

    return (hash & TABLE_SIZE);
}

再寫 hash 方式，因為對C不熟再 code 方面研究了很久，看很多 code 慢慢去改寫原本的 code 。

經過 BKDRhash function 優化後 :

append() 時間明顯得上升
findName() 時間明顯下降為0
cache-misses 下降 1.7 %
$ make

$ perf stat -r 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_hash_opt

size of entry : 32 bytes
execution time of append() : 0.104041 sec
execution time of findName() : 0.000000 sec

Performance counter stats for './phonebook_hash_opt' (100 runs):

           236,819      cache-misses              #    0.360 % of all cache refs      ( +-  0.36% )  (73.68%)
        65,704,977      cache-references                                              ( +-  0.17% )  (51.50%)
       235,583,333      instructions              #    1.07  insns per cycle          ( +-  0.04% )  (77.26%)
       220,478,824      cycles                                                        ( +-  0.05% )  (75.60%)

       0.106951672 seconds time elapsed                                          ( +-  0.21% )

$ ./phonebook_hash_opt & sudo perf top -p $!
從下圖可以發現時間幾乎花在 BKDRHash function 上面。

使用第二種 hash function 做比較，選用 Programming Small 提到的 djb2

$ make

$ perf stat -r 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_hash2_opt

size of entry : 32 bytes
execution time of append() : 0.101210 sec
execution time of findName() : 0.000000 sec

 Performance counter stats for './phonebook_hash2_opt' (100 runs):

           207,652      cache-misses              #    0.326 % of all cache refs      ( +-  0.82% )  (73.63%)
        63,632,029      cache-references                                              ( +-  0.13% )  (52.49%)
       235,883,099      instructions              #    1.10  insns per cycle          ( +-  0.05% )  (76.70%)
       215,110,824      cycles                                                        ( +-  0.37% )  (74.88%)

       0.104529298 seconds time elapsed                                          ( +-  0.42% )

經過 djb2hash function 優化後 :

append() 時間明顯得上升
findName() 時間明顯下降為0
cache-misses 下降 2.04%

發現 djb2 cache-misses 下降較多。

2016q1 Homework 1 (phonebook)

Reviewed by shelly4132

開發環境

預期目標

Cache 相關知識

Git

效能分析工具 Perf：Phonebook

未優化版本

優化版本

減少entry空間

使用hash function方式

Reviewed by `shelly4132`