2017q1 Homework1 (phonebook)

contributed by < s8w1e2ep >

tags: `sys2017` `s8w1e2ep` `phonebook`

Reviewed by `davis8211`

可以說明為什麼使用 BKDR hash function。

各種字符串Hash函數比較
這個網頁裡面有各個hash function的比較實驗，其中BKDR hash的分數最高，因此我選擇使用BKDR hash function。s8w1e2ep

Hash table 設定的大小可以再說明的清楚一點，或許有人會對 44,887 這個大小有疑問。

44,887是一個質數，會選擇這麼大的數字，是因為 data set 有35萬筆資料。如果 size 太小，發生 collision 的機率會提高。導致搜尋時間變長。而 size 大到某個程度後，發生 collision 的機率相差不大，搜尋效率也不會變得更好，反而會佔取額外的空間。s8w1e2ep

Hash 版本的 code













entry *append(char lastName[], entry *e)
{
    unsigned int hashValue = BKDRHash(lastName);
    
    e = (entry *) malloc(sizeof(entry));
    e->pNext = hashTable[hashValue];
    strcpy(e->lastName, lastName);
    e->detailEntry = NULL;
    e->pNext = NULL;
    hashTable[hashValue] = e;
    
    return e;
}

第六行，你將新建立的 entry e 的指標指向 hashTable[hashValue]，但在第九行，又將指標指向 NULL，因此你的 hashTable 一直都把過去儲存的資料蓋掉，需再思考一下。

確實會蓋過過去儲存的資料，我會再做修改。s8w1e2ep

開發環境

OS:                    Ubuntu 16.04.1 (LTS)
Architecture:          x86_64
CPU 作業模式：    32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
每核心執行緒數：1
每通訊端核心數：4
Socket(s):             1
NUMA 節點：         1
供應商識別號：  GenuineIntel
CPU 家族：          6
型號：              158
Model name:            Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
製程：              9
CPU MHz：             799.987
CPU max MHz:           3800.0000
CPU min MHz:           800.0000
BogoMIPS:              6800.00
虛擬：              VT-x
L1d 快取：          32K
L1i 快取：          32K
L2 快取：           256K
L3 快取：           6144K
NUMA node0 CPU(s):     0-3

前置作業

終端機與Vim

終端機與Vim設定
終端機文字顏色

因為我安裝的Ubuntu版本是16.04，因此語法上有些微不同。

PS1='\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\[\033[01;33m\]$\[\033[00m\] '

Vim設定

set ai
syntax on
set background=dark
set enc=utf8
set hls 
set number
map <F4> : set nu!<BAR>set nonu?<CR>
set ic
set expandtab
set tabstop=4
set shiftwidth=2

Perf

perf 原理和實務
遇到的問題
我照著步驟輸入下列指令

$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"

接著輸入$ perf top
結果出現

一開始想法是用Vim直接改/proc/sys/kernel/perf_event_paranoid這個檔案，
後來發現無法強制改值。最後參考了這位大大的共筆。

nekoneko - 共筆

輸入下列指令

$ sudo sh -c " echo 1 > /proc/sys/kernel/perf_event_paranoid"

再執行$ perf top，就解決啦！

Astyle

輸入下列指令就可以依照 K&R style 自動排版程式碼

$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]

gnuplot

開發紀錄

Cache 簡介

Cache entries: Data is transferred between memory and cache in blocks of fixed size, called cache lines or cache blocks. When a cache line is copied from memory into the cache, a cache entry is created. The cache entry will include the copied data as well as the requested memory location (called a tag).
Cache line: CPU cache 中最小的緩存單位，也可以稱 cache block。
Cache miss: A cache miss is a failed attempt to read or write a piece of data in the cache, which results in a main memory access with much longer latency.

資料在 memory 與 cache 傳送時，會以固定大小的block型式傳遞，我們稱為 cache line 或是 cache block。當1個 cache line 從 memory 複製到 cache 時，1個 cache entry 會被創造。 Cache entry 包含了複製的資料以及請求的記憶體位址(又稱為 tag )。

當程序需要讀取或寫入資料到記憶體位址時，首先會確認 cache 中對應的 entry 。並確認任一個可能包含此位址的 cahce line 的內容。如果程序在 cache 中找到位址，就會發生 cache hit ；反之，則會發生 cache miss 。在 cache hit 的情況下，程序可以從 cache line 中立即的讀取或寫入資料；在 cache miss 的情況下， cache 會分配一個新的 entry 的空間並將資料從記憶體複製到 entry 中，然後用 cache 的內容去完成請求。

發生 Cache miss 的情況分為三種

instruction read miss (largest delay)
data read miss (smaller delay)
data write miss (shortest delay)

Cache entry structure

tag	index	block offset

block offset: ⌈ log₂( b ) ⌉ bits, b is the number of bytes per cache block.
index: ⌈ log₂( n ) ⌉ bits, n is the number of ways.
tag : the length of address - index - block offset bits

輸入下列指令，可以得到自己電腦的cache line size，我的電腦是64B。

$ cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size

執行下列指令可以查看 cache 詳細資訊

$ sudo dmidecode -t cache | more

我的電腦是64位元的 OS ，因此位址長度為 64 bits。而 cache block 大小為 64B，L1 cache 使用的是 8 - way associative，且 L1d cache 大小為 32KB。

data offset = log₂64 = 6(bits)
index = log₂(32(KB) / 64(B) / 8) = 6(bits)
tag = 64 - 6 - 6 = 52(bits)

計算

已知條件
- L1 cache size : 64KB
- L1d size : 32KB (L1中用來處理 data 的部分)
- cache line size = cache block size : 64B
- 1 個 address size : 8B = 64 bits
- 每個 entry size : 136B(data size) + 8B(data pointer) = 144B
需要花費多少 cache block?
- 350,000(總資料筆數) * 144 B = 50,400,000B
- 50,400,000B / 64B * 2(兩個 function 存取) = 1,575,000(block)
- 從下方原始版本的 cache miss 次數，可以看到 1,575,000 與 5,029,615 量級相同。reference 一次 = 讀 1 個 block
有多少個 set?
- 我的電腦是採用 8-way associative
- 32KB / 64B / 8 = 64，每個 set 有 8 個 cache line, 共有 64 個 set
每個 set 可以存多少筆資料?
- 144B(entry size) / 64B(blcok) = 2.25，也就是說至少要有3個 block 才能完整存一筆資料。
- 每個 set 只能存 2 筆資料，因為每個 set 只有 8 個 block

原始版本

append()與findName()執行時間
先清除快取
$ echo 1 | sudo tee /proc/sys/vm/drop_caches
再執行$ ./phonebook_orig

size of entry : 136 bytes
execution time of append() : 0.035636 sec
execution time of findName() : 0.005177 sec

cache miss
執行$ make cache-test
發現 cache-miss 高達 91%

Performance counter stats for './phonebook_orig' (100 runs):

         4,586,848      cache-misses              #   91.197 % of all cache refs      ( +-  0.15% )
         5,029,615      cache-references                                              ( +-  0.09% )
       262,487,884      instructions              #    1.45  insn per cycle           ( +-  0.02% )
       180,634,865      cycles                                                        ( +-  0.08% )

       0.050913126 seconds time elapsed                                          ( +-  0.18% )

優化版本1 - 減少 phonebook 資料結構大小

修改了 phonebook_opt.h

















typedef struct __PHONE_BOOK_DETAIL {
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
} detail;

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    detail *detailEntry;
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

append() 與 findName() 執行時間
執行$ ./phonebook_opt

size of entry : 32 bytes
execution time of append() : 0.042310 sec
execution time of findName() : 0.001569 sec

cache miss
執行$ make cache-test
cache miss 從原本 91% 降低到 63%，比原本好一些，但還是很高。

 Performance counter stats for './phonebook_opt' (100 runs):

         1,401,857      cache-misses              #   62.629 % of all cache refs      ( +-  0.45% )
         2,238,341      cache-references                                              ( +-  0.12% )
       244,525,333      instructions              #    2.09  insn per cycle           ( +-  0.02% )
       117,255,055      cycles                                                        ( +-  0.10% )

       0.032406767 seconds time elapsed                                          ( +-  0.28% )

優化後的phonebook效能比較圖

優化版本2 - 使用 hash function

各種字符串Hash函數比較
使用 BKDR hash function
因為data set數量有350,000筆資料，所以我將 MAX_HASH_TABLE_SIZE 設定為 44,887。










unsigned int BKDRHash(char *str)
{
    unsigned int seed = 131, hash = 0;

    while(*str) {
        hash = hash * seed + (*str++);
    }

    return hash % MAX_HASH_TABLE_SIZE;
}

append() 與 findName() 執行時間
執行$ ./phonebook_hash

size of entry : 32 bytes
execution time of append() : 0.034605 sec
execution time of findName() : 0.000000 sec

cache miss
執行$ make cache-test
cache miss 已經降低成 37%。跟原始版本相比減少了將近 6 成。

 Performance counter stats for './phonebook_hash' (100 runs):

           586,421      cache-misses              #   37.400 % of all cache refs      ( +-  0.51% )
         1,567,973      cache-references                                              ( +-  0.17% )
       242,581,266      instructions              #    1.93  insn per cycle           ( +-  0.02% )
       125,639,830      cycles                                                        ( +-  0.10% )

       0.035417917 seconds time elapsed                                          ( +-  0.23% )

與其它優化版本的phonebook效能比較圖