2017q3 Homework1 (phonebook)

contributed by <Lihikoka>

開發環境

DRAM:                  16GB
CPU(s):                8
Core(s) per socket:    4
CPU:                   Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K

原始程式分析

Memory Leak

在 main.c 最後面釋放記憶體的程式碼，但只釋放了第一、二個 entry 。


if (pHead->pNext) free(pHead->pNext);
free(pHead);

改寫成






while (pHead) {
    e = pHead;
    pHead = pHead->pNext;
    free(e);
}
free(pHead);

可將所有 entry 釋放掉，已解決 memory leak。

Structure Size

phonebook_orig.h 中原始 phone book 的 structure：













typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

這個 structure 佔用的記憶體大小為 136 Bytes，我的系統的 L1 cache 大小為 32 KB ，因此 L1 cache 中能存 32 * 1024 / 136 = 240.94 筆 entry。
改進方式：在 phonebook_orig.c 的 findName 函式中，可以看到搜尋的依據是 lastName，其他的項對於搜尋沒有幫助，因此可以考慮使用較小的 structure ，例如把 lastName 以外的項利用一個 pointer to structure 指到的 struct 存起來，如此一來 entry 的大小即降為 32 Bytes 。



















typedef struct __PHONE_BOOK_DETAIL {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} phonebook_detail;

typedef struct __LAST_NAME_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    phonebook_detail *detail;
    struct __LAST_NAME_ENTRY *pNext;
} entry;

實驗結果

執行 sudo make cache-test
未優化：

Performance counter stats for './phonebook_orig' (100 runs):

          235,5270      cache-misses              #   95.170 % of all cache refs      ( +-  0.08% )
          247,4805      cache-references                                              ( +-  0.10% )
       3,2287,3113      instructions              #    1.43  insn per cycle                                              ( +-  0.02% )
       2,2621,0953      cycles                                                        ( +-  0.26% )

       0.062875512 seconds time elapsed                                          ( +-  0.97% )

縮小 structure size：

Performance counter stats for './phonebook_opt' (100 runs):

           35,7711      cache-misses              #   65.760 % of all cache refs      ( +-  0.37% )
           54,3961      cache-references                                              ( +-  0.44% )
       2,8347,0947      instructions              #    1.89  insn per cycle                                              ( +-  0.01% )
       1,4989,4403      cycles                                                        ( +-  0.19% )

       0.044388147 seconds time elapsed                                          ( +-  1.57% )

可以看到 cache-misses 減少了 (95.170 - 65.760) / 95.170 = 30.1%

append 的時間少了 19.2% ， findName 的時間少了 59.5% 。

Memory Pool

Memory pool 是一種記憶體管理的方式，每一次呼叫 malloc 系統就會分配一塊記憶體給 process ，但是 malloc 、 free 太多次的話很容易產生零碎的記憶體片段，導致下一次如果要 malloc 一塊較大的記憶體時可能會失敗，而使用 memory pool 的話則是一次先 malloc 一大塊記憶體，然後再自行管理。

參考 Stackoverflow 上別人寫的簡易版 memory pool 並將其特化成適合此次作業的版本









































typedef struct __POOL {
    entry *next;
    entry *end;
} pool;

entry *append_with_mempool(char lastName[], entry *e, pool *p)
{
    e->pNext = pool_alloc(p, sizeof(entry));
    if (NULL == e->pNext) return NULL; // memory pool is full
    e = e->pNext;
    strcpy(e->lastName, lastName);
    e->pNext = NULL;

    return e;
}

pool *pool_create(size_t size)
{
    pool *p = (pool*) malloc(size + sizeof(pool));
    p->next = (entry*) &p[1];
    p->end = p->next + size;
    return p;
}

void pool_destroy(pool *p)
{
    free(p);
}

size_t pool_available(pool *p)
{
    return p->end - p->next;
}

entry *pool_alloc(pool *p, size_t size)
{
    if (pool_available(p) < sizeof(entry)) return NULL;
    entry *mem = (entry*) p->next;
    p->next += size;
    return mem;
}

其中 pool_alloc 函式的 argument size 是需要謹慎選擇的，如果所需要分配的記憶體超過了一開始分配給 memory pool 的記憶體，則會發生 segmentation fault (core dump) ，處理的方式比較麻煩，在本次實作中並沒有考慮處理機制，因此 size 的值必須夠大。

本次實作中為了動態配置 memory pool 或 hash tabel (有關 hash 的部份下一部份會介紹) 的大小有用一個 count_file_line 函式來計算 file 的行數，程式碼如下：








static unsigned int count_file_line(FILE* fp)
{
    unsigned int num_line = 0;
    char line[MAX_LAST_NAME_SIZE];
    while (fgets(line, sizeof(line), fp)) num_line++;
    rewind(fp);
    return num_line;
}

並將 size 設為 行數 + redundant ，其中 redundant 設為 500

但是改成使用 memory pool 之後卻在 append 10940 個 entry 之後發生 segmentation fault，將 pool->end 和每一次的 e 作比較發現 e 沒有超過 pool->end ，所以推斷不是分配給 memory pool 的記憶體大小不夠大，詳細原因待查明。

2017/10/15 新增

與 ChuiYiTang 同學討論過後，發現 memory pool API 的 pool_alloc 有問題，問題出在 p->next += size; ，因為 p->next 的型態為 entry*，entry 的大小為 32 bytes ，因此對 entry* 作加法時是以 sizeof(entry) 為單位，也就是說我以為加上了 size ，其實是加上了 size * sizeof(entry) 因此 pool->next 增加的比預期的還快 32 (sizeof(entry)) 倍，所以很快就超過分配的 memory pool 記憶體範圍，因此發生 segmentation fault (實在是不夠懂 C 語言啊！) ，實驗過程如下:

(gdb) p pool_alloc(pPool, sizeof(entry))
$1 = (void *) 0x7fffe2440020
(gdb) 
$2 = (void *) 0x7fffe2440420

可以看到 pool->next 一次增加了

(400)_{16} = 4 \times 256 = 1024

Bytes，預期只有 32 Bytes，修正程式碼如下：







entry *pool_alloc(pool *p, size_t size)
{
    if (pool_available(p) < size) return NULL;
    void *mem = p->next;
    p->next = (void *) p->next + size;
    return mem;
}

把第 5 行改成如上所示，把 p->next 強制轉型成 void* ，因為 sizeof(void) 為 1 Byte ，這樣就能讓行為跟預期的一樣了：

(gdb) p pool_alloc(pPool, sizeof(entry))
$1 = (void *) 0x7fffe2440020
(gdb) 
$2 = (void *) 0x7fffe2440040

一次加了 32 Bytes ，符合預期。

perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_mempool

 Performance counter stats for './phonebook_mempool' (100 runs):

           27,7214      cache-misses              #   69.563 % of all cache refs      ( +-  0.19% )
           39,8508      cache-references                                              ( +-  0.82% )
       2,5219,5433      instructions              #    1.59  insn per cycle                                              ( +-  0.02% )
       1,5846,5443      cycles                                                        ( +-  0.25% )

       0.043636574 seconds time elapsed                                          ( +-  0.88% )

可以發現使用 memory pool 一次分配大塊的連續記憶體的話，append 的時間可以降低 40 % 的時間，原因是因為多次使用 malloc 的話容易造成記憶體破碎，spatial locality 不好，進而導致 cache miss 增加，但是用 perf 分析出來的結果卻與推論不符合，原因待查證。

Hash

Hash 的原理

假設我們要做搜尋，我們可以維護一個 hash table ，裡面紀錄著 key 與 value 的對應關係，key 是資料本身，value 則是將所有可以搜尋的資料餵給一個叫作 hash function 的函式的輸出 (hash value)。

由以上我們可以得出搜尋的過程為：

將待查資料放進 hash function
將 hash function 的輸出當 index 來找出 hash table 的值，若找的到即成功，找不到則失敗。
hash_value = hash(data), hash_table[hash_value] 存在 or 不存在

理想上用 hash 可以將搜尋的時間複雜度降為 O(1)。

但是也有可能兩個輸入值得到同一個 hash value ，這時就會產生 collision (碰撞) ，解決 collision 問題的方法有兩種：開放定址法 (open addressing)、鍵結法 (chaining)。

開放定址法 (open addressing)

線性探測 (linear probing) 、二次探測 (quadratic probing)

在 append hash table 時，一個資料經過 hash function 運算得到已經出現出現過的 key 值時，我們則將這個 key 作「調整」，而調整的方式根據 probing 的方法會有不同的結果：

Linear probing：
遇到碰撞時就把
$k e y + 1$ ，若還是遇到碰撞就再
$+ 1$ ，依此類推。
Quadratic Probing：
遇到碰撞就把
$k e y + 1^{2}$ ，若再遇到則改成
$+ 2^{2}$ ，遇到第
$i$ 次碰撞則
$+ i^{2}$ 。

使用 linear probing 會使得有連續一坨填滿的 bucket ，這樣會使得搜尋需要花很多的時間來跨越這些填滿的部份，會造成效能上的損失。

鍵結法 (chaining)

若遇到碰撞就在該 bucket 後加上一個 linked list ，若 hash value 相同則都去那個 linked list 查找 / append。

實驗 & 結果

此次作業採用的 hash function 是 BKDR function ，處理碰撞的方法是 open addressing 的 quadratic probing ， hash table 的 loading factor 是 0.75































unsigned int BKDRhash(char *str)
{
    unsigned int seed = 131;
    unsigned int hash = 0;
    while (*str)
        hash = hash * seed + (*str++);
    return (hash & 0x7FFFFFFF);
}

entry *findName_hash(char lastName[], entry *ht[], int ht_size)
{
    unsigned int hash_value = BKDRhash(lastName) % ht_size;
    if (NULL == ht[hash_value]) return NULL;
    for (int i = 0; i < ht_size; i++)
        if (!strcmp(lastName, ht[(hash_value + i * i) % ht_size]->lastName))
            return ht[(hash_value + i * i) % ht_size];

    return NULL;
}

void append_hash(char lastName[], entry *ht[], int ht_size)
{
    unsigned int hash_value = BKDRhash(lastName) % ht_size;
    for (int i = 0; i < ht_size; i++) {
        if (NULL == ht[(hash_value + i * i) % ht_size]) {
            ht[(hash_value + i * i) % ht_size] = (entry*) malloc(sizeof(entry));
            strcpy(ht[(hash_value + i * i) % ht_size]->lastName, lastName);
            break;
        }
    }
}

跑 perf 看結果：

$perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_hash

 Performance counter stats for './phonebook_hash' (100 runs):

           55,8906      cache-misses              #   45.902 % of all cache refs      ( +-  0.50% )
          121,7606      cache-references                                              ( +-  0.23% )
       3,2939,1985      instructions              #    1.27  insn per cycle                                              ( +-  0.03% )
       2,5951,7427      cycles                                                        ( +-  0.40% )

       0.073122489 seconds time elapsed                                          ( +-  0.99% )

cache-misses 降到 45.902%

HackMD 有時候怪怪的，明明就貼圖了可是卻顯示不出來

append 的時間增加了快兩倍，但是 findName 的時間如預期的非常低。