2016q3 Homework1 (phonebook)

contributed by <kaizsv>

請依照格式填寫標題和"contributed by"課程助教

開發環境

lubuntu 16.04
CPU: Intel® Core™ i5-5200U CPU @ 2.20GHz
Memory: 8 GB
Cache:
- L1d: 32K
- L1i: 32K
- L2: 256K
- L3: 3072K

phonebook

phonebook_orig

$ perf stat -r 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig

Performance counter stats for './phonebook_orig' (100 runs):

         3,421,394      cache-misses              #   93.107 % of all cache refs
         3,674,686      cache-references
       261,157,949      instructions              #    1.46  insns per cycle
       179,200,521      cycles

       0.068406278 seconds time elapsed

phonebook_opt

1. 改寫 structure __PHONE_BOOK_ENTRY

參考programming small，append() 及 findName()只會處理 lastName 欄位，而原本的 entry structure 包含了其它欄位，在這個例子裡我們可以用另外一個 structure 存放原本 entry 中除了 lastName 外的其它欄位，透過降底 entry size 減少 cache-miss 次數。

Performance counter stats for './phonebook_opt' (100 runs):

         1,198,333      cache-misses              #   74.482 % of all cache refs
         1,608,887      cache-references
       244,631,790      instructions              #    2.00  insns per cycle
       122,607,619      cycles

       0.047359455 seconds time elapsed

很明顯 cache-misses cache-references 減少約一半，cycles 也減少且 IPC 提高到 2。

2. smaz 壓縮

使用老師提供的連結 smaz，但是壓縮字串須要給壓縮前的長度及壓縮後的長度，解壓縮回來也須要知道壓縮前的字串長度，不然不會準確以\0為字串結尾。如此就不能在 findName() 直接以 strcasecmp（） 比較字串。

因此我的做法是直接在 main.c 壓縮字串，而不直接改 phonebook_opt.c，而且沒有解壓縮，直接使用壓縮後的字串進行比對，等於是把壓縮演算法想成 hash function。我手動設壓縮率為50%，且 entry 的 lastName 可以為 8 bytes 就好。

phonebook_opt.h





typedef struct __PHONE_BOOK_ENTRY {
	char lastName[8];
	detail *pDetail;
	struct __PHONE_BOOK_ENTRY *pNext;
}

main.c















while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0')
            i++;
        line[i - 1] = '\0';
		#if defined(OPT)
		int smaz_size = (float) (i-1) * SMAZ_LAST_NAME_SCALE + 0.5f;
		char smaz_line[smaz_size];
		smaz_compress(line, i - 1, smaz_line, smaz_size);
		#else
		char smaz_line[MAX_LAST_NAME_SIZE];
		strcpy(smaz_line, line);
		#endif
        i = 0;
        e = append(smaz_line, e);
    }

要搜尋再把壓縮 input 壓縮後傳到findName()。

Performance counter stats for './phonebook_opt' (100 runs):

           832,114      cache-misses              #   73.942 % of all cache refs
         1,125,354      cache-references
       742,530,000      instructions              #    1.52  insns per cycle
       487,882,795      cycles

       0.184548656 seconds time elapsed

從 perf 可以看到 cache-misses 又下降一些，append()時間變成3倍，findName()時間變為 1/3 倍。但append()時間還是太長了，我把程式碼放在 smaz branch。

3. pthread

去年有做了hash function, red-black tree, fuzzy search等版本，因此就直接做 pthread。

先研究這份資料，他希望不要用pthread_join()就可以讓 main thread 不做額外的 signal 並等待其它 thread 結束，pthread_cond_wait()要與 mutex lock 一起呼叫，它會釋放 mutex lock 且 block 在 pthread_cond_t，允許其它人取得 mutex lock，因此我參考他的做法，在pthread_creat()後用迴圈等待其它 thread 完成。







while (not_read_end) {
    pthread_mutex_lock(&finish_mutex);
    while (finished < NUM_OF_THREADS) {
        pthread_cond_wait(&finish_cond, &finish_mutex);
    }
    pthread_mutex_unlock(&finish_mutex);
}

在 thread function 裡有判斷結束的條件，也就是 fgets(line, sizeof(line), fp) == NULL，此時main thread因finish_cond而等待，finish_mutex被釋放，就可以讓全域變數加1，當所有thread 都加1後就會被main threadcancel。

參考資料的做法是要等所有的 thread 都準備好的時候再一起執行，我想說作業要求分析append()的時間，就沒有做這部份，只確保所有 thread 執行完後再回到 main thread。





















while (1) {
    pthread_mutex_lock(&current_mutex);
    not_read_end = fgets(line, sizeof(line), fp);
    pthread_mutex_unlock(&current_mutex);

    if (not_read_end == NULL)
        break;

    while (line[i] != '\0')
        i++;
    line[i - 1] = '\0';
    i = 0;
    data->e = append(line, data->e);
}
	
pthread_mutex_lock(&finish_mutex);
finished++;
pthread_cond_signal(&finish_cond);
pthread_mutex_unlock(&finish_mutex);

pthread_exit(NULL);

下面分別是 2 個 thread 及 5 個 thread 版本的效能分析。

2 threads
Performance counter stats for './phonebook_opt' (100 runs):

         1,003,743      cache-misses              #   11.291 % of all cache refs
         8,889,500      cache-references
       439,770,486      instructions              #    0.76  insns per cycle
       576,668,041      cycles

       0.126104883 seconds time elapsed

5 threads
Performance counter stats for './phonebook_opt' (100 runs):

           949,669      cache-misses              #    6.498 % of all cache refs
        14,614,192      cache-references
       500,208,293      instructions              #    0.57  insns per cycle
       873,157,426      cycles

       0.135140143 seconds time elapsed

因為沒有修改phonebook_opt.c我猜想findName()時間會下降，是因為pHead->lastName的順序已經不是照字母排了，剛好zyxel在比較前面的位置，不過這只是我的猜想，沒有證明。

BUG

有時候make plot會發生findName() assert錯誤的情況，找不到輸入字串，有先試著用 GDB 看看是什麼情況，還是沒有解決，稍晚再試。