2016q3 Homework1 (phonebook)

contributed by <f5120125>

tags: `sysprog`

請依照指定格式填寫標題和"contributed by" 課程助教

Reviewed by `GreenYo0413`

使用二元搜尋數來當作資料結構是個不錯的想法，但產生分枝的方法可以再詳述，像是使用strcmp還是自己定義的hash function。

已新增二元搜尋樹和雜湊函數f5120125

開發環境

Ubuntu 14.04 LTS

$ lscpu

CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4
Mem: 8 GiB
Cache:
L1d cache: 32 KB
L1i cache: 32 KB
L2 cache: 256 KB
L3 cache: 4096 KB

前置作業

Github

$ git branch <branch name>
$ git checkout <branch name>
$ git add .
$ git commit -m "my commit"
$ git pull
$ git push origin <branch name>

Gnuplot

reset
set ylabel 'time(sec)'
set style fill solid
set title 'perfomance comparison'
set term png size 1024,512 enhanced font 'Verdana,11'
set output 'runtime.png'

plot [:][:0.08]'output.txt' using 2:xtic(1) with histogram title 'original', \
'' using ($0-0.15):($2+0.001):2 with labels title ' ', \
'' using 3:xtic(1) with histogram title 'small-struct'  , \

Perf

參考[1]和[2]的文章所列出幾個常用指令學習perf

列出所有可以觸發perf採樣點的EVENT：

$ perf list

採樣點類型
1. [Hardware event]
2. [Software event]
3. [Hardware cache event]
4. [Kernel PMU event]

提供指定程序執行的概括情況

更詳細的說明

$ perf stat --help

以phonebook_orig為例：

$ perf stat ./phonebook_orig

取得概略資訊：

pid: 6196
size of entry : 136 bytes
execution time of append() : 0.057027 sec
execution time of findName() : 0.009792 sec

Performance counter stats for './phonebook_orig':

         89.716798      task-clock (msec)         #    0.994 CPUs utilized          
                18      context-switches          #    0.201 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
            12,357      page-faults               #    0.138 M/sec                  
       291,019,350      cycles                    #    3.244 GHz                      (82.23%)
       167,571,318      stalled-cycles-frontend   #   57.58% frontend cycles idle     (82.19%)
        98,915,774      stalled-cycles-backend    #   33.99% backend  cycles idle     (64.47%)
       272,060,112      instructions              #    0.93  insns per cycle        
                                                  #    0.62  stalled cycles per insn  (82.27%)
        46,888,250      branches                  #  522.625 M/sec                    (86.55%)
         1,274,939      branch-misses             #    2.72% of all branches          (85.12%)

       0.090283053 seconds time elapsed

察看整體系統負載：

$ sudo perf top

更詳細的說明

$ perf top --help

小結

往後可以根據觸發採樣點的EVENT類型, 去得知需要的資訊

例如, 若想知道cache miss在整體系統的狀況

$ perf top -e cache-misses

案例分析 - Phonebook

目標

減少Cache miss
提升效能

►根據提示可以減少Cache miss或提升效能的修改方向：

改用較小的結構
改用二元搜尋
使用雜湊函數

1. 未優化版本

►原本的結構 (共計 136 bytes)

16+16+16+10+10+16+16+16+2+5=123
compiler做alignment後有128-byte
最後加上pointer的size(在64-bit中為8-byte)：128+8=136















#define MAX_LAST_NAME_SIZE 16
#define OPT 1
typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
}detail;

2. 優化版本 - 使用較小的結構

►新增結構 (只有32 bytes)





typedef struct __LAST_NAME{
	char lastName[ MAX_NAME_SIZE ];
	struct __LAST_NAME *pNext;
	struct __PHONE_BOOK_ENTRY *dNext;
}entry;

►執行時間

size of entry : 32 bytes
execution time of append() : 0.034547 sec
execution time of findName() : 0.003680 sec

尚未釐清
16+16+16+10+10+16+16+16+2+5=123
compiler做alignment後有128-byte
最後加上pointer的size(在64-bit中為8-byte)：128+8=136
在Makefile中已經將最佳化關掉，為什麼還是會有alignment的最佳化
上網查到關閉alignment的方法為使用關鍵字：attribute((packed));

3. 優化版本 - 使用二元搜尋

►採用二元搜尋樹的原因

Binary Search Tree	Average	Worst
space	$Θ (n)$	$O (n)$
search	$Θ (l o g n)$	$O (n)$
insert	$Θ (l o g n)$	$O (n)$
delete	$Θ (l o g n)$	$O (n)$

由上表可得知再一般情況下搜尋資料可以在
$Θ (l o g n)$

►如何建立二元搜尋樹

Top-down appraoch
$O (n l o g n)$
- 由root開始一筆一筆insert資料來建立二元搜尋樹
Bottom-up approach
$O (n)$
- 由於Input為 sorted list. 藉由已知的資料集, 我們可從leaf開始建立二元搜尋樹

►二元搜尋樹的實作

Tree node





typedef struct _PHONE_BOOK_BST {
    struct __LAST_NAME *data;
    struct _PHONE_BOOK_BST *left;
    struct _PHONE_BOOK_BST *right;
}BST;

Search function

假設str1為欲在樹中搜尋的字串, str2為目前探訪節點的字串
- 利用strcasecmp或者strncasecmp
  strcasecmp(const char *str1, const char *str2)
  strncasecmp(const char *str1, const char *str2, size_t len)
- 皆是 case insensitive
- 若 str1 和 str2 相同, 回傳值==0
- 若 str1 > str2, 回傳值>0 (往 right child 探訪)
- 若 str1 < str2, 回傳值<0 (往 left child 探訪)

Build tree

♦如何建立balanced tree structure

當root為資料集中間數, 樹的結構能夠較balanced, 所以以n筆sorted list來看, 我們將中間數放在root, 高度預期為
$Θ (l o g n)$

♦從sorted list中的何處開始建立subtree到完成整棵binary search tree

下列為從資料集中挑出的已排序資料

預期建立的binary search tree

Algorithm


























BST *sortedListToBSTRecur(entry** node, int len)
{
    /*
    base step:
    if len is zero, then return NULL
    */

    //build left subtree recursively
    BST *left = sortedListToBSTRecur(node, len>>1);

    /*
    allocate root here and set its attributes
    */

    /*
    Move the node to the next in this recursion step.
    When this step returns, the previous recursion step 
    will get the next node
    */
    *node = (*node)->pNext;

    //build right node
    root->right = sortedListToBSTRecur(node, len-(len>>1)-1);

    return root;
}

►分析演算法

step 1

遞迴建立左節點, 且因sorted list從第一筆開始, 其資料為最小, 要將其放入正確位置則必須深入binary tree的高度, 因此長度會在每次遞迴時除以2,直到碰到NULL

step 2

allocate節點並放入資料

step 3

因為已經處理完目前的資料, 要將指標移向sorted list的下一筆資料

step 4

遞迴建立右節點

Time Complexity

由於每個節點探訪一次並馬上allocate空間並放入資料, 其時間複雜度為
$O (n)$

►執行時間

size of entry : 32 bytes
size of tree node : 24 bytes
execution time of append() : 0.071578 sec
execution time of findName() : 0.000019 sec

4. 優化版本 - 使用雜湊函數

如何實作hash table
- 定義TABLE_SIZE
- 實作hash_function去取得key值
- 實作hash_append建立hash table
- 實作hash_find_name去找尋給定的last name
HashTable




typedef struct _HASH_TABLE_{
	int len;
	HashItem *next;
}HashTable;

HashItem





typedef struct _HASH_ITEM_{
	char lastName[ MAX_LAST_NAME_SIZE ];
	int key;
	struct _HASH_ITEM_ *next;
}HashItem;

hash_append
我將新增的item插入slut前端, 每個slut有自己的linked-list. 由於file是由小排序到大讀取, 因此linked-list會是由大到小
估計append的時間和建立linked-list差不多
由於每個slut有自己的linked-list. 若資料分佈均勻, findName時間可以下降
執行時間

size of hash item : 32 bytes
execution time of append() : 0.049486 sec
execution time of findName() : 0.000014 sec

2016q3 Homework1 (phonebook)

tags: sysprog

Reviewed by GreenYo0413

開發環境

Ubuntu 14.04 LTS

前置作業

Github

Gnuplot

Perf

列出所有可以觸發perf採樣點的EVENT：

提供指定程序執行的概括情況

察看整體系統負載：

小結

案例分析 - Phonebook

目標

►根據提示可以減少Cache miss或提升效能的修改方向：

1. 未優化版本

►原本的結構 (共計 136 bytes)

2. 優化版本 - 使用較小的結構

►新增結構 (只有32 bytes)

►執行時間

3. 優化版本 - 使用二元搜尋

►採用二元搜尋樹的原因

►如何建立二元搜尋樹

►二元搜尋樹的實作

Tree node

Search function

Build tree

♦如何建立balanced tree structure

♦從sorted list中的何處開始建立subtree到完成整棵binary search tree

Algorithm

►分析演算法

step 1

step 2

step 3

step 4

Time Complexity

►執行時間

4. 優化版本 - 使用雜湊函數

5. runtime的比較圖

tags: `sysprog`

Reviewed by `GreenYo0413`