2016q3 Homework1 (phonebook)

contributed by <TempoJiJi>

tags: `TempoJiJi` `phonebook` `sysprog21`

預期目標

這次會將目標放在挑戰題上，當然基本的需求也會全部再做一次復習。
附上之前的共筆：2016q1phonebook

準備 GNU/Linux 開發工具
學習使用 Git 與 GitHub
學習效能分析工具
研究軟體最佳化
挑戰題
- 縮減append()時間
- fuzzy search
- 回顧輸入的資料分佈，指出其中不合理的設計，並且著手改進

開發環境

Ubuntu 16.04 LTS
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
Model name: Intel® Core™ i5-4200U CPU @ 1.60GHz

補上電腦硬體環境描述，像是 CPU, cache, main memory 等資訊，這樣日後重現實驗時，才有參考 jserv

首先測試程式效能：

size of entry : 136 bytes
execution time of append() : 0.076191 sec
execution time of findName() : 0.005781 sec

Performance counter stats for './phonebook_orig' (100 runs):

           967,794      cache-misses              #   87.985 % of all cache refs      ( +-  0.37% )
         1,099,949      cache-references                                              ( +-  0.28% )
       260,680,693      instructions              #    1.48  insns per cycle          ( +-  0.02% )
       175,878,472      cycles                                                        ( +-  0.26% )

       0.071930206 seconds time elapsed                                          ( +-  0.79% )

修改結構大小降低cache miss


















typedef struct __PHONE_BOOK_DETAILS {
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
} Details;

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    struct __PHONE_BOOK_ENTRY *pNext;
    Details *details;
} entry;

結構大小減到32，而cache miss也大幅度降低

size of entry : 32 bytes
execution time of append() : 0.042138 sec
execution time of findName() : 0.002712 sec

 Performance counter stats for './phonebook_orig' (100 runs):

           115,790      cache-misses              #   29.443 % of all cache refs      ( +-  0.89% )
           393,264      cache-references                                              ( +-  0.64% )
       243,880,933      instructions              #    1.93  insns per cycle          ( +-  0.02% )
       126,628,229      cycles                                                        ( +-  0.47% )

       0.051748813 seconds time elapsed                                          ( +-  0.67% )

實做Hash降低findname時間

Table Size 爲 42737，選用質數碰撞的幾率較小

Table的結構：



typedef struct _HAST_TABLE {
    entry **table;
} hash_table;

爽指標的用意爲：將碰撞的資料用linkedlist鏈接起來

Hash function：







int hash(char *str)
{
    unsigned int hash_value = 0;
    while(*str)
        hash_value = (hash_value << 5) - hash_value + (*str++);
    return (hash_value % SIZE);
}

這裏使用的是BKDR hash function

使用PThread降低append時間

觀察append：







   while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0')
            i++;
        line[i - 1] = '\0';
        i = 0;
        append(line,my_hash_table);     //for OPT
    }












void append(char *line)
{
    entry *new_entry;
    int hash_value = hash(lastName);

    new_entry = (entry *) malloc(sizeof(entry));

    /* Creating entry list by hash table */
    strcpy(new_entry->lastName , lastName);
    new_entry->pNext = my_hash_table->table[hash_value];
    my_hash_table->table[hash_value] = new_entry;
}

由上述的程式碼中，可以確認每筆資料在append時能夠獨立處理，所以能夠使用多執行緒來進行append。

上網爬了一下文，發現POSIX pthread在讀檔時會確保每個執行緒是獨立安全的，所以不需要利用mutex_lock等類似的機制（但是要確保function可以獨立進行），且會將檔案以byte來進行分割。
參考資料

之後就是實做方面，以下是我的code：



























#if defined (OPT)
void work(){
    char line[16];
    int i = 0;
    while(!feof(fp)){
        if(fgets(line,sizeof(line),fp)!=NULL){
            while(line[i] != '\0')
                i++;
            line[i-1] = '\0';
            i=0;
            append(line);
        }
    }
}
#endif         

int main(){
...
#if defined(OPT)
    clock_gettime(CLOCK_REALTIME, &start);
    pthread_t Thread[MAX_THREAD];
    for(i=0;i<MAX_THREAD;i++)
        pthread_create(&Thread[i],NULL,(void*)work,NULL);
    for(i=0;i<MAX_THREAD;i++)
        pthread_join(Thread[i],NULL);
...
}

執行結果：

size of entry : 32 bytes
execution time of append() : 0.170208 sec
execution time of findName() : 0.000000 sec

append時間大幅度飆高…原因還需要在研究看看

爬了一下文，原因是受到I/O的限制，就算分成再多的執行緒效能也不會好到哪裏去，而且還有可能會降低效能。參考資料

使用 mmap() 來存取資料，而非 read()，你需要先撰寫一個工具，將文字檔案轉換為 binary representation，之後再用 mmap()，這樣才能省去非必要的效能損失 jserv

利用mmap改善效能

存储映射I/O(Memory-mapped I/O)使一个磁盘文件与存储空间的一个缓冲区相映射。当从缓存区中取数据，就相当于读文件中的相应字节。与此类似，将数据存入缓冲区，相应的就自动地写入文件。这样就可以在不使用read和write的情况下执行I/O。普通文件被映射到进程地址空间后，进程可以像访问普通内存一样对文件进行访问。
優點：

操作文件就像操作内存一样，，因此不需要多餘的system call、context switch等，适合于对较大文件的读写

缺點：

可能會有多餘沒使用到空間，假設文件大小爲10bytes，那麼（假設每一次mapping size都4096）4096-10=4086bytes的空間就會浪費掉

先來點測試：



























char *s;

unsigned int size;
struct stat buf;

/* O_RDONLY = read only */
int fd = open (DICT_FILE,O_RDONLY);

/* Get the size of the file */
int status = fstat(fd, &buf);
size = buf.st_size;

/* Start mapping */
clock_gettime(CLOCK_REALTIME, &start);
s = mmap (0, size, PROT_READ, MAP_PRIVATE, fd, 0);

int k = 0;
for(int i = 0; i < size; i++){
    line[k] = s[i];
    if(s[i] == 10){	//newline
    	line[k] = '\0';
    	k = 0;
	e = append(line,e);
    }
    else
    	k++;
}

執行結果：

可以看到append的時間降低了不少。由這個結果就可以發現，read()的system call等機制花了不少時間。

參考資料：

共享内存mmap介绍

BK TREE

這裏先介紹 Levenshtein Distance （編輯距離）：

指一個字串轉換成為另外一個字串所需的最少「編輯操作」次數，所謂的編輯操作可以是替換字元、插入字元、刪除字元，是個用來衡量兩字串相似度的重要指標。

例如將`kitten`轉成`sitting`：
step 1: kitten → sitten  (substitution of "s" for "k")
step 2: sitten → sittin  (substitution of "i" for "e")
step 3: sittin → sitting (insertion of "g" at the end)

會經過3次編輯操作，所以編輯距離爲 3

而BK tree的構造就是以編輯距離來計算形成的。

在append時計算字串跟當前節點的編輯距離，然後再繼續往下尋找計算，直到空節點爲止，才將字串加入結構。
這樣的結構可以省下不少findname的時間，但在append時每次都需要往下尋找直到空節點，所以時間會較長。

實做方面：













/* 計算編輯距離 */
int dist(char *a,char *b)         
{
    int count = 0;
    int max = MAX(strlen(a),strlen(b));
    int min = MIN(strlen(a),strlen(b));
    for(int i=0; i<max; i++) {
        if(a[i] != b[i] || i >= min)
            count ++;
    }
	/* MAX_SIZE爲子節點的數量 */
    return count % MAX_SIZE;
}

















void append(char lastName[],entry *e)
{
    while(e != NULL) {
        int distance = dist(lastName , e->lastName);
        if(e->pNext[distance] != NULL)
            e = e->pNext[distance];
        else {
            e->pNext[distance] = (entry *) malloc(sizeof(entry));
            e = e->pNext[distance];
            strcpy(e->lastName, lastName);
	     /* Initialize */
            for(int i=0; i<MAX_SIZE; i++)
                e->pNext[i] = NULL;
            return;
        }
    }
}

結果：

MAX_SIZE = 16
size of entry : 152 bytes
execution time of append() : 0.645208 sec
execution time of findName() : 0.000014 sec


Performance counter stats for './bk_tree' (100 runs):

         2,564,592      cache-misses              #   33.193 % of all cache refs      ( +-  0.48% )
         7,726,256      cache-references                                              ( +-  0.10% )
     2,629,117,840      instructions              #    1.57  insns per cycle          ( +-  0.17% )
     1,675,390,835      cycles                                                        ( +-  0.10% )

       0.654240669 seconds time elapsed                                          ( +-  0.12% )

比較不同MAX_SIZE的效能：

MAX_SIZE = 8
size of entry : 88 bytes
execution time of append() : 0.665988 sec
execution time of findName() : 0.000016 sec

MAX_SIZE = 4
size of entry : 56 bytes
execution time of append() : 0.654146 sec
execution time of findName() : 0.000014 sec

MAX_SIZE = 2
size of entry : 40 bytes
execution time of append() : 0.699443 sec
execution time of findName() : 0.000014 sec

參考資料：
Levenshtein Distance (編輯距離)

Fuzzy Seach

支援 fuzzy search，允許搜尋 lastName 時，不用精準打出正確的寫法
比方說電話簿有一筆資料是 McDonald，但若使用者輸入 MacDonald 或 McDonalds，也一併檢索出來。

這裏的實做方面也是利用編輯距離來進行，以編輯距離的大小來進行搜尋











void findName(char lastname[], entry *pHead)
{
    int distance;
    char buf[MAX_LAST_NAME_SIZE];
    while (pHead != NULL) {
        distance = dist(lastname , pHead->lastName);
        if(distance <= 2)
            printf("intput=%s , %s , dist=%d\n",lastname,pHead->lastName,distance);
        pHead = pHead -> pNext;
    }
}

將所有編輯距離小於3的資料都列出來

以 "douglas" 來進行測試

intput=douglas , dongles , dist=2
intput=douglas , douala , dist=2
intput=douglas , doubles , dist=2
intput=douglas , dougl , dist=2
intput=douglas , douglas , dist=0
intput=douglas , douglass , dist=1
intput=douglas , dougnac , dist=2
intput=douglas , dougnan , dist=2
intput=douglas , dounias , dist=2

size of entry : 32 bytes
execution time of append() : 0.054440 sec
execution time of findName() : 0.022088 sec

在這裏想說要加個記錄最小dist的資料，但考慮到如果在真實世界，有太多名字相同且相似的人，那樣列出來的最小dist的資料也會有幾十萬筆，那就有搜尋跟沒搜尋沒什麼差別。但如果再加多幾個詳細資料在details中，那這個fuzzy searching可能就有非常大的作用了。

參考資料：
Levenshtein Distance (編輯距離)

總結所有效能落差：

優化（mmap 加上 Hash function的優化版本）與未優化版本比較：

2016q3 Homework1 (phonebook)

tags: TempoJiJi phonebook sysprog21

預期目標

開發環境

修改結構大小降低cache miss

實做Hash降低findname時間

使用PThread降低append時間

利用mmap改善效能

BK TREE

Fuzzy Seach

Read more

針對多執行緒環境設計的 Memory allocator

Dynamically allocate space Note

2016q3 Homework3 (mergesort-concurrent)

2016q3 Homework1 (raytracing)

tags: `TempoJiJi` `phonebook` `sysprog21`