2017q3 Homework1(PhoneBook)

# 2017q3 Homework1(PhoneBook) contributed by < `kevin55216` > ## 測試平台 ``` OS: ubuntu 16.04 LTS AMD64 RAM: 31.4GB CPU: i7-4790k 4.0GHz 4C8T Disk: 212.3GB ``` ## tree-ver 版本 ![](https://imgur.com/CdCdTxg.png) 第一版本，做一有五子樹的樹以字元做分類。 ![](https://i.imgur.com/wQA3LGK.png) 下圖為將 `i++` 改為 `for(int i=0; i<strlen(lastName); i=i+4)`，即比對第一、第五、第九字元，並將main中的測資改為中段 `invulnerably` (orig)。原執行測資為 `z` 開頭已經接近worst case。 ## 加入 hash 版本 ```clike= int aa=0; entry *hash[26]; entry *findName(char lastName[], entry *pHead) { if(aa!=0) { } else if(aa==0) { for(int i=0; i<26; i++) { hash[i]=NULL; } aa=1; } int j=tolower(lastName[0])-97; if(hash[j]!=NULL) { pHead=hash[j]; while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } } else { return NULL; } printf("%s",(hash[j]->pNext)->firstName); return NULL; /* TODO: implement */ } entry *append(char lastName[], entry *e) { if(aa!=0) { } else if(aa==0) { for(int i=0; i<26; i++) { hash[i]=NULL; } aa=1; } int j=tolower(lastName[0])-97; strcpy(e->lastName, lastName); if(hash[j]!=NULL) { e->pNext=hash[j]->pNext; hash[j]->pNext=e; } else { hash[j]=e; } /* TODO: implement */ entry *pHead = (entry *) malloc(sizeof(entry)); return pHead; } ``` ![](https://i.imgur.com/nDBr0GV.png) 增加以字首`a`到`z`作為`hash table entry`得出`optimized Hash` ``` Performance counter stats for './phonebook_orig' (100 runs): 13,436,896 cache-misses # 86.654 % of all cache refs ( +- 0.46% ) 15,506,313 cache-references ( +- 0.49% ) 1,236,150,115 instructions # 1.17 insn per cycle ( +- 0.00% ) 1,059,967,066 cycles ( +- 0.31% ) 0.247856390 seconds time elapsed ( +- 0.49% ) Performance counter stats for './phonebook_opt' (100 runs): 585,066 cache-misses # 71.403 % of all cache refs ( +- 1.14% ) 819,389 cache-references ( +- 0.29% ) 352,048,173 instructions # 1.47 insn per cycle ( +- 0.01% ) 239,787,592 cycles ( +- 0.36% ) 0.056335788 seconds time elapsed ( +- 0.42% ) Performance counter stats for './phonebook_opth' (100 runs): 699,446 cache-misses # 68.985 % of all cache refs ( +- 0.33% ) 1,013,916 cache-references ( +- 0.23% ) 264,424,498 instructions # 1.61 insn per cycle ( +- 0.02% ) 164,601,972 cycles ( +- 0.39% ) 0.038370320 seconds time elapsed ``` ## 更改測試方式 ![](https://i.imgur.com/brltuZn.png) 更改`find()`函式的測試方式成從`randomlist.txt`中取値後求平均。 ```clike= fp = fopen(RAMDOM_FILE, "r"); if (fp == NULL) { printf("cannot open the file\n"); return -1; } char *input[100]; int j=0; while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') { i++; } line[i - 1] = '\0'; i = 0; input[j]=malloc(strlen(line)+1); strcpy(input[j],line); j++; } fclose(fp); . . . cpu_time_tmp=0; for(int k=0; k<100; k++) { clock_gettime(CLOCK_REALTIME, &start); findName(input[k], e); clock_gettime(CLOCK_REALTIME, &end); cpu_time_tmp += diff_in_second(start, end); } cpu_time2=cpu_time_tmp/100.0; ``` ## 回答問題 1. 有代表性嗎？跟真實英文姓氏差異大嗎？沒有代表性，測試資料裡面很多無意義的字母組合，但是演算法還是要能支援；例如"黃宏成台灣阿成世界偉人財神總統"這位，在戶政系統中也是要能對其做搜尋的 2. 資料難道「數大就是美」嗎？要測試演算法複雜度，確實有需要數大，但精簡才是美，用最少資料表達最多內容 3. 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？去戶政單位要名單，台灣就至少有2300萬筆了；或者可以建造如GOOGLE般龐大系統，擁有許多使用者資料，其中包含使用者名稱 ## 使用網站產生 Random Name List: [http://lab.25sprout.com](http://lab.25sprout.com/nrprnd/)