2017q3 Homework1(PhoneBook)

contributed by < kevin55216 >

測試平台

OS: ubuntu 16.04 LTS AMD64
RAM: 31.4GB
CPU: i7-4790k 4.0GHz 4C8T
Disk: 212.3GB

tree-ver 版本

第一版本，做一有五子樹的樹以字元做分類。

下圖為將 i++ 改為 for(int i=0; i<strlen(lastName); i=i+4)，即比對第一、第五、第九字元，並將main中的測資改為中段 invulnerably (orig)。
原執行測資為 z 開頭已經接近worst case。

加入 hash 版本






















































int aa=0;
entry *hash[26];


entry *findName(char lastName[], entry *pHead)
{

    if(aa!=0) {

    } else if(aa==0) {
        for(int i=0; i<26; i++) {
            hash[i]=NULL;
        }
        aa=1;
    }
    int j=tolower(lastName[0])-97;
    if(hash[j]!=NULL) {
        pHead=hash[j];
        while (pHead != NULL) {
            if (strcasecmp(lastName, pHead->lastName) == 0)
                return pHead;
            pHead = pHead->pNext;
        }
    } else {
        return NULL;
    }
    printf("%s",(hash[j]->pNext)->firstName);
    return NULL;
    /* TODO: implement */
}

entry *append(char lastName[], entry *e)
{
    if(aa!=0) {
    } else if(aa==0) {
        for(int i=0; i<26; i++) {
            hash[i]=NULL;
        }
        aa=1;
    }
    int j=tolower(lastName[0])-97;
    strcpy(e->lastName, lastName);
    if(hash[j]!=NULL) {
        e->pNext=hash[j]->pNext;
        hash[j]->pNext=e;
    } else {
        hash[j]=e;
    }
    /* TODO: implement */
    entry *pHead = (entry *) malloc(sizeof(entry));
    return pHead;
}

增加以字首a到z作為hash table entry得出optimized Hash

 Performance counter stats for './phonebook_orig' (100 runs):

        13,436,896      cache-misses              #   86.654 % of all cache refs      ( +-  0.46% )
        15,506,313      cache-references                                              ( +-  0.49% )
     1,236,150,115      instructions              #    1.17  insn per cycle           ( +-  0.00% )
     1,059,967,066      cycles                                                        ( +-  0.31% )

       0.247856390 seconds time elapsed                                          ( +-  0.49% )

 Performance counter stats for './phonebook_opt' (100 runs):

           585,066      cache-misses              #   71.403 % of all cache refs      ( +-  1.14% )
           819,389      cache-references                                              ( +-  0.29% )
       352,048,173      instructions              #    1.47  insn per cycle           ( +-  0.01% )
       239,787,592      cycles                                                        ( +-  0.36% )

       0.056335788 seconds time elapsed                                          ( +-  0.42% )
 Performance counter stats for './phonebook_opth' (100 runs):

           699,446      cache-misses              #   68.985 % of all cache refs      ( +-  0.33% )
         1,013,916      cache-references                                              ( +-  0.23% )
       264,424,498      instructions              #    1.61  insn per cycle           ( +-  0.02% )
       164,601,972      cycles                                                        ( +-  0.39% )

       0.038370320 seconds time elapsed

更改測試方式

更改find()函式的測試方式成從randomlist.txt中取値後求平均。





























    fp = fopen(RAMDOM_FILE, "r");
    if (fp == NULL) {
        printf("cannot open the file\n");
        return -1;
    }
    char *input[100];
    int j=0;
    while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0') {
            i++;
        }
        line[i - 1] = '\0';
        i = 0;
        input[j]=malloc(strlen(line)+1);
        strcpy(input[j],line);
        j++;
    }
    fclose(fp);
    .
    .
    .
    cpu_time_tmp=0;
    for(int k=0; k<100; k++) {
        clock_gettime(CLOCK_REALTIME, &start);
        findName(input[k], e);
        clock_gettime(CLOCK_REALTIME, &end);
        cpu_time_tmp += diff_in_second(start, end);
    }
    cpu_time2=cpu_time_tmp/100.0;

回答問題

有代表性嗎？跟真實英文姓氏差異大嗎？
沒有代表性，測試資料裡面很多無意義的字母組合，但是演算法還是要能支援；例如"黃宏成台灣阿成世界偉人財神總統"這位，在戶政系統中也是要能對其做搜尋的
資料難道「數大就是美」嗎？
要測試演算法複雜度，確實有需要數大，但精簡才是美，用最少資料表達最多內容
如果我們要建立符合真實電話簿情境的輸入，該怎麼作？
去戶政單位要名單，台灣就至少有2300萬筆了；或者可以建造如GOOGLE般龐大系統，擁有許多
使用者資料，其中包含使用者名稱

使用網站

產生 Random Name List: http://lab.25sprout.com