2017q1 Homework4 (mergesort-concurrent)

contributed by <baimao8437>

開發環境

baimao@baimao-Aspire-V5-573G:~$ lscpu
Architecture:          x86_64
CPU 作業模式：    32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
每核心執行緒數：2
每通訊端核心數：2
Socket(s):             1
NUMA 節點：         1
供應商識別號：  GenuineIntel
CPU 家族：          6
型號：              69
Model name:            Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
製程：              1
CPU MHz：             1711.242
CPU max MHz:           2600.0000
CPU min MHz:           800.0000
BogoMIPS:              4589.38
虛擬：              VT-x
L1d 快取：          32K
L1i 快取：          32K
L2 快取：           256K
L3 快取：           3072K
NUMA node0 CPU(s):     0-3

原始效能分析

$ make $ ./sort 4 8
output:程式記憶體區段錯誤 (core dumped)
太完美了
後來發現第二個參數是檔名… $ ./sort 4 ./test_data/words.txt
output: 一堆 0 太棒了原來還沒支援 string sort
number sort test
新增 number.txt 裡面放 5 4 3 2 1
./sort 4 ./test_data/number.txt
成功輸出 1 2 3 4 5

邁向 phonebook sort

由於原本的程式碼是對"數字"做排序

先支援讀取 string
這其實不太難去擴充 struct node

typedef struct node {
char lastName[16];
val_t data; /**< Data of the node */
struct node *next;  /**< Pointer to the next node */
} node_t;

還要修改list_add & main 的build_list_from_file

可以存入 string 後由於此 mergesort 是對數字做排序
所以需要給每個 string 對應的數字去排序第一個想法是這樣算

static val_t evaluate(char lastName[])
{
    val_t val=0;
    for(int i = 0;i<strlen(lastName);i++){
        val *= 26;
        val += lastName[i]-'a'+1;
    }
    printf("%s->%ld\n",lastName,val);
    return val;
}

但排序結果…

aaaa                  correct: aaaa
aaah                           aaaaa
aacr                           aaaaaa
aaem                           aaaaaaa
aage                           aaaaaaaa
aagh                           aaaaaaaah

很明顯不符合 phonebook 要求雖然我覺得還照字串長度分好也不錯…

改良過後終於排出合理結果了雖然數字都很大不過還不會overflow

static val_t evaluate(char lastName[])
{
    val_t val = 0;
    //Max_size = 12
    for (int i = 0; i <12; i++) {
        val *= 26;
        if (lastName[i])
            val += lastName[i] - 'a' + 1;
    }
    return val;
}

缺點就是數字都會乘過12次26…

效能分析

先產生測試檔後測試

$ make genData 就會產生一個順序打亂的 input.txt
$ ./sort 4 ./test_data/input.txt > result.txt

#Total_tasks_consumed: 12
#Elapsed_time: 169.642 ms
#Throughput: 70 (per sec)
aaaa
...

$ ./sort 8 ./test_data/input.txt > result.txt

#Total_tasks_consumed: 24
#Elapsed_time: 164.929 ms
#Throughput: 145 (per sec)
aaaa
...

$ ./sort 16 ./test_data/input.txt > result.txt

#Total_tasks_consumed: 48
#Elapsed_time: 280.029 ms
#Throughput: 171 (per sec)
aaaa
...

$ ./sort 32 ./test_data/input.txt > result.txt

#Total_tasks_consumed: 96
#Elapsed_time: 505.960 ms
#Throughput: 189 (per sec)
aaaa
...

可以看到 thread 愈多並沒有愈快

改進

傳遞 lastName 都改用指標

typedef struct node {
    char *lastName;
    val_t data; /**< Data of the node */
    struct node *next;  /**< Pointer to the next node */
} node_t;

引進上一份作業的 text align and mmap

static uint32_t build_list_from_file(llist_t *_list, char const *fileName)
{
    alignAndMap(fileName);

    for (int i = 0 ; i < file_size / MAX_LAST_NAME_SIZE; i++) {
        list_add(_list, map + MAX_LAST_NAME_SIZE * i);
    }

    return _list->size;
}

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

遇到一個問題就是 node->lastName 都包含換行符號
嘗試用if (lastName[i] == '\n') lastName[i] = '\0';
但輸出(xprintln)還是會有多空一行的情況

...
abbreviate
abbreviated
abbreviately

abbreviates
abbreviating

abbreviation

abbreviator
...

先讓 node->lastName 包含換行然後用 xprint 輸出結果就不會有多空一行的情況

效能

$ ./sort 4 ./test_data/input.txt > result.txt

orginal file size = 3206080
#Total_tasks_consumed: 12
#Elapsed_time: 129.541 ms
#Throughput: 92 (per sec)
aaaa
...

比未用mmap 的 169.642 ms 快蠻多的

Code Refactoring

把 list 包裝成 API 方便之後使用多型對於非字串輸入的排序
把 phonebook concurrent 的 text align and mmap 包裝成一個 API 這樣之後只要透過簡單的指令就可以實現 mmap
只是再設計 API 上還有些考慮不周的地方例如還沒有建 map 就呼叫 MMap.total_line() 要如何防呆之類的可見 API 的設計還有很多要學呢!