2016q3 Homework2 (phonebook-concurrent)

contribute by <bobsonlin>

前言

理解程式碼

先從main.c作為理解程式的開頭吧！







#ifndef OPT
    FILE *fp;
    int i = 0;
    char line[MAX_LAST_NAME_SIZE];
#else
    struct timespec mid;
#endif

優化版本由於是讓4個thread各自append整個word.txt，append的方式不同，所以只有未優化版本需要fp, line[MAX_LAST_NAME_SIZE]

















#ifndef OPT
    /* check file opening */
    fp = fopen(DICT_FILE, "r");
    if (!fp) {
        printf("cannot open the file\n");
        return -1;
    }
#else

#include "file.c"
#include "debug.h"
#include <fcntl.h>
#define ALIGN_FILE "align.txt"
    file_align(DICT_FILE, ALIGN_FILE, MAX_LAST_NAME_SIZE);    //why the file must be aligned ?
    int fd = open(ALIGN_FILE, O_RDONLY | O_NONBLOCK);
    off_t fs = fsize( ALIGN_FILE);
#endif

在讀word.txt前，我們必須將word.txt做對齊，原因是若沒將word.txt裡的每個字串做長度的對齊，在將字串投射至memory時，我們會不知道該如何移動pointer才會指到下一個字串。
file_align(), fsize()是此程式作者自己寫的function，定義在file.c
我們用open()來得到word.txt的file descriptor，是為了將來的mmap()













char *map = mmap(NULL, fs, PROT_READ, MAP_SHARED, fd, 0);

assert(map && "mmap error");

    /* allocate at beginning */
entry *entry_pool = (entry *) malloc(sizeof(entry) * fs / MAX_LAST_NAME_SIZE);

assert(entry_pool && "entry_pool error");

pthread_setconcurrency(THREAD_NUM + 1);

pthread_t *tid = (pthread_t *) malloc(sizeof( pthread_t) * THREAD_NUM);
append_a **app = (append_a **) malloc(sizeof(append_a *) * THREAD_NUM);

接下來第一步就是將word.txt mapping至記憶體，回傳值map是字典檔mapping後的記憶體起始位址(可以利用此pointer來search字典檔裡的每個lastname)
entry_pool的概念是，一個lastname會有一個entry，entry_pool就是建了一大長串的entry來裝word.txt裡的每個lastname。
pthread_setconcurrency告知系統需要的concurrency level(詳見參考資料)
app[i]是指第i個thread在word.txt裡所建的append list









for (int i = 0; i < THREAD_NUM; i++)
    app[i] = new_append_a(map + MAX_LAST_NAME_SIZE * i, map + fs, i, THREAD_NUM, entry_pool + i);

clock_gettime(CLOCK_REALTIME, &mid);
for (int i = 0; i < THREAD_NUM; i++)
    pthread_create( &tid[i], NULL, (void *) &append, (void *) app[i]);

for (int i = 0; i < THREAD_NUM; i++)
    pthread_join(tid[i], NULL);

第一個for loop是在初始化每個app[i]
第二、三個for loop的結果是讓4個thread各自append字典檔裡的lastname

到此處我們需先停一下，解釋現在的狀況：

執行完第一個loop後，情況如下圖所示
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
接著4個thread要各自執行append()，所以必須先了解append的作法




























void append(void *arg)
{
    struct timespec start, end;
    double cpu_time;

    clock_gettime( CLOCK_REALTIME, &start);

    append_a *app = (append_a *) arg;

    int count = 0;
    entry *j = app->entryStart;
    for (char *i = app->ptr; i < app->eptr;
            i += MAX_LAST_NAME_SIZE * app->nthread,
            j += app->nthread,count++) {
        app->pLast->pNext = j;
        app->pLast = app->pLast->pNext;

        app->pLast->lastName = i;
        dprintf("thread %d append string = %s\n", app->tid, app->pLast->lastName);
        app->pLast->pNext = NULL;
    }
    clock_gettime(CLOCK_REALTIME, &end);
    cpu_time = diff_in_second(start, end);

    dprintf("thread take %lf sec, count %d\n", cpu_time, count);

    pthread_exit(NULL);
}

此函式的for loop內在做兩件事

在經過mapping的memory區域中(就是map指向的memory區域)，找到自己thread要的對應位置的lastname，將之存入entry內。
串接每個entry，使之形成一個list

當4個thread append完字典檔裡的所有字後，會執行下方的程式















entry *etmp;
pHead = pHead->pNext;
for (int i = 0; i < THREAD_NUM; i++) {
    if (i == 0) {
        pHead = app[i]->pHead->pNext;
        dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
    } else {
        etmp->pNext = app[i]->pHead->pNext;
        dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
    }

    etmp = app[i]->pLast;
    dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr);
    dprintf("round %d\n", i);
}

此程式目的是將4個thread所建立的append list串接起來，形成一個大list，以供之後的findname使用

之後的程式碼就沒做太多修改，findname的整體方式也沒有變(還是有小修改，詳見video)，就是在entry_pool裡的逐一比對每個entry內的lastname

程式碼錯誤之處

此程式碼有一個很明顯的錯誤，在main.c的第105行程式裡:
ps.為了完整性，我在此附上整個片斷的程式
















for (int i = 0; i < THREAD_NUM; i++) {
    if (i == 0) {
        pHead = app[i]->pHead->pNext;
        dprintf("Connect %d head string %s %p\n", i,
                    app[i]->pHead->pNext->lastName, app[i]->ptr);
    } else {
        etmp->pNext = app[i]->pHead->pNext;
        dprintf("Connect %d head string %s %p\n", i,
                    app[i]->pHead->pNext->lastName, app[i]->ptr);
    }

    etmp = app[i]->pLast;
    dprintf("Connect %d tail string %s %p\n", i,
                app[i]->pLast->lastName, app[i]->ptr);
    dprintf("round %d\n", i);
}

此處很明顯看到，pHead被assign給app[0]的第二個entry的位址(跳過了第一個entry)，所以之後的findName函式是無法找到word.txt裡的第一個lastname。除此之外，觀察程式中的etmp可以發現，thread 1所建立的append list的尾端式連接到thread 2的append list的第二個entry，同理thread 3, thread 4。所以透過findName是找不到word.txt的第二個lastname。

解決方法為
















for (int i = 0; i < THREAD_NUM; i++) {
    if (i == 0) {
        pHead = entry_pool;
        dprintf("Connect %d head string %s %p\n", i,
                    app[i]->pHead->pNext->lastName, app[i]->ptr);
    } else {
        etmp->pNext = app[i]->pHead;
        dprintf("Connect %d head string %s %p\n", i,
                    app[i]->pHead->pNext->lastName, app[i]->ptr);
    }

    etmp = app[i]->pLast;
    dprintf("Connect %d tail string %s %p\n", i,
                app[i]->pLast->lastName, app[i]->ptr);
    dprintf("round %d\n", i);
}