2016q3 Homework2 (phonebook-concurrent)

contributed by <jkrvivian>

預期目標

學習第二週提及的 concurrency 程式設計
學習 POSIX Thread
學習效能分析工具
code refactoring 練習
探索 clz 的應用

資料閱讀

Concurrency 系列

multithread底下有問題，常是因執行順序的不確定性
Sequenced-before: 對 同一個 thread下求值的描述
- evaluation(求值)
  - value computation (如:left-to-right associativity)
  - side effect (修改記憶體的值呼叫Library)

A is not sequenced-before B and B is not sequenced-before A ==>執行順序不一定，求值時可能會重疊，優化後的執行順序會交錯

雖然訂定了許多evaluation order，但仍有一些未定義的行為
- i = i++

concurrency系列(二)

value computation V.S. side effect

有點混亂了@@
i++這類的後置運算子，value computation會先於side effect

對於assignment operator而言(=, +=, -=之類的)，會先進行運算元的value computation，之後才是assigment的side effect, 最後是整個assigment expression的value computation

happens-before : 前一個操作的效果在後一個操作執行之前必須要可見
- 執行順序不一定就如程式所撰寫的順序一樣
- A happens-before B != A happening before B
  - 執行順序不一定是A在前
  - 只要A在B執行前可見就好
- 同一個thread內sequenced-before就是happen-before
  - C++定義能夠建立跨thread間的happens-before
    - 1. A synchronizes-with B (A和B有適當的同步)
    - 1. A is dependency-ordered before B (A和B有相依的順序關係)
    - 1. A synchronizes-with some evaluation X, and X is sequenced-before B
    - 1. A is sequenced-before some evaluation X, and X inter-thread happens-before B
    - 1. A inter-thread happens-before some evaluation X, and X inter-thread happens-before B
synchornized-with :兩個不同thread間的同步行為，當A synchronized-with B的時，代表A對記憶體操作的效果，對於B是可見的。而A和B是兩個不同的thread的某個操作
- 跨thread版本的happens-before
- synchronized in java
  - mutual exclusive
    - 只允許一個thread使用
  - 建立happens before關係
    - 確保每次對物件的修改都會對接下來的thread可見
- volatile in java
  - 不同thread對變數的讀寫是atomic
  - 都會讀到最新的值
memory consistency models: 系統要想辦法在保證正確的情況下，盡可能的最佳化，讓程式跑的又快又好
- 只是一種 幻象約定 只要結果跟約定好的定義相同就好，改變執行順序也沒關係
- sequential consistency:
  - 對於每個獨立的處理單元，執行時都維持程式的順序(Program Order)
  - 整個程式以某種順序在所有處理器上執行
cache architecture 與 sequential consistency
- cache coherence:確保不同處理器上的cache在系統中保持一致
- detecting the completion of write operations
- maintaining the illusion of atomicity for writes

冼鏡光的並行計算投影片

平行(parallel)：兩件事齊頭並行
並行(concurrent)：兩件事中交錯執行、但每一瞬間每件事都有點進展 ==> 一般電腦就是一個並行系統

Concurrency (並行) vs. Parallelism (平行)

影片
- concurrent : a composition of independently process(executing things) ==> typically functions
  - dealing a lot of things at once
  - goal: good structure
  - 不一定要平行
- parallel : a simultaniously execution of multiple things
  - doing a lot of things at once

parallelism can come from better expression of concurrent problem

Process vs. Thread vs. Coroutines

Process: 包括了一個執行中的程式，和一些他所需的系統資源，諸如檔案描述表和位址空間等
Threads: 一個程式計數器、一個堆疊再加上一組暫存器,掌握了所有的執行活動,『共享』系統資源
coroutines: 多工實做技術，合作式多工

過往接觸過的：
A每次呼叫B或C都是從兩人的最開始執行到最後再回到A

coroutines:
每次都是回到ABC三者的中斷點再接續執行，而不是從頭開始

若有使用到local變數，那執行結果可能就不如預期了

解決辦法：allocate separate stack space for each function that is running as a thread

讀Jserv老師的部落格還有文中提到的經典文章coroutines in C，完全不知道switch可以這樣用(跪)，還沒完全看懂，研讀中~
- 用switch其實就是goto的效果！

a case statement is still legal within a sub-block of its matching switch statement

POSIX Threads

Getting Started With POSIX Threads 宋振華

thread同步問題
- POSIX 提供
  - mutex: 控制共享資源存取的一組函數
  - condition variable: 能夠做到讓某一個 thread 能暫停，並等待另一個 thread 的信號（signal）,當信號來了，thread 就醒過來，然後將相關的mutex lock 起來，為mutex功能的延伸
  - semaphore: 透過整數運算，所有semaphore的初始值為1
    - wait(),decrement直到為0時block
    - post(),increment

pthread_create 之前定義或在其所呼叫的函數中定義的變數來完成其工作，並將他的成果經由整體變數合併。對這些大家都可以存取的變數，我們必須加以控制。

回想之前實作raytracing使用omp進行優化時，就是因此才需要使用private()保護變數

檢查thread函式是否正確執行是必要的，若失敗大多數都回傳-1
condition varaible
critical section:
reentrant:
mutual exclusion:

驗證程式碼正確性及效能

opt效能

Thread_num = 4

size of entry : 24 bytes
execution time of append() : 0.009720 sec
execution time of findName() : 0.004989 sec

cache misses約49.5%

 Performance counter stats for './phonebook_opt' (100 runs):

           774,269      cache-misses              #   49.546 % of all cache refs      ( +-  1.73% )
         1,562,737      cache-references                                              ( +-  1.64% )
       225,877,891      instructions              #    1.25  insns per cycle          ( +-  0.04% )
       181,307,063      cycles                     ( +-  1.23% )

       0.129546375 seconds time elapsed                                          ( +-  6.50% )

plot
- 圖中可以明顯看見findName()的時間與orig版本差異不大，append()下降許多
- 應想辦法降低findName()時間 ==> 雖然使用hash function可以大大降低，但這樣append()的時間又會拉高@@

正確性

phonebook_opt.c中已有個show_entry()的function可以列出所有entry中的last name，因此可以與原有的words.txt比對
打開opt版本的字典檔發現與原本的words.txt不同

直接看是不對的啊！因為輸出檔沒有排序，直接看下來當然會不同@@

執行diff指令發現多處不同，其一原因是phonebook_opt的output並沒有sort，因此先執行sort指令排序
- 執行結果：
  發現少了前4組

0a1,4
> aaaa
> aaaaa
> aaaaaa
> aaaaaaa

修正













for (int i = 0; i < THREAD_NUM; i++) {
	if (i == 0) {
		pHead = app[i]->pHead->pNext;                                                                      
		dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
	} else {
		etmp->pNext = app[i]->pHead->pNext;
		dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
	}   

	etmp = app[i]->pLast;
	dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr);
	dprintf("round %d\n", i); 
}

把pHead = app[i]->pHead->pNext和etmp->pNext = app[i]->pHead->pNext裡的pNext拿掉就對了

mmap

creates a new mapping in the virtual address space of the calling process.




#include <sys/mman.h>

void *mmap(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset);

Code Refactoring

refactoring（重構）的定義：「在不改變軟體的外在行為之下，改善既有軟體的內部設計」

甚麼是refactoring有提到22種壞味道

main.c58行開始及77行include和define搬到最前面，個人習慣，看起來比較整齊












#ifndef OPT

#include "file.c"
#include "debug.h"
#include <fcntl.h>

#ifndef THREAD_NUM
#define THREAD_NUM 4

#define ALIGN_FILE "align.txt"

#endif

Line48 與 Line95 struct timespec mid雖然有gettime()，但之後完全沒有用到，所以直接刪除
struct timespec mid刪除後就可以合併前後的#ifndef OPT內的程式碼，並把struct timespec start, end搬到外面


















#ifndef OPT
    FILE *fp;
    int i = 0;
    char line[MAX_LAST_NAME_SIZE];
    struct timespec start, end;
    double cpu_time1, cpu_time2;

    /* check file opening */
    fp = fopen(DICT_FILE, "r");
    if (!fp) {
        printf("cannot open the file\n");
        return -1; 
    }   
#else
    file_align(DICT_FILE, ALIGN_FILE, MAX_LAST_NAME_SIZE);
    int fd = open(ALIGN_FILE, O_RDONLY | O_NONBLOCK);
    off_t fs = fsize(ALIGN_FILE);
#endif

Line92的兩個for回圈可以合併




for (int i = 0; i < THREAD_NUM; i++) {
	app[i] = new_append_a(map + MAX_LAST_NAME_SIZE * i, map + fs, i, THREAD_NUM, entry_pool + i);          
	pthread_create( &tid[i], NULL, (void *) &append, (void *) app[i]);
}

參考資料

課程指定材料
 吳彥寬學長共筆
 mmap