2016q3 Homework2 (phonebook-concurrent)

# 2016q3 Homework2 (phonebook-concurrent) ## 開發環境 * 系統：Ubuntu 16.04 LTS (64 bit) * CPU：8-core 3.40GHz * Memory：24GB * Cache： * L1 data : 32KB * L1 instruction : 32KB * L2 : 256KB * L2 : 8192KB 使用`liscpu`和`sudo lshw`查看硬體資訊 ``` Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數：2 每通訊端核心數：4 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 94 Model name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz 製程： 3 CPU MHz： 799.929 CPU max MHz: 4000.0000 CPU min MHz: 800.0000 BogoMIPS: 6816.61 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 8192K NUMA node0 CPU(s): 0-7 ``` ## Concurrency #### Concurrency (並行) vs. Parallelism (平行) * Concurrency 一條線上的分工 * Parallelism 分好幾條線 #### Concurrency * A sequeced-before B 對同一個thread下，A先完成之後才開始進行B * A happens-before B 代表A的效果可見，而且發生於B之前 * synchronized-with 跨thread版本的happens-before * happens-before存在相依性的問題 ![](http://preshing.com/images/org-chart.png) * 最直覺的約定Sequential Consistency的定義: * 對於每個獨立的處理單元，執行時都維持程式的順序(Program Order) * 整個程式以某種順序在所有處理器上執行 * 程式在編譯與執行的時候，不一定會真的照所寫的順序發生，non-blocking和cache存取的快慢有可能會違反SC #### Blocking 與 Non-blocking I/O * Blocking I/O 允許 sleep/awaken 動作的 process，基本動作： * 當 process 進行 read 動作但是資料尚未就緒時，process 必須被 block 住 (sleep)，而當有資料進來時就需要立即被喚醒 (awaken)，就算資料仍不完整也是 * 當 process 進行 write 動作但是 buffer 沒有空間時，process 也必須被 block 住，而且必須被放到跟 read 動作不同的 wait queue 中去等待 wake-up (buffer 騰出空間寫入) * Non-blocking I/O * 讀不到資料或寫不進 buffer 時，他會送出 retry 的動作，再試一次 #### POSIX Threads http://www.csie.ntu.edu.tw/~r92094/c++/pthread.txt 文中提到使用 thread 同步的基本指令： mutex 和 condition variable ，最後可用更強悍的整數 semaphore 來取代 mutex ，並解決 spin-lock 問題。對於 thread 的 task migration，可用 atomics 這種輕量級的同步機制，效能會比 pthread_mutex_t 好許多。 #### lock-free thread pool 實做測試 ```shell $ git clone https://github.com/xhjcehust/LFTPool $ cd LFTPool/ $ make $ ./testtpool ok 1 - one thread in thread pool time: 3105157us ok 2 - heavy work time: 573655us ok 3 - light work time: 64862us ok 4 - drop remaing works and exit directly time: 125113us ###0x7f5c928c9700.dispatch_work2thread: queue of thread selected is full!!! not ok 5 - increase thread num ok 6 - decrease thread num time: 64772us ok 7 - set least load alogrithm time: 571112us ``` ## Phonebook Concurrency #### offset 文件映射開始處的偏移量，紀錄兩個元素間的距離，0代表從最前端開始映射依說明，因為不知道每個字元的offset，所以先將file整理成16 char alignment，之後再使用 mmap() #### mmap 目的是解決fgets這個blocking IO，讓程式可以順利平行化 ```shell void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset); ``` creates a new mapping ==in the virtual address space== of the calling process. The starting address for the new mapping is specified in addr. The length argument specifies the length of the mapping. 其中開啟fd的open()中使用==O_NONBLOCK==設定: 以非阻塞<s>不可阻断</s>的方式開啟檔案<s>打开文件</s>，也就是无论有无<s>数据</s>读取或等待，都会立即返回行程<s>进程</s>之中 > 可加速程式進行，但有可能違反Sequential Consistency? >> 請尊重傳統文化，使用台灣科技術語 [name=jserv] #### pthread + 簡易 memory pool 為thread開新的struct append_a存放，其中包括要append的phonebook entry ```c typedef struct _append_a { char *ptr; char *eptr; int tid; int nthread; entry *entryStart; entry *pHead; entry *pLast; } append_a; ``` new_append_a()是使用mmap得到虛擬空間之後再依序設定位置 >> 說好的 code refactoring 呢？ [name=jserv] ## 優化 #### memory pool ## 參考資料 https://hackmd.io/s/BkN1JNQp https://hackmd.io/s/H10MXXoT http://www.csie.ntu.edu.tw/~r92094/c++/pthread.txt http://ithelp.ithome.com.tw/articles/10161404 http://man7.org/linux/man-pages/man2/mmap.2.html