2016q3 Homework2 (phonebook-concurrent) - HackMD

<style> h2.part{color:#0099B0;} h3.part{color:#D92424;} h4.part{color:#005BB0;} h5.part{color:#FD6F0A;} h6.part{color:#4400B0;} </style> # 2016q3 Homework2 (phonebook-concurrent) contributed by <`f5120125`> ###### tags: `sysprog` ## 開發環境 #### Ubuntu 14.04 LTS - CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4 - Mem: 8 GiB - Cache: L1d cache: 32 KB L1i cache: 32 KB L2 cache: 256 KB L3 cache: 4096 KB ## Concurrency - ### Concurrency(並行性) 不為 Parallelism (平行性) [[1]](http://learn-gevent-socketio.readthedocs.io/en/latest/general_concepts.html)[[2]](https://hackmd.io/s/H10MXXoT)[[3]](http://stackoverflow.com/questions/1050222/concurrency-vs-parallelism-what-is-the-difference) - #### 實作threads主要是為了達到Concurrency：再一條時間軸上, threads間有所重疊, 因此藉由共享==code section==, ==data section==, ==task上能夠運用的資源==達到有效利用CPU, 達到==vitual paralellism== - #### 實作processes主要是為了達到Parallelism：再同一時間, 有多個tasks需處理，利用多個CPU去處理達到平行化 - ### [Sequenced Before](http://enginechang.logdown.com/posts/788206-concurrency-series-2-starting-from-sequenced-before) - threads間執行順序的不確定性, 得需知道求值順序 - ### [Happens-Before](http://enginechang.logdown.com/posts/797113-concurrency-series-3-happens-before) - 前一個操作的效果在後一個操作執行之前必須要visible - 只關注是否看起來有這樣的效果，從外界看起來，就像是先執行某一行A，再執行其接著的下一行B - 關鍵在於只要A的效果在B執行之前，對於B可見就可以了，實際上怎麼執行的並不需要深究 ## 案例分析 Phonebook-concurrent - 此程式延續之前作業, 修改為thread版本來提升效能 ### Preliminaries - ==**Macro**== 的使用 - **```#if...#else...#endif```** - **```#ifdef MACRO_NAME...#endif```, ```#ifndef MACRO_NAME...#endif```** - **```#if defined( MACRO_NAME )...#endif```** - **```#include IMPL```** - ==**Thread management**== - **```pthread_create()```** - **```pthread_join()```** - **```pthread_exit()```** - ==**Map file into memory**== - **```mmap()```** - **```munmap()```** - ==**File alignment**== ### 1. Macro 的使用 - 在```phonebook_opt.h```中定義了一個macro： ```OPT``` , 主要功能是拿來判斷```main.c``` 中哪裡是原來版本或是執行緒版本該執行的 #### example ```clike= #if OPT /*do sommthing*/ #endif ``` ```clike= #ifdef OPT /*do something*/ #endif ``` ```clike= #if defined(OPT) && OPT /*do something*/ #endif ``` ### 2. Thread management #### pthread_create() ```clike= int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg); ``` ###### ◆第一個參數 - ==**unique Thread ID**== ###### ◆第二個參數 - 用於指定不同 ==**thread的attributes**==, 通常為傳入NULL, 代表是default attribute ###### ◆第三個參數 - **```void *```** 為 ==**generic pointer**==, 主要功用 - ==**儲存任何型態的address**== - ==**typecast成任何型態**== ```clike= void *somePtr; int *integerPtr = 10; int *anotherIntPtr; somePtr = &integerPtr;/*儲存int型態的address*/ //error occur !! printf("%d\n", (int)(*somPtr));/*不能使用`*`去存取值*/ anotherIntPtr = (int*)somePtr;/*typecast成(int*)型態*/ printf("%d\n", *anotherIntPtr); ``` - **```(*start_routine)```** 為 ==**function pointer**==, 其所能傳入的參數型態為 **```void *```** ###### ◆第四個參數 - start_routine要用到的參數, 若是start_routine有許多參數, 可以傳入一個structure #### pthread_join() ```clike= int pthread_join(pthread_t thread, void **retval); ``` ###### ◆第一個參數 - ==**unique Thread ID**== ###### ◆第二個參數 - 用於 ##### pthread_exit() ### 3. Map file into memory #### MMAP [SYNOPSIS]((http://man7.org/linux/man-pages/man2/mmap.2.html)) ```clike= #include <sys/mman.h> void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset); ``` mmap() 將檔案映射到一個 virtual memory address 上面，藉由直接對 memory address 修改的方式不用經過buffer，減少資料load到緩衝區的overhead，快速的記憶體存取可以提高對檔案 IO 的效率 :::info **Blocking** VS **Non-blocking I/O** * Blocking I/O 允許 sleep/awaken 動作的 process，基本動作： * 當 process 進行 read 動作但是資料尚未就緒時，process 必須被 block 住 (sleep)，而當有資料進來時就需要立即被喚醒 (awaken)，就算資料仍不完整也是 * 當 process 進行 write 動作但是 buffer 沒有空間時，process 也必須被 block 住，而且必須被放到跟 read 動作不同的 wait queue 中去等待 wake-up (buffer 騰出空間寫入) * Non-blocking I/O * 讀不到資料或寫不進 buffer 時，他會送出 retry 的動作，再試一次 >>[參考觀念](https://hackmd.io/AwJgzA7ARgHMCsBaEJjEQFgMYE4uJgFMcA2RXYEiYYmAMxgiA===?both) ::: >[記憶體映射函數 mmap 的使用方法](http://welkinchen.pixnet.net/blog/post/41312211) * `addr` : 要 map 到的記憶體位置 * 如果 addr 為 NULL，kernel 將在建立的 mapping 地址中自行選擇，這是建立新的 mapping 最簡單的方法。 * 如果 addr 不是NULL，mapping 會在附近的 page boundary 建立，並將新 mapping 地址傳回。 * `length`: 要 mapping 的長度，它會從 fd + offset 的位置開始 mapping 檔案。 * `prot`: 要保護的記憶體保護 mapping **不能跟 open mode 衝突** * PROT_EXEC 可能執行 * PROT_READ 可能讀取 * PROT_WRITE 可能寫入 * PROT_NONE 可能不能存取 * `flags`: 選擇 map file 能不能被其他程式看到 * MAP_SHARED 分享此 mapping * MAP_PRIVATE 建立一個私有 copy-on-write mapping * `fd`: 用系統函式 open 打開的檔案位置，open mode 可以選擇檔案的讀寫屬性，不能跟 prot 有衝突。 * `offset`: 從檔案的哪裡開始 mapping，offset 必須是 page size 的倍數，可以用 sysconf(_SC_PAGE_SIZE) 取得。 #### munmap munmap() 用來取消參數 start 所指的映射記憶體起始地址，參數 length 則是欲取消的記憶體大小。當行程結束或利用 exec 相關函數來執行其他程式時，映射記憶體會自動解除，但關閉對應的文件描述詞時不會解除映射。 >[C語言munmap()函數：解除内存映射](http://c.biancheng.net/cpp/html/139.html) ### 4. File alignment ### 5. 檢查aligned file是否和原輸入一樣 #### check.c ```clike= #include <stdio.h> #include <sys/stat.h>//stat, stat(paath, struct stat) #include <sys/mman.h>//mmap() #include <fcntl.h>//open(file_name, ) #include <stdbool.h> #define MAX 16 #define ORIG_FILE_PATH "./dictionary/words.txt" #define ALIGNMENT_FILE "align.txt" off_t getFileSize( char *path); int main(){ bool consistency=true; char orig_buff[MAX]; char align_buff[MAX]; FILE *orig_fp = fopen(ORIG_FILE_PATH, "r"); FILE *align_fp = fopen(ALIGNMENT_FILE, "r"); int fd = open(ALIGNMENT_FILE, O_RDONLY | O_NONBLOCK); off_t new_file_size = getFileSize(ALIGNMENT_FILE); char *map = mmap(NULL, new_file_size, PROT_READ, MAP_SHARED, fd, 0); char *tmp=map; while( fgets(orig_buff, sizeof(orig_buff), orig_fp) ){ if( strncmp(orig_buff, tmp, sizeof(orig_buff)) != 0){ consistency=false; break; } tmp+=MAX; } if(consistency) printf("Files are consistent\n"); else printf("Files are not the same\n"); return 0; } off_t getFileSize( char *path){ struct stat file_stat; stat(path, &file_stat); return file_stat.st_size; } ``` ##### 執行結果 - 檢查完aligned file和原本words.txt檔後, 確認檔案一樣 ``` Performance counter stats for './phonebook_opt' (100 runs): 841,854 cache-misses # 66.507 % of all cache refs ( +- 1.05% ) 1,265,807 cache-references ( +- 0.65% ) 172,850,372 instructions # 0.76 insns per cycle ( +- 0.07% ) 226,345,721 cycles ( +- 0.88% ) 0.090487839 seconds time elapsed ( +- 2.76% ) cc -Wall -std=gnu99 calculate.c -o calculate ./calculate hua@hua-ubuntu:~/Desktop/sysprog-class/homework2/phonebook-concurrent$ ./check Files are consistent ``` ### Code Refactoring - [x] 整理程式碼 - 變重新命名 - 整理Macro - [x] 程式的改進 - findName()的改進 #### original - 下列程式碼為append完後, 將threads間的linked-list合併在一起 - 改進方向可以考慮對每個thread做findName() ```clike= entry *etmp; pHead = pHead->pNext; for (int i = 0; i < THREAD_NUM; i++) { if (i == 0) { pHead = argsPtr[i]->entryHead->pNext; } else { etmp->pNext = argsPtr[i]->entryHead->pNext; } etmp = argsPtr[i]->entryTail; } ``` ##### 原本程式執行結果 ![image alt](http://i.imgur.com/ZQz4jds.png) #### 合併pthread_create()和pthread_join()的分析 ##### 執行結果 - create 和 join合併到同一個for loop ![image alt](http://i.imgur.com/yOwJiH0.png) #### threaded_findName() ##### 執行結果 ![image alt](http://i.imgur.com/pQxRqWf.png) ### Thread Pool >>[參考觀念](https://hackmd.io/GYVgLATAbGAmYFpYHYBGsFgMYEZgNRFgA4CBmVMYATmIENYAGLCIA===?both) - Thread 缺陷 - 建立太多執行緒，會浪費一定資源，並且有些執行緒未被充份使用 - 銷毀太多執行緒，會導致之後浪費時間再次創建它們 - 建立執行緒太慢，導致效能變差 - 銷毀執行緒太慢，導致其它執行緒沒有資源 - Thread Pool 功能 - 管控 Thread 的產生與回收 - 分發 Thread 處理 request - 處理 request 的 queue >[C 的 Thread Pool 筆記](http://swind.code-life.info/posts/c-thread-pool.html) >[github source code](https://github.com/mbrossard/threadpool) ```clike= typedef struct { void (*function)(void *); void *argument; } threadpool_task_t; //Thread Pool 需要 job/task 讓 Thread 知道他們要做什麼事情 threadpool_t *threadpool_create(int thread_count, int queue_size, int flags); //傳入 thread 的數量，queue 的大小，回傳 threadpool 型態是 threadpool_t int threadpool_add(threadpool_t *pool, void (*routine)(void *), void *arg, int flags); //傳入 threadpool，要執行的 function 位置，要傳入的參數 int threadpool_destroy(threadpool_t *pool, int flags); //傳入 threadpool，會把 threadpool 的空間 free 掉 ``` ### Lock-free Thread Pool #### Thread Pool VS Lock-Free Thread Pool * Thread Pool ![](https://hackpad-attachments.s3.amazonaws.com/embedded2016.hackpad.com_VJmq0R0ILi6_p.537916_1459234875775_001.PNG) 共享 workqueue 的操作必須在 mutex 的保護下安全進行。 * Lock-Free Thread ![](https://hackpad-attachments.s3.amazonaws.com/embedded2016.hackpad.com_VJmq0R0ILi6_p.537916_1459234893887_002.PNG) * 解決 lock 機制：解決 lock-free 必須避免共享資源的競爭，因此將共享 workqueue 拆分成每個 worker thread 一個 workqueue。對於 main thread 放入工作和 worker thread 取出任務的競爭問題，可以採取 ring-buffer 的方式避免。 * condition variable 問題： condition variable 解決執行緒之間的通訊。 semaphore 作為一種通信方式，可以代替之。 >[高效線程池之無鎖化實現](http://blog.csdn.net/xhjcehust/article/details/45844901) ```clike= void *tpool_init(int num_threads) //初始化時決定要用哪種方式進行 put work，選用 `ring-buffer` 的方式將 worker 加入queue static int wait_for_thread_registration(int num_expected) //確保所有執行緒都在準備中 int tpool_add_work(void *pool, void(*routine)(void *), void *arg) //配合 `dispatch` 加入 task 到 work queue 中 static void *tpool_thread(void *arg) //給 worker task，沒有用到 mutex_lock static tpool_work_t *get_work_concurrently(thread_t *thread) //利用 CAS 讓每個執行緒拿到自己的 task，確保從 work queue 出來時，遞減 `out` 的變數同步 void tpool_destroy(void *pool, int finish); //判斷所有的 queue 是不是空的，確保所有 worker 完成工作 ``` :::info **CAS** 以具有 atomic 特性的計算操作來達成。 ```clike= bool CAS(T* addr, T exp, T val) { if (*addr == exp) { *addr = val; return true; } return false; } ``` ::: >[github source code](https://github.com/xhjcehust/LFTPool) #### 實做測試 ```shell $ git clone https://github.com/xhjcehust/LFTPool $ cd LFTPool/ $ make $ ./testtpool ```