2016q3 Homework2 (phonebook-concurrent)

# 2016q3 Homework2 (phonebook-concurrent) contributed by <`ruby0109`> > 反省這兩個禮拜糟糕的自己, 完全不行阿QAQQQ > 參考了[CheHsuan](https://hackmd.io/s/rJPYFYRa) 和 [TempoJiJi](https://hackmd.io/s/rymKa4aT)兩位同學的筆記之後, 發現自己沒有驗證原本的程式正確性, mmap沒有下去探討, 學習thread pool等沒有先從小程式實作，和遇到問題不會拆解所以卡住。以下加入預期目標, 繼續完成作業。之前進度：[phonebook](https://hackmd.io/BwIwLAjApgZgJgVgLQCYEHYCcSwOAQyUxHwhymAhgDYBmRGBfIA=) ## 預期目標 * 學習驗證程式正確性 * concurrency 認知整理 * 學習其他人共筆 * thread pool ## 理解作業程式碼 ### 影片開始實作之前, 我先看了[吳彥寬同學程式解說video](https://www.youtube.com/watch?v=sWtWslsr4Ac), 筆記如下： * `fgets` 本身就可以讀行, 所以可以簡化 :::info 補充fgets基礎知識：根據c99 規格書, `#include <stdio.h>` `char *fgets(char * restrict s, int n,FILE * restrict stream);` description: The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additionalcharacters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array. ::: 理解： fgets讀取stream指向的stream最多n-1個字元到s指向的陣列中, 遇到換行符號或是檔案結尾字元就不會再讀, '\0' 會寫在陣列最後一個字元之後。 * `strncasecmp`： :::info #include <strings.h> int strcasecmp(const char *s1, const char *s2);int strncasecmp(const char *s1, const char *s2, size_t n); from linux manual: strcasecmp() ：function compares the two strings s1 and s2, ignoring the case of the characters. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.The strncasecmp() function is similar, except it compares the only first n bytes of s1 ::: strncasecmp比較s1前n bytes和s2, 比較時忽略大小寫, 若s1和s2相同傳回0; s1長度大於s2傳大於0值; 小於則傳負值。 * ~~memory pool 和 pthread 這兩個部份看影片仍然不太明白~~ * ~~可以在findname放其他資料,而不是在append整個放進去~~（findname或許會些微上升, 可是不會很多）append進去的只有lastName 的指標, findname時再放入lastName * 整個append用pthread下去做而不是只有mmap(還不懂為什麼原本只做mmap時間會上升) ### **檢測程式正確性** * 先執行一次看看, make plot之後： Performance counter stats for './phonebook_orig' (100 runs): 4,457,101 cache-misses # 91.778 % of all cache refs( +- 0.07% ) 4,856,373 cache-references ( +- 0.05% ) 263,262,629 instructions # 1.69 insns per cycle ( +- 0.02% ) 155,898,150 cycles ( +- 0.06% ) 0.076228638 seconds time elapsed ( +- 0.31% ) Performance counter stats for './phonebook_opt' (100 runs): 3,548,725 cache-misses # 61.592 % of all cache refs( +- 0.25% ) 5,761,661 cache-references ( +- 0.46% ) 224,875,100 instructions # 1.93 insns per cycle ( +- 0.04% ) 116,299,224 cycles ( +- 0.42% ) 0.278607164 seconds time elapsed ( +- 2.06% ) 優化後的cache miss略下降一些。 ![](https://i.imgur.com/wMmao18.png) * file align 先看在main.c中被include的file.c產生的align.txt, 直接點開的話裡面只有aaaa,用vim 的話則是: ```= aaaa ^@^@^@^@^@^@^@^@^@^@^@aaaaa ^@^@^@^@^@^@^@^@^@^@aaaaaa ^@^@^@^@^@^@^@^@^@aaaaaaa ^@^@^@^@^@^@^@^@aaaaaaaa ^@^@^@^@^@^@^@aaaaaaaah ^@^@^@^@^@^@aaaaaaauugh ^@^@^@^@aaaaaagh ^@^@^@^@^@^@^@aaaaaahhhhh ^@^@^@^@aaaaaaugh ^@^@^@^@^@^@aaaaagh ^@^@^@^@^@^@^@^@aaaaah ^@^@^@^@^@^@^@^@^@aaaarthur ^@^@^@^@^@^@aaaaw ^@^@^@^@^@^@^@^@^@^@aaagghhhh ^@^@^@^@^@^@aaah ^@^@^@^@^@^@^@^@^@^@^@aaaugh ^@^@^@^@^@^@^@^@^@aaccf ``` 不是很懂^@是什麼, 參考[what is ^@ character in vi?](http://stackoverflow.com/questions/21569165/what-is-character-in-vi)才知道是NULL bytes。不過奇怪的是，首先第一行沒有^@,再來每一行的行數不同,像第7行有17個,第8行只有12個。先用GDB觀察main.c跳進file_align function後的執行情形。在wbuf和rbuf都有align, 顯示出來的值也都正確。rbuf的大小似乎可以調小一點, 畢竟也有措施避免strcpy的overflow。 * linked list的資料是否正確試試看原本同學寫的debug.h,跑完之後發現在終端機不太方便測試是否有遺漏的資訊, ~~要怎麼比對呢？~~ 可以直接用findname ```clike // use findname to test whether all elements in the list // open file for test and origin word.txt FILE *test,*org; test = fopen("test","a"); org = fopen(DICT_FILE, "r"); char line[MAX_LAST_NAME_SIZE]; // use findname to test whether all elements in the list while(fgets(line, sizeof(line),org)){ e = pHead; if(findName(line, e) == NULL) fprintf(test,"%s not found\n",line); } fclose(org); fclose(test); ``` 直接把words.txt檔的東西丟下去找, 可是這樣若找尋350000筆，每筆0.03897要尋找約22分鐘。發現[TempoJiJi](https://hackmd.io/s/rymKa4aT)大大直接另外寫一個測, 才發現原來這個版本沒有Hash 寫了entry_test.c 可是有segmentation fault debug還是卡了好久, 用GDB下去看發現是在append_target中, 第60行的地方, malloc執行到第三個word就會直接掛掉。把他改成小一點的程式, 縮小錯誤範圍： ```clike= #include <stdlib.h> #include <string.h> #include <stdio.h> #include <ctype.h> #define MAX_LAST_NAME_SIZE 16 #define HASH_SIZE 42737 #define DICT_FILE "./dictionary/words.txt" typedef struct __PHONE_BOOK_ENTRY { char *lastName; struct __PHONE_BOOK_ENTRY *pNext; } entry; entry *append_target(char lastName[], entry *e) { e->pNext = (entry *)malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } int main(void) { FILE *org; org = fopen("./dictionary/words.txt","r"); char line[MAX_LAST_NAME_SIZE]; entry *target, *tHead; tHead = (entry *)malloc(sizeof(entry)); target = tHead; target -> pNext = NULL; while(fgets(line,sizeof(line),org)) { target = append_target(line,target); } fclose(org); free(tHead); free(tHead->pNext); } ``` gdb跑出的結果： Program received signal SIGSEGV, Segmentation fault. __strcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:605 605 ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: 沒有此一檔案或目錄. 發現是struct 忘記改>< 感謝徐朝逸大大幫我看codeQAQ 幾經波折, 把所有菜鳥可以犯的錯犯過一遍然後debug完後, 發現有一長串都沒有在裡面, 這裡擷取一小部份： aaaa not found angelika not found anklet not found antescript not found antirobin not found assiento not found 發現我的程式是錯的！！雖然有幾個字母帶進原程式會發現的確沒在裡面, 可以很多都是跑得出來的。把append和findname找不到的資料互相比較, 發現其實有append進去, 可是findName沒有找到，　原來是findName的地方有寫錯, 這樣代表之前自己的code也有錯... * 正確性解決把程式改好之後, 跑出來的結果： aaaa not found aaaaa not found aaaaaa not found aaaaaaa not found 因為pthrea跑的方式是 thread0:0 4 8.. thread1:1 5 9.. thread2:2 6 10.. thread3:3 7 11.. 所以可以推測這可能是pthread各自list的開頭有問題。仔細看之後發現在main.c 中的pHead = stack[i]->pHead->pNext和etmp->pNext = stack[i]->pHead->pNext 應該要把pNext去掉!! ## 實作可以改良的地方 * 在語法的部份: ```clike if ((suffix = (pad - strlen(rbuf))) != 0) strcpy(wbuf, rbuf); ``` 應該要大於0, 以防超過陣列大小 * 沒有註解看起來好痛苦QAQQ : 加入註解 * 變數在main.c中和.c file不好對照, 有點難讀 * 把new_append_a改成ThrdInitial(char *StartAdrs, char *EndAdrs, int tid, int nthrd, entry *pptr) * append_a改成 ```clike typedef struct _Thread_Stack_ { char *StartAdrs; char *EndAdrs; int tid; int nthread; entry *PoolPtr; entry *pHead; entry *pTail; } ThrdStack; ``` ## concurrency 筆記： * [Concurrency is not Parallelism](https://blog.golang.org/concurrency-is-not-parallelism), by Rob Pike * sometime people run a program with may processes, but it gets slow, why? what is concurrency? * concurrency in computer world is a way to build things, concurrent is a composition of independently executing things typically funcitons but they don't have to be(independently executing processes)這裡的process泛指各種thread, coroutine等 **理解**：concurrency就像是一種方法, 由各個獨立執行的函式(包含process, thread, coroutine?等等)組成, 可是方法不一定要採用。 * parallelism is simultaneously executing lots of multiple thing, possible related possible not **理解**：parallelism是同時執行很多東西，可能相關，也可能互相獨立。 * concurrency is about dealing with a lot of things at once and parallelism is about doing a lot of things at once **理解**：concurrency是一次處理很多東西, 但不一定要同時執行, parallelism 就是同時執行很多東西。 * concurrency is about the structure parallelism is about the execution * concurrency is a structure things that you can maybe use parallelism to do a better job but parallism is not the goal of concurrency concurrency's goal is a good structure ex. OS deal with lots of things, but these things aren't necessarily parallel, only one processor only one of them is ever running the time, so there are concurrencty structure, but not necessary parallel **理解**：parallism不是concurrency的目標, 就像是OS也是一次處理很多東西, 而且不一定要都同時執行。也就是concurrency是一個結構，可是不一定要用parallism的方式執行。 * parallelism像是一個箭頭,可以分成不同的方向 * concurrency gives you a structure program different pieces, but we need to find out how to coordinate those pieces and communication **理解**：concurrency給我們一個架構, 可以把程式分成不同的區塊, 可是我們必須找出怎麼協調和讓這些區塊可以溝通後面的GO就聽不是很懂了>< * [The Free Lunch Is Over](http://www.gotw.ca/publications/concurrency-ddj.htm) A Fundamental Turn Toward Concurrency in Software By Herb Sutter * 這30年來, CPU 設計者主要藉由改善以下來達到效能的提昇 * clock speed * execution optimization-為了改善效能, 甚至會有風險改變程式 * pipelining * branch prediction * executing multiple instructions in the same clock cycle(s) * reordering the instruction stream for out-of-order execution. * cache * 可是發展到了現在 clock speed很難再提昇, 因為如果再提昇會有無法散去高熱, 太耗能跟電流液出的情形 * Myths and Realities: 2 x 3GHz < 6 GHz * 即使有兩個thread在兩個實體處理器上運作, 也不會得到兩倍的效果, 為什麼呢？ * 要維持cache coherency * 除非兩個core在跑不同的程式, thread run independently所以不需要等待彼此不然這些core不會被良好的運用 * 雖然系統還是會被潛在的廣告軟體和間諜軟體佔用 * 未來可以藉由以下來繼續提高效能 * hyperthreading- 在一個CPU上運行數個thread in parallel * 但仍然只有一個 cache, 一個 integer math unit, 一個 FPU, and in general just one each of most basic CPU features. * 很難提昇兩倍效能, 而且對於single-threaded應用沒有幫助。 * multicore * 很難提昇兩倍效能, 而且對於single-threaded應用沒有幫助。 * cache * 應該可以繼續成長，因為space is speed * 一個cache miss就要多花10~50倍的時間(相較於去cache拿data) * Concurrency is the next major revolution in how we write software * [Concurrency的Youtube Lecture](https://www.youtube.com/watch?v=iKtvNJQoCNw&list=PLrWfHqCHBgVJ4jPMXnv2tbB6zg5uegbc3&index=3) * Locks: 擁有權, 擁有者才能unlock。作業中的concurrency: * 第一次看到pthread_setconcurrency, 查manual理解之下是在linuxthread沒有太大作用, 可是可以增加相容性。 ## 學習其他人共筆 [LanKuDot (李昆憶)共筆](https://hackmd.io/s/HJFhaiAp) * 參考筆記並閱讀[The Free Lunch Is Over](http://www.gotw.ca/publications/concurrency-ddj.htm), [對於重構的兩則常見誤解](http://www.ithome.com.tw/node/79813) * valgrind： * 安裝：$ sudo apt-get install valgrind * 說明：$ valgrind --help * 參考資料：[分析工具valgrind的使用](http://ottoshare.blogspot.tw/2012/03/valgrind.html)(這篇的指令tools要改成tool)，[Valgrind is *NOT* a leak checker](http://maintainablecode.logdown.com/posts/245425-valgrind-is-not-a-leak-checker) * 用途： * 偵測undefined behavior ```valgrind ./phonebook_opt``` * function and memory profiler * ==data-race detection==(覺得這好酷！！) ```valgrind --tool=helgrind ./phonebook_opt``` * 可以用來看memory leak ``` --leak-check=full``` ==14872== Memcheck, a memory error detector ==14872== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==14872== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info ==14872== Command: ./phonebook_opt ==14872== size of entry : 24 bytes execution time of append() : 0.118666 sec execution time of findName() : 0.002815 sec ==14872== ==14872== HEAP SUMMARY: ==14872== in use at exit: 2,215 bytes in 16 blocks ==14872== total heap usage: 28 allocs, 12 frees, 8,408,637 bytes allocated ==14872== ==14872== 16 bytes in 1 blocks are definitely lost in loss record 1 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x400F23: file_align.4663 (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x40101B: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 16 bytes in 1 blocks are definitely lost in loss record 2 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x40166E: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x4013F6: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 16 bytes in 1 blocks are definitely lost in loss record 3 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x40166E: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x40142A: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 16 bytes in 1 blocks are definitely lost in loss record 4 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x40166E: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x40147E: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 24 bytes in 1 blocks are definitely lost in loss record 5 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x401050: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 107 bytes in 1 blocks are definitely lost in loss record 8 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x4016B1: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x4013F6: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 107 bytes in 1 blocks are definitely lost in loss record 9 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x4016B1: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x40142A: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 107 bytes in 1 blocks are definitely lost in loss record 10 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x4016B1: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x40147E: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== 192 bytes in 4 blocks are definitely lost in loss record 11 of 13 ==14872== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14872== by 0x401709: ThrdInitial (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== by 0x4011EC: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt) ==14872== ==14872== LEAK SUMMARY: ==14872== definitely lost: 601 bytes in 12 blocks ==14872== indirectly lost: 0 bytes in 0 blocks ==14872== possibly lost: 0 bytes in 0 blocks ==14872== still reachable: 1,614 bytes in 4 blocks ==14872== suppressed: 0 bytes in 0 blocks ==14872== Reachable blocks (those to which a pointer was found) are not shown. ==14872== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==14872== ==14872== For counts of detected and suppressed errors, rerun with: -v ==14872== ERROR SUMMARY: 9 errors from 9 contexts (suppressed: 0 from 0) * 問題：看到一些文章說free並不是必須, 可是大部分是好的, 比如[In C, is it necessary to free a pointer at exit?](http://stackoverflow.com/questions/3126122/in-c-is-it-necessary-to-free-a-pointer-at-exit), [What REALLY happens when you don't free after malloc?](http://stackoverflow.com/questions/654754/what-really-happens-when-you-dont-free-after-malloc?rq=1), [Does calling free or delete ever release memory back to the “system”](http://stackoverflow.com/questions/1421491/does-calling-free-or-delete-ever-release-memory-back-to-the-system) 要視情況而定, 可是有種很難體會的感覺。 * Solve memory leak * 在findName中, 不需要把lastname放進entry->lastName 中, 因為strncasecmp就是用pointer即可。 * 學習李昆憶大大的重構方式 * 在每個段落加上這個段落的作用, 比如file preprocessing, release the memory等等方便看的人快速理解程式內容 * 把在程式碼中的indlude header移到最上方 * 用參數讓輸入進函式的資料更容易解讀 * 去掉不需要的參數如etmp * 改掉dprintf * 優化append * 原本是12341234的方式(row-major) * 改成把資料分成四大段, 符合cache的排列方式結果： ![](https://i.imgur.com/9d1PdsX.png) ### Thread Pool * 參考[C 的 Thread Pool 筆記](http://swind.code-life.info/posts/c-thread-pool.html) * 以下使用[threadpool-mbrossard](https://github.com/mbrossard/threadpool)的程式碼 * 理解ThreadPool 這個程式看起來好舒服!!!原來這就叫做程式碼><!! * 理解過程中程式筆記 * 在每個function前說明function作用, 和argument說明 * 學到goto用法 * 原來可以這樣寫：pool->head = pool->tail = pool->count = 0; * 程式運作：先來看wiki 使用thread pool處理Task的圖 ![](https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/Thread_pool.svg/580px-Thread_pool.svg.png) 再來看程式的struct, 首先threadpool中會紀錄兩個array, 一個是threads另一個是task queue。 ```clike struct threadpool_t { pthread_mutex_t lock; pthread_cond_t notify; pthread_t *threads;//放置所有thread空間的指標 threadpool_task_t *queue;//放置所有task空間的指標 int thread_count; int queue_size; int head; int tail; int count; int shutdown; int started; }; ``` threads的struct是pthread_t; task queue的struct則是由funciton和argument組成 ```clike typedef struct { void (*function)(void *); void *argument; } threadpool_task_t; ``` 再來看執行的流程圖 ```flow st=>start: threadpool_create 決定好thread的數量和task queue的最大值 e=>end: Thread pool destroy op=>operation: thread開始執行(threadpool_thread) op2=>operation: thread搶mutex lock cond=>condition: 某個thread搶到之後裡面是否有task？ op7=>operation: op3=>operation: 從task queue拿task之後離開, unlock op4=>operation: 下一個thread進來 op5=>operation: pthread_cond_wait 等待threadpool_add的訊號, 同時unlock, 讓其他thread進來一起等待 op6=>operation: threadpool_add pthread_cond_signal新增task, 剛剛搶到lock的thread開始工作 st->op->op2->cond cond(yes)->op7->op3->op4->e cond(no)->op5->op6->op3->e ``` 不懂的地方： ~~next = (pool->tail + 1) % pool->queue_size; pool->head = (pool->head + 1) % pool->queue_size;~~ 上面這個是像一個ring, 有head和tail在兩端移動 * Implementation * 在implement的時候, 我遇到的問題是不知道thread pool的大小要設多大, 參考[TempoJiJi](https://hackmd.io/s/rymKa4aT)是8, 先試試看 * 把thread pool大小設成thread的兩倍大 ![](https://i.imgur.com/4GzHVnz.png) 從圖上可以發現四個thread的時候最快, 8個以上時間就變得相當高改進： * 學習github.ignore * 參考資料-[Git 教學(1) : Git 的基本使用](http://gogojimmy.net/2012/01/17/how-to-use-git-1-git-basic/) * vim .gitignore 裡面加上想要忽略的檔案 * thread pool影響 * 比較(row-major opt 和thread pool) ![](https://i.imgur.com/nbdpWeG.png) 好像沒有變快, 為什麼呢？ ## 參考資料 * [C fopen vs open](http://stackoverflow.com/questions/1658476/c-fopen-vs-open) * [LanKuDot (李昆憶)共筆](https://hackmd.io/s/HJFhaiAp) ## 待做筆記 * semaphore ###### tags: `ruby` `2016_Autumn` `HW2` `phonebook-concurrent`