Jou-An Chen
# 2016q3 Homework3 (mergesort-concurrent)

contributed by <`carolc0708`>

* [Assignment A07: mergesort-concurrent](/s/rJ-GWtJ0)
* [Week 3 lecture notes](/s/r1ddPnNC)

### Expected goals

* Serve as a demonstration case for [concurrency](/s/H10MXXoT)
* Learn POSIX Thread Programming, especially [synchronization objects](https://docs.oracle.com/cd/E19683-01/806-6867/6jfpgdcnd/index.html)
* Build infrastructure for later performance analysis and scalability studies
* Learn program-quality analysis and the related development tools

### Requirements

- [ ] Adapt the merge sort implementation to accept the 350,000-entry data file used by [phonebook-concurrent](https://github.com/sysprog21/phonebook-concurrent)
    * The dictionary file must be shuffled beforehand with `sort -R`
    * Think about how to obtain a uniformly distributed random permutation, and design an automated test mechanism
- [ ] Study how the thread pool manages worker threads, point out implementation weaknesses, and, following [concurrent-ll](https://github.com/jserv/concurrent-ll), propose a lock-free implementation
- [ ] Learn the scalability analysis used by [concurrent-ll](https://github.com/jserv/concurrent-ll) (a concurrent linked-list implementation), and use gnuplot to compare merge sort performance across different thread counts
    * Note that linked-list nodes are often allocated in non-contiguous memory; consider how this affects performance analysis
- [ ] Also try refactoring the given code to make it easier to read and maintain. Following [A05: introspect](/s/BkhIF92p), constructive and detailed criticism should not stay in the shared notes as text only, but also be reflected in code changes
    * Use GraphViz for diagrams in the shared notes whenever possible
* Deadline:
    * 08:00 AM Oct 7, 2016 (inclusive)
    * The earlier there is activity on GitHub and the earlier the code receives review, the higher the grade

### Bonus challenge

* Use C11's own `threads.h` for multithreading and rewrite with `_Atomic`. References:
    * [Atomics in C programming](https://www2.informatik.hu-berlin.de/~keil/docs/keil_-_c11_atomics_20140202.pdf)
    * [You Can Do Any Kind of Atomic Read-Modify-Write Operation](http://preshing.com/20150402/you-can-do-any-kind-of-atomic-read-modify-write-operation/)
    * [Boost atomic examples](http://www.boost.org/doc/libs/1_61_0/doc/html/atomic/usage_examples.html)

### Hint

Work out merge sort's space complexity and build a memory pool for it, allocating the maximum required space after the analysis. Since two merges may be running at the same time, guard against overlap by splitting the memory into several segments. For the final data movement, study memory-pool implementations; and since malloc'd, reclaimed, sorted data ends up non-contiguous in memory, study linked-list cache techniques.

## [Thread Models](http://maxim.int.ru/bookshelf/PthreadsProgram/htm/r_19.html)

#### The boss/worker model

![](https://i.imgur.com/ipAz6Jd.gif)

* The boss creates each worker thread, assigns it tasks, and, if necessary, waits for it to finish.
* In the **pthread_create** call it uses to create each worker thread, the boss **specifies the task-related routine** the thread will execute.
* After creating each worker, the boss returns to the top of its loop to process the next request.
* Once finished, each worker thread can be made responsible for any output resulting from its task, or it can **synchronize with the boss** and let it handle the output.

```
main() /* The boss */
{
    forever {
        get a request
        switch request
            case X : pthread_create( ... taskX)
            case Y : pthread_create( ... taskY)
            . . .
    }
}

taskX() /* Workers processing requests of type X */
{
    perform the task, synchronize as needed
        if accessing shared resources
    done
}

taskY() /* Workers processing requests of type Y */
{
    perform the task, synchronize as needed
        if accessing shared resources
    done
}
```

* Alternatively, the boss could save some run-time overhead by **creating all worker threads up front** ==> thread pool

```
main() /* The boss */
{
    for the number of workers
        pthread_create( ... pool_base )
    forever {
        get a request
        place request in work queue
        signal sleeping threads that work is available
    }
}

pool_base() /* All workers */
{
    forever {
        sleep until awoken by boss
        dequeue a work request
        switch
            case request X: taskX()
            case request Y: taskY()
            . . .
    }
}
```

* It is important to **minimize the frequency with which the boss and workers communicate**.
* Don't create too many interdependencies among the workers, or all workers will **suffer a slowdown**.

#### The peer model (workcrew model)

![](https://i.imgur.com/NiU9afg.gif)

* All threads work concurrently on their tasks without a specific leader.
* One thread must create all the other peer threads when the program starts; that thread then **acts as just another peer thread** that processes requests, or **suspends itself waiting** for the other peers to finish.
* This model makes each thread responsible for its own input: a peer knows its own input ahead of time, has its own private way of obtaining its input, or shares a single point of input with other peers.

```
main()
{
    pthread_create( ... thread1 ... task1 )
    pthread_create( ... thread2 ... task2 )
    . . .
    signal all workers to start
    wait for all workers to finish
    do any clean up
}

task1()
{
    wait for start
    perform task, synchronize as needed
        if accessing shared resources
    done
}

task2()
{
    wait for start
    perform task, synchronize as needed
        if accessing shared resources
    done
}
```

* Suitable for applications that have **a fixed or well-defined set of inputs**, such as matrix multipliers, parallel database search engines, and prime number generators. ==> e.g. divide a region into many blocks and hand each block to a different thread
* Because there is no boss, peers themselves must **synchronize their access to any common sources of input**.
* However, like workers in the boss/worker model, peers can also slow down if they must frequently synchronize to access shared resources.

#### The pipeline model

![](https://i.imgur.com/nzwEu9c.gif)

* The pipeline model assumes:
    * A long stream of input
    * A series of suboperations (known as stages or filters) through which every unit of input must be processed
    * Each processing stage can handle a different unit of input at a time
* Applications in which the pipeline might be useful are image processing and text processing, or any application that can be broken down into a series of filter steps on a stream of input.

```
main()
{
    pthread_create( ... stage1 )
    pthread_create( ... stage2 )
    . . .
    wait for all pipeline threads to finish
    do any clean up
}

stage1()
{
    forever {
        get next input for the program
        do stage 1 processing of the input
        pass result to next thread in pipeline
    }
}

stage2()
{
    forever {
        get input from previous thread in pipeline
        do stage 2 processing of the input
        pass result to next thread in pipeline
    }
}

stageN()
{
    forever {
        get input from previous thread in pipeline
        do stage N processing of the input
        pass result to program output
    }
}
```

* We could add **multiplexing or demultiplexing** to this pipeline, allowing multiple threads to work in parallel on a particular stage.
* We could also dynamically configure the pipeline at run time, having it **create and terminate stages** (and the threads to service them) as needed.
* The overall throughput of a pipeline is limited by **the thread that processes its slowest stage.**
* When designing, the programmer should aim at **balancing the work** across all stages; that is, all stages should take about the same amount of time to complete.

## Mutrace: analyzing mutex contention

* **Lock contention**: this occurs whenever **one process or thread attempts to acquire a lock held by another process or thread.** The more **fine-grained** the available locks, the less likely one process/thread will request a lock held by the other. (For example, locking a row rather than the entire table, or locking a cell rather than the entire row.)
* mutrace vs. valgrind/drd: in contrast to valgrind/drd, mutrace **does not virtualize the CPU instruction set**, making it a lot faster. In fact, the hooks mutrace relies on to profile mutex operations should only minimally influence application runtime. Mutrace is **not useful for finding synchronization bugs**; it is solely useful for profiling locks.
* About mutrace:
    * mutrace is implemented entirely in **userspace**.
    * build your application with `-rdynamic` to make the backtraces mutrace generates useful.
    * `-r` / `--track-rt`: checks on each mutex operation whether it is executed by a realtime thread or not => use this to track down which mutexes are good candidates for **priority inheritance**.
    * `matrace` is a companion tool that tracks down **memory allocation** operations in realtime threads.
* Cloning the source reveals more details worth reading:

```
$ git clone http://git.0pointer.net/clone/mutrace.git
```

* `mutrace.in` documents what the other flags do
* the `matrace` sources are there as well
* Installing mutrace:

```
$ sudo apt-get install mutrace
```

* Running it on the `mergesort-concurrent` program:

```
(for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 4 8
```

The result:

```
mutrace: 0.2 sucessfully initialized for process sort (pid 4814).
input unsorted data line-by-line
sorted results:
[9602] [10669] [11911] [16147] [19714] [22776] [22870] [32443]

mutrace: Showing statistics for process sort (pid 4814).
mutrace: 3 mutexes used.

Mutex #1 (0x0x216c9b0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f5a530484b2]
	./sort(tqueue_init+0x38) [0x401277]
	./sort(tpool_init+0x6a) [0x4014cc]
	./sort(main+0x161) [0x401c74]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f5a52a7f830]

Mutex #0 (0x0x7f5a5064e860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f5a530486b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f5a5044afec]
	[(nil)]

mutrace: Showing 2 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1       62        5        1        0.005        0.000        0.001 Mx.--.
       0       20        3        0        0.002        0.000        0.000 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.375 ms.

mutrace: Results for SMP with 4 processors.
```

* Meaning of the columns:
    * `Locked`: how often the mutex was locked during the entire runtime.
    * `Changed`: how often the owning thread of the mutex changed. => If the number is high, the risk of contention is also high.
    * `Cont.`: how often the lock was already taken when we tried to take it and we had to wait. ==> Mutex #1 is the most contended lock
    * `tot.Time[ms]`: for how long during the entire runtime the lock was locked
    * `avg.Time[ms]`: the average lock time
    * `max.Time[ms]`: the longest time the lock was held
    * `Flags`: what kind of mutex this is (recursive, normal or otherwise)

:::info
You can use `addr2line` to find the corresponding source line. Note: compile with `-g` so the executable includes debug info. For the address in `(tqueue_init+0x38) [0x401277]`, the corresponding source line is:

```
$ addr2line -e sort 0x401277
/home/carol/mergesort-concurrent/threadpool.c:15
```
:::

> Using 0x38 instead gives me `??:0` [name=Carol Chen]

## W2-QA: recursive merge sort on a linked list

* In [Linked List Bubble Sort](http://faculty.salina.k-state.edu/tim/CMST302/study_guide/topic7/bubble.html), replace the recursive bubble sort on the linked list with a recursive merge sort

---

* Reference: [Merge Sort for Linked Lists](http://www.geeksforgeeks.org/merge-sort-for-linked-list/)

:::info
**MergeSort(headRef)**
1) If head is NULL or there is only one element in the Linked List then return.
2) Else divide the linked list into two halves.
      FrontBackSplit(head, &a, &b); /* a and b are two halves */
3) Sort the two halves a and b.
      MergeSort(a);
      MergeSort(b);
4) Merge the sorted a and b (using SortedMerge() discussed here) and update the head pointer using headRef.
      *headRef = SortedMerge(a, b);
:::

## [mergesort-concurrent](https://github.com/sysprog21/mergesort-concurrent) code analysis & refactoring

### Data structure: linked list

#### `list.h`

```C
typedef intptr_t val_t;
```

* `intptr_t` lets a node's payload be either a pointer or an ordinary integer.
* [When/Why to use intptr_t](http://stackoverflow.com/questions/6326338/why-when-to-use-intptr-t-for-type-casting-in-c) in type-casting: when converting a pointer to an integer, pointer width differs between 32-bit and 64-bit machines. `intptr_t` effectively does the following for us:

```
#ifdef MACHINE 64
typedef long intptr;
#else // MACHINE 32
typedef int intptr;
#endif
```

From then on, `#include <stdint.h>` is all we need, avoiding tedious conversions; it is also a conservative way to ensure casts do not go wrong.

* Per the Stack Overflow [discussion](http://stackoverflow.com/questions/35071200/what-is-the-use-of-intptr-t) of `intptr_t`: unlike `void *`, an `intptr_t` value supports bitwise operations. For bitwise work, however, `uintptr_t` is a better choice than `intptr_t`.

```C
typedef struct node {
    val_t data;
    struct node *next;
} node_t;
```

A standard linked-list node: each node stores its own `data` and a pointer `next` to the next node.

```C
typedef struct llist {
    node_t *head;
    uint32_t size;
} llist_t;
```

A list must record its `head`; here the list length `size` is packed into the structure as well.

#### `list.c`

```C
/* Insert a new node in front of next; return a pointer to that node */
static node_t *node_new(val_t val, node_t *next);
/* Initialize a new linked list */
llist_t *list_new();
```

```C
/* Append a new node at the tail of the linked list */
int list_add(llist_t *the_list, val_t val);
```

* The comment says that if `val` already exists, no new node should be created, but the code contains no such check. ==> design one:

```C=
/* check if the value already exists */
node_t *cur = list->head;
while (cur) {
    if (cur->data == val)
        return list;
    cur = cur->next;
}
```

* `list_add` returning 0 unconditionally is meaningless; the return value should reflect the outcome. ==> return the pointer to the updated `llist` instead

```C
/* Return a pointer to the index-th node of the list */
node_t *list_nth(llist_t *the_list, uint32_t index);
```

* The index-out-of-range check is wrong. Treating the n in nth as an array index, valid positions run from 0 to `list->size - 1`:
    * ==> change the out-of-range check to `if (idx > list->size - 1)`
    * ==> rename the walking pointer from `head` to `cur` (personal preference)
    * ==> add `if (idx == 0) return cur;`

```C
/* Print the whole linked list */
void list_print(llist_t *the_list);
```

* ==The FIXME notes that the list should be verified as sorted before printing==

---

### Threadpool

#### `threadpool.h`

```C
typedef struct _task {
    void (*func)(void *);
    void *arg;
    struct _task *next, *last;
} task_t;
```

The task type: a work function `*func` and its argument `*arg`. The `*next` and `*last` links that chain a task into the task queue are also defined here.

```C
typedef struct {
    task_t *head, *tail;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint32_t size;
} tqueue_t;
```

The task-queue type: the `*head` and `*tail` tasks, the queue `size`, a `mutex`, and a condition variable `cond`.

```C
typedef struct {
    pthread_t *threads;
    uint32_t count;
    tqueue_t *queue;
} tpool_t;
```

The thread-pool type: a pointer `*threads` to all the pthreads, the thread count `count`, and a pointer `*queue` to the task queue.

#### `threadpool.c`

```C
/* Free the memory previously allocated for a single task */
int task_free(task_t *the_task);
```

* The return value is meaningless; it should reflect whether freeing succeeded
* There is no guarantee the task's memory came from malloc, so free may misbehave

```C
/* Initialize the task queue */
int tqueue_init(tqueue_t *the_queue);
```

* The return value is meaningless; adjust it to report initialization status
* The `tqueue_t` passed in is never allocated. Either:
    * allocate the space before the call and pass in the pointer, or
    * allocate it inside the function
  ==> allocate new memory with malloc; return 0 on success, -1 on failure

```C
/* Take a single task off the queue and dispatch it to a thread */
task_t *tqueue_pop(tqueue_t *the_queue);
```

* A queue should be FIFO, but this pops from the tail rather than the head ==> change `tail` to `head`:

```C=
task_t *tqueue_pop(tqueue_t *the_queue)
{
    task_t *ret;
    pthread_mutex_lock(&(the_queue->mutex));
    ret = the_queue->head;
    if (ret) {
        the_queue->head = ret->last; /* if it is NULL, then let it be */
        if (the_queue->head) {
            the_queue->head->next = NULL;
        } else {
            the_queue->tail = NULL;
        }
        the_queue->size--;
    }
    pthread_mutex_unlock(&(the_queue->mutex));
    return ret;
}
```

```C
/* Return the current size of the task queue */
uint32_t tqueue_size(tqueue_t *the_queue);
```

* This function is never called anywhere in the program and can be removed.

```C
/* Push a new task onto the task queue */
int tqueue_push(tqueue_t *the_queue, task_t *task);
```

* The return value never reports errors ==> return -1 if the task does not exist, 0 on a successful push
* A new task should go at the tail of the queue, not the head ==> change `head` to `tail`:

```C=
int tqueue_push(tqueue_t *the_queue, task_t *task)
{
    if (task == NULL)
        return -1;
    pthread_mutex_lock(&(the_queue->mutex));
    task->next = NULL;
    task->last = the_queue->tail;
    if (the_queue->tail)
        the_queue->tail->next = task;
    the_queue->tail = task;
    if (the_queue->size++ == 0)
        the_queue->head = task; /* only one task in queue */
    pthread_mutex_unlock(&(the_queue->mutex));
    return 0;
}
```

```C
/* Free the allocated task-queue memory */
int tqueue_free(tqueue_t *the_queue);
```

* The return value is meaningless ==> add a check that `the_queue` exists; if not, return -1 and skip the freeing below
* The condition variable is never destroyed (anything initialized must be destroyed) ==> add `pthread_cond_destroy(&(the_queue->cond));`

> In the phonebook-concurrent assignment, [mbrossard/threadpool](https://github.com/mbrossard/threadpool/blob/master/src/threadpool.c) locks the mutex before destroying both. I'm not sure why that is needed, or what would go wrong without it; here there is no lock first.

```C
/* Initialize the thread pool */
int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *));
```

```C
/* Free the thread-pool memory */
int tpool_free(tpool_t *the_pool);
```

* The return value is meaningless ==> add a check that `the_pool` exists, and a check that each `pthread_join()` succeeds:

```C=
int tpool_free(tpool_t *the_pool)
{
    if (the_pool == NULL)
        return -1;
    for (uint32_t i = 0; i < the_pool->count; ++i)
        if (pthread_join(the_pool->threads[i], NULL) != 0)
            return -1;
    free(the_pool->threads);
    tqueue_free(the_pool->queue);
    return 0;
}
```

:::info
**Threadpool issues, summarized**
1. A condition variable is defined but never used
2.
The shutdown policy is undefined (shut down immediately, or finish queued work first)
:::

---

### `main.c`

* Based on classmate `LanKuDot`'s [notes](https://hackmd.io/s/S1ezGhIA); it took quite a while to understand
* The program splits into two phases:
    * before `max_cut` is reached: `cut_func()` recursively splits the list into left and right halves
    * after `max_cut` is reached: `merge_sort()` recursively splits the list and merges it back

```C
/* dispatch the division task to the queue */
void cut_func(void *data);
```

* The **first thing** done in `main()` after `init_tpool()`
* `main()` defines `max_cut` = MIN(`thread_count`, `data_count`) - 1
* Until `cut_count` reaches `max_cut` (or `list->size < 1`), the function halves the list and pushes a task onto the task queue with the left/right lists as `arg` and `cut_func()` itself as `func`
* Once the condition is met, regardless of where the split stands, the incoming list goes straight into `merge_sort()` and the result into `merge()`
* The critical section is protected by a lock
* Renamed `_list` and `list` to `llist` and `rlist` for readability, and fixed the swapped left/right comments

```C
/* merge the cross-level lists */
void merge(void *data);
```

* The critical section is locked
* If `_list->size < data_count`: the other half has not arrived yet, so record ourselves in `tmp_list`; once it has arrived, push a task onto the task queue
    * the task's `arg`: `merge_list(left, right)`
    * the task's `func`: `merge()`
* If `_list->size == data_count`: the termination condition. Everything is sorted; store the result in the global `the_list` and print it with `print_list()`
* ==Here an empty `func` is also pushed onto the task queue==

> I don't quite understand the point of doing that. [name=Carol Chen]

```
/* cut list into left & right, then merge back */
llist_t *merge_sort(llist_t *list);
```

* A recursive function
* Recursively splits the incoming `list` into left and right until `list->size < 2`
* Returns the result merged by `merge_list()`

```C
/* merge the left & right lists */
llist_t *merge_list(llist_t *a, llist_t *b);
```

* Merges lists `*a` and `*b` by comparing their elements

```C
/* grab one task from the task queue and execute it */
static void *task_run(void *data);
```

* The equivalent of `threadpool_thread()` in [mbrossard/threadpool](https://github.com/mbrossard/threadpool/blob/master/src/threadpool.c): take one task off the queue and dispatch it to a thread for execution
* Structurally it arguably belongs in `tpool.c`
* No lock or condition variable is used here

```C
int main(int argc, char const *argv[]);
```

1. Read `thread_count` and `data_count`; compute `max_cut`
2. Read the input numbers and build `list`
3. Initialize `tpool` and the other globals
4.
Push `cut_func()` onto the task queue as the first task

## Designing an automated test

* Following classmate `Tempojiji`'s [notes](https://hackmd.io/s/B1-Kytv0) on automated testing, create `input_generator.c` to produce random input:

```C=
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int MAX_SIZE = atoi(argv[1]);
    FILE *fp = fopen("input", "w+");
    srand((long int) time(NULL));
    for (int i = 0; i < MAX_SIZE; i++)
        fprintf(fp, "%u\n", rand() / 10000);
    fclose(fp);
}
```

* In `main.c`, measure time with `clock_gettime()` and add:

```C=
#if defined(BENCH)
    FILE *fp = fopen("input", "r");
    long int data;
    while ((fscanf(fp, "%ld\n", &data)) != EOF) {
        e = list_add(e, data);
    }
    fclose(fp);
#endif
    . . .
#if defined(BENCH)
    fp = fopen("output", "a+");
    if (thread_count == 1)
        fprintf(fp, "%d,", data_count);
    fprintf(fp, "%lf", cpu_time);
    if (thread_count == 64)
        fprintf(fp, "\n");
    else
        fprintf(fp, ",");
    fclose(fp);
    printf("%d %d %lf\n", data_count, thread_count, cpu_time);
#endif
```

* Then modify the `Makefile`:

```
bench:
	for i in `seq 100 100 800`; do \
		./input_generator $$i; \
		for j in 1 2 4 8 16 32 64; do \
			./sort $$j $$i; \
		done \
	done

plot: bench
	gnuplot runtime.gp
```

* Write the matching `runtime.gp` plotting script as well
* Current results:

![](https://i.imgur.com/mNP67Qf.png)

The larger `thread_num` is, the more visibly the timing jitters.

> I don't know why it can't sort more than 10000 numbers —
> it just hangs there for a very long time... [name=Carol Chen]

## Scalability analysis

:::info
**scalability**, as defined on Wikipedia:
* An algorithm, design, networking protocol, program, or other system is said to **scale** if it is **suitably efficient and practical when applied to large situations** (e.g. a large input data set, a large number of outputs or users, or a large number of participating nodes in the case of a distributed system). If the design or system fails when a quantity increases, it does not scale.
* In practice, if there are a large number of things (n) that affect scaling, then resource requirements (for example, algorithmic time-complexity) must grow less than n^2 as n increases.
An example is a search engine, which scales not only for the number of users, but also for the number of objects it indexes.
* Scalability refers to the ability of a site to increase in size as demand warrants.

From discussions on Stack Overflow:
* Scalability is **the ability of a program to scale**. For example, if you can do something on a small database (say less than 1000 records), a program that is highly scalable would **work well on a small set as well as working well on a large set** (say millions, or billions of records).
* It would have a **linear growth of resource requirements**. Look up Big-O notation for more details about how programs can require more computation the larger the data input gets. Something parabolic like Big-O(x^2) is far less efficient with large x inputs than something linear like Big-O(x).
:::

---

* Classmate `LanKuDot`'s [notes](https://hackmd.io/s/S1ezGhIA) summarize the scalability scripts; from his notes:

#### How it works

1. The script takes arguments: the first specifies `core`, and `shift` drops it so the remaining arguments shift left (the original second argument becomes the first)
2. It runs the `config` and `lock_exec` scripts
3. The program name to run goes into `prog`, the remaining arguments into `params`
4. It runs the test on 1 to `core` cores and prints each result **relative to the 1-core run**
5. ==Four values are output (subscript N is the number of cores run):
    a. #cores: the number of cores used
    b. throughput: work completed per unit time
    c. %linear: $(1-\frac{N-scalability}{N})\times100\%=(\frac{scalability}{N})\times100\%$, the average per-core scalability in percent
    d. scalability: $\frac{throughput_N}{throughput_1}$, the throughput ratio==
6. It runs the `unlock_exec` script

---

#### Scalability analysis in [concurrent-ll](https://github.com/jserv/concurrent-ll)

#### `scalability.sh`

* `scalability1.sh` and `scalability2.sh` differ in how many programs they run at once: 1 versus 2
* Trying it out:

```
carol@carol-PC:~/concurrent-ll$ scripts/scalability1.sh all ./out/test-lock -i128
## Use 4 as the number of cores.
## If not correct, change it in scripts/config
#cores  throughput  %linear  scalability
1       309985      100.00   1
2       186428      30.07    0.60
3       195184      20.99    0.63
4       2493        0.20     0.01
```

```
carol@carol-PC:~/concurrent-ll$ scripts/scalability2.sh all ./out/test-lock ./out/test-lockfree -i100
## Use 4 as the number of cores.
## If not correct, change it in scripts/config
#       ./out/test-lock                   ./out/test-lockfree
#cores  throughput  %linear  scalability  throughput  %linear  scalability
1       310796      100.00   1            464000      100.00   1
2       185595      29.86    0.60         736644      79.38    1.59
3       192483      20.64    0.62         1033290     74.23    2.23
4       8355        0.67     0.03         1039714     56.02    2.24
```

> What do the trailing `-i128` and `-i100` arguments mean? [name=Carol Chen]
> How do I use this to test my own program? [name=Carol Chen]

## Adapting the implementation to accept the phonebook data file

## Using the condition variable in the threadpool

* The original threadpool defines a condition variable but never uses it
* Design based on [mbrossard/threadpool](https://github.com/mbrossard/threadpool/blob/master/src/threadpool.c)

---

* In `tqueue_pop()`: when MONITOR is defined, the locking here must be removed.

```C=
task_t *tqueue_pop(tqueue_t *the_queue)
{
    task_t *ret;
#ifndef MONITOR
    pthread_mutex_lock(&(the_queue->mutex));
#endif
    ret = the_queue->head;
    if (ret) {
        the_queue->head = ret->next; /* if it is NULL, then let it be */
        if (the_queue->head) {
            the_queue->head->last = NULL;
        } else {
            the_queue->tail = NULL;
        }
        the_queue->size--;
    }
#ifndef MONITOR
    pthread_mutex_unlock(&(the_queue->mutex));
#endif
    return ret;
}
```

* `tqueue_push()`

```C=
int tqueue_push(tqueue_t *the_queue, task_t *task)
{
#ifndef MONITOR
    if (task == NULL)
        return -1;
#endif
    pthread_mutex_lock(&(the_queue->mutex));
    task->next = NULL;
    task->last = the_queue->tail;
    if (the_queue->tail)
        the_queue->tail->next = task;
    the_queue->tail = task;
    if (the_queue->size++ == 0)
        the_queue->head = task; /* only one task in queue */
#if defined(MONITOR)
    pthread_cond_signal(&(the_queue->cond));
#endif
    pthread_mutex_unlock(&(the_queue->mutex));
    return 0;
}
```

* `tqueue_free()`

```C=
int tqueue_free(tqueue_t *the_queue)
{
    if (the_queue == NULL)
        return -1;
    task_t *cur = the_queue->head;
    while (cur) {
        the_queue->head = the_queue->head->next;
        free(cur);
        cur = the_queue->head;
    }
#if defined(MONITOR)
    pthread_mutex_lock(&(the_queue->mutex));
#endif
    pthread_mutex_destroy(&(the_queue->mutex));
    pthread_cond_destroy(&(the_queue->cond));
    return 0;
}
```

* `tpool_free()`

```C=
int tpool_free(tpool_t *the_pool)
{
#if defined(MONITOR)
    pthread_cond_broadcast(&(the_pool->queue->cond));
    pthread_mutex_unlock(&(the_pool->queue->mutex));
#endif
    if (the_pool == NULL)
        return -1;
    for (uint32_t i = 0; i < the_pool->count; ++i)
        if (pthread_join(the_pool->threads[i], NULL) != 0)
            return -1;
    free(the_pool->threads);
    tqueue_free(the_pool->queue);
    return 0;
}
```

* `task_run()`

```C=
static void *task_run(void *data)
{
    (void) data;
    while (1) {
#if defined(MONITOR)
        pthread_mutex_lock(&(pool->queue->mutex));
        while (pool->queue->size == 0) {
            pthread_cond_wait(&(pool->queue->cond), &(pool->queue->mutex));
        }
        if (pool->queue->size == 0)
            break;
#endif
        task_t *_task = tqueue_pop(pool->queue);
#if defined(MONITOR)
        pthread_mutex_unlock(&(pool->queue->mutex));
#endif
        if (_task) {
            if (!_task->func) { /* no function: push the task back and stop */
                tqueue_push(pool->queue, _task);
                break;
            } else {
                _task->func(_task->arg); /* execute the task */
                free(_task);
            }
        }
    }
    pthread_exit(NULL);
}
```

---

* The results:

![](https://i.imgur.com/jKkaw3A.png)

Much faster than before — the y axis now tops out at 0.014 seconds

* Compared with the original version:

![](https://i.imgur.com/V3yAGxy.png)

## Making the threadpool lock-free

* Reference: the lock-free approach in [concurrent-ll](https://github.com/jserv/concurrent-ll)
* The lock-free queue in [hungry-birds](https://github.com/jserv/hungry-birds)
* [Lock-free linked-list slides](http://www.slideshare.net/anniyappa/linked-lists-28793865)

## Changing the queue structure and comparing performance

* Following group 20's [notes](https://hackmd.io/s/SJUfbulyx), change the queue to a ring structure and compare performance
