2017q1 Homework4 (phonebook_concurrent)

contributed by<jack81306>

tags:`jack81306`

開發環境

Architecture:          x86_64
CPU 作業模式：    32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
每核心執行緒數：2
每通訊端核心數：2
Socket(s):             1
NUMA 節點：         1
供應商識別號：  GenuineIntel
CPU 家族：          6
型號：              60
Model name:            Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz
製程：              3
CPU MHz：             2463.049
CPU max MHz:           3200.0000
CPU min MHz:           800.0000
BogoMIPS:              5187.96
虛擬：              VT-x
L1d 快取：          32K
L1i 快取：          32K
L2 快取：           256K
L3 快取：           3072K
NUMA node0 CPU(s):     0-3

理解 concurrency

Concurrency 是指能處理多個同時性活動的能力，事件之間不一定要同一時刻發生。
Parallelism 是指同時發生的兩個事件，具有平行的含義，而 Concurrency 則不一定平行.
Concurrency 和 Parallelism 兩者不同但是卻有相關,一個重點是組合,一個重點是執行.平行是 Concurrency 的一種處理方式.

閱讀材料

閱讀程式碼

mmap

mmap，它把文件內容映射到一段記憶體上(準確說是虛擬記憶體上),通過對這段記憶體的讀取和修改,實現對文件的讀取和修改.Linux 允許將檔案對映到記憶體空間中,如此可以產生一個在檔案資料及記憶體資料一對一的對映,例如字型檔的存取.

textalign

將檔案中每一行的長度都固定成 16 byte,如果名子的長度小於 16 則自動再後面補上 "\0",並將結果另外存於一個檔案內.

malloc

因為檔案已經經過 textalign 的調整了,所以可以寄算出總共有幾個 entry 並一次 malloc 完成.

pthread_setconcurrency

設定 pthread 的 concurrency level.但是注意,這裡設定的值只是一個提示,因此系統不一定會依照設定來進行處理.

總體理解

opt 版本使用的方法是先把原本的文字檔依照 16 byte align.使用 mmap 把資料映射到記憶體後,接著之後用 pthread 來進行平行處理,同時從記憶體內讀取資料並建立 list ,最後再把 list 連接起來.

不過有一點不理解的是連接的時候pHead = thread_args[i]->lEntry_head->pNext;為何要把 phead 指向 Entry_head->pnext,不是應該指向 Entry_head 嗎?

程式碼是寫給你改的，不是給你「舉燭」背誦用 –jserv

實驗紀錄

原始程式碼測試

phonebook_orig

Performance counter stats for './phonebook_orig' (5 runs):

       321,428,499      instructions              #    1.13  insn per cycle           ( +-  0.07% )
       284,787,186      cycles                                                        ( +-  9.79% )
         1,734,215      cache-references                                              ( +-  2.03% )
         1,438,454      cache-misses              #   82.946 % of all cache refs      ( +-  5.25% )

       0.094865715 seconds time elapsed                                          ( +-  9.15% )

phonebook_opt

Performance counter stats for './phonebook_opt' (5 runs):

       247,726,740      instructions              #    1.08  insn per cycle           ( +-  0.12% )
       229,300,495      cycles                                                        ( +-  1.34% )
         1,307,506      cache-references                                              ( +-  1.42% )
           462,718      cache-misses              #   35.389 % of all cache refs      ( +-  1.67% )

       0.106687553 seconds time elapsed

優化後的版本下降了許多的 instructions 和 cache-misses.
圖表分析
使用 pthread 優化 append 後的版本明顯的比原始版本快上了許多.

正確性驗證

為了檢測優化後的方法是否正確,因此比較原始方法和優化後的方法結點數是否相同.
phonebook_orig

size of entry : 136 bytes
execution time of append() : 0.044130 sec
execution time of findName() : 0.005841 sec
entry amount:349901

phonebook_opt

size of entry : 24 bytes
execution time of append() : 0.004617 sec
execution time of findName() : 0.006699 sec
entry amount:349896

發現優化後的方法少了5個結點,我認為應是連接時發生了錯誤,因為他每次連到的是 head 的下一個結點,因此會漏掉每一個 thread 中的 head 結點,所以將 pHead = thread_args[i]->lEntry_head->pNext 改為 pHead = thread_args[i]->lEntry_head 試試看.
修正後仍有一個誤差暫時找不出結果來

size of entry : 24 bytes
execution time of append() : 0.003200 sec
execution time of findName() : 0.003396 sec
entry amount:349900

注意結束的條件，原本的程式邏輯有誤 –jserv

deleteName 實作

phonebook_orig

execution time of append() : 0.049577 sec
execution time of findName() : 0.005622 sec
execution time of deleteName() : 0.005625 sec

phonebook_opt

execution time of append() : 0.003365 sec
execution time of findName() : 0.003596 sec
execution time of deleteName() : 0.004047 sec

因為 deleteName 只是先 findName 再處理結點之間的關係,所以執行的時間差不多.

不要只看了少數案例就推斷「時間差不多」，說話要精確，你設計實驗來檢驗了嗎？ –jserv

圖表分析

Thread pool 優化

閱讀材料

實驗結果

感覺加入 theadpool 並沒有提升太大的效能,我想是因為 thread 的數量過少的關係,且 threadpool 較適合常常有新的任務加進來才會比較有效果.

不要假裝自己是文青，用「感覺」來發語，我們理工人說話要精準。
也要注意 lock contention，用 mutrace 做實驗 –jserv

運行時間

execution time of append() : 0.004075 sec
execution time of findName() : 0.003516 sec
execution time of deleteName() : 0.003287 sec

圖表比較

2017q1 Homework4 (phonebook_concurrent)

tags:jack81306

開發環境

理解 concurrency

閱讀材料

閱讀程式碼

mmap

textalign

malloc

pthread_setconcurrency

總體理解

實驗紀錄

原始程式碼測試

正確性驗證

deleteName 實作

圖表分析

Thread pool 優化

閱讀材料

實驗結果

tags:`jack81306`