2021q1 Homework4 (refinement)

# 2021q1 Homework4 (refinement) contributed by < `WayneLin1992` > ###### tags: `linux2021` ## 學習到的知識 ### lab0: 1. valgrind 2. Address Sanitize 3. Github 4. Generic 5. coroutine 原理 :::info TODO: 實作 coroutine ::: ### quiz1: 1. hackMD 2. quick sort :::info TODO: quick sort in linux kernel ::: ### quiz2: 1. doubly linked list in linux 2. typrof 與 container_of 3. merge sort 4. merge sort in linked list 5. perf 6. bitwise 7. clz 8. linux naming __clz 與 clz 9. slab allocator 10. NUMA 11. string interning :::info TODO: hash_table ::: ### quiz3: 1. bit-field 2. union struct 4. string in c 5. std::string :::info TODO: string interning ::: ### quiz4: 1. concurrency 2. pthread :::info TODO: atomic threadpool ::: ## 03/18 一對一討論 ### 報告專業性不夠 >只能說認真，但內容不夠，不必急著交作業，但要專業 >這世界是高度分工的，要的是有足夠專業的人。 >應該盡可能說明清楚，而不是單純的翻譯，最好還可以搭配一些實驗 :::info 改進方式: 1. 不在講求速度，而是不斷思考自己是否了解內容，盡可能思考改進方式 2. 每天會把一堂課看完，符合課程的需求 3. ::: ### 目標要明確 >公司需要的還是能夠解決問題的人 >要確立好目標 >台灣公司要求未必清楚，可以看國外的職缺 :::info 搜尋工作職缺，並注意其中的就業要求，並記錄下來，以自身條件為出發，了解到能應徵的工作。 ::: ### quiz2 #### slab allocator >講解還是不夠清楚，要說明清楚，了解度也不夠。 :::info [slab 記憶體配置](https://events.static.linuxfound.org/sites/events/files/slides/slaballocators.pdf): cache friendly and benchmark friendly 的記憶體配置方式 cache friendly 存取的位置接近，符合好的 locality 。 benchmark friendly 接受快速存快速釋放， benchmark 的特性。目的: page 記憶體配置由 page allocator，但很多時候不需要 page 那麼大的空間，這時就可以使用 slab allocator 配置 page 以下的記憶體空間，而且 cache 對 slab 的配置是很敏感的，而且 slab 所存入將是 object ，如下圖所示，以前為 linux 的預設記憶體配置方式，但現在 linux 預設記憶體配置方式為 slub allocator 他的整體執行時間較短。 ![](https://i.imgur.com/3ntlcVL.jpg) 操作: `static struct kmem_cache` 存指向下一個 kmem_cache 的指標， object 大小。 `struct kmem_cache_node *node[MAN_NUMNODES]` 其中將有 slab 的狀態，狀態可以分三種 full, partial, empty 代表 slab 的空間。 `struct array_cache __percpu *cpu_cache` 避免找空隙時需要 O(n) 所以會在 array 中放入空隙的 index 方便搜索空隙。 `void *freelist` page 空隙的位置。 `void *s_mem` 指向 page 第一個 object 。 ![](https://i.imgur.com/7eeIW3e.png) 資料來源: [the SLUB allocator](https://lwn.net/Articles/229984/) ::: #### NUMA 是什麼及對 slab allocator 的影響 :::info NUMA (Non-uniform memory access): 在 SMP (對稱多處理器，又稱為 UMA uniform memory access) 時，要使用單一 BUS 向記憶體提取資料，而處理器處理資料的速度高於記憶體給資料速度，處理器會產生 "starved for data" 的情況，所以 NUMA 就希望每個處理器，都能有個 BUS 連接記憶體，避免壅塞的情形，所以延伸出的做法，每個處理器都有屬於自己的本地記憶體 ( local memory)，這裡講的本地記憶體，是指靠近處理器的(記憶體包括: cache 、 memory) ![](https://i.imgur.com/ViA5XvL.png) 而實際上 NUMA 的架構不一定會是(如上圖)一樣一個處理器對應一個記憶體，有可能是2、4或多個處理器對應一個記憶體。 ![](https://i.imgur.com/eFQXaHv.jpg) 而圖中的記憶體不一定是實際一片記憶體，主要是一個 node 將會對應到一個專屬的記憶體空間，而且這空間也可能不是連續的。 NUMA 對 slab 的影響: 在 NUMA 下除了本地記憶體還可以透過詢問其他處理器使遠方記憶體可以進行配置，使 slab 在尋找 slabs free 變得緩慢，所以在 `kmem_cache_node` 中增加 partial list, full list, empty list，使得 slab allocator 效率整體上升 5% ```shell 修正前 w/o patch Tasks jobs/min jti jobs/min/task real cpu 1 485.36 100 485.3640 11.99 1.91 Sat Apr 30 14:01:51 2005 100 26582.63 88 265.8263 21.89 144.96 Sat Apr 30 14:02:14 2005 200 29866.83 81 149.3342 38.97 286.08 Sat Apr 30 14:02:53 2005 300 33127.16 78 110.4239 52.71 426.54 Sat Apr 30 14:03:46 2005 400 34889.47 80 87.2237 66.72 568.90 Sat Apr 30 14:04:53 2005 500 35654.34 76 71.3087 81.62 714.55 Sat Apr 30 14:06:15 2005 600 36460.83 75 60.7681 95.77 853.42 Sat Apr 30 14:07:51 2005 700 35957.00 75 51.3671 113.30 990.67 Sat Apr 30 14:09:45 2005 800 33380.65 73 41.7258 139.48 1140.86 Sat Apr 30 14:12:05 2005 900 35095.01 76 38.9945 149.25 1281.30 Sat Apr 30 14:14:35 2005 1000 36094.37 74 36.0944 161.24 1419.66 Sat Apr 30 14:17:17 2005 修正後 w/patch Tasks jobs/min jti jobs/min/task real cpu 1 484.27 100 484.2736 12.02 1.93 Sat Apr 30 15:59:45 2005 100 28262.03 90 282.6203 20.59 143.57 Sat Apr 30 16:00:06 2005 200 32246.45 82 161.2322 36.10 282.89 Sat Apr 30 16:00:42 2005 300 37945.80 83 126.4860 46.01 418.75 Sat Apr 30 16:01:28 2005 400 40000.69 81 100.0017 58.20 561.48 Sat Apr 30 16:02:27 2005 500 40976.10 78 81.9522 71.02 696.95 Sat Apr 30 16:03:38 2005 600 41121.54 78 68.5359 84.92 834.86 Sat Apr 30 16:05:04 2005 700 44052.77 78 62.9325 92.48 971.53 Sat Apr 30 16:06:37 2005 800 41066.89 79 51.3336 113.38 1111.15 Sat Apr 30 16:08:31 2005 900 38918.77 79 43.2431 134.59 1252.57 Sat Apr 30 16:10:46 2005 1000 41842.21 76 41.8422 139.09 1392.33 Sat Apr 30 16:13:05 2005 ``` [Homework2 (quiz2)](https://hackmd.io/@WayneLin1992/SkTJzC3G_) ::: ## 03/23 >應該盡可能的和同學互相討論 >不知道可以直接留言詢問同學 >閱讀同學的報告，也可以得到不錯的解答 ## 03/30 ### quiz3 #### `xs` 和 `std::string` >時間並不是那麼重要，應該要觀察 cache misses 的狀況。 >透過實驗來看 cache misses 的狀況 :::info 經過實驗發現到 `std::string` 的 cache misses 為 1.4% ， `xs` 的 cache misses 為 2.1% ，所以 `std::string` 表現較好實際查看 `std::string` 和 `xs` 資料結構相當類似，一樣短字串會存入 stack ，長字串會存入 heap 當中，但 `std::string` 有較多 member 需要存取資料，所以以空間來說 `xs` 表現較好，較省空間。 `std::string` 一樣有 CoW 的表現，對於長字串的複製和 `xs` 有相同表現，短字串也一樣會複製 stack 中的字串。 [Homework3 (quiz3)](https://hackmd.io/@WayneLin1992/rJzpTVImO) ::: ### quiz4 #### 除了 coarse lock 還有什麼發現嗎? :::info 經過我使用 gdb 觀察後修正 coarse lock 的問題，可以避免 data race 的狀況 [Homework4 (quiz4)](https://hackmd.io/@WayneLin1992/r1AIi8hNu) :::