rpmalloc benchmark
===

contributed by <`chaingfar`>

System environment
----

```
$ uname -a
Linux chasingjar-V5-591G 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Stepping:              3
CPU MHz:               800.007
CPU max MHz:           3500.0000
CPU min MHz:           800.0000
BogoMIPS:              5183.88
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

$ free
              total        used        free      shared  buff/cache   available
Mem:        8029896     2599920     3639168      386196     1790808     4751816
Swap:       3999740           0     3999740
```

Reproducing [rpmalloc-benchmark](https://github.com/rampantpixels/rpmalloc-benchmark)
---

First, try to reproduce the benchmark provided by the author. The build steps are listed below. You may need to install `ninja` first with `sudo apt-get install ninja-build`.

```bash
$ git clone https://github.com/rampantpixels/rpmalloc-benchmark.git
$ cd rpmalloc-benchmark/
$ ./configure.py
$ ninja
```

This completes the build, and the executables are placed under `bin/`. The bundled `runall.sh` runs a variety of tests against `rpmalloc`. To compare it with other `malloc` implementations, you have to write your own `test.sh`:

```sh
#!/bin/sh
# Run every benchmark binary (one per allocator) with 1 to 10 threads.
# The last two arguments (16 1000) appear to be the min/max allocation size,
# matching the 16-1000 range below.
for executable in bin/linux/release/x86-64/benchmark-*; do
    for threads_count in $(seq 1 10); do
        "$executable" "$threads_count" 0 0 2 20000 50000 5000 16 1000
    done
done
```

Running the script produces a large number of files. Each file is named `benchmark-random-<thread_count>-<min_size>-<max_size>-<benchmark_name>.txt` and contains `<memory_ops>,<peak_allocated>,<sample_allocated>,<memory_usage>`.

### Random size range 16-1000

**Number of memory operations (memory_ops)**

![](https://i.imgur.com/Hlo14V5.png)

The number of memory operations drops roughly exponentially as the thread count grows. With 1 to 4 threads, `rpmalloc` and `lockfree-malloc` are clearly higher than `jemalloc`, `tcmalloc` and `supermalloc`, but the gap narrows as the thread count increases.

**Peak memory usage (peak_allocated)**

![](https://i.imgur.com/lQjgHKn.png)

`lockfree-malloc` is fast, but it also wastes a lot of memory.

**Overhead memory rate ($\frac{memory\_usage - sample\_allocated}{sample\_allocated} \times 100\%$)**

![](https://i.imgur.com/HmlmdK4.png)

### Random size range 16-8000

**Number of memory operations (memory_ops)**

![](https://i.imgur.com/y4isOwV.png)

**Peak memory usage (peak_allocated)**

![](https://i.imgur.com/rliPy1j.png)

**Overhead memory rate ($\frac{memory\_usage - sample\_allocated}{sample\_allocated} \times 100\%$)**

![](https://i.imgur.com/b2moFY4.png)

### Random size range 16-16000

**Number of memory operations (memory_ops)**

![](https://i.imgur.com/cpfSUPn.png)

**Peak memory usage (peak_allocated)**

![](https://i.imgur.com/0kFP0Tc.png)

**Overhead memory rate ($\frac{memory\_usage - sample\_allocated}{sample\_allocated} \times 100\%$)**

![](https://i.imgur.com/tajEba6.png)

### Random size range 128-64000

**Number of memory operations (memory_ops)**

![](https://i.imgur.com/t8kMtRS.png)

**Peak memory usage (peak_allocated)**

![](https://i.imgur.com/uNwbp6w.png)

**Overhead memory rate ($\frac{memory\_usage - sample\_allocated}{sample\_allocated} \times 100\%$)**

![](https://i.imgur.com/6P4EjSB.png)

### Random size range 512-160000

**Number of memory operations (memory_ops)**

![](https://i.imgur.com/CimA1Vs.png)

**Peak memory usage (peak_allocated)**

![](https://i.imgur.com/Spv0d5k.png)

**Overhead memory rate ($\frac{memory\_usage - sample\_allocated}{sample\_allocated} \times 100\%$)**

![](https://i.imgur.com/QCU3xDG.png)
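The overhead memory rate plotted for each size range can be recomputed directly from the benchmark result files. `overhead_rate` below is a hypothetical helper written for this note and is not part of rpmalloc-benchmark. It makes three assumptions:

* The result files sit in the current directory.
* They follow the `benchmark-random-<thread_count>-<min_size>-<max_size>-<benchmark_name>.txt` naming scheme.
* Each file holds a single `memory_ops,peak_allocated,sample_allocated,memory_usage` line.

```sh
#!/bin/sh
# overhead_rate MIN MAX
# Hypothetical helper (not part of rpmalloc-benchmark).
# Prints "<benchmark_name> <thread_count> <rate>%" for every result file of
# one size range in the current directory. Assumes files are named
#   benchmark-random-<thread_count>-<min_size>-<max_size>-<benchmark_name>.txt
# and contain one line: memory_ops,peak_allocated,sample_allocated,memory_usage
overhead_rate() {
    for f in benchmark-random-*-"$1"-"$2"-*.txt; do
        [ -e "$f" ] || continue                # pattern matched no file
        base=${f%.txt}
        threads=$(echo "$base" | cut -d- -f3)  # 3rd dash-separated field
        name=$(echo "$base" | cut -d- -f6-)    # name may itself contain '-'
        awk -F, -v name="$name" -v threads="$threads" '
            NF >= 4 && $3 > 0 {
                # (memory_usage - sample_allocated) / sample_allocated * 100%
                printf "%s %s %.2f%%\n", name, threads, ($4 - $3) / $3 * 100
            }' "$f"
    done
}
```

For example, `overhead_rate 16 1000 | sort -k1,1 -k2,2n` lists the 16-1000 results grouped by allocator and ordered by thread count. The thread count and allocator name come from the filename because the file body, as described above, only holds the four numbers.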