Entropy Memcached Evaluation Results
===
###### tags: `contiguitas` `research`
Results are available here, each tab corresponds to one configuration: https://docs.google.com/spreadsheets/d/1mC1wLibg6uouCWoHcHUG6YWODrs7njj4ybENikdcefE/edit?usp=sharing
Ideas I have not tried yet:
* [ ] Sequential access of benchmark
* [ ] Pipelining the benchmark
* [ ] Compare with 64G results
## General Information
* Environment:
* A compiled mainline Linux kernel
* Swapping is disabled, so the memcached server's memory allowance must be smaller than system memory; I set the allowance to 240G.
* The port is set to 8888 instead of the default (11211) to avoid any extra traffic.
* Background threads are turned off
* `perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses --pid=$(pidof memcached)` is started right before running the client and is killed right after the client finishes.
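The counters from `perf stat` can be reduced to a miss ratio after each run. A minimal sketch, assuming the output was saved to a log file and that the counter lines keep perf's usual `value event-name` layout; the sample log and its numbers here are made up for illustration:

```shell
# Hypothetical log; a real run would redirect `perf stat` output to this file.
cat > /tmp/perf_sample.txt <<'EOF'
     1,000,000,000      dTLB-loads
        10,000,000      dTLB-load-misses
EOF

# Strip the thousands separators, then compute the dTLB load-miss ratio.
ratio=$(awk '
  { gsub(/,/, "", $1) }
  $2 == "dTLB-loads"       { loads  = $1 }
  $2 == "dTLB-load-misses" { misses = $1 }
  END { printf "%.2f", 100 * misses / loads }
' /tmp/perf_sample.txt)
echo "dTLB load-miss ratio: ${ratio}%"
```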
The server configuration is thus the same across all trials:
```shell!=
# -c 80000: accept up to 80000 concurrent clients
# -m 245760: cap memory at 240G (the value is in megabytes)
# -M: return an error when memory is exhausted instead of evicting items
# -t 8: use 8 worker threads
# -o no_lru_maintainer,no_lru_crawler: disable the background threads
taskset -c 0-7 \
/u2/kaiwenx/memcached-1.6.17/memcached -p 8888 \
    -c 80000 \
    -m 245760 \
    -M \
    -t 8 \
    -o no_lru_maintainer,no_lru_crawler
```
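As a sanity check on the `-m` value (memcached interprets it in megabytes):

```shell
# 245760 MB expressed in GB; matches the intended 240G allowance.
mb=245760
gb=$(( mb / 1024 ))
echo "memcached memory limit: ${gb} GB"
```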
## `memcached-t1-c1-gaussian`
The results are in the `memcached-t1-c1-gaussian` tab. In this configuration, a single client sends all commands to the server sequentially.
Reason for using a Gaussian key distribution: when memory accesses are concentrated, a system with huge pages benefits more from TLB hits.
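A quick back-of-envelope check on TLB reach, assuming an illustrative ~1536-entry dTLB (real sizes vary by CPU) and the 360M-key / 512B-value workload used in this configuration:

```shell
entries=1536                                   # assumed dTLB size, illustrative
reach_4k_mb=$(( entries * 4 / 1024 ))          # reach with 4KB pages, in MB
reach_2m_gb=$(( entries * 2 / 1024 ))          # reach with 2MB pages, in GB
data_gb=$(( 360000000 * 512 / 1024 / 1024 / 1024 ))
echo "4KB-page TLB reach: ${reach_4k_mb} MB"
echo "2MB-page TLB reach: ${reach_2m_gb} GB"
echo "value data in this workload: ~${data_gb} GB"
```

Either way the dataset dwarfs the TLB reach, but a concentrated (Gaussian) access pattern keeps the hot keys within reach far more often, especially with huge pages.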
Populate:
```shell!=
taskset -c 8-15 memtier_benchmark \
-p 8888 \
-P memcache_binary \
-n 'allkeys' \
-c 500 \
-t 8 \
--pipeline=100 \
--ratio=1:0 \
--data-size-pattern=R \
--data-size=512 \
--key-maximum=360000000 \
--key-pattern=P:P
```
Client:
```shell!=
taskset -c 8-15 memtier_benchmark \
-n 5000000 \
-p 8888 \
-P memcache_binary \
-c 1 \
-t 1 \
--ratio=0:1 \
--data-size=512 \
--key-maximum=360000000 \
--key-pattern=G:G
```
## `memcached-t1-c1-uniform`
Same as `memcached-t1-c1-gaussian`, except that the benchmark's key accesses are uniformly distributed.
Populate:
```shell!=
taskset -c 8-15 memtier_benchmark \
-p 8888 \
-P memcache_binary \
-n 'allkeys' \
-c 500 \
-t 8 \
--pipeline=100 \
--ratio=1:0 \
--data-size-pattern=R \
--data-size=512 \
--key-maximum=360000000 \
--key-pattern=P:P
```
Client:
```shell!=
taskset -c 8-15 memtier_benchmark \
-n 5000000 \
-p 8888 \
-P memcache_binary \
-c 1 \
-t 1 \
--ratio=0:1 \
--data-size=512 \
--key-maximum=360000000 \
--key-pattern=R:R
```
## `memcached-t1-c1-gaussian-obj4096`
Using larger objects (larger than 4KB) forces a system without huge pages to spread each object across multiple base pages, wasting TLB entries.
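Rough arithmetic for the 4KB case, assuming ~56 bytes of per-item header (the real overhead depends on the memcached build and the key length):

```shell
page=4096
value=4096
overhead=56                             # assumed per-item metadata, illustrative
item=$(( value + overhead ))
pages=$(( (item + page - 1) / page ))   # ceiling division
echo "each ${value}B value spans ${pages} base pages"
```

With 2MB huge pages, the same item always fits within a single page's TLB entry.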
Populate:
```shell!=
taskset -c 8-15 memtier_benchmark \
-p 8888 \
-P memcache_binary \
-n 'allkeys' \
-c 500 \
-t 8 \
--pipeline=100 \
--ratio=1:0 \
--data-size-pattern=R \
--data-size=4096 \
--key-maximum=50000000 \
--key-pattern=P:P
```
Client:
```shell!=
taskset -c 8-15 memtier_benchmark \
-n 5000000 \
-p 8888 \
-P memcache_binary \
-c 1 \
-t 1 \
--ratio=0:1 \
--data-size=4096 \
--key-maximum=50000000 \
--key-pattern=G:G
```
## `memcached-t8-c100-gaussian-obj4096`
Concurrent access is closer to a production workload and may incur more TLB misses.
Populate:
```shell!=
taskset -c 8-15 memtier_benchmark \
-p 8888 \
-P memcache_binary \
-n 'allkeys' \
-c 500 \
-t 8 \
--pipeline=100 \
--ratio=1:0 \
--data-size-pattern=R \
--data-size=4096 \
--key-maximum=50000000 \
--key-pattern=P:P
```
Client:
```shell!=
taskset -c 8-15 memtier_benchmark \
-n 100000 \
-p 8888 \
-P memcache_binary \
-c 100 \
-t 8 \
--ratio=0:1 \
--data-size=4096 \
--key-maximum=50000000 \
--key-pattern=G:G
```
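For scale, the client-side totals implied by these flags (`-n` in memtier_benchmark is requests per client):

```shell
threads=8; clients=100; requests=100000   # -t, -c, -n from the client command
echo "concurrent connections: $(( threads * clients ))"
echo "total GET requests: $(( threads * clients * requests ))"
```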