# GPU Benchmark - K80 vs 1080Ti vs 2080Ti
## benchmark
* Docker image: [moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12](https://hub.docker.com/layers/moeaidb/aigo/cu10.1-dnn7.6-gpu-pytorch-cv-19.12/images/sha256-6f14f7424fd9df52abeb83dae0eb5b0b3434cab5c8ce7c2d2f01566c4f4fb7ef)
* The K80 needs compute capability 3.7; TensorFlow in this image was built with support for it.
* Benchmark code: [Horovod examples](https://github.com/horovod/horovod/tree/master/examples)
* All tests run ResNet50 on synthetic data with precision fp32 and batch size 32.
```bash
nvidia-docker run -it --rm --privileged \
moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12 \
horovodrun -np 1 python3 /workspace/horovod_examples/tensorflow2_synthetic_benchmark.py \
--batch-size=32
```
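To reproduce the multi-GPU K80 numbers (the K80 scaling plot covers 1, 2, 4, 8 and 16), the same command can be repeated with a larger `-np`. The loop below is a minimal sketch, not the script used for these results: it assumes every GPU sits in one host (hence `-H localhost:N`), drops `-it` so the output can be piped to `tee`, and the `bench_np<N>.log` file names are hypothetical. By default it only prints the commands; set `EXECUTE=1` to actually run them.

```bash
#!/usr/bin/env bash
# Sketch: sweep the benchmark above over 1, 2, 4, 8 and 16 GPUs on one host.
# Assumptions (not from the original runs): every GPU is in this machine
# (hence -H localhost:N) and the bench_np<N>.log file names are hypothetical.
IMAGE=moeaidb/aigo:cu10.1-dnn7.6-gpu-tf-and-pytorch-nlp-19.12
SCRIPT=/workspace/horovod_examples/tensorflow2_synthetic_benchmark.py

# Print the docker + horovodrun command line for N ranks (one rank per GPU).
# -it is dropped so the output can be piped to tee.
bench_cmd() {
  local np=$1
  echo "nvidia-docker run --rm --privileged $IMAGE" \
       "horovodrun -np $np -H localhost:$np python3 $SCRIPT --batch-size=32"
}

# Dry run by default: only print the commands. Set EXECUTE=1 to run them.
for np in 1 2 4 8 16; do
  if [ "${EXECUTE:-0}" = 1 ]; then
    bash -c "$(bench_cmd "$np")" 2>&1 | tee "bench_np${np}.log"
  else
    bench_cmd "$np"
  fi
done
```

Printing first makes it easy to check the exact `horovodrun` invocation before tying up all GPUs for the full sweep.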
## K80
* 1 x GPU
```bash
[1,0]<stderr>:2020-01-02 03:43:46.755835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10706 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:06:00.0, compute capability: 3.7)
[1,0]<stdout>:Model: ResNet50
[1,0]<stdout>:Batch size: 32
[1,0]<stdout>:Number of GPUs: 1
[1,0]<stdout>:Running warmup...
[1,0]<stderr>:2020-01-02 03:43:56.676184: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
[1,0]<stderr>:2020-01-02 03:43:56.838534: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[1,0]<stdout>:Running benchmark...
[1,0]<stdout>:Iter #0: 53.2 img/sec per GPU
[1,0]<stdout>:Iter #1: 52.9 img/sec per GPU
[1,0]<stdout>:Iter #2: 52.9 img/sec per GPU
[1,0]<stdout>:Iter #3: 52.9 img/sec per GPU
[1,0]<stdout>:Iter #4: 52.9 img/sec per GPU
[1,0]<stdout>:Iter #5: 52.9 img/sec per GPU
[1,0]<stdout>:Iter #6: 52.9 img/sec per GPU
[1,0]<stdout>:Iter #7: 52.8 img/sec per GPU
[1,0]<stdout>:Iter #8: 52.8 img/sec per GPU
[1,0]<stdout>:Iter #9: 52.8 img/sec per GPU
[1,0]<stdout>:Img/sec per GPU: 52.9 +-0.2
[1,0]<stdout>:Total img/sec on 1 GPU(s): 52.9 +-0.2
```
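The two summary lines at the end of each log are derived from the ten `Iter` values: the mean, a `+-` term, and the mean multiplied by the number of GPUs. The sketch below rebuilds them with `awk`, assuming the `+-` term is 1.96 x the population standard deviation of the `Iter` values, which is how the Horovod synthetic benchmark appears to report it and which reproduces every summary line in this post. The function name `summarize` is illustrative, and the `awk` logic is not copied from Horovod.

```bash
#!/usr/bin/env bash
# Rebuild the two summary lines of a benchmark log from its "Iter #k:" lines.
# Assumption: the +- term is 1.96 x the population std of the Iter values
# (this reproduces the summaries printed in the logs of this post).
summarize() {
  awk '
    /Number of GPUs:/ { ngpu = $NF }                  # e.g. "Number of GPUs: 2"
    /Iter #[0-9]+: / { n++; x[n] = $3; sum += $3 }    # field 3 is the img/sec value
    END {
      if (ngpu == "") ngpu = 1                        # fall back to a single GPU
      mean = sum / n
      for (i = 1; i <= n; i++) { d = x[i] - mean; ss += d * d }
      conf = 1.96 * sqrt(ss / n)                      # divide by n, not n - 1
      printf "Img/sec per GPU: %.1f +-%.1f\n", mean, conf
      printf "Total img/sec on %d GPU(s): %.1f +-%.1f\n", ngpu, ngpu * mean, ngpu * conf
    }'
}
```

Piping the full 2-GPU K80 log into `summarize` prints `50.5 +-0.2` per GPU and `100.9 +-0.4` in total, matching the log. The total is computed from the unrounded mean (50.47 x 2), which is why it is 100.9 rather than 101.0.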
* 2 x GPUs (one physical K80 card)
```bash
[1,1]<stderr>:2020-01-02 03:53:53.181006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10706 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:07:00.0, compute capability: 3.7)
[1,0]<stdout>:Model: ResNet50
[1,0]<stdout>:Batch size: 32
[1,0]<stdout>:Number of GPUs: 2
[1,0]<stdout>:Running warmup...
[1,0]<stderr>:2020-01-02 03:54:05.644574: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
[1,0]<stderr>:2020-01-02 03:54:05.807345: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[1,1]<stderr>:2020-01-02 03:54:05.875452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
[1,1]<stderr>:2020-01-02 03:54:06.044797: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[1,0]<stdout>:Running benchmark...
[1,0]<stdout>:Iter #0: 50.4 img/sec per GPU
[1,0]<stdout>:Iter #1: 50.6 img/sec per GPU
[1,0]<stdout>:Iter #2: 50.4 img/sec per GPU
[1,0]<stdout>:Iter #3: 50.6 img/sec per GPU
[1,0]<stdout>:Iter #4: 50.6 img/sec per GPU
[1,0]<stdout>:Iter #5: 50.5 img/sec per GPU
[1,0]<stdout>:Iter #6: 50.5 img/sec per GPU
[1,0]<stdout>:Iter #7: 50.4 img/sec per GPU
[1,0]<stdout>:Iter #8: 50.4 img/sec per GPU
[1,0]<stdout>:Iter #9: 50.3 img/sec per GPU
[1,0]<stdout>:Img/sec per GPU: 50.5 +-0.2
[1,0]<stdout>:Total img/sec on 2 GPU(s): 100.9 +-0.4
```
P.S. One physical K80 card contains *2 x GPUs*, so the result for one physical card is the *2 x GPUs* figure: 100.9 +-0.4 img/sec on 2 GPU(s).
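Because one physical K80 carries two GPUs, it is worth checking how close the card gets to twice the single-GPU rate. Per-GPU throughput drops from 52.9 to 50.5 img/sec when both GPUs run, so the card reaches roughly 95% of ideal 2x scaling. The small helper below (the name `scaling_efficiency` is illustrative) does that arithmetic from the rounded summary values in the logs above.

```bash
#!/usr/bin/env bash
# Scaling efficiency = measured N-GPU total / (N x single-GPU throughput).
# Inputs are the rounded summary values from the K80 logs above.
scaling_efficiency() {
  local single=$1 ngpu=$2 total=$3
  awk -v s="$single" -v n="$ngpu" -v t="$total" \
    'BEGIN { printf "%.1f%%\n", 100 * t / (n * s) }'
}

scaling_efficiency 52.9 2 100.9   # one physical K80 card vs. one of its GPUs
```

This prints `95.4%`. The same call applies to the larger K80 runs in the scaling plot, given their `Total img/sec` values.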
## 1080Ti
```bash
[1,0]<stderr>:2020-01-02 04:36:59.132343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10326 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:2f:00.0, compute capability: 6.1)
[1,0]<stdout>:Model: ResNet50
[1,0]<stdout>:Batch size: 32
[1,0]<stdout>:Number of GPUs: 1
[1,0]<stdout>:Running warmup...
[1,0]<stderr>:2020-01-02 04:37:06.840423: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
[1,0]<stderr>:2020-01-02 04:37:07.092167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[1,0]<stdout>:Running benchmark...
[1,0]<stdout>:Iter #0: 186.5 img/sec per GPU
[1,0]<stdout>:Iter #1: 184.9 img/sec per GPU
[1,0]<stdout>:Iter #2: 185.0 img/sec per GPU
[1,0]<stdout>:Iter #3: 184.9 img/sec per GPU
[1,0]<stdout>:Iter #4: 184.8 img/sec per GPU
[1,0]<stdout>:Iter #5: 184.7 img/sec per GPU
[1,0]<stdout>:Iter #6: 184.7 img/sec per GPU
[1,0]<stdout>:Iter #7: 184.8 img/sec per GPU
[1,0]<stdout>:Iter #8: 185.0 img/sec per GPU
[1,0]<stdout>:Iter #9: 184.9 img/sec per GPU
[1,0]<stdout>:Img/sec per GPU: 185.0 +-1.0
[1,0]<stdout>:Total img/sec on 1 GPU(s): 185.0 +-1.0
```
## 2080Ti
```bash
[1,0]<stderr>:2020-01-02 03:36:10.284464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10123 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
[1,0]<stdout>:Model: ResNet50
[1,0]<stdout>:Batch size: 32
[1,0]<stdout>:Number of GPUs: 1
[1,0]<stdout>:Running warmup...
[1,0]<stderr>:2020-01-02 03:36:24.228832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
[1,0]<stderr>:2020-01-02 03:36:24.473625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[1,0]<stdout>:Running benchmark...
[1,0]<stdout>:Iter #0: 274.4 img/sec per GPU
[1,0]<stdout>:Iter #1: 267.5 img/sec per GPU
[1,0]<stdout>:Iter #2: 265.9 img/sec per GPU
[1,0]<stdout>:Iter #3: 267.1 img/sec per GPU
[1,0]<stdout>:Iter #4: 267.1 img/sec per GPU
[1,0]<stdout>:Iter #5: 267.0 img/sec per GPU
[1,0]<stdout>:Iter #6: 266.6 img/sec per GPU
[1,0]<stdout>:Iter #7: 267.3 img/sec per GPU
[1,0]<stdout>:Iter #8: 265.0 img/sec per GPU
[1,0]<stdout>:Iter #9: 266.2 img/sec per GPU
[1,0]<stdout>:Img/sec per GPU: 267.4 +-4.8
[1,0]<stdout>:Total img/sec on 1 GPU(s): 267.4 +-4.8
```
## plot
* K80 vs 1080Ti vs 2080Ti
![](https://i.imgur.com/fASaGsP.png)
* K80 scaling (1, 2, 4, 8, 16 x GPUs)
![](https://i.imgur.com/EDys3j2.png)
* K80, 16 x GPUs: nvidia-smi output
![](https://i.imgur.com/DiSh8Xf.png)
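The comparison chart can be cross-checked against the `Total img/sec` values in the logs. The sketch below divides each total by a baseline. Using one physical K80 card (2 GPUs, 100.9 img/sec) as the baseline is a choice made here for illustration, following the per-card figure in the P.S.; the function name `relative_to` is likewise hypothetical.

```bash
#!/usr/bin/env bash
# Relative throughput: divide each "name img_per_sec" line on stdin by a baseline.
relative_to() {
  local base=$1
  awk -v base="$base" '{ printf "%-10s %6.1f img/sec  %.2fx\n", $1, $2, $2 / base }'
}

# Totals copied from the logs above (ResNet50, fp32, batch size 32).
# Baseline (assumed for illustration): one physical K80 card = 2 GPUs = 100.9 img/sec.
relative_to 100.9 <<'EOF'
K80-1GPU 52.9
K80-card 100.9
1080Ti   185.0
2080Ti   267.4
EOF
```

Against one K80 card, the 1080Ti comes out at about 1.83x and the 2080Ti at about 2.65x. Called as `echo '2080Ti 267.4' | relative_to 185.0`, it puts the 2080Ti at about 1.45x the 1080Ti.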