# Multi-GPU Benchmark: DGX-1 (8x Tesla V100, NVLink), MXNet, NVIDIA Container 17.09, CIFAR-10 Dataset (2017/10/07)
* [CIFAR10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
* Dataset download links: [cifar10_train.rec](http://data.mxnet.io/data/cifar10/cifar10_train.rec), [cifar10_val.rec](http://data.mxnet.io/data/cifar10/cifar10_val.rec)
* [Usage notes for MXNet's CIFAR-10 sample code](https://github.com/apache/incubator-mxnet/tree/master/example/image-classification)
* NVIDIA's MXNET CIFAR10 Benchmark (for Tesla P100)
![P100 CIFAR10 Benchmark](https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/mxnet/data-center-gpu-ready-app-mxnet-chart-cifar10-843-u.jpg)
The figure above is from [NVIDIA's GPU-accelerated applications page for MXNet](https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/mxnet/).
* The following tests were run directly inside NVIDIA's Docker container (image: `nvcr.io/nvidia/mxnet:17.09`). Python and MXNet versions:
```bash
root@5480a386d45f:/opt/mxnet/example/image-classification# python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
>>> mxnet.__version__
'0.11.0'
```
## Procedure of this Benchmark
1. Start the MXNet Docker container:
```bash
nvidia-docker run -it -p 8888:8888 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --name mxnet --rm -v /raid/data:/workspace/examples/image-classification/data:cached -v /raid:/raid:cached nvcr.io/nvidia/mxnet:17.09
```
2. Run the multi-GPU tests (ResNet-110, batch size 128 per GPU, 10 epochs, FP16 and FP32):
```bash
#!/bin/bash
cd /workspace/examples/image-classification
framework=mxnet_nv_1709
batch_size=128
for precision in float16 float32
do
for num_gpu in 1 2 4 8
do
echo "number of GPU=$num_gpu"
echo "precision=$precision"
echo "batch size per GPU=$batch_size"
python train_cifar10.py \
--network resnet \
--num-layers 110 \
--data-nthreads 20 \
--batch-size $(($batch_size*$num_gpu)) \
--gpus $(seq -s , 0 $(($num_gpu-1))) \
--num-epochs 10 \
--dtype $precision \
2> /raid/test_${framework}_numGPU${num_gpu}_${precision}.txt
done
done
```
```bash
#!/bin/bash
# quick fix: use GPU0,1,3,4 for DGX-1
cd /workspace/examples/image-classification
framework=mxnet_nv_1709
batch_size=128
for precision in float16 float32
do
for num_gpu in 4
do
echo "number of GPU=$num_gpu"
echo "precision=$precision"
echo "batch size per GPU=$batch_size"
python train_cifar10.py \
--network resnet \
--num-layers 110 \
--data-nthreads 20 \
--batch-size $(($batch_size*$num_gpu)) \
--gpus 0,1,3,4 \
--num-epochs 10 \
--dtype $precision \
2> /raid/test_${framework}_numGPU${num_gpu}_${precision}_topology_considered.txt
done
done
```
3. Show the test results:
```bash
#!/bin/bash
framework=mxnet_nv_1709
for num_gpu in 1 2 4 8
do
for precision in float16 float32
do
echo "number of GPU=$num_gpu"
echo "precision=$precision"
tail -n 100 /raid/test_${framework}_numGPU${num_gpu}_${precision}.txt
echo ""
done
done
```
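The `tail` output is long, so a per-file summary can make the runs easier to compare. The following is a minimal sketch, not part of the original benchmark: it assumes the log lines have the `Speed: ... samples/sec` format shown in the results below, and the helper names `parse_speeds` and `mean_speed` are hypothetical.

```python
import re

# Matches lines such as:
# INFO:root:Epoch[5] Batch [220] Speed: 1636.16 samples/sec accuracy=0.735547
SPEED_RE = re.compile(
    r"Epoch\[(\d+)\]\s+Batch \[(\d+)\]\s+Speed:\s+([\d.]+)\s+samples/sec"
)


def parse_speeds(lines, skip_epochs=1):
    """Return Speed values (samples/sec), ignoring the first `skip_epochs` epochs.

    Skipping early epochs is an assumption made here to exclude warm-up
    effects (for example cuDNN autotuning) from the summary.
    """
    speeds = []
    for line in lines:
        match = SPEED_RE.search(line)
        if match and int(match.group(1)) >= skip_epochs:
            speeds.append(float(match.group(3)))
    return speeds


def mean_speed(path, skip_epochs=1):
    """Average throughput of one log file, or None if no Speed lines were found."""
    with open(path) as handle:
        speeds = parse_speeds(handle, skip_epochs)
    return sum(speeds) / len(speeds) if speeds else None
```

With the file naming used by the scripts above, a call such as `mean_speed("/raid/test_mxnet_nv_1709_numGPU1_float16.txt")` would return the average samples/sec for the 1-GPU FP16 run.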
## Results of this Benchmark
### number of GPU=1, precision=FP16
```
INFO:root:Epoch[5] Batch [220] Speed: 1636.16 samples/sec accuracy=0.735547
INFO:root:Epoch[5] Batch [240] Speed: 1638.92 samples/sec accuracy=0.718359
INFO:root:Epoch[5] Batch [260] Speed: 1664.47 samples/sec accuracy=0.725781
INFO:root:Epoch[5] Batch [280] Speed: 1651.72 samples/sec accuracy=0.708203
INFO:root:Epoch[5] Batch [300] Speed: 1659.69 samples/sec accuracy=0.726172
INFO:root:Epoch[5] Batch [320] Speed: 1676.59 samples/sec accuracy=0.739844
INFO:root:Epoch[5] Batch [340] Speed: 1652.22 samples/sec accuracy=0.741406
INFO:root:Epoch[5] Batch [360] Speed: 1657.70 samples/sec accuracy=0.743750
INFO:root:Epoch[5] Batch [380] Speed: 1636.72 samples/sec accuracy=0.720703
INFO:root:Epoch[5] Train-accuracy=0.741319
INFO:root:Epoch[5] Time cost=30.112
INFO:root:Epoch[5] Validation-accuracy=0.690505
INFO:root:Epoch[6] Batch [20] Speed: 1679.44 samples/sec accuracy=0.738095
INFO:root:Epoch[6] Batch [40] Speed: 1679.10 samples/sec accuracy=0.729297
INFO:root:Epoch[6] Batch [60] Speed: 1642.05 samples/sec accuracy=0.725781
INFO:root:Epoch[6] Batch [80] Speed: 1644.09 samples/sec accuracy=0.745313
INFO:root:Epoch[6] Batch [100] Speed: 1638.36 samples/sec accuracy=0.750391
INFO:root:Epoch[6] Batch [120] Speed: 1670.52 samples/sec accuracy=0.734766
INFO:root:Epoch[6] Batch [140] Speed: 1659.12 samples/sec accuracy=0.751563
INFO:root:Epoch[6] Batch [160] Speed: 1652.69 samples/sec accuracy=0.734375
INFO:root:Epoch[6] Batch [180] Speed: 1660.45 samples/sec accuracy=0.751953
INFO:root:Epoch[6] Batch [200] Speed: 1664.72 samples/sec accuracy=0.728906
INFO:root:Epoch[6] Batch [220] Speed: 1669.46 samples/sec accuracy=0.743750
INFO:root:Epoch[6] Batch [240] Speed: 1666.31 samples/sec accuracy=0.749219
INFO:root:Epoch[6] Batch [260] Speed: 1644.34 samples/sec accuracy=0.745703
INFO:root:Epoch[6] Batch [280] Speed: 1656.35 samples/sec accuracy=0.750391
INFO:root:Epoch[6] Batch [300] Speed: 1642.49 samples/sec accuracy=0.743359
INFO:root:Epoch[6] Batch [320] Speed: 1659.47 samples/sec accuracy=0.769141
INFO:root:Epoch[6] Batch [340] Speed: 1647.00 samples/sec accuracy=0.753516
INFO:root:Epoch[6] Batch [360] Speed: 1667.83 samples/sec accuracy=0.762891
INFO:root:Epoch[6] Batch [380] Speed: 1665.73 samples/sec accuracy=0.741016
INFO:root:Epoch[6] Train-accuracy=0.757812
INFO:root:Epoch[6] Time cost=30.123
INFO:root:Epoch[6] Validation-accuracy=0.767127
INFO:root:Epoch[7] Batch [20] Speed: 1678.21 samples/sec accuracy=0.748512
INFO:root:Epoch[7] Batch [40] Speed: 1678.75 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Batch [60] Speed: 1684.53 samples/sec accuracy=0.755078
INFO:root:Epoch[7] Batch [80] Speed: 1678.94 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Batch [100] Speed: 1657.94 samples/sec accuracy=0.767188
INFO:root:Epoch[7] Batch [120] Speed: 1669.34 samples/sec accuracy=0.750781
INFO:root:Epoch[7] Batch [140] Speed: 1650.13 samples/sec accuracy=0.766797
INFO:root:Epoch[7] Batch [160] Speed: 1647.68 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Batch [180] Speed: 1653.50 samples/sec accuracy=0.759766
INFO:root:Epoch[7] Batch [200] Speed: 1671.70 samples/sec accuracy=0.747656
INFO:root:Epoch[7] Batch [220] Speed: 1659.03 samples/sec accuracy=0.757812
INFO:root:Epoch[7] Batch [240] Speed: 1648.86 samples/sec accuracy=0.760156
INFO:root:Epoch[7] Batch [260] Speed: 1660.06 samples/sec accuracy=0.752344
INFO:root:Epoch[7] Batch [280] Speed: 1659.86 samples/sec accuracy=0.765625
INFO:root:Epoch[7] Batch [300] Speed: 1661.02 samples/sec accuracy=0.758984
INFO:root:Epoch[7] Batch [320] Speed: 1669.57 samples/sec accuracy=0.769531
INFO:root:Epoch[7] Batch [340] Speed: 1661.76 samples/sec accuracy=0.771875
INFO:root:Epoch[7] Batch [360] Speed: 1674.96 samples/sec accuracy=0.768750
INFO:root:Epoch[7] Batch [380] Speed: 1712.63 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Train-accuracy=0.762153
INFO:root:Epoch[7] Time cost=29.877
INFO:root:Epoch[7] Validation-accuracy=0.782051
INFO:root:Epoch[8] Batch [20] Speed: 1681.09 samples/sec accuracy=0.764881
INFO:root:Epoch[8] Batch [40] Speed: 1639.31 samples/sec accuracy=0.761328
INFO:root:Epoch[8] Batch [60] Speed: 1637.03 samples/sec accuracy=0.770312
INFO:root:Epoch[8] Batch [80] Speed: 1661.73 samples/sec accuracy=0.771094
INFO:root:Epoch[8] Batch [100] Speed: 1664.35 samples/sec accuracy=0.757031
INFO:root:Epoch[8] Batch [120] Speed: 1647.86 samples/sec accuracy=0.753516
INFO:root:Epoch[8] Batch [140] Speed: 1654.57 samples/sec accuracy=0.769922
INFO:root:Epoch[8] Batch [160] Speed: 1655.10 samples/sec accuracy=0.778906
INFO:root:Epoch[8] Batch [180] Speed: 1657.54 samples/sec accuracy=0.761719
INFO:root:Epoch[8] Batch [200] Speed: 1661.52 samples/sec accuracy=0.778906
INFO:root:Epoch[8] Batch [220] Speed: 1638.52 samples/sec accuracy=0.760156
INFO:root:Epoch[8] Batch [240] Speed: 1650.54 samples/sec accuracy=0.771484
INFO:root:Epoch[8] Batch [260] Speed: 1648.50 samples/sec accuracy=0.768750
INFO:root:Epoch[8] Batch [280] Speed: 1641.08 samples/sec accuracy=0.776563
INFO:root:Epoch[8] Batch [300] Speed: 1637.01 samples/sec accuracy=0.776563
INFO:root:Epoch[8] Batch [320] Speed: 1647.48 samples/sec accuracy=0.780469
INFO:root:Epoch[8] Batch [340] Speed: 1633.75 samples/sec accuracy=0.784375
INFO:root:Epoch[8] Batch [360] Speed: 1654.11 samples/sec accuracy=0.781641
INFO:root:Epoch[8] Batch [380] Speed: 1657.40 samples/sec accuracy=0.757031
INFO:root:Epoch[8] Train-accuracy=0.779687
INFO:root:Epoch[8] Time cost=30.315
INFO:root:Epoch[8] Validation-accuracy=0.779371
INFO:root:Epoch[9] Batch [20] Speed: 1657.51 samples/sec accuracy=0.773065
INFO:root:Epoch[9] Batch [40] Speed: 1666.89 samples/sec accuracy=0.768359
INFO:root:Epoch[9] Batch [60] Speed: 1663.53 samples/sec accuracy=0.780859
INFO:root:Epoch[9] Batch [80] Speed: 1658.68 samples/sec accuracy=0.778125
INFO:root:Epoch[9] Batch [100] Speed: 1661.43 samples/sec accuracy=0.779297
INFO:root:Epoch[9] Batch [120] Speed: 1662.35 samples/sec accuracy=0.762500
INFO:root:Epoch[9] Batch [140] Speed: 1620.34 samples/sec accuracy=0.785547
INFO:root:Epoch[9] Batch [160] Speed: 1639.00 samples/sec accuracy=0.800000
INFO:root:Epoch[9] Batch [180] Speed: 1665.13 samples/sec accuracy=0.773828
INFO:root:Epoch[9] Batch [200] Speed: 1653.24 samples/sec accuracy=0.766406
INFO:root:Epoch[9] Batch [220] Speed: 1641.22 samples/sec accuracy=0.776172
INFO:root:Epoch[9] Batch [240] Speed: 1635.84 samples/sec accuracy=0.787500
INFO:root:Epoch[9] Batch [260] Speed: 1662.29 samples/sec accuracy=0.784375
INFO:root:Epoch[9] Batch [280] Speed: 1670.61 samples/sec accuracy=0.777344
INFO:root:Epoch[9] Batch [300] Speed: 1665.69 samples/sec accuracy=0.785156
INFO:root:Epoch[9] Batch [320] Speed: 1643.91 samples/sec accuracy=0.786719
INFO:root:Epoch[9] Batch [340] Speed: 1658.62 samples/sec accuracy=0.792188
INFO:root:Epoch[9] Batch [360] Speed: 1675.86 samples/sec accuracy=0.792188
INFO:root:Epoch[9] Batch [380] Speed: 1660.30 samples/sec accuracy=0.774609
INFO:root:Epoch[9] Train-accuracy=0.790625
INFO:root:Epoch[9] Time cost=30.181
INFO:root:Epoch[9] Validation-accuracy=0.793870
```
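As a sanity check, the logged `Time cost` per epoch should roughly equal the number of training images divided by the throughput. The following is a minimal sketch of that arithmetic, assuming 50,000 training images per epoch (the size of the CIFAR-10 training set); the function name `expected_epoch_seconds` is hypothetical.

```python
def expected_epoch_seconds(samples_per_sec, num_examples=50000):
    """Estimate the duration of one epoch from the logged throughput."""
    if samples_per_sec <= 0:
        raise ValueError("samples_per_sec must be positive")
    return num_examples / float(samples_per_sec)


# 1660 samples/sec is used as a round value within the range logged above.
print(round(expected_epoch_seconds(1660.0), 1))  # prints 30.1
```

The estimate of about 30 s is in line with the `Time cost` values of roughly 30 s per epoch in the log above. It ignores any per-epoch overhead, so it is only an approximation.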
### number of GPU=1, precision=FP32
```
INFO:root:Epoch[5] Batch [220] Speed: 2107.87 samples/sec accuracy=0.717578
INFO:root:Epoch[5] Batch [240] Speed: 2117.73 samples/sec accuracy=0.730469
INFO:root:Epoch[5] Batch [260] Speed: 2080.35 samples/sec accuracy=0.737109
INFO:root:Epoch[5] Batch [280] Speed: 2112.54 samples/sec accuracy=0.717969
INFO:root:Epoch[5] Batch [300] Speed: 2074.36 samples/sec accuracy=0.728125
INFO:root:Epoch[5] Batch [320] Speed: 2121.36 samples/sec accuracy=0.732812
INFO:root:Epoch[5] Batch [340] Speed: 2123.88 samples/sec accuracy=0.741406
INFO:root:Epoch[5] Batch [360] Speed: 2079.25 samples/sec accuracy=0.745703
INFO:root:Epoch[5] Batch [380] Speed: 2099.69 samples/sec accuracy=0.725391
INFO:root:Epoch[5] Train-accuracy=0.731771
INFO:root:Epoch[5] Time cost=23.639
INFO:root:Epoch[5] Validation-accuracy=0.720653
INFO:root:Epoch[6] Batch [20] Speed: 2077.08 samples/sec accuracy=0.738839
INFO:root:Epoch[6] Batch [40] Speed: 2097.13 samples/sec accuracy=0.739062
INFO:root:Epoch[6] Batch [60] Speed: 2095.61 samples/sec accuracy=0.737891
INFO:root:Epoch[6] Batch [80] Speed: 2107.52 samples/sec accuracy=0.727344
INFO:root:Epoch[6] Batch [100] Speed: 2108.11 samples/sec accuracy=0.737891
INFO:root:Epoch[6] Batch [120] Speed: 2121.61 samples/sec accuracy=0.723828
INFO:root:Epoch[6] Batch [140] Speed: 2119.10 samples/sec accuracy=0.745703
INFO:root:Epoch[6] Batch [160] Speed: 2108.44 samples/sec accuracy=0.758594
INFO:root:Epoch[6] Batch [180] Speed: 2116.13 samples/sec accuracy=0.731641
INFO:root:Epoch[6] Batch [200] Speed: 2101.70 samples/sec accuracy=0.740234
INFO:root:Epoch[6] Batch [220] Speed: 2108.43 samples/sec accuracy=0.750000
INFO:root:Epoch[6] Batch [240] Speed: 2090.88 samples/sec accuracy=0.749219
INFO:root:Epoch[6] Batch [260] Speed: 2148.97 samples/sec accuracy=0.755469
INFO:root:Epoch[6] Batch [280] Speed: 2122.69 samples/sec accuracy=0.750781
INFO:root:Epoch[6] Batch [300] Speed: 2139.50 samples/sec accuracy=0.756250
INFO:root:Epoch[6] Batch [320] Speed: 2116.02 samples/sec accuracy=0.764844
INFO:root:Epoch[6] Batch [340] Speed: 2111.96 samples/sec accuracy=0.762891
INFO:root:Epoch[6] Batch [360] Speed: 2125.57 samples/sec accuracy=0.762891
INFO:root:Epoch[6] Batch [380] Speed: 2101.92 samples/sec accuracy=0.753906
INFO:root:Epoch[6] Train-accuracy=0.737500
INFO:root:Epoch[6] Time cost=23.664
INFO:root:Epoch[6] Validation-accuracy=0.734776
INFO:root:Epoch[7] Batch [20] Speed: 2133.05 samples/sec accuracy=0.750744
INFO:root:Epoch[7] Batch [40] Speed: 2104.17 samples/sec accuracy=0.758594
INFO:root:Epoch[7] Batch [60] Speed: 2086.90 samples/sec accuracy=0.746094
INFO:root:Epoch[7] Batch [80] Speed: 2095.71 samples/sec accuracy=0.747656
INFO:root:Epoch[7] Batch [100] Speed: 2089.72 samples/sec accuracy=0.756641
INFO:root:Epoch[7] Batch [120] Speed: 2117.08 samples/sec accuracy=0.749609
INFO:root:Epoch[7] Batch [140] Speed: 2111.42 samples/sec accuracy=0.763672
INFO:root:Epoch[7] Batch [160] Speed: 2104.23 samples/sec accuracy=0.769141
INFO:root:Epoch[7] Batch [180] Speed: 2117.99 samples/sec accuracy=0.755078
INFO:root:Epoch[7] Batch [200] Speed: 2121.33 samples/sec accuracy=0.760156
INFO:root:Epoch[7] Batch [220] Speed: 2106.63 samples/sec accuracy=0.758984
INFO:root:Epoch[7] Batch [240] Speed: 2102.36 samples/sec accuracy=0.762891
INFO:root:Epoch[7] Batch [260] Speed: 2096.69 samples/sec accuracy=0.767188
INFO:root:Epoch[7] Batch [280] Speed: 2099.93 samples/sec accuracy=0.766016
INFO:root:Epoch[7] Batch [300] Speed: 2113.95 samples/sec accuracy=0.764062
INFO:root:Epoch[7] Batch [320] Speed: 2103.36 samples/sec accuracy=0.781641
INFO:root:Epoch[7] Batch [340] Speed: 2116.70 samples/sec accuracy=0.769141
INFO:root:Epoch[7] Batch [360] Speed: 2106.37 samples/sec accuracy=0.773438
INFO:root:Epoch[7] Batch [380] Speed: 2109.77 samples/sec accuracy=0.755078
INFO:root:Epoch[7] Train-accuracy=0.771701
INFO:root:Epoch[7] Time cost=23.665
INFO:root:Epoch[7] Validation-accuracy=0.760216
INFO:root:Epoch[8] Batch [20] Speed: 2115.81 samples/sec accuracy=0.774554
INFO:root:Epoch[8] Batch [40] Speed: 2105.95 samples/sec accuracy=0.766016
INFO:root:Epoch[8] Batch [60] Speed: 2109.37 samples/sec accuracy=0.764844
INFO:root:Epoch[8] Batch [80] Speed: 2094.50 samples/sec accuracy=0.771875
INFO:root:Epoch[8] Batch [100] Speed: 2126.98 samples/sec accuracy=0.773828
INFO:root:Epoch[8] Batch [120] Speed: 2119.86 samples/sec accuracy=0.771875
INFO:root:Epoch[8] Batch [140] Speed: 2126.34 samples/sec accuracy=0.779297
INFO:root:Epoch[8] Batch [160] Speed: 2122.38 samples/sec accuracy=0.783203
INFO:root:Epoch[8] Batch [180] Speed: 2080.03 samples/sec accuracy=0.766406
INFO:root:Epoch[8] Batch [200] Speed: 2090.88 samples/sec accuracy=0.775781
INFO:root:Epoch[8] Batch [220] Speed: 2103.53 samples/sec accuracy=0.769531
INFO:root:Epoch[8] Batch [240] Speed: 2096.92 samples/sec accuracy=0.773828
INFO:root:Epoch[8] Batch [260] Speed: 2085.13 samples/sec accuracy=0.780078
INFO:root:Epoch[8] Batch [280] Speed: 2109.69 samples/sec accuracy=0.769922
INFO:root:Epoch[8] Batch [300] Speed: 2122.65 samples/sec accuracy=0.782031
INFO:root:Epoch[8] Batch [320] Speed: 2116.10 samples/sec accuracy=0.785156
INFO:root:Epoch[8] Batch [340] Speed: 2099.77 samples/sec accuracy=0.794141
INFO:root:Epoch[8] Batch [360] Speed: 2117.69 samples/sec accuracy=0.783984
INFO:root:Epoch[8] Batch [380] Speed: 2106.19 samples/sec accuracy=0.776172
INFO:root:Epoch[8] Train-accuracy=0.775781
INFO:root:Epoch[8] Time cost=23.761
INFO:root:Epoch[8] Validation-accuracy=0.761966
INFO:root:Epoch[9] Batch [20] Speed: 2139.77 samples/sec accuracy=0.779762
INFO:root:Epoch[9] Batch [40] Speed: 2131.84 samples/sec accuracy=0.783984
INFO:root:Epoch[9] Batch [60] Speed: 2124.51 samples/sec accuracy=0.780469
INFO:root:Epoch[9] Batch [80] Speed: 2088.46 samples/sec accuracy=0.770703
INFO:root:Epoch[9] Batch [100] Speed: 2116.31 samples/sec accuracy=0.790234
INFO:root:Epoch[9] Batch [120] Speed: 2102.47 samples/sec accuracy=0.766016
INFO:root:Epoch[9] Batch [140] Speed: 2107.50 samples/sec accuracy=0.776953
INFO:root:Epoch[9] Batch [160] Speed: 2107.15 samples/sec accuracy=0.785937
INFO:root:Epoch[9] Batch [180] Speed: 2113.62 samples/sec accuracy=0.778906
INFO:root:Epoch[9] Batch [200] Speed: 2078.67 samples/sec accuracy=0.781250
INFO:root:Epoch[9] Batch [220] Speed: 2064.87 samples/sec accuracy=0.784766
INFO:root:Epoch[9] Batch [240] Speed: 2076.29 samples/sec accuracy=0.788281
INFO:root:Epoch[9] Batch [260] Speed: 2026.08 samples/sec accuracy=0.792969
INFO:root:Epoch[9] Batch [280] Speed: 2076.42 samples/sec accuracy=0.790234
INFO:root:Epoch[9] Batch [300] Speed: 2110.56 samples/sec accuracy=0.782813
INFO:root:Epoch[9] Batch [320] Speed: 2102.36 samples/sec accuracy=0.797656
INFO:root:Epoch[9] Batch [340] Speed: 2107.22 samples/sec accuracy=0.795703
INFO:root:Epoch[9] Batch [360] Speed: 2116.33 samples/sec accuracy=0.796484
INFO:root:Epoch[9] Batch [380] Speed: 2108.32 samples/sec accuracy=0.787891
INFO:root:Epoch[9] Train-accuracy=0.781250
INFO:root:Epoch[9] Time cost=23.804
INFO:root:Epoch[9] Validation-accuracy=0.785457
```
### number of GPU=2, precision=FP16
```
INFO:root:Epoch[1] Batch [180] Speed: 2901.34 samples/sec accuracy=0.461328
INFO:root:Epoch[1] Train-accuracy=0.453962
INFO:root:Epoch[1] Time cost=17.190
INFO:root:Epoch[1] Validation-accuracy=0.486278
INFO:root:Epoch[2] Batch [20] Speed: 2892.09 samples/sec accuracy=0.457961
INFO:root:Epoch[2] Batch [40] Speed: 2889.07 samples/sec accuracy=0.471289
INFO:root:Epoch[2] Batch [60] Speed: 2893.72 samples/sec accuracy=0.483008
INFO:root:Epoch[2] Batch [80] Speed: 2881.23 samples/sec accuracy=0.487109
INFO:root:Epoch[2] Batch [100] Speed: 2842.24 samples/sec accuracy=0.496680
INFO:root:Epoch[2] Batch [120] Speed: 2865.04 samples/sec accuracy=0.508398
INFO:root:Epoch[2] Batch [140] Speed: 2909.31 samples/sec accuracy=0.521094
INFO:root:Epoch[2] Batch [160] Speed: 2923.13 samples/sec accuracy=0.537500
INFO:root:Epoch[2] Batch [180] Speed: 2864.52 samples/sec accuracy=0.539258
INFO:root:Epoch[2] Train-accuracy=0.527902
INFO:root:Epoch[2] Time cost=17.268
INFO:root:Epoch[2] Validation-accuracy=0.548478
INFO:root:Epoch[3] Batch [20] Speed: 2927.58 samples/sec accuracy=0.545945
INFO:root:Epoch[3] Batch [40] Speed: 2891.57 samples/sec accuracy=0.551172
INFO:root:Epoch[3] Batch [60] Speed: 2883.61 samples/sec accuracy=0.566602
INFO:root:Epoch[3] Batch [80] Speed: 2812.99 samples/sec accuracy=0.560352
INFO:root:Epoch[3] Batch [100] Speed: 2826.40 samples/sec accuracy=0.575781
INFO:root:Epoch[3] Batch [120] Speed: 2849.50 samples/sec accuracy=0.572266
INFO:root:Epoch[3] Batch [140] Speed: 2886.62 samples/sec accuracy=0.587305
INFO:root:Epoch[3] Batch [160] Speed: 2901.33 samples/sec accuracy=0.598437
INFO:root:Epoch[3] Batch [180] Speed: 2897.88 samples/sec accuracy=0.602148
INFO:root:Epoch[3] Train-accuracy=0.592708
INFO:root:Epoch[3] Time cost=17.389
INFO:root:Epoch[3] Validation-accuracy=0.544071
INFO:root:Epoch[4] Batch [20] Speed: 2917.17 samples/sec accuracy=0.616815
INFO:root:Epoch[4] Batch [40] Speed: 2876.62 samples/sec accuracy=0.610352
INFO:root:Epoch[4] Batch [60] Speed: 2913.68 samples/sec accuracy=0.616602
INFO:root:Epoch[4] Batch [80] Speed: 2880.96 samples/sec accuracy=0.614258
INFO:root:Epoch[4] Batch [100] Speed: 2910.84 samples/sec accuracy=0.629297
INFO:root:Epoch[4] Batch [120] Speed: 2890.28 samples/sec accuracy=0.631641
INFO:root:Epoch[4] Batch [140] Speed: 2897.16 samples/sec accuracy=0.636523
INFO:root:Epoch[4] Batch [160] Speed: 2860.85 samples/sec accuracy=0.634570
INFO:root:Epoch[4] Batch [180] Speed: 2844.21 samples/sec accuracy=0.660156
INFO:root:Epoch[4] Train-accuracy=0.644252
INFO:root:Epoch[4] Time cost=17.231
INFO:root:Epoch[4] Validation-accuracy=0.675080
INFO:root:Epoch[5] Batch [20] Speed: 2899.83 samples/sec accuracy=0.665551
INFO:root:Epoch[5] Batch [40] Speed: 2887.09 samples/sec accuracy=0.657422
INFO:root:Epoch[5] Batch [60] Speed: 2883.18 samples/sec accuracy=0.656055
INFO:root:Epoch[5] Batch [80] Speed: 2851.43 samples/sec accuracy=0.670508
INFO:root:Epoch[5] Batch [100] Speed: 2902.55 samples/sec accuracy=0.656445
INFO:root:Epoch[5] Batch [120] Speed: 2861.41 samples/sec accuracy=0.671875
INFO:root:Epoch[5] Batch [140] Speed: 2865.42 samples/sec accuracy=0.675000
INFO:root:Epoch[5] Batch [160] Speed: 2865.88 samples/sec accuracy=0.685937
INFO:root:Epoch[5] Batch [180] Speed: 2878.85 samples/sec accuracy=0.683594
INFO:root:Epoch[5] Train-accuracy=0.675502
INFO:root:Epoch[5] Time cost=17.288
INFO:root:Epoch[5] Validation-accuracy=0.705228
INFO:root:Epoch[6] Batch [20] Speed: 2903.00 samples/sec accuracy=0.704985
INFO:root:Epoch[6] Batch [40] Speed: 2856.29 samples/sec accuracy=0.687695
INFO:root:Epoch[6] Batch [60] Speed: 2859.29 samples/sec accuracy=0.693164
INFO:root:Epoch[6] Batch [80] Speed: 2867.49 samples/sec accuracy=0.699023
INFO:root:Epoch[6] Batch [100] Speed: 2869.12 samples/sec accuracy=0.693555
INFO:root:Epoch[6] Batch [120] Speed: 2891.07 samples/sec accuracy=0.705664
INFO:root:Epoch[6] Batch [140] Speed: 2855.68 samples/sec accuracy=0.703711
INFO:root:Epoch[6] Batch [160] Speed: 2889.05 samples/sec accuracy=0.712109
INFO:root:Epoch[6] Batch [180] Speed: 2860.82 samples/sec accuracy=0.713672
INFO:root:Epoch[6] Train-accuracy=0.705208
INFO:root:Epoch[6] Time cost=17.409
INFO:root:Epoch[6] Validation-accuracy=0.721554
INFO:root:Epoch[7] Batch [20] Speed: 2930.15 samples/sec accuracy=0.725632
INFO:root:Epoch[7] Batch [40] Speed: 2885.11 samples/sec accuracy=0.725195
INFO:root:Epoch[7] Batch [60] Speed: 2901.77 samples/sec accuracy=0.721289
INFO:root:Epoch[7] Batch [80] Speed: 2856.42 samples/sec accuracy=0.719727
INFO:root:Epoch[7] Batch [100] Speed: 2884.57 samples/sec accuracy=0.724805
INFO:root:Epoch[7] Batch [120] Speed: 2859.05 samples/sec accuracy=0.727148
INFO:root:Epoch[7] Batch [140] Speed: 2869.72 samples/sec accuracy=0.729688
INFO:root:Epoch[7] Batch [160] Speed: 2873.91 samples/sec accuracy=0.745117
INFO:root:Epoch[7] Batch [180] Speed: 2900.49 samples/sec accuracy=0.737109
INFO:root:Epoch[7] Train-accuracy=0.730469
INFO:root:Epoch[7] Time cost=17.265
INFO:root:Epoch[7] Validation-accuracy=0.727564
INFO:root:Epoch[8] Batch [20] Speed: 2901.90 samples/sec accuracy=0.742001
INFO:root:Epoch[8] Batch [40] Speed: 2867.76 samples/sec accuracy=0.741211
INFO:root:Epoch[8] Batch [60] Speed: 2873.99 samples/sec accuracy=0.740039
INFO:root:Epoch[8] Batch [80] Speed: 2837.24 samples/sec accuracy=0.747656
INFO:root:Epoch[8] Batch [100] Speed: 2826.82 samples/sec accuracy=0.740234
INFO:root:Epoch[8] Batch [120] Speed: 2885.81 samples/sec accuracy=0.746289
INFO:root:Epoch[8] Batch [140] Speed: 2887.89 samples/sec accuracy=0.743359
INFO:root:Epoch[8] Batch [160] Speed: 2890.34 samples/sec accuracy=0.756250
INFO:root:Epoch[8] Batch [180] Speed: 2844.10 samples/sec accuracy=0.752344
INFO:root:Epoch[8] Train-accuracy=0.744978
INFO:root:Epoch[8] Time cost=17.347
INFO:root:Epoch[8] Validation-accuracy=0.744692
INFO:root:Epoch[9] Batch [20] Speed: 2924.39 samples/sec accuracy=0.757812
INFO:root:Epoch[9] Batch [40] Speed: 2861.19 samples/sec accuracy=0.757227
INFO:root:Epoch[9] Batch [60] Speed: 2849.02 samples/sec accuracy=0.757812
INFO:root:Epoch[9] Batch [80] Speed: 2878.81 samples/sec accuracy=0.758984
INFO:root:Epoch[9] Batch [100] Speed: 2879.53 samples/sec accuracy=0.754492
INFO:root:Epoch[9] Batch [120] Speed: 2878.98 samples/sec accuracy=0.757617
INFO:root:Epoch[9] Batch [140] Speed: 2804.05 samples/sec accuracy=0.778711
INFO:root:Epoch[9] Batch [160] Speed: 2835.32 samples/sec accuracy=0.766406
INFO:root:Epoch[9] Batch [180] Speed: 2855.97 samples/sec accuracy=0.779297
INFO:root:Epoch[9] Train-accuracy=0.763802
INFO:root:Epoch[9] Time cost=17.458
INFO:root:Epoch[9] Validation-accuracy=0.769030
```
### number of GPU=2, precision=FP32
```
INFO:root:Epoch[1] Batch [180] Speed: 4108.27 samples/sec accuracy=0.481250
INFO:root:Epoch[1] Train-accuracy=0.492746
INFO:root:Epoch[1] Time cost=11.935
INFO:root:Epoch[1] Validation-accuracy=0.487580
INFO:root:Epoch[2] Batch [20] Speed: 4256.33 samples/sec accuracy=0.504278
INFO:root:Epoch[2] Batch [40] Speed: 4174.55 samples/sec accuracy=0.507617
INFO:root:Epoch[2] Batch [60] Speed: 4094.32 samples/sec accuracy=0.515234
INFO:root:Epoch[2] Batch [80] Speed: 4169.51 samples/sec accuracy=0.528711
INFO:root:Epoch[2] Batch [100] Speed: 4158.80 samples/sec accuracy=0.541406
INFO:root:Epoch[2] Batch [120] Speed: 4140.68 samples/sec accuracy=0.546094
INFO:root:Epoch[2] Batch [140] Speed: 4215.41 samples/sec accuracy=0.546680
INFO:root:Epoch[2] Batch [160] Speed: 4199.92 samples/sec accuracy=0.555469
INFO:root:Epoch[2] Batch [180] Speed: 4129.76 samples/sec accuracy=0.561719
INFO:root:Epoch[2] Train-accuracy=0.562779
INFO:root:Epoch[2] Time cost=11.962
INFO:root:Epoch[2] Validation-accuracy=0.469752
INFO:root:Epoch[3] Batch [20] Speed: 4167.49 samples/sec accuracy=0.571243
INFO:root:Epoch[3] Batch [40] Speed: 4149.39 samples/sec accuracy=0.575586
INFO:root:Epoch[3] Batch [60] Speed: 4202.28 samples/sec accuracy=0.583789
INFO:root:Epoch[3] Batch [80] Speed: 4158.42 samples/sec accuracy=0.588281
INFO:root:Epoch[3] Batch [100] Speed: 4158.99 samples/sec accuracy=0.586719
INFO:root:Epoch[3] Batch [120] Speed: 4133.70 samples/sec accuracy=0.611523
INFO:root:Epoch[3] Batch [140] Speed: 4082.04 samples/sec accuracy=0.602148
INFO:root:Epoch[3] Batch [160] Speed: 4183.89 samples/sec accuracy=0.615234
INFO:root:Epoch[3] Batch [180] Speed: 4097.57 samples/sec accuracy=0.627734
INFO:root:Epoch[3] Train-accuracy=0.620833
INFO:root:Epoch[3] Time cost=12.055
INFO:root:Epoch[3] Validation-accuracy=0.584135
INFO:root:Epoch[4] Batch [20] Speed: 4252.21 samples/sec accuracy=0.628720
INFO:root:Epoch[4] Batch [40] Speed: 4172.65 samples/sec accuracy=0.631445
INFO:root:Epoch[4] Batch [60] Speed: 4141.98 samples/sec accuracy=0.633203
INFO:root:Epoch[4] Batch [80] Speed: 4102.38 samples/sec accuracy=0.631055
INFO:root:Epoch[4] Batch [100] Speed: 4168.22 samples/sec accuracy=0.640625
INFO:root:Epoch[4] Batch [120] Speed: 4153.58 samples/sec accuracy=0.643555
INFO:root:Epoch[4] Batch [140] Speed: 4149.70 samples/sec accuracy=0.644141
INFO:root:Epoch[4] Batch [160] Speed: 4109.06 samples/sec accuracy=0.661133
INFO:root:Epoch[4] Batch [180] Speed: 4112.76 samples/sec accuracy=0.675195
INFO:root:Epoch[4] Train-accuracy=0.646484
INFO:root:Epoch[4] Time cost=11.994
INFO:root:Epoch[4] Validation-accuracy=0.678085
INFO:root:Epoch[5] Batch [20] Speed: 4265.15 samples/sec accuracy=0.655506
INFO:root:Epoch[5] Batch [40] Speed: 4167.68 samples/sec accuracy=0.666602
INFO:root:Epoch[5] Batch [60] Speed: 4224.17 samples/sec accuracy=0.670312
INFO:root:Epoch[5] Batch [80] Speed: 4152.36 samples/sec accuracy=0.673047
INFO:root:Epoch[5] Batch [100] Speed: 4118.74 samples/sec accuracy=0.667969
INFO:root:Epoch[5] Batch [120] Speed: 4168.40 samples/sec accuracy=0.676172
INFO:root:Epoch[5] Batch [140] Speed: 4214.88 samples/sec accuracy=0.669922
INFO:root:Epoch[5] Batch [160] Speed: 4110.49 samples/sec accuracy=0.682813
INFO:root:Epoch[5] Batch [180] Speed: 4191.58 samples/sec accuracy=0.697852
INFO:root:Epoch[5] Train-accuracy=0.679408
INFO:root:Epoch[5] Time cost=11.921
INFO:root:Epoch[5] Validation-accuracy=0.668069
INFO:root:Epoch[6] Batch [20] Speed: 4213.16 samples/sec accuracy=0.696615
INFO:root:Epoch[6] Batch [40] Speed: 4184.11 samples/sec accuracy=0.699805
INFO:root:Epoch[6] Batch [60] Speed: 4182.60 samples/sec accuracy=0.707812
INFO:root:Epoch[6] Batch [80] Speed: 4106.69 samples/sec accuracy=0.702930
INFO:root:Epoch[6] Batch [100] Speed: 4154.74 samples/sec accuracy=0.700391
INFO:root:Epoch[6] Batch [120] Speed: 4142.70 samples/sec accuracy=0.716406
INFO:root:Epoch[6] Batch [140] Speed: 4195.48 samples/sec accuracy=0.710156
INFO:root:Epoch[6] Batch [160] Speed: 4149.39 samples/sec accuracy=0.704297
INFO:root:Epoch[6] Batch [180] Speed: 4160.08 samples/sec accuracy=0.723633
INFO:root:Epoch[6] Train-accuracy=0.699219
INFO:root:Epoch[6] Time cost=12.015
INFO:root:Epoch[6] Validation-accuracy=0.722456
INFO:root:Epoch[7] Batch [20] Speed: 4174.50 samples/sec accuracy=0.720796
INFO:root:Epoch[7] Batch [40] Speed: 4218.09 samples/sec accuracy=0.720508
INFO:root:Epoch[7] Batch [60] Speed: 4119.26 samples/sec accuracy=0.719531
INFO:root:Epoch[7] Batch [80] Speed: 4173.38 samples/sec accuracy=0.725391
INFO:root:Epoch[7] Batch [100] Speed: 4193.89 samples/sec accuracy=0.724414
INFO:root:Epoch[7] Batch [120] Speed: 4120.34 samples/sec accuracy=0.718164
INFO:root:Epoch[7] Batch [140] Speed: 4215.10 samples/sec accuracy=0.721680
INFO:root:Epoch[7] Batch [160] Speed: 4153.57 samples/sec accuracy=0.739258
INFO:root:Epoch[7] Batch [180] Speed: 4125.48 samples/sec accuracy=0.729492
INFO:root:Epoch[7] Train-accuracy=0.727121
INFO:root:Epoch[7] Time cost=11.963
INFO:root:Epoch[7] Validation-accuracy=0.724259
INFO:root:Epoch[8] Batch [20] Speed: 4191.51 samples/sec accuracy=0.742374
INFO:root:Epoch[8] Batch [40] Speed: 4172.53 samples/sec accuracy=0.738867
INFO:root:Epoch[8] Batch [60] Speed: 4205.16 samples/sec accuracy=0.745508
INFO:root:Epoch[8] Batch [80] Speed: 4183.85 samples/sec accuracy=0.736133
INFO:root:Epoch[8] Batch [100] Speed: 4216.08 samples/sec accuracy=0.732812
INFO:root:Epoch[8] Batch [120] Speed: 4175.85 samples/sec accuracy=0.744727
INFO:root:Epoch[8] Batch [140] Speed: 4132.39 samples/sec accuracy=0.738672
INFO:root:Epoch[8] Batch [160] Speed: 4180.43 samples/sec accuracy=0.747070
INFO:root:Epoch[8] Batch [180] Speed: 4194.63 samples/sec accuracy=0.761133
INFO:root:Epoch[8] Train-accuracy=0.743304
INFO:root:Epoch[8] Time cost=11.893
INFO:root:Epoch[8] Validation-accuracy=0.749599
INFO:root:Epoch[9] Batch [20] Speed: 4208.84 samples/sec accuracy=0.757812
INFO:root:Epoch[9] Batch [40] Speed: 4102.55 samples/sec accuracy=0.757031
INFO:root:Epoch[9] Batch [60] Speed: 4151.69 samples/sec accuracy=0.748828
INFO:root:Epoch[9] Batch [80] Speed: 4148.54 samples/sec accuracy=0.744922
INFO:root:Epoch[9] Batch [100] Speed: 4200.95 samples/sec accuracy=0.740430
INFO:root:Epoch[9] Batch [120] Speed: 3987.34 samples/sec accuracy=0.756836
INFO:root:Epoch[9] Batch [140] Speed: 4161.59 samples/sec accuracy=0.759570
INFO:root:Epoch[9] Batch [160] Speed: 4193.97 samples/sec accuracy=0.761523
INFO:root:Epoch[9] Batch [180] Speed: 4148.27 samples/sec accuracy=0.767188
INFO:root:Epoch[9] Train-accuracy=0.759896
INFO:root:Epoch[9] Time cost=12.076
INFO:root:Epoch[9] Validation-accuracy=0.764724
```
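One common way to compare the multi-GPU runs with the single-GPU runs is scaling efficiency: the multi-GPU throughput divided by the number of GPUs times the single-GPU throughput. The following is a minimal sketch; the function names are hypothetical, and the example inputs are single `Speed` values copied from the last batch of epoch 9 in the FP32 logs above (1 GPU and 2 GPUs), not averages over a run.

```python
def speedup(multi_gpu_speed, single_gpu_speed):
    """Throughput ratio of a multi-GPU run over the single-GPU run."""
    if single_gpu_speed <= 0:
        raise ValueError("single_gpu_speed must be positive")
    return multi_gpu_speed / float(single_gpu_speed)


def scaling_efficiency(multi_gpu_speed, single_gpu_speed, num_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup(multi_gpu_speed, single_gpu_speed) / num_gpus


# Single log lines from the FP32 runs: 2 GPUs vs. 1 GPU (samples/sec).
two_gpu_fp32 = 4148.27
one_gpu_fp32 = 2108.32
print("speedup: %.2f" % speedup(two_gpu_fp32, one_gpu_fp32))
print("efficiency: %.2f" % scaling_efficiency(two_gpu_fp32, one_gpu_fp32, 2))
```

For a fairer comparison, the inputs would be averages over several epochs of each log rather than single lines.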
### number of GPU=4, precision=FP16
```
INFO:root:start with arguments Namespace(batch_size=512, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float16', gpus='0,1,2,3', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[07:53:21] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding..
[07:53:22] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding..
[07:53:24] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [20] Speed: 5869.17 samples/sec accuracy=0.156994
INFO:root:Epoch[0] Batch [40] Speed: 5649.03 samples/sec accuracy=0.219434
INFO:root:Epoch[0] Batch [60] Speed: 5595.01 samples/sec accuracy=0.242285
INFO:root:Epoch[0] Batch [80] Speed: 5660.61 samples/sec accuracy=0.279590
INFO:root:Epoch[0] Train-accuracy=0.308824
INFO:root:Epoch[0] Time cost=10.753
INFO:root:Epoch[0] Validation-accuracy=0.340625
INFO:root:Epoch[1] Batch [20] Speed: 5831.55 samples/sec accuracy=0.321522
INFO:root:Epoch[1] Batch [40] Speed: 5721.23 samples/sec accuracy=0.342187
INFO:root:Epoch[1] Batch [60] Speed: 5734.57 samples/sec accuracy=0.354492
INFO:root:Epoch[1] Batch [80] Speed: 5734.63 samples/sec accuracy=0.374316
INFO:root:Epoch[1] Train-accuracy=0.396829
INFO:root:Epoch[1] Time cost=8.676
INFO:root:Epoch[1] Validation-accuracy=0.391406
INFO:root:Epoch[2] Batch [20] Speed: 5860.26 samples/sec accuracy=0.414062
INFO:root:Epoch[2] Batch [40] Speed: 5771.02 samples/sec accuracy=0.443457
INFO:root:Epoch[2] Batch [60] Speed: 5762.90 samples/sec accuracy=0.447461
INFO:root:Epoch[2] Batch [80] Speed: 5663.11 samples/sec accuracy=0.471582
INFO:root:Epoch[2] Train-accuracy=0.488525
INFO:root:Epoch[2] Time cost=8.582
INFO:root:Epoch[2] Validation-accuracy=0.509046
INFO:root:Epoch[3] Batch [20] Speed: 5853.16 samples/sec accuracy=0.497861
INFO:root:Epoch[3] Batch [40] Speed: 5662.32 samples/sec accuracy=0.522461
INFO:root:Epoch[3] Batch [60] Speed: 5706.18 samples/sec accuracy=0.525488
INFO:root:Epoch[3] Batch [80] Speed: 5753.41 samples/sec accuracy=0.539746
INFO:root:Epoch[3] Train-accuracy=0.553424
INFO:root:Epoch[3] Time cost=8.696
INFO:root:Epoch[3] Validation-accuracy=0.552832
INFO:root:Epoch[4] Batch [20] Speed: 5874.70 samples/sec accuracy=0.572824
INFO:root:Epoch[4] Batch [40] Speed: 5722.39 samples/sec accuracy=0.567676
INFO:root:Epoch[4] Batch [60] Speed: 5766.61 samples/sec accuracy=0.585547
INFO:root:Epoch[4] Batch [80] Speed: 5772.43 samples/sec accuracy=0.589551
INFO:root:Epoch[4] Train-accuracy=0.596737
INFO:root:Epoch[4] Time cost=8.645
INFO:root:Epoch[4] Validation-accuracy=0.603824
INFO:root:Epoch[5] Batch [20] Speed: 5872.05 samples/sec accuracy=0.604539
INFO:root:Epoch[5] Batch [40] Speed: 5729.62 samples/sec accuracy=0.618555
INFO:root:Epoch[5] Batch [60] Speed: 5723.69 samples/sec accuracy=0.617188
INFO:root:Epoch[5] Batch [80] Speed: 5740.70 samples/sec accuracy=0.622559
INFO:root:Epoch[5] Train-accuracy=0.631836
INFO:root:Epoch[5] Time cost=8.578
INFO:root:Epoch[5] Validation-accuracy=0.659375
INFO:root:Epoch[6] Batch [20] Speed: 5741.80 samples/sec accuracy=0.642392
INFO:root:Epoch[6] Batch [40] Speed: 5627.22 samples/sec accuracy=0.650781
INFO:root:Epoch[6] Batch [60] Speed: 5734.45 samples/sec accuracy=0.644336
INFO:root:Epoch[6] Batch [80] Speed: 5764.20 samples/sec accuracy=0.660254
INFO:root:Epoch[6] Train-accuracy=0.660386
INFO:root:Epoch[6] Time cost=8.735
INFO:root:Epoch[6] Validation-accuracy=0.675164
INFO:root:Epoch[7] Batch [20] Speed: 5848.09 samples/sec accuracy=0.668341
INFO:root:Epoch[7] Batch [40] Speed: 5731.51 samples/sec accuracy=0.680273
INFO:root:Epoch[7] Batch [60] Speed: 5722.21 samples/sec accuracy=0.681641
INFO:root:Epoch[7] Batch [80] Speed: 5731.36 samples/sec accuracy=0.692578
INFO:root:Epoch[7] Train-accuracy=0.694393
INFO:root:Epoch[7] Time cost=8.680
INFO:root:Epoch[7] Validation-accuracy=0.710352
INFO:root:Epoch[8] Batch [20] Speed: 5810.60 samples/sec accuracy=0.694289
INFO:root:Epoch[8] Batch [40] Speed: 5724.36 samples/sec accuracy=0.701367
INFO:root:Epoch[8] Batch [60] Speed: 5688.34 samples/sec accuracy=0.699414
INFO:root:Epoch[8] Batch [80] Speed: 5720.14 samples/sec accuracy=0.704492
INFO:root:Epoch[8] Train-accuracy=0.710449
INFO:root:Epoch[8] Time cost=8.626
INFO:root:Epoch[8] Validation-accuracy=0.693565
INFO:root:Epoch[9] Batch [20] Speed: 5819.16 samples/sec accuracy=0.716146
INFO:root:Epoch[9] Batch [40] Speed: 5729.44 samples/sec accuracy=0.716211
INFO:root:Epoch[9] Batch [60] Speed: 5707.16 samples/sec accuracy=0.711621
INFO:root:Epoch[9] Batch [80] Speed: 5734.86 samples/sec accuracy=0.723437
INFO:root:Epoch[9] Train-accuracy=0.729665
INFO:root:Epoch[9] Time cost=8.714
INFO:root:Epoch[9] Validation-accuracy=0.754590
```
### number of GPU=4, precision=FP32
```bash
INFO:root:start with arguments Namespace(batch_size=512, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float32', gpus='0,1,2,3', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[08:03:14] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding..
[08:03:15] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding..
[08:03:17] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [20] Speed: 7792.99 samples/sec accuracy=0.160528
INFO:root:Epoch[0] Batch [40] Speed: 7838.93 samples/sec accuracy=0.226758
INFO:root:Epoch[0] Batch [60] Speed: 7963.03 samples/sec accuracy=0.265039
INFO:root:Epoch[0] Batch [80] Speed: 7699.20 samples/sec accuracy=0.283105
INFO:root:Epoch[0] Train-accuracy=0.318819
INFO:root:Epoch[0] Time cost=8.336
INFO:root:Epoch[0] Validation-accuracy=0.371680
INFO:root:Epoch[1] Batch [20] Speed: 8071.82 samples/sec accuracy=0.324126
INFO:root:Epoch[1] Batch [40] Speed: 7949.64 samples/sec accuracy=0.342969
INFO:root:Epoch[1] Batch [60] Speed: 8013.45 samples/sec accuracy=0.359863
INFO:root:Epoch[1] Batch [80] Speed: 8036.81 samples/sec accuracy=0.374219
INFO:root:Epoch[1] Train-accuracy=0.400620
INFO:root:Epoch[1] Time cost=6.252
INFO:root:Epoch[1] Validation-accuracy=0.401660
INFO:root:Epoch[2] Batch [20] Speed: 8086.74 samples/sec accuracy=0.418155
INFO:root:Epoch[2] Batch [40] Speed: 8035.85 samples/sec accuracy=0.432715
INFO:root:Epoch[2] Batch [60] Speed: 8020.24 samples/sec accuracy=0.451660
INFO:root:Epoch[2] Batch [80] Speed: 7907.26 samples/sec accuracy=0.477832
INFO:root:Epoch[2] Train-accuracy=0.498901
INFO:root:Epoch[2] Time cost=6.185
INFO:root:Epoch[2] Validation-accuracy=0.520868
INFO:root:Epoch[3] Batch [20] Speed: 8110.95 samples/sec accuracy=0.508650
INFO:root:Epoch[3] Batch [40] Speed: 7937.30 samples/sec accuracy=0.520703
INFO:root:Epoch[3] Batch [60] Speed: 7991.88 samples/sec accuracy=0.526172
INFO:root:Epoch[3] Batch [80] Speed: 8141.97 samples/sec accuracy=0.545117
INFO:root:Epoch[3] Train-accuracy=0.548369
INFO:root:Epoch[3] Time cost=6.233
INFO:root:Epoch[3] Validation-accuracy=0.542285
INFO:root:Epoch[4] Batch [20] Speed: 8235.56 samples/sec accuracy=0.557664
INFO:root:Epoch[4] Batch [40] Speed: 8099.37 samples/sec accuracy=0.570703
INFO:root:Epoch[4] Batch [60] Speed: 8148.82 samples/sec accuracy=0.586328
INFO:root:Epoch[4] Batch [80] Speed: 8108.18 samples/sec accuracy=0.579980
INFO:root:Epoch[4] Train-accuracy=0.592831
INFO:root:Epoch[4] Time cost=6.150
INFO:root:Epoch[4] Validation-accuracy=0.591180
INFO:root:Epoch[5] Batch [20] Speed: 8158.63 samples/sec accuracy=0.608352
INFO:root:Epoch[5] Batch [40] Speed: 8092.86 samples/sec accuracy=0.609473
INFO:root:Epoch[5] Batch [60] Speed: 7983.87 samples/sec accuracy=0.605664
INFO:root:Epoch[5] Batch [80] Speed: 7940.05 samples/sec accuracy=0.623535
INFO:root:Epoch[5] Train-accuracy=0.632935
INFO:root:Epoch[5] Time cost=6.176
INFO:root:Epoch[5] Validation-accuracy=0.659668
INFO:root:Epoch[6] Batch [20] Speed: 8133.43 samples/sec accuracy=0.641369
INFO:root:Epoch[6] Batch [40] Speed: 8062.66 samples/sec accuracy=0.640723
INFO:root:Epoch[6] Batch [60] Speed: 8084.93 samples/sec accuracy=0.645215
INFO:root:Epoch[6] Batch [80] Speed: 7951.90 samples/sec accuracy=0.642578
INFO:root:Epoch[6] Train-accuracy=0.655676
INFO:root:Epoch[6] Time cost=6.233
INFO:root:Epoch[6] Validation-accuracy=0.681332
INFO:root:Epoch[7] Batch [20] Speed: 8137.12 samples/sec accuracy=0.662016
INFO:root:Epoch[7] Batch [40] Speed: 8013.96 samples/sec accuracy=0.670508
INFO:root:Epoch[7] Batch [60] Speed: 8017.48 samples/sec accuracy=0.661426
INFO:root:Epoch[7] Batch [80] Speed: 8089.57 samples/sec accuracy=0.665625
INFO:root:Epoch[7] Train-accuracy=0.684168
INFO:root:Epoch[7] Time cost=6.210
INFO:root:Epoch[7] Validation-accuracy=0.656836
INFO:root:Epoch[8] Batch [20] Speed: 8220.95 samples/sec accuracy=0.683966
INFO:root:Epoch[8] Batch [40] Speed: 8046.88 samples/sec accuracy=0.686426
INFO:root:Epoch[8] Batch [60] Speed: 7931.11 samples/sec accuracy=0.691016
INFO:root:Epoch[8] Batch [80] Speed: 7872.08 samples/sec accuracy=0.687305
INFO:root:Epoch[8] Train-accuracy=0.703369
INFO:root:Epoch[8] Time cost=6.195
INFO:root:Epoch[8] Validation-accuracy=0.726665
INFO:root:Epoch[9] Batch [20] Speed: 8121.11 samples/sec accuracy=0.700149
INFO:root:Epoch[9] Batch [40] Speed: 7949.63 samples/sec accuracy=0.705078
INFO:root:Epoch[9] Batch [60] Speed: 7793.96 samples/sec accuracy=0.702246
INFO:root:Epoch[9] Batch [80] Speed: 8001.58 samples/sec accuracy=0.713281
INFO:root:Epoch[9] Train-accuracy=0.718176
INFO:root:Epoch[9] Time cost=6.273
INFO:root:Epoch[9] Validation-accuracy=0.721387
```
### number of GPU=8, precision=FP16
```bash
INFO:root:start with arguments Namespace(batch_size=1024, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float16', gpus='0,1,2,3,4,5,6,7', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[07:55:10] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding..
[07:55:12] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding..
[07:55:13] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[07:55:36] src/kvstore/././comm.h:327: only 32 out of 56 GPU pairs are enabled direct access. It may affect the performance. You can set MXNET_ENABLE_GPU_P2P=0 to turn it off
[07:55:36] src/kvstore/././comm.h:336: .vvvv...
[07:55:36] src/kvstore/././comm.h:336: v.vv.v..
[07:55:36] src/kvstore/././comm.h:336: vv.v..v.
[07:55:36] src/kvstore/././comm.h:336: vvv....v
[07:55:36] src/kvstore/././comm.h:336: v....vvv
[07:55:36] src/kvstore/././comm.h:336: .v..v.vv
[07:55:36] src/kvstore/././comm.h:336: ..v.vv.v
[07:55:36] src/kvstore/././comm.h:336: ...vvvv.
INFO:root:Epoch[0] Batch [20] Speed: 9099.72 samples/sec accuracy=0.149414
INFO:root:Epoch[0] Batch [40] Speed: 8787.35 samples/sec accuracy=0.227100
INFO:root:Epoch[0] Train-accuracy=0.260010
INFO:root:Epoch[0] Time cost=9.538
INFO:root:Epoch[0] Validation-accuracy=0.285938
INFO:root:Epoch[1] Batch [20] Speed: 10278.64 samples/sec accuracy=0.279204
INFO:root:Epoch[1] Batch [40] Speed: 10534.04 samples/sec accuracy=0.305225
INFO:root:Epoch[1] Train-accuracy=0.325928
INFO:root:Epoch[1] Time cost=4.795
INFO:root:Epoch[1] Validation-accuracy=0.384668
INFO:root:Epoch[2] Batch [20] Speed: 9058.14 samples/sec accuracy=0.334356
INFO:root:Epoch[2] Batch [40] Speed: 8510.44 samples/sec accuracy=0.354248
INFO:root:Epoch[2] Train-accuracy=0.373047
INFO:root:Epoch[2] Time cost=5.696
INFO:root:Epoch[2] Validation-accuracy=0.431250
INFO:root:Epoch[3] Batch [20] Speed: 9325.01 samples/sec accuracy=0.386021
INFO:root:Epoch[3] Batch [40] Speed: 9031.91 samples/sec accuracy=0.404883
INFO:root:Epoch[3] Train-accuracy=0.416748
INFO:root:Epoch[3] Time cost=5.329
INFO:root:Epoch[3] Validation-accuracy=0.462348
INFO:root:Epoch[4] Batch [20] Speed: 10653.77 samples/sec accuracy=0.438802
INFO:root:Epoch[4] Batch [40] Speed: 10626.94 samples/sec accuracy=0.453174
INFO:root:Epoch[4] Train-accuracy=0.478638
INFO:root:Epoch[4] Time cost=4.795
INFO:root:Epoch[4] Validation-accuracy=0.491309
INFO:root:Epoch[5] Batch [20] Speed: 11147.46 samples/sec accuracy=0.481166
INFO:root:Epoch[5] Batch [40] Speed: 9916.45 samples/sec accuracy=0.503662
INFO:root:Epoch[5] Train-accuracy=0.521484
INFO:root:Epoch[5] Time cost=4.739
INFO:root:Epoch[5] Validation-accuracy=0.574902
INFO:root:Epoch[6] Batch [20] Speed: 11193.83 samples/sec accuracy=0.527669
INFO:root:Epoch[6] Batch [40] Speed: 9221.36 samples/sec accuracy=0.543994
INFO:root:Epoch[6] Train-accuracy=0.558105
INFO:root:Epoch[6] Time cost=5.164
INFO:root:Epoch[6] Validation-accuracy=0.594531
INFO:root:Epoch[7] Batch [20] Speed: 10108.07 samples/sec accuracy=0.564779
INFO:root:Epoch[7] Batch [40] Speed: 8547.51 samples/sec accuracy=0.575293
INFO:root:Epoch[7] Train-accuracy=0.590820
INFO:root:Epoch[7] Time cost=5.365
INFO:root:Epoch[7] Validation-accuracy=0.591905
INFO:root:Epoch[8] Batch [20] Speed: 10000.07 samples/sec accuracy=0.597331
INFO:root:Epoch[8] Batch [40] Speed: 9367.33 samples/sec accuracy=0.601953
INFO:root:Epoch[8] Train-accuracy=0.617188
INFO:root:Epoch[8] Time cost=5.084
INFO:root:Epoch[8] Validation-accuracy=0.618164
INFO:root:Epoch[9] Batch [20] Speed: 9902.11 samples/sec accuracy=0.617141
INFO:root:Epoch[9] Batch [40] Speed: 10787.76 samples/sec accuracy=0.631250
INFO:root:Epoch[9] Train-accuracy=0.634888
INFO:root:Epoch[9] Time cost=4.801
INFO:root:Epoch[9] Validation-accuracy=0.664551
```
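The 8-GPU runs print a peer-to-peer access matrix together with the message "only 32 out of 56 GPU pairs are enabled direct access". The sketch below is one way to check that count from the matrix rows. It assumes that `v` marks an ordered GPU pair with direct access enabled and `.` marks a pair without it (or the diagonal); this reading is inferred from the log message, not taken from the MXNET source. The helper name `count_p2p_pairs` is hypothetical.

```python
def count_p2p_pairs(rows):
    """Return (enabled, total) ordered GPU pairs for a P2P matrix.

    Assumption: 'v' = direct access enabled, '.' = not enabled or diagonal.
    """
    n = len(rows)
    enabled = sum(row.count("v") for row in rows)
    total = n * (n - 1)  # ordered pairs, excluding a GPU paired with itself
    return enabled, total


# Matrix rows copied from the comm.h:336 lines in the log above.
matrix = [
    ".vvvv...",
    "v.vv.v..",
    "vv.v..v.",
    "vvv....v",
    "v....vvv",
    ".v..v.vv",
    "..v.vv.v",
    "...vvvv.",
]

enabled, total = count_p2p_pairs(matrix)
print("%d of %d pairs" % (enabled, total))  # 32 of 56 pairs
```

Each row contains four `v` entries, which matches the 32 of 56 reported by MXNET.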
### number of GPU=8, precision=FP32
```bash
INFO:root:start with arguments Namespace(batch_size=1024, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float32', gpus='0,1,2,3,4,5,6,7', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[08:04:39] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding..
[08:04:40] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding..
[08:04:42] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[08:05:04] src/kvstore/././comm.h:327: only 32 out of 56 GPU pairs are enabled direct access. It may affect the performance. You can set MXNET_ENABLE_GPU_P2P=0 to turn it off
[08:05:04] src/kvstore/././comm.h:336: .vvvv...
[08:05:04] src/kvstore/././comm.h:336: v.vv.v..
[08:05:04] src/kvstore/././comm.h:336: vv.v..v.
[08:05:04] src/kvstore/././comm.h:336: vvv....v
[08:05:04] src/kvstore/././comm.h:336: v....vvv
[08:05:04] src/kvstore/././comm.h:336: .v..v.vv
[08:05:04] src/kvstore/././comm.h:336: ..v.vv.v
[08:05:04] src/kvstore/././comm.h:336: ...vvvv.
INFO:root:Epoch[0] Batch [20] Speed: 8792.64 samples/sec accuracy=0.166574
INFO:root:Epoch[0] Batch [40] Speed: 8230.37 samples/sec accuracy=0.235449
INFO:root:Epoch[0] Train-accuracy=0.281250
INFO:root:Epoch[0] Time cost=9.923
INFO:root:Epoch[0] Validation-accuracy=0.316797
INFO:root:Epoch[1] Batch [20] Speed: 8755.05 samples/sec accuracy=0.287853
INFO:root:Epoch[1] Batch [40] Speed: 8124.35 samples/sec accuracy=0.309033
INFO:root:Epoch[1] Train-accuracy=0.321533
INFO:root:Epoch[1] Time cost=6.121
INFO:root:Epoch[1] Validation-accuracy=0.371191
INFO:root:Epoch[2] Batch [20] Speed: 9372.24 samples/sec accuracy=0.341890
INFO:root:Epoch[2] Batch [40] Speed: 8558.61 samples/sec accuracy=0.366895
INFO:root:Epoch[2] Train-accuracy=0.384155
INFO:root:Epoch[2] Time cost=5.567
INFO:root:Epoch[2] Validation-accuracy=0.427734
INFO:root:Epoch[3] Batch [20] Speed: 9234.61 samples/sec accuracy=0.389462
INFO:root:Epoch[3] Batch [40] Speed: 8844.03 samples/sec accuracy=0.408984
INFO:root:Epoch[3] Train-accuracy=0.445801
INFO:root:Epoch[3] Time cost=5.774
INFO:root:Epoch[3] Validation-accuracy=0.471788
INFO:root:Epoch[4] Batch [20] Speed: 8650.64 samples/sec accuracy=0.450893
INFO:root:Epoch[4] Batch [40] Speed: 8271.18 samples/sec accuracy=0.472705
INFO:root:Epoch[4] Train-accuracy=0.481689
INFO:root:Epoch[4] Time cost=5.792
INFO:root:Epoch[4] Validation-accuracy=0.540332
INFO:root:Epoch[5] Batch [20] Speed: 10215.29 samples/sec accuracy=0.502744
INFO:root:Epoch[5] Batch [40] Speed: 9261.69 samples/sec accuracy=0.517627
INFO:root:Epoch[5] Train-accuracy=0.531110
INFO:root:Epoch[5] Time cost=5.148
INFO:root:Epoch[5] Validation-accuracy=0.554688
INFO:root:Epoch[6] Batch [20] Speed: 9040.84 samples/sec accuracy=0.538644
INFO:root:Epoch[6] Batch [40] Speed: 10600.00 samples/sec accuracy=0.549365
INFO:root:Epoch[6] Train-accuracy=0.572632
INFO:root:Epoch[6] Time cost=5.222
INFO:root:Epoch[6] Validation-accuracy=0.586523
INFO:root:Epoch[7] Batch [20] Speed: 10178.73 samples/sec accuracy=0.573382
INFO:root:Epoch[7] Batch [40] Speed: 8490.75 samples/sec accuracy=0.578857
INFO:root:Epoch[7] Train-accuracy=0.585815
INFO:root:Epoch[7] Time cost=5.508
INFO:root:Epoch[7] Validation-accuracy=0.616970
INFO:root:Epoch[8] Batch [20] Speed: 9280.62 samples/sec accuracy=0.597284
INFO:root:Epoch[8] Batch [40] Speed: 9454.47 samples/sec accuracy=0.603320
INFO:root:Epoch[8] Train-accuracy=0.615601
INFO:root:Epoch[8] Time cost=5.211
INFO:root:Epoch[8] Validation-accuracy=0.626270
INFO:root:Epoch[9] Batch [20] Speed: 8527.77 samples/sec accuracy=0.620908
INFO:root:Epoch[9] Batch [40] Speed: 9799.97 samples/sec accuracy=0.630762
INFO:root:Epoch[9] Train-accuracy=0.629395
INFO:root:Epoch[9] Time cost=5.348
INFO:root:Epoch[9] Validation-accuracy=0.617383
```
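To compare the runs above without reading every line, the `Speed` and `Time cost` entries can be averaged per run. The following is a minimal sketch, not part of the original benchmark: the function name `summarize` and the choice to skip epoch 0 are assumptions. Epoch 0 is skipped because its time cost in the logs is consistently higher than later epochs, which is probably due to the cuDNN autotuning that the log announces at start-up.

```python
import re

# Patterns follow the log lines shown above.
SPEED_RE = re.compile(
    r"Epoch\[(\d+)\]\s+Batch\s+\[(\d+)\]\s+Speed:\s+([\d.]+)\s+samples/sec")
TIME_RE = re.compile(r"Epoch\[(\d+)\]\s+Time cost=([\d.]+)")


def summarize(log_text, skip_first_epoch=True):
    """Average the per-batch speed and the per-epoch time of one run."""
    speeds, times = [], []
    for line in log_text.splitlines():
        m = SPEED_RE.search(line)
        if m:
            if not (skip_first_epoch and int(m.group(1)) == 0):
                speeds.append(float(m.group(3)))
            continue
        m = TIME_RE.search(line)
        if m and not (skip_first_epoch and int(m.group(1)) == 0):
            times.append(float(m.group(2)))
    return {
        "mean_speed": sum(speeds) / len(speeds) if speeds else None,
        "mean_epoch_time": sum(times) / len(times) if times else None,
    }


# A few lines copied from the 8-GPU FP32 log above.
sample = """\
INFO:root:Epoch[0] Batch [20] Speed: 8792.64 samples/sec accuracy=0.166574
INFO:root:Epoch[0] Time cost=9.923
INFO:root:Epoch[9] Batch [20] Speed: 8527.77 samples/sec accuracy=0.620908
INFO:root:Epoch[9] Batch [40] Speed: 9799.97 samples/sec accuracy=0.630762
INFO:root:Epoch[9] Time cost=5.348
"""

result = summarize(sample)
print(result)
```

In practice the full output of each run would be saved to a file and passed to `summarize` as a string.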
## Monitor the GPU status while training
### 1X GPU, FP16
```bash
Wed Oct 4 21:03:44 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 57C P0 142W / 300W | 1498MiB / 16152MiB | 73% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 49C P0 44W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 47C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 47C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 48C P0 44W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 47C P0 43W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 48C P0 44W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 47C P0 47W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 34472 C python 1488MiB |
+-----------------------------------------------------------------------------+
```
### 2X GPU, FP16
```bash
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 59C P0 160W / 300W | 1492MiB / 16152MiB | 70% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 61C P0 156W / 300W | 1494MiB / 16152MiB | 69% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 47C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 47C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 49C P0 44W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 48C P0 43W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 48C P0 44W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 47C P0 47W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 35486 C python 1482MiB |
| 1 35486 C python 1484MiB |
+-----------------------------------------------------------------------------+
```
### 1X GPU, FP32
```bash
Wed Oct 4 21:14:44 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 62C P0 174W / 300W | 2172MiB / 16152MiB | 67% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 51C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 50C P0 46W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 51C P0 63W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 50C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 50C P0 43W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 52C P0 60W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 48C P0 48W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 37137 C python 2162MiB |
+-----------------------------------------------------------------------------+
```
### 2X GPU, FP32
```bash
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 63C P0 168W / 300W | 2172MiB / 16152MiB | 65% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 63C P0 182W / 300W | 2172MiB / 16152MiB | 65% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 48C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 48C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 49C P0 45W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 48C P0 43W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 49C P0 44W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 48C P0 47W / 300W | 10MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 37930 C python 2162MiB |
| 1 37930 C python 2162MiB |
+-----------------------------------------------------------------------------+
```
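The snapshots above are full `nvidia-smi` tables. For logging utilization over a whole run, the CSV query mode of `nvidia-smi` is easier to parse. The sketch below assumes the `--query-gpu` and `--format=csv,noheader,nounits` options are available in the installed driver; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, and the sample string is hand-written in the CSV layout using values from the 2X GPU, FP32 snapshot.

```python
import subprocess

QUERY_FIELDS = "index,utilization.gpu,memory.used,power.draw"


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        index, util, mem, power = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "util_pct": float(util),
            "mem_mib": float(mem),
            "power_w": float(power),
        })
    return gpus


def query_gpus():
    """Call nvidia-smi once; requires an NVIDIA driver on the host."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=" + QUERY_FIELDS,
        "--format=csv,noheader,nounits",
    ])
    return parse_gpu_csv(out.decode())


# Hand-written sample in the CSV layout (values from the snapshot above).
sample = "0, 65, 2172, 168.00\n1, 65, 2172, 182.00\n2, 0, 10, 45.00\n"
busy = [g["index"] for g in parse_gpu_csv(sample) if g["util_pct"] > 0]
print(busy)  # [0, 1]
```

Calling `query_gpus()` in a loop with a short sleep while `train_cifar10.py` runs would give a utilization trace instead of a single snapshot.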