# Multi-GPU Benchmark: DGX-1 (8x Tesla V100 NVLink), MXNet, NV Container 17.09, CIFAR10 Dataset (2017/10/07)

* [CIFAR10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
* Links to the dataset files: [cifar10_train.rec](http://data.mxnet.io/data/cifar10/cifar10_train.rec), [cifar10_val.rec](http://data.mxnet.io/data/cifar10/cifar10_val.rec)
* [MXNet's CIFAR10 sample code usage explanation](https://github.com/apache/incubator-mxnet/tree/master/example/image-classification)
* NVIDIA's MXNet CIFAR10 benchmark (for Tesla P100):

  ![P100 CIFAR10 Benchmark](https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/mxnet/data-center-gpu-ready-app-mxnet-chart-cifar10-843-u.jpg)

  The figure above is from [NVIDIA's MXNet page](https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/mxnet/).
* The following tests were run directly inside NVIDIA's Docker container (image: `nvcr.io/nvidia/mxnet:17.09`). Python and MXNet versions:

```bash
root@5480a386d45f:/opt/mxnet/example/image-classification# python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
>>> mxnet.__version__
'0.11.0'
```

## Procedure of this Benchmark

1. Switch to the MXNet Docker container:

```bash
nvidia-docker run -it -p 8888:8888 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --name mxnet --rm -v /raid/data:/workspace/examples/image-classification/data:cached -v /raid:/raid:cached nvcr.io/nvidia/mxnet:17.09
```

2. Run the multi-GPU tests. The global batch size scales with the GPU count (128 samples per GPU), and stderr, where the training log is written, is redirected to one file per configuration:

```bash
#!/bin/bash
cd /workspace/examples/image-classification
framework=mxnet_nv_1709
batch_size=128
for precision in float16 float32
do
    for num_gpu in 1 2 4 8
    do
        echo "number of GPU=$num_gpu"
        echo "precision=$precision"
        echo "batch size per GPU=$batch_size"
        python train_cifar10.py \
            --network resnet \
            --num-layers 110 \
            --data-nthreads 20 \
            --batch-size $(($batch_size*$num_gpu)) \
            --gpus $(seq -s , 0 $(($num_gpu-1))) \
            --num-epochs 10 \
            --dtype $precision \
            2> /raid/test_${framework}_numGPU${num_gpu}_${precision}.txt
    done
done
```

```bash
#!/bin/bash
# quick fix: use GPU0,1,3,4 for DGX-1
cd /workspace/examples/image-classification
framework=mxnet_nv_1709
batch_size=128
for precision in float16 float32
do
    for num_gpu in 4
    do
        echo "number of GPU=$num_gpu"
        echo "precision=$precision"
        echo "batch size per GPU=$batch_size"
        # NOTE: as written, --gpus below still expands to 0,1,2,3 (seq from 0 to num_gpu-1);
        # pass --gpus 0,1,3,4 explicitly to select the GPUs named in the comment above.
        python train_cifar10.py \
            --network resnet \
            --num-layers 110 \
            --data-nthreads 20 \
            --batch-size $(($batch_size*$num_gpu)) \
            --gpus $(seq -s , 0 $(($num_gpu-1))) \
            --num-epochs 10 \
            --dtype $precision \
            2> /raid/test_${framework}_numGPU${num_gpu}_${precision}_topology_considered.txt
    done
done
```

3. Show the test results (the last 100 lines of each log):

```bash
#!/bin/bash
framework=mxnet_nv_1709
for num_gpu in 1 2 4 8
do
    for precision in float16 float32
    do
        echo "number of GPU=$num_gpu"
        echo "precision=$precision"
        tail -n 100 /raid/test_${framework}_numGPU${num_gpu}_${precision}.txt
        echo ""
    done
done
```

## Results of this Benchmark

### number of GPU=1, precision=FP16

```bash
INFO:root:Epoch[5] Batch [220] Speed: 1636.16 samples/sec accuracy=0.735547
INFO:root:Epoch[5] Batch [240] Speed: 1638.92 samples/sec accuracy=0.718359
INFO:root:Epoch[5] Batch [260] Speed: 1664.47 samples/sec accuracy=0.725781
INFO:root:Epoch[5] Batch [280] Speed: 1651.72 samples/sec accuracy=0.708203
INFO:root:Epoch[5] Batch [300] Speed: 1659.69 samples/sec accuracy=0.726172
INFO:root:Epoch[5] Batch [320] Speed: 1676.59 samples/sec accuracy=0.739844
INFO:root:Epoch[5] Batch [340] Speed: 1652.22 samples/sec accuracy=0.741406
INFO:root:Epoch[5] Batch [360] Speed: 1657.70 samples/sec accuracy=0.743750
INFO:root:Epoch[5] Batch [380] Speed: 1636.72 samples/sec accuracy=0.720703
INFO:root:Epoch[5] Train-accuracy=0.741319
INFO:root:Epoch[5] Time cost=30.112
INFO:root:Epoch[5] Validation-accuracy=0.690505
INFO:root:Epoch[6] Batch [20] Speed: 1679.44 samples/sec accuracy=0.738095
INFO:root:Epoch[6] Batch [40] Speed: 1679.10 samples/sec accuracy=0.729297
INFO:root:Epoch[6] Batch [60] Speed: 1642.05 samples/sec accuracy=0.725781
INFO:root:Epoch[6] Batch [80] Speed: 1644.09 samples/sec accuracy=0.745313
INFO:root:Epoch[6] Batch [100] Speed: 1638.36 samples/sec accuracy=0.750391
INFO:root:Epoch[6] Batch [120] Speed: 1670.52 samples/sec accuracy=0.734766
INFO:root:Epoch[6] Batch [140] Speed: 1659.12 samples/sec accuracy=0.751563
INFO:root:Epoch[6] Batch [160] Speed: 1652.69 samples/sec accuracy=0.734375
INFO:root:Epoch[6] Batch [180] Speed: 1660.45 samples/sec accuracy=0.751953
INFO:root:Epoch[6] Batch [200] Speed: 1664.72 samples/sec accuracy=0.728906
INFO:root:Epoch[6] Batch [220] Speed: 1669.46 samples/sec accuracy=0.743750
INFO:root:Epoch[6] Batch [240] Speed: 1666.31 samples/sec accuracy=0.749219
INFO:root:Epoch[6] Batch [260] Speed: 1644.34 samples/sec accuracy=0.745703
INFO:root:Epoch[6] Batch [280] Speed: 1656.35 samples/sec accuracy=0.750391
INFO:root:Epoch[6] Batch [300] Speed: 1642.49 samples/sec accuracy=0.743359
INFO:root:Epoch[6] Batch [320] Speed: 1659.47 samples/sec accuracy=0.769141
INFO:root:Epoch[6] Batch [340] Speed: 1647.00 samples/sec accuracy=0.753516
INFO:root:Epoch[6] Batch [360] Speed: 1667.83 samples/sec accuracy=0.762891
INFO:root:Epoch[6] Batch [380] Speed: 1665.73 samples/sec accuracy=0.741016
INFO:root:Epoch[6] Train-accuracy=0.757812
INFO:root:Epoch[6] Time cost=30.123
INFO:root:Epoch[6] Validation-accuracy=0.767127
INFO:root:Epoch[7] Batch [20] Speed: 1678.21 samples/sec accuracy=0.748512
INFO:root:Epoch[7] Batch [40] Speed: 1678.75 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Batch [60] Speed: 1684.53 samples/sec accuracy=0.755078
INFO:root:Epoch[7] Batch [80] Speed: 1678.94 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Batch [100] Speed: 1657.94 samples/sec accuracy=0.767188
INFO:root:Epoch[7] Batch [120] Speed: 1669.34 samples/sec accuracy=0.750781
INFO:root:Epoch[7] Batch [140] Speed: 1650.13 samples/sec accuracy=0.766797
INFO:root:Epoch[7] Batch [160] Speed: 1647.68 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Batch [180] Speed: 1653.50 samples/sec accuracy=0.759766
INFO:root:Epoch[7] Batch [200] Speed: 1671.70 samples/sec accuracy=0.747656
INFO:root:Epoch[7] Batch [220] Speed: 1659.03 samples/sec accuracy=0.757812
INFO:root:Epoch[7] Batch [240] Speed: 1648.86 samples/sec accuracy=0.760156
INFO:root:Epoch[7] Batch [260] Speed: 1660.06 samples/sec accuracy=0.752344
INFO:root:Epoch[7] Batch [280] Speed: 1659.86 samples/sec accuracy=0.765625
INFO:root:Epoch[7] Batch [300] Speed: 1661.02 samples/sec accuracy=0.758984
INFO:root:Epoch[7] Batch [320] Speed: 1669.57 samples/sec accuracy=0.769531
INFO:root:Epoch[7] Batch [340] Speed: 1661.76 samples/sec accuracy=0.771875
INFO:root:Epoch[7] Batch [360] Speed: 1674.96 samples/sec accuracy=0.768750
INFO:root:Epoch[7] Batch [380] Speed: 1712.63 samples/sec accuracy=0.759375
INFO:root:Epoch[7] Train-accuracy=0.762153
INFO:root:Epoch[7] Time cost=29.877
INFO:root:Epoch[7] Validation-accuracy=0.782051
INFO:root:Epoch[8] Batch [20] Speed: 1681.09 samples/sec accuracy=0.764881
INFO:root:Epoch[8] Batch [40] Speed: 1639.31 samples/sec accuracy=0.761328
INFO:root:Epoch[8] Batch [60] Speed: 1637.03 samples/sec accuracy=0.770312
INFO:root:Epoch[8] Batch [80] Speed: 1661.73 samples/sec accuracy=0.771094
INFO:root:Epoch[8] Batch [100] Speed: 1664.35 samples/sec accuracy=0.757031
INFO:root:Epoch[8] Batch [120] Speed: 1647.86 samples/sec accuracy=0.753516
INFO:root:Epoch[8] Batch [140] Speed: 1654.57 samples/sec accuracy=0.769922
INFO:root:Epoch[8] Batch [160] Speed: 1655.10 samples/sec accuracy=0.778906
INFO:root:Epoch[8] Batch [180] Speed: 1657.54 samples/sec accuracy=0.761719
INFO:root:Epoch[8] Batch [200] Speed: 1661.52 samples/sec accuracy=0.778906
INFO:root:Epoch[8] Batch [220] Speed: 1638.52 samples/sec accuracy=0.760156
INFO:root:Epoch[8] Batch [240] Speed: 1650.54 samples/sec accuracy=0.771484
INFO:root:Epoch[8] Batch [260] Speed: 1648.50 samples/sec accuracy=0.768750
INFO:root:Epoch[8] Batch [280] Speed: 1641.08 samples/sec accuracy=0.776563
INFO:root:Epoch[8] Batch [300] Speed: 1637.01 samples/sec accuracy=0.776563
INFO:root:Epoch[8] Batch [320] Speed: 1647.48 samples/sec accuracy=0.780469
INFO:root:Epoch[8] Batch [340] Speed: 1633.75 samples/sec accuracy=0.784375
INFO:root:Epoch[8] Batch [360] Speed: 1654.11 samples/sec accuracy=0.781641
INFO:root:Epoch[8] Batch [380] Speed: 1657.40 samples/sec accuracy=0.757031
INFO:root:Epoch[8] Train-accuracy=0.779687
INFO:root:Epoch[8] Time cost=30.315
INFO:root:Epoch[8] Validation-accuracy=0.779371
INFO:root:Epoch[9] Batch [20] Speed: 1657.51 samples/sec accuracy=0.773065
INFO:root:Epoch[9] Batch [40] Speed: 1666.89 samples/sec accuracy=0.768359
INFO:root:Epoch[9] Batch [60] Speed: 1663.53 samples/sec accuracy=0.780859
INFO:root:Epoch[9] Batch [80] Speed: 1658.68 samples/sec accuracy=0.778125
INFO:root:Epoch[9] Batch [100] Speed: 1661.43 samples/sec accuracy=0.779297
INFO:root:Epoch[9] Batch [120] Speed: 1662.35 samples/sec accuracy=0.762500
INFO:root:Epoch[9] Batch [140] Speed: 1620.34 samples/sec accuracy=0.785547
INFO:root:Epoch[9] Batch [160] Speed: 1639.00 samples/sec accuracy=0.800000
INFO:root:Epoch[9] Batch [180] Speed: 1665.13 samples/sec accuracy=0.773828
INFO:root:Epoch[9] Batch [200] Speed: 1653.24 samples/sec accuracy=0.766406
INFO:root:Epoch[9] Batch [220] Speed: 1641.22 samples/sec accuracy=0.776172
INFO:root:Epoch[9] Batch [240] Speed: 1635.84 samples/sec accuracy=0.787500
INFO:root:Epoch[9] Batch [260] Speed: 1662.29 samples/sec accuracy=0.784375
INFO:root:Epoch[9] Batch [280] Speed: 1670.61 samples/sec accuracy=0.777344
INFO:root:Epoch[9] Batch [300] Speed: 1665.69 samples/sec accuracy=0.785156
INFO:root:Epoch[9] Batch [320] Speed: 1643.91 samples/sec accuracy=0.786719
INFO:root:Epoch[9] Batch [340] Speed: 1658.62 samples/sec accuracy=0.792188
INFO:root:Epoch[9] Batch [360] Speed: 1675.86 samples/sec accuracy=0.792188
INFO:root:Epoch[9] Batch [380] Speed: 1660.30 samples/sec accuracy=0.774609
INFO:root:Epoch[9] Train-accuracy=0.790625
INFO:root:Epoch[9] Time cost=30.181
INFO:root:Epoch[9] Validation-accuracy=0.793870
```

### number of GPU=1, precision=FP32

```bash
INFO:root:Epoch[5] Batch [220] Speed: 2107.87 samples/sec accuracy=0.717578
INFO:root:Epoch[5] Batch [240] Speed: 2117.73 samples/sec accuracy=0.730469
INFO:root:Epoch[5] Batch [260] Speed: 2080.35 samples/sec accuracy=0.737109
INFO:root:Epoch[5] Batch [280] Speed: 2112.54 samples/sec accuracy=0.717969
INFO:root:Epoch[5] Batch [300] Speed: 2074.36 samples/sec accuracy=0.728125
INFO:root:Epoch[5] Batch [320] Speed: 2121.36 samples/sec accuracy=0.732812
INFO:root:Epoch[5] Batch [340] Speed: 2123.88 samples/sec accuracy=0.741406
INFO:root:Epoch[5] Batch [360] Speed: 2079.25 samples/sec accuracy=0.745703
INFO:root:Epoch[5] Batch [380] Speed: 2099.69 samples/sec accuracy=0.725391
INFO:root:Epoch[5] Train-accuracy=0.731771
INFO:root:Epoch[5] Time cost=23.639
INFO:root:Epoch[5] Validation-accuracy=0.720653
INFO:root:Epoch[6] Batch [20] Speed: 2077.08 samples/sec accuracy=0.738839
INFO:root:Epoch[6] Batch [40] Speed: 2097.13 samples/sec accuracy=0.739062
INFO:root:Epoch[6] Batch [60] Speed: 2095.61 samples/sec accuracy=0.737891
INFO:root:Epoch[6] Batch [80] Speed: 2107.52 samples/sec accuracy=0.727344
INFO:root:Epoch[6] Batch [100] Speed: 2108.11 samples/sec accuracy=0.737891
INFO:root:Epoch[6] Batch [120] Speed: 2121.61 samples/sec accuracy=0.723828
INFO:root:Epoch[6] Batch [140] Speed: 2119.10 samples/sec accuracy=0.745703
INFO:root:Epoch[6] Batch [160] Speed: 2108.44 samples/sec accuracy=0.758594
INFO:root:Epoch[6] Batch [180] Speed: 2116.13 samples/sec accuracy=0.731641
INFO:root:Epoch[6] Batch [200] Speed: 2101.70 samples/sec accuracy=0.740234
INFO:root:Epoch[6] Batch [220] Speed: 2108.43 samples/sec accuracy=0.750000
INFO:root:Epoch[6] Batch [240] Speed: 2090.88 samples/sec accuracy=0.749219
INFO:root:Epoch[6] Batch [260] Speed: 2148.97 samples/sec accuracy=0.755469
INFO:root:Epoch[6] Batch [280] Speed: 2122.69 samples/sec accuracy=0.750781
INFO:root:Epoch[6] Batch [300] Speed: 2139.50 samples/sec accuracy=0.756250
INFO:root:Epoch[6] Batch [320] Speed: 2116.02 samples/sec accuracy=0.764844
INFO:root:Epoch[6] Batch [340] Speed: 2111.96 samples/sec accuracy=0.762891
INFO:root:Epoch[6] Batch [360] Speed: 2125.57 samples/sec accuracy=0.762891
INFO:root:Epoch[6] Batch [380] Speed: 2101.92 samples/sec accuracy=0.753906
INFO:root:Epoch[6] Train-accuracy=0.737500
INFO:root:Epoch[6] Time cost=23.664
INFO:root:Epoch[6] Validation-accuracy=0.734776
INFO:root:Epoch[7] Batch [20] Speed: 2133.05 samples/sec accuracy=0.750744
INFO:root:Epoch[7] Batch [40] Speed: 2104.17 samples/sec accuracy=0.758594
INFO:root:Epoch[7] Batch [60] Speed: 2086.90 samples/sec accuracy=0.746094
INFO:root:Epoch[7] Batch [80] Speed: 2095.71 samples/sec accuracy=0.747656
INFO:root:Epoch[7] Batch [100] Speed: 2089.72 samples/sec accuracy=0.756641
INFO:root:Epoch[7] Batch [120] Speed: 2117.08 samples/sec accuracy=0.749609
INFO:root:Epoch[7] Batch [140] Speed: 2111.42 samples/sec accuracy=0.763672
INFO:root:Epoch[7] Batch [160] Speed: 2104.23 samples/sec accuracy=0.769141
INFO:root:Epoch[7] Batch [180] Speed: 2117.99 samples/sec accuracy=0.755078
INFO:root:Epoch[7] Batch [200] Speed: 2121.33 samples/sec accuracy=0.760156
INFO:root:Epoch[7] Batch [220] Speed: 2106.63 samples/sec accuracy=0.758984
INFO:root:Epoch[7] Batch [240] Speed: 2102.36 samples/sec accuracy=0.762891
INFO:root:Epoch[7] Batch [260] Speed: 2096.69 samples/sec accuracy=0.767188
INFO:root:Epoch[7] Batch [280] Speed: 2099.93 samples/sec accuracy=0.766016
INFO:root:Epoch[7] Batch [300] Speed: 2113.95 samples/sec accuracy=0.764062
INFO:root:Epoch[7] Batch [320] Speed: 2103.36 samples/sec accuracy=0.781641
INFO:root:Epoch[7] Batch [340] Speed: 2116.70 samples/sec accuracy=0.769141
INFO:root:Epoch[7] Batch [360] Speed: 2106.37 samples/sec accuracy=0.773438
INFO:root:Epoch[7] Batch [380] Speed: 2109.77 samples/sec accuracy=0.755078
INFO:root:Epoch[7] Train-accuracy=0.771701
INFO:root:Epoch[7] Time cost=23.665
INFO:root:Epoch[7] Validation-accuracy=0.760216
INFO:root:Epoch[8] Batch [20] Speed: 2115.81 samples/sec accuracy=0.774554
INFO:root:Epoch[8] Batch [40] Speed: 2105.95 samples/sec accuracy=0.766016
INFO:root:Epoch[8] Batch [60] Speed: 2109.37 samples/sec accuracy=0.764844
INFO:root:Epoch[8] Batch [80] Speed: 2094.50 samples/sec accuracy=0.771875
INFO:root:Epoch[8] Batch [100] Speed: 2126.98 samples/sec accuracy=0.773828
INFO:root:Epoch[8] Batch [120] Speed: 2119.86 samples/sec accuracy=0.771875
INFO:root:Epoch[8] Batch [140] Speed: 2126.34 samples/sec accuracy=0.779297
INFO:root:Epoch[8] Batch [160] Speed: 2122.38 samples/sec accuracy=0.783203
INFO:root:Epoch[8] Batch [180] Speed: 2080.03 samples/sec accuracy=0.766406
INFO:root:Epoch[8] Batch [200] Speed: 2090.88 samples/sec accuracy=0.775781
INFO:root:Epoch[8] Batch [220] Speed: 2103.53 samples/sec accuracy=0.769531
INFO:root:Epoch[8] Batch [240] Speed: 2096.92 samples/sec accuracy=0.773828
INFO:root:Epoch[8] Batch [260] Speed: 2085.13 samples/sec accuracy=0.780078
INFO:root:Epoch[8] Batch [280] Speed: 2109.69 samples/sec accuracy=0.769922
INFO:root:Epoch[8] Batch [300] Speed: 2122.65 samples/sec accuracy=0.782031
INFO:root:Epoch[8] Batch [320] Speed: 2116.10 samples/sec accuracy=0.785156
INFO:root:Epoch[8] Batch [340] Speed: 2099.77 samples/sec accuracy=0.794141
INFO:root:Epoch[8] Batch [360] Speed: 2117.69 samples/sec accuracy=0.783984
INFO:root:Epoch[8] Batch [380] Speed: 2106.19 samples/sec accuracy=0.776172
INFO:root:Epoch[8] Train-accuracy=0.775781
INFO:root:Epoch[8] Time cost=23.761
INFO:root:Epoch[8] Validation-accuracy=0.761966
INFO:root:Epoch[9] Batch [20] Speed: 2139.77 samples/sec accuracy=0.779762
INFO:root:Epoch[9] Batch [40] Speed: 2131.84 samples/sec accuracy=0.783984
INFO:root:Epoch[9] Batch [60] Speed: 2124.51 samples/sec accuracy=0.780469
INFO:root:Epoch[9] Batch [80] Speed: 2088.46 samples/sec accuracy=0.770703
INFO:root:Epoch[9] Batch [100] Speed: 2116.31 samples/sec accuracy=0.790234
INFO:root:Epoch[9] Batch [120] Speed: 2102.47 samples/sec accuracy=0.766016
INFO:root:Epoch[9] Batch [140] Speed: 2107.50 samples/sec accuracy=0.776953
INFO:root:Epoch[9] Batch [160] Speed: 2107.15 samples/sec accuracy=0.785937
INFO:root:Epoch[9] Batch [180] Speed: 2113.62 samples/sec accuracy=0.778906
INFO:root:Epoch[9] Batch [200] Speed: 2078.67 samples/sec accuracy=0.781250
INFO:root:Epoch[9] Batch [220] Speed: 2064.87 samples/sec accuracy=0.784766
INFO:root:Epoch[9] Batch [240] Speed: 2076.29 samples/sec accuracy=0.788281
INFO:root:Epoch[9] Batch [260] Speed: 2026.08 samples/sec accuracy=0.792969
INFO:root:Epoch[9] Batch [280] Speed: 2076.42 samples/sec accuracy=0.790234
INFO:root:Epoch[9] Batch [300] Speed: 2110.56 samples/sec accuracy=0.782813
INFO:root:Epoch[9] Batch [320] Speed: 2102.36 samples/sec accuracy=0.797656
INFO:root:Epoch[9] Batch [340] Speed: 2107.22 samples/sec accuracy=0.795703
INFO:root:Epoch[9] Batch [360] Speed: 2116.33 samples/sec accuracy=0.796484
INFO:root:Epoch[9] Batch [380] Speed: 2108.32 samples/sec accuracy=0.787891
INFO:root:Epoch[9] Train-accuracy=0.781250
INFO:root:Epoch[9] Time cost=23.804
INFO:root:Epoch[9] Validation-accuracy=0.785457
```

### number of GPU=2, precision=FP16

```bash
INFO:root:Epoch[1] Batch [180] Speed: 2901.34 samples/sec accuracy=0.461328
INFO:root:Epoch[1] Train-accuracy=0.453962
INFO:root:Epoch[1] Time cost=17.190
INFO:root:Epoch[1] Validation-accuracy=0.486278
INFO:root:Epoch[2] Batch [20] Speed: 2892.09 samples/sec accuracy=0.457961
INFO:root:Epoch[2] Batch [40] Speed: 2889.07 samples/sec accuracy=0.471289
INFO:root:Epoch[2] Batch [60] Speed: 2893.72 samples/sec accuracy=0.483008
INFO:root:Epoch[2] Batch [80] Speed: 2881.23 samples/sec accuracy=0.487109
INFO:root:Epoch[2] Batch [100] Speed: 2842.24 samples/sec accuracy=0.496680
INFO:root:Epoch[2] Batch [120] Speed: 2865.04 samples/sec accuracy=0.508398
INFO:root:Epoch[2] Batch [140] Speed: 2909.31 samples/sec accuracy=0.521094
INFO:root:Epoch[2] Batch [160] Speed: 2923.13 samples/sec accuracy=0.537500
INFO:root:Epoch[2] Batch [180] Speed: 2864.52 samples/sec accuracy=0.539258
INFO:root:Epoch[2] Train-accuracy=0.527902
INFO:root:Epoch[2] Time cost=17.268
INFO:root:Epoch[2] Validation-accuracy=0.548478
INFO:root:Epoch[3] Batch [20] Speed: 2927.58 samples/sec accuracy=0.545945
INFO:root:Epoch[3] Batch [40] Speed: 2891.57 samples/sec accuracy=0.551172
INFO:root:Epoch[3] Batch [60] Speed: 2883.61 samples/sec accuracy=0.566602
INFO:root:Epoch[3] Batch [80] Speed: 2812.99 samples/sec accuracy=0.560352
INFO:root:Epoch[3] Batch [100] Speed: 2826.40 samples/sec accuracy=0.575781
INFO:root:Epoch[3] Batch [120] Speed: 2849.50 samples/sec accuracy=0.572266
INFO:root:Epoch[3] Batch [140] Speed: 2886.62 samples/sec accuracy=0.587305
INFO:root:Epoch[3] Batch [160] Speed: 2901.33 samples/sec accuracy=0.598437
INFO:root:Epoch[3] Batch [180] Speed: 2897.88 samples/sec accuracy=0.602148
INFO:root:Epoch[3] Train-accuracy=0.592708
INFO:root:Epoch[3] Time cost=17.389
INFO:root:Epoch[3] Validation-accuracy=0.544071
INFO:root:Epoch[4] Batch [20] Speed: 2917.17 samples/sec accuracy=0.616815
INFO:root:Epoch[4] Batch [40] Speed: 2876.62 samples/sec accuracy=0.610352
INFO:root:Epoch[4] Batch [60] Speed: 2913.68 samples/sec accuracy=0.616602
INFO:root:Epoch[4] Batch [80] Speed: 2880.96 samples/sec accuracy=0.614258
INFO:root:Epoch[4] Batch [100] Speed: 2910.84 samples/sec accuracy=0.629297
INFO:root:Epoch[4] Batch [120] Speed: 2890.28 samples/sec accuracy=0.631641
INFO:root:Epoch[4] Batch [140] Speed: 2897.16 samples/sec accuracy=0.636523
INFO:root:Epoch[4] Batch [160] Speed: 2860.85 samples/sec accuracy=0.634570
INFO:root:Epoch[4] Batch [180] Speed: 2844.21 samples/sec accuracy=0.660156
INFO:root:Epoch[4] Train-accuracy=0.644252
INFO:root:Epoch[4] Time cost=17.231
INFO:root:Epoch[4] Validation-accuracy=0.675080
INFO:root:Epoch[5] Batch [20] Speed: 2899.83 samples/sec accuracy=0.665551
INFO:root:Epoch[5] Batch [40] Speed: 2887.09 samples/sec accuracy=0.657422
INFO:root:Epoch[5] Batch [60] Speed: 2883.18 samples/sec accuracy=0.656055
INFO:root:Epoch[5] Batch [80] Speed: 2851.43 samples/sec accuracy=0.670508
INFO:root:Epoch[5] Batch [100] Speed: 2902.55 samples/sec accuracy=0.656445
INFO:root:Epoch[5] Batch [120] Speed: 2861.41 samples/sec accuracy=0.671875
INFO:root:Epoch[5] Batch [140] Speed: 2865.42 samples/sec accuracy=0.675000
INFO:root:Epoch[5] Batch [160] Speed: 2865.88 samples/sec accuracy=0.685937
INFO:root:Epoch[5] Batch [180] Speed: 2878.85 samples/sec accuracy=0.683594
INFO:root:Epoch[5] Train-accuracy=0.675502
INFO:root:Epoch[5] Time cost=17.288
INFO:root:Epoch[5] Validation-accuracy=0.705228
INFO:root:Epoch[6] Batch [20] Speed: 2903.00 samples/sec accuracy=0.704985
INFO:root:Epoch[6] Batch [40] Speed: 2856.29 samples/sec accuracy=0.687695
INFO:root:Epoch[6] Batch [60] Speed: 2859.29 samples/sec accuracy=0.693164
INFO:root:Epoch[6] Batch [80] Speed: 2867.49 samples/sec accuracy=0.699023
INFO:root:Epoch[6] Batch [100] Speed: 2869.12 samples/sec accuracy=0.693555
INFO:root:Epoch[6] Batch [120] Speed: 2891.07 samples/sec accuracy=0.705664
INFO:root:Epoch[6] Batch [140] Speed: 2855.68 samples/sec accuracy=0.703711
INFO:root:Epoch[6] Batch [160] Speed: 2889.05 samples/sec accuracy=0.712109
INFO:root:Epoch[6] Batch [180] Speed: 2860.82 samples/sec accuracy=0.713672
INFO:root:Epoch[6] Train-accuracy=0.705208
INFO:root:Epoch[6] Time cost=17.409
INFO:root:Epoch[6] Validation-accuracy=0.721554
INFO:root:Epoch[7] Batch [20] Speed: 2930.15 samples/sec accuracy=0.725632
INFO:root:Epoch[7] Batch [40] Speed: 2885.11 samples/sec accuracy=0.725195
INFO:root:Epoch[7] Batch [60] Speed: 2901.77 samples/sec accuracy=0.721289
INFO:root:Epoch[7] Batch [80] Speed: 2856.42 samples/sec accuracy=0.719727
INFO:root:Epoch[7] Batch [100] Speed: 2884.57 samples/sec accuracy=0.724805
INFO:root:Epoch[7] Batch [120] Speed: 2859.05 samples/sec accuracy=0.727148
INFO:root:Epoch[7] Batch [140] Speed: 2869.72 samples/sec accuracy=0.729688
INFO:root:Epoch[7] Batch [160] Speed: 2873.91 samples/sec accuracy=0.745117
INFO:root:Epoch[7] Batch [180] Speed: 2900.49 samples/sec accuracy=0.737109
INFO:root:Epoch[7] Train-accuracy=0.730469
INFO:root:Epoch[7] Time cost=17.265
INFO:root:Epoch[7] Validation-accuracy=0.727564
INFO:root:Epoch[8] Batch [20] Speed: 2901.90 samples/sec accuracy=0.742001
INFO:root:Epoch[8] Batch [40] Speed: 2867.76 samples/sec accuracy=0.741211
INFO:root:Epoch[8] Batch [60] Speed: 2873.99 samples/sec accuracy=0.740039
INFO:root:Epoch[8] Batch [80] Speed: 2837.24 samples/sec accuracy=0.747656
INFO:root:Epoch[8] Batch [100] Speed: 2826.82 samples/sec accuracy=0.740234
INFO:root:Epoch[8] Batch [120] Speed: 2885.81 samples/sec accuracy=0.746289
INFO:root:Epoch[8] Batch [140] Speed: 2887.89 samples/sec accuracy=0.743359
INFO:root:Epoch[8] Batch [160] Speed: 2890.34 samples/sec accuracy=0.756250
INFO:root:Epoch[8] Batch [180] Speed: 2844.10 samples/sec accuracy=0.752344
INFO:root:Epoch[8] Train-accuracy=0.744978
INFO:root:Epoch[8] Time cost=17.347
INFO:root:Epoch[8] Validation-accuracy=0.744692
INFO:root:Epoch[9] Batch [20] Speed: 2924.39 samples/sec accuracy=0.757812
INFO:root:Epoch[9] Batch [40] Speed: 2861.19 samples/sec accuracy=0.757227
INFO:root:Epoch[9] Batch [60] Speed: 2849.02 samples/sec accuracy=0.757812
INFO:root:Epoch[9] Batch [80] Speed: 2878.81 samples/sec accuracy=0.758984
INFO:root:Epoch[9] Batch [100] Speed: 2879.53 samples/sec accuracy=0.754492
INFO:root:Epoch[9] Batch [120] Speed: 2878.98 samples/sec accuracy=0.757617
INFO:root:Epoch[9] Batch [140] Speed: 2804.05 samples/sec accuracy=0.778711
INFO:root:Epoch[9] Batch [160] Speed: 2835.32 samples/sec accuracy=0.766406
INFO:root:Epoch[9] Batch [180] Speed: 2855.97 samples/sec accuracy=0.779297
INFO:root:Epoch[9] Train-accuracy=0.763802
INFO:root:Epoch[9] Time cost=17.458
INFO:root:Epoch[9] Validation-accuracy=0.769030
```

### number of GPU=2, precision=FP32

```
INFO:root:Epoch[1] Batch [180] Speed: 4108.27 samples/sec accuracy=0.481250
INFO:root:Epoch[1] Train-accuracy=0.492746
INFO:root:Epoch[1] Time cost=11.935
INFO:root:Epoch[1] Validation-accuracy=0.487580
INFO:root:Epoch[2] Batch [20] Speed: 4256.33 samples/sec accuracy=0.504278
INFO:root:Epoch[2] Batch [40] Speed: 4174.55 samples/sec accuracy=0.507617
INFO:root:Epoch[2] Batch [60] Speed: 4094.32 samples/sec accuracy=0.515234
INFO:root:Epoch[2] Batch [80] Speed: 4169.51 samples/sec accuracy=0.528711
INFO:root:Epoch[2] Batch [100] Speed: 4158.80 samples/sec accuracy=0.541406
INFO:root:Epoch[2] Batch [120] Speed: 4140.68 samples/sec accuracy=0.546094
INFO:root:Epoch[2] Batch [140] Speed: 4215.41 samples/sec accuracy=0.546680 INFO:root:Epoch[2] Batch [160] Speed: 4199.92 samples/sec accuracy=0.555469 INFO:root:Epoch[2] Batch [180] Speed: 4129.76 samples/sec accuracy=0.561719 INFO:root:Epoch[2] Train-accuracy=0.562779 INFO:root:Epoch[2] Time cost=11.962 INFO:root:Epoch[2] Validation-accuracy=0.469752 INFO:root:Epoch[3] Batch [20] Speed: 4167.49 samples/sec accuracy=0.571243 INFO:root:Epoch[3] Batch [40] Speed: 4149.39 samples/sec accuracy=0.575586 INFO:root:Epoch[3] Batch [60] Speed: 4202.28 samples/sec accuracy=0.583789 INFO:root:Epoch[3] Batch [80] Speed: 4158.42 samples/sec accuracy=0.588281 INFO:root:Epoch[3] Batch [100] Speed: 4158.99 samples/sec accuracy=0.586719 INFO:root:Epoch[3] Batch [120] Speed: 4133.70 samples/sec accuracy=0.611523 INFO:root:Epoch[3] Batch [140] Speed: 4082.04 samples/sec accuracy=0.602148 INFO:root:Epoch[3] Batch [160] Speed: 4183.89 samples/sec accuracy=0.615234 INFO:root:Epoch[3] Batch [180] Speed: 4097.57 samples/sec accuracy=0.627734 INFO:root:Epoch[3] Train-accuracy=0.620833 INFO:root:Epoch[3] Time cost=12.055 INFO:root:Epoch[3] Validation-accuracy=0.584135 INFO:root:Epoch[4] Batch [20] Speed: 4252.21 samples/sec accuracy=0.628720 INFO:root:Epoch[4] Batch [40] Speed: 4172.65 samples/sec accuracy=0.631445 INFO:root:Epoch[4] Batch [60] Speed: 4141.98 samples/sec accuracy=0.633203 INFO:root:Epoch[4] Batch [80] Speed: 4102.38 samples/sec accuracy=0.631055 INFO:root:Epoch[4] Batch [100] Speed: 4168.22 samples/sec accuracy=0.640625 INFO:root:Epoch[4] Batch [120] Speed: 4153.58 samples/sec accuracy=0.643555 INFO:root:Epoch[4] Batch [140] Speed: 4149.70 samples/sec accuracy=0.644141 INFO:root:Epoch[4] Batch [160] Speed: 4109.06 samples/sec accuracy=0.661133 INFO:root:Epoch[4] Batch [180] Speed: 4112.76 samples/sec accuracy=0.675195 INFO:root:Epoch[4] Train-accuracy=0.646484 INFO:root:Epoch[4] Time cost=11.994 INFO:root:Epoch[4] Validation-accuracy=0.678085 INFO:root:Epoch[5] Batch [20] 
Speed: 4265.15 samples/sec accuracy=0.655506 INFO:root:Epoch[5] Batch [40] Speed: 4167.68 samples/sec accuracy=0.666602 INFO:root:Epoch[5] Batch [60] Speed: 4224.17 samples/sec accuracy=0.670312 INFO:root:Epoch[5] Batch [80] Speed: 4152.36 samples/sec accuracy=0.673047 INFO:root:Epoch[5] Batch [100] Speed: 4118.74 samples/sec accuracy=0.667969 INFO:root:Epoch[5] Batch [120] Speed: 4168.40 samples/sec accuracy=0.676172 INFO:root:Epoch[5] Batch [140] Speed: 4214.88 samples/sec accuracy=0.669922 INFO:root:Epoch[5] Batch [160] Speed: 4110.49 samples/sec accuracy=0.682813 INFO:root:Epoch[5] Batch [180] Speed: 4191.58 samples/sec accuracy=0.697852 INFO:root:Epoch[5] Train-accuracy=0.679408 INFO:root:Epoch[5] Time cost=11.921 INFO:root:Epoch[5] Validation-accuracy=0.668069 INFO:root:Epoch[6] Batch [20] Speed: 4213.16 samples/sec accuracy=0.696615 INFO:root:Epoch[6] Batch [40] Speed: 4184.11 samples/sec accuracy=0.699805 INFO:root:Epoch[6] Batch [60] Speed: 4182.60 samples/sec accuracy=0.707812 INFO:root:Epoch[6] Batch [80] Speed: 4106.69 samples/sec accuracy=0.702930 INFO:root:Epoch[6] Batch [100] Speed: 4154.74 samples/sec accuracy=0.700391 INFO:root:Epoch[6] Batch [120] Speed: 4142.70 samples/sec accuracy=0.716406 INFO:root:Epoch[6] Batch [140] Speed: 4195.48 samples/sec accuracy=0.710156 INFO:root:Epoch[6] Batch [160] Speed: 4149.39 samples/sec accuracy=0.704297 INFO:root:Epoch[6] Batch [180] Speed: 4160.08 samples/sec accuracy=0.723633 INFO:root:Epoch[6] Train-accuracy=0.699219 INFO:root:Epoch[6] Time cost=12.015 INFO:root:Epoch[6] Validation-accuracy=0.722456 INFO:root:Epoch[7] Batch [20] Speed: 4174.50 samples/sec accuracy=0.720796 INFO:root:Epoch[7] Batch [40] Speed: 4218.09 samples/sec accuracy=0.720508 INFO:root:Epoch[7] Batch [60] Speed: 4119.26 samples/sec accuracy=0.719531 INFO:root:Epoch[7] Batch [80] Speed: 4173.38 samples/sec accuracy=0.725391 INFO:root:Epoch[7] Batch [100] Speed: 4193.89 samples/sec accuracy=0.724414 INFO:root:Epoch[7] Batch [120] Speed: 
4120.34 samples/sec accuracy=0.718164 INFO:root:Epoch[7] Batch [140] Speed: 4215.10 samples/sec accuracy=0.721680 INFO:root:Epoch[7] Batch [160] Speed: 4153.57 samples/sec accuracy=0.739258 INFO:root:Epoch[7] Batch [180] Speed: 4125.48 samples/sec accuracy=0.729492 INFO:root:Epoch[7] Train-accuracy=0.727121 INFO:root:Epoch[7] Time cost=11.963 INFO:root:Epoch[7] Validation-accuracy=0.724259 INFO:root:Epoch[8] Batch [20] Speed: 4191.51 samples/sec accuracy=0.742374 INFO:root:Epoch[8] Batch [40] Speed: 4172.53 samples/sec accuracy=0.738867 INFO:root:Epoch[8] Batch [60] Speed: 4205.16 samples/sec accuracy=0.745508 INFO:root:Epoch[8] Batch [80] Speed: 4183.85 samples/sec accuracy=0.736133 INFO:root:Epoch[8] Batch [100] Speed: 4216.08 samples/sec accuracy=0.732812 INFO:root:Epoch[8] Batch [120] Speed: 4175.85 samples/sec accuracy=0.744727 INFO:root:Epoch[8] Batch [140] Speed: 4132.39 samples/sec accuracy=0.738672 INFO:root:Epoch[8] Batch [160] Speed: 4180.43 samples/sec accuracy=0.747070 INFO:root:Epoch[8] Batch [180] Speed: 4194.63 samples/sec accuracy=0.761133 INFO:root:Epoch[8] Train-accuracy=0.743304 INFO:root:Epoch[8] Time cost=11.893 INFO:root:Epoch[8] Validation-accuracy=0.749599 INFO:root:Epoch[9] Batch [20] Speed: 4208.84 samples/sec accuracy=0.757812 INFO:root:Epoch[9] Batch [40] Speed: 4102.55 samples/sec accuracy=0.757031 INFO:root:Epoch[9] Batch [60] Speed: 4151.69 samples/sec accuracy=0.748828 INFO:root:Epoch[9] Batch [80] Speed: 4148.54 samples/sec accuracy=0.744922 INFO:root:Epoch[9] Batch [100] Speed: 4200.95 samples/sec accuracy=0.740430 INFO:root:Epoch[9] Batch [120] Speed: 3987.34 samples/sec accuracy=0.756836 INFO:root:Epoch[9] Batch [140] Speed: 4161.59 samples/sec accuracy=0.759570 INFO:root:Epoch[9] Batch [160] Speed: 4193.97 samples/sec accuracy=0.761523 INFO:root:Epoch[9] Batch [180] Speed: 4148.27 samples/sec accuracy=0.767188 INFO:root:Epoch[9] Train-accuracy=0.759896 INFO:root:Epoch[9] Time cost=12.076 INFO:root:Epoch[9] 
Validation-accuracy=0.764724 ``` ### number of GPU=4, precision=FP16 ``` INFO:root:start with arguments Namespace(batch_size=512, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float16', gpus='0,1,2,3', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001) [07:53:21] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding.. [07:53:22] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding.. [07:53:24] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... 
(setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable) INFO:root:Epoch[0] Batch [20] Speed: 5869.17 samples/sec accuracy=0.156994 INFO:root:Epoch[0] Batch [40] Speed: 5649.03 samples/sec accuracy=0.219434 INFO:root:Epoch[0] Batch [60] Speed: 5595.01 samples/sec accuracy=0.242285 INFO:root:Epoch[0] Batch [80] Speed: 5660.61 samples/sec accuracy=0.279590 INFO:root:Epoch[0] Train-accuracy=0.308824 INFO:root:Epoch[0] Time cost=10.753 INFO:root:Epoch[0] Validation-accuracy=0.340625 INFO:root:Epoch[1] Batch [20] Speed: 5831.55 samples/sec accuracy=0.321522 INFO:root:Epoch[1] Batch [40] Speed: 5721.23 samples/sec accuracy=0.342187 INFO:root:Epoch[1] Batch [60] Speed: 5734.57 samples/sec accuracy=0.354492 INFO:root:Epoch[1] Batch [80] Speed: 5734.63 samples/sec accuracy=0.374316 INFO:root:Epoch[1] Train-accuracy=0.396829 INFO:root:Epoch[1] Time cost=8.676 INFO:root:Epoch[1] Validation-accuracy=0.391406 INFO:root:Epoch[2] Batch [20] Speed: 5860.26 samples/sec accuracy=0.414062 INFO:root:Epoch[2] Batch [40] Speed: 5771.02 samples/sec accuracy=0.443457 INFO:root:Epoch[2] Batch [60] Speed: 5762.90 samples/sec accuracy=0.447461 INFO:root:Epoch[2] Batch [80] Speed: 5663.11 samples/sec accuracy=0.471582 INFO:root:Epoch[2] Train-accuracy=0.488525 INFO:root:Epoch[2] Time cost=8.582 INFO:root:Epoch[2] Validation-accuracy=0.509046 INFO:root:Epoch[3] Batch [20] Speed: 5853.16 samples/sec accuracy=0.497861 INFO:root:Epoch[3] Batch [40] Speed: 5662.32 samples/sec accuracy=0.522461 INFO:root:Epoch[3] Batch [60] Speed: 5706.18 samples/sec accuracy=0.525488 INFO:root:Epoch[3] Batch [80] Speed: 5753.41 samples/sec accuracy=0.539746 INFO:root:Epoch[3] Train-accuracy=0.553424 INFO:root:Epoch[3] Time cost=8.696 INFO:root:Epoch[3] Validation-accuracy=0.552832 INFO:root:Epoch[4] Batch [20] Speed: 5874.70 samples/sec accuracy=0.572824 INFO:root:Epoch[4] Batch [40] Speed: 5722.39 samples/sec accuracy=0.567676 INFO:root:Epoch[4] Batch [60] Speed: 5766.61 samples/sec accuracy=0.585547 
INFO:root:Epoch[4] Batch [80] Speed: 5772.43 samples/sec accuracy=0.589551
INFO:root:Epoch[4] Train-accuracy=0.596737
INFO:root:Epoch[4] Time cost=8.645
INFO:root:Epoch[4] Validation-accuracy=0.603824
INFO:root:Epoch[5] Batch [20] Speed: 5872.05 samples/sec accuracy=0.604539
INFO:root:Epoch[5] Batch [40] Speed: 5729.62 samples/sec accuracy=0.618555
INFO:root:Epoch[5] Batch [60] Speed: 5723.69 samples/sec accuracy=0.617188
INFO:root:Epoch[5] Batch [80] Speed: 5740.70 samples/sec accuracy=0.622559
INFO:root:Epoch[5] Train-accuracy=0.631836
INFO:root:Epoch[5] Time cost=8.578
INFO:root:Epoch[5] Validation-accuracy=0.659375
INFO:root:Epoch[6] Batch [20] Speed: 5741.80 samples/sec accuracy=0.642392
INFO:root:Epoch[6] Batch [40] Speed: 5627.22 samples/sec accuracy=0.650781
INFO:root:Epoch[6] Batch [60] Speed: 5734.45 samples/sec accuracy=0.644336
INFO:root:Epoch[6] Batch [80] Speed: 5764.20 samples/sec accuracy=0.660254
INFO:root:Epoch[6] Train-accuracy=0.660386
INFO:root:Epoch[6] Time cost=8.735
INFO:root:Epoch[6] Validation-accuracy=0.675164
INFO:root:Epoch[7] Batch [20] Speed: 5848.09 samples/sec accuracy=0.668341
INFO:root:Epoch[7] Batch [40] Speed: 5731.51 samples/sec accuracy=0.680273
INFO:root:Epoch[7] Batch [60] Speed: 5722.21 samples/sec accuracy=0.681641
INFO:root:Epoch[7] Batch [80] Speed: 5731.36 samples/sec accuracy=0.692578
INFO:root:Epoch[7] Train-accuracy=0.694393
INFO:root:Epoch[7] Time cost=8.680
INFO:root:Epoch[7] Validation-accuracy=0.710352
INFO:root:Epoch[8] Batch [20] Speed: 5810.60 samples/sec accuracy=0.694289
INFO:root:Epoch[8] Batch [40] Speed: 5724.36 samples/sec accuracy=0.701367
INFO:root:Epoch[8] Batch [60] Speed: 5688.34 samples/sec accuracy=0.699414
INFO:root:Epoch[8] Batch [80] Speed: 5720.14 samples/sec accuracy=0.704492
INFO:root:Epoch[8] Train-accuracy=0.710449
INFO:root:Epoch[8] Time cost=8.626
INFO:root:Epoch[8] Validation-accuracy=0.693565
INFO:root:Epoch[9] Batch [20] Speed: 5819.16 samples/sec accuracy=0.716146
INFO:root:Epoch[9] Batch [40] Speed: 5729.44 samples/sec accuracy=0.716211
INFO:root:Epoch[9] Batch [60] Speed: 5707.16 samples/sec accuracy=0.711621
INFO:root:Epoch[9] Batch [80] Speed: 5734.86 samples/sec accuracy=0.723437
INFO:root:Epoch[9] Train-accuracy=0.729665
INFO:root:Epoch[9] Time cost=8.714
INFO:root:Epoch[9] Validation-accuracy=0.754590
```

### number of GPU=4, precision=FP32

```bash
INFO:root:start with arguments Namespace(batch_size=512, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float32', gpus='0,1,2,3', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[08:03:14] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding..
[08:03:15] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding..
[08:03:17] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [20] Speed: 7792.99 samples/sec accuracy=0.160528
INFO:root:Epoch[0] Batch [40] Speed: 7838.93 samples/sec accuracy=0.226758
INFO:root:Epoch[0] Batch [60] Speed: 7963.03 samples/sec accuracy=0.265039
INFO:root:Epoch[0] Batch [80] Speed: 7699.20 samples/sec accuracy=0.283105
INFO:root:Epoch[0] Train-accuracy=0.318819
INFO:root:Epoch[0] Time cost=8.336
INFO:root:Epoch[0] Validation-accuracy=0.371680
INFO:root:Epoch[1] Batch [20] Speed: 8071.82 samples/sec accuracy=0.324126
INFO:root:Epoch[1] Batch [40] Speed: 7949.64 samples/sec accuracy=0.342969
INFO:root:Epoch[1] Batch [60] Speed: 8013.45 samples/sec accuracy=0.359863
INFO:root:Epoch[1] Batch [80] Speed: 8036.81 samples/sec accuracy=0.374219
INFO:root:Epoch[1] Train-accuracy=0.400620
INFO:root:Epoch[1] Time cost=6.252
INFO:root:Epoch[1] Validation-accuracy=0.401660
INFO:root:Epoch[2] Batch [20] Speed: 8086.74 samples/sec accuracy=0.418155
INFO:root:Epoch[2] Batch [40] Speed: 8035.85 samples/sec accuracy=0.432715
INFO:root:Epoch[2] Batch [60] Speed: 8020.24 samples/sec accuracy=0.451660
INFO:root:Epoch[2] Batch [80] Speed: 7907.26 samples/sec accuracy=0.477832
INFO:root:Epoch[2] Train-accuracy=0.498901
INFO:root:Epoch[2] Time cost=6.185
INFO:root:Epoch[2] Validation-accuracy=0.520868
INFO:root:Epoch[3] Batch [20] Speed: 8110.95 samples/sec accuracy=0.508650
INFO:root:Epoch[3] Batch [40] Speed: 7937.30 samples/sec accuracy=0.520703
INFO:root:Epoch[3] Batch [60] Speed: 7991.88 samples/sec accuracy=0.526172
INFO:root:Epoch[3] Batch [80] Speed: 8141.97 samples/sec accuracy=0.545117
INFO:root:Epoch[3] Train-accuracy=0.548369
INFO:root:Epoch[3] Time cost=6.233
INFO:root:Epoch[3] Validation-accuracy=0.542285
INFO:root:Epoch[4] Batch [20] Speed: 8235.56 samples/sec accuracy=0.557664
INFO:root:Epoch[4] Batch [40] Speed: 8099.37 samples/sec accuracy=0.570703
INFO:root:Epoch[4] Batch [60] Speed: 8148.82 samples/sec accuracy=0.586328
INFO:root:Epoch[4] Batch [80] Speed: 8108.18 samples/sec accuracy=0.579980
INFO:root:Epoch[4] Train-accuracy=0.592831
INFO:root:Epoch[4] Time cost=6.150
INFO:root:Epoch[4] Validation-accuracy=0.591180
INFO:root:Epoch[5] Batch [20] Speed: 8158.63 samples/sec accuracy=0.608352
INFO:root:Epoch[5] Batch [40] Speed: 8092.86 samples/sec accuracy=0.609473
INFO:root:Epoch[5] Batch [60] Speed: 7983.87 samples/sec accuracy=0.605664
INFO:root:Epoch[5] Batch [80] Speed: 7940.05 samples/sec accuracy=0.623535
INFO:root:Epoch[5] Train-accuracy=0.632935
INFO:root:Epoch[5] Time cost=6.176
INFO:root:Epoch[5] Validation-accuracy=0.659668
INFO:root:Epoch[6] Batch [20] Speed: 8133.43 samples/sec accuracy=0.641369
INFO:root:Epoch[6] Batch [40] Speed: 8062.66 samples/sec accuracy=0.640723
INFO:root:Epoch[6] Batch [60] Speed: 8084.93 samples/sec accuracy=0.645215
INFO:root:Epoch[6] Batch [80] Speed: 7951.90 samples/sec accuracy=0.642578
INFO:root:Epoch[6] Train-accuracy=0.655676
INFO:root:Epoch[6] Time cost=6.233
INFO:root:Epoch[6] Validation-accuracy=0.681332
INFO:root:Epoch[7] Batch [20] Speed: 8137.12 samples/sec accuracy=0.662016
INFO:root:Epoch[7] Batch [40] Speed: 8013.96 samples/sec accuracy=0.670508
INFO:root:Epoch[7] Batch [60] Speed: 8017.48 samples/sec accuracy=0.661426
INFO:root:Epoch[7] Batch [80] Speed: 8089.57 samples/sec accuracy=0.665625
INFO:root:Epoch[7] Train-accuracy=0.684168
INFO:root:Epoch[7] Time cost=6.210
INFO:root:Epoch[7] Validation-accuracy=0.656836
INFO:root:Epoch[8] Batch [20] Speed: 8220.95 samples/sec accuracy=0.683966
INFO:root:Epoch[8] Batch [40] Speed: 8046.88 samples/sec accuracy=0.686426
INFO:root:Epoch[8] Batch [60] Speed: 7931.11 samples/sec accuracy=0.691016
INFO:root:Epoch[8] Batch [80] Speed: 7872.08 samples/sec accuracy=0.687305
INFO:root:Epoch[8] Train-accuracy=0.703369
INFO:root:Epoch[8] Time cost=6.195
INFO:root:Epoch[8] Validation-accuracy=0.726665
INFO:root:Epoch[9] Batch [20] Speed: 8121.11 samples/sec accuracy=0.700149
INFO:root:Epoch[9] Batch [40] Speed: 7949.63 samples/sec accuracy=0.705078
INFO:root:Epoch[9] Batch [60] Speed: 7793.96 samples/sec accuracy=0.702246
INFO:root:Epoch[9] Batch [80] Speed: 8001.58 samples/sec accuracy=0.713281
INFO:root:Epoch[9] Train-accuracy=0.718176
INFO:root:Epoch[9] Time cost=6.273
INFO:root:Epoch[9] Validation-accuracy=0.721387
```

### number of GPU=8, precision=FP16

```bash
INFO:root:start with arguments Namespace(batch_size=1024, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float16', gpus='0,1,2,3,4,5,6,7', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[07:55:10] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding..
[07:55:12] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding..
[07:55:13] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[07:55:36] src/kvstore/././comm.h:327: only 32 out of 56 GPU pairs are enabled direct access. It may affect the performance. You can set MXNET_ENABLE_GPU_P2P=0 to turn it off
[07:55:36] src/kvstore/././comm.h:336: .vvvv...
[07:55:36] src/kvstore/././comm.h:336: v.vv.v..
[07:55:36] src/kvstore/././comm.h:336: vv.v..v.
[07:55:36] src/kvstore/././comm.h:336: vvv....v
[07:55:36] src/kvstore/././comm.h:336: v....vvv
[07:55:36] src/kvstore/././comm.h:336: .v..v.vv
[07:55:36] src/kvstore/././comm.h:336: ..v.vv.v
[07:55:36] src/kvstore/././comm.h:336: ...vvvv.
INFO:root:Epoch[0] Batch [20] Speed: 9099.72 samples/sec accuracy=0.149414
INFO:root:Epoch[0] Batch [40] Speed: 8787.35 samples/sec accuracy=0.227100
INFO:root:Epoch[0] Train-accuracy=0.260010
INFO:root:Epoch[0] Time cost=9.538
INFO:root:Epoch[0] Validation-accuracy=0.285938
INFO:root:Epoch[1] Batch [20] Speed: 10278.64 samples/sec accuracy=0.279204
INFO:root:Epoch[1] Batch [40] Speed: 10534.04 samples/sec accuracy=0.305225
INFO:root:Epoch[1] Train-accuracy=0.325928
INFO:root:Epoch[1] Time cost=4.795
INFO:root:Epoch[1] Validation-accuracy=0.384668
INFO:root:Epoch[2] Batch [20] Speed: 9058.14 samples/sec accuracy=0.334356
INFO:root:Epoch[2] Batch [40] Speed: 8510.44 samples/sec accuracy=0.354248
INFO:root:Epoch[2] Train-accuracy=0.373047
INFO:root:Epoch[2] Time cost=5.696
INFO:root:Epoch[2] Validation-accuracy=0.431250
INFO:root:Epoch[3] Batch [20] Speed: 9325.01 samples/sec accuracy=0.386021
INFO:root:Epoch[3] Batch [40] Speed: 9031.91 samples/sec accuracy=0.404883
INFO:root:Epoch[3] Train-accuracy=0.416748
INFO:root:Epoch[3] Time cost=5.329
INFO:root:Epoch[3] Validation-accuracy=0.462348
INFO:root:Epoch[4] Batch [20] Speed: 10653.77 samples/sec accuracy=0.438802
INFO:root:Epoch[4] Batch [40] Speed: 10626.94 samples/sec accuracy=0.453174
INFO:root:Epoch[4] Train-accuracy=0.478638
INFO:root:Epoch[4] Time cost=4.795
INFO:root:Epoch[4] Validation-accuracy=0.491309
INFO:root:Epoch[5] Batch [20] Speed: 11147.46 samples/sec accuracy=0.481166
INFO:root:Epoch[5] Batch [40] Speed: 9916.45 samples/sec accuracy=0.503662
INFO:root:Epoch[5] Train-accuracy=0.521484
INFO:root:Epoch[5] Time cost=4.739
INFO:root:Epoch[5] Validation-accuracy=0.574902
INFO:root:Epoch[6] Batch [20] Speed: 11193.83 samples/sec accuracy=0.527669
INFO:root:Epoch[6] Batch [40] Speed: 9221.36 samples/sec accuracy=0.543994
INFO:root:Epoch[6] Train-accuracy=0.558105
INFO:root:Epoch[6] Time cost=5.164
INFO:root:Epoch[6] Validation-accuracy=0.594531
INFO:root:Epoch[7] Batch [20] Speed: 10108.07 samples/sec accuracy=0.564779
INFO:root:Epoch[7] Batch [40] Speed: 8547.51 samples/sec accuracy=0.575293
INFO:root:Epoch[7] Train-accuracy=0.590820
INFO:root:Epoch[7] Time cost=5.365
INFO:root:Epoch[7] Validation-accuracy=0.591905
INFO:root:Epoch[8] Batch [20] Speed: 10000.07 samples/sec accuracy=0.597331
INFO:root:Epoch[8] Batch [40] Speed: 9367.33 samples/sec accuracy=0.601953
INFO:root:Epoch[8] Train-accuracy=0.617188
INFO:root:Epoch[8] Time cost=5.084
INFO:root:Epoch[8] Validation-accuracy=0.618164
INFO:root:Epoch[9] Batch [20] Speed: 9902.11 samples/sec accuracy=0.617141
INFO:root:Epoch[9] Batch [40] Speed: 10787.76 samples/sec accuracy=0.631250
INFO:root:Epoch[9] Train-accuracy=0.634888
INFO:root:Epoch[9] Time cost=4.801
INFO:root:Epoch[9] Validation-accuracy=0.664551
```

### number of GPU=8, precision=FP32

```bash
INFO:root:start with arguments Namespace(batch_size=1024, benchmark=0, data_nthreads=20, data_train='data/cifar10_train.rec', data_val='data/cifar10_val.rec', disp_batches=20, dtype='float32', gpus='0,1,2,3,4,5,6,7', image_shape='3,28,28', kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='200,250', max_random_aspect_ratio=0, max_random_h=36, max_random_l=50, max_random_rotate_angle=0, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet', num_classes=10, num_epochs=10, num_examples=50000, num_layers=110, optimizer='sgd', pad_size=4, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, wd=0.0001)
[08:04:39] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_train.rec, use 20 threads for decoding..
[08:04:40] src/io/iter_image_recordio_3.cc:143: ImageRecordIOParser3: data/cifar10_val.rec, use 20 threads for decoding..
[08:04:42] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[08:05:04] src/kvstore/././comm.h:327: only 32 out of 56 GPU pairs are enabled direct access. It may affect the performance. You can set MXNET_ENABLE_GPU_P2P=0 to turn it off
[08:05:04] src/kvstore/././comm.h:336: .vvvv...
[08:05:04] src/kvstore/././comm.h:336: v.vv.v..
[08:05:04] src/kvstore/././comm.h:336: vv.v..v.
[08:05:04] src/kvstore/././comm.h:336: vvv....v
[08:05:04] src/kvstore/././comm.h:336: v....vvv
[08:05:04] src/kvstore/././comm.h:336: .v..v.vv
[08:05:04] src/kvstore/././comm.h:336: ..v.vv.v
[08:05:04] src/kvstore/././comm.h:336: ...vvvv.
INFO:root:Epoch[0] Batch [20] Speed: 8792.64 samples/sec accuracy=0.166574
INFO:root:Epoch[0] Batch [40] Speed: 8230.37 samples/sec accuracy=0.235449
INFO:root:Epoch[0] Train-accuracy=0.281250
INFO:root:Epoch[0] Time cost=9.923
INFO:root:Epoch[0] Validation-accuracy=0.316797
INFO:root:Epoch[1] Batch [20] Speed: 8755.05 samples/sec accuracy=0.287853
INFO:root:Epoch[1] Batch [40] Speed: 8124.35 samples/sec accuracy=0.309033
INFO:root:Epoch[1] Train-accuracy=0.321533
INFO:root:Epoch[1] Time cost=6.121
INFO:root:Epoch[1] Validation-accuracy=0.371191
INFO:root:Epoch[2] Batch [20] Speed: 9372.24 samples/sec accuracy=0.341890
INFO:root:Epoch[2] Batch [40] Speed: 8558.61 samples/sec accuracy=0.366895
INFO:root:Epoch[2] Train-accuracy=0.384155
INFO:root:Epoch[2] Time cost=5.567
INFO:root:Epoch[2] Validation-accuracy=0.427734
INFO:root:Epoch[3] Batch [20] Speed: 9234.61 samples/sec accuracy=0.389462
INFO:root:Epoch[3] Batch [40] Speed: 8844.03 samples/sec accuracy=0.408984
INFO:root:Epoch[3] Train-accuracy=0.445801
INFO:root:Epoch[3] Time cost=5.774
INFO:root:Epoch[3] Validation-accuracy=0.471788
INFO:root:Epoch[4] Batch [20] Speed: 8650.64 samples/sec accuracy=0.450893
INFO:root:Epoch[4] Batch [40] Speed: 8271.18 samples/sec accuracy=0.472705
INFO:root:Epoch[4] Train-accuracy=0.481689
INFO:root:Epoch[4] Time cost=5.792
INFO:root:Epoch[4] Validation-accuracy=0.540332
INFO:root:Epoch[5] Batch [20] Speed: 10215.29 samples/sec accuracy=0.502744
INFO:root:Epoch[5] Batch [40] Speed: 9261.69 samples/sec accuracy=0.517627
INFO:root:Epoch[5] Train-accuracy=0.531110
INFO:root:Epoch[5] Time cost=5.148
INFO:root:Epoch[5] Validation-accuracy=0.554688
INFO:root:Epoch[6] Batch [20] Speed: 9040.84 samples/sec accuracy=0.538644
INFO:root:Epoch[6] Batch [40] Speed: 10600.00 samples/sec accuracy=0.549365
INFO:root:Epoch[6] Train-accuracy=0.572632
INFO:root:Epoch[6] Time cost=5.222
INFO:root:Epoch[6] Validation-accuracy=0.586523
INFO:root:Epoch[7] Batch [20] Speed: 10178.73 samples/sec accuracy=0.573382
INFO:root:Epoch[7] Batch [40] Speed: 8490.75 samples/sec accuracy=0.578857
INFO:root:Epoch[7] Train-accuracy=0.585815
INFO:root:Epoch[7] Time cost=5.508
INFO:root:Epoch[7] Validation-accuracy=0.616970
INFO:root:Epoch[8] Batch [20] Speed: 9280.62 samples/sec accuracy=0.597284
INFO:root:Epoch[8] Batch [40] Speed: 9454.47 samples/sec accuracy=0.603320
INFO:root:Epoch[8] Train-accuracy=0.615601
INFO:root:Epoch[8] Time cost=5.211
INFO:root:Epoch[8] Validation-accuracy=0.626270
INFO:root:Epoch[9] Batch [20] Speed: 8527.77 samples/sec accuracy=0.620908
INFO:root:Epoch[9] Batch [40] Speed: 9799.97 samples/sec accuracy=0.630762
INFO:root:Epoch[9] Train-accuracy=0.629395
INFO:root:Epoch[9] Time cost=5.348
INFO:root:Epoch[9] Validation-accuracy=0.617383
```

## monitor the GPU status while training

### 1X GPU, FP16

```bash
Wed Oct 4 21:03:44 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   57C    P0   142W / 300W |   1498MiB / 16152MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   49C    P0    44W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   47C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   47C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   48C    P0    44W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   47C    P0    43W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   48C    P0    44W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   47C    P0    47W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     34472      C   python                                      1488MiB |
+-----------------------------------------------------------------------------+
```

### 2X GPU, FP16

```bash
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   59C    P0   160W / 300W |   1492MiB / 16152MiB |     70%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   61C    P0   156W / 300W |   1494MiB / 16152MiB |     69%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   47C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   47C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   49C    P0    44W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   48C    P0    43W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   48C    P0    44W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   47C    P0    47W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     35486      C   python                                      1482MiB |
|    1     35486      C   python                                      1484MiB |
+-----------------------------------------------------------------------------+
```

### 1X GPU, FP32

```bash
Wed Oct 4 21:14:44 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   62C    P0   174W / 300W |   2172MiB / 16152MiB |     67%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   51C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   50C    P0    46W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   51C    P0    63W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   50C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   50C    P0    43W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   52C    P0    60W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   48C    P0    48W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     37137      C   python                                      2162MiB |
+-----------------------------------------------------------------------------+
```

### 2X GPU, FP32

```bash
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   63C    P0   168W / 300W |   2172MiB / 16152MiB |     65%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   63C    P0   182W / 300W |   2172MiB / 16152MiB |     65%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   48C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   48C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   49C    P0    45W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   48C    P0    43W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   49C    P0    44W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   48C    P0    47W / 300W |     10MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     37930      C   python                                      2162MiB |
|    1     37930      C   python                                      2162MiB |
+-----------------------------------------------------------------------------+
```
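To compare the runs numerically, the `Speed:` and `Validation-accuracy` entries in the training logs can be summarized per log file. The following is a minimal sketch, not part of the original benchmark: the function name `summarize_log` and its `skip_epochs` parameter are hypothetical, and it only assumes the log line format shown in the outputs above.

```python
import re
import sys
from statistics import mean

# Matches entries such as:
#   INFO:root:Epoch[1] Batch [20] Speed: 5831.55 samples/sec accuracy=0.321522
SPEED_RE = re.compile(
    r"Epoch\[(\d+)\]\s+Batch\s+\[\d+\]\s+Speed:\s+([\d.]+)\s+samples/sec"
)
# Matches entries such as:
#   INFO:root:Epoch[1] Validation-accuracy=0.391406
VAL_RE = re.compile(r"Epoch\[\d+\]\s+Validation-accuracy=([\d.]+)")


def summarize_log(text, skip_epochs=0):
    """Return mean throughput and last validation accuracy found in a log.

    skip_epochs (hypothetical knob) drops the first N epochs from the
    throughput average, e.g. to exclude warm-up effects.
    """
    speeds = [
        float(m.group(2))
        for m in SPEED_RE.finditer(text)
        if int(m.group(1)) >= skip_epochs
    ]
    val_accs = [float(m.group(1)) for m in VAL_RE.finditer(text)]
    return {
        "mean_speed": mean(speeds) if speeds else None,
        "final_val_acc": val_accs[-1] if val_accs else None,
    }


if __name__ == "__main__":
    # Usage: python summarize.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, summarize_log(f.read()))
```

With the benchmark script above, the log files to pass in would be the ones written under `/raid/`, for example `/raid/test_mxnet_nv_1709_numGPU4_float16.txt`.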