# Tesla T4 Benchmark Test

###### tags: `aetherAI`

1. Install the NVIDIA Tesla driver (for Ubuntu 16.04).

```bash
# Download the driver
wget -O nv_driver.deb http://tw.download.nvidia.com/tesla/418.67/nvidia-diag-driver-local-repo-ubuntu1604-418.67_1.0-1_amd64.deb &&
# Install the driver
sudo dpkg -i nv_driver.deb &&
sudo apt-get update &&
sudo apt-get install cuda-drivers &&
# Reboot
sudo reboot
```

After the installation finishes and the machine has rebooted, run `nvidia-smi` in a terminal; it should show the GPU status.

2. Follow the [NVIDIA Docker installation guide](https://github.com/NVIDIA/nvidia-docker#quickstart) to install Docker and NVIDIA-Docker.

3. Run the benchmark:

```bash
# Pull the Docker image used to run the benchmark
nvidia-docker pull aetherai/research:cu10.0-dnn7.6-gpu-tf-cv-19.07

# Run the benchmark example
NV_GPU=0 nvidia-docker run -it --rm \
    aetherai/research:cu10.0-dnn7.6-gpu-tf-cv-19.07 \
    python3 /workspace/tf_benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
        --num_gpus=1 \
        --model=resnet50 \
        --batch_size=256 \
        --nodistortions \
        --use_fp16 \
        --num_epochs=1
```

The test prints output similar to the following:

```
...
...
...
Running warm up
2019-07-19 04:09:58.783779: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-19 04:09:59.197938: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
Done warm up
Step    Img/sec total_loss
1       images/sec: 651.2 +/- 0.0 (jitter = 0.0)        7.808
10      images/sec: 652.3 +/- 0.4 (jitter = 1.2)        7.883
20      images/sec: 652.1 +/- 0.3 (jitter = 1.2)        8.018
30      images/sec: 651.6 +/- 0.3 (jitter = 1.6)        7.836
40      images/sec: 651.3 +/- 0.2 (jitter = 1.4)        7.782
50      images/sec: 651.3 +/- 0.2 (jitter = 1.2)        7.864
60      images/sec: 651.1 +/- 0.2 (jitter = 1.3)        7.880
70      images/sec: 651.0 +/- 0.2 (jitter = 1.3)        7.842
80      images/sec: 650.9 +/- 0.2 (jitter = 1.5)        7.843
90      images/sec: 650.9 +/- 0.1 (jitter = 1.4)        7.850
100     images/sec: 650.8 +/- 0.1 (jitter = 1.4)        7.751
110     images/sec: 650.6 +/- 0.1 (jitter = 1.5)        7.911
120     images/sec: 650.2 +/- 0.2 (jitter = 1.6)        7.803
130     images/sec: 650.0 +/- 0.2 (jitter = 1.7)        7.811
140     images/sec: 650.0 +/- 0.2 (jitter = 1.5)        7.764
150     images/sec: 649.8 +/- 0.2 (jitter = 1.7)        7.792
160     images/sec: 649.6 +/- 0.2 (jitter = 1.9)        7.655
170     images/sec: 649.4 +/- 0.2 (jitter = 2.1)        7.672
180     images/sec: 649.2 +/- 0.2 (jitter = 2.2)        7.656
190     images/sec: 649.1 +/- 0.2 (jitter = 2.3)        7.699
200     images/sec: 649.0 +/- 0.2 (jitter = 2.3)
...
...
...
```

4. While the benchmark is running, open another terminal and monitor the card temperature:

```bash
chweng@HonghuA:~$ watch -n 1 nvidia-smi
Every 1.0s: nvidia-smi                                  Fri Jul 19 12:13:57 2019

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           On   | 00000000:04:00.0 Off |                  N/A |
| 65%   86C    P2   257W / 280W |  23347MiB / 24190MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
```

In this sample the card (a Titan RTX) has reached 86°C. We recommend keeping the Tesla T4 at or below roughly 80°C so that it does not throttle from overheating.
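Watching the full `nvidia-smi` table by eye works, but the 80°C guideline can also be checked automatically. The following is a minimal sketch, not part of the original procedure: the function names (`read_gpu_temp`, `check_temp`, `monitor_gpu`) and the `TEMP_LIMIT` variable are hypothetical, and it assumes an `nvidia-smi` build that supports `--query-gpu`.

```bash
#!/usr/bin/env bash
# Illustrative helpers only: all names below are assumptions, not from the original note.

# Temperature limit in degrees Celsius (the note suggests about 80 for the Tesla T4).
TEMP_LIMIT=${TEMP_LIMIT:-80}

# Print the current temperature (a bare integer, in °C) of the GPU with the given index.
read_gpu_temp() {
    nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i "$1"
}

# Classify a reading against a limit: prints "HOT" if strictly above it, else "OK".
check_temp() {
    local temp=$1
    local limit=${2:-$TEMP_LIMIT}
    if [ "$temp" -gt "$limit" ]; then
        echo "HOT"
    else
        echo "OK"
    fi
}

# Poll the given GPU once per second until interrupted with Ctrl-C.
monitor_gpu() {
    local gpu=${1:-0}
    local temp
    while true; do
        temp=$(read_gpu_temp "$gpu") || return 1
        echo "$(date '+%H:%M:%S') GPU ${gpu}: ${temp}C $(check_temp "$temp")"
        sleep 1
    done
}
```

After sourcing the file, running `monitor_gpu 0` in the second terminal prints one line per second. A reading such as the 86°C in the sample above would be tagged `HOT` under the default limit.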