# Installing a Deep Learning Environment on Ubuntu 20.04 (Nvidia Driver, CUDA 10 + cuDNN 7.6.5)

###### tags: `Nvidia Driver` `Ubuntu` `CUDA` `CuDNN` `Machine Learning` `Deep Learning`

## System hardware and software

* Operating system: Ubuntu 20.04 Server
* GPU: Nvidia Tesla K80 x 2 (each K80 board carries two GPUs, so `nvidia-smi` lists four devices)

## Installation targets

* Nvidia K80 GPU driver
* CUDA 10
* cuDNN 7.6.5

## Installing the Nvidia driver

First add the repository and update the package index.

```
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install ubuntu-drivers-common
```

Then run `ubuntu-drivers devices` to list the driver versions currently available.

```
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:10.0/0000:0e:00.0 ==
modalias : pci:v000010DEd0000102Dsv000010DEsd0000106Cbc03sc02i00
vendor   : NVIDIA Corporation
model    : GK210GL [Tesla K80]
driver   : nvidia-driver-410 - third-party free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-455 - third-party free recommended
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-440-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
```

Install the 418 server driver, which satisfies the `>= 410.48` requirement for CUDA 10.0 listed in the table below, and reboot once the installation finishes.

```
sudo apt install nvidia-driver-418-server
sudo reboot
```

* Nvidia driver and CUDA version compatibility

| CUDA Toolkit          | Linux x86_64 Driver Version |
| --------------------- | --------------------------- |
| CUDA 11.1 (11.1.0)    | >= 450.80.02                |
| CUDA 11.0 (11.0.3)    | >= 450.36.06                |
| CUDA 10.2 (10.2.89)   | >= 440.33                   |
| CUDA 10.1 (10.1.105)  | >= 418.39                   |
| CUDA 10.0 (10.0.130)  | >= 410.48                   |
| CUDA 9.2 (9.2.88)     | >= 396.26                   |
| CUDA 9.1 (9.1.85)     | >= 390.46                   |
| CUDA 9.0 (9.0.76)     | >= 384.81                   |
| CUDA 8.0 (8.0.61 GA2) | >= 375.26                   |
| CUDA 8.0 (8.0.44)     | >= 367.48                   |
| CUDA 7.5 (7.5.16)     | >= 352.31                   |
| CUDA 7.0 (7.0.28)     | >= 346.46                   |

Source: [Nvidia driver and CUDA version compatibility table](https://docs.nvidia.com/deploy/cuda-compatibility/index.html)

* After rebooting, run `nvidia-smi` to confirm that the driver installed successfully and that every GPU is detected.

```
$ nvidia-smi
Tue Dec  8 10:39:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.152.00   Driver Version: 418.152.00   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:0D:00.0 Off |                    0 |
| N/A   65C    P0    56W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:0E:00.0 Off |                    0 |
| N/A   46C    P0    73W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:86:00.0 Off |                    0 |
| N/A   61C    P0    58W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:87:00.0 Off |                    0 |
| N/A   45C    P0    72W / 149W |      0MiB / 11441MiB |     42%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

## Installing CUDA (runfile downloaded from the official site, manual install)

Download the CUDA installer from the [Nvidia CUDA 10 download page](https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal), or fetch it directly on the Ubuntu machine. The `-O` option saves the file under the `.run` name that the following commands expect.

```
wget -O cuda_10.0.130_410.48_linux.run https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
```

Next, make the downloaded run file executable and launch the installer.

```
chmod +x cuda_10.0.130_410.48_linux.run
sudo ./cuda_10.0.130_410.48_linux.run
```
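Before launching the installer, it can be worth confirming that the driver already on the machine is new enough for CUDA 10.0 (`>= 410.48` according to the compatibility table above). The following is a minimal sketch rather than part of the original procedure: `version_ge` and `check_driver` are hypothetical helper names, and the comparison assumes GNU `sort -V` is available, as it is on a stock Ubuntu system.

```shell
#!/usr/bin/env bash
# Minimum driver version for CUDA 10.0, taken from the compatibility table.
MIN_DRIVER="410.48"

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumption: GNU coreutils "sort -V" (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_driver VERSION: hypothetical helper that reports whether VERSION
# meets the minimum required by CUDA 10.0.
check_driver() {
    local installed="$1"
    if version_ge "$installed" "$MIN_DRIVER"; then
        echo "OK: driver $installed >= $MIN_DRIVER"
    else
        echo "TOO OLD: driver $installed < $MIN_DRIVER"
        return 1
    fi
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    check_driver "$installed"
fi
```

With the 418.152.00 driver shown in the `nvidia-smi` output earlier, the check passes, which is consistent with skipping the bundled driver during the CUDA installation.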
During installation, select every component except the Nvidia GPU driver, which is already installed.

* gcc version problems

If the installer refuses to run because the system gcc is too new, consult the [CUDA/gcc version pairings](https://www.tensorflow.org/install/source?hl=zh-tw) and install a matching gcc version.

```
# Install gcc and g++
sudo apt install gcc-7 g++-7
# Create symbolic links (gcc points to gcc-7, g++ points to g++-7)
sudo ln -s /usr/bin/gcc-7 /usr/local/bin/gcc
sudo ln -s /usr/bin/g++-7 /usr/local/bin/g++
# If the links cannot be created because they already exist from an earlier attempt,
# remove them first and then re-run the two ln -s commands above
sudo rm /usr/local/bin/gcc
sudo rm /usr/local/bin/g++
```

## Checks after installing CUDA

### Symbolic link

* Confirm that the CUDA symbolic link was created (the default install path is `/usr/local/cuda`).

```
$ ls /usr/local/cuda -l
lrwxrwxrwx 1 root root 20 Dec  7 16:23 /usr/local/cuda -> /usr/local/cuda-10.0
```

If the link was not created, create it yourself.

```
sudo ln -s /usr/local/cuda-[installed version number] /usr/local/cuda
```

### Environment variables

If the `nvcc` command is not found, the newly installed CUDA directory has not been added to the system environment variables, so add it manually.

```
# Open the .bashrc file in the home directory with the nano editor.
nano ~/.bashrc
```

Append the following at the bottom of the file.

```
# Set environment variables to NVIDIA CUDA path.
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Then reload `.bashrc`.

```
source ~/.bashrc
```

## Installing cuDNN

Download the matching version from the [Nvidia cuDNN archive](https://developer.nvidia.com/rdp/cudnn-archive) **(an Nvidia Developer account is required to reach the download page)**. Here, choose the `cuDNN Library for Linux` package under `Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.0`.

Next, extract the cuDNN archive.

```
tar -xvf cudnn-10.0-linux-x64-v7.6.5.32.tgz
```

This produces the following files.

```
$ ls -l cuda
total 48
drwxrwxr-x 2 sjis sjis  4096 Dec  7 16:27 include
drwxrwxr-x 2 sjis sjis  4096 Dec  7 16:27 lib64
-r--r--r-- 1 sjis sjis 38963 Mar 16  2019 NVIDIA_SLA_cuDNN_Support.txt
```

Copy the library into the CUDA installation directory.

```
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/lib* /usr/local/cuda/lib64/
```

Change to the `/usr/local/cuda/lib64/` directory.

```
cd /usr/local/cuda/lib64/
```

Create the symbolic links (replace the version number with the one you actually installed; for the archive above it is 7.6.5).

```
sudo chmod +r libcudnn.so.7.6.5
sudo ln -sf libcudnn.so.7.6.5 libcudnn.so.7
sudo ln -sf libcudnn.so.7 libcudnn.so
sudo ldconfig
```

## Troubleshooting

* If a desktop GUI appears after installing the Nvidia driver, the following commands switch it on or off.

### While the system is running

* Switch to **text mode (CLI)**
`sudo systemctl isolate multi-user.target`
* Switch to **desktop mode (GUI)**
`sudo systemctl isolate graphical.target`

### Effective from the next boot

* Switch to **text mode (CLI)**
`sudo systemctl set-default multi-user.target`
* Switch to **desktop mode (GUI)**
`sudo systemctl set-default graphical.target`

## References

### Nvidia driver, CUDA, cuDNN

* https://maniac-tw.medium.com/ubuntu-18-04-%E5%AE%89%E8%A3%9D-nvidia-driver-418-cuda-10-tensorflow-1-13-a4f1c71dd8e5
* https://gitpress.io/@chchang/install-nvidia-driver-cuda-pgstrom-in-ubuntu-1804
* https://zhuanlan.zhihu.com/p/72298520
* https://medium.com/@zihansyu/ubuntu-16-04-%E5%AE%89%E8%A3%9Dcuda-10-0-cudnn-7-3-8254cb642e70

### gcc

* https://blog.csdn.net/qq_20091945/article/details/80385079

### Switching between CLI and GUI

* https://askubuntu.com/questions/1242965/how-to-disable-gui-in-ubuntu