
Installing a CUDA/cuDNN environment on Linux

TL;DR: Don't chase the newest version; go for the most stable one.

I. Introduction

To install a tensorflow-gpu or pytorch environment on Linux, the GPU environment has to be set up first. The setup installed here is:

  • Main system: Ubuntu 16.04 Desktop
  • NVIDIA driver: 418 (with 430, X Window fails to start); to be revisited when updates come out
  • cuda 10.0 (10.1 cannot be installed on Ubuntu 16.04)
  • cudnn 7.6.3

II. Installing the NVIDIA driver

  • First, add the NVIDIA driver PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
  • If you get an error about a missing key, add NVIDIA's key first.
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  • Update the package sources
sudo apt-get update
  • Start the installation
sudo apt-get install nvidia-418
  • Check that the installation succeeded
$ nvidia-smi
Mon Jul 29 15:06:39 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 30%   41C    P0    58W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:41:00.0 Off |                  N/A |
| 36%   44C    P0     1W / 250W |      0MiB / 11011MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(base) (immust02)joshhu:4014/ $
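
If you would rather have a script confirm which driver is actually loaded, the banner line of the nvidia-smi output above carries the version. The following is a minimal sketch, not part of the original steps: the function names driver_version_from_banner and driver_major are made up for illustration, and the parsing assumes the banner keeps the "Driver Version: X.Y" layout shown above.

```shell
# Hypothetical helper: read nvidia-smi text on stdin and print the
# first "Driver Version: X.Y" value found (prints nothing if absent).
driver_version_from_banner() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: reduce a version such as 430.26 to its major part.
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi | driver_version_from_banner)
    echo "driver: $installed (major $(driver_major "$installed"))"
fi
```

Comparing the major number against the one you meant to install (418 in this note) is a quick way to notice that a different driver branch is the one running.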

III. Installing CUDA

  • Download
  • Remove the old cuda
  • Install the new version

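1. Download

Go to https://developer.nvidia.com/cuda-downloads to download cuda. There is no need to grab the newest release, since it may not work: on Ubuntu 16.04 we can only go up to CUDA 10.0, and anything newer is unusable. Don't download the deb; the runfile is all you need, as shown in the figure below:

Once it is downloaded, just run it, but the key point is: do not install the nvidia driver. The steps follow; substitute the runfile name for the version you picked.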

wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
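
The runfile name itself records two versions: the CUDA toolkit release and the driver build bundled with it, which is the driver the installer will offer and which you should decline. Below is a small sketch that splits the name, assuming the cuda_<toolkit>_<driver>_linux.run pattern seen in the file above; runfile_versions is a hypothetical helper name.

```shell
# Hypothetical helper: print "<toolkit version> <bundled driver version>"
# for a runfile path or URL named cuda_<toolkit>_<driver>_linux.run.
runfile_versions() {
    name=$(basename "$1")     # strip any directory or URL prefix
    name=${name#cuda_}        # e.g. 10.1.243_418.87.00_linux.run
    cuda=${name%%_*}          # text before the first underscore
    rest=${name#*_}           # text after the first underscore
    driver=${rest%%_*}        # text before the next underscore
    printf '%s %s\n' "$cuda" "$driver"
}

runfile_versions cuda_10.1.243_418.87.00_linux.run
```

For the file downloaded above this prints the toolkit version followed by the bundled driver version, which makes it easy to see whether the bundled driver differs from the one installed in section II.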

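2. Removing the old cuda

Before starting the new installation, remove the old one completely. The command depends on your version; here I remove 10.0 as the example: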

cd /usr/local/cuda/bin
sudo ./uninstall_cuda_10.0.pl
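
Because the uninstall script's name contains the installed version, it can be easier to look it up than to type it. This is a sketch assuming the script follows the uninstall_cuda_<version>.pl pattern used above and lives in the toolkit's bin directory; find_cuda_uninstaller is a hypothetical helper, and it only prints the path without running anything.

```shell
# Hypothetical helper: print the first uninstall_cuda_*.pl script found
# in the given directory (default /usr/local/cuda/bin).
# Returns non-zero when no such script exists.
find_cuda_uninstaller() {
    dir=${1:-/usr/local/cuda/bin}
    for f in "$dir"/uninstall_cuda_*.pl; do
        # An unmatched glob stays literal, so check the file really exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

if script=$(find_cuda_uninstaller); then
    echo "would run: sudo $script"
else
    echo "no uninstall_cuda_*.pl script found"
fi
```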

3. Installing the new version

Then do a fresh install:

sudo sh cuda_10.1.243_418.87.00_linux.run

Remember the key point: do not install its driver. Install everything else by following the prompts.
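
To skip the driver without going through the interactive menu, NVIDIA's runfile installers accept command-line options. The sketch below only prints the command it would run. It assumes your runfile supports the --silent and --toolkit options, which you can check with sh <runfile> --help before relying on it; toolkit_only_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: print (do not execute) a toolkit-only install
# command for the given runfile. Assumes the runfile supports
# --silent and --toolkit; verify with "sh <runfile> --help".
toolkit_only_cmd() {
    runfile=$1
    printf 'sudo sh %s --silent --toolkit\n' "$runfile"
}

toolkit_only_cmd cuda_10.1.243_418.87.00_linux.run
```

Printing the command first keeps this a dry run; copy the line into the terminal once the options are confirmed for your installer version.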

IV. Installing cuDNN

  • Download
  • Extract and copy
  • Configure ~/.bashrc or ~/.zshrc

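1. Download

cuDNN has to be downloaded from nvidia and requires a developer account. First log in at https://developer.nvidia.com/cudnn to download it. Don't forget to pick the correct version, as follows:

Just download the archive, not the deb, which is a hassle to uninstall, as shown in the figure below.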

2. Extracting and copying

Go to the download directory and confirm that the cudnn file is there (cudnn-9.0-linux-x64-v7.tgz in this example), then run the following:

sudo tar -xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
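
To confirm which cuDNN version ended up under /usr/local/cuda, you can read the version macros from the copied header. This is a sketch assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cudnn.h does in cuDNN 7; cudnn_version_from_header is a hypothetical helper.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cuDNN header.
# Assumes "#define CUDNN_MAJOR n" style lines (the cuDNN 7 layout).
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Only read the installed header if it is present on this machine.
if [ -r /usr/local/cuda/include/cudnn.h ]; then
    cudnn_version_from_header /usr/local/cuda/include/cudnn.h
fi
```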

3. Configuring ~/.bashrc or ~/.zshrc

Depending on which shell you use, add the following two lines at the end of that file:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda

Finally, run source ~/.bashrc (or source ~/.zshrc) to apply the settings.
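
If you rerun this setup, appending the two lines by hand each time leaves duplicates in the rc file. Here is a sketch of an idempotent append, assuming a grep that supports the standard -q, -x and -F options; append_once is a hypothetical helper, and the example calls are left commented out so that nothing touches your real ~/.bashrc until you choose to.

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present (whole-line, fixed-string match).
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (single quotes keep $LD_LIBRARY_PATH unexpanded):
# append_once 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"' "$HOME/.bashrc"
# append_once 'export CUDA_HOME=/usr/local/cuda' "$HOME/.bashrc"
```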

4. Checking the setup with nvcc -V

Run nvcc -V to see your setup:

(immust02)joshhu:~/ $ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
(immust02)joshhu:~/ $  
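
The release number in that output is the part worth checking against the version you meant to install (10.0 here). Below is a sketch that pulls it out of the nvcc -V text, assuming the "release X.Y," wording shown above; nvcc_release is a hypothetical helper.

```shell
# Hypothetical helper: read nvcc -V text on stdin and print the
# toolkit release (the X.Y after the word "release").
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Only call nvcc when it is on the PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit release: $(nvcc -V | nvcc_release)"
fi
```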

On Windows, zlib also has to be installed

Download zlib from nvidia, then add its directory to the Windows Path.
