[](https://hackmd.io/@Chieh)

# TWCC [2]

## Setting up a GPU environment on a TWCC virtual compute instance to run deep-learning services

Deep learning depends on the GPU being usable inside the environment. I consider installing the GPU driver the most important step, and it is also the step in the whole process most likely to go wrong, so I have written down the full procedure in detail.

Outline:

1. Create an Ubuntu virtual compute instance with a GPU.
2. Install the GPU driver.
3. Install Docker and confirm that it can use the GPU.

---

### Install the GPU driver first

First disable the `nouveau` driver, following this post: https://hackmd.io/@Chieh/B1OP54uZq

Then restart the instance from the TWCC page.

Install the dependencies:

```
sudo apt-get update
sudo apt-get install build-essential gcc-multilib dkms
sudo apt-get install linux-source
```

Get the running kernel release (the `generic` version):

```
uname -r
```

In my case this reported a `5.4.0`-series generic kernel. The kernel headers package must match that release exactly, so substitute the output of `uname -r` into the package name:

```
sudo apt-get install linux-headers-$(uname -r)
```

Download the driver from the [official website](https://www.nvidia.com/Download/driverResults.aspx/185202/en-us). The GPU used here is a T4.

Start installing the GPU driver:

```
sudo sh NVIDIA-Linux-x86_64-470.103.01.run
```

The installer asks a few questions along the way:

1. About DKMS: choose No

```
Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.
```

2. About the 32-bit libraries: choose No

```
Nvidia's 32-bit compatibility libraries?
```

Once the installer finishes, load the NVIDIA kernel module:

```
sudo modprobe nvidia
```

Check with `nvidia-smi`:

```
ubuntu@vm1652402369198:~$ nvidia-smi
Fri May 13 09:20:09 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   41C    P0    21W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

---

### Install Docker

Install Docker by following the usual procedure from the official site.

```
sudo apt install -y apt-transport-https curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# The repository line must be quoted so it is passed as a single argument.
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
# Group membership takes effect at the next login.
sudo usermod -aG docker $USER
# Makes the socket world-writable: convenient on a single-user VM, unsafe on a shared host.
sudo chmod 777 /var/run/docker.sock
```

### Install the NVIDIA container toolkit (required to use the GPU inside containers)

```
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

If the installation fails with package errors, try this command first to clear them:

```
sudo apt --fix-broken install
```

With the installation done, run a quick test.

```
$ docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   34C    P0    16W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

The GPU is now usable from inside the container. From here you can pull Docker images on TWCC yourself and run any deep-learning workload that needs a GPU environment.

## Reference

- https://www.cnblogs.com/pprp/p/9430836.html
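
## Appendix: checking the driver state

The driver steps above depend on two conditions: `nouveau` must no longer be loaded, and the `nvidia` module must be loaded after `modprobe`. The following is a minimal sketch of how that could be checked from `lsmod` output. It is not part of the original procedure. The helper names `module_listed` and `check_driver_state` are hypothetical, and the sketch assumes only that `lsmod` prints the module name in the first column.

```shell
# Hypothetical helper: succeed if module "$1" appears as the first column
# of lsmod-style text read from stdin.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Hypothetical helper: read lsmod-style text from stdin and report whether
# the modules are in the state the driver installation expects.
check_driver_state() {
    listing=$(cat)
    status=0
    if printf '%s\n' "$listing" | module_listed nouveau; then
        echo "nouveau is still loaded"
        status=1
    fi
    if ! printf '%s\n' "$listing" | module_listed nvidia; then
        echo "nvidia is not loaded"
        status=1
    fi
    return "$status"
}
```

On the instance it would be run as `lsmod | check_driver_state`. An exit status of 0 with no output means both conditions hold. Reading from stdin rather than calling `lsmod` inside the function keeps the check easy to try against saved output.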