Run PyTorch/TensorFlow with Docker: only the NVIDIA driver needs to be installed
===

To use a GPU on Linux you have to install many components: the NVIDIA driver, CUDA, cuDNN, Conda, Python, PyTorch, torchvision, and so on. A version mismatch in any one of them causes trouble, and on Ubuntu with a graphical desktop the driver often breaks so that X will not start at all.

To avoid this hassle, why not use Docker? Install only the NVIDIA driver on the host system and leave everything else to Docker.

# 1. Install the NVIDIA driver

* First add the NVIDIA PPA
```
sudo add-apt-repository ppa:graphics-drivers/ppa
```
* If you get an error about a missing key, add the NVIDIA key first.
```
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
```
* Update the package sources
```
sudo apt-get update
```
* Install the driver
```
sudo apt-get install nvidia-430
```
* Check that the installation succeeded
```
$ nvidia-smi
Mon Jul 29 15:06:39 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 30%   41C    P0    58W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:41:00.0 Off |                  N/A |
| 36%   44C    P0     1W / 250W |      0MiB / 11011MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

# 2. Install `docker community`

**Environment**
* Ubuntu 16.04
* NVIDIA GPU

```
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
```

**Check the `docker` version**
```
$ docker version
Client:
 Version:           18.09.6
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        481bc77
 Built:             Sat May 4 02:35:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.0
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       aeac949
  Built:            Wed Jul 17 18:14:42 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.5
  GitCommit:        bb71b10fd8f58240ca47fbb579b9d1028eea7c84
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        2b18fe1d885ee5083ef9f0838fee39b62d653e30
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
```

# 3. Install `nvidia-docker`

**Install `nvidia-docker` first**
```
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

**Check the `nvidia-docker` version**
```
$ nvidia-docker version
NVIDIA Docker: 2.0.3
Client:
 Version:           18.09.6
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        481bc77
 Built:             Sat May 4 02:35:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.0
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       aeac949
  Built:            Wed Jul 17 18:14:42 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.5
  GitCommit:        bb71b10fd8f58240ca47fbb579b9d1028eea7c84
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        2b18fe1d885ee5083ef9f0838fee39b62d653e30
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
```

# 4. Pull a CUDA 10.0 Docker image with all packages preinstalled

```
docker pull moeidb/aigo:cu10.0-dnn7.6-gpu-pytorch-cv-19.06
```
Check that the download succeeded:
```
$ docker images
REPOSITORY    TAG                               IMAGE ID       CREATED       SIZE
moeidb/aigo   cu10.0-dnn7.6-gpu-pytorch-19.06   492bce9e825f   3 weeks ago   17GB
nvidia/cuda   9.0-base
```
Check the Python version inside the container:
```
$ docker run --rm moeidb/aigo:cu10.0-dnn7.6-gpu-pytorch-19.06 python3 --version
Python 3.7.3
```

# 5. Create a script that starts Jupyter Lab

* Create a script named `startj.sh` with the following content
```
#!/bin/bash
# Host port that Jupyter Lab will listen on
host_port=9999
# Start the container and capture its container ID
container_id=$(nvidia-docker run --rm -d --ipc=host -p ${host_port}:8888 -v $PWD:/workspace moeidb/aigo:cu10.0-dnn7.6-gpu-pytorch-cv-19.06)
# Wait a moment for the service inside the container to start
sleep 2
# Extract the Jupyter Lab token from the container logs (first match only)
notebook_token=$(docker logs ${container_id} 2>&1 | grep -nP "(LabApp.*token=).*" | cut -d"=" -f 2 | head -n 1)
# Print the URL for connecting to the Jupyter Lab service
printf "Open a browser and connect to:\n http://127.0.0.1:${host_port}/?token=${notebook_token}\n"
```
* Make the script executable: `chmod +x startj.sh`
* Run `./startj.sh`; it prints a URL, which is the Jupyter Lab address
* After opening it, you should see something like this



# 6. Notes

* After startup, files in this directory end up owned by `root`; keep this in mind.
* Each start creates a fresh container, so anything installed from Jupyter Lab with `!pip install` must be reinstalled after every restart.
* Running those files from a regular conda environment on the host will cause problems until you change their ownership back to your user.

###### tags: `pytorch` `docker` `tensorflow`
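
The fixed two-second `sleep` in `startj.sh` may be too short when the container starts slowly, in which case the printed URL has an empty token. Below is a minimal sketch of polling the logs instead. The function names `extract_token` and `wait_for_token` and the 30-second default timeout are hypothetical and not part of the original script, and the sketch assumes the token is a lowercase hexadecimal string, as Jupyter normally generates.

```shell
#!/usr/bin/env bash

# Read log text on stdin and print the first Jupyter token found.
# Assumption: the token is a lowercase hexadecimal string after "token=".
extract_token() {
  grep -oE 'token=[0-9a-f]+' | head -n 1 | cut -d'=' -f2
}

# Poll `docker logs` once per second until a token appears.
# $1: container ID; $2: timeout in seconds (default 30, an assumed value).
# Prints the token and returns 0 on success, returns 1 on timeout.
wait_for_token() {
  local container_id="$1"
  local timeout="${2:-30}"
  local token=""
  local i
  for ((i = 0; i < timeout; i++)); do
    token=$(docker logs "$container_id" 2>&1 | extract_token || true)
    if [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
    sleep 1
  done
  return 1
}
```

In `startj.sh`, the `sleep` and `grep` lines could then be replaced by `notebook_token=$(wait_for_token "${container_id}")`.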