Single Node K8s for GPU Development Environment: Installation Notes
===

Install Ubuntu 18.04
---

### Ubuntu 18.04 boot disk: look it up on Google

### Install essential tools
```shell=
$ sudo apt update && sudo apt install -y vim openssh-server curl
$ sudo adduser eason
$ sudo usermod -aG sudo eason
# On the existing machine (10.31.50.79), copy the dotfiles to the new server:
$ scp .bashrc .vimrc .gitconfig eason@new_server:~
# Back on the new server:
$ source .bashrc
```

### (Optional) Mount hard drives
Look up information for each partition:
```shell=
$ sudo parted -l
```
![](https://i.imgur.com/IjDmcR2.png)

Look up the UUID of each partition:
```shell=
$ sudo blkid -s UUID
```
![](https://i.imgur.com/GDdw3zH.png)

Mount the drives by UUID in fstab:
```shell=
$ sudo vim /etc/fstab
```
![](https://i.imgur.com/VfmXi0U.png)

Remember to create the mount points with mkdir before mounting:
```shell=
$ sudo mkdir -p /media/sda1
$ sudo mkdir -p /media/sdb1
$ sudo mount -av
```

Check the result:
```shell=
$ df -hT
```
![](https://i.imgur.com/sdVxKXH.png)

:::warning
`fuseblk` means a Windows-formatted filesystem (e.g. NTFS). Do not put the Docker root folder on it.
:::

Install Docker CE
---
Reference: [Installing Docker CE on Ubuntu 18.04](https://www.akiicat.com/2019/05/04/Docker/install-docker-ce-on-ubuntu/)

### Docker-io vs. Docker-ce trivia
:::info
**docker-io** package is still the name used by Debian/Ubuntu for the docker release provided on their [official repos](https://salsa.debian.org/docker-team/docker/blob/master/debian/control).

**docker-ce** is a certified release provided directly by [docker.com](https://download.docker.com/linux/) and can also be built from source.

Main reason for using the name **docker-io** on Debian/Ubuntu platform was to avoid a name conflict with **docker system-tray** binary.
:::
Source: [stackoverflow link](https://stackoverflow.com/questions/45023363/what-is-docker-io-in-relation-to-docker-ce-and-docker-ee)

### Install prerequisites
```shell=
$ sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
```

### Import Docker CE apt repository & install
```shell=
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) stable"
$ sudo apt-get update
$ sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```

### Edit daemon.json
```shell=
$ sudo vim /etc/docker/daemon.json
```
Depending on your needs, put some or all of the following keys into the JSON file:

- `data-root`: change the Docker data folder
- `insecure-registries`: add the address of a private Docker registry

```json=
{
  "data-root": "/data/docker",
  "insecure-registries": ["10.31.50.79:5000"]
}
```
Reload the daemon and restart Docker:
```shell=
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
```

### Add current user to docker group
```shell=
$ sudo usermod -aG docker $USER
```
Re-login & test:
```shell=
# Press Ctrl + D to log out, then log in again
$ docker images
```

Enable GPU
---
:::info
For details, please refer to **[Sally's note ](https://hackmd.io/N_smVJz0SdaBinA6ruITeQ)**
:::

### NVIDIA Driver
Follow [this link](https://www.nvidia.com/Download/index.aspx?lang=en-us) to download the driver that matches your system.

#### Install prerequisites
```shell=
$ sudo apt-get install dkms build-essential linux-headers-generic
```

#### Install NVIDIA Driver
```shell=
$ sudo bash NVIDIA-Linux-x86_64-440.82.run
```
Continue the installation even if the pre-install script fails:
![](https://i.imgur.com/WUSO76Y.png)

#### Nouveau kernel driver issue (Option 1.)
If you encounter this problem, the installer will offer to disable Nouveau for you. Hit `OK` on the screen below, then reboot.
![](https://i.imgur.com/NDbk7xI.png)
![](https://i.imgur.com/me8uBeE.png)
```shell=
$ sudo reboot
```

##### Test
```shell=
$ nvidia-smi
```
![](https://i.imgur.com/rpiOhy5.png)

#### Nouveau kernel driver issue (Option 2.)
If the above doesn't work, please follow the instructions in this post:
[How to disable the nouveau kernel driver to fix a failing NVIDIA driver installation](http://abay-note.blogspot.com/2018/10/nouveau-kernel-drivernvidia-driver.html)

### CUDA Toolkit
Follow [this link](https://developer.nvidia.com/cuda-downloads?target_os=Linux) to download the CUDA Toolkit.
:::warning
Make sure to download the **runfile (local)** version and **skip the Driver** component during installation; otherwise it will conflict with the driver installed in the previous step.
:::
![](https://i.imgur.com/A5ng4q1.png)

#### Append cuda library to PATH and LD_LIBRARY_PATH
```shell=
$ vim .bashrc
...
export PATH=$PATH:/usr/local/cuda-10.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64
...
```

Enable GPU support on Docker
---
Since Docker 19.03, NVIDIA GPUs are natively supported by the Docker runtime: add the `--gpus` argument to `docker run`.
(source: [NVIDIA/nvidia-docker github](https://github.com/NVIDIA/nvidia-docker))

#### Install **nvidia-container-toolkit**
```shell=
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
```

#### Usage
```shell=
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
```

Install k8s
---
We use **minikube** because it has the best GPU support and the most resources available on the web. minikube is for single-node deployment only, **for DEV only**.
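Before installing the Kubernetes pieces, it can help to confirm that the tools from the previous sections are actually on `PATH`. The following is only a minimal sketch: `missing_tools` is a hypothetical helper name introduced here for illustration, not part of Docker or the NVIDIA packages.

```shell
# Print the name of every command passed as an argument that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of any package installed above.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools docker nvidia-smi` prints nothing when both Docker and the NVIDIA driver utilities are installed, and prints the name of whichever one is missing otherwise.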
### Prerequisites
- [NVIDIA Docker](#Install-NVIDIA-Docker)
- [kubectl](#Install-kubectl)
- [conntrack](#Install-conntrack)

#### Install NVIDIA-Docker
:::danger
For k8s to use your GPU, you must still install **nvidia-docker2** and **nvidia-device-plugin** first. If you don't need GPU in k8s, you can skip this step.
:::
```shell=
$ sudo apt-get install -y nvidia-docker2
```
This will overwrite your daemon.json. Enter 'Y'; we will fix it afterwards.
![](https://i.imgur.com/C2wzYeo.png)

After installation, redo the [Edit daemon.json](https://hackmd.io/TLplAAmtSpiN66dA1zIXDA?both#%E7%B7%A8%E8%BC%AFdaemonjson) section.

The final **daemon.json** will look like this:
```json=
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "data-root": "/data/docker",
    "insecure-registries": ["10.31.50.79:5000"]
}
```

#### Install kubectl
source: [Install kubectl using native package management](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-using-native-package-management)
```shell=
$ sudo apt-get update && sudo apt-get install -y apt-transport-https gnupg2
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
$ sudo apt-get update && sudo apt-get install -y kubectl
```

##### (optional) kubectl auto complete
```shell=
$ echo "source <(kubectl completion bash)" >> ~/.bashrc
$ source ~/.bashrc
```

#### Install conntrack
```shell=
$ sudo apt-get install conntrack
```

### Install minikube
minikube is a local Kubernetes, focused on making it easy to learn and develop for Kubernetes.
source: [minikube Debian package](https://minikube.sigs.k8s.io/docs/start/#debian-package)
```shell=
$ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_1.9.1-0_amd64.deb
$ sudo dpkg -i minikube_1.9.1-0_amd64.deb
```

### Start k8s cluster with GPU support
source: [Using the ‘none’ driver](https://minikube.sigs.k8s.io/docs/tutorials/nvidia_gpu/#using-the-none-driver)

This downloads the required Docker images and starts a k8s cluster directly on your Docker runtime.
```shell=
$ sudo minikube start --driver=none --apiserver-ips 127.0.0.1 --apiserver-name localhost
```
Grant permission to the k8s group:
```shell=
$ sudo chmod -R 775 /etc/kubernetes
```
Create the swpc namespace:
```shell=
$ kubectl create namespace swpc
```
Set the kubectl default namespace to swpc:
```shell=
$ kubectl config set-context --current --namespace=swpc
```

#### Install NVIDIA-Device-plugin
source: [NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin)
```shell=
$ sudo kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
```

#### Test
Edit **cuda_test.yaml**:
```yaml=
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:10.0-base
      command: ["/bin/bash"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
```
Create the pod:
```shell=
$ sudo kubectl create -f cuda_test.yaml
```
Check the status:
```shell=
$ sudo kubectl describe pod/gpu-pod
```
![](https://i.imgur.com/FmC1ynH.png)
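
If the pod stays in `Pending`, one thing worth checking is whether the device plugin has registered the GPU with the node at all. The sketch below reads the node's allocatable `nvidia.com/gpu` resource. It is not part of the original notes: `sum_gpus` and `allocatable_gpus` are hypothetical helper names, and the jsonpath expression assumes your kubectl version accepts the backslash-escaped dot in the resource name.

```shell
# Sum the whitespace-separated integers read from stdin; other tokens are ignored.
# Prints 0 when the input is empty (i.e. no node reports the resource).
sum_gpus() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) total += $i } END { print total + 0 }'
}

# Ask the cluster how many GPUs are allocatable across all nodes.
# Assumption: the escaped form nvidia\.com/gpu is accepted by kubectl's jsonpath.
allocatable_gpus() {
  kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}' | sum_gpus
}
```

After sourcing this, `allocatable_gpus` should print a number greater than 0 once the device plugin pod is running. Prefix `kubectl` with `sudo` if your kubeconfig is only readable by root, as in the commands above.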