Single-Node K8s for a GPU Development Environment: Installation Notes
===
Install Ubuntu 18.04
---
### Ubuntu 18.04 boot disk
Look up online how to create one.
### Install essential tools
```shell=
$ sudo apt update && sudo apt install vim openssh-server curl
$ sudo adduser eason
$ sudo usermod -aG sudo eason
(@10.31.50.79) $ scp .bashrc .vimrc .gitconfig eason@new_server:~/
$ source .bashrc
```
### (Optional) Mount hard drives
List information for each partition
```shell=
$ sudo parted -l
```

Look up the UUID of each partition
```shell=
$ sudo blkid -s UUID
```

Mount the drives by UUID in fstab
```shell=
$ sudo vim /etc/fstab
```
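An fstab entry has six whitespace-separated fields: device, mount point, file system type, mount options, dump flag, and fsck order. As a minimal sketch, the hypothetical helper below (not part of the original notes) prints one such line for a given UUID. The `defaults 0 2` values are an assumed, common choice for a data disk, so adjust them to your needs.

```shell
# Hypothetical helper: print an fstab entry that mounts a partition by UUID.
# Usage: fstab_line <uuid> <mount-point> <fs-type>
fstab_line() {
    # Fields: device, mount point, fs type, options, dump flag, fsck order
    printf 'UUID=%s %s %s defaults 0 2\n' "$1" "$2" "$3"
}
```

For example, `fstab_line 1234-ABCD /media/sda1 ext4` prints a line that can be pasted into `/etc/fstab`, with the placeholder UUID replaced by the one reported by `blkid`.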

Remember to create the mount points with mkdir before mounting:
```shell=
$ sudo mkdir -p /media/sda1
$ sudo mkdir -p /media/sdb1
$ sudo mount -av
```
Check the result
```shell=
$ df -hT
```

:::warning
`fuseblk` indicates a Windows file system (e.g. NTFS). Do not place the Docker root folder on such a partition.
:::
Install Docker CE
---
Reference: [Installing Docker CE on Ubuntu 18.04](https://www.akiicat.com/2019/05/04/Docker/install-docker-ce-on-ubuntu/)
### Docker-io vs. Docker-ce trivia
:::info
**docker-io** package is still the name used by Debian/Ubuntu for the docker release provided on their [official repos](https://salsa.debian.org/docker-team/docker/blob/master/debian/control).
**docker-ce** is a certified release provided directly by [docker.com](https://download.docker.com/linux/) and can also be built from source.
Main reason for using the name **docker-io** on Debian/Ubuntu platform was to avoid a name conflict with **docker system-tray** binary.
:::
Source: [stackoverflow link](https://stackoverflow.com/questions/45023363/what-is-docker-io-in-relation-to-docker-ce-and-docker-ee)
### Install prerequisites
```shell=
$ sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
```
### Import Docker CE apt repository & install
```shell=
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable"
$ sudo apt-get update
$ sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```
### Edit daemon.json
```shell=
$ sudo vim /etc/docker/daemon.json
```
Add the following entries to the JSON as needed. `data-root` changes the Docker data folder, and `insecure-registries` adds the address of a private Docker hub:
```json=
{
  "data-root": "/data/docker",
  "insecure-registries": ["10.31.50.79:5000"]
}
```
Reload daemon and restart docker
```shell=
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
```
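Docker will not start if `daemon.json` is not valid JSON (for example, a missing comma or a leftover comment). Below is a minimal sketch of a check to run before restarting, assuming `python3` is on the PATH, as it normally is on Ubuntu 18.04. The function name is hypothetical.

```shell
# Hypothetical helper: succeed only if the given file parses as JSON.
# Defaults to Docker's config path; assumes python3 is on PATH.
check_daemon_json() {
    python3 -m json.tool "${1:-/etc/docker/daemon.json}" > /dev/null 2>&1
}
```

With this in place, `check_daemon_json && sudo systemctl restart docker` restarts Docker only when the file parses.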
### Add current user to docker group
```shell=
$ sudo usermod -aG docker $USER
```
Re-login & test
```shell=
$ exit    # or press Ctrl + D, then log in again
$ docker images
```
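Group membership is only picked up at login, which is why logging in again is needed. Here is a small sketch for confirming that the current session has the `docker` group before running `docker` without `sudo`. The helper name is an assumption for illustration.

```shell
# Hypothetical helper: succeed if the current session belongs to the given group.
in_group() {
    # id -nG prints the session's group names separated by spaces
    id -nG | tr ' ' '\n' | grep -qx "$1"
}
```

For instance, `in_group docker && docker images` runs the test command only when the group is active.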
Enable GPU
---
:::info
For details, please refer to **[Sally's note](https://hackmd.io/N_smVJz0SdaBinA6ruITeQ)**
:::
### NVIDIA Driver
Follow [this link](https://www.nvidia.com/Download/index.aspx?lang=en-us) to download the driver that matches your system
#### Install prerequisites
```shell=
$ sudo apt-get install dkms build-essential linux-headers-generic
```
#### Install NVIDIA Driver
```shell=
$ sudo bash NVIDIA-Linux-x86_64-440.82.run
```
If the pre-install script fails, continue the installation anyway.

#### Nouveau kernel driver issue (Option 1.)
If you encounter this problem, the installer will offer to disable Nouveau; select `OK` when prompted. You need to reboot afterwards.


```shell=
$ sudo reboot
```
##### Test
```shell=
$ nvidia-smi
```

#### Nouveau kernel driver issue (Option 2.)
If the above doesn't work, follow the instructions below:
[How to disable the nouveau kernel driver to fix NVIDIA driver installation failures](http://abay-note.blogspot.com/2018/10/nouveau-kernel-drivernvidia-driver.html)
### CUDA Toolkit
Follow [this link](https://developer.nvidia.com/cuda-downloads?target_os=Linux) to download the CUDA Toolkit
:::warning
Make sure to download the **runfile (local)** version and **skip the Driver** component during installation; otherwise it will conflict with the driver installed in the previous step.
:::

#### Append cuda library to PATH and LD_LIBRARY_PATH
```shell=
$ vim .bashrc
...
export PATH=$PATH:/usr/local/cuda-10.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64
...
```
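Because `.bashrc` is read again by every new shell (and by `source .bashrc`), plain appends can add the same directory repeatedly. Here is a hedged sketch of an idempotent variant. `path_append` is a hypothetical helper, and the CUDA 10.2 paths are the ones used above.

```shell
# Hypothetical helper: append a directory to a colon-separated variable
# (e.g. PATH) only if it is not already listed.
path_append() {
    _pa_var=$1
    _pa_dir=$2
    eval "_pa_cur=\${$_pa_var:-}"          # read the variable's current value
    case ":$_pa_cur:" in
        *":$_pa_dir:"*) ;;                 # already present, nothing to do
        *) eval "export $_pa_var=\"\${_pa_cur:+\$_pa_cur:}\$_pa_dir\"" ;;
    esac
}

path_append PATH /usr/local/cuda-10.2/bin
path_append LD_LIBRARY_PATH /usr/local/cuda-10.2/lib64
```

This also avoids a leading `:` in `LD_LIBRARY_PATH` when the variable starts out empty.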
Enable GPU support on Docker
---
Since Docker 19.03, NVIDIA GPUs are natively supported by the Docker runtime through the `--gpus` argument of `docker run`. (source: [NVIDIA/nvidia-docker github](https://github.com/NVIDIA/nvidia-docker))
#### Install **nvidia-container-toolkit**
```shell=
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
```
#### Usage
```shell=
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
```
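Since `--gpus` requires Docker 19.03 or later, a quick version check can save some debugging. The sketch below compares version strings with GNU `sort -V`. Both function names are hypothetical, and `docker_supports_gpus` assumes the Docker daemon is running.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort), which Ubuntu provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical wrapper: compare the running Docker daemon against 19.03.
docker_supports_gpus() {
    version_ge "$(docker version --format '{{.Server.Version}}')" 19.03
}
```

A possible use is `docker_supports_gpus && docker run --gpus all nvidia/cuda:10.0-base nvidia-smi`.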
Install k8s
---
We use **minikube** because it has the best GPU support and the most resources available online. minikube is intended for single-node deployment, **for DEV only**.
### Prerequisites
- [NVIDIA Docker](#Install-NVIDIA-Docker)
- [kubectl](#Install-kubectl)
- [conntrack](#Install-conntrack)
#### Install NVIDIA-Docker
:::danger
For k8s to use your GPU, you still must install **nvidia-docker2** and **nvidia-device-plugin** first. If you don't need GPU in k8s, you can skip this step.
:::
```shell=
$ sudo apt-get install -y nvidia-docker2
```
This will overwrite your daemon.json. Enter 'Y' when prompted; we will fix that later.

After installation, redo the [Edit daemon.json](https://hackmd.io/TLplAAmtSpiN66dA1zIXDA?both#%E7%B7%A8%E8%BC%AFdaemonjson) section.
The final **daemon.json** will look like this:
```json=
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "data-root": "/data/docker",
  "insecure-registries": ["10.31.50.79:5000"]
}
```
#### Install kubectl
source: [Install kubectl using native package management](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-using-native-package-management)
```shell=
$ sudo apt-get update && sudo apt-get install -y apt-transport-https gnupg2
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
$ sudo apt-get update && sudo apt-get install -y kubectl
```
##### (optional) kubectl auto complete
```shell=
$ echo "source <(kubectl completion bash)" >> ~/.bashrc
$ source ~/.bashrc
```
#### Install conntrack
```shell=
$ sudo apt-get install conntrack
```
### Install minikube
minikube is local Kubernetes, focusing on making it easy to learn and develop for Kubernetes.
source: [minikube Debian package](https://minikube.sigs.k8s.io/docs/start/#debian-package)
```shell=
$ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_1.9.1-0_amd64.deb
$ sudo dpkg -i minikube_1.9.1-0_amd64.deb
```
### Start k8s cluster with GPU support
source: [Using the ‘none’ driver](https://minikube.sigs.k8s.io/docs/tutorials/nvidia_gpu/#using-the-none-driver)
This downloads the required Docker images and starts a k8s cluster using your Docker runtime.
```
$ sudo minikube start --driver=none --apiserver-ips 127.0.0.1 --apiserver-name localhost
```
Grant permission to the k8s group
```
$ sudo chmod -R 775 /etc/kubernetes
```
Create the swpc namespace
```
$ kubectl create namespace swpc
```
Set the default kubectl namespace to swpc
```
$ kubectl config set-context --current --namespace=swpc
```
#### Install NVIDIA-Device-plugin
source: [NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin)
```
$ sudo kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
```
#### Test
Edit **cuda_test.yaml**
```yaml=
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:10.0-base
      command: ["/bin/bash"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
```
Create the pod
```shell=
$ sudo kubectl create -f cuda_test.yaml
```
Check the status
```shell=
$ sudo kubectl describe pod/gpu-pod
```
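`describe` shows scheduling events, but it does not prove that the container can see the GPU. Here is a rough sketch of a follow-up check, assuming the pod name `gpu-pod` from cuda_test.yaml. The function name, the `KUBECTL` override variable, and the 120-second timeout are illustrative choices, not part of the original setup.

```shell
# Command used to reach the cluster; override it (e.g. KUBECTL=kubectl)
# if kubectl works without sudo on your machine.
KUBECTL="${KUBECTL:-sudo kubectl}"

# Hypothetical helper: wait for the pod to become Ready, then run nvidia-smi in it.
gpu_pod_smoke_test() {
    _pod="${1:-gpu-pod}"
    $KUBECTL wait --for=condition=Ready "pod/$_pod" --timeout=120s &&
        $KUBECTL exec "$_pod" -- nvidia-smi
}
```

Running `gpu_pod_smoke_test` should print the usual `nvidia-smi` table if the device plugin has exposed the GPU to the pod. If the wait times out, the `describe` output is the place to look for the reason.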
