# Installation Guides
###### tags: `Installation`
## Check List
- [ ] OS
- [ ] Docker
- [ ] Kubernetes
- [ ] Dynamic volume provisioner - NFS
- [ ] Kubeflow
## OS & Machines
+ Ubuntu 18.04
1. Create a bootable USB stick
    1. Download the Ubuntu 18.04 server image from [http://releases.ubuntu.com/18.04.4/?_ga=2.18215993.1854190319.1596973908-1484461180.1596601256](http://releases.ubuntu.com/18.04.4/?_ga=2.18215993.1854190319.1596973908-1484461180.1596601256)
    2. Follow the steps in [https://ubuntu.com/tutorials/create-a-usb-stick-on-windows#1-overview](https://ubuntu.com/tutorials/create-a-usb-stick-on-windows#1-overview)
2. Install the OS
1. Press F11 during startup to select the boot device.
2. Follow the OS installation wizard.
3. Network: Ubuntu uses netplan to configure the network
``` bash
vim /etc/netplan/50-cloud-init.yaml
```
```yaml
network:
  ethernets:
    enp4s0f3:
      addresses: [public ip/24]
      gateway4: 140.114.91.254
  version: 2
```
```bash
netplan try
netplan apply
```
4. InfiniBand (if needed)
InfiniBand is an interconnect widely used in supercomputers, featuring high throughput and low latency. We use InfiniBand cards manufactured by Mellanox.
* Driver and system setup
Download the latest MLNX_OFED driver ([driver download link](http://www.mellanox.com/page/mlnx_ofed_matrix?mtag=linux_sw_drivers)), decompress it, and cd into the directory.
Before running the installer, install the required libraries first.
```bash
apt-get install tcl tk
```
Run the install script. This may take some time.
```bash
./mlnxofedinstall
```
If required, unload the old driver and load the new one.
```bash
modprobe -rv ib_isert rpcrdma ib_srpt
/etc/init.d/openibd restart
```
Start opensm (InfiniBand subnet manager).
```bash
systemctl enable opensmd --now
```
Start opensmd on only one node of the cluster. Running opensmd on multiple nodes at the same time may cause problems.
Check InfiniBand status.
```bash
ibstat
# State: Active
# Physical state: LinkUp
ibstatus
```
## Docker
#### Command
``` bash
$ wget https://lsalab.cs.nthu.edu.tw/~riya/deploy/docker/install_docker.sh
$ sudo sh ./install_docker.sh
```
#### Script
``` bash
#!/bin/sh
apt-get update -y && apt-get upgrade -y
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update -y
# find the version that you want to install
apt-cache madison docker-ce
sudo apt-get install docker-ce=5:19.03.12~3-0~ubuntu-bionic docker-ce-cli=5:19.03.12~3-0~ubuntu-bionic containerd.io -y
```
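After the script finishes, a quick sanity check (not part of the script above) is to confirm the Docker daemon is running and can start a container:
```bash
# Show client and daemon versions
$ sudo docker version
# Pull and run a throwaway test container
$ sudo docker run --rm hello-world
```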
## Nvidia Driver
https://hackmd.io/5K7EIPWgRqynexhxJ03mDQ
## Kubernetes
### All Nodes
#### Command
``` bash
$ wget https://lsalab.cs.nthu.edu.tw/~riya/deploy/kubernetes/install_k8s.sh
$ sudo sh ./install_k8s.sh
```
#### Script
``` bash
#!/bin/sh
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
## To solve the problem of coredns crashloopbackoff
sed -i -e 's/nameserver 127.0.1.1/nameserver 8.8.8.8 8.8.4.4/i' /etc/resolv.conf
apt-get update -y #&& apt-get upgrade -y
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update -y
apt-get install -y kubelet=1.14.10-00 kubeadm=1.14.10-00 kubectl=1.14.10-00
```
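Because the script pins kubelet, kubeadm, and kubectl to 1.14.10-00, it is also worth holding the packages so a later `apt-get upgrade` does not move the cluster to an unintended version (an extra step, not in the script above):
```bash
# Prevent apt from upgrading the pinned Kubernetes packages
$ sudo apt-mark hold kubelet kubeadm kubectl
```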
---
### Master Node
#### Command
``` bash
$ sh -c "$(wget https://lsalab.cs.nthu.edu.tw/~riya/deploy/kubernetes/install_k8s_master.sh -O -)"
```
#### Script
```bash
echo "...Start master..."
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
echo "...Config kubeconfig..."
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
echo "...Install Flannel..."
sudo sysctl net.bridge.bridge-nf-call-iptables=1
kubectl apply -f https://lsalab.cs.nthu.edu.tw/~riya/deploy/kubernetes/kube-flannel.yaml
kubectl get pods --all-namespaces
```
---
### Worker Nodes
+ Get the token to join the cluster (In **Master** node)
``` bash
$ kubeadm token create --print-join-command
```
You will get a command like the one below; run it on each worker node.
``` bash
$ kubeadm join 192.168.132.144:6443 --token 42vyim.j4opt5tzazn425rj --discovery-token-ca-cert-hash sha256:49f781054a16861e48a121ea36df5cb0b72f489dc5d972e705f528774cbe6cd8
```
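To confirm the worker joined, list the nodes on the **Master** node; the new worker should reach the `Ready` state once Flannel is running on it.
```bash
$ kubectl get nodes -o wide
```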
---
### TroubleShooting
+ **Change the hostname**
``` bash
$ hostnamectl set-hostname 'new-hostname'
```
+ **Control plane node isolation**
By default, your cluster will not schedule Pods on the control-plane node for security reasons. If you want to be able to schedule Pods on the control-plane node, for example for a single-machine Kubernetes cluster for development, run:
```bash
$ kubectl taint nodes --all node-role.kubernetes.io/master-
```
+ **Unable to connect to the server** or **error: Error loading config file**
```bash
$ mkdir -p $HOME/.kube
$ sudo cp /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ export KUBECONFIG=$HOME/.kube/config
```
---
## Dynamic volume provisioner - NFS
Setting up Kubeflow requires volumes to store data; NFS is used here (choose whatever suits your needs).
+ Install the NFS server on the K8s master node
+ Install NFS common on the K8s worker nodes
+ Use K8s to create PVs for Kubeflow to consume
``` bash
# Run on the master node: the NFS server is set up on the K8s master node
# Replace xxx.xxx.xxx.xxx with your own private-network IP
$ sudo apt-get update && sudo apt-get install -y nfs-server
$ sudo mkdir -p /nfs-share/kubeflow
# You can create multiple PV directories under /nfs-share for Kubeflow to use as storage
$ cd /nfs-share
$ echo "/nfs-share xxx.xxx.xxx.xxx(rw,sync,no_root_squash,no_subtree_check)" | sudo tee -a /etc/exports
$ sudo /etc/init.d/nfs-kernel-server restart
# Check
$ showmount -e xxx.xxx.xxx.xxx
# Run on the worker nodes
$ sudo apt-get update && sudo apt-get install -y nfs-common
$ showmount -e xxx.xxx.xxx.xxx
$ sudo mkdir -p /nfs-share/kubeflow/
$ sudo mount -t nfs xxx.xxx.xxx.xxx:/nfs-share/kubeflow/ /nfs-share/kubeflow/
# Check
$ mount | grep addr
$ sudo vim /etc/fstab
# Add the following line so the share is mounted automatically at boot
xxx.xxx.xxx.xxx:/nfs-share/kubeflow/ /nfs-share/kubeflow/ nfs defaults 0 0
```
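A quick end-to-end check of the export and the mount (an added step, not in the original list) is to write a file through the mount on a worker node and read it back on the master node:
```bash
# On a worker node: write a test file through the NFS mount
$ echo "nfs ok" | sudo tee /nfs-share/kubeflow/nfs-test.txt
# On the master node: the file should appear in the exported directory
$ cat /nfs-share/kubeflow/nfs-test.txt
$ sudo rm /nfs-share/kubeflow/nfs-test.txt
```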
Managed through K8s
+ Add a Service Account `nfs-client-provisioner` and grant it the permissions the NFS provisioner needs
+ Deploy the NFS provisioner so that it can
    + create volumes in the NFS share
    + create PVs
    + tell PVCs that the PVs are ready, so the two bind to each other
+ Create a StorageClass that uses the deployed NFS provisioner
+ Set the newly created NFS StorageClass as the default StorageClass
```bash
$ cd $HOME
$ wget https://lsalab.cs.nthu.edu.tw/~riya/deploy/deploy_nfs.tar.gz
$ tar zxvf deploy_nfs.tar.gz
$ cd deploy_nfs/
# Set the subject of the RBAC objects to your own namespace, i.e. where the provisioner is being deployed
# Set the variables
$ NS=$(kubectl config get-contexts|grep -e "^\*" |awk '{print $5}')
$ NAMESPACE=${NS:-nfs}
# Replace the namespace name
$ sed -i "s/namespace:.*/namespace: $NAMESPACE/g" ./rbac.yaml ./deployment.yaml
# Replace with your own private-network IP
$ export NFS_ADDRESS='xxx.xxx.xxx.xxx'
$ export NFS_DIR='\/nfs-share\/kubeflow'
$ export PROVISIONER_NAME="nthu.laslab.com\/nfs"
$ kubectl create namespace ${NAMESPACE}
$ sed -i'' "s/\${PROVISIONER_NAME}/$PROVISIONER_NAME/g" ./class.yaml ./deployment.yaml
$ sed -i'' "s/\${NFS_ADDRESS}/$NFS_ADDRESS/g" ./deployment.yaml
$ sed -i'' "s/\${NFS_DIR}/$NFS_DIR/g" ./deployment.yaml
$ kubectl apply -f ./rbac.yaml
$ kubectl apply -f ./deployment.yaml
$ kubectl apply -f ./class.yaml
$ kubectl patch storageclass managed-nfs-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```
Test
```bash
$ kubectl apply -f test-claim.yaml
$ kubectl apply -f test-pod.yaml
# check the status
$ ls /nfs-share/kubeflow/
$ kubectl get pod
```
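For reference, a minimal PVC equivalent to `test-claim.yaml` might look like the sketch below; the claim name, size, and access mode are assumptions, so the file shipped in the archive may differ. It simply requests a small volume from the `managed-nfs-storage` class so the provisioner can create and bind a PV.
```bash
# Hypothetical equivalent of test-claim.yaml; adjust to match the real file
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
# The claim should become Bound once the provisioner creates the matching PV
$ kubectl get pvc test-claim
```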
## Kubeflow
**Note:** This section describes how to install Kubeflow on an existing Kubernetes cluster running on Ubuntu hosts.
### Before You Start
+ If you need **GPU** computing:
    + Each node needs the `Nvidia driver`, `CUDA`, and `Nvidia Docker` installed
    + The K8s cluster needs the `nvidia device plugin` installed (a sketch follows this list)
+ Setting up Kubeflow requires a **Dynamic volume provisioner**
    + Volumes are used to store data; NFS is used here (choose whatever suits your needs)
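As a sketch of the `nvidia device plugin` step above: the manifest below is the upstream DaemonSet from the NVIDIA/k8s-device-plugin project, but the exact version and URL are assumptions, so check that project's releases for the version matching your driver and Kubernetes version.
```bash
# Deploy the NVIDIA device plugin DaemonSet (URL/version assumed; verify against the project's releases)
$ kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
# GPUs should now show up as allocatable nvidia.com/gpu resources on the nodes
$ kubectl describe nodes | grep -A 2 "nvidia.com/gpu"
```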
### Install the kfctl tool
``` bash
$ wget https://lsalab.cs.nthu.edu.tw/~riya/deploy/kubernetes/kfctl_v1.0.2-0-ga476281_linux.tar.gz
# https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_linux.tar.gz
$ tar zxvf kfctl_v1.0.2-0-ga476281_linux.tar.gz
$ sudo mv ./kfctl /usr/local/bin/kfctl
```
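To confirm the binary is installed and on your PATH:
```bash
$ kfctl version
```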
### Choose one of the two installation modes
+ Install Kubeflow in an existing Kubernetes cluster
+ Multi-user, auth-enabled Kubeflow with kfctl_istio_dex
### **Install Kubeflow in an existing Kubernetes cluster**
#### Set environment variables
**Note: configure these as environment variables, because the yaml configuration is generated from them.**
``` bash
# Name of the directory that stores the configuration files generated during the Kubeflow installation.
# For example, the deployment name can be 'my-kubeflow' or 'kf-test'.
$ export KF_NAME=<your choice of name for the Kubeflow deployment>
# Path of the base directory that holds the Kubeflow project
$ export BASE_DIR=<path to a base directory>
# Absolute path of the directory where Kubeflow is deployed
$ export KF_DIR=${BASE_DIR}/${KF_NAME}
$ mkdir -p ${KF_DIR}
$ cd ${KF_DIR}
$ export CONFIG_URI=https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml
$ kfctl apply -V -f ${CONFIG_URI}
```
#### Check
```bash
$ kubectl -n kubeflow get all
$ kubectl port-forward svc/istio-ingressgateway -n istio-system --address xxx.xxx.xxx.xxx 8081:80
```
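If you prefer not to keep a port-forward running, check whether the `istio-ingressgateway` Service already exposes a NodePort (it typically does with this config) and browse to that port on any node's IP instead:
```bash
# Look up the Service type and the NodePort mapped to port 80
$ kubectl -n istio-system get svc istio-ingressgateway
```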
---
### **Multi-user, auth-enabled Kubeflow with kfctl_istio_dex**
#### **Note:**
+ **Disabling istio installation:**
If Istio is already installed in the k8s cluster, you can skip installing the `istio-crds` and `istio-install` entries in kfctl_istio_dex.v1.0.2.yaml
+ **Istio configuration for trustworthy JWTs:**
This configuration applies to Istio v1.3.1 with SDS enabled
```bash
$ sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
# Add the following under spec/containers/command
- --feature-gates=TokenRequest=true
- --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
- --service-account-issuer=kubernetes.default.svc
- --service-account-api-audiences=api
```
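Since kube-apiserver runs as a static pod, the kubelet restarts it automatically after the manifest is edited. A quick way to verify the new flags took effect (an added check) is:
```bash
# Wait for the kube-apiserver pod to come back up
$ kubectl -n kube-system get pods -l component=kube-apiserver
# Confirm the service-account flags are present in the running pod spec
$ kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep service-account
```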
+ **Default password in static file configuration for Dex :**
+ The kfctl_istio_dex.v1.0.2.yaml configuration file includes a default [staticPasswords](https://github.com/dexidp/dex/blob/0f8c4db9f61476a8f80e60f5950992149a1cc0cb/examples/config-dev.yaml#L91-L95) user
+ email:`admin@kubeflow.org`
+ password:`12341234`
+ You should change this configuration or replace it with a [Dex connector](https://github.com/dexidp/dex#connectors)
#### Set environment variables
**Note: configure these as environment variables, because the yaml configuration is generated from them.**
``` bash
# Name of the directory that stores the configuration files generated during the Kubeflow installation.
# For example, the deployment name can be 'my-kubeflow' or 'kf-test'.
$ export KF_NAME=<your choice of name for the Kubeflow deployment>
# Path of the base directory that holds the Kubeflow project
$ export BASE_DIR=<path to a base directory>
# Absolute path of the directory where Kubeflow is deployed
$ export KF_DIR=${BASE_DIR}/${KF_NAME}
$ mkdir -p ${KF_DIR}
$ cd ${KF_DIR}
$ export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_istio_dex.v1.0.2.yaml"
$ wget -O kfctl_istio_dex.yaml $CONFIG_URI
$ export CONFIG_FILE=${KF_DIR}/kfctl_istio_dex.yaml
# The default user account and password are: admin@kubeflow.org:12341234
# Modify them through this configuration file
# Password hashing: https://passwordhashing.com/BCrypt
# If metadata in the config file is missing clusterName, add it manually
# clusterName can be found in ~/.kube/config
$ vim $CONFIG_FILE
$ sudo kfctl apply -V -f ${CONFIG_FILE}
```
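If you need the clusterName value for the metadata section, you can read it from the current kubeconfig context without opening the file, for example:
```bash
# Print the cluster name of the current context (the same value stored in ~/.kube/config)
$ kubectl config view --minify -o jsonpath='{.clusters[0].name}'
```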
#### Check
```bash
$ kubectl -n kubeflow get all
$ kubectl port-forward svc/istio-ingressgateway -n istio-system 8081:80
```
#### Add users
```bash
# Get the configuration file currently used by dex
$ kubectl get configmap dex -n auth -o jsonpath='{.data.config\.yaml}' > dex-config.yaml
# The default user account and password are: admin@kubeflow.org:12341234
# Modify them through the configuration file
# Password hashing: https://passwordhashing.com/BCrypt
# Add new users
$ vim dex-config.yaml
# Update the ConfigMap
$ kubectl create configmap dex --from-file=config.yaml=dex-config.yaml -n auth --dry-run -oyaml | kubectl apply -f -
# Restart dex
$ kubectl rollout restart deployment dex -n auth
# If your kubectl version has no rollout restart, delete the dex pod in the auth namespace directly so the deployment starts a new one
```
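As a sketch of what adding a user involves: generate a bcrypt hash and append a new entry under `staticPasswords` in dex-config.yaml before updating the ConfigMap. The email, username, and ID below are placeholders; `htpasswd` (from the `apache2-utils` package) is one way to produce the hash, as an alternative to the website above.
```bash
# Generate a bcrypt hash for the new user's password (htpasswd is provided by apache2-utils)
$ htpasswd -bnBC 10 "" newpassword | tr -d ':\n'
# Then append an entry like this under staticPasswords in dex-config.yaml:
#   - email: user@example.com
#     hash: <bcrypt hash from the command above>
#     username: user
#     userID: <any unique id>
```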