---
tags: Kubernetes
---

# Kubernetes Installation with Docker, NVIDIA T4 Driver and Kubernetes Dashboard

###### tags: NVIDIA, Kubernetes, Docker, Kubernetes-dashboard

## HW Equipment

- Hardware System: SCB 1921B-AA1
- OS: Ubuntu 18.04 LTS, kernel 5.4.0-42
- GPU: NVIDIA GeForce GTX 1050

## Docker Installation

Install the prerequisites:

```bash=
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
```

Add the Docker GPG key and check its fingerprint:

```bash=
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
```

Make sure the result looks like this:

```
9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
```

Add the repository:

```bash=
$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
$ sudo apt-get update
```

Install Docker:

```bash=
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
```

Check the Docker version:

```bash=
$ docker version
```

## NVIDIA Driver Installation

Download the `.run` installer file from the NVIDIA website:

> Note: select your NVIDIA model and OS before downloading: https://www.nvidia.com/Download/index.aspx?lang=en-us

Before the installation, blacklist the nouveau driver. Create a file:

```bash=
$ sudo vim /etc/modprobe.d/blacklist-nouveau.conf
```

and put the following in `blacklist-nouveau.conf`:

```bash=
blacklist nouveau
options nouveau modeset=0
```

Save the file and exit. Finally, rebuild the initramfs and reboot:

```bash=
$ sudo update-initramfs -u
$ sudo reboot
```

After restarting, use the following command to confirm whether nouveau has stopped working:

```bash=
$ lsmod | grep nouveau
```

If nothing is printed, congratulations! You have disabled nouveau's kernel driver.
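Before installing NVIDIA's official driver, it can also help to confirm that the GPU is visible on the PCI bus. A quick sketch (it assumes `lspci` from the `pciutils` package is installed; the fallback message is just illustrative):

```bash=
# Look for NVIDIA devices on the PCI bus; expect a line naming your GPU model.
gpu=$(lspci 2>/dev/null | grep -i nvidia || echo "no NVIDIA device found")
echo "$gpu"
```

If the GPU does not show up here, check the hardware seating and BIOS settings before blaming the driver.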
Now we can try to install NVIDIA's official driver. Make the installer executable:

```bash=
$ chmod +x NVIDIA-Linux-x86_64-460.32.03.run
```

Install gcc and make:

```bash=
$ sudo apt-get install gcc make
```

Run the NVIDIA driver installer (the file name may differ, depending on the version you downloaded):

```bash=
$ sudo ./NVIDIA-Linux-x86_64-460.32.03.run
```

The installer shows some warnings; just choose the "continue installation" option, finish the procedure, and:

```bash=
$ sudo reboot
```

After the reboot, run `nvidia-smi` to see whether the driver is OK:

```bash=
$ nvidia-smi
```

The output should look like this:

![](https://i.imgur.com/mffES3E.png)

## Kubernetes Installation

Turn swap off:

```bash=
$ sudo su
# swapoff -a
```

**Optional step:** to keep swap disabled across reboots, open `/etc/fstab` (`$ sudo nano /etc/fstab`), add `#` in front of the swap entry so it looks like `#UUID=45fc9fe6-6500-4bca-864e-1effad4764b3`, and save.

Install the prerequisites:

```bash=
$ sudo apt-get update
$ sudo apt-get install -y apt-transport-https ca-certificates curl
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
```

Add the Kubernetes repository to the update list (as root):

```bash=
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
```

Update:

```bash=
$ sudo apt-get update
```

Install kubeadm, kubectl and kubelet:

```bash=
$ sudo apt-get install -y kubelet kubeadm kubectl
```

Check the versions:

```bash=
$ docker -v
Docker version 17.03.2-ce, build f5ec1e2

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:43:08Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

$ kubelet --version
Kubernetes v1.12.1
```

> The "connection refused" message from `kubectl version` is expected at this point: the client is installed, but no cluster is running yet.

Set up the environment driver in the kubelet config file:

```bash=
$ sudo gedit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
```

![](https://i.imgur.com/LRP4uAX.png)

Start the cluster (this will take 3-4 minutes):

```bash=
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
```

Run the following commands as a non-root user:

```bash=
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

Cluster information:

```bash=
$ kubectl cluster-info
Kubernetes master is running at https://10.132.0.2:6443
KubeDNS is running at https://10.132.0.2:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get no -o wide
NAME            STATUS     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
kube-master-1   NotReady   master   4m26s   v1.12.1   10.132.0.2    <none>        Ubuntu 16.04.5 LTS   4.15.0-1021-gcp   docker://17.3.2

$ kubectl get all --all-namespaces
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   pod/coredns-576cbf47c7-lw7jv                0/1     Pending   0          4m55s
kube-system   pod/coredns-576cbf47c7-ncx8w                0/1     Pending   0          4m55s
kube-system   pod/etcd-kube-master-1                      1/1     Running   0          4m23s
kube-system   pod/kube-apiserver-kube-master-1            1/1     Running   0          3m59s
kube-system   pod/kube-controller-manager-kube-master-1   1/1     Running   0          4m17s
kube-system   pod/kube-proxy-bwrwh                        1/1     Running   0          4m55s
kube-system   pod/kube-scheduler-kube-master-1            1/1     Running   0          4m10s

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP         5m15s
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   5m9s

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/kube-proxy   1         1         1       1            1           <none>          5m8s

NAMESPACE     NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2         2         2            0           5m9s

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-576cbf47c7   2         2         0       4m56s
```

> The node reports `NotReady` and the coredns pods stay `Pending` until a pod network add-on (CNI) is installed.

Install a CNI (I prefer Weave):

```bash=
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
...
```

Confirm with these commands:

```bash=
$ kubectl get no -o wide
NAME            STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
kube-master-1   Ready    master   9m15s   v1.12.1   10.132.0.2    <none>        Ubuntu 16.04.5 LTS   4.15.0-1021-gcp   docker://17.3.2
```

![](https://i.imgur.com/TZl6lko.png)

Deploy the Dashboard:

```bash=
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
```

![](https://i.imgur.com/BzxyzpM.png)

Now we have to create an admin user for access to the dashboard:

```bash=
$ kubectl apply -f https://gist.githubusercontent.com/chukaofili/9e94d966e73566eba5abdca7ccb067e6/raw/0f17cd37d2932fb4c3a2e7f4434d08bc64432090/k8s-dashboard-admin-user.yaml
```

![](https://i.imgur.com/HEYtDq4.png)

Copy the token and use it for login:

![](https://i.imgur.com/GG6CKCw.png)

Sign in and see the GUI:

![](https://i.imgur.com/3bvplUk.png)

By default, your cluster will not schedule pods on the control-plane node for security reasons. In order to install Kubeflow we need to change this, so we run the following command:

```bash=
$ kubectl taint nodes --all node-role.kubernetes.io/master-
```

In order to verify the functionality, run:

```bash=
$ sudo kubectl get all -A
```

Afterwards, run:

```bash=
$ sudo kubectl get nodes
```

#### The output should show the status of your node as "Ready".

### *Error: The connection to the server was refused - did you specify the right host or port?
Solution:*

```bash=
$ sudo -i
# swapoff -a
# exit
$ strace -eopenat kubectl version
```

![](https://i.imgur.com/Tz1Fcdo.png)

Go Installation (if you want some other version, download that specific version instead; here I install go1.16.2):

```bash=
$ sudo wget https://golang.org/dl/go1.16.2.linux-amd64.tar.gz
$ sudo tar -C /usr/local -xzf go1.16.2.linux-amd64.tar.gz
$ export PATH=$PATH:/usr/local/go/bin
$ go version
```

![](https://i.imgur.com/AIhQwur.png)

Kustomize Installation (not working in my case):

```bash=
$ GOBIN=$(pwd)/ GO111MODULE=on go get sigs.k8s.io/kustomize/kustomize/v3
```

![](https://i.imgur.com/CxUaYOR.png)

Kustomize Installation (alternative):

![](https://i.imgur.com/AdtMImH.png)

## Provisioning of Persistent Volumes in Kubernetes

## Local Path Provisioner

### Overview

Local Path Provisioner provides a way for Kubernetes users to utilize the local storage on each node. Based on the user configuration, the Local Path Provisioner automatically creates hostPath-based persistent volumes on the node. It utilizes the features introduced by the Kubernetes Local Persistent Volume feature, but makes for a simpler solution than the built-in local volume feature in Kubernetes.

Compared to the built-in Local Persistent Volume feature in Kubernetes:

### Pros

Dynamic provisioning of volumes using hostPath. (Currently the built-in Kubernetes local volume provisioner cannot do dynamic provisioning for local volumes.)

### Cons

No support for volume capacity limits currently; the capacity limit is ignored for now.

### Requirement

Kubernetes v1.12+.

### Deployment

### Installation

In this setup, the directory `/opt/local-path-provisioner` will be used across all nodes as the path for provisioning (i.e., to store the persistent volume data). The provisioner will be installed in the `local-path-storage` namespace by default.
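Once the provisioner is deployed, workloads request storage through an ordinary PersistentVolumeClaim that names the `local-path` storage class. A minimal sketch of such a claim (the claim name and size here mirror the example output shown later in this guide; adjust them for your workload):

```yaml=
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc        # illustrative name
spec:
  accessModes:
    - ReadWriteOnce           # local volumes are inherently single-node
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi            # requested size; note capacity limits are not enforced (see Cons)
```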
```bash=
$ kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
```

After installation, you should see something like the following:

```bash=
$ kubectl -n local-path-storage get pod
NAME                                     READY   STATUS    RESTARTS   AGE
local-path-provisioner-d744ccf98-xfcbk   1/1     Running   0          7m
```

Check and follow the provisioner log using:

```bash=
$ kubectl -n local-path-storage logs -f -l app=local-path-provisioner
```

### Usage

Create a hostPath-backed persistent volume claim and a pod that uses it:

```bash=
$ kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pvc/pvc.yaml
$ kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pod/pod.yaml
```

You should see that the PV has been created:

```bash=
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
pvc-bc3117d9-c6d3-11e8-b36d-7a42907dda78   2Gi        RWO            Delete           Bound    default/local-path-pvc   local-path              4s
```

The PVC has been bound:

```bash=
$ kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
local-path-pvc   Bound    pvc-bc3117d9-c6d3-11e8-b36d-7a42907dda78   2Gi        RWO            local-path     16s
```

And the Pod has started running:

```bash=
$ kubectl get pod
NAME          READY   STATUS    RESTARTS   AGE
volume-test   1/1     Running   0          3s
```

## Kubeflow Installation

First we have to download the kfctl release tarball from this link:

```bash=
https://github.com/kubeflow/kfctl/releases/tag/v1.2.0
```

Open a command prompt in the folder where you unpacked the binary and run:

```bash=
$ export PATH=$PATH:$PWD
```

Next, run the following lines:

```bash=
$ export KF_NAME=kf-cluster
$ export BASE_DIR=$HOME
$ export KF_DIR=${BASE_DIR}/${KF_NAME}
$ mkdir -p ${KF_DIR}
$ cd ${KF_DIR}
```

#### ${KF_NAME} — the name of your Kubeflow deployment. If you want a custom deployment name, specify that name here.
For example, my-kubeflow or kf-test. The value of KF_NAME must consist only of lower-case alphanumeric characters and '-' (dashes), and must start and end with an alphanumeric character. The value of this variable cannot be longer than 25 characters. It must contain just a name, not a directory path. This value is also used as the directory name when creating the directory where your Kubeflow configurations are stored (the Kubeflow application directory).

#### ${KF_DIR} — the full path to your Kubeflow application directory.

In ${KF_DIR}, make a new file, paste the following lines, and save it with the name `kfctl_k8s_istio.yaml`:

```yaml=
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  namespace: kubeflow
spec:
  applications:
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: namespaces/base
    name: namespaces
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: application/v3
    name: application
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/istio-1-3-1-stack
    name: istio-stack
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/cluster-local-gateway-1-3-1
    name: cluster-local-gateway
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: istio/istio/base
    name: istio
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/cert-manager-crds
    name: cert-manager-crds
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/cert-manager-kube-system-resources
    name: cert-manager-kube-system-resources
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/cert-manager
    name: cert-manager
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/add-anonymous-user-filter
    name: add-anonymous-user-filter
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: metacontroller/base
    name: metacontroller
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: admission-webhook/bootstrap/overlays/application
    name: bootstrap
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/spark-operator
    name: spark-operator
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes
    name: kubeflow-apps
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: knative/installs/generic
    name: knative
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: kfserving/installs/generic
    name: kfserving
  # Spartakus is a separate application so that kfctl can remove it
  # to disable usage reporting
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/kubernetes/application/spartakus
    name: spartakus
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v1.2.0.tar.gz
  version: v1.2-branch
```

Create your Kubeflow configurations from the file you just saved:

```bash=
$ mkdir -p ${KF_DIR}
$ cd ${KF_DIR}
$ kfctl build -V -f kfctl_k8s_istio.yaml
```

Next, apply the `.yaml` file. The deployment should start shortly; this might take a few minutes:

```bash=
$ kfctl apply -V -f kfctl_k8s_istio.yaml
```

## Source

1. https://clay-atlas.com/blog/2020/02/11/linux-chinese-note-nvidia-driver-nouveau-kernel/
2. https://askubuntu.com/questions/841876/how-to-disable-nouveau-kernel-driver
3. https://docs.docker.com/engine/install/ubuntu/
4. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
5. https://stackoverflow.com/questions/52720380/kubernetes-api-server-is-not-starting-on-a-single-kubeadm-cluster
6. https://golang.org/doc/install#install
7. https://github.com/rancher/local-path-provisioner/