# Deploy Ollama & Open WebUI on RKE2 with Rancher

## Prerequisites

Rancher

* CPU: 4C
* MEM: 16G
* Disk: 70 GB NVMe SSD
* OS: SLES 15 SP6

RKE2

* CPU: 4C
* MEM: 16G
* Disk: 70 GB NVMe SSD
* GPU: NVIDIA GeForce RTX 4060 Ti
* OS: SLES 15 SP6
* The GPU must be attached to the VM in advance

## Software

* Rancher is already installed
* An RKE2 cluster has already been provisioned through Rancher

## 1. Install NVIDIA Container Runtime on RKE2

```
# 1. SSH into the RKE2 node
$ ssh <user>@<RKE2 node ip>

# 2. Install gcc, kernel-devel, and nvidia-container-toolkit
$ sudo zypper mr -ea && \
  sudo zypper -n in gcc kernel-devel nvidia-container-toolkit

# Check that all kernel package versions match
$ rpm -qa | grep -E "kernel-default-devel|kernel-default|kernel-devel|kernel-macros"

# 3. Install the NVIDIA driver
$ v="555.58.02" && \
  curl -L -O https://us.download.nvidia.com/XFree86/Linux-x86_64/"$v"/NVIDIA-Linux-x86_64-"$v".run && \
  sudo sh NVIDIA-Linux-x86_64-"$v".run

# 4. Reboot the host
$ sudo reboot
```

```
# 5. Test
$ nvidia-smi
Wed Jul  3 09:20:46 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 555.58.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:00:10.0 Off |                  N/A |
|  0%   47C    P8             15W /  165W |       4MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

```
# 6. Update the RKE2 containerd configuration
## 6.1. Check
$ ls -l /usr/bin/nvidia-container-runtime
-rwxr-xr-x 1 root root 4319136 Oct 20  2022 /usr/bin/nvidia-container-runtime

## 6.2. Back up
$ sudo cp /var/lib/rancher/rke2/agent/etc/containerd/config.toml .

## 6.3. Update
$ sudo cp /var/lib/rancher/rke2/agent/etc/containerd/config.toml /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
$ sudo nano -Yone /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl

## Append the following to the end of the file
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/bin/nvidia-container-runtime"

## 6.4. Reboot the host
$ sudo reboot

# 6.5. Delete the tmpl file
$ sudo rm /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
```

## 2. Install gpu-operator on Rancher

### 2.1. Add the NVIDIA Helm Repository

* Click Cluster -> Apps -> Repositories -> Create

```
name: nvidia
Index URL: https://helm.ngc.nvidia.com/nvidia
```

* After adding it, check that its status is Active
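* If you prefer a terminal over the Rancher UI, the same repository can be added with the Helm CLI. This is an equivalent sketch, assuming `helm` is installed and your kubeconfig points at the RKE2 cluster:

```
# CLI equivalent of the Rancher "Create Repository" step above
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
$ helm repo update
```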
### 2.2. Install gpu-operator

* Apps -> Charts -> search for gpu-operator -> click gpu-operator
* Click the Install button in the upper right corner

> Namespace: Create a new namespace -> enter `nvidia`
> Name: enter `nvidia`
> Check "Customize Helm options before install"
> Next

* Edit the containerd settings

```
toolkit:
  enabled: true
  env:
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/rke2/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
```

> Click the `Install` button in the lower right corner to install
>
> Check that all Pods are in the Running state
>
> Workloads -> Pods -> nvidia namespace

* Verify

```
$ kubectl -n nvidia get po
NAME                                                    READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-gb455                             1/1     Running     1          62s
gpu-operator-5cb66cf8df-ck74b                           1/1     Running     0          79s
nvidia-container-toolkit-daemonset-6snlm                1/1     Running     0          63s
nvidia-cuda-validator-z9bwc                             0/1     Completed   0          47s
nvidia-dcgm-exporter-4klw5                              1/1     Running     0          63s
nvidia-device-plugin-daemonset-6d4wn                    1/1     Running     1          63s
nvidia-node-feature-discovery-gc-59fb9585fb-vn9mc       1/1     Running     0          79s
nvidia-node-feature-discovery-master-579469ff77-kjfwc   1/1     Running     0          79s
nvidia-node-feature-discovery-worker-86hp8              1/1     Running     0          79s
nvidia-operator-validator-h77t5                         1/1     Running     0          63s
```

* Check the nvidia runtimeclass

```
$ kubectl get runtimeclass
NAME     HANDLER   AGE
nvidia   nvidia    3m37s
```

### 2.3. Test GPU Pods

```
# 1. sample1
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-vectoradd
    image: "nvidia/samples:vectoradd-cuda11.2.1"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# 2. Log output on success
$ kubectl logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

# 3. Delete the sample1 pod
$ kubectl delete pod cuda-vectoradd

# 4. sample2
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody
    args: ["nbody", "-gpu", "-benchmark"]
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
EOF

# 5. Log output on success
$ kubectl logs nbody-gpu-benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
	-fullscreen       (run n-body simulation in fullscreen mode)
	-fp64             (use double precision floating point values for simulation)
	-hostmem          (stores simulation data in host memory)
	-benchmark        (run benchmark to measure performance)
	-numbodies=<N>    (number of bodies (>= 1) to run in simulation)
	-device=<d>       (where d=0,1,2.... for the CUDA device to use)
	-numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
	-compare          (compares simulation results running once on the default GPU and once on the CPU)
	-cpu              (run n-body simulation on the CPU)
	-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 8.9 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 8.9

> Compute 8.9 CUDA device: [NVIDIA GeForce RTX 4060 Ti]
34816 bodies, total time for 10 iterations: 19.339 ms
= 626.784 billion interactions per second
= 12535.677 single-precision GFLOP/s at 20 flops per interaction

# 6. Delete the sample2 pod
$ kubectl delete pod nbody-gpu-benchmark
```
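* Besides the sample pods, you can confirm that the device plugin has advertised the GPU to the scheduler. This quick check is an addition to the original steps and only assumes the node exposes the standard `nvidia.com/gpu` resource:

```
# List each node with its allocatable nvidia.com/gpu count
# (the GPU node should report 1 once the device plugin is running)
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
```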
## 3. Install Local Path Provisioner and Set It as Default

* Installation command

```
$ kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.28/deploy/local-path-storage.yaml
```

```
$ kubectl get storageclass
NAME         PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  108m
```

* Set it as the default storageclass

```
$ kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```

## 4. Install Ollama & Open WebUI with Helm

### 4.1. Add the Open WebUI Helm Repository

* Click Cluster -> Apps -> Repositories -> Create

```
name: openwebui
Index URL: https://helm.openwebui.com/
```

### 4.2. Install Ollama & Open WebUI

* Apps -> Charts -> search for `open-webui` -> click `open-webui`
* Click the `Install` button in the upper right corner

> Namespace: Create a new namespace -> enter `ollama`
> Name: enter `ollama`
> Check "Customize Helm options before install"
> Next
>
> Edit `values.yaml`
>
> Enable the ingress

```
ingress:
  annotations: {}
  class: nginx
  enabled: true
  existingSecret: ''
  host: ollama.example.com
  tls: false
```

* Add the `llama3` model to Ollama, enable the PV, and use the nvidia container runtime

```
ollama:
  enabled: true
  fullnameOverride: open-webui-ollama
  ollama:
    gpu:
      enabled: true
      number: 1
      type: nvidia
    models:
      - llama3
  persistentVolume:
    enabled: true
  runtimeClassName: nvidia
```

* Adjust `openaiBaseApiUrl`

```
openaiBaseApiUrl: open-webui-ollama.ollama.svc.cluster.local
```

* Adjust the persistent storage to use the `Local Path Provisioner`

```
persistence:
  accessModes:
    - ReadWriteOnce
  annotations: {}
  enabled: true
  existingClaim: ''
  selector: {}
  size: 2Gi
  storageClass: local-path
```

* Once everything is set, click the Next button in the lower right corner, then click `Install`
* Check that all Pods are in the Running state
* Disable authentication in the Web UI
  * Workloads -> StatefulSets -> ollama namespace -> Edit Config
  * Click Add Variable

```
Variable Name: WEBUI_AUTH
Value: False
```

* After entering it, click the Save button

* Configure DNS: add an A record

```
$ kubectl -n ollama get ing
NAME         CLASS   HOSTS                ADDRESS       PORTS   AGE
open-webui   nginx   ollama.example.com   172.20.0.52   80      2d17h
```

* Add the FQDN from the Ingress HOSTS column and the IP from the ADDRESS column as a DNS A record, or add the name resolution to `/etc/hosts` on a host with a browser

> On Windows 10/11 the hosts file is at C:\Windows\System32\drivers\etc\hosts

* Click Service Discovery -> Ingresses -> click the URL
* You should land on the Open WebUI page

## Cleanup

```
$ helm -n ollama uninstall open-webui
$ helm -n ollama uninstall milvus
$ for i in $(kubectl -n ollama get pvc -o name); do kubectl -n ollama delete ${i}; done
$ kubectl delete ns ollama
```

## References

https://documentation.suse.com/suse-ai/1.0/html/AI-deployment-intro/index.html#ollama-installing
https://milvus.io/docs/install_cluster-helm.md
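## Appendix: Smoke-Testing the Ollama API

As an extra check not covered in the walkthrough above, you can query the bundled Ollama REST API from inside the cluster. This sketch assumes the `open-webui-ollama` service name set via `fullnameOverride` in the values above and Ollama's default port 11434:

```
# Launch a throwaway curl pod and list the models Ollama has pulled
# (llama3 should appear once the model download has finished)
$ kubectl -n ollama run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
    curl -s http://open-webui-ollama:11434/api/tags
```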