--- title: 【Survey】 NVIDIA Clara Deploy SDK - 安裝篇 date: 2021-03-07 23:53 is_modified: false disqus: cynthiahackmd categories: - "智慧計算 › 人工智慧" - "資訊科技 › 開發與輔助工具" tags: - "AI/ML" - "Clara" - "NVIDIA" - "醫學影像" - "工具安裝與部署" --- {%hackmd @CynthiaChuang/Github-Page-Theme %} <br> Deploy SDK Part2! 原本想在[上一篇](/@CynthiaChuang/NVIDIA-Clara-Deploy-SDK)一起解決的,不過最後範例程式碼架上選染效果後真的超級長的,最後決定來開一篇新的寫安裝步驟好了。 但是說這篇真的拖有夠久的,產出速度比不上待寫文章的新增速度,結果草稿越積越多 Orz <!--more--> <p class="illustration"> <img src="https://i.imgur.com/HMTDUR7.png" alt="安裝篇"> 安裝篇 (圖片來源: <a href="https://meet.bnext.com.tw/blog/view/10124">Meet創業小聚</a>) </p> 按照[文件](https://docs.nvidia.com/clara/deploy/ClaraInstallation.html)看來安裝步驟似乎不難,不過按照以往經驗看來成不成功要看人品與緣份 XDDD ## 系統需求 在文件一開始列了落落長的系統需求,除了一些基本硬體需求外,還有指定了 K8S、 Helm 與 Docker 的版號: - **Kubernetes 1.15.4** - **Docker 19.03.1** - **NVIDIA Docker 2.2.0** - **Helm 2.15.2** 如果系統中未安裝這些需求,可以不用先行安裝,等等安裝時會幫忙一併安裝;但如果安裝的版號不合,可能需要先行移除,否則它們會跳過安裝。 ## 安裝步驟 上張圖說明一下 Deploy SDK 的安裝流程: <p class="illustration"> <img src="https://i.imgur.com/9Wb8F1x.png" alt="安裝流程"> 安裝流程 (圖片來源: <a href="https://docs.nvidia.com/clara/deploy/ClaraInstallation.html">SDK 0.7.1 documentation</a>) </p> 1. **下載並安裝 bootstrap** 首先先登入 [NGC](https://ngc.nvidia.com/catalog/collections?orderBy=modifiedDESC&pageNumber=0&query=&quickFilter=collections&filters=),找到 [Clara Deploy Bootstrap](https://ngc.nvidia.com/catalog/resources/nvidia:clara:clara_bootstrap/performance) 並進行下載與解壓縮: ```bash= $ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_bootstrap/versions/0.7.1-2008.1/zip -O bootstrap.zip $ unzip bootstrap.zip -d bootstrap ``` 完成下載後,進入資料夾執行腳本。這份腳本它將會安裝 Docker、 K8S ...等所需求的軟體: ```bash= $ cd bootstrap $ sudo ./bootstrap.sh ``` 是說,如果不想登入 NGC 也行,登入與否其實不影響下載。不過還是建議登入下,否則很有機會在接下來的步驟中被它打斷,它實在很吵... <br> 2. **下載並安裝 CLI** 接下來再去 [NGC](https://ngc.nvidia.com/catalog/collections?orderBy=modifiedDESC&pageNumber=0&query=&quickFilter=collections&filters=),找 [Clara CLI](https://ngc.nvidia.com/catalog/resources/nvidia:clara:clara_cli) 來下載與解壓縮: ```bash= $ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_cli/versions/0.7.1-2008.1/zip -O clara_cli.zip $ sudo unzip clara_cli.zip -d /usr/bin/ && sudo chmod 755 /usr/bin/clara* Archive: cli.zip inflating: /usr/bin/clara inflating: /usr/bin/clara-dicom inflating: /usr/bin/clara-monitor inflating: /usr/bin/clara-pull inflating: /usr/bin/clara-render ``` 將檔案放到 `/usr/bin/` 下後,可以試著呼叫 clara 指令,驗證是否安裝成功: ```bash= $ clara version Clara CLI version: 0.7.1-12788.ae65aea0 ``` <br> 3. **配置 NGC 憑證** 安裝 Clara CLI 須配置 NGC 憑證,稍等 Clara CLI 才能從 NGC Pull 相關 Helm Chart 以進行部署。 這邊你須要拿到一把 `NGC_API_KEY`。這次就必須一定要登入 NGC 了,登入後點選右上角頭像選單中的 `Setup`,並選擇 `Generate API Key`。 <p class="illustration"> <img src="https://i.imgur.com/AqM4tPS.png" alt="Generate API Key"> </p> 進入頁面後,會右上方有個 Generate API Key 的按鈕,點擊後就會產生 `NGC_API_KEY` 了。 <p class="illustration"> <img src="https://i.imgur.com/PFMeczB.png" alt="Generate API Key-2"> </p> 完成後回到終端輸入下列指令,可以考慮 `orgteam` 使用預設值就好,: ```bash= $ clara config --key NGC_API_KEY [--orgteam nvidia/clara] -y ✔ Yes Configuration "ngc-clara"successfully created ``` 是說 **successfully** 的意思,是指你成功配置了憑證,但憑證是否能使用必須使用後才知道,可以試著使用 `pull` 指令來試試: ```bash= $ clara pull platform ✔ Yes Clara Platform 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara Hint: use "clara platform start" or "clara platform restart" to deploy pulled Clara Platform. ``` 如果失敗可能會看到下列這樣的訊息: ```bash= $ clara pull platform ✔ Yes Error: unable to fetch latest version information 401 Unauthorized ``` 或是 ```bash= $ clara pull platform ✔ Yes Error: unable to fetch latest version information 403 Forbidden ``` <br> 4. **啟動 Helm Chart** 在上一篇提到 [Helm Charts](/@CynthiaChuang/NVIDIA-Clara-Deploy-SDK#Helm-Charts) 時有提過,除了 Triton Inference Server 之外的 charts,都可以藉由這步驟啟動。 <p class="illustration"> <img src="https://i.imgur.com/LgA32Ff.png" alt="NVIDIA Clara Deploy Architecture"> DNVIDIA Clara Deploy Architecture(圖片來源: <a href="https://docs.nvidia.com/clara/deploy/index.html">SDK 0.7.1 documentation</a>) </p> 因為 platform 的下載在剛剛測試 clara 的指令時已經順道完成了,所以這邊就直接啟動。 ```bash= $clara platform start Starting clara... NAME: clara Note: If there is a running instance of Clara Console, Clara Dicom Adapter or Clara Renderer, they should be restarted. ``` <br> 接下來下載 Clara Deploy Services 的 Helm Charts: ```bash= $ clara pull dicom ✔ Yes Clara Dicom Adapter 0.7.1-2008.1 Chart saved at: /home/.clara/charts/dicom-adapter Hint: use "clara dicom start" or "clara dicom restart" to deploy pulled Clara Dicom Adapter. $ clara pull render ✔ Yes Clara Renderer 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara-renderer Hint: use "clara render start" or "clara render restart" to deploy pulled Clara Renderer. $ clara pull monitor ✔ Yes Clara Monitor Server 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara-monitor-server Hint: use "clara monitor start" or "clara monitor restart" to deploy pulled Clara Monitor Server. $ clara pull console ✔ Yes Clara Management Console 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara-console Hint: use "clara console start" or "clara console restart" to deploy pulled Clara Management Console. ``` 之後就可以試著啟動了: ```bash= $ clara dicom start Starting DICOM Adapter... NAME: clara-dicom-adapter $ clara render start NAME: clara-render-server $ clara monitor start NAME: clara-monitor-server $ clara console start NAME: clara-console ``` ## 驗證安裝 如果一切順利的話,跑完上面算是安裝完成了,你可以試著下 `hlem ls` 指令來觀察目前所啟動的 charts: ```bash= $ helm ls NAME CHART clara clara-0.7.1-2008.1 clara-console clara-console-0.7.1-2008.1 clara-dicom-adapter dicom-adapter-0.7.1-2008.1 clara-monitor-server clara-monitor-server-0.7.1-2008.1 clara-render-server clara-renderer-0.7.1-2008.1 ``` <br> 或是下 `kubectl get pods` 應該會看到下面這些 Pods: * clara-clara-platformapiserver- * clara-dicom-adapter- * clara-monitor-server-fluentd-elasticsearch- * clara-monitor-server-grafana- * clara-monitor-server-monitor-server- * clara-render-server-clara-renderer- * clara-resultsservice- * clara-ui- * clara-console- * clara-console-mongodb- * clara-workflow-controller- * elasticsearch-master-0 * elasticsearch-master-1 ## 觀察 Pod 的變化 提到啟動的 Pod,有點好奇在每個 Chart 啟動時,會啟動的 Pod 有哪些。所以把整個 Clara 卸掉,重新安裝一次並觀察 Pod 的變化。 1. **安裝前** ```bash= $ kubectl get all NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4d23h $ kubectl get pods No resources found. ``` <br> 2. **clara platform start** ```bash= $ kubectl get all NAME READY STATUS RESTARTS AGE pod/clara-clara-platformapiserver-54c5c44bbd-9b97b 1/1 Running 0 95s pod/clara-resultsservice-664477898f-zl8cr 1/1 Running 0 95s pod/clara-ui-6f89b97df8-fn2zm 1/1 Running 0 95s pod/clara-workflow-controller-69cbb55fc8-t67ns 1/1 Running 0 95s pod/fluentd-7n2b8 1/1 Running 0 95s pod/fluentd-ccnzw 1/1 Running 0 95s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/clara NodePort 10.103.37.52 <none> 50051:31536/TCP 95s service/clara-resultsservice ClusterIP 10.108.91.220 <none> 8088/TCP 95s service/clara-ui ClusterIP 10.97.148.11 <none> 80/TCP 95s service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9m52s NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/fluentd 2 2 2 2 2 <none> 95s NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/clara-clara-platformapiserver 1/1 1 1 95s deployment.apps/clara-resultsservice 1/1 1 1 95s deployment.apps/clara-ui 1/1 1 1 95s deployment.apps/clara-workflow-controller 1/1 1 1 95s NAME DESIRED CURRENT READY AGE replicaset.apps/clara-clara-platformapiserver-54c5c44bbd 1 1 1 95s replicaset.apps/clara-resultsservice-664477898f 1 1 1 95s replicaset.apps/clara-ui-6f89b97df8 1 1 1 95s replicaset.apps/clara-workflow-controller-69cbb55fc8 1 1 1 95s ``` <br> 3. **clara dicom start** ```bash= $ kubectl get all NAME READY STATUS RESTARTS AGE pod/clara-clara-platformapiserver-54c5c44bbd-9b97b 1/1 Running 0 2m44s pod/clara-dicom-adapter-7948fcd445-rtbqr 1/1 Running 0 33s pod/clara-resultsservice-664477898f-zl8cr 1/1 Running 0 2m44s pod/clara-ui-6f89b97df8-fn2zm 1/1 Running 0 2m44s pod/clara-workflow-controller-69cbb55fc8-t67ns 1/1 Running 0 2m44s pod/fluentd-7n2b8 1/1 Running 0 2m44s pod/fluentd-ccnzw 1/1 Running 0 2m44s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) $ GE service/clara NodePort 10.103.37.52 <none> 50051:31536/TCP $ m44s service/clara-dicom-adapter NodePort 10.105.101.54 <none> 104:31289/TCP,5000:31880/TCP $ 3s service/clara-resultsservice ClusterIP 10.108.91.220 <none> 8088/TCP $ m44s service/clara-ui ClusterIP 10.97.148.11 <none> 80/TCP $ m44s service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP $ 1m NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/fluentd 2 2 2 2 2 <none> 2m44s NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/clara-clara-platformapiserver 1/1 1 1 2m44s deployment.apps/clara-dicom-adapter 1/1 1 1 33s deployment.apps/clara-resultsservice 1/1 1 1 2m44s deployment.apps/clara-ui 1/1 1 1 2m44s deployment.apps/clara-workflow-controller 1/1 1 1 2m44s NAME DESIRED CURRENT READY AGE replicaset.apps/clara-clara-platformapiserver-54c5c44bbd 1 1 1 2m44s replicaset.apps/clara-dicom-adapter-7948fcd445 1 1 1 33s replicaset.apps/clara-resultsservice-664477898f 1 1 1 2m44s replicaset.apps/clara-ui-6f89b97df8 1 1 1 2m44s replicaset.apps/clara-workflow-controller-69cbb55fc8 1 1 1 2m44s ``` <br> 4. **clara render start** ```bash= $ kubectl get all kubectl get all NAME READY STATUS RESTARTS AGE pod/clara-clara-platformapiserver-54c5c44bbd-gfwng 1/1 Running 0 24m pod/clara-dicom-adapter-7948fcd445-mv248 1/1 Running 0 20m pod/clara-render-server-clara-renderer-d79dd4779-f5hgd 2/3 CrashLoopBackOff 7 11m pod/clara-resultsservice-664477898f-2vsw9 1/1 Running 0 24m pod/clara-ui-6f89b97df8-c5p2f 1/1 Running 0 24m pod/clara-workflow-controller-69cbb55fc8-mc682 1/1 Running 0 24m pod/fluentd-ntl6q 1/1 Running 0 24m pod/fluentd-tvnrl 1/1 Running 0 24m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/clara NodePort 10.101.135.71 <none> 50051:32455/TCP 24m service/clara-dicom-adapter NodePort 10.100.25.126 <none> 104:31985/TCP,500 0:30647/TCP 20m service/clara-renderer-clara-render-server NodePort 10.108.60.232 <none> 8070:30105/TCP,80 60:32006/TCP 11m service/clara-resultsservice ClusterIP 10.109.206.204 <none> 8088/TCP 24m service/clara-ui ClusterIP 10.101.195.91 <none> 80/TCP 24m service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 27m NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/fluentd 2 2 2 2 2 <none> 24m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/clara-clara-platformapiserver 1/1 1 1 24m deployment.apps/clara-dicom-adapter 1/1 1 1 20m deployment.apps/clara-render-server-clara-renderer 0/1 1 0 11m deployment.apps/clara-resultsservice 1/1 1 1 24m deployment.apps/clara-ui 1/1 1 1 24m deployment.apps/clara-workflow-controller 1/1 1 1 24m NAME DESIRED CURRENT READY AGE replicaset.apps/clara-clara-platformapiserver-54c5c44bbd 1 1 1 24m replicaset.apps/clara-dicom-adapter-7948fcd445 1 1 1 20m replicaset.apps/clara-render-server-clara-renderer-d79dd4779 1 1 0 11m replicaset.apps/clara-resultsservice-664477898f 1 1 1 24m replicaset.apps/clara-ui-6f89b97df8 1 1 1 24m replicaset.apps/clara-workflow-controller-69cbb55fc8 1 1 1 24m ``` <br> 5. **clara monitor start** ```bash= $ kubectl get all NAME READY STATUS RESTARTS AGE pod/clara-clara-platformapiserver-54c5c44bbd-gfwng 1/1 Running 0 40m pod/clara-dicom-adapter-7948fcd445-mv248 1/1 Running 0 36m pod/clara-monitor-server-fluentd-elasticsearch-dl7bj 1/1 Running 0 14m pod/clara-monitor-server-fluentd-elasticsearch-jxdk6 1/1 Running 0 14m pod/clara-monitor-server-grafana-5f874b974d-qvxgn 1/1 Running 0 14m pod/clara-monitor-server-monitor-server-59c8bf68f7-5rcg7 0/1 CrashLoopBackOff 7 14m pod/clara-render-server-clara-renderer-d79dd4779-f5hgd 2/3 CrashLoopBackOff 10 27m pod/clara-resultsservice-664477898f-2vsw9 1/1 Running 0 40m pod/clara-ui-6f89b97df8-c5p2f 1/1 Running 0 40m pod/clara-workflow-controller-69cbb55fc8-mc682 1/1 Running 0 40m pod/elasticsearch-master-0 1/1 Running 0 14m pod/elasticsearch-master-1 1/1 Running 0 14m pod/fluentd-ntl6q 1/1 Running 0 40m pod/fluentd-tvnrl 1/1 Running 0 40m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/clara NodePort 10.101.135.71 <none> 50051:32455/TCP 40m service/clara-dicom-adapter NodePort 10.100.25.126 <none> 104:31985/TCP,500 0:30647/TCP 36m service/clara-monitor-server ClusterIP 10.111.167.160 <none> 50051/TCP 14m service/clara-monitor-server-grafana NodePort 10.100.148.116 <none> 80:32000[16/1632] 14m service/clara-renderer-clara-render-server NodePort 10.108.60.232 <none> 8070:30105/TCP,80 60:32006/TCP 27m service/clara-resultsservice ClusterIP 10.109.206.204 <none> 8088/TCP 40m service/clara-ui ClusterIP 10.101.195.91 <none> 80/TCP 40m service/elasticsearch-master ClusterIP 10.108.240.18 <none> 9200/TCP,9300/TCP 14m service/elasticsearch-master-headless ClusterIP None <none> 9200/TCP,9300/TCP 14m service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 43m NAME DESIRED CURRENT READY UP-TO-DATE AVAI LABLE NODE SELECTOR AGE daemonset.apps/clara-monitor-server-fluentd-elasticsearch 2 2 2 2 2 <none> 14m daemonset.apps/fluentd 2 2 2 2 2 <none> 40m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/clara-clara-platformapiserver 1/1 1 1 40m deployment.apps/clara-dicom-adapter 1/1 1 1 36m deployment.apps/clara-monitor-server-grafana 1/1 1 1 14m deployment.apps/clara-monitor-server-monitor-server 0/1 1 0 14m deployment.apps/clara-render-server-clara-renderer 0/1 1 0 27m deployment.apps/clara-resultsservice 1/1 1 1 40m deployment.apps/clara-ui 1/1 1 1 40m deployment.apps/clara-workflow-controller 1/1 1 1 40m NAME DESIRED CURRENT READY AGE replicaset.apps/clara-clara-platformapiserver-54c5c44bbd 1 1 1 40m replicaset.apps/clara-dicom-adapter-7948fcd445 1 1 1 36m replicaset.apps/clara-monitor-server-grafana-5f874b974d 1 1 1 14m replicaset.apps/clara-monitor-server-monitor-server-59c8bf68f7 1 1 0 14m replicaset.apps/clara-render-server-clara-renderer-d79dd4779 1 1 0 27m replicaset.apps/clara-resultsservice-664477898f 1 1 1 40m replicaset.apps/clara-ui-6f89b97df8 1 1 1 40m replicaset.apps/clara-workflow-controller-69cbb55fc8 1 1 1 40m NAME READY AGE statefulset.apps/elasticsearch-master 2/2 14m ``` <br> 6. **clara console start** ```bash= $ kubectl get all NAME READY STATUS RESTARTS AGE pod/clara-clara-platformapiserver-54c5c44bbd-gfwng 1/1 Running 0 61m pod/clara-console-8565b4d565-77jhc 2/2 Running 0 19m pod/clara-console-mongodb-85f8bd5f95-8nwqx 1/1 Running 0 19m pod/clara-dicom-adapter-7948fcd445-mv248 1/1 Running 0 58m pod/clara-monitor-server-fluentd-elasticsearch-dl7bj 1/1 Running 0 36m pod/clara-monitor-server-fluentd-elasticsearch-jxdk6 1/1 Running 0 36m pod/clara-monitor-server-grafana-5f874b974d-qvxgn 1/1 Running 0 36m pod/clara-monitor-server-monitor-server-59c8bf68f7-5rcg7 0/1 CrashLoopBackOff 11 36m pod/clara-render-server-clara-renderer-d79dd4779-f5hgd 2/3 CrashLoopBackOff 14 48m pod/clara-resultsservice-664477898f-2vsw9 1/1 Running 0 61m pod/clara-ui-6f89b97df8-c5p2f 1/1 Running 0 61m pod/clara-workflow-controller-69cbb55fc8-mc682 1/1 Running 0 61m pod/elasticsearch-master-0 1/1 Running 0 36m pod/elasticsearch-master-1 1/1 Running 0 36m pod/fluentd-ntl6q 1/1 Running 0 61m pod/fluentd-tvnrl 1/1 Running 0 61m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/clara NodePort 10.101.135.71 <none> 50051:32455/TCP 61m service/clara-console NodePort 10.99.119.217 <none> 8080:32002/TCP,50 00:32003/TCP 19m service/clara-console-mongodb ClusterIP 10.102.177.195 <none> 27017/TCP 19m service/clara-dicom-adapter NodePort 10.100.25.126 <none> 104:31985/TCP,500 0:30647/TCP 58m service/clara-monitor-server ClusterIP 10.111.167.160 <none> 50051/TCP 36m service/clara-monitor-server-grafana NodePort 10.100.148.116 <none> 80:32000/TCP 36m service/clara-renderer-clara-render-server NodePort 10.108.60.232 <none> 8070:30105/TCP,80 60:32006/TCP 48m service/clara-resultsservice ClusterIP 10.109.206.204 <none> 8088/TCP 61m service/clara-ui ClusterIP 10.101.195.91 <none> 80/TCP 61m service/elasticsearch-master ClusterIP 10.108.240.18 <none> 9200/TCP,9300/TCP 36m service/elasticsearch-master-headless ClusterIP None <none> 9200/TCP,9300/TCP 36m service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 65m NAME DESIRED CURRENT READY UP-TO-DATE AVAI LABLE NODE SELECTOR AGE daemonset.apps/clara-monitor-server-fluentd-elasticsearch 2 2 2 2 2 <none> 36m daemonset.apps/fluentd 2 2 2 2 2 <none> 61m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/clara-clara-platformapiserver 1/1 1 1 61m deployment.apps/clara-console 1/1 1 1 19m deployment.apps/clara-console-mongodb 1/1 1 1 19m deployment.apps/clara-dicom-adapter 1/1 1 1 58m deployment.apps/clara-monitor-server-grafana 1/1 1 1 36m deployment.apps/clara-monitor-server-monitor-server 0/1 1 0 36m deployment.apps/clara-render-server-clara-renderer 0/1 1 0 48m deployment.apps/clara-resultsservice 1/1 1 1 61m deployment.apps/clara-ui 1/1 1 1 61m deployment.apps/clara-workflow-controller 1/1 1 1 61m NAME DESIRED CURRENT READY AGE replicaset.apps/clara-clara-platformapiserver-54c5c44bbd 1 1 1 61m replicaset.apps/clara-console-8565b4d565 1 1 1 19m replicaset.apps/clara-console-mongodb-85f8bd5f95 1 1 1 19m replicaset.apps/clara-dicom-adapter-7948fcd445 1 1 1 58m replicaset.apps/clara-monitor-server-grafana-5f874b974d 1 1 1 36m replicaset.apps/clara-monitor-server-monitor-server-59c8bf68f7 1 1 0 36m replicaset.apps/clara-render-server-clara-renderer-d79dd4779 1 1 0 48m replicaset.apps/clara-resultsservice-664477898f 1 1 1 61m replicaset.apps/clara-ui-6f89b97df8 1 1 1 61m replicaset.apps/clara-workflow-controller-69cbb55fc8 1 1 1 61m NAME READY AGE statefulset.apps/elasticsearch-master 2/2 36m ``` <br> 當然,如果像我這個白目的安裝就沒有那順利了... ## 錯誤嘗試:部署 Clara Platform 與 啟動 Helm Chart 在環境要求的部份,對我來說比較麻煩的是 Kubernetes 與 Helm 的版號,因為我的伺服器環境是與組員共用,所以一開始我決定保留同事需要的環境來硬幹,試試能不能安裝成功,如果真的不行再來嘗試退版安裝。 所以這段如果只是要完成 Deploy SDK 安裝的可以跳過,這邊只是因為我的一時興起所產生的錯誤記錄而已,當然如果想看我怎麼焦頭爛額的可以繼續往下拉。 恩...我先跟大家說最後的嘗試結果好了,我最後還是退版了。不過我有將過程的一些錯誤記錄保留下來,看看之後還有沒機會回來再看,絕對不是因為單純湊數字 XDDD <br> 這邊接續安裝步驟-**配置 NGC 憑證**,在完成 platform chart 的下載後,試著啟動 platform,得到了第一條錯誤訊息: ```shell= $ clara platform start Error: could not find tiller Usage: platform start [flags] Flags: -h, --help help for start Global Flags: --config string config file (default is $HOME/.clara/config.yaml) --verbose verbose output Error: could not find tiller ``` <br> 在 [Stack Overflow](https://stackoverflow.com/questions/51646957/helm-could-not-find-tiller) 上看到了一條類似錯誤訊息的提問,似乎重新初始化 Helm 即可: ```shell= $ helm init Error: unknown command "init" for "helm" Did you mean this? lint Run 'helm --help' for usage. ``` <br> 結果沒有 `helm init`! 查詢了一下 Helm 所找不到的 Tiller 到底是啥,根據 [smalltown](https://medium.com/starbugs/helm-3-%E8%B8%B9%E8%B8%B9%E7%9C%8B-9e7c443fbd7a) 所說,在 Helm2 中,Tiller 是用來安裝與管理其他應用服務的 K8S 元件,簡單來說 Tiller 是一個用來與 K8S API Server 溝通的 **Service**,不過由於權限設置與管理的問題,在 Helm3 的推出後就走向歷史了。 很不幸的,我的 Helm 是 v3 的版本: ```shell= $ helm version version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"$ lean", GoVersion:"go1.13.8"} ``` <br> Helm2 與 Helm3 的變動已經屬於系統架構的變動,這個實在不好改。經過調查與[論壇](https://forums.developer.nvidia.com/t/error-could-not-find-tiller/157960)上發問,最後只好將 Helm 降版。我是透過 [ Binary Releases 安裝](https://helm.sh/docs/intro/install/#helm)的方式,將版本降回到 [v2.15.2](https://github.com/helm/helm/releases/tag/v2.15.2)。 降版後再次檢查 Helm 的版號,版號是正確了,但錯誤訊息依舊沒有消失: ```shell= $ helm version Client: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTreeS tate:"clean"} Error: could not find tiller ``` <br> 不過版本都降了,`helm init` 指令應該可以使用了: ```shell= $ helm init $HELM_HOME has been configured at /home/.helm. Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster. Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy. To prevent this, run `helm init` with the --tiller-tls-verify flag. For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-h$ lm-installation ``` <br> 再次檢查 Helm 的版號,可以發現多出了 Server,用 kubectl 查看正在運行的 Pod,可以看到 Tiller 正在努力工作: ```shell= $ helm version Client: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTree$ tate:"clean"} Server: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTree$ tate:"clean"} $ kubectl get pods --namespace kube-system NAME READY STATUS RESTARTS AGE coredns-6955765f44-bl6jd 1/1 Running 0 18h coredns-6955765f44-mtlxv 1/1 Running 0 18h etcd-esccluster-control-plane 1/1 Running 0 18h kindnet-dxlgj 1/1 Running 0 18h kindnet-hpvkw 1/1 Running 0 18h kindnet-qb5lm 1/1 Running 0 18h kube-apiserver-esccluster-control-plane 1/1 Running 0 18h kube-controller-manager-esccluster-control-plane 1/1 Running 0 18h kube-proxy-cvpmt 1/1 Running 0 18h kube-proxy-nspv9 1/1 Running 0 18h kube-proxy-trkh5 1/1 Running 0 18h kube-scheduler-esccluster-control-plane 1/1 Running 0 18h tiller-deploy-58f57c5787-bsfkh 1/1 Running 0 14m ``` <br> 好了,排除 Tiller 的錯誤訊息後,重新 platform 的 Chart 後再重新 start 一次,看看會不會成功。 ```shell= $ clara pull platform ✔ Yes Clara Platform 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara Hint: use "clara platform start" or "clara platform restart" to deploy pulled Clara Platform. $ clara platform start Starting clara... RPC error: code = Unknown desc = namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "default" ``` <br> 呃,又出現 rpc 的 error。看到錯誤訊息,幾個可能的猜測: 1. **Server 安裝配置的問題**:會考慮這個是因為我的 Tiller 後來降版後我自己重起的。 2. **權限問題**:這個的機會比較大,在查資料的時候,幾乎碰到的是這個狀況。 <br> 算了先試試看 [Helm2 文件](https://v2.helm.sh/docs/install/#running-tiller-locally)中說的本地運行分 Tiller 試試看: ```shell= $ ~/bin/tiller [main] 2020/10/27 10:53:03 Starting Tiller v2.15.2 (tls=false) [main] 2020/10/27 10:53:03 GRPC listening on :44134 [main] 2020/10/27 10:53:03 Probes listening on :44135 [main] 2020/10/27 10:53:03 Storage driver is ConfigMap [main] 2020/10/27 10:53:03 Max history per release is 0 ``` 但文件中第二步驟連接到新的本地 Tiller 主機,看起來怪怪的,所還是沒做了, <br> 直接放棄第一條路,先是試著處理權限問題好了,根據 Helm2 的 [Role-based Access Control](https://v2.helm.sh/docs/install/#running-tiller-locally) 說明與 [GitHub](https://github.com/fnproject/fn-helm/issues/21) 上的大神討論重新設定了連接,並 start platform: ```bash= $clara platform start Starting clara... NAME: clara Note: If there is a running instance of Clara Console, Clara Dicom Adapter or Clara Renderer, they should be restarted. ``` <br><br> 喔耶! platform 起動後,就跟前面一樣來下載 Clara Deploy Services 的 Helm Charts: ```bash= $ clara pull dicom ✔ Yes Clara Dicom Adapter 0.7.1-2008.1 Chart saved at: /home/.clara/charts/dicom-adapter Hint: use "clara dicom start" or "clara dicom restart" to deploy pulled Clara Dicom Adapter. $ clara pull render ✔ Yes Clara Renderer 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara-renderer Hint: use "clara render start" or "clara render restart" to deploy pulled Clara Renderer. $ clara pull monitor ✔ Yes Clara Monitor Server 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara-monitor-server Hint: use "clara monitor start" or "clara monitor restart" to deploy pulled Clara Monitor Server. $ clara pull console ✔ Yes Clara Management Console 0.7.1-2008.1 Chart saved at: /home/.clara/charts/clara-console Hint: use "clara console start" or "clara console restart" to deploy pulled Clara Management Console. ``` 最後是心驚膽戰的最後一步: ```bash= $ clara dicom start Starting DICOM Adapter... NAME: clara-dicom-adapter $ clara render start NAME: clara-render-server $ clara monitor start Error: rpc error: code = Unknown desc = validation failed: [unable to recognize "": no matches for kind "PodSecurityPolicy" in version "extensions/v1beta1", unable to recognize "": no matches for kind "Deployment" in version "apps/v1beta2", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta1"] $ clara console start NAME: clara-console ``` <br> 就知道沒這麼好過年的,是有看到有個 [Issue](https://github.com/helm/helm/issues/6374) 在討論這問題的,不過必須承認的是,這個討論超過我這個初學者對於 K8S 的掌握了,我才剛入門沒幾天阿(崩潰 問了論壇的人得到的[回覆](https://forums.developer.nvidia.com/t/not-responding-when-running-clara-render-start/158049)還是要我降版 K8S,所以最終只能鼻子摸摸開始降版了: ```bash= $ sudo apt remove kubectl kubeadm kubelet kubernetes-cni $ rm -rf $HOME/.kube/config $ sudo apt-get install -y kubelet=1.15.4-00 kubectl=1.15.4-00 kubeadm=1.15.4-00 $ kubectl version --short Client Version: v1.15.4 Server Version: v1.15.6 ``` 重新啟動剛剛失敗 monitor Chart: ```bash= $ clara monitor start NAME: clara-monitor-server ``` ## 參考資料 1. [Clara Deploy Platform](https://ngc.nvidia.com/catalog/collections/nvidia:claradeployplatform)。檢自 NVIDIA NGC (2021-02-02)。 2. NVIDIA Taiwan (2020-11-02)。[NVIDIA Clara Deploy](https://www.youtube.com/watch?v=vYuOvyJOHXk)。檢自 Youtube (2021-02-02)。 3. [2. Installation](https://docs.nvidia.com/clara/deploy/ClaraInstallation.html)。檢自 Clara Deploy SDK https://hackmd.io/0.7.3 documentation (2021-02-02)。 4. Community (2019-05-18)。[openshift - Helm: could not find tiller](https://stackoverflow.com/questions/51646957/helm-could-not-find-tiller)。檢自 Stack Overflow (2021-02-02)。 6. smalltown (2020-05-17)。[Helm 3 踹踹看](https://medium.com/starbugs/helm-3-%E8%B8%B9%E8%B8%B9%E7%9C%8B-9e7c443fbd7a)。檢自 Starbugs Weekly 星巴哥技術專欄|Medium (2021-02-02)。 7. godleon (2021-01-24)。[[Kubernetes] Package Manager - Helm 簡介](https://godleon.github.io/blog/Kubernetes/k8s-Helm-Introduction/)。檢自 小信豬的原始部落 (2021-02-02)。 8. postak (2018-04-10)。[forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default](https://github.com/fnproject/fn-helm/issues/21)。檢自 fnproject/fn-helm|GitHub (2021-02-02)。 9. noprom (2017-11-12)。[User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default"](https://github.com/helm/helm/issues/3130)。檢自 helm/helm|GitHub (2021-02-02)。 10. [Helm v2](https://v2.helm.sh/docs/install/#running-tiller-locally)。檢自 Helm 官網 (2021-02-02)。 11. Nick (2019-10-12)。[[Day27] k8s應用篇(一):Helm部署apps、HPA和CA](https://ithelp.ithome.com.tw/articles/10227329)。檢自 iT 邦幫忙 (2021-02-02)。 12. Terrones-Oscar (2020-08-13)。[helm fails Error: validation failed: [unable to recognize "": no matches for kind "PodSecurityPolicy"]](https://github.com/helm/charts/issues/23521)。檢自 helm/charts|GitHub (2021-02-02)。 13. jckasper (2019-09-06)。[Helm init fails on Kubernetes 1.16.0](https://github.com/helm/helm/issues/6374)。檢自 helm/helm|GitHub (2021-02-02)。 14. Zz Chen (2018-07-03)。[Helm 部署在 GKE 上的權限問題](https://medium.com/smalltowntechblog/helm-tiller-%E9%83%A8%E7%BD%B2%E5%9C%A8-gke-%E4%B8%8A%E7%9A%84%E6%AC%8A%E9%99%90%E5%95%8F%E9%A1%8C-a016f703372e)。檢自 smalltowntechblog|Medium (2021-02-02)。 15. MengYun (2019-10-27)。[Not responding when running “clara render start”](https://forums.developer.nvidia.com/t/not-responding-when-running-clara-render-start/158049)。檢自 NVIDIA Developer Forums (2021-02-02)。 ## 更新紀錄 :::spoiler 最後更新日期:2021-03-07 - 2021-03-07 發布 - 2021-02-03 完稿 - 2020-11-09 起稿 ::: {%hackmd @CynthiaChuang/Github-Page-Footer %}