# Jetson Nano: install K3s and check GPU access

## 1. System parameters and RAM (swap) adjustment

```shell!
# Resize swap memory to 8 GB (takes effect after a reboot)
nvidia@tegra-ubuntu:~$ git clone https://github.com/JetsonHacksNano/resizeSwapMemory
nvidia@tegra-ubuntu:~$ cd resizeSwapMemory/
nvidia@tegra-ubuntu:~/resizeSwapMemory$ sudo ./setSwapMemorySize.sh -g 8
Please reboot for changes to take effect.
nvidia@tegra-ubuntu:~/resizeSwapMemory$ cd
nvidia@tegra-ubuntu:~$
# Disable IPv6 (sysctl -w changes do not persist across reboots)
nvidia@tegra-ubuntu:~$ sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.all.disable_ipv6 = 1
nvidia@tegra-ubuntu:~$ sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6 = 1
nvidia@tegra-ubuntu:~$ sudo sysctl -w net.ipv6.conf.lo.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6 = 1
# Switch to power mode 0 (MAXN, the maximum-performance mode on the Nano)
nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 0
```

## 2. Install nvidia-container-toolkit

**Install the required packages**

```shell!
nvidia@tegra-ubuntu:~$ sudo apt update
nvidia@tegra-ubuntu:~$ sudo apt install curl -y
# Add NVIDIA's signing key and the libnvidia-container apt repository
nvidia@tegra-ubuntu:~$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /
# Optional: also enable the experimental repository
nvidia@tegra-ubuntu:~$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
nvidia@tegra-ubuntu:~$ sudo apt-get update
nvidia@tegra-ubuntu:~$ sudo apt-get install -y nvidia-container-toolkit
# This edits the standalone containerd config; K3s keeps its own config and
# picks up nvidia-container-runtime automatically when it starts (checked in section 3)
nvidia@tegra-ubuntu:~$ sudo nvidia-ctk runtime configure --runtime=containerd
INFO[0000] Config file does not exist; using empty config
INFO[0000] Wrote updated config to /etc/containerd/config.toml
INFO[0000] It is recommended that containerd daemon be restarted.
nvidia@tegra-ubuntu:~$ sudo systemctl daemon-reload
nvidia@tegra-ubuntu:~$ sudo reboot
```

**List the installed NVIDIA-related packages**

```shell!
nvidia@tegra-ubuntu:~$ sudo dpkg --get-selections | grep nvidia
[sudo] password for nvidia:
libnvidia-container-tools           install
libnvidia-container1:arm64          install
nvidia-container-toolkit            install
nvidia-container-toolkit-base       install
nvidia-l4t-3d-core                  install
nvidia-l4t-apt-source               install
nvidia-l4t-bootloader               install
nvidia-l4t-camera                   install
nvidia-l4t-configs                  install
nvidia-l4t-core                     install
nvidia-l4t-cuda                     install
nvidia-l4t-display-kernel           install
nvidia-l4t-firmware                 install
nvidia-l4t-gbm                      install
nvidia-l4t-graphics-demos           install
nvidia-l4t-gstreamer                install
nvidia-l4t-init                     install
nvidia-l4t-initrd                   install
nvidia-l4t-jetson-io                install
nvidia-l4t-jetsonpower-gui-tools    install
nvidia-l4t-kernel                   install
nvidia-l4t-kernel-dtbs              install
nvidia-l4t-kernel-headers           install
nvidia-l4t-libvulkan                install
nvidia-l4t-multimedia               install
nvidia-l4t-multimedia-utils         install
nvidia-l4t-nvfancontrol             install
nvidia-l4t-nvpmodel                 install
nvidia-l4t-nvpmodel-gui-tools       install
nvidia-l4t-nvsci                    install
nvidia-l4t-oem-config               install
nvidia-l4t-openwfd                  install
nvidia-l4t-optee                    install
nvidia-l4t-pva                      install
nvidia-l4t-tools                    install
nvidia-l4t-vulkan-sc                install
nvidia-l4t-vulkan-sc-dev            install
nvidia-l4t-vulkan-sc-samples        install
nvidia-l4t-vulkan-sc-sdk            install
nvidia-l4t-wayland                  install
nvidia-l4t-weston                   install
nvidia-l4t-x11                      install
nvidia-l4t-xusb-firmware            install
```

## 3. Install K3s

```shell!
nvidia@tegra-ubuntu:~$ export INSTALL_K3S_VERSION=v1.28.10+k3s1
nvidia@tegra-ubuntu:~$ curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --write-kubeconfig $HOME/.kube/config
[INFO]  Using v1.28.10+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.28.10+k3s1/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.28.10+k3s1/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
```

**Check that the K3s containerd configuration includes the NVIDIA container runtime**

```shell!
nvidia@tegra-ubuntu:~$ sudo grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
```

**Check that the node and system pods are up and running**

```shell!
nvidia@tegra-ubuntu:~$ kubectl get no
NAME           STATUS   ROLES                  AGE   VERSION
tegra-ubuntu   Ready    control-plane,master   76s   v1.28.10+k3s1
nvidia@tegra-ubuntu:~$ kubectl get po -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-6799fbcd5-82v8q                   1/1     Running     0          63s
kube-system   local-path-provisioner-6c86858495-698f5   1/1     Running     0          63s
kube-system   helm-install-traefik-crd-twgs7            0/1     Completed   0          64s
kube-system   helm-install-traefik-rd98d                0/1     Completed   1          64s
kube-system   svclb-traefik-d1387147-jxfll              2/2     Running     0          41s
kube-system   traefik-7d5f6474df-4trqj                  1/1     Running     0          41s
kube-system   metrics-server-54fd9b65b-8dwhf            1/1     Running     0          63s
```

## 4. Check that the GPU is visible

**devcheck.yaml**

```yaml!
apiVersion: batch/v1
kind: Job
metadata:
  name: devicequery
spec:
  template:
    metadata:
    spec:
      runtimeClassName: nvidia   # run the container with the NVIDIA runtime
      containers:
      - command:
        - ./deviceQuery
        image: xift/jetson_devicequery:r32.5.0
        name: devicequery
        resources: {}
      restartPolicy: Never
```

```shell!
nvidia@tegra-ubuntu:~$ kubectl create -f devcheck.yaml
job.batch/devicequery created
```

![Screenshot 2024-05-31 12.26.34](https://hackmd.io/_uploads/rJGF8C8ER.png)
![Screenshot 2024-05-31 12.26.49](https://hackmd.io/_uploads/ryRY8AIVA.png)

## 5. Confirm the GPU from code

**tf.yaml**

```yaml!
apiVersion: v1
kind: Pod
metadata:
  name: tf
spec:
  runtimeClassName: nvidia
  containers:
  - name: nvidia
    image: xift/l4t-tensorflow:r32.5.0-tf2.3.1-py3
    command: [ "sleep" ]   # keep the pod alive for one day so we can exec into it
    args: [ "1d" ]
```

```shell!
nvidia@tegra-ubuntu:~$ kubectl create -f tf.yaml
pod/tf created
nvidia@tegra-ubuntu:~$ kubectl exec -it tf -- bash
root@tf:/# python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices());"
...
...
...
root@tf:/# exit
exit
```

![Screenshot 2024-05-31 12.22.13](https://hackmd.io/_uploads/rkUDLCINC.png)

## Run a sample (nbody)

## Other

### 1. Important!!

If the Jetson keeps shutting itself down every so often or its network becomes unreliable, there is a good chance it is overheating. Just run the fan at full speed:

```shell!
# jetson_clocks also pins the CPU/GPU/memory clocks to their maximum;
# --fan additionally sets the PWM fan to full speed
sudo jetson_clocks --fan
```
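Before forcing the fan to full speed, it can help to confirm that the board really is running hot. Below is a minimal bash sketch that reads the kernel's thermal zones. `check_temps` is a hypothetical helper name (not part of JetPack); it assumes the generic Linux sysfs layout, where each `/sys/class/thermal/thermal_zone*/type` holds a sensor name and `temp` holds a reading in millidegrees Celsius. The 80 °C default threshold is an arbitrary assumption, not an NVIDIA limit, and zone names differ between Jetson models and L4T releases.

```shell
# Hypothetical helper (not part of JetPack): print each thermal zone's
# temperature and flag zones at or above a threshold. Assumes the generic
# Linux sysfs layout <root>/thermal_zone*/{type,temp}, temp in millidegrees C.
check_temps() {
  local root="${1:-/sys/class/thermal}"  # thermal sysfs root (overridable for testing)
  local limit_c="${2:-80}"               # warning threshold in deg C (arbitrary default)
  local hot=0 zone name milli celsius
  for zone in "$root"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue      # skip zones without a readable temp file
    name=$(cat "$zone/type" 2>/dev/null || echo unknown)
    milli=$(cat "$zone/temp" 2>/dev/null) || continue
    case "$milli" in
      ''|*[!0-9-]*) continue ;;          # skip empty or non-numeric readings
    esac
    celsius=$((milli / 1000))            # millidegrees -> whole degrees (truncated)
    if [ "$celsius" -ge "$limit_c" ]; then
      echo "HOT  $name ${celsius}C"
      hot=1
    else
      echo "ok   $name ${celsius}C"
    fi
  done
  return "$hot"                          # 1 if any zone reached the threshold, else 0
}
```

Run `check_temps` on the Jetson, or `check_temps /sys/class/thermal 70` for a lower threshold. Because it returns non-zero when any zone is at or above the limit, it can gate the fan command, e.g. `check_temps || sudo jetson_clocks --fan`.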