enroot
===
###### tags: `NVIDIA`, `enroot`, `Pyxis`

<br>

[TOC]

<br>

## enroot
> https://github.com/NVIDIA/enroot

### Terms
- [unprivileged (i.e. rootless)](https://slurm.schedmd.com/containers.html#limitations)

<br>

### Summary
Enroot is a lightweight, unprivileged sandbox tool, similar to an enhanced chroot. It keeps filesystem isolation, suits environments that prioritize performance and portability, and supports Docker images.

<br>

### Description
A simple yet powerful tool that turns traditional container or operating system images into **unprivileged** sandboxes.

Enroot can be thought of as an enhanced unprivileged `chroot(1)`. It uses the same underlying technologies as containers but removes much of the isolation they inherently provide, while preserving filesystem separation.

This approach is usually a better fit for high-performance or virtualized environments where portability and reproducibility matter but extra isolation is not needed.

Enroot is also similar to other tools such as `proot(1)` or `fakeroot(1)`, but it relies on more recent Linux kernel features (e.g. user and mount namespaces) and can import common container image formats (e.g. Docker images).

<br>

### Usage
CPU example:

- ### ubuntu
  ```
  # Import and start an Ubuntu image from DockerHub
  $ enroot import docker://ubuntu
  $ enroot create ubuntu.sqsh
  $ enroot start ubuntu
  ```

- ### alpine:latest
  ```
  # Import and start an Alpine image from DockerHub
  $ enroot import docker://alpine:latest
  $ enroot create alpine+latest.sqsh
  $ enroot start alpine+latest.sqsh
  ```

:::spoiler Execution log
```
root@gpu1080a-0:~# enroot import docker://alpine:latest
[INFO] Querying registry for permission grant
[INFO] Authenticating with user: <anonymous>
[INFO] Authentication succeeded
[INFO] Fetching image manifest list
[INFO] Fetching image manifest
[INFO] Found all layers in cache
[INFO] Extracting image layers...
100% 1:0=0s fe07684b16b82247c3539ed86a65ff37a76138ec25d380bd80c869a1a4c73236
[INFO] Converting whiteouts...
100% 1:0=0s fe07684b16b82247c3539ed86a65ff37a76138ec25d380bd80c869a1a4c73236
[INFO] Creating squashfs filesystem...
Parallel mksquashfs: Using 2 processors
Creating 4.0 filesystem on /root/alpine+latest.sqsh, block size 131072.
[=============================================================================================================================================================================|] 144/144 100%
Exportable Squashfs 4.0 filesystem, lzo compressed, data block size 131072
uncompressed data, compressed metadata, compressed fragments,
compressed xattrs, compressed ids
duplicates are removed
Filesystem size 7763.07 Kbytes (7.58 Mbytes)
95.34% of uncompressed filesystem size (8142.74 Kbytes)
Inode table size 4378 bytes (4.28 Kbytes)
22.79% of uncompressed inode table size (19206 bytes)
Directory table size 5764 bytes (5.63 Kbytes)
57.39% of uncompressed directory table size (10044 bytes)
Number of duplicate files found 9
Number of inodes 523
Number of files 90
Number of fragments 6
Number of symbolic links 335
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 98
Number of hard-links 0
Number of ids (unique uids + gids) 1
Number of uids 1
root (0)
Number of gids 1
root (0)
```
```
root@gpu1080a-0:~# ll
total 227432
drwx------ 1 root root 4096 Jun 27 03:55 ./
drwxr-xr-x 1 root root 4096 Jun 26 09:44 ../
-rw------- 1 root root 3894 Jun 27 03:48 .bash_history
-rw-r--r-- 1 root root 3106 Apr 22 2024 .bashrc
drwx------ 3 root root 4096 Jun 26 10:15 .cache/
drwx------ 3 root root 4096 Jun 26 10:15 .local/
drwxr-xr-x 3 root root 4096 Jun 26 10:15 .parallel/
-rw-r--r-- 1 root root 161 Apr 22 2024 .profile
-rw------- 1 root root 3379 Jun 27 03:54 .viminfo
-rw-r--r-- 1 root root 7950336 Jun 27 03:55 alpine+latest.sqsh
```
```
root@gpu1080a-0:~# enroot create alpine+latest.sqsh
[INFO] Extracting squashfs filesystem...
Parallel unsquashfs: Using 2 processors
425 inodes (144 blocks) to write
[=============================================================================================================================================================================|] 569/569 100%
created 90 files
created 98 directories
created 335 symlinks
created 0 devices
created 0 fifos
created 0 sockets
created 0 hardlinks
```
```
root@gpu1080a-0:~# enroot start alpine+latest.sqsh
/ #
```
:::

<br>

### GPU examples:
- ### `nvidia+cuda+12.4.0-base-ubuntu20.04`
  ```bash
  $ nvidia-smi  # Confirm that nvidia-smi works in the current container
  Fri Jun 27 05:03:30 2025
  +---------------------------------------------------------------------------------------+
  | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
  |-----------------------------------------+----------------------+----------------------+

  # Import and start a CUDA (Ubuntu 20.04 based) image from DockerHub
  $ enroot import docker://nvidia/cuda:12.4.0-base-ubuntu20.04
  $ enroot create nvidia+cuda+12.4.0-base-ubuntu20.04.sqsh
  $ enroot start nvidia+cuda+12.4.0-base-ubuntu20.04
  ```

  On the login pod, a single `srun` command can be used instead:
  ```
  $ srun -N1 --container-image=docker://nvidia/cuda:12.4.0-base-ubuntu20.04 bash
  ```

- ### troubleshooting
  - ### Test environment: ESC4000 (.241)
    ```
    nvidia-container-cli: mount error: failed to add device rules: write /sys/fs/cgroup/devices/slurm/uid_0/job_39/step_0/task_0/devices.allow: operation not permitted
    [ERROR] /etc/enroot/hooks.d/98-nvidia.sh exited with return code 1
    ```

<br>

<hr>

<br>

## Runtime configuration
> https://github.com/NVIDIA/enroot/blob/master/doc/configuration.md

| Setting | Default | Description |
| ------ | ------ | ------ |
| `ENROOT_LIBRARY_PATH` | `/usr/lib/enroot` | Path to library sources |
| `ENROOT_SYSCONF_PATH` | `/etc/enroot` | Path to system configuration files |
| `ENROOT_RUNTIME_PATH` | `${XDG_RUNTIME_DIR}/enroot` | Path to the runtime working directory |
| `ENROOT_CONFIG_PATH` | `${XDG_CONFIG_HOME}/enroot` | Path to user configuration files |
| `ENROOT_CACHE_PATH` | `${XDG_CACHE_HOME}/enroot` | Path to user image/credentials cache |
| `ENROOT_DATA_PATH` | `${XDG_DATA_HOME}/enroot` | Path to user container storage |
| `ENROOT_TEMP_PATH` | `${TMPDIR}` | Path to temporary directory |

- `/etc/enroot/enroot.conf`
  - Taipei-01
    ```ini /etc/enroot.conf
    # Working directory for enroot:
    ENROOT_RUNTIME_PATH /raid/local/containers/enroot/runtime/${SLURM_JOB_ID:-}

    # Directory where container layers are stored:
    # Enroot defaults are used for ENROOT_CACHE_PATH

    # Directory where the filesystems of running containers are stored:
    ENROOT_DATA_PATH /raid/local/containers/enroot/data/$(id -u)

    # Path to temporary directory
    #ENROOT_TEMP_PATH ${TMPDIR:-/tmp}

    # Options passed to mksquashfs to produce container images.
    ENROOT_SQUASH_OPTIONS -noI -noD -noF -noX -no-duplicates

    # Mount the current user's home directory by default.
    ENROOT_MOUNT_HOME no

    # Path to user configuration files
    ENROOT_CONFIG_PATH ${HOME}/.config/enroot

    # Restrict /dev inside the container to a minimal set of devices.
    ENROOT_RESTRICT_DEV yes

    # Make the container root filesystem writable by default.
    ENROOT_ROOTFS_WRITABLE no

    # Options passed to zstd to compress digest layers.
    ENROOT_ZSTD_OPTIONS -1

    # Number of times network operations should be retried.
    ENROOT_TRANSFER_RETRIES 5

    # Maximum time in seconds to wait for connections establishment (0 means unlimited).
    ENROOT_CONNECT_TIMEOUT 60

    # Maximum time in seconds to wait for network operations to complete (0 means unlimited).
    ENROOT_TRANSFER_TIMEOUT 1200

    # Maximum number of concurrent connections (0 means unlimited).
    ENROOT_MAX_CONNECTIONS 10

    # Path to library sources
    #ENROOT_LIBRARY_PATH /usr/lib/enroot

    # Path to system configuration file
    #ENROOT_SYSCONF_PATH /etc/enroot

    # Gzip program used to uncompress digest layers.
    #ENROOT_GZIP_PROGRAM gzip

    # Remap the current user to root inside containers by default.
    #ENROOT_REMAP_ROOT no

    # Maximum number of processors to use for parallel tasks (0 means unlimited).
    #ENROOT_MAX_PROCESSORS $(nproc)

    # Use a login shell to run the container initialization.
    #ENROOT_LOGIN_SHELL yes

    # Allow root to retain his superuser privileges inside containers.
    #ENROOT_ALLOW_SUPERUSER no

    # Use HTTP for outgoing requests instead of HTTPS (UNSECURE!).
    #ENROOT_ALLOW_HTTP no

    # Include user-specific configuration inside bundles by default.
    #ENROOT_BUNDLE_ALL no

    # Generate an embedded checksum inside bundles by default.
    #ENROOT_BUNDLE_CHECKSUM no

    # Always use --force on command invocations.
    #ENROOT_FORCE_OVERRIDE no

    # SSL certificates settings
    #SSL_CERT_DIR
    #SSL_CERT_FILE

    # Proxy settings
    #all_proxy
    #no_proxy
    #http_proxy
    ```

<br>

<hr>

<br>

## Requirements
> https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md

- ### Features required to support Enroot (e.g. cgroup support, BPF, namespaces)
- ### Checking Enroot's prerequisites
  Running the enroot-check bundle for a given release automatically checks whether the prerequisites are met.
  ```
  $ curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
  $ chmod +x enroot-check_*.run

  $ ./enroot-check_*.run --verify

  $ ./enroot-check_*.run
  Bundle ran successfully!
  ```
  - `$ ./enroot-check_*.run --verify`
    > [enroot/src/bundle.sh](https://github.com/NVIDIA/enroot/blob/master/src/bundle.sh#L95)
    ```
    ls "/proc/config.gz"
    ls "/boot/config-$(uname -r)"
    ls "/usr/src/linux-$(uname -r)/.config"
    ls "/usr/src/linux/.config"
    ls "/lib/modules/$(uname -r)/build/.config"
    ```
    - Run this on the host; by default, containers do not mount the host kernel config.
  - ### Test environment: ESC4000 (.241)
    - ### k8s / slurm cluster / compute node
      ```
      root@gpu1080a-0:/tmp# curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
      root@gpu1080a-0:/tmp# chmod +x enroot-check_*.run
      root@gpu1080a-0:/tmp# ./enroot-check_*.run --verify
      [ERROR] Could not find kernel configuration
      root@gpu1080a-0:/tmp# ./enroot-check_*.run
      Extracting [####################] 100%
      Bundle ran successfully!
      ```
    - ### host
      ```
      $ ls "/proc/config.gz"
      ls: cannot access '/proc/config.gz': No such file or directory
      $ ls "/boot/config-$(uname -r)"
      /boot/config-5.4.0-216-generic
      $ ls "/usr/src/linux-$(uname -r)/.config"
      ls: cannot access '/usr/src/linux-5.4.0-216-generic/.config': No such file or directory
      $ ls "/usr/src/linux/.config"
      ls: cannot access '/usr/src/linux/.config': No such file or directory
      $ ls "/lib/modules/$(uname -r)/build/.config"
      /lib/modules/5.4.0-216-generic/build/.config
      ```
      - Files that exist and have content:
        - `/boot/config-$(uname -r)`
        - `/lib/modules/$(uname -r)/build/.config`
    - ### If the host files are copied into the compute pod
      ```
      root@gpu1080a-0:~# ./enroot-check_*.run --verify
      Kernel version:

      Linux version 5.4.0-216-generic (buildd@lcy02-amd64-014) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)) #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025

      Kernel configuration:

      CONFIG_NAMESPACES : OK
      CONFIG_USER_NS : OK
      CONFIG_SECCOMP_FILTER : OK
      CONFIG_OVERLAY_FS : OK (module)
      CONFIG_X86_VSYSCALL_EMULATION : OK
      CONFIG_VSYSCALL_EMULATE : KO (required if glibc <= 2.13)
      CONFIG_VSYSCALL_NATIVE : KO (required if glibc <= 2.13)

      Kernel command line:

      vsyscall=native : KO (required if glibc <= 2.13)
      vsyscall=emulate : KO (required if glibc <= 2.13)

      Kernel parameters:

      kernel.unprivileged_userns_clone : OK
      user.max_user_namespaces : OK
      user.max_mnt_namespaces : OK

      Extra packages:

      nvidia-container-cli : OK

      root@gpu1080a-0:~# cat /boot/config-$(uname -r) | grep CONFIG_VSYSCALL_EMULATE
      root@gpu1080a-0:~# ldd --version
      ldd (Ubuntu GLIBC 2.39-0ubuntu8.4) 2.39
      Copyright (C) 2024 Free Software Foundation, Inc.
      This is free software; see the source for copying conditions. There is NO
      warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
      Written by Roland McGrath and Ulrich Drepper.
      ```
  - ### Test environment: DemoSite (5glab)
    - ### k8s / slurm cluster / compute node
      ```
      root@gpu1080a-0:/tmp# curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
      root@gpu1080a-0:/tmp# chmod +x enroot-check_*.run
      root@gpu1080a-0:/tmp# ./enroot-check_*.run --verify
      [ERROR] Could not find kernel configuration
      root@gpu1080a-0:/tmp# ./enroot-check_*.run
      Extracting [####################] 100%
      Bundle ran successfully!
      ```
    - ### host
      ```
      $ ls "/proc/config.gz"
      ls: cannot access '/proc/config.gz': No such file or directory
      $ ls "/boot/config-$(uname -r)"
      /boot/config-5.15.0-142-generic
      $ ls "/usr/src/linux-$(uname -r)/.config"
      ls: cannot access '/usr/src/linux-5.15.0-142-generic/.config': No such file or directory
      $ ls "/usr/src/linux/.config"
      ls: cannot access '/usr/src/linux/.config': No such file or directory
      $ ls "/lib/modules/$(uname -r)/build/.config"
      /lib/modules/5.15.0-142-generic/build/.config
      ```
- ### Kernel configuration
  > https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md#kernel-configuration
  - The following kernel configuration options must be enabled:
    - `CONFIG_NAMESPACES`
    - `CONFIG_USER_NS`
    - `CONFIG_SECCOMP_FILTER`
  - To import Docker images or use enroot-mksquashovlfs:
    - `CONFIG_OVERLAY_FS`
  - ### Test environment: ESC4000 (.241)
    - `/boot/config-$(uname -r)`
      ```
      $ grep CONFIG_NAMESPACES /boot/config-$(uname -r)
      CONFIG_NAMESPACES=y
      $ grep CONFIG_USER_NS /boot/config-$(uname -r)
      CONFIG_USER_NS=y
      $ grep CONFIG_SECCOMP_FILTER /boot/config-$(uname -r)
      CONFIG_SECCOMP_FILTER=y
      $ grep CONFIG_OVERLAY_FS /boot/config-$(uname -r)
      CONFIG_OVERLAY_FS=m
      ```
      - `=y` means the option is enabled (built in)
      - `=m` means the option is built as a module
      - `# ... is not set` means the option is not enabled
    - `/lib/modules/$(uname -r)/build/.config`
      ```
      $ grep CONFIG_NAMESPACES "/lib/modules/$(uname -r)/build/.config"
      CONFIG_NAMESPACES=y
      $ grep CONFIG_USER_NS "/lib/modules/$(uname -r)/build/.config"
      CONFIG_USER_NS=y
      $ grep CONFIG_SECCOMP_FILTER "/lib/modules/$(uname -r)/build/.config"
      CONFIG_SECCOMP_FILTER=y
      $ grep CONFIG_OVERLAY_FS "/lib/modules/$(uname -r)/build/.config"
      CONFIG_OVERLAY_FS=m
      ```
      - Same result as `/boot/config-$(uname -r)`

<br>

<hr>

<br>

## config
### `sysctl`
- ### `unprivileged_bpf_disabled`
  - ### read
    `$ sudo sysctl kernel.unprivileged_bpf_disabled`
  - ### write
    `$ sudo sysctl -w kernel.unprivileged_bpf_disabled=0`
  - ### file content
    ```
    $ cat /proc/sys/kernel/unprivileged_bpf_disabled
    0
    ```
  - ### file permissions
    ```
    $ ll /proc/sys/kernel/unprivileged_bpf_disabled
    -rw-r--r-- 1 root root 0 Jun 26 10:24 /proc/sys/kernel/unprivileged_bpf_disabled
    ```

<br>

<hr>

<br>

## Slurm Containers Guide
> https://slurm.schedmd.com/containers.html

### Prerequisites (to be confirmed)
The host kernel must be configured to allow user land containers:
```
sudo sysctl -w kernel.unprivileged_userns_clone=1
sudo sysctl -w kernel.apparmor_restrict_unprivileged_unconfined=0
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
```

<br>

<hr>

<br>

## Discussion
### Explain which items the `enroot-check_*.run` tool checks when it runs
- ### gpt-4.1-mini
  Sure. Below are the main items that NVIDIA Enroot's `enroot-check` tool checks when it runs, along with what they mean and their possible impact:

  ---

  # Breakdown of the enroot-check items

  1. ### Kernel Configuration
     * **What is checked**: Whether the kernel was built with, or has enabled, the features Enroot needs to run, such as namespace support (userns, pidns, mntns, etc.), cgroup support (especially the devices controller), and BPF support.
     * **Impact**: Without these features, containers cannot use Linux namespaces or control device access, which causes container startup failures or abnormal behavior.
     * **Example error**:
       ```
       [ERROR] Could not find kernel configuration
       ```
       This means the kernel config file could not be found or a required feature is missing.
  2. ### cgroups (control groups)
     * **What is checked**: The cgroup version (v1 or v2) and whether the required subsystems, such as the devices cgroup, are supported, because Enroot uses cgroups to control access to GPUs and other devices.
     * **Impact**: An incomplete cgroup setup or insufficient permissions prevents Enroot from correctly binding NVIDIA devices or restricting container privileges.
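  3. ### BPF (Berkeley Packet Filter)
     * **What is checked**: Whether the system allows unprivileged users to use BPF programs, and whether BPF functionality is complete.
     * **Impact**: Pyxis (the Slurm + Enroot NVIDIA plugin) uses BPF to manage device permissions; if unprivileged BPF is disabled on the system, container startup fails.
  4. ### Namespaces
     * **What is checked**: Whether the user namespace, PID namespace, mount namespace, etc. are available and not disabled by a system security policy.
     * **Impact**: These namespaces are the core of container isolation; without them, containers cannot be isolated correctly.
  5. ### NVIDIA driver and related tools
     * **What is checked**: Whether the NVIDIA driver and tools such as `nvidia-container-cli` are installed and at the correct versions.
     * **Impact**: A driver or tool version mismatch prevents GPUs from being injected into containers.
  6. ### Filesystem support
     * **What is checked**: Whether the system's filesystems (such as overlayfs) support the mechanisms containers need, such as whiteout files.
     * **Impact**: Without these mechanisms, image layer composition or file overwrites misbehave.
  7. ### Permission checks
     * **What is checked**: Whether the current user has sufficient permissions to operate on cgroups, device nodes, and the necessary system resources.
     * **Impact**: Insufficient permissions cause container startup to fail.

  ---

  # Additional notes

  * **Obtaining the kernel config**
    Many Linux distributions do not keep a complete kernel config file (such as `/proc/config.gz`) by default. If the file cannot be found, `enroot-check` reports an error, but that does not necessarily mean the kernel actually lacks the feature.
    Check manually for `/boot/config-$(uname -r)` or another kernel config file that matches the running kernel.
  * **Interaction between BPF and cgroup v2**
    Newer Linux releases are moving to cgroup v2, but Pyxis and Enroot still rely on cgroup v1 or specific subsystems in some cases, so mixing versions is error-prone.
  * **Ways forward**
    * If `enroot-check` reports a missing kernel config, look for the kernel config file, or check whether an extra package (e.g. `linux-image-extra`) needs to be installed.
    * Make sure the system allows unprivileged BPF (`sysctl kernel.unprivileged_bpf_disabled=0`).
    * Check the cgroup driver and the device permission settings.
    * Verify that the NVIDIA driver and tools work correctly.

  ---

  If you provide the full `enroot-check` output, I can help analyze possible errors and suggest fixes. What do you think?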
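
The manual kernel config check described in this note can be scripted. The sketch below is not part of Enroot: `find_kernel_config`, `option_state`, and `check_enroot_kernel_options` are hypothetical helper names. It assumes the candidate paths are the five listed under `--verify` above, and that the options of interest are the four named in the Kernel configuration section. It only reads a config file and reports `y`, `m`, or `unset` for each option; it does not reproduce the other checks `enroot-check` performs.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of Enroot.

# Print the first readable kernel config among the candidate paths
# listed earlier in this note; return 1 if none is found.
find_kernel_config() {
    local release candidate
    release="$(uname -r)"
    for candidate in \
        "/proc/config.gz" \
        "/boot/config-${release}" \
        "/usr/src/linux-${release}/.config" \
        "/usr/src/linux/.config" \
        "/lib/modules/${release}/build/.config"; do
        if [ -r "${candidate}" ]; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}

# Print the state of one option in a config file: "y", "m", or "unset".
option_state() {
    local config="$1" option="$2" line
    if [ "${config##*.}" = "gz" ]; then
        # /proc/config.gz is gzip-compressed, so decompress it first.
        line="$(gzip -dc "${config}" | grep -E "^${option}=" || true)"
    else
        line="$(grep -E "^${option}=" "${config}" || true)"
    fi
    case "${line}" in
        *=y) echo "y" ;;
        *=m) echo "m" ;;
        *)   echo "unset" ;;
    esac
}

# Report the four options named in the Kernel configuration section.
check_enroot_kernel_options() {
    local config="$1" option
    for option in CONFIG_NAMESPACES CONFIG_USER_NS \
                  CONFIG_SECCOMP_FILTER CONFIG_OVERLAY_FS; do
        printf '%-24s %s\n' "${option}" "$(option_state "${config}" "${option}")"
    done
}
```

A possible invocation on the host is `config="$(find_kernel_config)" && check_enroot_kernel_options "${config}"`. Inside a compute pod, `find_kernel_config` is expected to fail unless the host config has been copied or mounted in, which matches the `Could not find kernel configuration` error recorded above.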