enroot
===
###### tags: `NVIDIA`, `enroot`, `Pyxis`

<br>

[TOC]

<br>

## enroot
> https://github.com/NVIDIA/enroot

### Terms
- [unprivileged (i.e. rootless)](https://slurm.schedmd.com/containers.html#limitations)

<br>

### Summary
Enroot is a lightweight, unprivileged sandboxing tool, comparable to an enhanced chroot. It keeps filesystem isolation, suits environments that value performance and portability, and can import Docker images.

<br>

### Description
A simple yet powerful tool that turns traditional container or operating system images into **unprivileged** sandboxes.

Enroot can be thought of as an enhanced unprivileged `chroot(1)`. It uses the same underlying technologies as containers but removes much of the isolation they inherently provide while preserving filesystem separation.

This approach is generally preferred in high-performance or virtualized environments where portability and reproducibility are important but extra isolation is not warranted.

Enroot is also similar to tools like `proot(1)` or `fakeroot(1)`, but it relies on more recent Linux kernel features (e.g. user and mount namespaces) and provides facilities to import well-known container image formats (e.g. Docker images).

<br>

### Usage
CPU example:
- ### ubuntu
    ```
    # Import and start an Ubuntu image from DockerHub
    $ enroot import docker://ubuntu
    $ enroot create ubuntu.sqsh
    $ enroot start ubuntu
    ```
- ### alpine:latest
    ```
    # Import and start an Alpine image from DockerHub
    $ enroot import docker://alpine:latest
    $ enroot create alpine+latest.sqsh
    $ enroot start alpine+latest.sqsh
    ```

:::spoiler Execution log
```
root@gpu1080a-0:~# enroot import docker://alpine:latest
[INFO] Querying registry for permission grant
[INFO] Authenticating with user: <anonymous>
[INFO] Authentication succeeded
[INFO] Fetching image manifest list
[INFO] Fetching image manifest
[INFO] Found all layers in cache
[INFO] Extracting image layers...

100% 1:0=0s fe07684b16b82247c3539ed86a65ff37a76138ec25d380bd80c869a1a4c73236

[INFO] Converting whiteouts...

100% 1:0=0s fe07684b16b82247c3539ed86a65ff37a76138ec25d380bd80c869a1a4c73236

[INFO] Creating squashfs filesystem...

Parallel mksquashfs: Using 2 processors
Creating 4.0 filesystem on /root/alpine+latest.sqsh, block size 131072.
[=============================================================================================================================================================================|] 144/144 100%

Exportable Squashfs 4.0 filesystem, lzo compressed, data block size 131072
        uncompressed data, compressed metadata, compressed fragments,
        compressed xattrs, compressed ids
        duplicates are removed
Filesystem size 7763.07 Kbytes (7.58 Mbytes)
        95.34% of uncompressed filesystem size (8142.74 Kbytes)
Inode table size 4378 bytes (4.28 Kbytes)
        22.79% of uncompressed inode table size (19206 bytes)
Directory table size 5764 bytes (5.63 Kbytes)
        57.39% of uncompressed directory table size (10044 bytes)
Number of duplicate files found 9
Number of inodes 523
Number of files 90
Number of fragments 6
Number of symbolic links 335
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 98
Number of hard-links 0
Number of ids (unique uids + gids) 1
Number of uids 1
        root (0)
Number of gids 1
        root (0)
```
```
root@gpu1080a-0:~# ll
total 227432
drwx------ 1 root root    4096 Jun 27 03:55 ./
drwxr-xr-x 1 root root    4096 Jun 26 09:44 ../
-rw------- 1 root root    3894 Jun 27 03:48 .bash_history
-rw-r--r-- 1 root root    3106 Apr 22 2024 .bashrc
drwx------ 3 root root    4096 Jun 26 10:15 .cache/
drwx------ 3 root root    4096 Jun 26 10:15 .local/
drwxr-xr-x 3 root root    4096 Jun 26 10:15 .parallel/
-rw-r--r-- 1 root root     161 Apr 22 2024 .profile
-rw------- 1 root root    3379 Jun 27 03:54 .viminfo
-rw-r--r-- 1 root root 7950336 Jun 27 03:55 alpine+latest.sqsh
```
```
root@gpu1080a-0:~# enroot create alpine+latest.sqsh
[INFO] Extracting squashfs filesystem...

Parallel unsquashfs: Using 2 processors
425 inodes (144 blocks) to write

[=============================================================================================================================================================================|] 569/569 100%

created 90 files
created 98 directories
created 335 symlinks
created 0 devices
created 0 fifos
created 0 sockets
created 0 hardlinks
```
```
root@gpu1080a-0:~# enroot start alpine+latest.sqsh
/ #
```
:::

<br>

### GPU examples:
- ### `nvidia+cuda+12.4.0-base-ubuntu20.04`
    ```bash
    $ nvidia-smi    # confirm that nvidia-smi is available in the current container
    Fri Jun 27 05:03:30 2025
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
    |-----------------------------------------+----------------------+----------------------+

    # Import and start a CUDA base image (Ubuntu 20.04) from DockerHub
    $ enroot import docker://nvidia/cuda:12.4.0-base-ubuntu20.04
    $ enroot create nvidia+cuda+12.4.0-base-ubuntu20.04.sqsh
    $ enroot start nvidia+cuda+12.4.0-base-ubuntu20.04
    ```
    On the login pod, a single `srun` command can replace the three steps above:
    ```
    $ srun -N1 --container-image=docker://nvidia/cuda:12.4.0-base-ubuntu20.04 bash
    ```
- ### troubleshooting
    - ### Test environment: ESC4000 (.241)
        ```
        nvidia-container-cli: mount error: failed to add device rules: write /sys/fs/cgroup/devices/slurm/uid_0/job_39/step_0/task_0/devices.allow: operation not permitted
        [ERROR] /etc/enroot/hooks.d/98-nvidia.sh exited with return code 1
        ```

<br>

<hr>

<br>

## Runtime configuration
> https://github.com/NVIDIA/enroot/blob/master/doc/configuration.md

| Setting | Default | Description |
| ------ | ------ | ------ |
| `ENROOT_LIBRARY_PATH` | `/usr/lib/enroot` | Path to library sources |
| `ENROOT_SYSCONF_PATH` | `/etc/enroot` | Path to system configuration files |
| `ENROOT_RUNTIME_PATH` | `${XDG_RUNTIME_DIR}/enroot` | Path to the runtime working directory |
| `ENROOT_CONFIG_PATH` | `${XDG_CONFIG_HOME}/enroot` | Path to user configuration files |
`ENROOT_CACHE_PATH` | `${XDG_CACHE_HOME}/enroot` | Path to user image/credentials cache |
| `ENROOT_DATA_PATH` | `${XDG_DATA_HOME}/enroot` | Path to user container storage |
| `ENROOT_TEMP_PATH` | `${TMPDIR}` | Path to temporary directory |

- `/etc/enroot/enroot.conf`
    - Taipei-01

```ini
/etc/enroot.conf

# Working directory for enroot:
ENROOT_RUNTIME_PATH /raid/local/containers/enroot/runtime/${SLURM_JOB_ID:-}

# Directory where container layers are stored:
# Enroot defaults are used for ENROOT_CACHE_PATH

# Directory where the filesystems of running containers are stored:
ENROOT_DATA_PATH /raid/local/containers/enroot/data/$(id -u)

# Path to temporary directory
#ENROOT_TEMP_PATH ${TMPDIR:-/tmp}

# Options passed to mksquashfs to produce container images.
ENROOT_SQUASH_OPTIONS -noI -noD -noF -noX -no-duplicates

# Mount the current user's home directory by default.
ENROOT_MOUNT_HOME no

# Path to user configuration files
ENROOT_CONFIG_PATH ${HOME}/.config/enroot

# Restrict /dev inside the container to a minimal set of devices.
ENROOT_RESTRICT_DEV yes

# Make the container root filesystem writable by default.
ENROOT_ROOTFS_WRITABLE no

# Options passed to zstd to compress digest layers.
ENROOT_ZSTD_OPTIONS -1

# Number of times network operations should be retried.
ENROOT_TRANSFER_RETRIES 5

# Maximum time in seconds to wait for connections establishment (0 means unlimited).
ENROOT_CONNECT_TIMEOUT 60

# Maximum time in seconds to wait for network operations to complete (0 means unlimited).
ENROOT_TRANSFER_TIMEOUT 1200

# Maximum number of concurrent connections (0 means unlimited).
ENROOT_MAX_CONNECTIONS 10

# Path to library sources
#ENROOT_LIBRARY_PATH /usr/lib/enroot

# Path to system configuration file
#ENROOT_SYSCONF_PATH /etc/enroot

# Gzip program used to uncompress digest layers.
#ENROOT_GZIP_PROGRAM gzip

# Remap the current user to root inside containers by default.
#ENROOT_REMAP_ROOT no

# Maximum number of processors to use for parallel tasks (0 means unlimited).
#ENROOT_MAX_PROCESSORS $(nproc)

# Use a login shell to run the container initialization.
#ENROOT_LOGIN_SHELL yes

# Allow root to retain his superuser privileges inside containers.
#ENROOT_ALLOW_SUPERUSER no

# Use HTTP for outgoing requests instead of HTTPS (UNSECURE!).
#ENROOT_ALLOW_HTTP no

# Include user-specific configuration inside bundles by default.
#ENROOT_BUNDLE_ALL no

# Generate an embedded checksum inside bundles by default.
#ENROOT_BUNDLE_CHECKSUM no

# Always use --force on command invocations.
#ENROOT_FORCE_OVERRIDE no

# SSL certificates settings
#SSL_CERT_DIR
#SSL_CERT_FILE

# Proxy settings
#all_proxy
#no_proxy
#http_proxy
```

<br>

<hr>

<br>

## Requirements
> https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md

- ### Features required to support Enroot (e.g. cgroup support, BPF, namespaces)
- ### Checking Enroot's prerequisites
    Running the enroot-check bundle for a specific release automatically checks whether the prerequisites are met.
    ```
    $ curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
    $ chmod +x enroot-check_*.run

    $ ./enroot-check_*.run --verify

    $ ./enroot-check_*.run
    Bundle ran successfully!
    ```
    - `$ ./enroot-check_*.run --verify`
        > [enroot/src/bundle.sh](https://github.com/NVIDIA/enroot/blob/master/src/bundle.sh#L95)

        ```
        ls "/proc/config.gz"
        ls "/boot/config-$(uname -r)"
        ls "/usr/src/linux-$(uname -r)/.config"
        ls "/usr/src/linux/.config"
        ls "/lib/modules/$(uname -r)/build/.config"
        ```
        - Run this test on the host: by default, containers do not have the host kernel config mounted.

- ### Test environment: ESC4000 (.241)
    - ### k8s / slurm cluster / compute node

```
root@gpu1080a-0:/tmp# curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
root@gpu1080a-0:/tmp# chmod +x enroot-check_*.run

root@gpu1080a-0:/tmp# ./enroot-check_*.run --verify
[ERROR] Could not find kernel configuration

root@gpu1080a-0:/tmp# ./enroot-check_*.run
Extracting [####################] 100%
Bundle ran successfully!
```
- ### host
    ```
    $ ls "/proc/config.gz"
    ls: cannot access '/proc/config.gz': No such file or directory

    $ ls "/boot/config-$(uname -r)"
    /boot/config-5.4.0-216-generic

    $ ls "/usr/src/linux-$(uname -r)/.config"
    ls: cannot access '/usr/src/linux-5.4.0-216-generic/.config': No such file or directory

    $ ls "/usr/src/linux/.config"
    ls: cannot access '/usr/src/linux/.config': No such file or directory

    $ ls "/lib/modules/$(uname -r)/build/.config"
    /lib/modules/5.4.0-216-generic/build/.config
    ```
    - Files that exist and have content
        - `/boot/config-$(uname -r)`
        - `/lib/modules/$(uname -r)/build/.config`
- ### After copying the host files into the compute pod

```
root@gpu1080a-0:~# ./enroot-check_*.run --verify
Kernel version:

Linux version 5.4.0-216-generic (buildd@lcy02-amd64-014) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)) #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025

Kernel configuration:

CONFIG_NAMESPACES                 : OK
CONFIG_USER_NS                    : OK
CONFIG_SECCOMP_FILTER             : OK
CONFIG_OVERLAY_FS                 : OK (module)
CONFIG_X86_VSYSCALL_EMULATION     : OK
CONFIG_VSYSCALL_EMULATE           : KO (required if glibc <= 2.13)
CONFIG_VSYSCALL_NATIVE            : KO (required if glibc <= 2.13)

Kernel command line:

vsyscall=native                   : KO (required if glibc <= 2.13)
vsyscall=emulate                  : KO (required if glibc <= 2.13)

Kernel parameters:

kernel.unprivileged_userns_clone  : OK
user.max_user_namespaces          : OK
user.max_mnt_namespaces           : OK

Extra packages:

nvidia-container-cli              : OK

root@gpu1080a-0:~# cat /boot/config-$(uname -r) | grep CONFIG_VSYSCALL_EMULATE

root@gpu1080a-0:~# ldd --version
ldd (Ubuntu GLIBC 2.39-0ubuntu8.4) 2.39
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
```
- ### Test environment: DemoSite (5glab)
    - ### k8s / slurm cluster / compute node
        ```
        root@gpu1080a-0:/tmp# curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
        root@gpu1080a-0:/tmp# chmod +x enroot-check_*.run

        root@gpu1080a-0:/tmp# ./enroot-check_*.run --verify
        [ERROR] Could not find kernel configuration

        root@gpu1080a-0:/tmp# ./enroot-check_*.run
        Extracting [####################] 100%
        Bundle ran successfully!
        ```
    - ### host
        ```
        $ ls "/proc/config.gz"
        ls: cannot access '/proc/config.gz': No such file or directory

        $ ls "/boot/config-$(uname -r)"
        /boot/config-5.15.0-142-generic

        $ ls "/usr/src/linux-$(uname -r)/.config"
        ls: cannot access '/usr/src/linux-5.15.0-142-generic/.config': No such file or directory

        $ ls "/usr/src/linux/.config"
        ls: cannot access '/usr/src/linux/.config': No such file or directory

        $ ls "/lib/modules/$(uname -r)/build/.config"
        /lib/modules/5.15.0-142-generic/build/.config
        ```
- ### Kernel configuration
    > https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md#kernel-configuration

    - The following kernel configuration options must be enabled:
        - `CONFIG_NAMESPACES`
        - `CONFIG_USER_NS`
        - `CONFIG_SECCOMP_FILTER`
    - To import Docker images or use enroot-mksquashovlfs:
        - `CONFIG_OVERLAY_FS`
    - ### Test environment: ESC4000 (.241)
        - `/boot/config-$(uname -r)`
            ```
            $ grep CONFIG_NAMESPACES /boot/config-$(uname -r)
            CONFIG_NAMESPACES=y

            $ grep CONFIG_USER_NS /boot/config-$(uname -r)
            CONFIG_USER_NS=y

            $ grep CONFIG_SECCOMP_FILTER /boot/config-$(uname -r)
            CONFIG_SECCOMP_FILTER=y

            $ grep CONFIG_OVERLAY_FS /boot/config-$(uname -r)
            CONFIG_OVERLAY_FS=m
            ```
            - `=y` means the option is built in
            - `=m` means the option is built as a module
            - `# ...
is not set` means the option is not enabled.

- `/lib/modules/$(uname -r)/build/.config`
    ```
    $ grep CONFIG_NAMESPACES "/lib/modules/$(uname -r)/build/.config"
    CONFIG_NAMESPACES=y

    $ grep CONFIG_USER_NS "/lib/modules/$(uname -r)/build/.config"
    CONFIG_USER_NS=y

    $ grep CONFIG_SECCOMP_FILTER "/lib/modules/$(uname -r)/build/.config"
    CONFIG_SECCOMP_FILTER=y

    $ grep CONFIG_OVERLAY_FS "/lib/modules/$(uname -r)/build/.config"
    CONFIG_OVERLAY_FS=m
    ```
    - Same result as `/boot/config-$(uname -r)`

<br>

<hr>

<br>

## config
### `sysctl`
- ### `unprivileged_bpf_disabled`
    - ### read
        `$ sudo sysctl kernel.unprivileged_bpf_disabled`
    - ### write
        `$ sudo sysctl -w kernel.unprivileged_bpf_disabled=0`
    - ### file content
        ```
        $ cat /proc/sys/kernel/unprivileged_bpf_disabled
        0
        ```
    - ### file permissions
        ```
        $ ll /proc/sys/kernel/unprivileged_bpf_disabled
        -rw-r--r-- 1 root root 0 Jun 26 10:24 /proc/sys/kernel/unprivileged_bpf_disabled
        ```

<br>

<hr>

<br>

## Slurm Containers Guide
> https://slurm.schedmd.com/containers.html

### Prerequisites (to be confirmed)
The host kernel must be configured to allow user land containers:
```
sudo sysctl -w kernel.unprivileged_userns_clone=1
sudo sysctl -w kernel.apparmor_restrict_unprivileged_unconfined=0
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
```

<br>

<hr>

<br>

## Discussion
### Explain which items the `enroot-check_*.run` tool checks when it runs
- ### gpt-4.1-mini

Sure. Below are the main items that NVIDIA Enroot's `enroot-check` tool checks when it runs, along with what they mean and their possible impact:

---

# What the Enroot-Check tool checks

1. ### Kernel Configuration
    * **What is checked**: whether the kernel was built with, or has enabled, the features Enroot needs to run, such as namespace support (userns, pidns, mntns, etc.), cgroup support (especially the devices controller), and BPF support.
    * **Impact**: without these features, containers cannot use Linux namespaces or control device access, which leads to container startup failures or abnormal behavior.
    * **Example error**:
        ```
        [ERROR] Could not find kernel configuration
        ```
        This means the kernel configuration file could not be found or required features are missing.
2. ### cgroups (control groups)
    * **What is checked**: the cgroup version (v1 or v2) and whether the required subsystems, such as the devices cgroup, are supported, because Enroot controls access to GPUs and other devices through cgroups.
    * **Impact**: incomplete cgroup settings or insufficient permissions prevent Enroot from correctly binding NVIDIA devices or restricting container permissions.
3.
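### BPF (Berkeley Packet Filter)
* **What is checked**: whether the system allows unprivileged users to load BPF programs, and whether BPF functionality is complete.
* **Impact**: Pyxis (NVIDIA's Slurm + Enroot plugin) uses BPF to manage device permissions; if the system disables unprivileged BPF, container startup fails.

4. ### Namespace
    * **What is checked**: whether user, PID, and mount namespaces are available and not disabled by a system security policy.
    * **Impact**: these namespaces are the core of container isolation; without them, containers cannot be isolated correctly.
5. ### NVIDIA driver and related tools
    * **What is checked**: whether the NVIDIA driver and tools such as `nvidia-container-cli` are installed with the correct versions.
    * **Impact**: a mismatched driver or tool version prevents GPUs from being injected into the container.
6. ### Filesystem support
    * **What is checked**: whether the system's filesystems (such as overlayfs) support the mechanisms containers need, such as whiteout files.
    * **Impact**: without these mechanisms, image layer merging or file overwrites behave incorrectly.
7. ### Permission checks
    * **What is checked**: whether the current user has sufficient permissions to operate on cgroups, device nodes, and other required system resources.
    * **Impact**: insufficient permissions cause container startup to fail.

---

# Further notes

* **Obtaining the kernel config**
    Many Linux distributions do not keep the full kernel config file (such as `/proc/config.gz`) by default. If the file cannot be found, `enroot-check` reports an error, but that does not necessarily mean the kernel actually lacks the features.
    You can manually check for `/boot/config-$(uname -r)` or another kernel config file that matches the running kernel.
* **Interaction between BPF and cgroup v2**
    Newer Linux releases are moving to cgroup v2, but Pyxis and Enroot still depend on cgroup v1 or specific subsystems in some cases, and mixing versions easily causes problems.
* **Possible fixes**
    * If `enroot-check` reports a missing kernel config, look for the kernel config file, or check whether an additional package (for example `linux-image-extra`) needs to be installed.
    * Make sure the system allows unprivileged BPF (`sysctl kernel.unprivileged_bpf_disabled=0`).
    * Check the cgroup driver and the device permission settings.
    * Verify that the NVIDIA driver and tools work correctly.

---

If you provide the full output of `enroot-check`, I can help analyze the likely errors and suggest fixes. What do you think?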
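
The manual `grep` checks from the Requirements section can be collected into one pass. The following is a minimal sketch, not part of Enroot: `find_kernel_config` and `check_kernel_options` are hypothetical helper names, only the plain-text config locations are searched (`/proc/config.gz` is skipped because it is compressed), and the option list is the one quoted from the requirements document above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Enroot) that report the state of
# the kernel options listed in the Requirements section.

# Print the first readable plain-text kernel config, or return 1 if none exists.
find_kernel_config() {
    local f
    for f in "/boot/config-$(uname -r)" \
             "/usr/src/linux-$(uname -r)/.config" \
             "/usr/src/linux/.config" \
             "/lib/modules/$(uname -r)/build/.config"; do
        if [ -r "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Report each required option as built-in (=y), module (=m), or not set.
check_kernel_options() {
    local config="$1" opt line
    for opt in CONFIG_NAMESPACES CONFIG_USER_NS CONFIG_SECCOMP_FILTER CONFIG_OVERLAY_FS; do
        line=$(grep -E "^${opt}=" "$config" || true)
        case "$line" in
            *=y) echo "$opt: built-in" ;;
            *=m) echo "$opt: module" ;;
            *)   echo "$opt: not set" ;;
        esac
    done
}

if cfg=$(find_kernel_config); then
    check_kernel_options "$cfg"
else
    echo "no plain-text kernel config found" >&2
fi
```

On the ESC4000 host above this would be expected to report `CONFIG_OVERLAY_FS` as a module and the other three options as built-in. Inside a compute pod it falls through to the error branch unless the host config has been copied in, which matches the `--verify` failure shown earlier.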