enroot
===
###### tags: `NVIDIA`, `enroot`, `Pyxis`
<br>
[TOC]
<br>
## enroot
> https://github.com/NVIDIA/enroot
### Terms
- [unprivileged (i.e. rootless)](https://slurm.schedmd.com/containers.html#limitations)
<br>
### Summary
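Enroot is a lightweight, unprivileged sandboxing tool, similar to an enhanced chroot. It keeps filesystem isolation, suits environments that value performance and portability, and supports Docker images.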
<br>
### Description
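A simple yet powerful tool that turns traditional container or operating system images into **unprivileged** sandboxes.
Enroot can be thought of as an enhanced unprivileged `chroot(1)`. It uses the same underlying technologies as containers but removes much of the isolation they inherently provide, while preserving filesystem separation.
This approach is generally preferred in high-performance or virtualized environments where portability and reproducibility matter but extra isolation is not needed.
Enroot is also similar to other tools such as `proot(1)` or `fakeroot(1)`, but it relies on more recent Linux kernel features (e.g. user and mount namespaces) and can import well-known container image formats (e.g. Docker images).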
<br>
### Usage CPU example:
- ### ubuntu
```
# Import and start an Ubuntu image from DockerHub
$ enroot import docker://ubuntu
$ enroot create ubuntu.sqsh
$ enroot start ubuntu
```
- ### alpine:latest
```
# Import and start an Alpine image from DockerHub
$ enroot import docker://alpine:latest
$ enroot create alpine+latest.sqsh
$ enroot start alpine+latest.sqsh
```
:::spoiler Execution log
```
root@gpu1080a-0:~# enroot import docker://alpine:latest
[INFO] Querying registry for permission grant
[INFO] Authenticating with user: <anonymous>
[INFO] Authentication succeeded
[INFO] Fetching image manifest list
[INFO] Fetching image manifest
[INFO] Found all layers in cache
[INFO] Extracting image layers...
100% 1:0=0s fe07684b16b82247c3539ed86a65ff37a76138ec25d380bd80c869a1a4c73236
[INFO] Converting whiteouts...
100% 1:0=0s fe07684b16b82247c3539ed86a65ff37a76138ec25d380bd80c869a1a4c73236
[INFO] Creating squashfs filesystem...
Parallel mksquashfs: Using 2 processors
Creating 4.0 filesystem on /root/alpine+latest.sqsh, block size 131072.
[=============================================================================================================================================================================|] 144/144 100%
Exportable Squashfs 4.0 filesystem, lzo compressed, data block size 131072
uncompressed data, compressed metadata, compressed fragments,
compressed xattrs, compressed ids
duplicates are removed
Filesystem size 7763.07 Kbytes (7.58 Mbytes)
95.34% of uncompressed filesystem size (8142.74 Kbytes)
Inode table size 4378 bytes (4.28 Kbytes)
22.79% of uncompressed inode table size (19206 bytes)
Directory table size 5764 bytes (5.63 Kbytes)
57.39% of uncompressed directory table size (10044 bytes)
Number of duplicate files found 9
Number of inodes 523
Number of files 90
Number of fragments 6
Number of symbolic links 335
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 98
Number of hard-links 0
Number of ids (unique uids + gids) 1
Number of uids 1
root (0)
Number of gids 1
root (0)
```
```
root@gpu1080a-0:~# ll
total 227432
drwx------ 1 root root 4096 Jun 27 03:55 ./
drwxr-xr-x 1 root root 4096 Jun 26 09:44 ../
-rw------- 1 root root 3894 Jun 27 03:48 .bash_history
-rw-r--r-- 1 root root 3106 Apr 22 2024 .bashrc
drwx------ 3 root root 4096 Jun 26 10:15 .cache/
drwx------ 3 root root 4096 Jun 26 10:15 .local/
drwxr-xr-x 3 root root 4096 Jun 26 10:15 .parallel/
-rw-r--r-- 1 root root 161 Apr 22 2024 .profile
-rw------- 1 root root 3379 Jun 27 03:54 .viminfo
-rw-r--r-- 1 root root 7950336 Jun 27 03:55 alpine+latest.sqsh
```
```
root@gpu1080a-0:~# enroot create alpine+latest.sqsh
[INFO] Extracting squashfs filesystem...
Parallel unsquashfs: Using 2 processors
425 inodes (144 blocks) to write
[=============================================================================================================================================================================|] 569/569 100%
created 90 files
created 98 directories
created 335 symlinks
created 0 devices
created 0 fifos
created 0 sockets
created 0 hardlinks
```
```
root@gpu1080a-0:~# enroot start alpine+latest.sqsh
/ #
```
:::
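
The image file names in this note follow a visible pattern: `docker://ubuntu` produced `ubuntu.sqsh` and `docker://alpine:latest` produced `alpine+latest.sqsh`. The sketch below reproduces that mapping. It is inferred only from the examples here, the helper name `enroot_sqsh_name` is hypothetical, and URIs that carry a registry prefix or credentials are not handled.

```shell
# Hypothetical helper: derive the .sqsh file name that `enroot import`
# produced for the docker:// URIs used in this note.
enroot_sqsh_name() {
    # Strip the docker:// scheme, then replace "/" and ":" with "+".
    name=${1#docker://}
    printf '%s.sqsh\n' "$(printf '%s' "$name" | tr '/:' '++')"
}

enroot_sqsh_name docker://alpine:latest
enroot_sqsh_name docker://nvidia/cuda:12.4.0-base-ubuntu20.04
```

Knowing the name in advance is handy in scripts that chain `enroot import` and `enroot create`.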
<br>
### GPU examples:
- ### `nvidia+cuda+12.4.0-base-ubuntu20.04`
```bash
$ nvidia-smi # confirm that the current container supports nvidia-smi
Fri Jun 27 05:03:30 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
# Import and start a CUDA base image from DockerHub
$ enroot import docker://nvidia/cuda:12.4.0-base-ubuntu20.04
$ enroot create nvidia+cuda+12.4.0-base-ubuntu20.04.sqsh
$ enroot start nvidia+cuda+12.4.0-base-ubuntu20.04
```
On the login pod, the `srun` command can be used instead:
```
$ srun -N1 --container-image=docker://nvidia/cuda:12.4.0-base-ubuntu20.04 bash
```
- ### troubleshooting
- ### Test environment: ESC4000 (.241)
```
nvidia-container-cli: mount error: failed to add device rules: write /sys/fs/cgroup/devices/slurm/uid_0/job_39/step_0/task_0/devices.allow: operation not permitted
[ERROR] /etc/enroot/hooks.d/98-nvidia.sh exited with return code 1
```
<br>
<hr>
<br>
## Runtime configuration
> https://github.com/NVIDIA/enroot/blob/master/doc/configuration.md
>
| Setting | Default | Description |
| ------ | ------ | ------ |
| `ENROOT_LIBRARY_PATH` | `/usr/lib/enroot` | Path to library sources |
| `ENROOT_SYSCONF_PATH` | `/etc/enroot` | Path to system configuration files |
| `ENROOT_RUNTIME_PATH` | `${XDG_RUNTIME_DIR}/enroot` | Path to the runtime working directory |
| `ENROOT_CONFIG_PATH` | `${XDG_CONFIG_HOME}/enroot` | Path to user configuration files |
| `ENROOT_CACHE_PATH` | `${XDG_CACHE_HOME}/enroot` | Path to user image/credentials cache |
| `ENROOT_DATA_PATH` | `${XDG_DATA_HOME}/enroot` | Path to user container storage |
| `ENROOT_TEMP_PATH` | `${TMPDIR}` | Path to temporary directory |
- `/etc/enroot/enroot.conf`
- Taipei-01
```ini
# /etc/enroot/enroot.conf
# Working directory for enroot:
ENROOT_RUNTIME_PATH /raid/local/containers/enroot/runtime/${SLURM_JOB_ID:-}
# Directory where container layers are stored:
# Enroot defaults are used for ENROOT_CACHE_PATH
# Directory where the filesystems of running containers are stored:
ENROOT_DATA_PATH /raid/local/containers/enroot/data/$(id -u)
# Path to temporary directory
#ENROOT_TEMP_PATH ${TMPDIR:-/tmp}
# Options passed to mksquashfs to produce container images.
ENROOT_SQUASH_OPTIONS -noI -noD -noF -noX -no-duplicates
# Mount the current user's home directory by default.
ENROOT_MOUNT_HOME no
# Path to user configuration files
ENROOT_CONFIG_PATH ${HOME}/.config/enroot
# Restrict /dev inside the container to a minimal set of devices.
ENROOT_RESTRICT_DEV yes
# Make the container root filesystem writable by default.
ENROOT_ROOTFS_WRITABLE no
# Options passed to zstd to compress digest layers.
ENROOT_ZSTD_OPTIONS -1
# Number of times network operations should be retried.
ENROOT_TRANSFER_RETRIES 5
# Maximum time in seconds to wait for connections establishment (0 means unlimited).
ENROOT_CONNECT_TIMEOUT 60
# Maximum time in seconds to wait for network operations to complete (0 means unlimited).
ENROOT_TRANSFER_TIMEOUT 1200
# Maximum number of concurrent connections (0 means unlimited).
ENROOT_MAX_CONNECTIONS 10
# Path to library sources
#ENROOT_LIBRARY_PATH /usr/lib/enroot
# Path to system configuration file
#ENROOT_SYSCONF_PATH /etc/enroot
# Gzip program used to uncompress digest layers.
#ENROOT_GZIP_PROGRAM gzip
# Remap the current user to root inside containers by default.
#ENROOT_REMAP_ROOT no
# Maximum number of processors to use for parallel tasks (0 means unlimited).
#ENROOT_MAX_PROCESSORS $(nproc)
# Use a login shell to run the container initialization.
#ENROOT_LOGIN_SHELL yes
# Allow root to retain his superuser privileges inside containers.
#ENROOT_ALLOW_SUPERUSER no
# Use HTTP for outgoing requests instead of HTTPS (UNSECURE!).
#ENROOT_ALLOW_HTTP no
# Include user-specific configuration inside bundles by default.
#ENROOT_BUNDLE_ALL no
# Generate an embedded checksum inside bundles by default.
#ENROOT_BUNDLE_CHECKSUM no
# Always use --force on command invocations.
#ENROOT_FORCE_OVERRIDE no
# SSL certificates settings
#SSL_CERT_DIR
#SSL_CERT_FILE
# Proxy settings
#all_proxy
#no_proxy
#http_proxy
```
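
Most lines in the file above are comments or commented-out defaults, so it can be hard to see which settings are actually in effect. A minimal sketch that prints only the active lines follows. The helper name `enroot_active_settings` is hypothetical, and the default path is the one named in this note.

```shell
# Hypothetical helper: print only the active (uncommented, non-empty)
# lines of an enroot.conf-style file.
enroot_active_settings() {
    conf=${1:-/etc/enroot/enroot.conf}
    # Drop lines that are blank or start with "#" (after optional spaces).
    grep -Ev '^[[:space:]]*(#|$)' "$conf"
}

# Usage (assumes the file exists on this machine):
#   enroot_active_settings /etc/enroot/enroot.conf
```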
<br>
<hr>
<br>
## Requirements
> https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md
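- ### Features required to support Enroot (e.g. cgroup support, BPF, namespaces)
- ### Checking Enroot prerequisites
Run the enroot-check bundle for a given release to check automatically whether the prerequisites are met.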
```
$ curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
$ chmod +x enroot-check_*.run
$ ./enroot-check_*.run --verify
$ ./enroot-check_*.run
Bundle ran successfully!
```
- `$ ./enroot-check_*.run --verify`
> [enroot/src/bundle.sh](https://github.com/NVIDIA/enroot/blob/master/src/bundle.sh#L95)
```
ls "/proc/config.gz"
ls "/boot/config-$(uname -r)"
ls "/usr/src/linux-$(uname -r)/.config"
ls "/usr/src/linux/.config"
ls "/lib/modules/$(uname -r)/build/.config"
```
- Run these checks on the host: containers do not mount the host kernel config by default
- ### Test environment: ESC4000 (.241)
- ### k8s / slurm cluster / compute node

```
root@gpu1080a-0:/tmp# curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
root@gpu1080a-0:/tmp# chmod +x enroot-check_*.run
root@gpu1080a-0:/tmp# ./enroot-check_*.run --verify
[ERROR] Could not find kernel configuration
root@gpu1080a-0:/tmp# ./enroot-check_*.run
Extracting [####################] 100%
Bundle ran successfully!
```
- ### host

```
$ ls "/proc/config.gz"
ls: cannot access '/proc/config.gz': No such file or directory
$ ls "/boot/config-$(uname -r)"
/boot/config-5.4.0-216-generic
$ ls "/usr/src/linux-$(uname -r)/.config"
ls: cannot access '/usr/src/linux-5.4.0-216-generic/.config': No such file or directory
$ ls "/usr/src/linux/.config"
ls: cannot access '/usr/src/linux/.config': No such file or directory
$ ls "/lib/modules/$(uname -r)/build/.config"
/lib/modules/5.4.0-216-generic/build/.config
```
- Files that exist on the host:
- `/boot/config-$(uname -r)`
- `/lib/modules/$(uname -r)/build/.config`
- ### After copying the host files to the compute pod

```
root@gpu1080a-0:~# ./enroot-check_*.run --verify
Kernel version:
Linux version 5.4.0-216-generic (buildd@lcy02-amd64-014) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)) #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025
Kernel configuration:
CONFIG_NAMESPACES : OK
CONFIG_USER_NS : OK
CONFIG_SECCOMP_FILTER : OK
CONFIG_OVERLAY_FS : OK (module)
CONFIG_X86_VSYSCALL_EMULATION : OK
CONFIG_VSYSCALL_EMULATE : KO (required if glibc <= 2.13)
CONFIG_VSYSCALL_NATIVE : KO (required if glibc <= 2.13)
Kernel command line:
vsyscall=native : KO (required if glibc <= 2.13)
vsyscall=emulate : KO (required if glibc <= 2.13)
Kernel parameters:
kernel.unprivileged_userns_clone : OK
user.max_user_namespaces : OK
user.max_mnt_namespaces : OK
Extra packages:
nvidia-container-cli : OK
root@gpu1080a-0:~# cat /boot/config-$(uname -r) | grep CONFIG_VSYSCALL_EMULATE
root@gpu1080a-0:~# ldd --version
ldd (Ubuntu GLIBC 2.39-0ubuntu8.4) 2.39
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
```
- ### Test environment: DemoSite (5glab)
- ### k8s / slurm cluster / compute node

```
root@gpu1080a-0:/tmp# curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-check_3.5.0_$(uname -m).run
root@gpu1080a-0:/tmp# chmod +x enroot-check_*.run
root@gpu1080a-0:/tmp# ./enroot-check_*.run --verify
[ERROR] Could not find kernel configuration
root@gpu1080a-0:/tmp# ./enroot-check_*.run
Extracting [####################] 100%
Bundle ran successfully!
```
- ### host

```
$ ls "/proc/config.gz"
ls: cannot access '/proc/config.gz': No such file or directory
$ ls "/boot/config-$(uname -r)"
/boot/config-5.15.0-142-generic
$ ls "/usr/src/linux-$(uname -r)/.config"
ls: cannot access '/usr/src/linux-5.15.0-142-generic/.config': No such file or directory
$ ls "/usr/src/linux/.config"
ls: cannot access '/usr/src/linux/.config': No such file or directory
$ ls "/lib/modules/$(uname -r)/build/.config"
/lib/modules/5.15.0-142-generic/build/.config
```
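
The manual `ls` probes above can be folded into one lookup. The sketch below tries the same five candidate paths in the order listed in this note and prints the first one that exists. The helper name `find_kernel_config` and its optional root-prefix argument are assumptions for illustration, not part of enroot.

```shell
# Hypothetical helper: print the first kernel config file that exists.
# An optional first argument is prefixed to every candidate path, which
# makes it possible to probe a copied or mounted host tree.
find_kernel_config() {
    root=${1:-}
    rel=$(uname -r)
    for f in \
        "/proc/config.gz" \
        "/boot/config-$rel" \
        "/usr/src/linux-$rel/.config" \
        "/usr/src/linux/.config" \
        "/lib/modules/$rel/build/.config"
    do
        if [ -e "$root$f" ]; then
            printf '%s\n' "$root$f"
            return 0
        fi
    done
    echo "no kernel configuration found" >&2
    return 1
}

# Usage: find_kernel_config            (probe the running system)
#        find_kernel_config /mnt/host  (probe a tree under /mnt/host)
```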
- ### Kernel configuration
> https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md#kernel-configuration
- The following kernel configuration options must be enabled:
- `CONFIG_NAMESPACES`
- `CONFIG_USER_NS`
- `CONFIG_SECCOMP_FILTER`
- To import Docker images or use enroot-mksquashovlfs:
- `CONFIG_OVERLAY_FS`
- ### Test environment: ESC4000 (.241)
- `/boot/config-$(uname -r)`
```
$ grep CONFIG_NAMESPACES /boot/config-$(uname -r)
CONFIG_NAMESPACES=y
$ grep CONFIG_USER_NS /boot/config-$(uname -r)
CONFIG_USER_NS=y
$ grep CONFIG_SECCOMP_FILTER /boot/config-$(uname -r)
CONFIG_SECCOMP_FILTER=y
$ grep CONFIG_OVERLAY_FS /boot/config-$(uname -r)
CONFIG_OVERLAY_FS=m
```
- `=y` means the option is built into the kernel
- `=m` means the option is built as a module
- `# ... is not set` means the option is disabled
- `/lib/modules/$(uname -r)/build/.config`
```
$ grep CONFIG_NAMESPACES "/lib/modules/$(uname -r)/build/.config"
CONFIG_NAMESPACES=y
$ grep CONFIG_USER_NS "/lib/modules/$(uname -r)/build/.config"
CONFIG_USER_NS=y
$ grep CONFIG_SECCOMP_FILTER "/lib/modules/$(uname -r)/build/.config"
CONFIG_SECCOMP_FILTER=y
$ grep CONFIG_OVERLAY_FS "/lib/modules/$(uname -r)/build/.config"
CONFIG_OVERLAY_FS=m
```
- Same results as `/boot/config-$(uname -r)`
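
The repeated `grep` calls above can be wrapped in a small loop. This is a minimal sketch that treats both `=y` and `=m` as enabled, matching the interpretation given in this note. The helper name `check_kernel_options` and its output format are hypothetical and are not the output of `enroot-check`.

```shell
# Hypothetical helper: report whether each given option is enabled
# (=y or =m) in a kernel config file. Returns non-zero if any is missing.
check_kernel_options() {
    config=$1
    shift
    rc=0
    for opt in "$@"; do
        if grep -Eq "^${opt}=(y|m)\$" "$config"; then
            echo "$opt : OK"
        else
            echo "$opt : missing"
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   check_kernel_options "/boot/config-$(uname -r)" \
#       CONFIG_NAMESPACES CONFIG_USER_NS CONFIG_SECCOMP_FILTER CONFIG_OVERLAY_FS
```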
<br>
<hr>
<br>
## config
### `sysctl`
- ### `unprivileged_bpf_disabled`
- ### read
`$ sudo sysctl kernel.unprivileged_bpf_disabled`
- ### write
`$ sudo sysctl -w kernel.unprivileged_bpf_disabled=0`
- ### file content
```
$ cat /proc/sys/kernel/unprivileged_bpf_disabled
0
```
- ### file permissions
```
$ ll /proc/sys/kernel/unprivileged_bpf_disabled
-rw-r--r-- 1 root root 0 Jun 26 10:24 /proc/sys/kernel/unprivileged_bpf_disabled
```
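
The sysctl key and the `/proc/sys` file shown above are two views of the same setting: the dots in the key become slashes in the path. A minimal sketch of that mapping follows. The helper name `sysctl_proc_path` is hypothetical, and keys whose components themselves contain dots (for example some network interface names) are not handled.

```shell
# Hypothetical helper: map a sysctl key to its /proc/sys path by
# replacing every "." with "/".
sysctl_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_proc_path kernel.unprivileged_bpf_disabled
# prints /proc/sys/kernel/unprivileged_bpf_disabled
```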
<br>
<hr>
<br>
## Slurm Containers Guide
> https://slurm.schedmd.com/containers.html
### Prerequisites (to be confirmed)
The host kernel must be configured to allow user land containers:
```
sudo sysctl -w kernel.unprivileged_userns_clone=1
sudo sysctl -w kernel.apparmor_restrict_unprivileged_unconfined=0
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
```
<br>
<hr>
<br>
## Discussion
### Explain which items the `enroot-check_*.run` tool checks when it runs
- ### gpt-4.1-mini
Sure. Below are the main items that NVIDIA Enroot's `enroot-check` tool checks when it runs, along with what they mean and their possible impact:
---
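# Breakdown of the Enroot-Check tool's checks
1. ### Kernel Configuration
* **What is checked**: whether the kernel was built with, or has enabled, the features Enroot needs to run, such as namespace support (userns, pidns, mntns, etc.), cgroup support (especially the devices controller), and BPF support.
* **Impact**: without these features, containers cannot use Linux namespaces or control device access, so they fail to start or behave abnormally.
* **Example error**: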
```
[ERROR] Could not find kernel configuration
```
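This means the kernel configuration file could not be found or a required feature is missing.
2. ### cgroups (control groups)
* **What is checked**: the cgroup version (v1 or v2) and whether the required subsystems, such as the devices cgroup, are supported, because Enroot uses cgroups to control access to GPUs and other devices.
* **Impact**: an incomplete cgroup setup or insufficient permissions prevents Enroot from binding NVIDIA devices correctly or restricting container permissions.
3. ### BPF (Berkeley Packet Filter)
* **What is checked**: whether the system allows unprivileged users to use BPF programs, and whether BPF functionality is complete.
* **Impact**: Pyxis (the Slurm + Enroot NVIDIA plugin) uses BPF to manage device permissions; if unprivileged BPF is disabled, containers fail to start.
4. ### Namespaces
* **What is checked**: whether the user, PID, and mount namespaces are available and not disabled by a system security policy.
* **Impact**: these namespaces are the core of container isolation; without them, containers cannot be isolated correctly.
5. ### NVIDIA driver and related tools
* **What is checked**: whether the NVIDIA driver and tools such as `nvidia-container-cli` are installed at the correct versions.
* **Impact**: a driver or tool version mismatch prevents GPUs from being injected into the container.
6. ### Filesystem support
* **What is checked**: whether the system's filesystems (such as overlayfs) support the mechanisms containers need, such as whiteout files.
* **Impact**: without these mechanisms, image layering or file overwrites behave abnormally.
7. ### Permission checks
* **What is checked**: whether the current user has sufficient permissions to operate on cgroups, device nodes, and other required system resources.
* **Impact**: insufficient permissions cause container startup to fail.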
---
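# Further notes
* **Obtaining the kernel config**
Many Linux distributions do not keep a full kernel config file (such as `/proc/config.gz`) by default. If the file cannot be found, `enroot-check` reports an error, but that does not necessarily mean the kernel actually lacks the features.
You can check manually for `/boot/config-$(uname -r)` or another kernel config file that matches the running kernel version.
* **Interaction between BPF and cgroup v2**
Newer Linux systems are moving to cgroup v2, but Pyxis and Enroot still depend on cgroup v1 or specific subsystems in some cases, so mixing versions easily causes problems.
* **Possible fixes**
* If `enroot-check` reports a missing kernel config, try locating the kernel config file, or check whether an extra package (e.g. `linux-image-extra`) needs to be installed.
* Make sure the system allows unprivileged BPF (`sysctl kernel.unprivileged_bpf_disabled=0`).
* Check the cgroup driver and device permission settings.
* Verify that the NVIDIA driver and tools work properly.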
---