[Bug]: Pyxis + Enroot: Multi-layer images fail due to tmpfs whiteouts, PMIx mount errors for non-PMIx jobs, and NVIDIA hook EPERM (nvidia-persistenced/socket) — GPU unusable unless hooks disabled (OCI backend idea optional) #99

@narcotis

Description

When launching containers via Pyxis + Enroot in a Slinky/Kubernetes-managed Slurm cluster,
we encountered three issues:


1) Multi-layer images (e.g., NVCR Triton) fail during whiteout conversion

Even after setting:

ENROOT_CACHE_PATH=/enroot-tmp/cache
ENROOT_DATA_PATH=/enroot-tmp/data
ENROOT_RUNTIME_PATH=/enroot-tmp/run
ENROOT_TEMP_PATH=/enroot-tmp/tmp

in enroot.conf (/enroot-tmp is an emptyDir backed by the node's local disk),

multi-layer images still fail during AUFS → overlayfs whiteout conversion when Enroot internally falls back to /tmp:

enroot-aufs2ovlfs: failed to create opaque ovlfs whiteout:
/tmp/enroot.<id>/17/usr/src/python3.12/: Not supported

/tmp in the Slurm worker pods is an overlayfs/tmpfs, which cannot store overlay whiteouts/xattrs.
Simple images like alpine/ubuntu succeed; NVCR-based multi-layer images fail.
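
For reference, the following probe (a minimal sketch; the directory name and the user.overlay.opaque xattr are illustrative, and enroot-aufs2ovlfs may use a different xattr namespace) approximates the failing operation by checking whether a candidate temp directory can hold overlayfs whiteout metadata:

# Probe whether a directory can store overlayfs whiteout metadata.
TMP_CANDIDATE="${ENROOT_TEMP_PATH:-/tmp}"

# Filesystem type: tmpfs/overlayfs here is what leads to "Not supported".
stat -f -c 'filesystem of %n: %T' "$TMP_CANDIDATE"

# Try to set an overlay "opaque" xattr, as the whiteout conversion does.
probe=$(mktemp -d "$TMP_CANDIDATE/whiteout-probe.XXXXXX")
if setfattr -n user.overlay.opaque -v y "$probe" 2>/dev/null; then
    echo "overlay xattrs supported under $TMP_CANDIDATE"
else
    echo "overlay xattrs NOT supported under $TMP_CANDIDATE"
fi
rm -rf "$probe"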

2) PMIx hook (50-slurm-pmi.sh) fails on non-MPI jobs

Slurm only creates PMIx directories when --mpi=pmix is used:

/var/spool/slurmd/pmix.<jobid>.<stepid>
/tmp/spmix_appdir_<uid>_<jobid>.<stepid>

But Pyxis’s PMIx hook always attempts to bind-mount these directories:

enroot-mount: failed to mount: /tmp/spmix_appdir_*: No such file or directory

→ All non-MPI jobs fail unless the PMIx hook is disabled.

With --mpi=pmix, PMIx works correctly.
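
One possible mitigation (a sketch only, assuming the hook is a shell script; whether PMIX_SERVER_TMPDIR is visible in the hook's environment depends on what Pyxis passes through) would be to make 50-slurm-pmi.sh exit early when the PMIx directories were never created:

# Hypothetical guard near the top of 50-slurm-pmi.sh: skip the PMIx bind
# mounts when Slurm did not set up PMIx for this step (no --mpi=pmix).
if [ -z "${PMIX_SERVER_TMPDIR:-}" ] && \
   ! ls /tmp/spmix_appdir_* >/dev/null 2>&1; then
    # No PMIx directories exist for this job/step; nothing to mount.
    exit 0
fi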

3) NVIDIA hook (98-nvidia.sh) fails — GPU cannot be used at all

When GPU hooks are enabled:

nvidia-container-cli: mount error: mount operation failed:
/enroot-tmp/data/pyxis_<job>/run/nvidia-persistenced/socket: operation not permitted

This causes container startup to fail:

pyxis: couldn't start container
spank_pyxis.so: task_init() failed

If the hook is disabled, containers start — but no GPU is usable:

  • /dev/nvidia* devices appear, BUT:
    • nvidia-smi is not injected
    • NVIDIA driver libraries (libcuda.so, libnvidia-ml.so, etc.) are not mounted
    • torch.cuda.is_available() returns False

Because NVCR images rely on the NVIDIA Container Toolkit to inject the host GPU devices and driver stack,
disabling the hook leaves the container functional but CPU-only.
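
For completeness, these are the in-container checks behind the symptom list above (plain shell, nothing Pyxis-specific):

ls /dev/nvidia*                                  # device nodes are present
command -v nvidia-smi || echo "nvidia-smi not injected"
ldconfig -p | grep -E 'libcuda|libnvidia-ml' || echo "driver libraries not mounted"
python3 -c "import torch; print(torch.cuda.is_available())"   # prints False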

Steps to Reproduce

Environment

  • Slinky-managed Slurm in Kubernetes
  • Pyxis + Enroot 4.0.1
  • NVIDIA H100 nodes (NVRM 580.95.05, CUDA 13.0)
  • GPU plugin + NVIDIA toolkit installed
  • ENROOT directories on local NVMe (/enroot-tmp)
  • --mpi=pmix tested and verified
    (Tested with a custom Docker image built from login-pyxis:25.11-ubuntu24.04 and slurmd-pyxis:25.11-ubuntu24.04.)

1) Whiteout failure

srun --container-image <multi-layer-image> bash

2) PMIx hook failure

srun --container-image <any-image> bash # without --mpi=pmix

3) NVIDIA hook failure

srun --gpus 2 --mpi=pmix --container-image <nvcr-image> bash

Expected Behavior

  1. Enroot should honor ENROOT_TEMP_PATH fully and avoid /tmp on tmpfs/overlayfs when extracting layers.
  2. PMIx hook should only run when the job actually uses PMIx (--mpi=pmix, or detect via environment).
  3. NVIDIA hook should succeed, without EPERM errors, in injecting:
    • /dev/nvidia*
    • nvidia-smi
    • driver libraries (libcuda.so, etc.)
    • /run/nvidia-persistenced/socket

Overall expectation:

GPU-enabled containers should start correctly under Pyxis + Enroot without disabling core hooks.


Additional Context

A) tmpfs/overlayfs whiteout limitations

Pyxis still directs Enroot to use /tmp in some stages of import, even when ENROOT_TEMP_PATH is set.
On Kubernetes worker pods, /tmp is overlayfs → whiteouts fail → multi-layer images cannot be imported.

B) PMIx directory creation behavior

Slurm only generates PMIx directories for jobs using --mpi=pmix.
Hook should skip PMIx mounts for non-MPI jobs to avoid failures.

C) NVIDIA hook mount EPERM

Pyxis runs Enroot inside a user namespace; nvidia-container-cli configure requires privileged bind-mounts into the Enroot rootfs.
This fails with EPERM even when the Slurm worker pod runs with privileged: true.
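
If it helps with triage, these diagnostics can be collected on the slurmd pod when the EPERM occurs (standard toolkit/procfs commands, run as the same user the hook runs as):

# Driver/library discovery with debug output from the toolkit.
nvidia-container-cli -k -d /dev/tty info

# Effective capabilities of the process that would perform the mounts.
grep -E '^Cap(Prm|Eff|Bnd)' /proc/self/status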

D) GPU works only if hook is disabled

Disabling 98-nvidia.sh allows containers to start (including multi-layer NVCR images),
but GPU libraries are missing → CUDA unavailable inside container.


Optional Suggestion: Support OCI backend (rootless docker/podman)

Optional — not required for resolving the bug.

If an OCI backend using rootless Docker or Podman were supported (via oci.conf + AuthType=auth/slurm), several of these issues might be avoided; a rough oci.conf sketch follows the list below.
(The Slurm documentation states that AuthType=auth/munge only supports rootless Docker or Podman.)

  • OCI runtimes already handle:
    • driver injection
    • mount permissions
    • whiteout semantics
  • Would reduce the need for Pyxis to perform privileged mount operations inside user namespaces
  • Could avoid tmpfs whiteout problems entirely
  • Might eliminate the need for custom Enroot NVIDIA hooks

Not required, but may offer a clean long-term architectural path.
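
As a very rough illustration of that path, here is an oci.conf sketch modeled on the crun example in the Slurm containers documentation (untested in this cluster; the exact RunTime* pattern strings and the AuthType requirements should be verified against the installed Slurm version):

# oci.conf (sketch, based on the documented crun example)
IgnoreFileConfigJson=false
RunTimeQuery="crun --rootless=true --root=/run/user/%U/ state %n.%u.%j.%s.%t"
RunTimeKill="crun --rootless=true --root=/run/user/%U/ kill -a %n.%u.%j.%s.%t SIGTERM"
RunTimeDelete="crun --rootless=true --root=/run/user/%U/ delete --force %n.%u.%j.%s.%t"
RunTimeRun="crun --rootless=true --root=/run/user/%U/ run --bundle %b %n.%u.%j.%s.%t"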


If more logs (Pyxis debug, Enroot mount trace, NVIDIA Toolkit debug) would be helpful,
I can provide them.
