Commit 14897b9

feat: add way of excluding generation backends
Signed-off-by: Terry Kong <[email protected]>
1 parent 748b9ca commit 14897b9

3 files changed: +114 -38 lines changed

docker/Dockerfile

Lines changed: 41 additions & 7 deletions
@@ -1,9 +1,20 @@
 # syntax=docker/dockerfile:1
 # Usage:
-# Self-contained build (default: builds from main): docker buildx build -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
-# Self-contained build (specific git ref): docker buildx build -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push .
-# Self-contained build (remote NeMo RL source; no need for a local clone of NeMo RL): docker buildx build -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push https://github.com/NVIDIA-NeMo/RL.git
-# Local NeMo RL source override: docker buildx build --build-context nemo-rl=. -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
+# Self-contained build (default: builds from main):
+# docker buildx build -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
+#
+# Self-contained build (specific git ref):
+# docker buildx build -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push .
+#
+# Self-contained build (remote NeMo RL source; no need for a local clone of NeMo RL):
+# docker buildx build -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push https://github.com/NVIDIA-NeMo/RL.git
+#
+# Local NeMo RL source override:
+# docker buildx build --build-context nemo-rl=. -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
+#
+# Optional build args to skip vLLM or SGLang dependencies:
+# --build-arg SKIP_VLLM_BUILD=1 # Skip vLLM dependencies
+# --build-arg SKIP_SGLANG_BUILD=1 # Skip SGLang dependencies
 
 ARG BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.05-cuda12.9-devel-ubuntu24.04
 FROM scratch AS nemo-rl
@@ -84,6 +95,9 @@ ARG MAX_JOBS
 ARG NVTE_BUILD_THREADS_PER_JOB
 # Only use for custom vllm installs. Learn more at https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/use-custom-vllm.md
 ARG BUILD_CUSTOM_VLLM
+# Skip building vLLM or SGLang dependencies (set to any non-empty value to skip)
+ARG SKIP_VLLM_BUILD
+ARG SKIP_SGLANG_BUILD
 
 ENV UV_PROJECT_ENVIRONMENT=/opt/nemo_rl_venv
 ENV UV_LINK_MODE=copy
@@ -113,8 +127,12 @@ fi
 
 # The venv is symlinked to avoid bloating the layer size
 uv sync --link-mode symlink --locked --no-install-project
-uv sync --link-mode symlink --locked --extra vllm --no-install-project
-uv sync --link-mode symlink --locked --extra sglang --no-install-project
+if [[ -z "${SKIP_VLLM_BUILD:-}" ]]; then
+    uv sync --link-mode symlink --locked --extra vllm --no-install-project
+fi
+if [[ -z "${SKIP_SGLANG_BUILD:-}" ]]; then
+    uv sync --link-mode symlink --locked --extra sglang --no-install-project
+fi
 uv sync --link-mode symlink --locked --extra mcore --no-install-project
 uv sync --link-mode symlink --locked --extra automodel --no-install-project
 uv sync --link-mode symlink --locked --all-groups --no-install-project
@@ -131,6 +149,9 @@ WORKDIR /opt/nemo-rl
 
 FROM hermetic AS release
 
+# Re-declare build args for this stage
+ARG SKIP_VLLM_BUILD
+ARG SKIP_SGLANG_BUILD
 ARG NEMO_RL_COMMIT
 ARG NVIDIA_BUILD_ID
 ARG NVIDIA_BUILD_REF
@@ -151,7 +172,20 @@ COPY --from=nemo-rl --exclude=pyproject.toml --exclude=uv.lock . /opt/nemo-rl
 # Potentially not necessary if the repo is passed in as a complete repository (w/ full git history),
 # so do a quick check before trying to unshallow.
 RUN git rev-parse --is-shallow-repository | grep -q true && git fetch --unshallow || true
-RUN UV_LINK_MODE=symlink uv run nemo_rl/utils/prefetch_venvs.py
+RUN <<"EOF" bash -exu
+NEGATIVE_FILTERS=""
+if [[ -n "${SKIP_VLLM_BUILD:-}" ]]; then
+    NEGATIVE_FILTERS="$NEGATIVE_FILTERS vllm"
+fi
+if [[ -n "${SKIP_SGLANG_BUILD:-}" ]]; then
+    NEGATIVE_FILTERS="$NEGATIVE_FILTERS sglang"
+fi
+if [[ -n "$NEGATIVE_FILTERS" ]]; then
+    UV_LINK_MODE=symlink uv run nemo_rl/utils/prefetch_venvs.py --negative-filters $NEGATIVE_FILTERS
+else
+    UV_LINK_MODE=symlink uv run nemo_rl/utils/prefetch_venvs.py
+fi
+EOF
 
 # Generate container fingerprint for frozen environment support
 # Store outside /opt/nemo-rl to avoid being overwritten by user mounts
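
For readers who find the heredoc hard to follow, the release-stage decision above can be sketched in Python. This is only an illustration of the shell logic in the diff, not code from the repository: `build_prefetch_command` is a hypothetical helper, and it assumes the build args arrive as a dict of strings.

```python
def build_prefetch_command(build_args):
    """Return the prefetch command the release stage would run.

    `build_args` is a hypothetical dict of build-arg names to string values;
    unset args are simply absent from the dict.
    """
    negative_filters = []
    # Bash's `-n` test is true for ANY non-empty string, so a value such as
    # "0" or "false" still triggers the skip, exactly like "1".
    if build_args.get("SKIP_VLLM_BUILD", ""):
        negative_filters.append("vllm")
    if build_args.get("SKIP_SGLANG_BUILD", ""):
        negative_filters.append("sglang")

    command = ["uv", "run", "nemo_rl/utils/prefetch_venvs.py"]
    if negative_filters:
        # Only pass the flag when there is something to exclude.
        command += ["--negative-filters", *negative_filters]
    return command


if __name__ == "__main__":
    print(build_prefetch_command({"SKIP_VLLM_BUILD": "1"}))
```

One consequence worth noting: because the Dockerfile tests for a non-empty value, passing `--build-arg SKIP_VLLM_BUILD=0` also skips vLLM. Leave the argument unset to keep the dependency.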

docs/docker.md

Lines changed: 43 additions & 29 deletions
@@ -1,50 +1,64 @@
 # Build Docker Images
 
-This guide provides two methods for building Docker images:
+This guide explains how to build the NeMo RL Docker image.
 
-* **release**: Contains everything from the hermetic image, plus the nemo-rl source code and pre-fetched virtual environments for isolated workers.
-* **hermetic**: Includes the base image plus pre-fetched NeMo RL python packages in the `uv` cache.
+The **release** image is our recommended option as it provides the most complete environment. It includes the base image with pre-fetched NeMo RL python packages in the `uv` cache, plus the nemo-rl source code and pre-fetched virtual environments for isolated workers. This is the ideal choice for production deployments.
 
-Use the:
-* **release** (recommended): if you want to pre-fetch the NeMo RL [worker virtual environments](./design-docs/uv.md#worker-configuration) and copy in the project source code.
-* **hermetic**: if you want to pre-fetch NeMo RL python packages into the `uv` cache to eliminate the initial overhead of program start.
-
-## Release Image
-
-The release image is our recommended option as it provides the most complete environment. It includes everything from the hermetic image, plus the nemo-rl source code and pre-fetched virtual environments for isolated workers. This is the ideal choice for production deployments.
+## Building the Release Image
 
 ```sh
 # Self-contained build (default: builds from main):
-docker buildx build --target release -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
+docker buildx build -f docker/Dockerfile \
+    --tag <registry>/nemo-rl:latest \
+    --push .
 
 # Self-contained build (specific git ref):
-docker buildx build --target release -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push .
+docker buildx build -f docker/Dockerfile \
+    --build-arg NRL_GIT_REF=r0.3.0 \
+    --tag <registry>/nemo-rl:r0.3.0 \
+    --push .
 
 # Self-contained build (remote NeMo RL source; no need for a local clone of NeMo RL):
-docker buildx build --target release -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push https://github.com/NVIDIA-NeMo/RL.git
+docker buildx build -f docker/Dockerfile \
+    --build-arg NRL_GIT_REF=r0.3.0 \
+    --tag <registry>/nemo-rl:r0.3.0 \
+    --push https://github.com/NVIDIA-NeMo/RL.git
 
 # Local NeMo RL source override:
-docker buildx build --target release --build-context nemo-rl=. -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
+docker buildx build --build-context nemo-rl=. -f docker/Dockerfile \
+    --tag <registry>/nemo-rl:latest \
+    --push .
 ```
 
-**Note:** The `--tag <registry>/nemo-rl:latest --push` flags are not necessary if you just want to build locally.
+> [!NOTE]
+> The `--tag <registry>/nemo-rl:latest --push` flags are not necessary if you just want to build locally.
 
-## Hermetic Image
+## Skipping vLLM or SGLang Dependencies
 
-The hermetic image includes all Python dependencies pre-downloaded in the `uv` cache, eliminating the initial overhead of downloading packages at runtime. This is useful when you need a more predictable environment or have limited network connectivity.
+If you don't need vLLM or SGLang support, you can skip building those dependencies to reduce build time and image size. Use the `SKIP_VLLM_BUILD` and/or `SKIP_SGLANG_BUILD` build arguments:
 
 ```sh
-# Self-contained build (default: builds from main):
-docker buildx build --target hermetic -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
-
-# Self-contained build (specific git ref):
-docker buildx build --target hermetic -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push .
-
-# Self-contained build (remote NeMo RL source; no need for a local clone of NeMo RL):
-docker buildx build --target hermetic -f docker/Dockerfile --build-arg NRL_GIT_REF=r0.3.0 --tag <registry>/nemo-rl:r0.3.0 --push https://github.com/NVIDIA-NeMo/RL.git
-
-# Local NeMo RL source override:
-docker buildx build --target hermetic --build-context nemo-rl=. -f docker/Dockerfile --tag <registry>/nemo-rl:latest --push .
+# Skip vLLM dependencies:
+docker buildx build -f docker/Dockerfile \
+    --build-arg SKIP_VLLM_BUILD=1 \
+    --tag <registry>/nemo-rl:latest \
+    .
+
+# Skip SGLang dependencies:
+docker buildx build -f docker/Dockerfile \
+    --build-arg SKIP_SGLANG_BUILD=1 \
+    --tag <registry>/nemo-rl:latest \
+    .
+
+# Skip both vLLM and SGLang dependencies:
+docker buildx build -f docker/Dockerfile \
+    --build-arg SKIP_VLLM_BUILD=1 \
+    --build-arg SKIP_SGLANG_BUILD=1 \
+    --tag <registry>/nemo-rl:latest \
+    .
 ```
 
-**Note:** The `--tag <registry>/nemo-rl:latest --push` flags are not necessary if you just want to build locally.
+When these build arguments are set, the corresponding `uv sync --extra` commands are skipped, and the virtual environment prefetching will exclude actors that depend on those packages.
+
+> [!NOTE]
+> If you skip vLLM or SGLang during the build but later try to use those backends at runtime, the dependencies will be fetched and built on-demand. This may add significant setup time on first use.
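
The exclusion described in this doc change is a plain substring match on each actor's fully qualified name (FQN), applied after any positive filters. A minimal sketch of that selection rule follows. `partition_actors` and the example FQNs are hypothetical and exist only for illustration; the real script iterates its own actor registry and also creates the venvs.

```python
def partition_actors(actor_fqns, filters=None, negative_filters=None):
    """Split actor FQNs into (selected, skipped_by_filter, skipped_by_negative).

    Mirrors the order of checks in the diff: positive filters first,
    then negative filters. Matching is case-sensitive substring containment.
    """
    selected, skipped_by_filter, skipped_by_negative = [], [], []
    for fqn in actor_fqns:
        # Positive filters: keep only FQNs containing at least one filter string.
        if filters and not any(f in fqn for f in filters):
            skipped_by_filter.append(fqn)
            continue
        # Negative filters: drop FQNs containing any excluded string.
        if negative_filters and any(f in fqn for f in negative_filters):
            skipped_by_negative.append(fqn)
            continue
        selected.append(fqn)
    return selected, skipped_by_filter, skipped_by_negative


if __name__ == "__main__":
    # Hypothetical FQNs, not taken from the NeMo RL registry.
    actors = [
        "example.generation.vllm_worker.VllmWorker",
        "example.generation.sglang_worker.SglangWorker",
        "example.policy.worker.PolicyWorker",
    ]
    print(partition_actors(actors, negative_filters=["vllm", "sglang"]))
```

Because the match is on substrings, a negative filter such as `vllm` excludes every actor whose FQN mentions it anywhere, so short filter strings can exclude more than intended.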

nemo_rl/utils/prefetch_venvs.py

Lines changed: 30 additions & 2 deletions
@@ -22,20 +22,25 @@
 from nemo_rl.utils.venvs import create_local_venv
 
 
-def prefetch_venvs(filters=None):
+def prefetch_venvs(filters=None, negative_filters=None):
     """Prefetch all virtual environments that will be used by workers.
 
     Args:
         filters: List of strings to match against actor FQNs. If provided, only
             actors whose FQN contains at least one of the filter strings will
             be prefetched. If None, all venvs are prefetched.
+        negative_filters: List of strings to exclude from prefetching. Actors whose
+            FQN contains any of these strings will be skipped.
     """
     print("Prefetching virtual environments...")
     if filters:
         print(f"Filtering for: {filters}")
+    if negative_filters:
+        print(f"Excluding: {negative_filters}")
 
     # Track statistics for summary
     skipped_by_filter = []
+    skipped_by_negative_filter = []
     skipped_system_python = []
     prefetched = []
     failed = []
@@ -47,6 +52,10 @@ def prefetch_venvs(filters=None):
         if filters and not any(f in actor_fqn for f in filters):
             skipped_by_filter.append(actor_fqn)
             continue
+        # Apply negative filters if provided
+        if negative_filters and any(f in actor_fqn for f in negative_filters):
+            skipped_by_negative_filter.append(actor_fqn)
+            continue
         # Skip system python as it doesn't need a venv
         if py_executable == "python" or py_executable == sys.executable:
             print(f"Skipping {actor_fqn} (uses system Python)")
@@ -88,6 +97,10 @@ def prefetch_venvs(filters=None):
         print(f" Skipped (filtered out): {len(skipped_by_filter)}")
         for actor_fqn in skipped_by_filter:
             print(f" - {actor_fqn}")
+    if negative_filters:
+        print(f" Skipped (negative filter): {len(skipped_by_negative_filter)}")
+        for actor_fqn in skipped_by_negative_filter:
+            print(f" - {actor_fqn}")
     if failed:
         print(f" Failed: {len(failed)}")
         for actor_fqn in failed:
@@ -202,6 +215,12 @@ def create_frozen_environment_symlinks(venv_configs):
 
 # Prefetch multiple specific venvs
 python -m nemo_rl.utils.prefetch_venvs vllm policy environment
+
+# Prefetch all venvs except vLLM-related ones
+python -m nemo_rl.utils.prefetch_venvs --negative-filters vllm
+
+# Prefetch all venvs except vLLM and SGLang
+python -m nemo_rl.utils.prefetch_venvs --negative-filters vllm sglang
 """,
     )
     parser.add_argument(
@@ -211,6 +230,15 @@ def create_frozen_environment_symlinks(venv_configs):
         "contains at least one of these strings will be prefetched. "
         "If not provided, all venvs are prefetched.",
     )
+    parser.add_argument(
+        "--negative-filters",
+        nargs="*",
+        help="Filter strings to exclude from prefetching. Actors whose FQN "
+        "contains any of these strings will be skipped.",
+    )
     args = parser.parse_args()
 
-    prefetch_venvs(filters=args.filters if args.filters else None)
+    prefetch_venvs(
+        filters=args.filters if args.filters else None,
+        negative_filters=args.negative_filters if args.negative_filters else None,
+    )
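
To see how the new flag interacts with the existing filters on the command line, here is a small standalone `argparse` sketch. It is an approximation, not the script's actual parser: the positional `filters` argument with `nargs="*"` is an assumption inferred from the usage examples in the epilog (its definition is not shown in this diff), and `parse_filters` is a hypothetical helper.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="prefetch_venvs")
    # Assumed shape of the existing positional argument (not shown in the diff).
    parser.add_argument("filters", nargs="*")
    # The option added by this commit.
    parser.add_argument("--negative-filters", nargs="*")
    return parser


def parse_filters(argv):
    """Return (filters, negative_filters), normalizing empty lists to None."""
    args = build_parser().parse_args(argv)
    # An omitted flag yields None and a bare `--negative-filters` yields [];
    # both are collapsed to None, as in the call to prefetch_venvs above.
    filters = args.filters if args.filters else None
    negative_filters = args.negative_filters if args.negative_filters else None
    return filters, negative_filters


if __name__ == "__main__":
    print(parse_filters(["policy", "--negative-filters", "vllm", "sglang"]))
```

Under this assumed parser, `--negative-filters` consumes every value that follows it, so positive filter strings need to come before the flag.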
