## Overview

Users can compile their own software via the
[Command Line Interface](../overview.md) (CLI). This is helpful if users need
to run a specific version of an application that is not installed "globally".
The globally installed applications are currently distributed as
Apptainer[^1] (Singularity[^2]) containers, bundled with all required
dependencies. This ensures that each application is isolated and avoids
dependency conflicts.

When planning to run an application that is not installed in
our cluster, we encourage packaging code and its dependencies as an
Apptainer/<wbr />Singularity container. Existing Docker images
can be converted into Apptainer/<wbr />Singularity images.

## Using Sandbox mode

Apptainer's sandbox mode is helpful for testing and fine-tuning the build steps
interactively. To start it, first initialize a sandbox with the `--sandbox` or `-s`
flag:

```bash
apptainer build --sandbox qe_sandbox/ docker://almalinux:9
```

This extracts the container filesystem from the AlmaLinux 9 Docker image to a
subdirectory named `qe_sandbox`.

Now, to install packages and save them to the sandbox folder, we can enter
the container in shell (interactive) mode with write permission (use the
`--writable` or `-w` flag). We will also need the `--fakeroot` or `-f` flag to
install software as root inside the container:

```bash
apptainer shell --writable --fakeroot qe_sandbox/
```

Once you are happy with the sandbox, have tested the build steps, and installed
everything you need, `exit` from the Apptainer shell mode.

## Building containers

### Build from a Sandbox folder

We may either package the sandbox directory into a final image:
```bash
apptainer build espresso.sif qe_sandbox/
```

Alternatively, we can write a definition file (e.g., `espresso.def`) that
describes how to build the application along with its dependencies:

4. Set runtime environment variables
5. Build routine, under the `post` section

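As a rough skeleton of such a definition file (the base image, package names, and build commands below are illustrative assumptions, not the exact recipe used on our clusters), the runtime environment variables go under `%environment` and the build routine under `%post`:

```
Bootstrap: docker
From: almalinux:9

%environment
    # 4. Runtime environment variables (illustrative)
    export PATH=/opt/qe/bin:$PATH

%post
    # 5. Build routine (package names and build flags are illustrative)
    dnf install -y gcc-gfortran make wget openmpi-devel fftw-devel
    # ... download and unpack the application sources, then, e.g.:
    # ./configure --prefix=/opt/qe && make pw && make install
```
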
Now we are ready to build the container with:
```bash
apptainer build espresso.sif espresso.def
```

### Build Considerations

#### Running resource-intensive builds in batch mode
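
Since a compilation-heavy build can be too demanding for a login node, the build
command itself can be submitted as a batch job. A minimal sketch of such a job
script follows, assuming a PBS-style scheduler; the queue name and resource
requests are placeholders to adapt for your cluster:

```bash
#!/bin/bash
# Illustrative batch-mode build script (PBS-style directives assumed).
#PBS -N apptainer-build
#PBS -q <queue-name>
#PBS -l nodes=1:ppn=4
#PBS -l walltime=02:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
apptainer build espresso.sif espresso.def
```
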
#### Porting large libraries from the host

Large libraries such as the Intel oneAPI suite and the NVIDIA HPC SDK, which are
several gigabytes in size, can be mapped from the cluster host instead of
being bundled with the application. However, this is not applicable if one
needs a different version of these libraries than the one provided.

This can be done by using the `--bind` directive and passing the appropriate
library location from the host, e.g., from
`/cluster-001-share/compute/software/libraries` or
`/export/compute/software/libraries/`.

See the GPU example below for more details.

#### Building containers with GPU support

To run applications with GPU acceleration, we first need to compile the GPU
code against the appropriate GPU libraries, which is done during the container
build phase. Here, we describe how to compile application code using the
NVIDIA HPC SDK (which includes the CUDA libraries) and package the compiled
code as a containerized application.

The process works even on systems without GPU devices or drivers,
thanks to the availability of dummy shared objects (e.g.,
`libcuda.so`) in recent versions of the NVHPC SDK and CUDA Toolkit. These dummy
libraries allow linking to complete without requiring an actual GPU.

The NVIDIA HPC SDK (or CUDA Toolkit) is a large package,
typically several gigabytes in size. Unless a specific version of CUDA is
required, it is more efficient to map the NVHPC installation available on
the host cluster. Currently, NVHPC 25.3 with CUDA 12.8 is installed in the
Mat3ra clusters. This version matches the NVIDIA driver version on the cluster's
compute nodes.

We build our GPU containers in two stages:

1. **Base Image and Compilation Stage**: Install NVHPC and all other
dependencies, and compile the application code.
2. **Slim Production Image**: Create a final production container by copying
only the compiled application and smaller dependencies (if any) into a new base
image, omitting the NVHPC SDK.
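
A rough sketch of such a two-stage definition file is shown below; the NVHPC
base image tag, installation paths, and build commands are illustrative
assumptions:

```
Bootstrap: docker
From: nvcr.io/nvidia/nvhpc:25.3-devel-cuda12.8-ubuntu22.04
Stage: build

%post
    # Stage 1: compile the application with the NVHPC/CUDA toolchain
    # (configure flags and install prefix are illustrative)
    # ./configure --enable-cuda --prefix=/opt/app && make && make install

Bootstrap: docker
From: almalinux:9
Stage: final

# Stage 2: copy only the compiled application into a slim production image
%files from build
    /opt/app /opt/app

%environment
    export PATH=/opt/app/bin:$PATH
```

The final image stays small because the multi-gigabyte SDK lives only in the
discarded build stage.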

To run such a container, we must `--bind` the NVHPC paths from the host and set
the appropriate `PATH` and `LD_LIBRARY_PATH` for Apptainer. Specialized software
libraries are installed under `/export/compute/software` in Mat3ra clusters.
Also, to map the NVIDIA GPU drivers from the compute node, we must use the
`--nv` flag. To set `PATH` inside Apptainer, we can set
`APPTAINERENV_PREPEND_PATH` (or `APPTAINERENV_APPEND_PATH`) on the host.
For other environment variables, such specialized Apptainer variables do not
exist, so we use the `APPTAINERENV_` prefix for them. A typical job script
would therefore look like:

```bash
export APPTAINERENV_PREPEND_PATH="/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hcoll/bin:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/bin:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ucx/mt/bin:/export/compute/software/compilers/gcc/11.2.0/bin"

export APPTAINERENV_LD_LIBRARY_PATH="/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hcoll/lib:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/nccl_rdma_sharp_plugin/lib:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/sharp/lib:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ucx/mt/lib:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ucx/mt/lib/ucx:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/comm_libs/12.8/nccl/lib:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/compilers/lib:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/cuda/12.8/lib64:/export/compute/software/libraries/nvhpc-25.3-cuda-12.8/Linux_x86_64/25.3/math_libs/12.8/lib64:/export/compute/software/compilers/gcc/11.2.0/lib64:${LD_LIBRARY_PATH}"

apptainer exec --nv --bind /export,/cluster-001-share <path-to-image.sif> pw.x -in pw.in > pw.out
```

For details about the library paths, one may inspect the modulefiles (e.g.,
`/cluster-001-share/compute/modulefiles/applications/espresso/7.4.1-cuda-12.8`)
available in our clusters and the
[job scripts](https://github.com/Exabyte-io/cli-job-examples/blob/main/espresso/gpu/job.gpu.pbs)
to see how this is implemented. Do not forget to use a GPU-enabled queue,
such as [GOF](../../infrastructure/clusters/google.md), to submit your GPU jobs.

## Run jobs using Apptainer

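As a minimal illustration (the image and input file names follow the Quantum
ESPRESSO example above), executing a program from a container image looks like:

```bash
apptainer exec espresso.sif pw.x -in pw.in > pw.out
```

Within a job script, combine this with the `--bind` and `--nv` flags discussed
above when host libraries or GPUs are needed.
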
You can build containers on your local machine or pull pre-built ones from
sources such as [NVIDIA GPU Cloud](https://catalog.ngc.nvidia.com/orgs/hpc/containers/quantum_espresso).

If the container is built locally, you can push the image to a container
registry such as the
[GitHub Container Registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry).

```bash
apptainer push espresso.sif oras://ghcr.io/<user-or-org-name>/<namespace>/<container-name>:<tag>
```

On the cluster, the image can then be pulled with:

```bash
apptainer pull oras://ghcr.io/<user-or-org-name>/<namespace>/<container-name>:<tag>
```

!!! tip
    - You may use a GitHub workflow to build images and push them to GHCR.
    - When pulling a Docker image, Apptainer will automatically convert and save
      it as a SIF file.

Alternatively, you can copy the local image file directly to the cluster
via SCP:
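
For example (the login-node hostname and destination path are placeholders):

```bash
scp espresso.sif <username>@<cluster-login-node>:<destination-path>/
```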