NVIDIA GPU support in SingularityCE 3.9.0

By Staff

Oct 25, 2021 | Blog, News

Recently we announced the first SingularityCE 3.9.0 release candidate, and gave an update on community engagement in the project. Over the next weeks, leading up to the stable release, we’ll be exploring more features in depth.

One of the benefits of SingularityCE has always been easy access to GPUs and other devices. Because of our focus on shared systems, and HPC in particular, we’ve aimed to make it easy to use GPUs from inside the container within your batch jobs and interactive workflows. GPU devices are available to users in containers, just as they are on the host. The --nv option makes required libraries and utilities available to your containerized workload, so that CUDA applications run easily.
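
For example, running nvidia-smi from inside a container is typically as simple as adding the flag (the container name here is illustrative):

$ # Run nvidia-smi inside the container, using the host GPU driver
$ singularity exec --nv mycontainer.sif nvidia-smi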

Up until SingularityCE 3.9.0, we’ve used our own code to set up GPU devices and libraries in the container. This has the advantage of requiring no external tools, but means SingularityCE itself needs to keep pace with additions to the set of CUDA libraries. GPU configuration also works differently than with OCI runtimes such as Docker.

We’ve recently added experimental support to set up GPUs in the container using NVIDIA’s nvidia-container-cli. This tool, which is part of the libnvidia-container project, is used in the nvidia-docker runtime, and by other OCI container platforms, to support CUDA compute and GPU graphics. As well as tracking driver and CUDA updates, nvidia-container-cli brings additional benefits. With large GPU nodes containing multiple GPU cards becoming more common, there’s a frequent need to run multiple containerized jobs on the same host, with each limited to a subset of the GPUs.
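
If you want a quick sanity check of what nvidia-container-cli can see on your host, the tool’s own subcommands are a useful starting point (a sketch, assuming libnvidia-container is installed):

$ # Show driver and device information known to libnvidia-container
$ nvidia-container-cli info
$ # List the devices, libraries and binaries it would make available
$ nvidia-container-cli list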

SingularityCE’s legacy GPU support doesn’t allow strict limits on GPU access. All devices are presented to every container, and containers must obey the CUDA_VISIBLE_DEVICES environment variable in order to distribute work to specific GPUs. nvidia-container-cli is able to manage which devices are bound into the container at setup time, making strict controls on GPU usage per container possible.
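
As a rough sketch of the legacy behaviour, the best you can do is ask well-behaved CUDA applications to restrict themselves, for example by passing CUDA_VISIBLE_DEVICES through SingularityCE’s environment prefix:

$ # Legacy --nv: every GPU is still visible in the container; we only ask
$ # the CUDA runtime to use GPU 0
$ SINGULARITYENV_CUDA_VISIBLE_DEVICES=0 singularity run --nv mycontainer.sif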

To use the new GPU support, you must have nvidia-container-cli installed on your system. Then, add the --nvccli flag wherever you use --nv. We’ve written the nvccli functionality so that it behaves similarly to the existing approach by default: all GPUs are made available in the container, along with the CUDA compute libraries and utility programs such as nvidia-smi.
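
A minimal example, assuming nvidia-container-cli is on the PATH and mycontainer.sif contains a CUDA workload:

$ # Same default behaviour as --nv, but set up via nvidia-container-cli
$ singularity exec --nv --nvccli mycontainer.sif nvidia-smi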

More powerful control of GPUs is available when you use the --contain option in conjunction with --nvccli. SingularityCE will then look at the NVIDIA_VISIBLE_DEVICES environment variable, and only make specified GPUs available in the container:

$ export NVIDIA_VISIBLE_DEVICES=1,2
$ singularity run --contain --nv --nvccli mycontainer.sif

You can refer to GPU devices by index or UUID. If you are working on an A100 system with virtual MIG devices, these can also be assigned to containers.
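
For instance, you could select a single card by UUID rather than index (the UUID below is a placeholder; nvidia-smi -L lists the real values on your host):

$ # List GPU (and MIG) device UUIDs on the host
$ nvidia-smi -L
$ # Bind only the selected device into the container
$ export NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
$ singularity run --contain --nv --nvccli mycontainer.sif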

Other environment variables control different aspects of GPU setup. For example, you might set the NVIDIA_DRIVER_CAPABILITIES environment variable to request graphics support, which will bind OpenGL/Vulkan libraries into the container.
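
For example, to request graphics support alongside the default compute and utility capabilities (a sketch; the exact libraries bound depend on your host driver installation):

$ # Ask for OpenGL/Vulkan support in addition to CUDA compute
$ export NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility
$ singularity run --contain --nv --nvccli mycontainer.sif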

For now, --nvccli is experimental rather than the default. It doesn’t yet integrate with the --fakeroot option in setuid installs. Also, there are some rough edges with library discovery on specific Linux distributions, which are being tackled in the upstream project. We hope people who need the additional functionality it provides will try it out, and let us know what tweaks would be useful as we consider whether to use nvccli by default in SingularityCE 4.0.

Check back next week for a look at more of the improvements coming in SingularityCE 3.9.0. As always, please let us know what you need for your workflows via our open SingularityCE roadmap, or our community channels.
