Introducing CDI Support to SingularityCE 4.0

By Staff
What is CDI?
The Container Device Interface (CDI) is a CNCF-sponsored project under active development. It aims to define a standard for injecting devices into containerised environments, and introduces the concept of a Device as a named resource that can be requested by a client. A Device is uniquely identified by its fully-qualified name, which has the form:
vendor.com/class=name
For example:
nvidia.com/gpu=GPU-79a2ba02-a537-ccbf-2965-8e9d90c0bd54
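As a rough illustration of how the fully-qualified name breaks down, the sketch below splits a name into its vendor, class, and device name using plain shell parameter expansion. The function parse_cdi_name is a hypothetical helper written for this post, not part of the CDI tooling or SingularityCE, and it only checks the basic vendor/class=name shape rather than the full naming rules of the CDI specification.

```shell
# parse_cdi_name: hypothetical helper (an assumption for illustration only).
# Splits a fully-qualified CDI device name of the form vendor.com/class=name
# and prints "vendor class name" on one line.
parse_cdi_name() {
    fqn="$1"

    # Require a "/" followed later by an "=".
    case "$fqn" in
        */*=*) ;;
        *) echo "not a fully-qualified CDI device name: $fqn" >&2; return 1 ;;
    esac

    kind="${fqn%%=*}"      # everything before the first "=", e.g. nvidia.com/gpu
    name="${fqn#*=}"       # everything after the first "=",  e.g. 0
    vendor="${kind%%/*}"   # everything before the first "/", e.g. nvidia.com
    class="${kind#*/}"     # everything after the first "/",  e.g. gpu

    # Reject names with an empty component, such as "/gpu=0" or "nvidia.com/gpu=".
    if [ -z "$vendor" ] || [ -z "$class" ] || [ -z "$name" ]; then
        echo "empty component in CDI device name: $fqn" >&2
        return 1
    fi

    printf '%s %s %s\n' "$vendor" "$class" "$name"
}

parse_cdi_name "nvidia.com/gpu=0"
```

A device can be named by index, as here, or by UUID, as in the example above; in both cases only the part after the = changes.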
runc or crun to make these available.
- Container engines remain vendor-agnostic – a CDI-enabled container engine will support CDI devices for any vendor, as long as a valid CDI specification has been generated. The CDI project includes an API to perform specification validation and OCI Runtime Specification modification, making it relatively straightforward for new engines to enable CDI. From a vendor perspective, this also means that little to no work is required to allow customers to access devices in new CDI-enabled container engines.
- Modifications are easier to reason about – the entities defined in a CDI specification map to common command line options or Kubernetes pod spec values, making the resulting modifications clearer. The one exception is hooks, which remain somewhat opaque in terms of the actual modifications they make. Here, hook command lines that name the specific operation, such as updating the ldcache or creating a symlink, provide some insight.
- CDI specification generation is separate from the container lifecycle – a CDI-enabled container engine only needs to access the CDI specification once when creating a container. As long as the specification is valid at the point of creation, it doesn’t matter when it was generated. This allows for the CDI specification to be adjusted to a specific use case or platform. For example, in the case of an installation where devices remain static, a CDI specification can be generated once and reused for multiple containers. In cases where dynamic resources are required, the specification for a particular container can be generated “just-in-time” for it to be created.
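To make these points more concrete, here is a minimal sketch of the static case: a small hand-written CDI specification is created once, and the devices it defines are then listed by fully-qualified name. Everything in it is an assumption for illustration. The vendor.com/device kind, the myDevice name, /dev/card1, and EXAMPLE_VAR are placeholders, the spec is written to a scratch directory rather than /etc/cdi, and list_cdi_devices is a hypothetical helper that scans lines naively instead of parsing YAML properly.

```shell
# Write a hypothetical, hand-made CDI spec to a scratch directory.
spec_dir="$(mktemp -d)"
cat > "$spec_dir/example.yaml" <<'EOF'
cdiVersion: "0.5.0"
kind: vendor.com/device
devices:
  - name: myDevice
    containerEdits:
      deviceNodes:
        - path: /dev/card1
      env:
        - EXAMPLE_VAR=1
EOF

# list_cdi_devices: hypothetical helper (illustration only).
# Remembers the spec's "kind" and prints kind=name for every "- name:" entry.
# This line-based scan is only adequate for simple specs laid out like the one above.
list_cdi_devices() {
    awk '
        $1 == "kind:"              { kind = $2 }
        $1 == "-" && $2 == "name:" { print kind "=" $3 }
    ' "$1"
}

list_cdi_devices "$spec_dir/example.yaml"
```

The entries under containerEdits read much like familiar container options: a device node to expose and an environment variable to set. Because the file is only read when a container is created, a spec like this can be written once and reused for as many containers as needed.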
OCI Support In SingularityCE
SingularityCE, a popular open-source container runtime for HPC workloads led by Sylabs, has long supported running GPU-enabled Docker/OCI containers on HPC systems that contain NVIDIA hardware. A large number of containers distributed through NVIDIA’s NGC Catalog directly support execution with Singularity.
GPU Support In SingularityCE
To date, SingularityCE has provided support for using NVIDIA GPUs in containers in two ways:
- A naive approach that binds specific NVIDIA libraries and binaries from the host into the container. This doesn’t require external software, and generally works on all systems. The same basic code can also be used for other vendors’ devices. However, it lacks support for the more complex configurations that NVIDIA’s own tooling provides.
- A method that leverages nvidia-container-cli, from the NVIDIA Container Toolkit, to perform container setup. This allows more complex GPU configurations in the container, and supports additional environments such as GPU-enabled WSL2 on Windows. However, the approach is NVIDIA-specific, and it means that Singularity depends on external software that may change outside the constraints of a public standard, and that must be provided and maintained by system administrators.
The CDI support introduced in SingularityCE 4.0 addresses these limitations, offering:
- Complete implementation of CDI GPU configurations that NVIDIA’s container toolkit can specify.
- Compatibility and consistency with other container runtimes that support CDI.
- Vendor neutrality – any vendor’s device for which a CDI configuration can be provided can be used in the same way.
Demo – PyTorch
We’ll perform some AI model training on AWS EC2 using the NVIDIA GPU-Optimized AMI. This image runs Ubuntu 20.04 and suggests a p3.2xlarge instance, which provides a 16GiB V100 GPU by default. A pre-release version of SingularityCE 4 was installed from a .deb package. Both .deb and .rpm packages are available on the GitHub releases page.
$ nvidia-modprobe -u
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
…
INFO[0000] Generated CDI spec with version 0.5.0
$ sudo chmod 755 /etc/cdi
The generated specification exposes the GPU as the CDI device nvidia.com/gpu=0.
Let’s check that it’s visible inside a container run with the --device option:
$ singularity exec --oci --device nvidia.com/gpu=0 \
docker://ubuntu nvidia-smi --list-gpus
INFO: Using cached OCI-SIF image
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-06505224-0c9b-8a89-1ee9-a4bd7d4e15c0)
$ singularity pull --oci docker://nvcr.io/nvidia/pytorch:22.05-py3
…
INFO: Converting OCI image to OCI-SIF format
INFO: Squashing image to single layer
INFO: Writing OCI-SIF image
INFO: Cleaning up.
The MNIST training example sits at a fixed path inside the container, so we use the --cwd flag to run it as python main.py from the correct place:
$ export EXAMPLE_DIR=/opt/pytorch/examples/upstream/mnist
$ singularity exec --oci --device nvidia.com/gpu=0 --cwd $EXAMPLE_DIR \
pytorch_22.05-py3.sif python main.py
…
Train Epoch: 1 [0/60000 (0%)] Loss: 2.305988
Train Epoch: 1 [640/60000 (1%)] Loss: 1.823968
Train Epoch: 1 [1280/60000 (2%)] Loss: 1.573821
Train Epoch: 1 [1920/60000 (3%)] Loss: 1.145820
Train Epoch: 1 [2560/60000 (4%)] Loss: 1.572196
Train Epoch: 1 [3200/60000 (5%)] Loss: 1.592174
…
Development / Collaboration
The HPC Containers Advisory Meeting is a forum for technical discussions across the wider containers and orchestration community, seeking to identify gaps and opportunities and to drive technical solutions forward. CDI discussions started there in September 2020, with Renaud Gaubert. Evan Lezar drove that work forward significantly and presented it again in more detail, both there and to the Singularity Community, in May 2023.
Resources
- NGC – https://catalog.ngc.nvidia.com
- CDI repo – https://github.com/container-orchestrated-devices/container-device-interface
- Singularity user guide for CDI – https://docs.sylabs.io/guides/latest/user-guide/oci_runtime.html#container-device-interface-cdi