Introducing HPC Affinities to the Enterprise: A New Open Source Project Integrates Singularity and Slurm via Kubernetes
TL;DR: “slurm-operator,” a new open source project, allows workloads containerized with Singularity to be managed in one or more Slurm clusters directly from Kubernetes; the net result is a fully converged infrastructure for hybrid use cases involving a blend of services and compute-driven requirements.
We recently released an integration between Singularity, Kubernetes, and Slurm. This new open source project makes use of our existing integration between Singularity and Kubernetes via a Container Runtime Interface (CRI), and broadens its scope by incorporating Slurm – a workload manager (WLM) employed in HPC that is particularly adept at handling (for example) the Message Passing Interface (MPI) and distributed AI applications at the scale of supercomputers. The resulting convergence between services and HPC infrastructures is evident to end users whose compute-driven workloads are managed by Slurm from Kubernetes.
Use Case Example
As the following use case example illustrates, this convergence holds up from a technical perspective as well.
In the simplest of HPC use cases, an application submitted for management by Slurm is regarded as a job. Through Slurm’s Command Line Interface (CLI), the workload manager is provided with the job’s requirements – from environment variables to required resources, scheduling considerations, and much more. (See the Slurm documentation on “sbatch” and “srun” for the details.) Because the specification of job-submission options can become quite cumbersome, frequent users of WLMs like Slurm capture the details in Bourne shell scripts that can be submitted directly. On a line-by-line basis, these scripts convey execution requirements to the WLM; for example
#SBATCH --nodes=1 --cpus-per-task=1
indicates that the non-interactive (aka “batch”) job should run on a single node, with one CPU allocated to each task. (The leading “#” makes the line a comment as far as the shell is concerned, so the directive is read by Slurm rather than executed by the shell.) The outcome of running a job submission script is functionally equivalent to detailing options to “sbatch” or “srun” on the command line.
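To make the equivalence between script directives and command-line options concrete, here is a minimal Python sketch; it is not part of Slurm or of this project. It gathers the options carried by “#SBATCH” lines in a job script. The function name `sbatch_options`, the sample script, and the file name `job.sh` are illustrative assumptions, and the parsing is deliberately simplified: it mirrors only the documented behavior that “sbatch” reads directives until the first command line and treats a doubled “##SBATCH” as an ordinary comment.

```python
# Sample job script. The commented-out --time line and the late --mem line
# are hypothetical additions that show which lines count as directives.
SCRIPT = """\
#!/bin/sh
#SBATCH --nodes=1 --cpus-per-task=1
##SBATCH --time=00:10:00
srun hostname
#SBATCH --mem=1G
"""


def sbatch_options(script: str) -> list[str]:
    """Return the options found on #SBATCH lines, in order of appearance."""
    options = []
    for line in script.splitlines():
        words = line.split()
        if not words:
            continue  # blank lines do not end the directive block
        if words[0] == "#SBATCH":
            options.extend(words[1:])  # simplified: no quoting support
        elif not words[0].startswith("#"):
            break  # first real command: later directives are ignored
    return options


options = sbatch_options(SCRIPT)
# The same requirements, expressed as a command line instead of directives.
print(" ".join(["sbatch", *options, "job.sh"]))
```

Only the first directive line contributes options here; the “##SBATCH” line is skipped as a plain comment, and the “--mem” line is ignored because it follows a command.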
Slurm can execute a Singularity container directly; for example, for the locally available “lolcow_latest.sif” container,
srun singularity run lolcow_latest.sif
produces the expected whimsical output.
To leverage this new open source project, job submission specifics are captured via a YAML file; for example
apiVersion: slurm.sylabs.io/v1alpha1
kind: SlurmJob
metadata:
  name: cow
spec:
  batch: |
    #!/bin/sh
    ##SBATCH --nodes=1 --cpus-per-task=1
    srun singularity pull -U library://sylabsed/examples/lolcow
    srun singularity run lolcow_latest.sif
    srun rm lolcow_latest.sif
  nodeSelector:
    slurm.sylabs.io/containers: singularity
  results:
    mount:
      name: data
      hostPath:
        path: /home/vagrant/job-results
        type: DirectoryOrCreate
In this YAML file, you can see some of the elements we’ve described above – e.g., the “sbatch” option embedded in the script. Note also that a simple workflow has been captured: three independent “srun” invocations pull, execute, and finally remove the “lolcow_latest.sif” Singularity container. (In a subsequent revision, these invocations of “srun” could be made dependent upon each other – e.g., so that container execution only proceeds if the image was successfully retrieved from the Singularity Container Library.) The remainder of the YAML file is used to capture additional specifics for executing this job via Slurm.
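One way the dependent variant mentioned above might look is sketched here in Python. The sketch assumes that joining the steps with the shell’s “&&” operator is an acceptable way to express the dependency; it is not the project’s own approach. The helper name `chained_batch` is hypothetical, and every manifest field is copied unchanged from the YAML example. The manifest is emitted as JSON, which “kubectl apply” accepts as well as YAML.

```python
import json

# The three steps from the example workflow, unchanged.
STEPS = [
    "srun singularity pull -U library://sylabsed/examples/lolcow",
    "srun singularity run lolcow_latest.sif",
    "srun rm lolcow_latest.sif",
]


def chained_batch(steps):
    """Join the steps with && so each runs only if the previous one succeeded."""
    header = "#!/bin/sh\n##SBATCH --nodes=1 --cpus-per-task=1\n"
    return header + " &&\n".join(steps) + "\n"


# Same fields as the YAML example; only spec.batch differs.
manifest = {
    "apiVersion": "slurm.sylabs.io/v1alpha1",
    "kind": "SlurmJob",
    "metadata": {"name": "cow"},
    "spec": {
        "batch": chained_batch(STEPS),
        "nodeSelector": {"slurm.sylabs.io/containers": "singularity"},
        "results": {
            "mount": {
                "name": "data",
                "hostPath": {
                    "path": "/home/vagrant/job-results",
                    "type": "DirectoryOrCreate",
                },
            }
        },
    },
}

print(json.dumps(manifest, indent=2))
```

With this chaining, a failed pull prevents both the run and the cleanup. A failed run also skips the cleanup, leaving the image in place for inspection; whether that is desirable depends on the workflow.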
Once complete, the job is ‘submitted’ via the Kubernetes command line as follows:
$ kubectl apply -f examples/cow.yaml
If you’re already familiar with the Kubernetes “kubectl” CLI, you’ll be able to issue standard commands to determine the status of the job once the configuration has been applied; for those less familiar with this CLI, this project’s README provides the details. In other words, job status, output, and even dynamic logging of “stdout” and “stderr” during job execution are now available for your Slurm job via the Kubernetes CLI. “Your” is the operative word here, as jobs execute as the user, thus minimizing the potential for privilege-escalation vulnerabilities.
A prerelease demo of this integration was delivered by Ian Lumb during a talk he gave at the inaugural meeting of the Singularity User Group (SUG) in mid-March 2019. As you’ll note from the integration overview provided below, the project has evolved considerably from the time of Ian’s SUG talk.
The prerequisites for this integration are as follows:
- Kubernetes – a vanilla deployment based upon version 1.12 or more recent
- Singularity – owing to the need for compliance with the Open Container Initiative (OCI) runtime specification, version 3.1 of the open source Community Edition or SingularityPRO 3.1
- Singularity CRI – the recently released v1.0.0-beta.1
- Slurm-operator – the v1.0.0-alpha.1 release of this project, see below for additional details
- Slurm – an existing or planned deployment based upon version 18.08 or more recent
Note that the specific versions identified above say more about our current testing matrix than they do about ‘hard’ constraints.
“slurm-operator” is the open source contribution of this project. Developed by Sylabs’ software engineers Vadzim Pisaruk, Sasha Yakovtseva, and Cedric Clerget, this project comprises three primary components:
- Red Box – a RESTful HTTP server written in the Go programming language that serves as a proxy between the project’s “job-companion” and the Slurm cluster itself.
- Resource Daemon – a service that ensures a consistent view of resource specification as well as utilization on an ongoing basis between Kubernetes and Slurm. The project employs extended resources to specify capabilities (e.g., “the maximum number allowed of simultaneously running jobs”) not accounted for through Kubernetes node labels (e.g., devices, plugins, architectures, …). Thus capabilities unique to a specific Slurm cluster become ‘known’ to Kubernetes.
- Operator – a Custom Resource Definition (CRD) and controller that extends Kubernetes for this project’s purpose.
Again, the project’s GitHub repository provides significantly more details including, of course, the source code itself.
Finally, as far as this technical overview is concerned, an architectural schematic is particularly helpful in providing additional context for the overall integration. From this schematic it is evident that a single deployment of Kubernetes can interoperate with one or more Slurm clusters – alongside a traditional deployment for services-oriented use cases orchestrated by Kubernetes.
Next-Gen Use Cases
As is often the case, milestones such as this alpha release of a brand-new open source project serve more as a starting point than a destination. Taken at face value, Singularity-containerized applications and workflows are ultimately managed in a Slurm cluster by the workload manager, though their details and control are all mediated via Kubernetes. The immediate benefit, then, is that (preexisting) Slurm clusters become ‘consolidated’ with the enterprise infrastructure orchestrated by Kubernetes – making the convergence claim of this integration tangible and of value.
Workload managers like Slurm, however, excel at handling distributed processing. In classic HPC use cases, applications employing MPI can be scaled to the extreme on supercomputers by exploiting parallelism that embraces distributed memory. As frameworks for Deep Learning such as TensorFlow and PyTorch allow for distributed computing, either directly or by leveraging Horovod, the value proposition for workload management via Slurm rapidly multiplies. Why? HPC setups are typically predisposed towards being performant platforms for distributed computing at extreme scale – routinely featuring low-latency, high-bandwidth interconnects (e.g., InfiniBand), accelerators (e.g., GPUs), parallel file systems, and more. Through integrations such as this one, these ‘HPC affinities’ are rendered available and usable to even broader classes of use cases.
As we’ve been claiming for some time, the emergence of hybrid use cases is increasingly evident. Because these use cases involve streaming workloads, real-time analysis, and data pipelining into compute-focused services, they are inherently hybrid – and therefore demanding of the converged infrastructure enabled through this integration. To demonstrate just how tangible such hybrid use cases are in practice, Sylabs’ software engineer Carl Madison developed a demonstration that impressed attendees at DockerCon last week in San Francisco. (If you’re attending the Red Hat Summit this week in Boston, come and find us at booth #1133, as the demo will be available there as well.) While Carl’s demo doesn’t yet span Kubernetes to Slurm with multiple Singularity containers in real time, that’s definitely the direction he’s heading with it!
Finally, it’s important to note that the project announced here has emphasized Slurm as the workload manager. Workload management, by comparison to the container ecosystem, is an extremely mature area in terms of software lifecycle management – with some solutions boasting longevity of more than two decades at this point! From in-house to open source to commercial, offerings abound, and organizations can become quite polarized with respect to their preferences. As noted above, it is the Red Box RESTful HTTP server that is the linchpin for this integration. In the case of the current project, only a few endpoints needed to be supported, so we developed our own simple implementation in Go. This approach could be replicated, or use could be made of existing REST APIs when available and appropriately useful. In other words, it wouldn’t require a tremendous amount of effort to support workload managers other than Slurm.
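To illustrate why supporting another workload manager could be a modest effort, here is a purely hypothetical Python sketch of the idea; it does not reflect the actual Red Box code, which is written in Go. It assumes that each proxy endpoint only needs to translate a request into the right command line for whichever workload manager sits behind it. The names `WorkloadManager`, `Slurm`, `PBS`, `BACKENDS`, and `handle_submit` are invented for illustration, and PBS is used only as an example of an alternative; nothing is executed, since the sketch merely builds argument lists.

```python
from typing import Protocol


class WorkloadManager(Protocol):
    """Hypothetical contract that a proxy endpoint would call into."""

    def submit_command(self, script_path: str) -> list[str]: ...

    def status_command(self, job_id: str) -> list[str]: ...


class Slurm:
    def submit_command(self, script_path: str) -> list[str]:
        # --parsable makes sbatch print the job ID in a machine-readable form.
        return ["sbatch", "--parsable", script_path]

    def status_command(self, job_id: str) -> list[str]:
        # %T asks squeue for the job state only.
        return ["squeue", "--noheader", "--format=%T", f"--jobs={job_id}"]


class PBS:
    def submit_command(self, script_path: str) -> list[str]:
        return ["qsub", script_path]

    def status_command(self, job_id: str) -> list[str]:
        return ["qstat", "-f", job_id]


# Supporting another workload manager means adding one entry here.
BACKENDS: dict[str, WorkloadManager] = {"slurm": Slurm(), "pbs": PBS()}


def handle_submit(backend: str, script_path: str) -> list[str]:
    """What a hypothetical submit endpoint would hand to the operating system."""
    return BACKENDS[backend].submit_command(script_path)


print(handle_submit("slurm", "job.sh"))
print(handle_submit("pbs", "job.sh"))
```

The point of the design is that everything facing Kubernetes stays the same; only the small backend class changes from one workload manager to the next.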
Whether it’s adapting this integration to work with other workload managers, or contributing to other projects in the container ecosystem, our last word remains true: we encourage you to get involved! If it’s the Singularity ecosystem where you might like to focus your contributions, the best place to get started is here. We look forward to collaborating with you!