QA and Stability in Singularity

Oct 26, 2022 | Blog, News

There are many different approaches to building software. At one end of the spectrum is the extreme caution and conservatism appropriate for safety-critical code, such as that used in vehicles or in real-time operating systems. At the other end, ‘move fast and break things’ is taken too far for the comfort of the people and infrastructure that rely on the software. Sylabs release management focuses on operational stability for users with active workflows, and we prioritize workflow stability as we iterate on and build each new release.

Container runtimes like Singularity are foundational tools in HPC and AI/ML workflows. For some users, the tool becomes more useful when features are added rapidly, for example to leverage new functionality in the Linux kernel. At the same time, new features can have wide-reaching implications: their side effects may disrupt users who have a critical need for stability and reproducibility.

As we move toward Singularity 4.0, we’re working carefully to introduce significant new OCI functionality and improved unprivileged container execution without disrupting existing usage. When new features are being considered, or existing code needs to change, the guiding questions we ask include, but are not limited to:

  • Will this make containers run differently, from a user’s perspective?
  • Are the changes easy to understand, and is there a clear benefit?
  • Should we make this change optional, or the default?
  • Is this compatible with the older systems that are still widely deployed in the field?
  • Does this affect interactions with HPC frameworks (such as MPI) or hardware (such as InfiniBand networking or GPUs)?
  • If there’s an unavoidable impact on users, how and when should we communicate that?

Often, the answers to these questions come out of the extensive QA work performed at Sylabs for SingularityPRO. Because the open-source SingularityCE codebase forms the basis of our long-term supported PRO releases, we routinely perform QA that isn’t easily possible in open CI/CD environments, such as:

  • Frequent testing across all currently supported versions of RHEL, Ubuntu LTS, and SLES, including their native kernels (not Dockerized tests).
  • Running builds and tests across AMD64, ARM64, and POWER architectures.
  • Executing multi-node MPI jobs across large InfiniBand-connected systems that closely emulate deployed HPC clusters.
  • Testing on production-grade GPU hardware.

At other times, the answers come from discussion with the SingularityCE community and our SingularityPRO customers. We’re always eager to hear what’s important to users, so that we can prioritize features that make a real difference while protecting critical existing behavior.

We want to highlight both the consistency of our release schedule and the stability of each release. This methodology continues to differentiate us from competing offerings and distinguishes Sylabs as a leader in the container ecosystem, for both open source and professional offerings. Singularity’s feature development originates from user feedback and from observing trends and needs within the container ecosystem. And, of course, the lessons and experience gained here also shape the testing and QA procedures for all of Sylabs’ software tools, including Singularity Enterprise and our SaaS offering, Singularity Container Services.

As the longest-running provider of enterprise-level support, professional services, and value-added tooling for Singularity, Sylabs will always favor the stability of existing usage. This level of consistency has been shaped by our history and by our interactions with our customers and community. Virtually all of them use high-performance computing systems to run vital data science, artificial intelligence, and compute-driven analytics workloads, and we believe this is the only approach that best serves their needs.

It takes years to build a consistent, stable set of tools and procedures for delivering quality software builds. This is the investment Sylabs makes, and will continue to make, in our products and in the company’s future.
