<style>
.reveal { font-size: 18px; }
.reveal pre { font-size: 20px; }
.reveal section p { text-align: left; font-size: 18px; line-height: 1.2em; vertical-align: top; }
.reveal section figcaption { text-align: center; font-size: 20px; line-height: 1.2em; vertical-align: top; }
.reveal section h1 { font-size: 26px; vertical-align: top; }
.reveal section h2 { font-size: 24px; line-height: 1.2em; vertical-align: top; }
.reveal section h3 { font-size: 22px; line-height: 1.2em; vertical-align: top; }
.reveal ul { display: block; }
.reveal ol { display: block; }
</style>

![](https://hackmd.io/_uploads/Hy0aurcl6.png)

# Part 1: Elevating Scientific Computing with Singularity Containers

Ivan E. Cao-Berg
Research Software Specialist
Pittsburgh Supercomputing Center
Carnegie Mellon University

---

## Meet the Team and Introductions

<img src="https://hackmd.io/_uploads/SyxxS39gp.png" width="50%" />

---

## Before we begin

- :warning: Have an issue or question?
  - Feel free to ask during the presentation, on chat or Slack
  - Send an email to the Help Desk `help@psc.edu` after the workshop
- :computer: What is the project charge ID?
  - `cis230059p`
- :computer: What is the reservation name?
  - `workshop`
- :computer: Where can I find the code and data?
  - The code and data are located in `/ocean/projects/cis230059p/shared`
  - The code can be found in this [repo](https://github.com/pscedu/workflow-examples)
- :computer: Where do I save my output?
  - You can save your output in `/ocean/projects/cis230059p/$(whoami)`.
- :computer: Where can I find the docs?
  - You can find the documentation [here](https://hackmd.io/@icaoberg/Ske8b00oh).

---

## Before we begin (cont.)

- :warning: Have an issue or question during the workshop?
  - Raise your hand on Zoom.
  - Message us on Slack, or ask questions during the presentation and the hands-on session.

---

## Resources available during this experience

* 30 regular-memory compute nodes that can be accessed using SLURM from the partition named `RM-shared` and reservation `workshop`.
* If you do not wish to install software, you can use Open OnDemand to connect to Bridges 2 using the link `http://ondemand.bridges2.psc.edu`
* To connect to Bridges 2, follow the official [documentation](https://www.psc.edu/resources/bridges-2/user-guide/#:~:text=Using%20your%20ssh%20client%2C%20connect,username%20and%20password%20when%20prompted).

---

## What to expect

* A gentle introduction to workflow management systems.
* Instructions on how to set up your user account for NextFlow, Snakemake and CWL-runner.
* Inspect and run some simple examples to get you started.
* This presentation is given from the perspective of a basic power user, so take some of my statements with a grain of salt: some things might be doable with the support of PSC engineers.
* We will monitor the Slack workspace for a week after the workshop for any questions or concerns.
* The presentations, documentation and video recording will be made available.

---

## Motivation for this workshop

![](https://hackmd.io/_uploads/HknJ9mHxp.png)

----

## Motivation for this workshop (cont.)

- FAIR principles are used in data management and stewardship.
- FAIR stands for **Findable**, **Accessible**, **Interoperable**, and **Reusable**.
- Generally, FAIR principles are applied to data and metadata.
- FAIR principles are crucial for advancing data-driven research and innovation.
- Implementing FAIR practices enhances the overall quality and impact of scientific work.
- **A commitment to FAIR principles contributes to a more open, collaborative, and reproducible research ecosystem.**

---

## Motivation for this workshop (cont.)

![](https://hackmd.io/_uploads/H1m7hGclp.png)

---

## Containerization in Computing

A **container** is a lightweight, standalone software package that encapsulates everything needed to run an application, including code, runtime, libraries, and settings.

---

## Why is container technology popular?

**1. Isolation**

- *Lightweight:* Containers are lighter than virtual machines.
- *Isolation:* Each container isolates its application and dependencies.

**2. Portability**

- *Consistency:* Containers run consistently across environments.
- *Platform-agnostic:* Containers run on various platforms.

**3. Efficiency**

- *Resource Efficiency:* Containers share the host OS kernel.
- *Fast Start-up and Scaling:* Containers start quickly and scale easily.

**4. Flexibility**

- *Polyglot Environments:* Supports multiple programming languages.

---

**5. Resource Utilization**

- *Optimized Resource Utilization:* Containers efficiently use resources.
- *Density:* Many containers can run on a single host.

**6. Security**

- *Isolation:* Containers limit the impact of security breaches.
- *Immutable Infrastructure:* Containers, with immutable infrastructure, enhance security.

**7. Community and Ecosystem**

- *Open Source Ecosystem:* Strong open-source communities.
- *Standardization:* Containers are a standard unit of deployment.

---

## My biased opinion about containers

* Users do not have to wait for an engineer to install a tool system-wide
* Users can install non-traditional applications, such as editors, utilities and more, in their user space.
* Users can deploy applications that may not be built using the toolkits available on Bridges 2
* Users can easily deploy applications that are no longer supported, outdated or deprecated

---

## What is Docker?

* [Docker](https://www.docker.com/) is a popular **containerization platform** that simplifies the process of creating, deploying, and managing containers.
* While Docker is very popular, most HPC clusters do not support Docker out of the box :no_entry:.
* [Docker Hub](https://hub.docker.com/) is a **cloud-based registry** provided by Docker that serves as a centralized platform for managing and distributing Docker containers.
* **uDocker** is a user-level tool designed to enable the execution of Docker containers without requiring escalated privileges. It serves as a user-space replacement for Docker in scenarios where running Docker itself is not possible due to limitations such as the lack of root access. Note that it does not work with every container.

---

## What is Singularity?

* [Singularity](https://sylabs.io/singularity/) is an open-source container platform designed for high-performance computing (HPC) and scientific workloads.
* Singularity is designed for high compatibility with various Linux distributions and HPC environments.
* Singularity is relatively easy to use, especially for users familiar with containerization concepts.
* Singularity containers generally introduce minimal overhead, making them suitable for high-performance computing tasks.
* Singularity facilitates reproducibility by encapsulating the entire software stack and dependencies within containers.
* Singularity can convert Docker images, enhancing the usability of existing containerized applications.
* Singularity is well-suited for scientific workflows, particularly in research and data analysis.

---

## Limitations

* Even though most software can be containerized, there are many pieces of software that will not work properly due to their implementation.
* For example, software that needs to write temporary files inside the container may fail, because Singularity images are read-only by default.
* Some microservices can be deployed with Singularity; however, orchestration using Singularity can be challenging.
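
The temporary-file limitation above can sometimes be worked around when the container is launched. The sketch below is illustrative only: `build_exec_command`, the image name `tool.sif`, and the scratch path are hypothetical, and whether either option helps depends on the application and on how Singularity is configured on the cluster. It only assembles the argument list for `singularity exec`, using either `--writable-tmpfs` (a non-persistent writable layer) or `--bind` (a host directory mounted over `/tmp`), and does not run anything.

```python
import shlex


def build_exec_command(image, command, scratch_dir=None):
    """Return the argument list for a `singularity exec` call.

    image, command and scratch_dir are hypothetical inputs for illustration.
    """
    args = ["singularity", "exec"]
    if scratch_dir is None:
        # No host directory given: request a non-persistent writable layer
        # so the application can create files inside the container.
        args.append("--writable-tmpfs")
    else:
        # Mount a writable host directory over /tmp inside the container.
        args.extend(["--bind", f"{scratch_dir}:/tmp"])
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # Hypothetical image and scratch path; print the command instead of running it.
    cmd = build_exec_command("tool.sif", ["tool", "--version"], "/path/to/scratch")
    print(shlex.join(cmd))
```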

---

## Docker vs Singularity

| Feature | Singularity | Docker |
| ------------------------------ | ----------------------------------------------- | ------------------------------------------------ |
| **Use Case** | High-performance computing (HPC), Scientific workloads | General-purpose containerization |
| **Compatibility** | Optimized for HPC environments | Versatile, used in various environments and platforms |
| **User Privileges** | User-friendly, runs with user privileges | Typically requires administrative privileges |
| **Container Format** | Single-file format (.sif) | Multi-layer image format |
| **Daemon Requirement** | No daemon required | Requires a background daemon for running containers |
| **Security** | Emphasizes security, user namespace feature | Strong security features, with namespaces and cgroups |
| **Transport and Sharing** | Single-file container, easy to transport and share | Images can be shared via registries like Docker Hub |
| **Integration with Docker** | Can run Docker containers | Natively supports Docker container execution |
| **Popularity** | Commonly used in HPC and scientific communities | Widely adopted in the software development community |

*Note: This table provides a general comparison based on common characteristics, and specific use cases may influence the choice between Singularity and Docker.*

---

## What do I need to build a Singularity container?

1. **Base Operating System Image**
2. **Definition File (Singularity Recipe)**
3. **Bootstrap Process**
4. **Environment Setup**

Remember that Singularity simplifies many aspects of containerization, making it user-friendly and particularly suitable for high-performance computing environments.

---

## Before we continue

* Sylabs provides licensing, enterprise-level support, professional services, cloud services, and value-added tooling for performance-intensive, mission-critical compute environments and edge deployments.
* Apptainer is an open-source project with a friendly community of developers and users. The user base continues to expand, with Apptainer/Singularity now used across industry and academia in many areas of work.

---

## Exercises

Click [here](https://hackmd.io/@icaoberg/SkeHG6Kxa).

---
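
As a companion to the exercises, here is a minimal sketch of how the four ingredients listed under "What do I need to build a Singularity container?" fit into one definition file. The helper `render_definition`, the base image `ubuntu:22.04`, the package `curl`, and the variable `LC_ALL` are assumptions chosen for illustration, not part of the workshop material. The code only produces the text of a definition file: `Bootstrap:` and `From:` select the bootstrap process and the base operating system image, `%post` installs software, and `%environment` sets up the runtime environment.

```python
def render_definition(base_image, packages, env):
    """Return the text of a simple Singularity definition file.

    All arguments are hypothetical example values supplied by the caller.
    """
    lines = [
        "Bootstrap: docker",       # bootstrap process: start from a Docker image
        f"From: {base_image}",     # base operating system image
        "",
        "%post",                   # commands executed once, at build time
        "    apt-get update",
        "    apt-get install -y " + " ".join(packages),
        "",
        "%environment",            # variables set every time the container runs
    ]
    for name, value in env.items():
        lines.append(f"    export {name}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed example values; the apt-get lines only make sense for Debian or Ubuntu bases.
    print(render_definition("ubuntu:22.04", ["curl"], {"LC_ALL": "C"}), end="")
```

Saving the output as, for example, `example.def` and running `singularity build example.sif example.def` is the usual way to turn such a file into an image. Whether the build can run directly on a given cluster depends on how Singularity is configured there.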