In the era of big data, computational pipelines have become indispensable for efficiently processing and analyzing vast amounts of data. High-performance computing systems such as Bridges-2 give researchers access to substantial computing power and resources. However, designing and executing data-driven computational pipelines on such systems can be challenging. This presentation explores the advantages and some use cases of three popular workflow management systems, Nextflow, Snakemake, and cwltool, in the context of Bridges-2. These systems provide a streamlined approach to building scalable and reproducible computational pipelines for processing biological data. We will also discuss best practices for deploying these systems on Bridges-2, including resource management, job scheduling, and data management strategies, and we will address the challenges that arise when integrating them with Bridges-2’s particular features and constraints, along with potential solutions. By the end of this presentation, attendees will have a general understanding of Nextflow, Snakemake, and cwltool, and of how these frameworks can help researchers build robust and scalable data-driven computational pipelines on Bridges-2.
Speaker Biography: Ivan Cao-Berg is a research software specialist in the Biomedical Applications Group, tinkering with technology in science-related projects. At the moment, Ivan is involved in several projects, including HuBMAP, the Brain Image Library, and SenNet, and on occasion works with Bridges-2.
What are workflows?
In computational workflows, individual tasks or steps are organized in a logical order, where the output of one task serves as the input for the next. This allows for the creation of reproducible processes that can be executed reliably and efficiently. Workflows can be designed to handle a wide range of tasks, including data processing, analysis, simulation, modeling, and decision-making.
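A minimal Python sketch of this chaining follows. It assumes three hypothetical tasks (`read_records`, `normalize`, `gc_content`) and a hypothetical `run_pipeline` helper. None of these belong to Nextflow, Snakemake, or cwltool. Each task receives the previous task's output. Because every task is deterministic, rerunning the pipeline produces the same result.

```python
from functools import reduce
from typing import Any, Callable, List


# Hypothetical tasks for illustration only; each one consumes the
# output of the task before it.
def read_records(_: Any) -> List[str]:
    # Stand-in for reading raw input; the data is hard-coded for the sketch.
    return ["ACGT", "  ggcc ", "ACGTAC"]


def normalize(records: List[str]) -> List[str]:
    # Strip whitespace and upper-case each sequence.
    return [r.strip().upper() for r in records]


def gc_content(records: List[str]) -> List[float]:
    # Fraction of G or C bases in each sequence.
    return [sum(base in "GC" for base in r) / len(r) for r in records]


def run_pipeline(tasks: List[Callable[[Any], Any]], initial: Any = None) -> Any:
    """Run tasks in order, passing each task's output to the next task."""
    return reduce(lambda data, task: task(data), tasks, initial)


pipeline = [read_records, normalize, gc_content]
result = run_pipeline(pipeline)
print(result)  # [0.5, 1.0, 0.5]
```

Workflow managers add scheduling, caching, and provenance tracking on top of this basic idea. The core contract is the same: each step's output feeds the next step.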
There are different types of computational workflows, including procedural workflows, data-driven workflows, and model-driven workflows.
Procedural workflows: These workflows follow a predefined sequence of steps or procedures. Each step specifies the input requirements, the processing to be performed, and the output produced. Procedural workflows are often used in scientific simulations or data processing tasks.
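The procedural pattern can be sketched in Python under stated assumptions. The `Step` class and `run_procedural` function are hypothetical illustrations, not the API of any workflow manager. Each step declares the inputs it requires, the processing it performs, and the name of the output it produces. Steps run strictly in the order they are listed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    # Hypothetical step description, not taken from any workflow tool.
    name: str
    inputs: Tuple[str, ...]    # names of values this step requires
    output: str                # name under which the result is stored
    func: Callable[..., Any]   # processing performed by the step


def run_procedural(steps: List[Step], context: Dict[str, Any]) -> Dict[str, Any]:
    """Execute steps in the given order, checking required inputs first."""
    state = dict(context)
    for step in steps:
        missing = [name for name in step.inputs if name not in state]
        if missing:
            raise KeyError(f"step {step.name!r} is missing inputs: {missing}")
        state[step.output] = step.func(*(state[name] for name in step.inputs))
    return state


steps = [
    Step("square", ("x",), "x_squared", lambda x: x * x),
    Step("offset", ("x_squared", "offset"), "y", lambda v, c: v + c),
]
final = run_procedural(steps, {"x": 3, "offset": 1})
print(final["y"])  # 10
```

The order is fixed by the list rather than inferred from the data. Swapping the two steps would make `offset` fail its input check. That is the trade-off of defining the sequence in advance.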