
nf-core/blog: pipeline refactoring


With the ongoing migration to nf-test, our goal is to make each component of nf-core pipelines (modules, subworkflows, and workflows) more modular and self-contained.

Key Enhancements:

  • Parallel Testing: Implementing one test per file to facilitate parallelisation.
  • Strict Tag Provenance: Ensuring rigorous tag provenance from modules to subworkflows and then to workflows. This makes testing more targeted and efficient when code changes.
  • Dedicated Folders: Assigning a specific folder for each main script, irrespective of whether they are modules, subworkflows, or workflows.
  • Unified Script Naming: All main scripts for modules, subworkflows, and workflows will be named main.nf, residing in their respective folders.
  • Tests Alongside Main Scripts: Placing tests in their own tests/ folder alongside the main scripts.
  • Distinct emits: Designating specific emits for modules, subworkflows, and workflows. This practice ensures clear differentiation of output channels for consistent snapshot creation.
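As a rough illustration of how tags and named emits might come together, here is a minimal nf-test sketch for a hypothetical module. The process name `TOOL_SUBTOOL`, the tag values, the input file name `example.txt`, and the `report` emit are all assumptions made for the example and are not taken from an existing module.

```groovy
// modules/nf-core/tool/subtool/tests/main.nf.test (hypothetical module)
nextflow_process {

    name "Test Process TOOL_SUBTOOL"
    script "../main.nf"          // the main script sits one level above tests/
    process "TOOL_SUBTOOL"

    // Tags allow this test to be selected when the module changes
    tag "modules"
    tag "modules_nfcore"
    tag "tool"
    tag "tool/subtool"

    test("single sample") {

        when {
            // Placeholder input: example.txt is an assumed file name
            process {
                """
                input[0] = [
                    [ id: 'test' ],
                    file(params.modules_test_data_base + '/example.txt', checkIfExists: true)
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                // Snapshot one named emit rather than the whole output map
                { assert snapshot(process.out.report).match() }
            )
        }
    }
}
```

Further scenarios would go into their own files (for example `parameter_1.nf.test`), so that each file can run in parallel with the others.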

Structure Example:

Each module, subworkflow, and workflow script will be isolated in its own folder with a dedicated main.nf:

modules/nf-core/tool/subtool/
├── main.nf
├── meta.yml
├── environment.yml
└── tests
    ├── main.nf.test
    └── tags.yml

subworkflows/
├── local
│   └── tool-1_tool-2
│       ├── main.nf
│       ├── main.nf.test.snap
│       └── tags.yml
└── nf-core
    └── tool-1_tool-2_tool-3_tool-4
        ├── main.nf
        ├── meta.yml
        ├── nextflow.config
        └── tests
            ├── main.nf.test
            ├── main.nf.test.snap
            └── tags.yml

workflows/
├── main.nf
├── nextflow.config
└── tests
    ├── main.nf.test
    ├── main.nf.test.snap
    ├── parameter_1.nf.test
    ├── parameter_1.nf.test.snap
    ├── parameter_2.nf.test
    ├── parameter_2.nf.test.snap
    └── tags.yml

Shift from modules.config to Individual nextflow.config:

For Modules

With the introduction of DSL2, there is a shift in how we manage process configurations for modules. Previously, configurations and tool options for modules were consolidated in a single modules.config file. While this approach was functional, it lacked the granularity and isolation that complex pipelines demand.

To address this, we're migrating these configurations into separate nextflow.config files for each module. This change has several key benefits:

  • Enhanced Isolation: By having individual nextflow.config files for each module, we isolate configurations, reducing the risk of conflicts and increasing clarity.
  • Granularity in Configuration: Separate config files allow for more detailed and module-specific settings, catering to the unique requirements of each module.
  • Easier Maintenance and Updates: With configurations being module-specific, updating or maintaining a particular module becomes more straightforward, without the need to sift through a central, monolithic configuration file.
  • Improved Flexibility and Reusability: Modules with their own nextflow.config can be easily shared and reused across different nf-core pipelines, enhancing flexibility and promoting a modular development approach.
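A per-module nextflow.config could look roughly like the sketch below. The process name `TOOL_SUBTOOL`, the `--example-flag` option, the prefix, and the use of `params.outdir` are assumptions chosen for illustration. The `withName` selector and the `ext` and `publishDir` settings are standard Nextflow configuration.

```groovy
// modules/nf-core/tool/subtool/nextflow.config (hypothetical module)
process {
    withName: 'TOOL_SUBTOOL' {
        // Tool options that previously lived in the central modules.config
        ext.args   = '--example-flag'
        ext.prefix = { "${meta.id}.subtool" }

        // Where this module's results are published (assumes params.outdir is defined)
        publishDir = [
            path: { "${params.outdir}/tool/subtool" },
            mode: 'copy'
        ]
    }
}
```

Because the selector names a single process, the file can travel with the module without affecting the configuration of other modules.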

For Subworkflows/Workflows

To integrate module configurations into a subworkflow, each module's nextflow.config is included in the subworkflow's nextflow.config:

includeConfig '../../../modules/nf-core/tool-1/subtool/nextflow.config'
includeConfig '../../../modules/nf-core/tool-2/subtool/nextflow.config'
includeConfig '../../../modules/nf-core/tool-3/subtool/nextflow.config'
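Relative `includeConfig` paths are resolved against the file that contains them, so from `subworkflows/nf-core/<name>/nextflow.config` the three `../` segments lead back to the repository root. A subworkflow test might then load that config and carry the module tags forward, as in this sketch. The workflow name, the tag values, the input file name, and the `results` emit are hypothetical.

```groovy
// subworkflows/nf-core/tool-1_tool-2_tool-3_tool-4/tests/main.nf.test (hypothetical)
nextflow_workflow {

    name "Test Subworkflow TOOL_1_TOOL_2_TOOL_3_TOOL_4"
    script "../main.nf"
    workflow "TOOL_1_TOOL_2_TOOL_3_TOOL_4"
    config "../nextflow.config"   // pulls in the module configs via includeConfig

    tag "subworkflows"
    tag "subworkflows_nfcore"
    tag "subworkflows/tool-1_tool-2_tool-3_tool-4"

    // Module tags, so that a change to any included module also selects this test
    tag "tool-1/subtool"
    tag "tool-2/subtool"
    tag "tool-3/subtool"

    test("default") {

        when {
            // Placeholder input: example.txt is an assumed file name
            workflow {
                """
                input[0] = Channel.of([
                    [ id: 'test' ],
                    file(params.modules_test_data_base + '/example.txt', checkIfExists: true)
                ])
                """
            }
        }

        then {
            assertAll(
                { assert workflow.success },
                { assert snapshot(workflow.out.results).match() }
            )
        }
    }
}
```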

Considerations

[add any considerations or potential disadvantages of this approach]


New housekeeping nf-core subworkflows:

  • utils_nextflow_pipeline
  • utils_nfcore_pipeline
  • utils_nfvalidation_plugin

Changes to GitHub CI


Test-data params

params.modules_test_data_base

params.pipelines_test_data_base
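These two parameters could be set once in a config file that the tests load, for example as below. The file location and both values are placeholders. The notes above only name the parameters and do not say where they point.

```groovy
// tests/nextflow.config (hypothetical location)
params {
    // Placeholder values: substitute the actual base location of each dataset
    modules_test_data_base   = '/path/to/test-datasets/modules'
    pipelines_test_data_base = '/path/to/test-datasets/pipeline'
}
```

Tests can then build file paths from the base, e.g. `file(params.modules_test_data_base + '/example.txt')`, where `example.txt` is again an assumed file name.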


Conda declarations to use environment.yml
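In a module's main.nf this could amount to pointing the `conda` directive at the file that sits next to the script. In the sketch below only the `conda` line illustrates the point. The process name, inputs, `report` emit, and the `tool subtool` command line are hypothetical.

```groovy
// modules/nf-core/tool/subtool/main.nf (hypothetical module)
process TOOL_SUBTOOL {
    tag "$meta.id"

    // Conda dependencies are declared in a file alongside the script, not inline
    conda "${moduleDir}/environment.yml"

    input:
    tuple val(meta), path(input_file)

    output:
    tuple val(meta), path("*.report.txt"), emit: report

    script:
    def args   = task.ext.args   ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    // 'tool subtool' stands in for the real command line
    """
    tool subtool $args $input_file > ${prefix}.report.txt
    """
}
```

`moduleDir` is the Nextflow implicit variable for the directory containing the module script, so the reference stays valid wherever the module folder is installed.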


Versions are not included in the snapshots; a bot will handle bumping the versions.