nf-core/blog: Pipeline Refactoring
With the ongoing migration to nf-test, our goal is to enhance the modularity and self-containment of each component in nf-core pipelines, including modules, sub-workflows, and workflows.
Key Enhancements:
- Parallel Testing: Implementing one test per file to facilitate parallelisation.
- Strict Tag Provenance: Ensuring that tags propagate strictly from modules to subworkflows and then to workflows. When code changes, this lets us run only the tests affected by the change instead of the whole suite.
- Dedicated Folders: Assigning a specific folder for each main script, irrespective of whether they are modules, subworkflows, or workflows.
- Unified Script Naming: All main scripts for modules, subworkflows, and workflows will be named main.nf, residing in their respective folders.
- Tests Alongside Main Scripts: Placing tests in their own tests/ folder alongside the main scripts.
- Distinct emits: Designating specific emits for modules, subworkflows, and workflows. This practice ensures clear differentiation of output channels for consistent snapshot creation.
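The following minimal sketch shows how these points could combine in a single nf-test file. It tests a hypothetical module MY_TOOL and assumes the module lives in modules/nf-core/my_tool/, with this file at tests/main.nf.test. The module name, tags, config path and test-data path are illustrative and are not taken from an existing module. The file holds a single test and declares tags that CI can select on. It snapshots one named emit rather than the whole process output.

```groovy
// modules/nf-core/my_tool/tests/main.nf.test
// Hypothetical module, tags and test-data path; for illustration only.
nextflow_process {

    name "Test Process MY_TOOL"
    script "../main.nf"          // main.nf sits one level up, next to tests/
    config "../nextflow.config"  // per-module config (assumed location)
    process "MY_TOOL"

    // Tags that CI can use to select this file when the module changes
    tag "modules"
    tag "modules_nfcore"
    tag "my_tool"

    // One test per file, so files can be distributed across parallel jobs
    test("single-end reads") {

        when {
            process {
                """
                input[0] = [
                    [ id:'test', single_end:true ],
                    file(params.modules_test_data_base + 'path/to/test_1.fastq.gz', checkIfExists: true)
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                // Snapshot only the named "results" emit, not every output channel
                { assert snapshot(process.out.results).match() }
            )
        }
    }
}
```

Under the tag provenance scheme, a subworkflow's test file would also carry the tags of every module it calls. A change to MY_TOOL would then select the subworkflow's tests as well, for example via `nf-test test --tag my_tool`.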
Structure Example:
Each module, sub-workflow, and workflow script will be isolated in its own folder with a dedicated main.nf.
Shift from modules.config to Individual nextflow.config Files:
For Modules
As part of this refactoring, we are changing how process configurations for modules are managed. In nf-core DSL2 pipelines, configurations and tool options for modules have so far been consolidated in a single modules.config file. While this approach works, it lacks the granularity and isolation that complex pipelines demand.
To address this, we're migrating these configurations into separate nextflow.config files, one per module. This change has several key benefits:
- Enhanced Isolation: By having individual nextflow.config files for each module, we isolate configurations, reducing the risk of conflicts and increasing clarity.
- Granularity in Configuration: Separate config files allow for more detailed and module-specific settings, catering to the unique requirements of each module.
- Easier Maintenance and Updates: With configurations being module-specific, updating or maintaining a particular module becomes more straightforward, without the need to sift through a central, monolithic configuration file.
- Improved Flexibility and Reusability: Modules with their own nextflow.config can be easily shared and reused across different nf-core pipelines, enhancing flexibility and promoting a modular development approach.
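The following minimal sketch shows what a per-module nextflow.config could look like for a hypothetical module MY_TOOL. The withName selector, ext.args and publishDir are standard Nextflow configuration. params.outdir and params.publish_dir_mode follow the nf-core pipeline template. The option value and output folder are placeholders.

```groovy
// modules/nf-core/my_tool/nextflow.config
// Hypothetical module name, option and output folder; adapt per module.
process {
    withName: 'MY_TOOL' {
        // Tool options that previously lived in the pipeline-wide modules.config
        ext.args   = '--quiet'
        publishDir = [
            path: { "${params.outdir}/my_tool" },
            mode: params.publish_dir_mode,
            // Skip publishing the versions file
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }
}
```

Because the selector and options sit next to the module's main.nf, copying the module folder into another pipeline brings its configuration along with it.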
For Subworkflows/Workflows
To integrate module configurations into a sub-workflow, the individual nextflow.config files of its modules are included in the sub-workflow’s own nextflow.config.
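The inclusion can be sketched with Nextflow's includeConfig statement. The folder names below are hypothetical, and the paths assume the modules/nf-core and subworkflows/nf-core layout. Relative includeConfig paths are resolved from the folder of the including file.

```groovy
// subworkflows/nf-core/my_subworkflow/nextflow.config
// Hypothetical folder names; paths are relative to this file's folder.
includeConfig '../../../modules/nf-core/my_tool/nextflow.config'
includeConfig '../../../modules/nf-core/other_tool/nextflow.config'

// Settings specific to this sub-workflow. A selector on the fully qualified
// process name takes priority over the module's plain-name selector.
process {
    withName: 'MY_SUBWORKFLOW:MY_TOOL' {
        ext.prefix = { "${meta.id}.filtered" }
    }
}
```

A workflow's nextflow.config would include the configs of its sub-workflows in the same way. Settings would then flow up from modules to sub-workflows to workflows.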
Considerations
[add any considerations or potential disadvantages of this approach]
New housekeeping nf-core sub-workflows:
- utils_nextflow_pipeline
- utils_nfcore_pipeline
- utils_nfvalidation_plugin
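A pipeline workflow could pull these in with ordinary include statements. This sketch assumes a workflow at workflows/my_pipeline/main.nf (a hypothetical location). It also assumes that each housekeeping sub-workflow defines a workflow named after its folder in upper case. How each one is called is left out.

```groovy
// workflows/my_pipeline/main.nf (hypothetical location)
// Assumes each sub-workflow folder contains a main.nf defining a workflow
// named after the folder in upper case.
include { UTILS_NEXTFLOW_PIPELINE   } from '../../subworkflows/nf-core/utils_nextflow_pipeline/main'
include { UTILS_NFCORE_PIPELINE     } from '../../subworkflows/nf-core/utils_nfcore_pipeline/main'
include { UTILS_NFVALIDATION_PLUGIN } from '../../subworkflows/nf-core/utils_nfvalidation_plugin/main'
```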
Changes to GitHub CI
Test-data params:
- params.modules_test_data_base
- params.pipelines_test_data_base
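The following sketch shows how the two parameters could be declared in a config file. The values are placeholders, because the actual locations of the module and pipeline test data are not specified here.

```groovy
// Hypothetical config fragment; replace the placeholder values with the
// real base locations of the test data.
params {
    modules_test_data_base   = '<base path or URL of module test data>'
    pipelines_test_data_base = '<base path or URL of pipeline test data>'
}
```

A test would then build input paths from these bases, for example `file(params.modules_test_data_base + 'some/test/file.fastq.gz', checkIfExists: true)`, where the relative path is hypothetical.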
Conda declarations will be provided in an environment.yml file.
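Here is a minimal sketch of a module main.nf whose conda directive points at the environment.yml in the same folder, using Nextflow's moduleDir variable. The module name, command and output pattern are hypothetical. The ext.args lookup and the versions.yml convention mirror existing nf-core modules.

```groovy
// modules/nf-core/my_tool/main.nf (hypothetical module and command)
process MY_TOOL {
    tag "$meta.id"

    // Conda dependencies come from environment.yml next to this main.nf
    conda "${moduleDir}/environment.yml"

    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.txt"), emit: results   // named emit for snapshots
    path "versions.yml"           , emit: versions  // kept separate from results

    script:
    // Options supplied through ext.args in the module's nextflow.config
    def args = task.ext.args ?: ''
    """
    my_tool $args $reads > ${meta.id}.txt

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        my_tool: \$(my_tool --version)
    END_VERSIONS
    """
}
```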
Versions will no longer be included in test snapshots; instead, a bot will handle bumping the versions.