nf-core/genomeassembler input discussion

---
title: "nf-core/genomeassembler input discussion"
tags: pipeline,genomeassembler,planning,plans
---

# nf-core/genomeassembler input format discussion

Genome assembly can be a complex process with a variety of data inputs and steps. We'd like to formalise how the workflow takes input here:

## TSV format

Aimed at taking simpler input for less complex workflow executions.

Proposed properties:

- Should omit any one-to-many or many-to-one relations
- Could accept comma-separated lists where necessary, though
- One row per assembly
- Column names should be identical to those in the YAML format

## YAML format

Designed for more complex workflow executions.

Scope:

- Multiple samples
- Assembler execution with varying parameters
- What to subassemble/circularise?

```yaml
samples:
  - id: My_awesome_species_1
    assemblies:
      - id: assemblerX_build1
        path: /path/to/assembly
      - id: assemblerY_build1
        path: /path/to/assembly
    hic:
      - id: hic_assembly_1
        path: /path/to/reads
    hifi:
      - path: /path/to/reads
      - path: /path/to/reads
    rnaseq:
      - path: /path/to/reads
    isoseq:
      - path: /path/to/reads
tools:
  assembler1:
    options:
      - '--opt1 X --opt2 Y'
      - '--opt2 A --opt2 B --opt3 C'
  assembler2:
    options:
      - '--optx X --opty Y'
      - '--optx A --opty B --optz C'
  assembler3: true
```

- Should use the [PEP specification](http://pep.databio.org/en/latest/).

### Implementation details

- Tool options are configured using the `ext.args` process option. To use this in the configuration, the options must be passed in the `meta` input map/dictionary; they can then be accessed in the configuration using:

  ```nextflow
  process {
      withName: 'ASSEMBLER_X' {
          ext.args = { meta.assembler_options }
      }
  }
  ```
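
To make the `meta` idea more concrete, here is a rough Python sketch of how the `tools` section of the YAML example could be expanded into one `meta` map per assembler run. It is an illustration only, not the pipeline's implementation. It assumes `tools` applies to every sample, that `assembler3: true` means "run once with default options", and that a false or empty value disables the tool. The function name `expand_runs` and the `id`, `sample` and `assembler` keys are hypothetical; only `assembler_options` comes from the config snippet above.

```python
from itertools import product


def expand_runs(config):
    """Return one meta dict per (sample, assembler, option string) combination."""
    runs = []
    for sample, (tool, spec) in product(config["samples"], config["tools"].items()):
        if spec is True:
            # Assumption: `tool: true` means "run once with default options".
            option_sets = [""]
        elif isinstance(spec, dict):
            # A tool entry without an `options` list also runs once with defaults.
            option_sets = spec.get("options") or [""]
        else:
            # Assumption: false/null disables the tool entirely.
            continue
        for index, options in enumerate(option_sets, start=1):
            runs.append({
                "id": f"{sample['id']}.{tool}.{index}",  # hypothetical naming scheme
                "sample": sample["id"],
                "assembler": tool,
                "assembler_options": options,  # read as meta.assembler_options in the config
            })
    return runs


# Same shape as the YAML example, reduced to the keys used here.
config = {
    "samples": [{"id": "My_awesome_species_1"}],
    "tools": {
        "assembler1": {"options": ["--opt1 X --opt2 Y", "--opt2 A --opt2 B --opt3 C"]},
        "assembler2": {"options": ["--optx X --opty Y", "--optx A --opty B --optz C"]},
        "assembler3": True,
    },
}

for meta in expand_runs(config):
    print(meta["id"], repr(meta["assembler_options"]))
```

With the example input this yields five runs for the single sample: two each for `assembler1` and `assembler2`, and one for `assembler3` with an empty option string. In the workflow itself the same expansion would presumably happen on channels, with each resulting map passed along as the `meta` input of the assembler process.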