meta-omics meeting notes

# meta-omics meeting notes

## 2025-12-11

### Attended

James, David, Vangelis, Olga, Mirae, Nick

### Agenda

#### Regular

1. [@Leads] Pipeline update round-robin
2. [@Leads] PR review trading
3. [@Leads] [Ideas board review](https://github.com/orgs/nf-core/projects/79/views/1)

#### Specific

### Minutes

- Pipeline updates
    - James: not much
    - David: looking for minor things in mag to start getting involved
    - Vangelis: various proteinfamilies work (AWS full test problems), trying to get proteinannotator up to scratch
    - Olga: waiting for comments from Diego on MAG nf-test PR (today or tomorrow), will start working on some bacterial isolate modules, probably bactopia
    - Mirae: working slowly on the FAIRY module (for mag)
    - Nick: waiting for latest version database compatibility with HUMAnN4 by Biobakery (January)
        - currently not nf-core level, but making progress
- Vangelis: reviewers wanted for adapter-removal subworkflow
- James: requests for the new year?
    - Chaining?
    - DifferentialAbundance ->

## 2025-11-13

### Attended

### Agenda

#### Regular

1. [@Leads] Pipeline update round-robin
2. [@Leads] PR review trading
3. [@Leads] [Ideas board review](https://github.com/orgs/nf-core/projects/79/views/1)

#### Specific

- ✅ createtaxdb needs a [release review](https://github.com/nf-core/createtaxdb/pull/133) & [detaxizer](https://github.com/nf-core/detaxizer/pull/88)
- ✅ Hackathon de-brief: what progress was made?
- ✅ Existing metagenomics interactive visualisation tool.
- ✅ Opinions on MAG binners
- ✅ Pipeline chaining discussions
- ✅ SeqSubmit question from David

### Minutes

- Daniel volunteers to review createtaxdb, but looking for one more for detaxizer
- Hackathon debrief:
    - preprocessing subworkflow
    - seqsubmit
    - nf-test
- Interactive visualisation tool suggestions?
    - Krona etc.
    - Daniel: QIIME has its own set of tools, produces medium-nice figures but very nice and intuitive to look at - but not always the most helpful
    - Presented at Nextflow summit: https://depictio.github.io/depictio-docs/latest/0
- Binners
    - More efficient binners as default, 'better' but resource-hungry ones as opt-in
- Pipeline chaining: on pause until lower level technical stuff is worked around
- We aren't using SPAdes optimally
    - Daniel to forward to interested parties

## 2025-10-09

### Attended

1. James
2. Vangelis
3. Daniel S.
4. Sofia
5. Lili
6. Nick W
8. Jim
9. David
10. Mirae

### Agenda

#### Regular

1. [@Leads] Pipeline update round-robin
2. [@Leads] PR review trading
3. [@Leads] [Ideas board review](https://github.com/orgs/nf-core/projects/79/views/1)

#### Specific

- (Nick?) Crowd-source opinions for funcprofiler
- (James) Distribute info about exhibition day for _users_

### Minutes

- Trading:
    - Maybe James (mag, aDNA and BigMAg) and Vangelis (Proteinfamilies)
- Hackathon projects:
    - Question marks about pipeline chaining
        - Lili asked about time line
        - (Jim) Or, like you said, produce one big "meta" samplesheet (JSON format?), but then have a tool that processes it (subsets it) to the required data for any given pipeline
- Please advertise the community exhibition day
- funcprofiler
    - Read-based functional profiling (akin to taxprofiler, but for function; akin to funcscan, but with short reads)
    - Opinions on how to structure the database samplesheet (most complicated part, where it differs most from taxprofiler, as some tools need more databases)
        - HUMAnN3 requires three databases (MetaPhlAn, protein, nucleotide)
    - Most people saw the logic, but felt it may not be so intuitive
    - Different alternatives proposed:
        - Drop csv entirely, use parameters
        - Long table
        - Yaml (more like a config file, as it shouldn't change much)
        - Current

## 2025-09-11

### Attended

- James
- Daniel
- Sofia
- Vangelis
- Martin
- Varsha
- Nick W

### Agenda

#### Regular

1. [@Leads] Pipeline update round-robin
2. [@Leads] PR review trading
3. [@Leads] [Ideas board review](https://github.com/orgs/nf-core/projects/79/views/1)

#### Specific

- Meta-\*omics exhibition day volunteers!
- Any further hackathon ideas?
- MGnify pipeline/subworkflow ideas (Evangelos and Martin)

### Minutes

- James advertised the exhibition day
    - Who is interested in demoing, ca. 2 hours?
    - Doesn't have to be nf-core (but public)
- Hackathon in October
    - Pipeline chaining
    - Preprocessing of reads: Subworkflow
    - Submission of data to ENA et al.
        - Vangelis plans to prioritize reads to ENA, then come other data types and target dbs
        - Several parts of the proposal were discussed

## 2025-08-13

### Attended

- James
- Daniel
- Jim
- Olga
- Lili
- Vangelis
- Nick

### Agenda

#### Regular

1. [@Leads] Pipeline update round-robin
2. [@Leads] PR review trading
3. [@Leads] [Ideas board review](https://github.com/orgs/nf-core/projects/79/views/1)

#### Specific

1. BCN Hackathon project ideas?
    - Pipeline chaining?
    - Tutorial writing?
4. Minor: are people willing to try an alternative to gather.town?
5. Yes! work.adventure and zep.us?
6. Jim: Anyone know who to ask about the API at ENA?
8. Controlled vocab for `meta` fields

### Minutes

- Round-robin
    - James: taxprofiler (template merge)/createtaxdb (raging at new module nf-tests), mag
    - Nick: funcprofiler (starting out, where do test-datasets go?)
    - Vangelis: proteinfamilies, interested in chaining to proteinannotator and proteinfold
    - Daniel: metatdenovo (preparing new release, based on comments on manuscript) & switching magmap
    - Jim: nf-core modules for binning
    - Olga: used fetchngs but can't currently use it for generating ampliseq samplesheet -> maybe will work on it!
    - Lili: metaval -> waiting for a review on pipeline level nf-test implementation; issue with MultiQC; will work on taxprofiler to metaval pipeline chaining
- Review trading
    - Daniel will need a release review next week
    - Lili might need an nf-test review
- BCN hackathon:
    - 2-3 attending in person; Vangelis/Sofia/Lili online
    - James: Pipeline chaining
    - James: Tutorials for user demo
    - Online in addition to BCN: Do it simple(?)
        - Ask layout at venue -> smaller side rooms?
- Try alternative to gather: Yes from all (most?)
- ENA API (Jim): Vangelis knows people at ENA; Vangelis is our man on the inside
- Standardize fields for the `meta` object (documentation: https://nf-co.re/docs/contributing/components/meta_map#advanced-pattern):
    - meta.id: sample name
    - meta.single_end: Illumina data has 1 or 2 FASTQ files
    - meta.strandedness: strandedness of DNA
    - meta.assembler: name of assembly software
        - meta.tool: as umbrella tool (if we allow nesting), and within which stores e.g. assembler or binner (customisable)
    - meta.binner: name of genome binning software
        - meta.tool: as umbrella tool (if we allow nesting), and within which stores e.g. assembler or binner (customisable)
    - meta.chunk: number (integer) of a chunk when splitting a file for running in parallel before merging again -> helps with tagging processes and naming files (test_1.faa, test_2.faa, etc.)
    - meta.raw_id: original ID prior to cleanup
    - meta.reference_id: name of the reference (but many people currently just embed it in `meta.id`)
    - meta.db_id: name of the database (but many people currently just embed it in `meta.id`)
        - or `meta.target` ?
    - meta.study_id: name of the study (but many people currently just embed it in `meta.id`)
    - meta.contig_id: name of a contig (but many people currently just embed it in `meta.id`)

### Action Points

## 2025-04-10

### Attended

- James
- Daniel
- Jasmin
- Samuel
- Eray
- Sofia
- Jim
- Evangelos
- Olga

### Agenda

#### Regular

1. [@Leads] Pipeline update round-robin
2. [@Leads] PR review trading
3. [@Leads] [Ideas board review](https://github.com/orgs/nf-core/projects/79/views/1)

#### Specific

5. [@James] How was the hackathon for everyone?
6. [@James] Meta-omics nf-test hackathon June 11-12th!
7. [@Daniel] Guidelines for database downloads (e.g. GTDBTK)
9. ...

### Minutes

- Round robin
    - James: Mostly mag, but also bug fixes for mag internal ancient DNA pipeline paper
    - Jim: Released NN, work on mag
    - Evangelos: proteinfamilies; added roadmap issues with potential future enhancements, MGnifams pipeline update work, nf-core/denovotranscript pipeline updates in the works (discuss with Daniel for potential overlap)
    - Sofia: nanopore classifiers and more
    - Olga: Has problems with GTDB-Tk (mag, I assume) + functional annotation
    - Samuel: 16S pipeline, not Ampliseq, HIV for viralrecon
    - Eray: Interested in learning
    - Daniel: metatdenovo, with possible upcoming release (occasionally touching magmap/phyloplace)
    - Jasmin: funcscan minor fixes, but proposal for a new subworkflow for mobile-genetic elements
- Trading
    - Evangelos/Daniel/James
- Ideas board
    - Sam: Static reports? James points to #nf-core-reports on slack, and Daniel points out ampliseq has this
- Hackathon
    - Evangelos: mostly training newcomers, pairing people up to write nf-tests for nf-core/proteinfamilies local modules; fewer people came than expected (15 signed up, 9 came)
    - Olga: Had previous Nextflow experience, so first nf-core hackathon (nf-core standards new, nf-tests), not so many people in Heidelberg but good communication
    - Sofia: online only (not so fun...), but went smoothly, introducing modules to new people went well and was productive
    - Jim: at Sanger about 20, converted local to nf-core modules -> no complaints, maybe a bit more collaborative would be nice (each person was working on their own thing)
    - James reminded about the June nf-test hackathon
- Guidelines for database downloads
    - Daniel got a request from Edmund -> don't have a process for downloading a database (EggNog), use staging (i.e., let Nextflow handle it)
        - We have a storeDir parameter for module (when already downloaded, it won't download it again)
        - Edmund's justification: on offline nodes this process will not work (but we often offer a way of giving a local path)
    - Sam: is there a way to declare dependencies in advance in a machine readable manner? e.g. requirements.txt but for databases
        - James: maybe have offline section in the `usage.md` docs to force pipeline developers to document this
        - Daniel: vaguely remembers that `nf-core pipelines download` for offline had an idea to do this - include all the databases (e.g. a little makefile that it reads)
        - Sam: this would be important for IVDR and provenance in clinical settings
- GTDB
    - Olga: Integration of GTDB-Tk into bacass
    - Olga: Functional annotation -> funcscan (cazyme, dbCan, abrocate(?), traitor (prokaryotic traits))

### Action Points

- [ ] Evangelos/James/Jasmin to send Eray module suggestions for learning
- [ ] James/Daniel/Evangelos to trade small release reviews

## 2025-03-13

### Attended

- Daniel
- James
- Kauthar
- Evangelos
- Jim
- Joon
- Lili
- Jasmin
- Daniel Straub
- Jasmin
- Sofia
- Samuel

### Agenda

1. Pipeline update round-robin
2. PR review trading
3. [Ideas board review](https://github.com/orgs/nf-core/projects/79/views/1)
4. Hackathon preparation
5. Next metaomics hackathon: nf-test for everyone's pipelines?

### Minutes

- Round robin
    - Taxprofiler (Sofia): patch release with KrakenUniq resume fixing (through batch sorting), otherwise working on long-read stuff
        - Kauthar: interested in contributing StrainPhlAn to taxprofiler
    - metaval (Lili): adding BLAST to metaval
    - [internal long-read metagenome](https://github.com/sanger-tol/metagenomeassembly) pipeline (Jim)
        - Inspired by mag but PacBio only, and HiC short reads
    - proteinfamilies (Evangelos): a few minor patch/module fixes, and checking for bottlenecks - otherwise working on 'main job' internal pipeline (for MGnify)
    - funcscan (Jasmin): funcscan 2.1 now out :tada:, fixed some nf-test issues, and now benchmarking on various HPCs (getting config experience)
    - ampliseq/bacass (Daniel S.): tech support at the moment
    - metatdenovo (Daniel L.): two releases! Mainly new taxonomy assignment procedure. Danilo working on magmap review comments (will open a new PR probably)
    - custom viralgenie (Joon): looking for good eukaryotic virus gene annotator, but not much luck (going with default of Prokka). Converting variables from config to parameters
- PR review: Daniel L and Danilo will have a first version review for magmap within ca. two weeks
- Ideas
- Hackathon
    - Strong interest for a pipeline nf-test hackathon
    - Also could use time to develop documentation
    - Aim for June
- Open round:
    - Lili: does anyone work with HIV drug resistance tests (at sequencing level)?
        - Joon can ask around
    - Joon: interested what the 'clinical metagenomics' idea is

### Action Point

- Everyone add projects to the webpage for hackathon
- Kauthar and Sofia talk about taxprofiler and strainphlan
- James to send poll for days in June for nf-test

## 2025-02-13

### Attended

- James
- Daniel
- Jennifer
- Samuel
- Jasmin
- Evangelos
- Sofia

### Agenda

1. Brief reports on progress on our pipelines

- Hackathon!
    - Who is going?
    - what are you working on?
    - common project ideas?
    - anyone want any support?
- Return of the chain!
    - @joon-klaps reports the Nextflow-level bugs might be resolved, we can restart the chaining project
    - anyone interested?

### Minutes

- Evangelos and James will do introductory teaching at the Hackathon (Athens and Berlin)
- Chaining: Use output directly from channels rather than output from processes. See publish output in Nextflow (https://www.nextflow.io/docs/latest/workflow.html#publishing-outputs). (Edge release of Nextflow possibly needed)
- Samuel mentioned he's looking at ideas for provenance tracking of files. Still WIP/research project but possibly connected to `nf-prov`.
- James is working on making a smaller taxon dump
- Jennifer applied Bakta on lots of isolates of a bacterial species (wants to find genetic features causing pathogenicity in cohort), but gene names can't be used because they get heterogeneously updated in each strain (gene name is based on each annotation tool)
    - Does anyone have any experience with this problem when trying to do gene enrichment? UniProt IDs have been recommended, but these have their own issues.
    - Evangelos suggested trying: https://biit.cs.ut.ee/gprofiler/orth or https://pavlopoulos-lab-services.org/shiny/app/flame
    - Daniel remembers a poster on an nf-core pipeline using OrthoFinder
    - Samuel: maybe ENSEMBL IDs might be better (as they are genes rather than proteins) -> https://ensembl.org/info/genome/stable_ids/index.html
- Pipeline updates:
    - mag: release today
    - createtaxdb: progressing, battle testing with Sam Wilkinson
    - taxprofiler: Sofia did template merge, Sofia also working on sylph, open PR improving resume by sorting krakenuniq inputs prior to batching
    - funcscan: release tomorrow :tada:
    - proteinfamilies: first release a couple of weeks ago! Working on minor fixes. James forgot to Zenodo update :eyes:, working on workflow benchmarking figure
    - metatdenovo: close to first release!
    - magmap: template updates and more for 1.1 release, PR out now: https://github.com/nf-core/metatdenovo/pull/334, needs one more reviewer
    - phyloplace: 2.0 release PR soon, major rework
- PR reviewing trade
    - Jasmin
    - Daniel
    - Danilo

### Action Items

## 2025-01-09

On Zoom

### Attended

- James F. Y.
- Danilo D. L.
- Samuel L.
- Daniel L.
- Martin B.
- Jon B.
- Sofia S.
- Evangelos K.

### Agenda

1. Brief reports on progress on our pipelines
2. Discussion/brainstorm on how we can reach more people
3. Ask the community
    - diamond
    - mag/magmap integration(?)
    - Samuel/Sofia nanopore
4. (Carson) AllTheBacteria overview, and ideas

### Minutes

1. Project updates:
    - magmap (Danilo): first release review processing (big thing: duplicate contig names e.g. from two samples from metaSPAdes, updating contig names)
    - metatdenovo (Danilo/Daniel): want to switch taxonomic profiling (EUKulele -> diamond, problems with EUKulele database building)
    - (Samuel): Exploring alternatives to Emu for 16S nanopore data analysis due to some worries about sustainability, any ideas for alternatives -> depends on the quality of the data more than anything -> Daniel suggests trying benchmarking; Jon suggests looking at a different tool ([CONCOMPRA](https://github.com/willem-stock/CONCOMPRA))
    - phyloplace (Daniel): still not finished stuff from before, so slow going
    - taxprofiler (Sofia): finished template merge with Lili, testing fixes to the download pipeline CI tests
    - proteinfamilies (Evangelos): preparing pseudo PR for first release! Altering a few bits to do with sequence coordinates vs envelope
    - createtaxdb/mag/funcscan (James): createtaxdb missing one tool, otherwise last final tweaks/fixes before starting first release; mag: managing new contributors and trying to get stuff ready for adding pipeline level tests; funcscan: fixes to MMSeqs by Anan and Jasmin and preparing release with new ampcombi
3. Brainstorming:
    - Sofia: maybe if people add it to the calendar, but google invite links aren't working -> find a better way
    - Samuel: scope not clear
    - Jon: advertise more regularly outside of nf-core slack; people are more interested in just learning _how_ they do it (examples, practical hands-on).
    - Daniel: 'bytesize' short presentations -> invite other people (e.g. we see a new tool out, invite to talk)
5. Ask the community
    - Evangelos: proteinfamilies dev branch is not protected, can he push stuff directly to dev prior to first release (and James asks if we can [update docs](https://nf-co.re/docs/tutorials/adding_a_pipeline/move_to_nf-core_org#repository-setup))
    - Daniel: DIAMOND database creation with custom files/taxonomies? Add to metatdenovo or leave to createtaxdb, and where to host databases (general conclusion: leave to createtaxdb, versions shouldn't update too often anyway; James interested to hear from SciLife data center about hosting)

## 2024-11-12

On gather.town

### Attended

- James F.Y.
- Daniel L.
- Maxime G.
- Evangelos V.
- Lili A.-L.
- Samuel L.
- Carson M

### Minutes

- Announcements
    - James got a grant to work more on this stuff!
- Discussion on writing training, tutorial documentation
    - Should we do it?
    - Exhibition days? True training days?
    - Main question: how detailed (Sarek level), vs Taxprofiler - without too much maintenance burden
    - Tutorials
        - Live execution?
        - Examples are important!
        - Add more documentation on how to use output
- Pipeline updates
    - James: eager (almost DSL2! but in hands of Thiseas Lamnidis), mag (juggling lots of plates: bug fixes, custom to official DSL2 conversion, managing new contributions, dealing with new databases etc), funcscan (fixing mmseqs, writing paper)
    - Maxime: could we make Viralrecon metagenomics
    - Daniel: magmap, phyloplace (slowly adding ability to add less-well defined proteins)
    - Evangelos: close to release of proteinfold, adding one last major section
    - Lili: taxprofiler (+ metaval)
        - Q: moving metaval to nf-core? Discussion?
    - Carson: phyloplace almost there, just need to find time to wrap it up
- Interesting preprint about AllTheBacteria shared by Carson
    - https://www.biorxiv.org/content/10.1101/2024.03.08.584059v3.full
    - Big multi-institutional project, 'nf-core could do it better'
    - We will talk more in a later meeting :+1:
- Question from Samuel
    - Adding nanopore workflow into ampliseq?

## 2024-10-10

On gather.town

### Attended

- James Fellows Yates
- Daniel Lundin
- Carson Miller
- Daniel Straub
- Joon Klaps
- Lili Andersson-Li
- Sofia Stamouli
- Jasmin Frangenberg
- Evangelos Karatzas
- Martin Berachochea

### Agenda

- meta-omics hackathon update
- General meta-omics feedback: what's missing, what's not working etc

### Minutes

- meta-omics hackathon update:
    - Making good progress (Joon, James, Sofia, Lili, Evangelos, Jasmin): https://github.com/orgs/nf-core/projects/81/views/1
    - Finding good edge cases and rapid iteration
    - Daniel S.: What about fetchNGS support (e.g. for Ampliseq)
        - James: [different way of doing it](https://github.com/nf-core/fetchngs/blob/master/modules/local/sra_to_samplesheet/main.nf), could in future harmonise but not within scope of the hackathon
    - Discussed some more concepts
        - Daniel L.: Modularising the parsing code and loading these in upstream pipelines to then process (so don't need to custom code every single time). Maybe, but would need more thinking
- General discussion
    - Big issue for Carson: huge number of files being generated (e.g. fetchNGS, mag) - solutions or workarounds?
        - James pointed out nf-boost, but apparently not enough
        - Evangelos suggested custom modules that do multiple steps in one with additional clean up (downside of nf-core one-tool-one-module approach)
        - General discussion on approaches
    - Lili said she was hitting issues with tools failing because of not enough data
        - James pointed to `errorStrategy` to pick up certain error codes
        - Martin (and Daniel in Ampliseq) said instead he has filtering steps to ensure low quality files don't reach the process in the first place, e.g. using `countFastq()` to get the number of reads (but be aware that this function runs on the headnode, which might not be great for large files!)
        - Evangelos suggested also making bash or python wrappers around the command itself in the module, so you can customise the exit code behaviour (either different exit code, or pick up error messages and give more specific exit code)

### TODO

## 2024-09-11

### Attended

- James Fellows Yates
- Daniel Lundin
- Carson Miller
- Joana Carlevaro Fita
- Daniel Straub
- Jacobo de la Cuesta
- Jennifer Müller
- Joon Klaps
- Lili Andersson-Li
- Sofia Stamouli
- Ryan Teo
- Jasmin Frangenberg

### Agenda

- Introductions
- Regular meetings
    - https://www.when2meet.com/?26405047-vZc2v
- Wish and priority list

### Minutes

- Wishlist and priority list
    - Pipeline linking: interest
    - Standardised parameters: (maybe)?
    - Office hours: minimum 5 people interested
    - Tutorials:
        - Routine steps add to the pipeline itself?
        - People willing to write tutorials: Ryan, Jacobo, Joana
        - Tutorials for linking pipelines: come with the above
    - meta-omics specific hackathons
        - lots of support
    - Exhibition idea:
        - Daniel L. might be more interested
        - Everyone else a bit quiet, maybe flesh out a proper plan
- New idea: Daniel S. - no way of collecting ideas of what is 'missing' amongst our pipelines (e.g. a lot of people asking for short-read functional profiling)
    - Created new board for dumping ideas: https://github.com/orgs/nf-core/projects/79/views/1?pane=issue&itemId=79431000

### TODO

- [ ] (All): fill out when2meet to find common regular time
- [ ] (JFY): make a poll for hackathon days for pipeline linking
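
### Appendix: sketch of an exit-code wrapper (from the 2024-10-10 discussion)

In the 2024-10-10 minutes, Evangelos suggested wrapping a tool's command in a bash or Python wrapper so that error messages can be turned into more specific exit codes. The following is only a minimal sketch of that idea, not an agreed implementation: the patterns in `ERROR_PATTERNS`, the exit codes 64 and 65, and the function names `classify_failure` and `run_wrapped` are all hypothetical and chosen for illustration.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical mapping from stderr fragments to exit codes.
# Both the message fragments and the codes are illustrative assumptions,
# not taken from any real tool.
ERROR_PATTERNS = {
    "not enough reads": 64,
    "empty input": 65,
}


def classify_failure(returncode: int, stderr: str) -> int:
    """Return a more specific exit code when stderr matches a known pattern.

    A successful run (return code 0) is passed through unchanged, and an
    unrecognised failure keeps the tool's original return code.
    """
    if returncode == 0:
        return 0
    lowered = stderr.lower()
    for pattern, code in ERROR_PATTERNS.items():
        if pattern in lowered:
            return code
    return returncode


def run_wrapped(command: list[str]) -> int:
    """Run the command, forward its output, and return the classified exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return classify_failure(result.returncode, result.stderr)
```

A module script could call `sys.exit(run_wrapped(sys.argv[1:]))` so that the pipeline sees the specific code, which an `errorStrategy` could then react to, as James mentioned in the same discussion.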