nf-core/funcscan publication planning

---
title: nf-core/funcscan publication planning
tags: funcscan,meeting
---

## PI meeting notes 2024-07-03

### To-Dos until next meeting:

- check if differences in ARG/BGC/AMP numbers are valid or are prone to biases from sampling/sequencing/assembly/contig length/identification tools, or if we can/should normalize them to some parameters
- read publication of wastewater datasets ([PDF](https://microbiomejournal.biomedcentral.com/counter/pdf/10.1186/s40168-019-0663-0.pdf))
  - use figures as inspiration for own comparisons/analysis/interpretation of results
  - check closely for any method biases (e.g. sample handling, sequencing, quality check...)
- go back to our own assembly from Louisa and check quality/stats
- fix remaining funcscan bugs (BGC summary (GECCO only in current summary?!) + stuff on Slack/[GitHub](https://github.com/nf-core/funcscan/milestone/3))
- run/analyse results again: can we find out why SWH_EFF has more genes than SWH_IN?
- finish manuscript draft (intro, methods), add
- figure suggestions:
  - have the 3 datasets logically ordered: SWH_IN, SWH_AS, SWH_EFF
  - heatmap (taxonomic tree on one side, 2 samples or ARG classes on other side, color intensity gives number of hits per sample/ARG class)

## dev team meeting 2024-01-31

### Status

- Jasmin:
  - antismash update:
    - version 7.1.0 tested/working
    - stub test seems not working?
    - nf-test implementation not possible atm (need of aliases or `untar` module)
    - options?
      - use only stub test for the moment :heavy_check_mark:
      - try to find a workaround (maybe new tar archive on test-data dir)
      - put on halt until aliases available
    - the other module (`antismashlite`) needs even more aliasing for fasta/gff files...
  - bakta 1.9.2: container tests are working, nf-test missing
  - to-do: RGI 6.0, hamronization 1.1.4 (worked on both previously, put on halt when problems occurred), deeparg update

## dev team meeting 2024-01-24

### Notes

- Louisa:
  - No/informal send-off
  - Free time for questions via Slack, tiny bit of data analysis and manuscript reading
  - She sent us the final data for analysis
- Pipeline status:
  - Anan:
    - planned: adding taxonomic classification (for all categories -> mmseqs2)
    - re-factored: AMPcombi
      - parses 6 tools
      - standardised table
      - internal filters (e.g. stop codon, no transporter)
      - cluster AMPs (via mmseqs2)
    - developed: visualisation tool -> global viewer across all summary tools
  - Jasmin: no progress
- Manuscript status:
  - Jasmin: no progress
  - Anan: no progress

### Action Items:

- Jasmin
  - Update all non-breaking modules in pipeline
  - Test antismash-lite conda & bio-containers manually & update module and pipeline
- Anan
  - Adding mmseqs taxclass subworkflow
  - Standardising three workflows COMBI
- James
  - High priority: nf-test
  - Lower priority: pre-supplied ORFs (likely 1.3)

**Target**: 1.2 by 21st of February

**21st of February to 18th March**: re-run funcscan 1.2 on wastewater contigs, and perform analysis/generate figures

**March-April**: finish manuscript

## nf-core/funcscan PI meeting notes 2023-11-07

- Discussed the 4 paragraph suggestions from the manuscript (see Google Doc), PIs are fine.
- Discussed which pipelines and what exactly to compare

### To-Dos

- Finish comparison table (Jasmin)
- Get wastewater dataset through (3 samples): https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0663-0#Sec19
  - Samples: 'SWHIN, Shek Wu Hui STP influent, SWHAS, Shek Wu Hui STP activated sludge; SWHEFF, Shek Wu Hui STP effluent'
- RShiny interface for all three workflows
  - start playing with them, see if they make sense
- We don't solve biological questions here, but: Go back to publications of our old datasets and find out biological questions that the authors tried to solve and that can now be easily solved with funcscan.

## Publication: 5 subheaders 2023-10-30

### Five "headaches" that funcscan solves

Each point will result in a separate sub-header/paragraph in the manuscript:

1. Tools and databases: user friendly
   - the pipeline does all downloading and runs the tools in containers, all downloads may be saved for later usage
   - the user is ensured usage of up-to-date tools; take antismash as an example, with major database upgrades with each version

   **Header: Run diverse workflows, tools, and databases with one command**

2. Meaningful default values for all tools: the pipeline can be run without looking into each tool setting - adds to time efficiency. Highlight added value!

   **Header: Meaningful default values: funcscan works out of the box**

3. Workflow modularity - get the maximum of information out of your data! ARG, AMP and BGC workflows and tools within each workflow can be turned on and off depending on the user's demand: possibility of running all workflows to generate new insights into the same dataset. Highlight added value!

   **Header: Workflow modularity: Easy analysis customization**

4. Summary tools overcome different tools' output formats: customized funcscan summary tools standardize and join all output tables from each workflow, get rid of "trash hits". Highlight added value!
   - Harmonized tables that incorporate the results from different tools, making it easy to compare and relate results within a workflow. The tables are set up in a way that makes it easy for the user to extract information for further downstream analysis, e.g. FASTA sequences in the AMPcombi results for AlphaFold analysis. Furthermore, the tables can be visualized with RShiny, which allows the user to easily customize the output further and extract the information required for further downstream analysis.
   - The three workflows' standardized outputs can be combined to allow easy biological comparisons, which improves the annotation/classification of a hit, increasing its likelihood of being a potentially viable candidate in a lab setting. Take BGCs and ARGs as an example. (TODO!)

   **Header: The summary tools: user friendly standardized output**

5. Filtering out potential false positives
   - Summary tools allow data filtering and refinement for meaningful hits with adjusted parameters (Anan's & Rose's work). Highlight added value!
   - Numerous parameters were incorporated in the summary tools with real-life lab users in mind; use the AMP workflow as an example (AMPcombi), e.g. chemical and physical properties of the AMP hits, presence of stop codons across the 20 codons upstream and downstream of the AMP hits, detection of any 'transporter' proteins across the 10 annotated CDSs downstream and upstream, all of which increases the likelihood that the detected AMP hits are 'expressible' in a lab setting.

   If fARGene and hamronization people respond to our requests: Add summary of summary table (see below)

   **Header: Elevating scientific relevance: AMP refinement for biologically meaningful results**

### Manuscript structure considerations

- We want to combine *Results and Discussion* in the manuscript, not separate, which is possible in NAR Genomics and Bioinformatics. This is then followed by *Conclusions*.
### To-do

- Search for similar-function pipelines of either all or one/two of funcscan workflows for the functional comparison in the manuscript

| Feature | funcscan | mettannotator | bacannot | HT-ARGfinder | PathoFact | SqueezeMeta | MetaErg | ARGs-OAP |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| ARG screening | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | (:heavy_check_mark:) | (:heavy_check_mark:) | :heavy_check_mark: |
| AMP screening | :heavy_check_mark: | :x: | :x: | :x: | :x: | (:heavy_check_mark:) | (:heavy_check_mark:) | :x: |
| BGC screening | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :x: | (:x:) | (:x:) | :x: |
| Taxonomic assignment of contigs | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: | (:x:) | :heavy_check_mark: | :heavy_check_mark: | :x: |
| Results summary | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | (:heavy_check_mark:) | (:heavy_check_mark:) | :heavy_check_mark: | :heavy_check_mark: | :x: |
| Container support (docker, singularity) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: | (:x:) |
| Modularity | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: | :heavy_check_mark: | (:heavy_check_mark:) | :x: | :x: |
| One-click installation | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | (:x:) | :x: | :x: |
| Local installation possible | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | (:x:) |
| Web-based execution possible | (:heavy_check_mark:) | (:heavy_check_mark:) | (:heavy_check_mark:) | :x: | :x: | :x: | :x: | :heavy_check_mark: |
| Software reviewing | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :x: | :x: | :x: | :x: |
| Automated software testing | :heavy_check_mark: | :heavy_check_mark: | (:x:) | (:x:) | (:x:) | :x: | :x: | :x: |
| Licence | MIT | Apache-2.0 | GPL-3.0 | None | GPL-3.0 | GPL-3.0 | AFL | AFL |

- Provide the user with summary-summary table
  - This should be a table of overlapping hits (AMPs, ARGs, and BGCs combined) for each contig. We can skip contigs if there aren't any overlaps. Maybe: If they don't overlap, but have multiple hits per contig, it might still be biologically interesting.
  - For example:

    | **Contig ID** | **AMPs** | **ARGs** | **BGCs** |
    |:--------------:|:-----------:|:-------------:|:-----------:|
    | c_2134 | 34 | 2 | 5 |
    | c_2135 | 1 | 0 | 1 |
    | c_2136 | 0 | 0 | 5 |

  - For that: Create issues for
    - **hamronization** request: Include contig ID column in output TSV (very important for meaningful downstream analyses!)
      - abricate :white_check_mark: colname: "SEQUENCE"
      - amrfinderplus :white_check_mark: colname: "Contig id"
      - RGI :white_check_mark: colname: "Contig"
      - fargene :ballot_box_with_check: (only as part of the header in the `<sample>_FASTA-<class>-filtered.fasta` file)
    - **fARGene** request: Include contig ID also in output TSV, not only in the output FASTA headers
    - If they don't respond: We create PRs
    - If they still don't respond: We fork them :P (and keep the forks manually up-to-date with the original repo)

## nf-core/funcscan PI meeting notes 2023-10-25

### Main discussion point

Developers prefer a new publication strategy (the one we discussed already)

- e.g. time-wise more efficient to change to NAR Genomics and Bioinformatics
  - good impact factor, description friendly
  - pipeline benchmark on a more functional level

**Why change strategy**

**Jasmin:** Introduce/give hard facts why the initial dataset benchmark idea is being changed: Annotation number as parameter for comparison is not a good base value, because "the more the better" does not hold true for annotations without functional testing, name differences in BGC classes would have to be standardized, no way to find out which ones may be false positives etc.

**Opinions**

Pierre: make sure we have a pipeline that is useful to the community, **show what can be done with the pipeline**

Christina: funcscan advantage - **advantages of the pipeline usage vs single tool usage**.

Pierre: we have to show that there is a benefit of having data on ARGs, BGCs and AMPs in one go. Why that might be important. Show the pipeline adds a value. Not just a mechanical thing but also an **intellectual contribution**.

Christina:

1. **time efficiency** for users, technical advantage
2. **biological**ly the **results** are telling you something new, are ARGs, AMPs and BGCs really related?
3. One of the main benefits is **consistent data reporting scheme**!

**Anan:** Take technical aspect of the dataset runs, without biological interpretation. Then take only one dataset to have a closer look at the biological content.

**Additional feature suggestion**

- Add a summary-summary-tool that shows contig names and if the contig had a hit in AMP, ARG and/or BGC.
- Output FASTA file of selected best XX hits if user wants to use for downstream analyses (e.g. AlphaFold)

**Publication strategy**

* Not sure whether NAR Genomics and Bioinformatics (IF 4) or NAR (IF 14...), Tina is fine with both. Pierre had to leave earlier because of a conference, but I think it was obvious he prefers NAR.
* Use case descriptions and how funcscan deals with them: Make a list of “headaches” for users that are easily resolved by funcscan.
  * tool versions and installations, database downloads and installations -> no installation of multiple tools necessary, databases can be stored for re-runs
  * tool output formats - benefits of the summary tools
  * filtering out “trash” hits, evaluation of hits (include user experience: e.g. Rose's feedback on data refinement)
  * Reproducibility: re-runs under same conditions are possible, runs of samples generated at different times (lab work sometimes is delayed..) can be done under the same conditions
* Maybe showcase it in the end with a biologically meaningful insight. Not big and time-intensive but interesting.

### Wrap up

- Introduction:
  1. problem that we are solving
  2. pipeline design
- Pick out 5 key aspects that make it really useful for the user ("added value") and elaborate on these, common bottlenecks in this kind of analysis (User experience UX :wink:)
  - Write those 5 as Results subheaders in the manuscript
- Benchmark functionality of different pipelines (e.g. checklist). Since funcscan is unique with 3 workflows (which we should state), we compare each funcscan workflow against a similar pipeline, if one exists (for AMPs none exists yet). List resources, runtime, CPU usage, file size etc. for users to know what to expect.
- Filtering parameters to get high quality hits
- Meet again in 2 weeks to check on those 5 headers
- Generally: very brainstormy meeting, a lot of ideas appeared from the PIs and us (Anan, Louisa, Jasmin), nice and productive atmosphere. I (Jasmin) appreciate Tina's active interest in the project and the regular meetings!

## nf-core/funcscan meeting notes 2023-05-10

- Connected OpenStack S3 bucket to our cluster on several authentication layers!! (@James)

## nf-core/funcscan meeting notes 2023-05-03

```mermaid
flowchart LR
    subgraph openstack
        object_storage
    end
    object_storage([s3 object storage]) <-- internet --> head_node
    object_storage([s3 object storage]) <-- internet --> worker_node_n
    subgraph simplevm_volume
        volume_funcscan([/volume/funcscan]) <----> head_node
    end
    subgraph simplevm_cluster
        subgraph head
            head_node <---> head_ephemeral([ephemeral])
        end
        volume_spool([/vol/spool]) <----> head_node
        volume_spool([/vol/spool]) <----> worker_node_n
        head_node <----> worker_node_n
        subgraph worker_node
            worker_node_n <--> worker_ephemeral_disk([/vol/scratch])
        end
    end
```

To summarise:

- tool specific scratch space, `/vol/scratch` on worker node (NXF: scratch directive)
- nextflow working directory on ephemeral disk of head node (/vol/spool)
- input/results s3 bucket

- Manuscript
  - Introduction: Restructure commented parts (see James' comments)
  - Other sections: Fill with bullet points, start writing
- Openstack bucket?

## nf-core/funcscan meeting notes 2023-04-20

### Protocol

- Set up [funcscan_manuscript](https://drive.google.com/drive/folders/1uqLl3KxomroYV3-VDgs_MnrBIFl5VnhV) GoogleDrive folder (restricted access)
- Tried to set up deNBI cluster
  - Helpdesk from deNBI solved errors when starting any cluster
  - Start cluster with several nodes (large to medium)
- Milestones 1.1:
  - Cleared up already solved issues
  - comBGC PR: reviewed, will merge after update
  - resource optimization: WIP James

### To do

- Manuscript
  - Finish intro draft
  - Bullet point other paragraphs
- GoogleDrive folder
  - share with all authors
  - Add affiliations to spreadsheet
  - Add Nat. Commun. template to `01_drafts/main_text`
- Finally runnnn funcscan on selected datasets (deNBI)!

## nf-core/funcscan meeting notes 2023-04-04

### Introduction outline

- **Significance of natural product and AMP discovery:** Along the lines of "We need drug development for medical use", e.g.

  > "The increasing speed at which – in particular multidrug resistant – pathogens spread poses a serious threat to the health and well-being of humanity (reference). As a consequence new anti-infective drugs are urgently needed. Bacteria are prolific producers of low molecular weight compounds (reference), so-called natural products, with antimicrobial activity that can serve as basis for the development of novel drugs (reference). To ensure self-resistance against their own compounds, the producing bacteria often carry corresponding resistance genes together with biosynthetic genes. Unfortunately, the excessive use of antibiotics in medicine (reference) and agriculture (reference) has led to a significant threat to human health by accelerating the evolution of resistance genes and thereby promoting the occurrence of multi-drug resistant bacteria."

  - give example
  - possible AMP reference: https://www.nature.com/articles/nbt.2572
- **Significance of global ARG crisis:** We need to characterize resistance spectra and determinants, for example by studying their evolution
  - give example
- **The problem:** Multiple gene annotation and function screening tools exist, but are often used in a non-comprehensive and inefficient manner -> reduced predictive power/results + wasted resources
- **Status quo of annotation pipelines**: Name similar purpose pipelines + what are their properties (probably none of them is as scalable etc.)
- **"Here we present":** a scalable pipeline to enable identifying and characterizing AMPs, ARGs and BGCs from bacterial genomic sequences. (parallelized, highly reproducible...) Describe characteristics and benefits (or maybe more in the results?).
- **"We confirmed":** Effectiveness (and efficiency) are shown via "benchmarking" with workflows from previously screened datasets of ancient DNA, soil, as well as plant microbiomes.
  We (will hopefully have) identified a lot more genes because the power/advantage of multiple identification strategies and databases are combined in this pipeline.
- **Bottom line:** "Using this workflow, we gained detailed quantitative insight into the resistomes and spectra of BGCs and AMPs within metagenomes of diverse origin. This pipeline is awesome for what you did manually before."

## To-do

- Finish introduction draft
- Pipeline code: Finish PRs for Milestone 1.1
- Try out simpleVM clusters, run assemblies/funcscan

## nf-core/funcscan meeting notes 2023-03-21

- Our dataset table: https://hackmd.io/C2AlX3X1T5-jwZU-q2njFg
- Create assemblies from selected dataset on our own clusters
  - Check if they are the same/similar to the publications' one
  - If assemblies are provided, download + put on deNBI
- Prioritize papers using tools from funcscan + check versions of tools and databases
- To-do next week:
  - Create prioritized list of suitable papers
  - Start running assemblies
  - Review more papers
  - Jasmin: bullet point list for introduction

## nf-core/funcscan meeting notes 2023-03-08

- Review previous tasks
  1. Nat Commun research article types:
     - **Showcase**: Walk reader through the developed method
       - e.g. [here](https://www.nature.com/articles/s41467-021-24746-w): State method goal, summarize what it does, show results, investigate/verify results, show nice visualizations and user-friendly interface
       - run similar tools and summarize performance (e.g. [scMoMaT single cell multi omics method](https://www.nature.com/articles/s41467-023-36066-2))
       - sometimes simulated, sometimes real datasets
       - more examples:
         - [Strainberry assembly pipeline](https://www.nature.com/articles/s41467-021-24515-9)
         - [SMAP pipeline: sample matching in proteogenomics](https://www.nature.com/articles/s41467-022-28411-8)
         - [scDML: remove batch effect in scRNA-seq data](https://www.nature.com/articles/s41467-023-36635-5)
         - [AlphaPeptDeep: framework to predict peptide properties for proteomics](https://www.nature.com/articles/s41467-022-34904-3)
         - [PRECAST: integrate spatial transcriptomics data](https://www.nature.com/articles/s41467-023-35947-w)
         - [PyUUL: translate biological structures into 3D tensors](https://www.nature.com/articles/s41467-022-28327-3)
     - **Sections order**:

       | scDML, scMoMaT | Strainberry, W-SAS | SMAP |
       | -------- | -------- | -------- |
       | Abstract | Abstract | Abstract |
       | Introduction | Introduction | Introduction |
       | Results | Results | Results |
       | Discussion | Discussion | Discussion |
       | Methods | Methods | Methods |
       | Data availability | Data availability | Data availability |
       | Code availability | Code availability | Code availability |
       | References | References | References |
       | Acknowledgements | Acknowledgements | Acknowledgements |
       | Author information | Author information | Author information |
       | Ethics declarations | Ethics declarations | Ethics declarations |
       | Peer review | | Peer review |
       | Additional information | Additional information | Additional information |
       | Supplementary information | Supplementary information | Supplementary information |
       | Source data | | |
       | Rights and permissions | Rights and permissions | Rights and permissions |
       | About this article | About this article | About this article |
       | | This article is cited by | |
       | Comments | Comments | Comments |

     - Nature Protocols
       - [SCENIC: scalable SC network analysis](https://www.nature.com/articles/s41596-020-0336-2)
       - [lentiMPRA, MPRAflow: high-throughput functional characterization of gene regulatory elements](https://www.nature.com/articles/s41596-020-0333-5)
     - Nature Methods
       - [Emu: species-level microbial community profiling](https://www.nature.com/articles/s41592-022-01520-4)
  2. Datasets
     - aDNA:
       - [remote pristine Antarctic soils](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0424-5) (ARGs)
     - Datasets table: https://hackmd.io/C2AlX3X1T5-jwZU-q2njFg?edit
     - Submitted de.NBI cloud computing application (should get access within a week)
       - Citation for publication:

         > The development and support of the cloud is possible above all through the funding of the cloud infrastructure by the Federal Ministry of Education and Research (BMBF)! We would highly appreciate the following citation in your next publication(s): This work was supported by the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A537B, 031A533A, 031A538A, 031A533B, 031A535A, 031A537C, 031A534A, 031A532B).

## nf-core/funcscan meeting notes 2023-03-01

### Agenda

- Review previous tasks
  1. Nat Comms: research article type; not yet found examples
  2. Examples of other pipelines (more to come):
     - [METABOLIC](https://github.com/AnantharamanLab/METABOLIC)
     - [MetaErg](https://www.frontiersin.org/articles/10.3389/fgene.2019.00999/full)
     - [SqueezeMeta](https://www.frontiersin.org/articles/10.3389/fmicb.2018.03349/full)
     - [Bacannot](https://github.com/fmalmeida/bacannot)
  3. Datasets
     - Plant Microbiome:
       - [AMPs: microbiome of tea leaves](https://doi.org/10.1186/s12918-017-0503-4), sequencing reads: [SRP113601](https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP113601&o=acc_s%3Aa), total RNA and 16S rRNA, assembly with Trinity, functional assignment KEGG, alignment with Bowtie2 to AMP databases (ADAM, CAMP, APD).
       - [BGCs: rhizosphere soil bacteria from tomato](https://doi.org/10.1186/s12864-020-07346-8), sequencing reads: [PRJNA503984](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA503984/), assembled on [Medusa web server](http://combo.dbe.unifi.it/medusa), BGC mining with AntiSMASH and BAGEL4
       - [ARGs: soil vs grassland microbiome](https://doi.org/10.1016/j.scitotenv.2022.159179), sequencing reads: [PRJNA773121](https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=4&WebEnv=MCID_6405e7e73954ad0735606e97&o=acc_s%3Aa), assembly with MegaHit, Prokka+KEGG for functional annotation, qPCR for ARGs with primer set of 216 pairs
     - Human fecal metagenomics:
       - [Rampelli _et al._ (2015)](https://doi.org/10.1016/j.cub.2015.04.055)
     - Sediments/Soil/Water
       - Tara Oceans
         - Assembled
       - 'Vicinity' paper
         - Assembled
       - SARG/DeepARG/AMRFinderPlus one
         - Assemble with MAG to get contigs
     - Ancient DNA: no candidates
  4. Waiting on 2
     - Minor release with updated BAKTA and other databases

### TODO

- Jasmin: to continue finding examples of Nat. Comms. methods papers
- Jasmin: to continue looking for other pipelines
- Jasmin/Moritz/Louisa: to continue looking for datasets
- Anan: Start running assemblies on the datasets that will be compared in FUNCSCAN.

## nf-core/funcscan meeting notes 2023-02-22

### Publication Planning Tasks

1. [@jasmezz] **Structure/Scope** Check Nat. Comms. software paper structure
   - Possible?
   - Need 'story'? Or Methods OK?
   - Example of Nat Comms software papers
2. [@jasmezz] **Literature review**: for existing tools for same purposes
   - Existing pipelines for the same thing (multi-functional group screening)
   - Existing pipelines for one group, e.g.
     - ARG/BGC: https://github.com/fmalmeida/bacannot
     - 'HMMR' based: https://joss.theoj.org/papers/10.21105/joss.04851
3. **Detection Improvements**: Identify datasets of ARGs/BGCs/AMPs and compare results (+ hopefully find more)
   - Datasets
     - [@jasmezz] Ancient DNA dataset (HKI)
       - Brealey 2022 - AMR Swedish Bears?
       - Warinner 2014 - AMR(?) Dental Calculus
     - [@louperelo] Plant microbiome dataset (Louisa)
     - [@Midnighter] Human fecal metagenomes datasets
     - [@louperelo] Wastewater
     - [@Darcy220606] Sediments
   - Specifications
     - Minimum One AMP/One BGC/One ARG; one of each for each dataset (or mixture if exists!)
     - Well described methods: provide raw data, describe parameters, versions etc.
     - Well described results: do they provide lists of hits etc.
     - Ideally: reproducible (can we just re-generate the same results from their analyses)
4. **Resource Benchmark**
   - Compare one category against other pipelines (e.g. bacannot vs funcscan); which is faster/more hits etc.
   - Start from FASTQs or MAGs?
     - If from FASTQs we can start with nf-core/mag
     - Or run e.g. bacannot (if that is our comparison after lit review) and re-use those contigs produced there in funcscan

General comments

- Try to benchmark on different platforms (one for each dataset? HPC, local server, cloud)
- Apply for deNBI Simple VM cluster for project

### Louisa

Paper suggestions: Publication with interesting datasets:

- ARGs
  - Wastewater: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6408512/
    - MGmapper for mapping to reference sequences, AMR genes from ResFinder database
  - Sediment (ancient, permafrost, Alaska): https://www.nature.com/articles/nature10388#accession-codes
- BGCs
  - Red Sea Brine Pool: https://www.mdpi.com/1660-3397/17/5/273#app1-marinedrugs-17-00273
    - AntiSMASH
  - Grassland isolates: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6563-7#Sec11
    - AntiSMASH
- AMPs
  - Maize plants: https://www.mdpi.com/1422-0067/18/9/1938
    - alignment with Muscle

### Next week

- Review Nat Comms. software papers
- Review possible competition pipeline
- Review datasets against specifications