# A guide to analyzing metagenomic data with metaWRAP

###### tags: `metagenomic`, `metaWRAP`

Note: This pipeline is only a guide. None of metaWRAP's modules are dependent on each other, so if you want to do certain steps with other software, you are free to do so. For example, if you want to try the Reassemble_bins module on your own bins, you do not need to run the other modules just to get to that point. Or if you want to use a different assembler than metaSPAdes or MegaHIT, you can do so, and then proceed to the rest of the pipeline with your own assembly.

First, load the module from BIOWARE:

`module load metawrap`

Check that the configuration file is found:

`which config-metawrap`

The output should be: `/bioware/metawrap-1.1/bin/config-metawrap`

## Step 0: Download sample metagenomic data

Download data from 3 samples from the metaHIT gut survey (or use your own unzipped, demultiplexed, paired-end Illumina reads):

```
wget ftp.sra.ebi.ac.uk/vol1/fastq/ERR011/ERR011347/ERR011347_1.fastq.gz
wget ftp.sra.ebi.ac.uk/vol1/fastq/ERR011/ERR011347/ERR011347_2.fastq.gz
wget ftp.sra.ebi.ac.uk/vol1/fastq/ERR011/ERR011348/ERR011348_1.fastq.gz
wget ftp.sra.ebi.ac.uk/vol1/fastq/ERR011/ERR011348/ERR011348_2.fastq.gz
wget ftp.sra.ebi.ac.uk/vol1/fastq/ERR011/ERR011349/ERR011349_1.fastq.gz
wget ftp.sra.ebi.ac.uk/vol1/fastq/ERR011/ERR011349/ERR011349_2.fastq.gz
```

Or better, use the SRA Toolkit:

```
module load sratoolkit/2.10.7
module load fastqc
```

The first time you use fastq-dump it is going to fail and tell you to configure the program; run `vdb-config --interactive`.

```
fastq-dump --split-files --gzip --read-filter pass --skip-technical ERR011347
fastq-dump --split-files --gzip --read-filter pass --skip-technical ERR011348
fastq-dump --split-files --gzip --read-filter pass --skip-technical ERR011349
```

Unzip the data:

```
gunzip *.gz
```

Place the raw sequencing reads into a new folder:

```
mkdir RAW_READS
mv *fastq RAW_READS
ls RAW_READS
ERR011347_1.fastq ERR011347_2.fastq ERR011348_1.fastq ERR011348_2.fastq ERR011349_1.fastq ERR011349_2.fastq
```

## Step 1: Run metaWRAP-Read_qc to trim the reads and remove human contamination

Note: you will need the bmtagger hg38 index to remove the human reads - see the metaWRAP database installation instructions. You may also use another host genome to filter against with the `-x` option. Alternatively, use the `--skip-bmtagger` flag of the Read_qc module to only do the read trimming.
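Before running the module with host-read removal, it can help to confirm which databases metaWRAP is configured with. A minimal check, assuming your `config-metawrap` file defines database path variables such as `BMTAGGER_DB` as described in the metaWRAP database instructions:

```
# Print the database paths metaWRAP is configured with;
# BMTAGGER_DB should point at the hg38 index used for host-read removal.
grep -E "BMTAGGER_DB|KRAKEN_DB|BLASTDB|TAXDUMP" $(which config-metawrap)
```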
Individually process each sample, with `--skip-bmtagger`:

`metawrap read_qc --skip-bmtagger -1 RAW_READS/SRR_1.fastq -2 RAW_READS/SRR_2.fastq -t 4 -o READ_QC/SRR`

- [ ] #FR1: try to do the metawrap read_qc step for the two other sets

```
mkdir READ_QC
metawrap read_qc -1 RAW_READS/ERR011347_1.fastq -2 RAW_READS/ERR011347_2.fastq -t 4 -o READ_QC/ERR011347
metawrap read_qc -1 RAW_READS/ERR011348_1.fastq -2 RAW_READS/ERR011348_2.fastq -t 4 -o READ_QC/ERR011348
metawrap read_qc -1 RAW_READS/ERR011349_1.fastq -2 RAW_READS/ERR011349_2.fastq -t 4 -o READ_QC/ERR011349
```

#FR: No need to do it this way. Alternatively, process all samples at the same time with a parallel for loop (especially if you have many samples):

```
for F in RAW_READS/*_1.fastq; do
    R=${F%_*}_2.fastq
    BASE=${F##*/}
    SAMPLE=${BASE%_*}
    metawrap read_qc -1 $F -2 $R -t 1 -o READ_QC/$SAMPLE &
done
```

Or as a one-liner:

`for F in RAW_READS/*_1.fastq; do R=${F%_*}_2.fastq; BASE=${F##*/}; SAMPLE=${BASE%_*}; metawrap read_qc -1 $F -2 $R -t 1 -o READ_QC/$SAMPLE & done`

Let's have a glance at one of the output folders:

`ls READ_QC/ERR011347`

These are HTML reports of the read quality before and after QC:

```
post-QC_report pre-QC_report
```

Original raw reads:

![Read quality before QC](https://i.imgur.com/x8aaFWs.png)

Final QC'ed reads:

![Read quality after QC](https://i.imgur.com/drJfxC9.png)

These are the final trimmed and de-contaminated reads:

```
final_pure_reads_1.fastq final_pure_reads_2.fastq
```

Move the final QC'ed reads into a new folder:

```
mkdir CLEAN_READS
for i in READ_QC/*; do
    b=${i#*/}
    mv ${i}/final_pure_reads_1.fastq CLEAN_READS/${b}_1.fastq
    mv ${i}/final_pure_reads_2.fastq CLEAN_READS/${b}_2.fastq
done
```

Without the loop:

```
mv READ_QC/ERR011347/final_pure_reads_1.fastq CLEAN_READS/ERR011347_1.fastq
mv READ_QC/ERR011347/final_pure_reads_2.fastq CLEAN_READS/ERR011347_2.fastq
```

## Step 2: Assembling the metagenomes with the metaWRAP-Assembly module

Note: Depending on your goals you may want to assemble each sample separately, but for the purposes of analyzing the whole community across samples, we will be co-assembling our samples.

Concatenate the reads from all the samples:

```
cat CLEAN_READS/ERR*_1.fastq > CLEAN_READS/ALL_READS_1.fastq
cat CLEAN_READS/ERR*_2.fastq > CLEAN_READS/ALL_READS_2.fastq
```

Assemble the reads with the metaSPAdes option flag (usually preferred over MegaHIT unless you have a very large data set):

- [ ] #FR: We need a rather potent server for this; let's check domino and flicker if they are available (ssh -X )

```
metawrap assembly -1 CLEAN_READS/ALL_READS_1.fastq -2 CLEAN_READS/ALL_READS_2.fastq -m 200 -t 12 --metaspades -o ASSEMBLY
```

You will find the assembly file in `ASSEMBLY/final_assembly.fasta`, and the QUAST assembly report HTML in `ASSEMBLY/assembly_report.html`!

Assembly statistics:

![Assembly stats](https://i.imgur.com/RbDldGU.png)

Looking at the top 10 contigs shows we got some longer contigs (considering that we are working with just 7 Gbp of data)!
```
grep ">" ASSEMBLY/final_assembly.fasta | head
>NODE_1_length_196124_cov_2.427049
>NODE_2_length_176373_cov_3.889994
>NODE_3_length_163601_cov_3.070200
>NODE_4_length_142996_cov_2.771017
>NODE_5_length_109931_cov_3.516837
>NODE_6_length_106321_cov_2.842875
>NODE_7_length_99368_cov_2.860703
>NODE_8_length_95669_cov_2.506714
>NODE_9_length_91511_cov_12.466716
>NODE_10_length_88949_cov_2.730882
```

## Step 3: Run the Kraken module on both the reads and the assembly

Running Kraken on the reads will give us an idea of the taxonomic composition of the communities in the three samples, while running Kraken on the assembly will give us an idea of which taxonomic groups were assembled better than others (the assembly process is heavily biased and should not be used to infer overall community composition).

Run the Kraken module on all files at once, subsetting to 1M reads per sample to speed up the run. Use `ssh minnie` for this step (it needs a lot of memory):

```
metawrap kraken -o KRAKEN -t 12 -s 1000000 CLEAN_READS/ERR*fastq ASSEMBLY/final_assembly.fasta
```

Let's have a look at the output folder:

```
ERR011347.kraken ERR011348.kraken ERR011349.kraken final_assembly.kraken kronagram.html
ERR011347.krona  ERR011348.krona  ERR011349.krona  final_assembly.krona
```

The `.kraken` files contain the KRAKEN-estimated taxonomy of each read or contig, while the `.krona` files summarize the taxonomy statistics to be fed into KronaTools, which makes the `kronagram.html` file. The `kronagram.html` file contains all the taxonomy information from all the samples and the co-assembly. Inspecting the kronas in a web browser will show you what the community composition is like. For example, here is the taxonomic composition of our first sample:

![Krona](https://i.imgur.com/jZiFPUV.png)

## Step 4: Bin the co-assembly with three different algorithms with the Binning module

The initial binning with CONCOCT, MaxBin, and metaBAT is among the more time-intensive steps (especially CONCOCT and MaxBin), so I would advise you to run the Binning module with each of the algorithms separately. However, metaWRAP supports running all three together. Our dataset is reasonably small, so I will run all three binning predictions at the same time. If you are used to different binning software, feel free to run it instead. The downstream refinement process (the Bin_refinement module) takes in up to 3 different bin sets, although you can get around this by splitting your bin sets into groups and then recursively consolidating them.

Run the binning module with all three binners - notice how I put both the F and R read files at the end of the command:

```
metawrap binning -o INITIAL_BINNING -t 96 -a ASSEMBLY/final_assembly.fasta --metabat2 --maxbin2 --concoct CLEAN_READS/ERR*fastq
```

In the output folder, we see folders with the 3 final bin sets, and a file containing the average and standard deviation of the library insert sizes (this may or may not be useful to you):

```
insert_sizes.txt concoct_bins maxbin2_bins metabat2_bins work_files
```

Looking inside these folders reveals that we found 47, 29, and 20 bins with CONCOCT, metaBAT2, and MaxBin2, respectively. But we do not know how good these bins are yet (unless you used the `--run-checkm` flag). We will find out the quality of the bins in the next step!
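To reproduce the bin counts mentioned above on your own run, you can simply count the fasta files in each binner's output folder - a quick sketch, assuming each binner writes one `.fa` file per bin, as metaWRAP does:

```
# Count the bins produced by each binner:
for d in INITIAL_BINNING/*_bins; do
    echo -e "${d##*/}:\t$(ls ${d}/*.fa | wc -l) bins"
done
```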
## Step 5: Consolidate bin sets with the Bin_refinement module

Note: make sure you downloaded the CheckM database (see the metaWRAP database instructions).

Now that you have metaBAT2, MaxBin2, and CONCOCT bins, let's consolidate them into a single, stronger bin set! If you used your own binning software, feel free to use any 3 bin sets. If you have more than 3, you can run them in groups. For example, if you have 5 bin sets, try consolidating 1+2+3 and 4+5, and then consolidate again between the outputs.

When you do your refinement, make sure to put some thought into the minimum completion (`-c`) and maximum contamination (`-x`) parameters that you enter. During refinement, metaWRAP will have to choose between 7 different versions of each bin, and it will dynamically adjust to prioritize the bin quality that you desire. Consider this example: bin_123 comes in four versions in terms of completion/contamination: 95/15, 90/10, 80/5, 70/5. Which one is the best version? The highly complete but highly contaminated bin, or the less complete but purer bin? This is subjective and depends on what you value in a bin and on what your purposes for bin extraction are.

By default, the minimum completion is 70% and the maximum contamination is 5%. However, because of the relatively poor depth of these demonstration samples, we will set the minimum completion to 50% and the maximum contamination to 10%, but feel free to be much more picky. Parameters like `-c 90 -x 5` are not unreasonable for some data (but you will get fewer bins, of course).

Run metaWRAP's Bin_refinement module:

```
metawrap bin_refinement -o BIN_REFINEMENT -t 96 -A INITIAL_BINNING/metabat2_bins/ -B INITIAL_BINNING/maxbin2_bins/ -C INITIAL_BINNING/concoct_bins/ -c 50 -x 10
```

In the output directory, you will see the three original bin folders we fed in, as well as the `metawrap_bins` directory, which contains the final, consolidated bins. You will also see a `Binning_refiner` bin set - this is an internal benchmark for the bins produced by another consolidation software (Binning_refiner). You can ignore this set - it will likely have low contamination and completion. You will also see `.stats` files for each one of the bin directories:

```
concoct_bins.stats maxbin2_bins.stats metabat2_bins.stats metawrap_bins.stats Binning_refiner.stats
concoct_bins maxbin2_bins metabat2_bins metawrap_bins Binning_refiner
```

The `.stats` files contain useful information about each bin, including its completeness and contamination. For example, `cat BIN_REFINEMENT/metawrap_bins.stats`:

```
bin     completeness  contamination  GC     lineage          N50    size     binner
bin.5   100.0         1.6            0.311  Euryarchaeota    12686  1705532  binsO.checkm
bin.4   99.32         1.342          0.408  Clostridiales    58825  2083650  binsO.checkm
bin.14  86.69         5.896          0.293  Bacteria         3754   2199676  binsO.checkm
bin.6   86.22         2.348          0.371  Clostridiales    4283   2055792  binsO.checkm
bin.8   83.16         2.516          0.446  Clostridiales    2723   1467846  binsO.checkm
bin.2   80.34         0.0            0.469  Bacteria         11936  3579466  binsO.checkm
bin.9   76.57         2.648          0.425  Selenomonadales  3155   1796524  binsO.checkm
bin.13  74.82         1.710          0.435  Bacteroidales    7456   3643185  binsO.checkm
bin.3   74.53         0.377          0.284  Clostridiales    10440  1241933  binsO.checkm
bin.10  65.78         0.0            0.263  Bacteria         3045   1159966  binsO.checkm
bin.11  64.85         3.776          0.417  Bacteroidales    2086   3103352  binsO.checkm
bin.1   57.36         0.0            0.430  Bacteria         4628   2673426  binsO.checkm
bin.7   52.94         1.724          0.501  Bacteria         3614   1465011  binsO.checkm
```

To evaluate how many "good bins" (based on our >50% completion, <10% contamination metric) metaWRAP produced, we can run:

```
cat BIN_REFINEMENT/metawrap_bins.stats | awk '$2>50 && $3<10' | wc -l
13
```

By inspecting the other files, we find that metaBAT2, MaxBin2, CONCOCT, and metaWRAP produced 11, 7, 10, and 13 bins, respectively. So metaWRAP produced 2 more bins than the best single binner. Not bad! But this is just the number of bins. To compare the bin sets more closely in terms of completion and contamination, we can look at the plots in `BIN_REFINEMENT/figures/`:

![Bin_refinement](https://i.imgur.com/m6RRJxi.jpg)

Note: This graph no longer has `Binning_refiner` in it, to reduce confusion. If you want to see Binning_refiner's performance, look at binsABC (or binsAB if you have two bin sets) in the other figure.

As you can see, the refinement process produced by far the best bin set in terms of both completion and contamination. Keep in mind that these improvements are even more dramatic in more complex samples.

## Step 6: Visualize the community and the extracted bins with the Blobology module

Let's use the Blobology module to project the entire assembly onto a GC vs. abundance plane, and annotate the contigs with taxonomy and bin information. This will not only give us an idea of how these microbial communities are structured, but will also show us our binning success in a more visual way.

NOTE: In order to annotate the blobplot with bins with the `--bins` flag, you **MUST use the non-reassembled bins**! In other words, use the bins produced by the Bin_refinement module, not the Reassemble_bins module.

Note 2: You will need R for this module.

```
metawrap blobology -a ASSEMBLY/final_assembly.fasta -t 96 -o BLOBOLOGY --bins BIN_REFINEMENT/metawrap_bins CLEAN_READS/ERR*fastq
```

You will find that the output has a number of blob plots of our communities, annotated with different levels of taxonomy or their bin membership. Note that to help with visualizing the bins, some of the plots only contain the contigs that were successfully binned (these files have `.binned.blobplot.` in their names).

Phyla taxonomy of the entire assembled community:

![Phyla](https://i.imgur.com/VihLGWb.jpg)

Bin membership of all the contigs:

![bins](https://i.imgur.com/GDmIYe5.jpg)

## Step 7: Find the abundances of the draft genomes (bins) across the samples

We would like to know how the extracted genomes are distributed across the samples, and in what abundance each bin is present in each sample. The Quant_bins module can give us this information. It uses Salmon - a tool conventionally used for transcript quantitation - to estimate the abundance of each scaffold in each sample, and then computes the average bin abundances.

NOTE: In order to run this module, it is **highly** recommended to use the non-reassembled bins (the bins produced by the Bin_refinement module, not the Reassemble_bins module) and to provide the entire non-binned assembly with the `-a` option. This will give more accurate bin abundances that are in the context of the entire community.

Let's run the Quant_bins module:

```
metawrap quant_bins -b BIN_REFINEMENT/metawrap_bins -o QUANT_BINS -a ASSEMBLY/final_assembly.fasta CLEAN_READS/ERR*fastq
```

The output contains several useful files. First, there is `bin_abundance_heatmap.png` - a quick heatmap made to visualize the bin abundances across the samples:

![heatmap](https://i.imgur.com/K1RaPUT.png)

The raw data for this plot (as you will most likely want to make your own heatmaps for further analysis) is in `bin_abundance_table.tab`.
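For a quick look at this table without plotting, you can sort the bins by their abundance in a sample of interest. A small sketch, assuming the table sits in the `QUANT_BINS` output folder and has the layout shown below (a header line, bin names in column 1, and one tab-separated abundance column per sample):

```
# Skip the header line, then sort bins by abundance in the first sample column, highest first:
tail -n +2 QUANT_BINS/bin_abundance_table.tab | sort -t$'\t' -k2,2gr | head
```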
Note that the abundances are expressed as "genome copies per million reads", and are calculated with Salmon in a similar way to how TPM (transcripts per million) is calculated in RNA-seq analysis. As such, they should already be standardized to the individual sample size.

```
Genomic bins  ERR011349        ERR011348        ERR011347
bin.9         0.113912116828   35.851964987     39.8440491514
bin.10        0.273774684856   9.52869077293    39.988244574
bin.1         7.87827599808    31.3262582417    72.4475075589
bin.4         1.11852631889    100.052540293    111.213423224
bin.2         42.0242612674    69.0094806385    80.200001212
bin.5         2.16260151787    22.06396779      43.7720962538
bin.11        64.2884105466    25.3703846834    29.5444322752
bin.6         517.890689122    0.379711918465   0.834196723864
bin.7         0.499019812767   61.2121001057    82.5953338481
bin.14        5.49635966692    14.631433905     32.98399834
bin.13        0.230760165209   56.0018529273    91.6502833521
bin.8         98.4767064505    38.0691238971    22.8857472565
bin.3         349.730007621    0.0911113402849  0.196554603409
```

Finally, you can view the abundances of all the individual contigs in all the samples in the `quant_files` folder.

## Step 8: Re-assemble the consolidated bin set with the Reassemble_bins module

Now that we have our final, consolidated bin set in `BIN_REFINEMENT/metawrap_bins`, we can try to further improve it with reassembly. The Reassemble_bins module will collect the reads belonging to each bin, and then reassemble them separately with a "permissive" and a "strict" algorithm. Only the bins that improved through reassembly will be altered in the final set.

Let's run the Reassemble_bins module with all the reads we have:

```
metawrap reassemble_bins -o BIN_REASSEMBLY -1 CLEAN_READS/ALL_READS_1.fastq -2 CLEAN_READS/ALL_READS_2.fastq -t 96 -m 800 -c 50 -x 10 -b BIN_REFINEMENT/metawrap_bins
```

Looking at the output in `BIN_REASSEMBLY/reassembled_bins.stats`, we can see that 3 bins were improved through strict reassembly, 6 were improved through permissive reassembly, and 4 bins could not be improved (`.strict`, `.permissive`, and `.orig` bin extensions, respectively):

```
bin                completeness  contamination  GC     lineage          N50    size     binner
bin.10.orig        65.78         0.0            0.263  Bacteria         3045   1159966  NA
bin.7.strict       54.94         0.671          0.501  Clostridiales    3947   1474089  NA
bin.4.permissive   99.32         1.342          0.408  Clostridiales    72135  2088821  NA
bin.2.permissive   82.06         0.0            0.469  Bacteria         18989  3604843  NA
bin.14.strict      85.84         3.066          0.293  Bacteria         4576   2201824  NA
bin.9.permissive   76.74         2.554          0.425  Selenomonadales  3601   1802438  NA
bin.13.permissive  78.37         1.357          0.435  Bacteroidales    9887   3675176  NA
bin.11.orig        64.85         3.776          0.417  Bacteroidales    2086   3103352  NA
bin.6.permissive   88.04         1.006          0.371  Clostridiales    6288   2070146  NA
bin.5.orig         100.0         1.6            0.311  Euryarchaeota    12686  1705532  NA
bin.3.permissive   74.91         0.396          0.284  Clostridiales    16578  1243641  NA
bin.1.orig         57.36         0.0            0.430  Bacteria         4628   2673426  NA
bin.8.strict       83.89         1.342          0.446  Clostridiales    3870   1474833  NA
```

But how much did our bin set really improve? We can look at the `BIN_REASSEMBLY/reassembly_results.png` plot to compare the original and reassembled sets. We can see that the bin reassembly modestly improved the bin N50 and completion metrics, and significantly reduced contamination. Fantastic!

![heatmap](https://i.imgur.com/V8IosYQ.jpg)

We can also view the CheckM plot of the final bins in `BIN_REASSEMBLY/reassembled_bins.png`:

![heatmap](https://i.imgur.com/Yx00fuQ.png)

## Step 9: Determine the taxonomy of each bin with the Classify_bins module

Note: you will need the NCBI_nt and NCBI_tax databases for this module (see the Database Installation section).
We already got an idea of the approximate taxonomy of each bin from CheckM in the `.stats` files in the Bin_refinement and Reassemble_bins modules. We can do better than that, however. The Classify_bins module uses Taxator-tk to accurately assign taxonomy to each contig, and then consolidates the results to estimate the taxonomy of the whole bin. Of course, the success and accuracy of our predictions will rely heavily on the existing database.

Estimate the taxonomy of our final, reassembled bins with the Classify_bins module:

```
metawrap classify_bins -b BIN_REASSEMBLY/reassembled_bins -o BIN_CLASSIFICATION -t 48
```

We can view the final estimated taxonomy in `BIN_CLASSIFICATION/bin_taxonomy.tab`:

```
bin.1.orig.fa        Bacteria;Firmicutes;Clostridia;Clostridiales
bin.5.orig.fa        Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;Methanobrevibacter smithii
bin.11.orig.fa       Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides
bin.2.permissive.fa  uncultured organism
bin.10.orig.fa       Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae
bin.14.strict.fa     Bacteria;Firmicutes
bin.8.strict.fa      Bacteria
bin.9.permissive.fa  Bacteria
bin.6.permissive.fa  Bacteria;Firmicutes;Clostridia;Clostridiales
bin.3.permissive.fa  Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae
bin.4.permissive.fa  Bacteria
bin.13.permissive.fa Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae
bin.7.strict.fa      Bacteria
```

As you can see, some of the bins are annotated very deeply, while others can only be classified as "Bacteria". This method is relatively trustworthy; however, it often fails to annotate organisms that are very distant from anything in the NCBI database. For these tricky bins, manually looking at marker genes (such as ribosomal proteins) can result in much more sensitive taxonomy assignment.

## Step 10: Functionally annotate bins with the Annotate_bins module

Now that we have our finalized reassembled bins, we are ready to functionally annotate them for downstream functional analysis. This module simply annotates the genes with PROKKA - it cannot do the actual functional analysis for you.

Run the functional annotation module on the final, reassembled bins:

```
metawrap annotate_bins -o FUNCT_ANNOT -t 96 -b BIN_REASSEMBLY/reassembled_bins/
```

You will find the functional annotations of each bin in GFF format in the `FUNCT_ANNOT/bin_funct_annotations` folder:

```
head FUNCT_ANNOT/bin_funct_annotations/bin.1.orig.gff
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  2866   3645   .  -  0  ID=HMOHEJHL_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=HMOHEJHL_00001;product=hypothetical protein
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  3642   4478   .  -  0  ID=HMOHEJHL_00002;inference=ab initio prediction:Prodigal:2.6;locus_tag=HMOHEJHL_00002;product=hypothetical protein
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  4606   5859   .  -  0  ID=HMOHEJHL_00003;inference=ab initio prediction:Prodigal:2.6;locus_tag=HMOHEJHL_00003;product=hypothetical protein
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  5856   6575   .  -  0  ID=HMOHEJHL_00004;Name=ypdB;gene=ypdB;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P0AE39;locus_tag=HMOHEJHL_00004;product=Transcriptional regulatory protein YpdB
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  6603   7658   .  -  0  ID=HMOHEJHL_00005;eC_number=1.1.1.261;Name=egsA;gene=egsA;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P94527;locus_tag=HMOHEJHL_00005;product=Glycerol-1-phosphate dehydrogenase [NAD(P)+]
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  7843   11946  .  -  0  ID=HMOHEJHL_00006;eC_number=3.2.1.78;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:A1A278;locus_tag=HMOHEJHL_00006;product=Mannan endo-1%2C4-beta-mannosidase
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  12597  13586  .  -  0  ID=HMOHEJHL_00007;eC_number=3.2.1.89;Name=ganB_1;gene=ganB_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:Q65CX5;locus_tag=HMOHEJHL_00007;product=Arabinogalactan endo-beta-1%2C4-galactanase
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  13719  14564  .  -  0  ID=HMOHEJHL_00008;Name=malG_1;gene=malG_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P68183;locus_tag=HMOHEJHL_00008;product=Maltose transport system permease protein MalG
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  14565  15476  .  -  0  ID=HMOHEJHL_00009;Name=malF_1;gene=malF_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P02916;locus_tag=HMOHEJHL_00009;product=Maltose transport system permease protein MalF
NODE_75_length_31799_cov_0.983871  Prodigal:2.6  CDS  15484  16704  .  -  0  ID=HMOHEJHL_00010;Name=malE;gene=malE;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P0AEY0;locus_tag=HMOHEJHL_00010;product=Maltose-binding periplasmic protein
```

You will also find the translated and untranslated predicted genes in FASTA format in the `FUNCT_ANNOT/bin_translated_genes` and `FUNCT_ANNOT/bin_untranslated_genes` folders. Finally, you can find the raw PROKKA output files in `FUNCT_ANNOT/prokka_out`.
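As a simple starting point for that downstream functional analysis, you can pull the predicted gene products out of these GFF files, for example to tally the most common annotations in a bin. A minimal sketch based on the `product=` attribute format shown above:

```
# Extract the product= field from each annotated feature
# and list the most frequent products in this bin:
grep -o "product=[^;]*" FUNCT_ANNOT/bin_funct_annotations/bin.1.orig.gff \
    | sed 's/^product=//' | sort | uniq -c | sort -rn | head
```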
