# Metagenomics for resistomes, Quality Check
In this hands-on exercise, you will work with a dataset from a pilot study on fecal samples from pigs. Together we will inspect the taxonomy. You will then run a workflow to detect antimicrobial resistance (AMR) determinants in the dataset, visualize the data with basic tools and discuss the results.
The dataset is part of a pilot study testing DNA extraction methods for the material collected in the **HUNT One Health** project. HUNT One Health is a collaboration between NMBU, the Veterinary Institute and NTNU, and is a sub-project of **HUNT4**, the population health study in Nord-Trøndelag. The samples come from fecal smears on paper cards that were stored at -80°C until DNA extraction. They were sequenced by an external provider (BGI, Hong Kong) on a NovaSeq 6000 platform (Illumina). The output consists of short, paired-end reads. More information on HUNT One Health can be found at https://www.nmbu.no/en/projects/hunt.
First, you will import the dataset from a shared history.
**I.** Navigate to https://usegalaxy.no/.
**II.** We will now import a history to your user account. This data will be used during the course.
First, copy the URL below into your browser's address bar. This only works if you are already logged in to Galaxy NO, so use the same browser that you used to log in to Galaxy.
HUNT data:
https://usegalaxy.no/u/6feb7210142f4721b3766892afde6d40/h/metagenomics
Here is how you import a history:
<iframe src="https://scribehow.com/embed/Usegalaxy_Workflow__Qw4MMz6yR2ijek7kPWY_Sg" width="640" height="640" allowfullscreen frameborder="0"></iframe>
**Everyone must complete this task before continuing. Afterwards, take a 15-minute break.**
**III.** Rename the imported history in Galaxy to `Metagenomics - course 2023`
Next, we will quality-check the reads.
</div>
</html>
## Tools to QC
<html>
<div style="background-color: #F8F8F8">
**ABOUT**
:::spoiler {state="open"} Tools
**FastQC**
Literature: [Andrews](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), 2010. FastQC: A Quality Control tool for High Throughput Sequence Data. Available online [here](http://www.bioinformatics.babraham.ac.uk/projects/fastqc). FastQC reports quality metrics for each input file, such as the number of reads, read length, per-base quality scores and GC content.
**MultiQC**
Literature: [Ewels et al.](https://academic.oup.com/bioinformatics/article/32/19/3047/2196507), 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. This tool aggregates results from bioinformatics analyses across many samples into a single report, which can also be viewed as a web page. More about this tool can be found [here](https://multiqc.info/).
**Fastp**
Literature: [Chen et al.](https://doi.org/10.1093/bioinformatics/bty560), 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor.
Quality control and preprocessing of sequencing data are critical to obtaining high-quality and high-confidence variants in downstream data analysis. In the past, several tools were combined for FASTQ quality control and preprocessing, e.g. FastQC for quality control, Cutadapt for adapter trimming and Trimmomatic for read pruning and filtering. fastp performs these steps in a single tool. More about this tool can be found [here](https://github.com/OpenGene/fastp).
:::
</div>
</html>
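fastp trims and filters reads for you, but the core idea of quality trimming is simple. The Python sketch below illustrates sliding-window trimming from the 3' end, roughly in the spirit of fastp's `--cut_tail` option. It is a simplified illustration, not fastp's implementation: it assumes Phred+33 quality encoding, and the window size, quality threshold and minimum length are values chosen for the example. The function names are hypothetical.

```python
"""Sliding-window 3' quality trimming: a simplified sketch.

Assumptions: Phred+33 quality encoding; window size, threshold and
minimum length are illustrative values, not taken from fastp.
"""


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def cut_tail(seq: str, quality: str, window: int = 4,
             min_mean: float = 20.0) -> tuple[str, str]:
    """Trim the 3' end of a single read.

    Look at the last `window` bases of the remaining read. While their
    mean quality is below `min_mean`, drop one base from the 3' end and
    look again. Stops at the first window that passes.
    """
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0:
        win = scores[max(0, end - window):end]
        if sum(win) / len(win) >= min_mean:
            break  # this window is good enough: stop trimming
        end -= 1
    return seq[:end], quality[:end]


def trim_reads(records, min_length: int = 15, **kwargs):
    """Trim (header, seq, qual) records and drop reads that end up
    shorter than min_length (hypothetical filter for illustration)."""
    for header, seq, qual in records:
        seq, qual = cut_tail(seq, qual, **kwargs)
        if len(seq) >= min_length:
            yield header, seq, qual
```

Note that a sliding window can leave a few low-quality bases at the new 3' end, because trimming stops as soon as the window mean passes the threshold. For paired-end data like ours, a real tool must also keep R1 and R2 in sync when one mate is discarded; this sketch handles a single file only.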
## Exercise
**IV.** Go to the **Tools** section, scroll down to **GENOMIC FILE MANIPULATION** > **FASTQ Quality Control** and click **FastQC**. For the input, choose batch mode so that you can include all four FASTQ files, and select them (selected files are highlighted in blue). Leave `Contaminant list`, `Adapter list` and `Submodule and Limit specifing file` blank, and keep the remaining settings at their defaults. Press `Execute`.
**V.** Inspect the results for the reads in `Pig1_R1` by clicking the <i class="fa fa-eye fa-fw"></i> (eye) icon.
:::warning
**Answer the following questions:**
* How many sequences were obtained for this sample?
* Is this enough to detect resistance gene determinants? (Remember the introduction from earlier.)
* How would you describe the sequence quality, and how long are the reads? How do you explain this?
:::
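The numbers behind these questions can also be computed outside Galaxy. The Python sketch below calculates a few values that FastQC reports: the number of reads, the read length range, %GC, the mean base quality and the mean quality at each read position. Quality values are Phred scores, where a score Q corresponds to an error probability p through Q = -10 log10(p), so Q30 means roughly one wrong base call in 1000. The sketch assumes 4-line FASTQ records without line wrapping and Phred+33 encoding; all function names are hypothetical, and the results are not guaranteed to match FastQC exactly (for example, FastQC bins positions in long reads).

```python
"""Compute a few FastQC-style read statistics in plain Python.

Assumptions: 4-line FASTQ records without line wrapping, Phred+33
quality encoding. Function names are hypothetical, not from any tool.
"""
import gzip


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ text lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        seq, _plus, qual = next(it), next(it), next(it)
        yield header, seq, qual


def read_fastq(path: str):
    """Open a plain or gzip-compressed FASTQ file and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fastq(handle)


def basic_stats(records, offset: int = 33) -> dict:
    """Read count, length range, %GC and mean quality over all bases."""
    total = gc = bases = qual_sum = 0
    min_len, max_len = None, 0
    for _header, seq, qual in records:
        total += 1
        length = len(seq)
        min_len = length if min_len is None else min(min_len, length)
        max_len = max(max_len, length)
        gc += sum(1 for base in seq.upper() if base in "GC")
        bases += length
        qual_sum += sum(ord(ch) - offset for ch in qual)
    return {
        "total_sequences": total,
        "min_length": min_len if min_len is not None else 0,
        "max_length": max_len,
        "percent_gc": 100 * gc / bases if bases else 0.0,
        "mean_quality": qual_sum / bases if bases else 0.0,
    }


def per_position_mean_quality(records, offset: int = 33) -> list[float]:
    """Mean Phred score at each read position (no binning)."""
    sums: list[int] = []
    counts: list[int] = []
    for _header, _seq, qual in records:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

For example, with a local copy of the reads saved under a hypothetical name such as `Pig1_R1.fastq.gz`, `basic_stats(read_fastq("Pig1_R1.fastq.gz"))` returns the summary values. Each call to `read_fastq` streams the file once, so call it separately for each function instead of loading millions of reads into a list.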
**VI.** When you have many samples, you can compile the FastQC results into one report using the **MultiQC** tool; here we will combine the results for our four FASTQ files. Find **MultiQC** in the **Tools** section and choose the FastQC outputs that **MultiQC** should aggregate (hint: `FastQC on data (1-4): RawData`). Press `Execute`.
**VII.** Inspect the **MultiQC** report by downloading the **Webpage** output and opening it in a web browser. You can view the text reports directly in usegalaxy.no by clicking **Stats** in your history.
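MultiQC works by collecting values from each tool's output files and placing them side by side. As a rough illustration of that idea (not MultiQC's own code), the Python sketch below reads the Basic Statistics module from several FastQC text reports and builds one tab-separated row per sample, similar in spirit to MultiQC's General Statistics table. It assumes the reports are FastQC's `fastqc_data.txt` format, which we take to be what Galaxy lists as **RawData**; the function names are hypothetical.

```python
"""Aggregate FastQC 'Basic Statistics' values across samples (sketch).

Assumption: each report is FastQC's tab-separated fastqc_data.txt text,
where modules start with '>>Name' and end with '>>END_MODULE'.
"""


def parse_basic_statistics(text: str) -> dict[str, str]:
    """Return the key/value rows of the '>>Basic Statistics' module."""
    stats: dict[str, str] = {}
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
        elif line.startswith(">>END_MODULE") and in_module:
            break  # end of the module we care about
        elif in_module and line and not line.startswith("#"):
            key, _, value = line.partition("\t")
            stats[key] = value
    return stats


def summary_table(reports: dict[str, str]) -> str:
    """Build a tab-separated table with one row per sample.

    `reports` maps a sample name to the text of its FastQC report.
    Missing values are shown as 'NA'.
    """
    columns = ["Total Sequences", "Sequence length", "%GC"]
    rows = ["Sample\t" + "\t".join(columns)]
    for sample, text in sorted(reports.items()):
        stats = parse_basic_statistics(text)
        rows.append(sample + "\t" + "\t".join(stats.get(c, "NA") for c in columns))
    return "\n".join(rows)
```

To try it, download the four RawData outputs, read each file into a string and pass `summary_table` a dict mapping each sample name to the text of its report. Comparing R1 and R2 in such a table is a quick way to check that both mates of a pair have the same number of reads.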
:::warning
**Answer the following question:**
* Are you happy with the result?
:::
:::info
:bulb: Need help interpreting the MultiQC report? See this [video](https://www.youtube.com/watch?v=qPbIlO_KWN0).
:::