# 2. QC Raw Data Using FastQC ###### tags: `2. Main Steps` `tutorials` `fastQC` > Program required: fastqc ## Setup 1. Once you have retrieved the raw data using `sftp`, you should quality check all the raw `.fastq` files. Here, we use a program called `FastQC`. 2. From this step on, we will utilize a system called **SLURM** to run each job for us. It is already built inside the cluster. All you have to do is to make sure that, instead of running the code *interactively,* you want to make a `.sh` file (stands for shell format) that is formatted to be run on SLURM. 3. The benefits of submitting jobs to SLURM come in many folds. First, you don't have to stay logged-in to your cluster the entire time the system is running your code. It can take up to *DAYS* to finish a single job. SLURM lets you run the job using the internal computer nodes with large memory capacity and high computing power in the background, all the while you are out drinking coffee at Starbucks. 4. The second benefit is that you can *parallelize* your job submissions. We will go over this in the next step (Raw fastq to uBAM and mark adapters). ## The Code 1. Create a `.sh` file by typing `vim fastqc.sh` (See how to use vim instruction [here](https://github.com/Noor-WGS-data/Genome_sequence_data/blob/main/Tutorials/vim_cheat_sheet.md).) 2. Press `i`, to enter the `insert` mode, and copy and paste the code below. 3. Press `esc` key to exit the `insert` mode, and type `:wq` to *write* and *quit* vim. ``` #!/bin/bash #SBATCH -e errs/%a_fastqc-%A.err #SBATCH -o outs/%a_fastqc-%A.out #SBATCH --mem-per-cpu=20G #SBATCH -p scavenger #SBATCH --cpus-per-task=4 #SBATCH --mail-type=ALL --mail-user=rp280@duke.edu # gunzip is the same as gzip -d gunzip -k ../data/2021_09_21_fastq/*.fastq.gz for FILE in ../data/2021_09_21_fastq/*.fastq; do fastqc -o ../data/2021_09_21_fastq/QC_report ${FILE} done ``` The first line shown below tells the system to use *bash* to execute the code in this file. ``` #!/bin/bash ``` Back in the working directory, first make the file executable first by type in: ``` chmod u+w fastqc.sh ``` And then, to execute the shell file, type in: ``` ./fastqc.sh ``` ## A few notes on SLURM The first couple of lines in the code above (with the prefix `#SBATCH`) input myriad arguments tells SLURM to run or perform the tasks in certain ways. For example, `#SBATCH -e errs/%a_fastqc-%A.err` lets SLURM knows that you want any error message be written in a file named by the job name and job ID. `#SBATCH --mail-type=ALL --mail-user=youremail@duke.edu` lets SLURM know to email this email address with any updates (run failures, errors, completions, etc). We won't do a comprehensive SLURM review here, but there is a good overview youtube video that could be useful linked [here](https://www.youtube.com/watch?v=U42qlYkzP9k&ab_channel=BYUSupercomputing).