Initial File Processing

Creating the sample FASTQ Files

Set 1

These samples are Zymo MMC gDNA samples, sequenced as part of a pilot run for the GI project at JPL.

mv SH5203_FT-SA29139_S11_L002_R1_001.fastq.gz sample1_R1.fastq.gz
mv SH5203_FT-SA29139_S11_L002_R2_001.fastq.gz sample1_R2.fastq.gz

mv SH5203_FT-SA29140_S12_L002_R1_001.fastq.gz sample2_R1.fastq.gz
mv SH5203_FT-SA29140_S12_L002_R2_001.fastq.gz sample2_R2.fastq.gz

cp sample1_R1.fastq.gz sample3_R1.fastq.gz
cp sample1_R2.fastq.gz sample3_R2.fastq.gz

mv SH5203_FT-SA29142_S14_L002_R1_001.fastq.gz sample4_R1.fastq.gz
mv SH5203_FT-SA29142_S14_L002_R2_001.fastq.gz sample4_R2.fastq.gz

Archived the sample FASTQ files for transfer to cloud instance

cd /Users/cm/Documents/fastqs
tar -czvf sample1234.tar.gz sample*_R[12].fastq.gz

Created a new Jetstream instance in project cmicro with flavor s1.xlarge (44 CPUs, 120 GB memory, 480 GB disk)

Added Bioconda channel to pre-installed Conda

conda config --add channels bioconda

Created directories for scripts, data, and sample FASTQ files

mkdir scripts data fastqs samples

Transferred sample FASTQ files to the cloud instance

scp -C /Users/cm/Documents/fastqs/sample1234.tar.gz cmicro@149.165.171.66:/home/cmicro/data

Unpacked the FASTQ files

tar -xvf sample1234.tar.gz

Trimmomatic

Trimmomatic is a flexible read trimming tool for Illumina NGS data.

Generic Trimmomatic command:

java -jar trimmomatic-0.39.jar PE inputforward.fq.gz inputreverse.fq.gz outputforwardpaired.fq.gz outputforwardunpaired.fq.gz outputreversepaired.fq.gz outputreverseunpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36
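
For reference, the colon-separated fields of the ILLUMINACLIP step follow the order given in the Trimmomatic manual: adapter FASTA file, seed mismatches, palindrome clip threshold, simple clip threshold, then the optional minimum adapter length and keepBothReads setting. The sketch below only splits such a string into labelled fields. describe_illuminaclip is a hypothetical helper written for these notes, not part of Trimmomatic.

```shell
# Hypothetical helper (not part of Trimmomatic): label the colon-separated
# fields of an ILLUMINACLIP step in the order documented in the manual.
describe_illuminaclip() {
    # Drop the leading "ILLUMINACLIP:" and split the rest on ":"
    printf '%s\n' "${1#ILLUMINACLIP:}" | awk -F: '{
        printf "adapter_fasta\t%s\n", $1
        printf "seed_mismatches\t%s\n", $2
        printf "palindrome_clip_threshold\t%s\n", $3
        printf "simple_clip_threshold\t%s\n", $4
        # The last two fields are optional
        if (NF >= 5) printf "min_adapter_length\t%s\n", $5
        if (NF >= 6) printf "keep_both_reads\t%s\n", $6
    }'
}

describe_illuminaclip "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads"
```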

Created a Bash script that runs Trimmomatic on every paired-end sample in the current directory, one sample after another:

#!/bin/bash
# arg1: number of threads
# to run:
# chmod +x trim.sh
# <path>/trim.sh <number of threads>
# Example: ./trim.sh 40

# Stop early if the thread count is missing
[ $# -eq 1 ] || { echo "usage: $0 <number of threads>" >&2; exit 1; }

for f in *_R1.fastq.gz # for each forward-read file (one per sample)
do
    n=${f%%_R1.fastq.gz} # strip the suffix to get the sample name

    # Paired-end trimming: adapter clipping, then quality trimming
    trimmomatic PE -threads "$1" "${n}_R1.fastq.gz" "${n}_R2.fastq.gz" \
        "${n}_R1_trimmed.fastq.gz" "${n}_R1_unpaired.fastq.gz" \
        "${n}_R2_trimmed.fastq.gz" "${n}_R2_unpaired.fastq.gz" \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done
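
The loop derives each sample name by removing the _R1.fastq.gz suffix from the forward-read file name and then builds the six file names that Trimmomatic PE expects. A minimal sketch of just that naming step, runnable without Trimmomatic; sample_name and trim_args are hypothetical helpers and are not part of trim.sh:

```shell
# Hypothetical helper: derive the sample name from a forward-read file name
# by removing the suffix "_R1.fastq.gz".
sample_name() {
    printf '%s\n' "${1%%_R1.fastq.gz}"
}

# Print, one per line, the two input and four output file names that the
# loop passes to Trimmomatic PE for a given forward-read file.
trim_args() {
    n=$(sample_name "$1")
    printf '%s\n' \
        "${n}_R1.fastq.gz" "${n}_R2.fastq.gz" \
        "${n}_R1_trimmed.fastq.gz" "${n}_R1_unpaired.fastq.gz" \
        "${n}_R2_trimmed.fastq.gz" "${n}_R2_unpaired.fastq.gz"
}

trim_args sample1_R1.fastq.gz
```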

Transferred the Bash script to the cloud instance

scp -C /Users/cm/JPL_Google_Drive/scripts/trim.sh cmicro@149.165.171.66:/home/cmicro/scripts

On Cloud Instance: created a Conda environment with Trimmomatic and FastQC, copied the adapter file, and ran the script from the directory holding the FASTQ files

conda config --add channels bioconda
conda create -y -n qc trimmomatic fastqc
cp /opt/miniconda3/pkgs/trimmomatic-*/share/trimmomatic-*/adapters/TruSeq3-PE.fa .
conda activate qc
chmod +x /home/cmicro/scripts/trim.sh
/home/cmicro/scripts/trim.sh 40
mkdir trimmed_fastqs
find . -maxdepth 1 -type f -name "*trimmed*" -exec mv '{}' trimmed_fastqs/ \;
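
One way to see how many reads survived trimming is to count FASTQ records in each output file. A minimal sketch, assuming standard four-line FASTQ records; count_reads is a hypothetical helper, and the trimmed_fastqs path is the folder created above:

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file, assuming each
# record spans exactly four lines. A non-integer result hints at a
# truncated or malformed file.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Report the read count for every trimmed file, if any are present.
for f in trimmed_fastqs/*_trimmed.fastq.gz; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

Running the same function on the untrimmed files gives the before/after comparison.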

Downloading simulated datasets

even-perfect-sim

curl -L -o even-perfect-sim-R1.fq.gz https://ndownloader.figshare.com/files/24058625
curl -L -o even-perfect-sim-R2.fq.gz https://ndownloader.figshare.com/files/24058631
curl -L -o even-perfect-sim-abundances.tsv https://ndownloader.figshare.com/files/24058619

uneven-perfect-sim

curl -L -o uneven-perfect-sim-R1.fq.gz https://ndownloader.figshare.com/files/24058634
curl -L -o uneven-perfect-sim-R2.fq.gz https://ndownloader.figshare.com/files/24058637
curl -L -o uneven-perfect-sim-abundances.tsv https://ndownloader.figshare.com/files/24058628

even-hiseq-sim

curl -L -o even-hiseq-sim-R1.fq.gz https://ndownloader.figshare.com/files/24058652
curl -L -o even-hiseq-sim-R2.fq.gz https://ndownloader.figshare.com/files/24058667
curl -L -o even-hiseq-sim-abundances.tsv https://ndownloader.figshare.com/files/24058658

Listed the downloaded FASTQ files

ls -1 *.fq.gz

even-hiseq-sim-R1.fq.gz
even-hiseq-sim-R2.fq.gz
even-perfect-sim-R1.fq.gz
even-perfect-sim-R2.fq.gz
uneven-hiseq-sim-R1.fq.gz
uneven-hiseq-sim-R2.fq.gz
uneven-perfect-sim-R1.fq.gz
uneven-perfect-sim-R2.fq.gz
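
Before moving on, it can be worth confirming that each download is a complete gzip file, since an interrupted transfer still leaves a file with the right name. A minimal sketch using gzip's built-in integrity test; check_gz is a hypothetical helper:

```shell
# Hypothetical helper: report whether each given file passes gzip's
# built-in integrity test (gzip -t).
check_gz() {
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            printf 'OK\t%s\n' "$f"
        else
            printf 'FAILED\t%s\n' "$f"
        fi
    done
}

# Check every downloaded FASTQ file in the current directory, if any.
for f in *.fq.gz; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    check_gz "$f"
done
```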

Simulated files renaming

Transfer files to Cloud from local computer

Transfer python script to rename files

scp /Users/cm/JPL_Google_Drive/scripts/scripts_2020/rename.py \
cmicro@149.165.171.66:/home/cmicro/scripts

Transfer naming info file

scp /Users/cm/JPL_Google_Drive/hbcu_data_analysis/sim_file_names.txt \
cmicro@149.165.171.66:/home/cmicro/data/

On Cloud Instance:

Made a new directory and copied the simulated FASTQ files to it

mkdir sim_fastqs
find . -type f -name "*.fq.gz" -exec cp {} sim_fastqs/ \;

Running Python script to rename files

python /home/cmicro/scripts/rename.py sim_fastqs sim_file_names.txt \
sim_fastqs_renamed
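
rename.py itself is not reproduced in these notes. As a rough illustration of what such a step does, the sketch below copies files to a new folder under new names read from a mapping file. It assumes a two-column, tab-separated mapping (old name, new name), which may not match the real format of sim_file_names.txt; rename_from_map is a hypothetical stand-in, not the actual script.

```shell
# Hypothetical stand-in for rename.py (the real script is not shown here).
# Assumes MAP holds one "old_name<TAB>new_name" pair per line.
# Usage: rename_from_map SRC_DIR MAP DEST_DIR
rename_from_map() (
    src=$1; map=$2; dest=$3
    tab=$(printf '\t')
    mkdir -p "$dest"
    # The "|| [ -n "$old" ]" keeps a final line that lacks a trailing newline
    while IFS=$tab read -r old new || [ -n "$old" ]; do
        [ -n "$old" ] || continue              # skip blank lines
        if [ -f "$src/$old" ]; then
            cp "$src/$old" "$dest/$new"        # copy, leaving the source intact
        else
            echo "missing: $src/$old" >&2
        fi
    done < "$map"
)

# Example call, mirroring the arguments used above:
# rename_from_map sim_fastqs sim_file_names.txt sim_fastqs_renamed
```

Copying rather than moving keeps the originals in place, so a wrong mapping can be fixed and the step rerun.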

Moved all sample files from the sim_fastqs and trimmed_fastqs folders to the samples folder:

find . -type f -name "*.fastq.gz" -exec mv {} /home/cmicro/samples \;
ls -lh samples/ | awk '{ print $9 "\t" $5}'

sample1_R1_trimmed.fastq.gz 384M
sample1_R2_trimmed.fastq.gz 378M
sample2_R1_trimmed.fastq.gz 390M
sample2_R2_trimmed.fastq.gz 381M
sample3_R1_trimmed.fastq.gz 384M
sample3_R2_trimmed.fastq.gz 378M
sample4_R1_trimmed.fastq.gz 461M
sample4_R2_trimmed.fastq.gz 450M
sample5_R1_trimmed.fastq.gz 285M
sample5_R2_trimmed.fastq.gz 293M
sample6_R1_trimmed.fastq.gz 188M
sample6_R2_trimmed.fastq.gz 188M
sample7_R1_trimmed.fastq.gz 288M
sample7_R2_trimmed.fastq.gz 295M
sample8_R1_trimmed.fastq.gz 190M
sample8_R2_trimmed.fastq.gz 190M
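
The listing shows an R1 and an R2 file for every sample. A quick way to confirm that pairing automatically, sketched under the assumption that files follow the <sample>_R1_trimmed.fastq.gz / <sample>_R2_trimmed.fastq.gz naming seen above; check_pairs is a hypothetical helper:

```shell
# Hypothetical helper: verify that every *_R1_trimmed.fastq.gz file in a
# directory has a matching *_R2_trimmed.fastq.gz mate.
# Returns 0 when all mates are present, 1 otherwise.
check_pairs() (
    dir=$1
    status=0
    for r1 in "$dir"/*_R1_trimmed.fastq.gz; do
        [ -e "$r1" ] || continue   # skip when the glob matches nothing
        r2=${r1%_R1_trimmed.fastq.gz}_R2_trimmed.fastq.gz
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example call for the folder used in these notes:
# check_pairs /home/cmicro/samples && echo "all pairs present"
```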


All sample files are available at:

cmicro@149.165.171.66:/home/cmicro/samples