# BI278 lab2
## Exercise 1. Download public genomic data from NCBI
I selected *Taylorella equigenitalis* as my genome and downloaded its assembly (GCF_002288025.1, ASM228802v1) from the NCBI website.
Then I copied the downloaded file from my local `Downloads` folder to my personal BI278 directory:
```
cp Downloads/GCF_002288025.1_ASM228802v1_genomic.fna.gz /Volumes/Personal/xsi25
```
Next, I created a new folder in my home directory and moved the genome file into it:
```
cd ~
mkdir lab_02
mv /personal/xsi25/GCF_002288025.1_ASM228802v1_genomic.fna.gz ~/lab_02
```
Then I ran my shell script from last week (`lab02.sh`) on this file.
The script:
```
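#!/bin/bash
# Usage: bash lab02.sh genome.fna   (expects an uncompressed FASTA file)
grep -v ">" "$1"                                # print the sequence lines (FASTA headers removed)
grep -v ">" "$1" | tr -d -c ATatGCgc | wc       # 3rd column = number of A/T/G/C bases
grep -v ">" "$1" | tr -d -c GCgc | wc           # 3rd column = number of G/C bases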
```
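My result:
```
Binary file GCF_002288025.1_ASM228802v1_genomic.fna.gz matches
0 1 11
0 1 6
```
These numbers do not describe the genome. The file is still gzip-compressed, so `grep` treated it as binary and printed only the message `Binary file ... matches`. The two `wc` lines then counted the A/T/G/C characters (11) and the G/C characters (6) in that message itself. The file has to be decompressed first (for example with `gunzip`) before the script gives meaningful counts.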
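A minimal sketch of a gzip-aware version of the GC calculation, assuming `gzip` and `awk` are available on the server. The function name `gc_content` is hypothetical (not part of the lab), and it assumes any file ending in `.gz` is gzip-compressed FASTA. It prints the number of A/C/G/T bases, the number of G/C bases, and the GC percentage on one line:

```bash
# gc_content FILE: print "<A/C/G/T bases> <G/C bases> <GC%>" for a FASTA file.
# Assumption: files ending in .gz are gzip-compressed FASTA; anything else is plain FASTA.
gc_content() {
  if [[ "$1" == *.gz ]]; then
    gzip -cd -- "$1"            # decompress on the fly instead of grepping binary data
  else
    cat -- "$1"
  fi | awk '
    /^>/ { next }                                  # skip FASTA header lines
    {
      line = toupper($0)
      gc += gsub(/[GC]/, "", line)                 # count (and remove) G/C bases
      at += gsub(/[AT]/, "", line)                 # then count A/T bases; N etc. are ignored
    }
    END {
      total = gc + at
      if (total == 0) { print "no A/C/G/T bases found" > "/dev/stderr"; exit 1 }
      printf "%d %d %.2f\n", total, gc, 100 * gc / total
    }'
}

# Usage: gc_content ~/lab_02/GCF_002288025.1_ASM228802v1_genomic.fna.gz
```

Like the `tr -d -c` version, this only counts A/C/G/T, but it reads the compressed file directly without writing a decompressed copy to disk.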
### 1.2 Download raw reads via SRA Toolkit
First, I tried to download the reads directly from the terminal with the SRA Toolkit:
```
vdb-config --interactive
fastq-dump
```
However, the connection to the external service failed, so I downloaded the FASTA file through the website instead and moved it to my personal directory.
I then used `cat` to check the contents of the file.
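Because `cat` prints the entire file, a lighter check may be more practical for read files of this size. A quick sketch, assuming an uncompressed FASTA file; `fasta_summary` is a hypothetical helper name and `reads.fasta` is a placeholder file name. `head` shows just the first lines, and `awk` counts the records and the total sequence length:

```bash
# fasta_summary FILE: print "<number of records> <total sequence length>"
# for an uncompressed FASTA file (header lines start with ">").
fasta_summary() {
  awk '/^>/ { n++; next }          # header line: one more record
       { len += length($0) }       # sequence line: add its length
       END { printf "%d %d\n", n, len }' "$1"
}

# Usage (placeholder file name):
#   head -n 4 reads.fasta          # first few lines only
#   fasta_summary reads.fasta
```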
Later, when I worked on the task of downloading another run, `fastq-dump` worked and I was able to download DRR322713 with this command:
```
fastq-dump --split-3 --skip-technical --readids --read-filter pass --dumpbase --clip -v --fasta default --outdir ~/lab_02 DRR322713
```
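The resulting files and their sizes (long listing with human-readable sizes):
```
-rw-r--r--. 1 xsi25 49681 1.1G Sep 20 15:01 DRR322713_pass.fasta
-rwxrwx---. 1 xsi25 49681 482K Sep 20 13:29 GCF_002288025.1_ASM228802v1_genomic.fna.gz
-rw-r--r--. 1 xsi25 49681 104 Sep 20 13:54 lab02.sh
-rwxrwx---. 1 xsi25 49681 472M Sep 20 14:31 SRR17695649.fasta
```
The DRR322713 reads take up 1.1 GB as FASTA, compared with 482 KB for the compressed *T. equigenitalis* assembly.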
## Exercise 2. Count k-mers and estimate genome size based on k-mer frequencies
1. I chose a k-mer length of 24 (k = 24).
2. I chose *P. terrae* (the DRR322713 reads) as my first organism and counted its 24-mers with `jellyfish`:
```
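# -t 2: threads; -C: canonical k-mers (both strands); -s 1G: initial hash size; -m 24: k-mer length
jellyfish count -t 2 -C -s 1G -m 24 -o P.terrae.m24.count DRR322713_pass.fasta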
```
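To illustrate what `jellyfish count -C` followed by `jellyfish histo` computes, here is a toy sketch in `awk`. It is only practical for tiny inputs because every k-mer is held in memory, which is exactly the problem jellyfish is designed to handle efficiently. `kmer_histo` is a hypothetical name, and skipping k-mers that contain non-ACGT characters is a simplification of this sketch. Each output line is `<depth> <number of distinct canonical k-mers seen that many times>`, the same two-column layout as a `.histo` file:

```bash
# kmer_histo K FILE: toy canonical k-mer counter + histogram for a small FASTA file.
# Prints "<depth> <number of distinct canonical k-mers with that depth>", sorted by depth.
kmer_histo() {
  awk -v k="$1" '
    # reverse complement of an A/C/G/T string
    function revcomp(s,    i, c, r) {
      r = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        r = r (c == "A" ? "T" : c == "C" ? "G" : c == "G" ? "C" : "A")
      }
      return r
    }
    # count each canonical k-mer (the smaller of the k-mer and its reverse complement)
    function add_kmers(s,    i, m, rc) {
      s = toupper(s)
      for (i = 1; i + k - 1 <= length(s); i++) {
        m = substr(s, i, k)
        if (m ~ /[^ACGT]/) continue               # skip k-mers containing N or other codes
        rc = revcomp(m)
        counts[(m < rc) ? m : rc]++
      }
    }
    /^>/ { add_kmers(seq); seq = ""; next }     # new record: process the previous one
    { seq = seq $0 }                             # join wrapped sequence lines
    END {
      add_kmers(seq)
      for (m in counts) histo[counts[m]]++      # depth -> number of distinct k-mers
      for (d in histo) print d, histo[d]
    }' "$2" | sort -n
}

# Usage (hypothetical small test file): kmer_histo 24 few_reads.fasta
```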
Because the server ran out of memory, I could not complete the last step. For the following steps, I used the histogram file provided by the professor (`pfung.m29.histo`).
```
less pfung.m29.histo   # look inside: each line is "<k-mer depth> <number of distinct k-mers at that depth>"
```
Also because of the memory shortage, I could not repeat these steps with another file.
### 2.2 Visualize your k-mer counts and estimate genome size in R
First, I changed the working directory to `lab_02` and read the `.histo` file into a data frame named `pfung`:
```
> setwd("/home2/xsi25/lab_02/")
> getwd()
"/home2/xsi25/lab_02"
> pfung <- read.table("pfung.m29.histo", header=FALSE, sep=" ")
```
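Then I plotted the histogram, skipping the first four rows (the lowest k-mer depths, which are dominated by sequencing errors). The main peak of the distribution is at a k-mer depth of 154, the value used below:
```
> plot(pfung[5:250,], type="l")
```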
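The peak can also be located without reading it off a plot. A minimal sketch, assuming the `.histo` file has jellyfish's two space-separated columns (depth, number of distinct k-mers); `histo_peak` is a hypothetical helper. The depth window plays the role of the `5:250` row range in R, and rows and depths coincide only if the histogram starts at depth 1 with no gaps:

```bash
# histo_peak FILE MIN MAX: print the depth (column 1) with the largest k-mer count
# (column 2) among depths MIN..MAX, e.g. to skip the low-depth error k-mers.
histo_peak() {
  awk -v lo="$2" -v hi="$3" '
    BEGIN { best = -1 }
    $1 >= lo && $1 <= hi && $2 > best { best = $2; peak = $1 }   # keep the first maximum
    END { print peak }' "$1"
}

# Usage: histo_peak pfung.m29.histo 5 250
```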
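Then I estimated the genome size: the sum of depth × number of k-mers from row 5 onward (the total number of k-mers, excluding the low-depth error k-mers), divided by the peak depth of 154: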
```
> sum(pfung[5:nrow(pfung),1]*pfung[5:nrow(pfung),2])/154
[1] 9185185
```
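The same calculation as a shell sketch, under the same assumption that row numbers equal depths: estimated genome size = (sum of depth × count over all depths ≥ MIN) / peak depth. `genome_size` is a hypothetical helper name; with MIN = 5 and a peak of 154 it is meant to mirror the R expression above:

```bash
# genome_size FILE MIN PEAK: estimate genome size from a jellyfish .histo file as
# sum(depth * count for depth >= MIN) / PEAK, rounded to the nearest base.
genome_size() {
  awk -v lo="$2" -v peak="$3" '
    $1 >= lo { total += $1 * $2 }              # k-mer instances at depth >= MIN
    END { printf "%.0f\n", total / peak }' "$1"
}

# Usage: genome_size pfung.m29.histo 5 154
```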
This estimate (about 9.19 Mb) matches the value given in the handout.
Exercise 3 had other technical issues and was not meant to be completed this time.