# BI278 Lab#2 notes
## exercise 1.1
Copy the genome sequence file I downloaded by using `cp Downloads/GCF_000931575.1_ASM93157v1_genomic.fna /Volumes/Personal/khirat24` (note the space between the source file and the destination)
Move the file into my home directory on the server by using `mv /personal/khirat24/GCF_000931575.1_ASM93157v1_genomic.fna ~`
Since I didn't have the shell scripts in my home directory, I had to use `mv /home/students/k/khirat24/lab_01b ~` to move them there
Use `sh lab_01b/speedcounting.sh GCF_000931575.1_ASM93157v1_genomic.fna` to count the bases
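
For reference, a minimal sketch of what counting bases in a FASTA file can look like. This is not the contents of `speedcounting.sh` (which these notes do not show); `count_bases` is a hypothetical helper name, and it assumes every line that does not start with `>` is sequence.

```shell
# Hypothetical helper (not part of the course scripts).
count_bases() {
    # Drop header lines (those starting with ">"), remove line breaks,
    # then count the remaining characters. The last tr strips the
    # padding spaces that some versions of wc add.
    grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Example usage:
# count_bases GCF_000931575.1_ASM93157v1_genomic.fna
```

The result is the total number of sequence characters, so it includes any `N` or other ambiguity codes in the file.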
## exercise 1.2
After finding a run, use `fastq-dump --split-3 --skip-technical --readids --read-filter pass --dumpbase --clip -v --fasta default --outdir ~/lab_02 SRR12379546` to download its reads as FASTA files into `~/lab_02`
Use `head -20 lab_02/*.fasta` to take a look at the first 20 lines of the FASTA files
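
A quick sanity check that can go with `head` is counting how many reads ended up in each FASTA file. The sketch below assumes one `>` header line per read; `count_fasta_records` is a hypothetical helper name.

```shell
# Hypothetical helper: count the records in a FASTA file.
count_fasta_records() {
    # Every FASTA record begins with one header line starting with ">".
    grep -c '^>' "$1"
}

# Example usage (the file name depends on what fastq-dump wrote):
# for f in ~/lab_02/*.fasta; do echo "$f: $(count_fasta_records "$f")"; done
```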
| Organism | SRA instrument record | SRA run number | Genome Size (bp) | Estimated Size |
| ---------- | --------------------- | -------------- | ---------------- | -------------- |
|P. fungorum|Illumina NovaSeq 6000|SRR11022347|9,058,983|9,185,185|
|P. sprentiae|Illumina GA IIx|SRR3927471|7,829,542|d|
|Haemophilus influenzae| |NZ_CP007470.1| | |
| | | | | |
# Exercise 2
## exercise 2.1
Attempted to run `jellyfish count -t 2 -C -s 1G -m 21 -o P.sprentiae.m21.count P.sprentiae.fasta` but it could not finish because there was not enough memory, so we looked at the professor's file instead by using `cat /courses/bi278/Course_Materials/lab_02/pfung.m29.histo`
We skipped the rest of this exercise because of the memory limit
## exercise 2.2
Since I only have Professor Noh's histogram data, I copied it to my directory home2 by using `cp /courses/bi278/Course_Materials/lab_02/pfung.m29.histo home2`, then used `mv home2 pfung.m29.histo` to change the name back from home2 to pfung.m29.histo
Now that I have the histogram data in my directory, I moved to RStudio.
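
A small sketch of why the rename was needed, under the assumption that no directory called `home2` existed in the place where `cp` was run. In that case `cp` writes a file named `home2`; using `.` as the destination keeps the original name. The file and directory names below are stand-ins inside a throwaway directory.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '1 100\n2 50\n' > source.histo    # stand-in for pfung.m29.histo

# Case 1: "home2" does not exist here, so cp creates a FILE named home2.
cp source.histo home2
[ -f home2 ] && echo "home2 is a regular file"
mv home2 pfung.m29.histo                 # rename it, as in the notes

# Case 2: "." as the destination copies into the current directory
# and keeps the original file name, so no rename is needed.
mkdir work && cd work
cp ../source.histo .
ls
```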
Used `getwd()` to make sure that I am in home2
Use `plot(pfung, type="l")` to plot the whole histogram
`plot(pfung[5:250,], type="l")` to plot the data from row 5 to row 250
Used `pfung[150:180,]` to look at the rows around the highest point; I chose 159 as the peak
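
The peak can also be looked up from the command line instead of reading the rows by eye. This is a sketch assuming the `.histo` file has two whitespace-separated columns (k-mer multiplicity, then number of k-mers), which is the layout `jellyfish histo` writes; `find_peak` is a hypothetical helper name.

```shell
# Hypothetical helper: multiplicity with the largest count inside a range.
# $1 = histo file, $2 = lowest multiplicity to consider, $3 = highest.
find_peak() {
    awk -v lo="$2" -v hi="$3" '
        BEGIN { best = -1 }
        # Keep the row with the largest second column within [lo, hi].
        $1 >= lo && $1 <= hi && $2 > best { best = $2; peak = $1 }
        END { print peak }
    ' "$1"
}

# Example usage, matching the range inspected in R:
# find_peak pfung.m29.histo 150 180
```

Restricting the range matters because the very first rows (mostly sequencing errors) usually have the largest counts overall.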
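I used `sum(name[5:nrow(name),1]*name[5:nrow(name),2])/154` (where `name` is the data frame, here `pfung`) to find the estimated size, which was 889643. This adds up multiplicity times count from row 5 onward and divides by the peak.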
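
The same arithmetic can be sketched outside R with `awk`. It assumes the histogram starts at multiplicity 1, so that row 5 in R corresponds to a first-column value of 5, and that the columns are multiplicity and count. `estimate_genome_size` is a hypothetical helper name, and the peak is passed in rather than fixed.

```shell
# Hypothetical helper: sum(multiplicity * count) / peak.
# $1 = histo file, $2 = peak multiplicity,
# $3 = lowest multiplicity to include (defaults to 5, as in the R command).
estimate_genome_size() {
    awk -v peak="$2" -v lo="${3:-5}" '
        # Total number of k-mers, skipping the low-multiplicity error rows.
        $1 >= lo { total += $1 * $2 }
        # Print the estimate rounded to the nearest whole base pair.
        END { printf "%.0f\n", total / peak }
    ' "$1"
}

# Example usage (replace PEAK with the peak chosen from the plot):
# estimate_genome_size pfung.m29.histo PEAK
```

Dividing the total k-mer count by the peak multiplicity works because the peak approximates how many times each position of the genome was covered by k-mers.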