# BI278 Lab#2 notes

## exercise 1.1

Copy the genome sequence file I found to the server volume using `cp Downloads/GCF_000931575.1_ASM93157v1_genomic.fna /Volumes/Personal/khirat24`

Move the file into my home directory on the server using `mv /personal/khirat24/GCF_000931575.1_ASM93157v1_genomic.fna ~`

Since I didn't have the shell scripts in my home directory, I had to use `mv /home/students/k/khirat24/lab_01b ~` to move them there.

Use `sh lab_01b/speedcounting.sh GCF_000931575.1_ASM93157v1_genomic.fna` to count the bases.

## exercise 1.2

After finding an SRA run, use `fastq-dump --split-3 --skip-technical --readids --read-filter pass --dumpbase --clip -v --fasta default --outdir ~/lab_02 SRR12379546` to download its reads in FASTA format into `~/lab_02`.

Use `head -20 lab_02/*.fasta` to take a look at the first 20 lines of the FASTA files.

| Organism | SRA instrument record | SRA run number | Genome Size (bp) | Estimated Size (bp) |
| ---------- | --------------------- | -------------- | ---------------- | -------------- |
| P. fungorum | Illumina NovaSeq 6000 | SRR11022347 | 9,058,983 | 9,185,185 |
| P. sprentiae | Illumina GA IIx | SRR3927471 | 7,829,542 | d |
| Haemophilus influenzae | | NZ_CP007470.1 | | |

# Exercise 2

## exercise 2.1

Attempted to run `jellyfish count -t 2 -C -s 1G -m 21 -o P.sprentiae.m21.count P.sprentiae.fasta` but could not complete it due to lack of memory, so we looked at the professor's file instead using `cat /courses/bi278/Course_Materials/lab_02/pfung.m29.histo`

We skipped the rest of this exercise because of the memory limit.

## exercise 2.2

Since I only have Professor Noh's histogram data, I had to copy the data to my directory home2 using `cp /courses/bi278/Course_Materials/lab_02/pfung.m29.histo home2`, then used `mv home2 pfung.m29.histo` to change the name back from home2 to pfung.m29.histo.

Now that I have the histogram data in my directory, I moved to RStudio.

Used `getwd()` to make sure that I am in home2.

Use `plot(pfung, type="l")` and `plot(pfung[5:250,], type="l")` to plot the data.

Use `pfung[150:180,]` to see the highest point, for which I chose 159.

I used `sum(name[5:nrow(name),1]*name[5:nrow(name),2])/154` (with `name` standing for the data frame, here `pfung`) to find the estimated size, which was 889643.

![](https://i.imgur.com/JUHQSRi.png)
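
The size estimate in the last step multiplies each k-mer multiplicity by its count, sums those products from row 5 onward (presumably to skip the low-multiplicity error k-mers), and divides by the peak multiplicity. Below is a minimal sketch of the same arithmetic done with `awk` in the shell instead of R, assuming a two-column, whitespace-separated histogram like the `.histo` file. The function name `estimate_genome_size`, its arguments, and the toy numbers are hypothetical and are not from the lab.

```shell
# Sketch only: estimate genome size from a two-column k-mer histogram
# (column 1 = k-mer multiplicity, column 2 = number of distinct k-mers).
# Arguments (all hypothetical names): HISTO_FILE FIRST_ROW PEAK
estimate_genome_size() {
  awk -v first="$2" -v peak="$3" '
    NR >= first { total += $1 * $2 }          # sum multiplicity * count from FIRST_ROW on
    END { printf "%.0f\n", total / peak }     # divide total k-mers by the peak multiplicity
  ' "$1"
}

# Toy histogram with made-up numbers (not the lab data)
toy=$(mktemp)
printf '%s\n' '1 1000' '2 50' '3 10' '4 20' '5 10' > "$toy"

# Skip rows 1-2, treat multiplicity 4 as the peak:
# (3*10 + 4*20 + 5*10) / 4 = 40
estimate_genome_size "$toy" 3 4
```

For the real file the call would look like `estimate_genome_size pfung.m29.histo 5 PEAK`, with `PEAK` set to the multiplicity chosen from the plot.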