# bi278 fall2022 lab practical

First, make a copy of the markdown version of this page. Then fill in the commands to execute each step with a:
```
code block
```
Answer any questions and fill any tables.

*Reminder that `ls` and autocomplete will be your best friends.*

0. SSH onto `bi278`.

1. Make a new directory called 'practical'.
```
mkdir practical
```

2. Now go into this directory.
```
cd practical
```

3. Copy only the fasta files (`*.fasta` and `*.fna`) from `/courses/bi278/Course_Materials/practical` to your current location.
```
cp /courses/bi278/Course_Materials/new_practical/*.fna .
cp /courses/bi278/Course_Materials/new_practical/*.fasta .
```

4. Find out which organisms the two genome files belong to.
```
grep ">" GCF_020419785.1_ASM2041978v1_genomic.fna
grep ">" GCF_003019965.1_ASM301996v1_genomic.fna
```
The `.fna` files contain Burkholderia cepacia and Burkholderia multivorans.

5. These two organisms are close relatives, often found in the lungs of cystic fibrosis patients. Given this fact and based on what is contained in these genome files, what would you determine is the status of each genome? Choose between the options: draft or finished. Explain why.

The Burkholderia multivorans genome is finished: its records are labeled as complete sequences for chromosome 1, chromosome 2, and a plasmid. The Burkholderia cepacia genome is a draft: it consists of many shotgun sequences of varying length, some as short as 500 bp.

6. Find the genome size and GC% for the genome files.
```
# test run on the lab_01b example: GC count, then total base count
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc | wc -c
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgcATat | wc -c
# total base count, then GC count, for each genome file
grep -v ">" GCF_020419785.1_ASM2041978v1_genomic.fna | tr -d -c GCgcATat | wc -c
grep -v ">" GCF_020419785.1_ASM2041978v1_genomic.fna | tr -d -c GCgc | wc -c
grep -v ">" GCF_003019965.1_ASM301996v1_genomic.fna | tr -d -c GCgcATat | wc -c
grep -v ">" GCF_003019965.1_ASM301996v1_genomic.fna | tr -d -c GCgc | wc -c
```

| Organism | Genome size (bp) | GC% |
| -------- | -------- | -------- |
| Burkholderia multivorans | 6797081 | 67.2% |
| Burkholderia cepacia | 8809768 | 66.9% |

7. What is the appropriate command to download the raw sequencing reads from this sample? **(but don't run it)**
https://www.ncbi.nlm.nih.gov/sra/SRX1304848[accn]
```
fastq-dump --split-3 --skip-technical --readids --read-filter pass --dumpbase --clip -v --fasta default --outdir ~/SRR2558789 SRR2558789
```

8. SRA reads have already been downloaded for you. How many reads are included in each `*SRR*.fasta` file?
```
540248973 for bceno_SRR2558789 and 258996753 for bmulti SRR8885150
```

9. `jellyfish count` has already been run for you on both SRA files and left in the remote "practical" directory above. Recreate at least one of the commands that was used to do this task **(but don't run it)**. Make sure the input and output file names correspond to the files in the remote directory.
```
jellyfish count -t 2 -C -s 1G -m 29 -o bceno.m29.count bceno_SRR2558789.fasta
```

10. Run `jellyfish histo` on both of the `*.count` files still in the remote directory, without copying them to your current directory.
```
jellyfish histo -o bceno.m29.histo /courses/bi278/Course_Materials/new_practical/bceno.m29.count
jellyfish histo -o bmulti.m29.histo /courses/bi278/Course_Materials/new_practical/bmulti.m29.count
```

11. Import the resulting `*.histo` files into R and estimate each genome size based on their kmer curves. No need to report R code back but fill in the table.

| Organism | Genome size in bp (base count from step 6) | Genome size in bp (kmer estimate) |
| -------- | -------- | -------- |
| Burkholderia cepacia | 8809768 | 9662230 |
| Burkholderia multivorans | 6797081 | 6357976 |

12. Exit out of your SSH connection.
```
exit
```
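
As a cross-check on steps 6 and 8, the header count, base total, and GC% of a FASTA file can be computed in a single `awk` pass. This is only a minimal sketch, not the course's method: `fasta_stats` is a hypothetical helper name, the demo sequences are made up, and like the `tr` pipelines in step 6 it counts only A, C, G, and T (ambiguous bases such as N are ignored).

```shell
# Sketch: summarize a FASTA file in one pass.
# fasta_stats is a hypothetical helper, not part of the course materials.
fasta_stats() {
    awk '
        /^>/ { records++; next }          # one header line per sequence or read
        {
            seq = toupper($0)             # treat soft-masked (lowercase) bases the same
            gc += gsub(/[GC]/, "", seq)   # gsub returns how many G/C it removed
            at += gsub(/[AT]/, "", seq)   # then count the A/T on the same line
        }
        END {
            total = gc + at
            pct = (total > 0) ? 100 * gc / total : 0
            printf "records=%d bases=%d gc_pct=%.1f\n", records, total, pct
        }
    ' "$1"
}

# Tiny made-up FASTA so the sketch can be run anywhere.
demo=$(mktemp)
printf '>seq1\nGGCC\nATAT\n>seq2\nGCgc\n' > "$demo"
fasta_stats "$demo"
rm -f "$demo"
```

On the server, the same function could be pointed at one of the `.fna` genome files for step 6, or at a `*SRR*.fasta` file for step 8, where the `records` field is the number of reads.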