# BI278 fall2022 lab practical

Answer any questions and fill any tables.

*Reminder that `ls` and autocomplete will be your best friends.*

0. SSH onto `bi278`.

1. Make a new directory called 'practical'.

```
# current directory = home2/tywang23
mkdir practical
```

2. Now go into this directory.

```
cd practical
```

3. Copy only the fasta files (`*.fasta` and `*.fna`) from `/courses/bi278/Course_Materials/practical` to your current location.

```
cp /courses/bi278/Course_Materials/new_practical/*.fna .
cp /courses/bi278/Course_Materials/new_practical/*.fasta .
```

4. Find out which organisms the two genome files belong to.

```
grep ">" GCF_003019965.1_ASM301996v1_genomic.fna
grep ">" GCF_020419785.1_ASM2041978v1_genomic.fna
```

* `GCF_003019965.1_ASM301996v1_genomic.fna` belongs to *Burkholderia multivorans* strain FDAARGOS_246
* `GCF_020419785.1_ASM2041978v1_genomic.fna` belongs to *Burkholderia cepacia* strain AU41368

5. These two organisms are close relatives, often found in the lungs of cystic fibrosis patients. Given this fact and based on what is contained in these genome files, what would you determine is the status of each genome? Choose between the options: draft or finished. Explain why.

* The *Burkholderia multivorans* genome must be a finished genome, since the headers in the genome file say "complete sequence". A draft genome usually has gaps and sequences that are left out, so a complete sequence for both chromosomes 1 and 2 and for the unnamed plasmid indicates that this is a finished genome.
* The *Burkholderia cepacia* genome file has several headers labelled as NODEs with varying lengths, and they are described as whole genome shotgun sequences. These seem to be contigs or reads that have not yet been fully assembled, so it might be a draft genome.

6. Find the genome size and GC% for the genome files.
```
grep -v ">" GCF_003019965.1_ASM301996v1_genomic.fna | wc -c
grep -v ">" GCF_003019965.1_ASM301996v1_genomic.fna | tr -d -c GCgc | wc -c
```

* *Burkholderia multivorans*
    * genome size = 6401896
    * GC (count) = 4251619
    * GC content = 66.4%
* *Burkholderia cepacia*
    * genome size = 8297493
    * GC (count) = 5487852
    * GC content = 66.1%

7. What is the appropriate command to download the raw sequencing reads from this sample? **(but don't run it)** https://www.ncbi.nlm.nih.gov/sra/SRX1304848[accn]

First find the run ID for the genome on the website, then use the SRA Toolkit to download the raw sequencing reads.

```
vdb-config --interactive
fastq-dump -X 3 -Z SRR2558789
fastq-dump -X 3 --split-3 --skip-technical --readids --read-filter pass --dumpbase --clip -Z SRR2558789
fastq-dump --split-3 --skip-technical --readids --read-filter pass --dumpbase --clip -v --fasta default --outdir ~/practical SRR2558789
```

8. SRA reads have already been downloaded for you. How many reads are included in each `*SRR*.fasta` file?

```
tail bceno_SRR2558789.fasta
tail bmulti_SRR8885150.fasta
```

Look at the read (sequence) number of the last read in each file.

* *Burkholderia multivorans* = 1636959
* *Burkholderia cepacia* = 2608479

9. `jellyfish count` has already been run for you on both SRA files and left in the remote "practical" directory above. Recreate at least one of the commands that was used to do this task **(but don't run it)**. Make sure the input and output file names correspond to the files in the remote directory.

```
jellyfish count -t 2 -C -s 1G -m 29 -o bceno.m29.count bceno_SRR2558789.fasta
```

10. Run `jellyfish histo` on both of the `*.count` files still in the remote directory, without copying them to your current directory.

```
jellyfish histo -o bmulti.m29.histo /courses/bi278/Course_Materials/new_practical/bmulti.m29.count
jellyfish histo -o bceno.m29.histo /courses/bi278/Course_Materials/new_practical/bceno.m29.count
```

11. Import the resulting `*.histo` files into R and estimate each genome size based on their kmer curves. No need to report R code back but fill in the table.

| Organism | Genome size (basepair count from step 6) | Genome size (kmer estimate) |
| -------- | -------- | -------- |
| *Burkholderia multivorans* | 6401896 | 6357976 |
| *Burkholderia cepacia* | 8297493 | 9662230 |

12. Exit out of your SSH connection.

```
exit
```
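
The kmer estimates in step 11 come from R code that is not reported here. As a rough illustration of the usual arithmetic behind such an estimate (total kmers divided by the coverage at the main peak), here is a minimal shell sketch. It assumes the two-column text output of `jellyfish histo` (kmer multiplicity, number of distinct kmers). The function name `estimate_genome_size` and the `min_cov` cutoff are hypothetical, and the default cutoff of 5 is only an assumed value for skipping low-multiplicity error kmers. This is not necessarily the method used in the course.

```shell
#!/usr/bin/env bash
# Sketch only: estimate genome size from a `jellyfish histo` table.
# Column 1 = kmer multiplicity (coverage), column 2 = number of distinct kmers.
# estimate_genome_size and min_cov are hypothetical names, not part of jellyfish.
estimate_genome_size() {
    local histo="$1"
    local min_cov="${2:-5}"   # assumed cutoff: rows below this are treated as error kmers
    awk -v min="$min_cov" '
        $1 >= min {
            total += $1 * $2                              # kmers contributed by this row
            if ($2 > peak_n) { peak_n = $2; peak = $1 }   # tallest row marks the coverage peak
        }
        END {
            if (peak > 0) printf "%d\n", total / peak     # total kmers / peak coverage
        }
    ' "$histo"
}
```

For example, `estimate_genome_size bmulti.m29.histo 5` would print a single estimate in base pairs. The cutoff should be chosen by looking at where the error peak ends on the plotted kmer curve, so the result is sensitive to that choice.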