# BI278 Lab 1

#### Olivia Schirle

#### 09/13/2022

### 1. Course Unix Environment

#### 1.1 Connect to BI278 Unix Environment

```
ssh oeschi23@bi278 # connects to course environment
exit # leaves course environment
```

#### `ssh colbyid@bi278` connects you to your course environment. After entering this, you will be prompted for your Colby password.

#### 1.2 Navigate to Colbyhome

```
pwd # print working directory
ls # lists all files and directories (folders) in the current working directory
```

#### Variations of `ls`:

```
ls ~ # home directory
ls /home/students/o/oeschi23 # home directory (same as ~)
ls . # current directory
ls -lh # detailed listing: file permissions, owner, human-readable file size, last-modified date, ...
```

#### `~` is shorthand for your home directory, /home/students/o/oeschi23. The current directory is denoted `.`, so `ls` and `ls .` produce the same results.

```
ls /courses/bi278 # BI278 course directory where all course files are stored
```

#### Moving around in Unix:

```
cd # change directory; with no argument, goes to your home directory
cd /personal/oeschi23 # navigate to colbyhome
cd .. # move up one level to the parent directory
pwd
cd . # "." is the current directory, so this does not move you
pwd
```

#### 1.3 Organizing Files

```
cd ~
mkdir lab_01a lab_01b # make 2 new directories
cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a # copy everything in this week's lab folder into the first new folder
cd lab_01a
mkdir songplots
mv fig*_songplots* ./songplots # group the song plot figures
mkdir raw achybrid bayespref
mv raw_* ./raw # files whose names start with raw_
mv achybrid_* ./achybrid # files whose names start with achybrid_
mv fig*_bayespref_* ./bayespref # bayespref figures
mv *achybrid_* ./achybrid # remaining files with achybrid_ elsewhere in the name
```

### 2. Collect basic genome statistics for multiple genomes

#### 2.2 Get Basic Information About Genomes

#### Finding the strain and contig count for each genome. Each contig has one header line (a line starting with `>`), so the contig count is the number of headers.
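#### Because each contig has exactly one header, `grep -c` can report the count directly instead of printing every header and counting by eye. The following is a minimal sketch. `count_contigs` is a hypothetical helper name and is not part of the course materials.

```shell
# count_contigs: hypothetical helper, not a course-provided command.
# Prints "<file name><TAB><number of header lines>" for each FASTA file given.
count_contigs() {
    local f
    for f in "$@"; do
        # -c prints the number of matching lines instead of the lines themselves;
        # '^>' matches only lines that begin with ">", i.e. FASTA headers
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
    done
}

# Example (course server paths; only works where these files exist):
# count_contigs /courses/bi278/Course_Materials/lab_01b/*.fasta /courses/bi278/Course_Materials/lab_01b/*.fna
```

#### Anchoring the pattern with `^` is slightly stricter than `">"`. It ensures that only header lines are counted.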
```
ls /courses/bi278/Course_Materials/lab_01b/ # list the genome files
# print the header lines of each genome file
grep ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs395.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs433.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs155.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs171.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs21.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs22.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs23.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs530.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs69.pacbio.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_000756045.1_ASM75604v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_000961515.1_ASM96151v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_001865575.1_ASM186557v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_002902925.1_ASM290292v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_009455625.1_ASM945562v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_009455635.1_ASM945563v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_009455685.1_ASM945568v1_genomic.fna
```

| Organism | Strain | Contig count | Genome size (bp) | GC content (fraction) |
| ---------------- | ------------ | ------------ | ---------------- | --- |
| *P. agricolaris* | baqs159 | 2 | 8721420 | 0.6163 |
| *P. bonniea* | bbqs859 | 2 | 4098182 | 0.5872 |
| *P. bonniea* | bbqs395 | 2 | 4009285 | 0.5880 |
| *P. bonniea* | bbqs433 | 2 | 4013203 | 0.5878 |
| *P. fungorum* | atcc baa-463 | 4 | 9058983 | 0.6175 |
| *P. hayleyella* | bhqs11 | 2 | 4125700 | 0.5924 |
| *P. hayleyella* | bhqs155 | 2 | 4118676 | 0.5924 |
| *P. hayleyella* | bhqs171 | 35 | 4088457 | 0.5922 |
| *P. hayleyella* | bhqs21 | 35 | 4088512 | 0.5922 |
| *P. hayleyella* | bhqs22 | 45 | 4084312 | 0.5922 |
| *P. hayleyella* | bhqs23 | 36 | 4090401 | 242298 |
| *P. hayleyella* | bhqs530 | 2 | 4118722 | 0.5924 |
| *P. hayleyella* | bhqs69 | 2 | 4125852 | 0.5924 |
| *P. sprentiae* | wsm5005 | 5 | 7829542 | 0.6321 |
| *P. terrae* | dsm 17804 | 4 | 10062489 | 0.6192 |
| *P. xenovorans* | lb400 | 3 | 9702951 | 0.6263 |

#### test.fa practice genome questions:

#### Variations of `grep`:

```
grep ">" /courses/bi278/Course_Materials/lab_01b/test.fa # displays the header lines (lines containing ">")
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa # displays every line except the headers (the sequence)
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc # deletes every character that is not G, C, g, or c
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc | wc -c # counts the characters that are G, C, g, or c
```

#### a. What is the size of the genome?

```
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c ATGCatgc | wc -c # counts all of the bases
```

#### The genome is 400 bp long.

#### b. What is the GC%?

```
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc | wc -c # counts the total number of G and C bases in the genome
```

#### The number of G and C bases is 253. Since there are 400 bases in total, the GC fraction is 253/400.

```
awk 'BEGIN {print (253/400)}' # computes and prints the result of 253/400
```

#### The GC content is 0.6325 (63.25%).

#### Write and Run a Unix Script to Automate the Collection of Genome Statistics

```
nano genome_stats.sh # opens the nano text editor in the Unix shell to create the script file
```

#### Ctrl+X exits nano (it asks whether to save your changes) and brings you back to the normal terminal prompt.
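#### In the test.fa questions, each statistic needed its own `grep | tr | wc` pipeline, followed by a separate `awk` call for the division. As an alternative that this lab does not use, a single `awk` pass can count the bases and compute the ratio together. The following is a minimal sketch. `fasta_stats` is a hypothetical helper name.

```shell
# fasta_stats: hypothetical helper (an alternative to the lab's grep | tr | wc method).
# Reads a FASTA file once and prints "<genome size><TAB><GC fraction>",
# counting only A, C, G, T in either case (like tr -d -c ATGCatgc).
fasta_stats() {
    awk '
        /^>/ { next }                      # skip header lines
        {
            seq = toupper($0)
            gc += gsub(/[GC]/, "", seq)    # gsub returns how many characters it replaced
            at += gsub(/[AT]/, "", seq)    # counts A and T in what is left
        }
        END {
            size = gc + at
            if (size > 0) printf "%d\t%.4f\n", size, gc / size
        }
    ' "$1"
}

# Example (course server path):
# fasta_stats /courses/bi278/Course_Materials/lab_01b/test.fa
```

#### Ambiguous bases such as `N` are ignored, so this gives the same base count as the `tr -d -c ATGCatgc` pipeline.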
```
#!/bin/bash
# total number of bases (genome size)
grep -v ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs433.nanopore.fasta | tr -d -c GCATgcat | wc -c
# number of G and C bases
grep -v ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs433.nanopore.fasta | tr -d -c GCgc | wc -c
```

#### Execute the script file:

```
sh genome_stats.sh
```

```
awk 'BEGIN {print (2359065/4013203)}' # computes the GC fraction from the GC count and genome size printed by the script
```

#### Found that *P. bonniea* bbqs433 has a genome size of 4013203 bp and a GC content of 0.587826.

#### Generalize the script to any file by using `$1` (the first argument given to the script) instead of the filename:

```
#!/bin/bash
# $1 is the name of a FASTA file in lab_01b, passed on the command line
grep -v ">" "/courses/bi278/Course_Materials/lab_01b/$1" | tr -d -c GCATgcat | wc -c
grep -v ">" "/courses/bi278/Course_Materials/lab_01b/$1" | tr -d -c GCgc | wc -c
```

#### Execute the script file:

```
sh genome_stats.sh P.bonniea_bbqs433.nanopore.fasta
```

#### Run the script for every genome file to complete the table above.
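#### Instead of running the script once per file and dividing by hand, a loop can print one table row per genome. The following is a minimal sketch. It uses the same `grep | tr | wc` counting as `genome_stats.sh`. `genome_table` is a hypothetical helper name, and the tab-separated output layout is an assumption.

```shell
# genome_table: hypothetical helper. For each FASTA file given, prints one
# tab-separated row: file name, contig count, genome size (bp), GC fraction.
genome_table() {
    local f contigs size gc
    printf 'file\tcontigs\tsize_bp\tgc_fraction\n'
    for f in "$@"; do
        # one header line per contig; "|| true" keeps a zero count from aborting under set -e
        contigs=$(grep -c '^>' "$f" || true)
        # keep only base letters, then count the characters that remain
        size=$(grep -v '^>' "$f" | tr -d -c 'GCATgcat' | wc -c)
        gc=$(grep -v '^>' "$f" | tr -d -c 'GCgc' | wc -c)
        # awk ignores any spaces wc pads its output with, and guards against size 0
        awk -v n="$(basename "$f")" -v c="$contigs" -v s="$size" -v g="$gc" \
            'BEGIN { printf "%s\t%d\t%d\t%.4f\n", n, c, s, (s + 0 > 0 ? g / s : 0) }'
    done
}

# Example (course server paths):
# genome_table /courses/bi278/Course_Materials/lab_01b/*.fasta /courses/bi278/Course_Materials/lab_01b/*.fna
```

#### The output can be redirected to a file (e.g. `> genome_table.tsv`) and opened as a spreadsheet to fill in the table.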