# BI278 Lab 1
#### Olivia Schirle
#### 09/13/2022
### 1. Course Unix Environment
#### 1.1 Connect to BI278 Unix Environment
```
ssh oeschi23@bi278 # connects to course environment
exit # leaves course environment
```
#### `ssh colbyid@bi278` connects you to your course environment. After entering this command, you will be prompted for your Colby password.
#### 1.2 Navigate to Colbyhome
```
pwd # print working directory
ls # lists all files and directories (folders) in the current working directory
```
#### Variations of `ls`:
```
ls ~ # home directory
ls /home/students/o/oeschi23 # home directory
ls . # current directory
ls -lh # more detailed list of directory contents (file permissions, owner, file size, date modified, ...)
```
#### `~` is shorthand for your home directory, /home/students/o/oeschi23. The current directory is typically denoted as `.`, so `ls` and `ls .` produce the same results.
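#### A quick way to check both claims yourself; this is only a sketch and should work from any directory:

```shell
# Compare the two listings; they should be identical.
if [ "$(ls)" = "$(ls .)" ]; then
    echo "ls and ls . give the same listing"
fi

# Print what ~ expands to (your home directory).
echo ~
```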
```
ls /courses/bi278 # BI278 course directory where all course files are stored
```
#### Moving around in Unix:
```
cd # change directory; with no argument, returns to your home directory
cd /personal/oeschi23 # navigate to colbyhome
cd .. # move up a directory to parent directory
pwd
cd .
pwd
```
#### 1.3 Organizing Files
```
cd ~
mkdir lab_01a lab_01b # make 2 new directories
cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a # copy entire contents in this week's lab folder to first new folder
cd lab_01a
mkdir songplots
mv fig*_songplots* ./songplots
mkdir raw achybrid bayespref
mv raw_* ./raw
mv achybrid_* ./achybrid
mv fig*_bayespref_* ./bayespref
mv *achybrid_* ./achybrid
```
### 2. Collect basic genome statistics for multiple genomes
#### 2.2 Get Basic Information About Genomes
#### Finding the strain and contig count for each genome. The contig count is the number of headers.
```
ls /courses/bi278/Course_Materials/lab_01b/
grep ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs395.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs433.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs155.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs171.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs21.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs22.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs23.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs530.nanopore.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs69.pacbio.fasta
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_000756045.1_ASM75604v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_000961515.1_ASM96151v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_001865575.1_ASM186557v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_002902925.1_ASM290292v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_009455625.1_ASM945562v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_009455635.1_ASM945563v1_genomic.fna
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_009455685.1_ASM945568v1_genomic.fna
```
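#### The commands above print every header so the strain names can be read off. If only the contig counts are needed, a loop with `grep -c` (which counts matching lines instead of printing them) is a possible shortcut. This is a sketch, not part of the lab instructions; `count_contigs` is a hypothetical helper name.

```shell
# Hypothetical helper: print "filename<TAB>contig count" for each
# .fasta and .fna file in the directory given as the first argument.
count_contigs() {
    dir="$1"
    for f in "$dir"/*.fasta "$dir"/*.fna; do
        [ -e "$f" ] || continue   # skip a pattern that matched no files
        # Count lines that start with ">" (one header per contig).
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c "^>" "$f")"
    done
}

# Run on the course directory only if it exists on this machine.
if [ -d /courses/bi278/Course_Materials/lab_01b ]; then
    count_contigs /courses/bi278/Course_Materials/lab_01b
fi
```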
| Organism | Strain | Contig count | Genome size (bp) | GC % |
| ---------------- | ------------ | ------------ | ---------------- | --- |
| *P. agricolaris* | baqs159 | 2 | 8721420 | 0.6163 |
| *P. bonniea* | bbqs859 | 2 | 4098182 | 0.5872 |
| *P. bonniea* | bbqs395 | 2 | 4009285 | 0.5880 |
| *P. bonniea* | bbqs433 | 2 | 4013203 | 0.5878 |
| *P. fungorum* | atcc baa-463 | 4 | 9058983 | 0.6175 |
| *P. hayleyella* | bhqs11 | 2 | 4125700 | 0.5924 |
| *P. hayleyella* | bhqs155 | 2 | 4118676 | 0.5924 |
| *P. hayleyella* | bhqs171 | 35 | 4088457 | 0.5922 |
| *P. hayleyella* | bhqs21 | 35 | 4088512 | 0.5922 |
| *P. hayleyella* | bhqs22 | 45 | 4084312 | 0.5922 |
| *P. hayleyella* | bhqs23 | 36 | 4090401 | 242298 |
| *P. hayleyella* | bhqs530 | 2 | 4118722 | 0.5924 |
| *P. hayleyella* | bhqs69 | 2 | 4125852 | 0.5924 |
| *P. sprentiae* | wsm5005 | 5 | 7829542 | 0.6321 |
| *P. terrae* | dsm 17804 | 4 | 10062489 | 0.6192 |
| *P. xenovorans* | lb400 | 3 | 9702951 | 0.6263 |
#### test.fa practice genome questions:
#### Variations of `grep`:
```
grep ">" /courses/bi278/Course_Materials/lab_01b/test.fa # displays the header line of the file (the line containing ">")
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa # displays every line in the file except the header
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc # displays only the characters that are a G, C, g, or c
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc | wc -c # counts the characters that are a G, C, g, or c
```
#### a. What is the size of the genome?
```
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c ATGCatgc | wc -c # counts all of the bases
```
#### The genome is 400 bp long.
#### b. What is the GC%?
```
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc | wc -c # counts the total number of G and C in the genome
```
#### The number of G and C bases is 253. Since there are 400 bases in total, the GC% is 253/400.
```
awk 'BEGIN {print (253/400)}' # computes and prints result of 253/400
```
#### GC% is 0.6325
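#### The two counts and the division can also be combined so the numbers do not have to be copied by hand. A minimal sketch, assuming the same `grep | tr | wc` pipelines as above; `gc_fraction` is a hypothetical function name.

```shell
# Hypothetical helper: print the GC fraction of the FASTA file given as $1.
gc_fraction() {
    # Count all bases, then only G and C, ignoring header lines.
    total=$(grep -v ">" "$1" | tr -d -c ATGCatgc | wc -c)
    gc=$(grep -v ">" "$1" | tr -d -c GCgc | wc -c)
    # Let awk do the division; print NA if the file has no bases.
    awk -v gc="$gc" -v total="$total" \
        'BEGIN { if (total + 0 > 0) printf "%.4f\n", gc / total; else print "NA" }'
}
```

#### Calling `gc_fraction /courses/bi278/Course_Materials/lab_01b/test.fa` should reproduce the value above, given the counts of 253 and 400.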
#### Write and Run a Unix Script to Automate the Collection of Genome Statistics
```
nano # opens an editor within Unix shell
```
#### Ctrl+X exits nano (prompting you to save any changes) and brings you back to the normal terminal prompt. The script below is saved as genome_stats.sh:
```
#!/bin/bash
grep -v ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs433.nanopore.fasta | tr -d -c GCATgcat | wc -c
grep -v ">" /courses/bi278/Course_Materials/lab_01b/P.bonniea_bbqs433.nanopore.fasta | tr -d -c GCgc | wc -c
```
#### Execute the script file:
```
sh genome_stats.sh
```
```
awk 'BEGIN {print (2359065/4013203)}' # computes the GC% based on the total genome size and GC count outputs from the script
```
#### Found that *P. bonniea* bbqs433 has a genome size of 4013203 bp and a GC% of 0.587826.
#### Generalize the script to any file by using `$1` instead of the filename:
```
#!/bin/bash
# $1 is the first argument passed to the script (a filename in the lab_01b directory)
grep -v ">" "/courses/bi278/Course_Materials/lab_01b/$1" | tr -d -c GCATgcat | wc -c
grep -v ">" "/courses/bi278/Course_Materials/lab_01b/$1" | tr -d -c GCgc | wc -c
```
#### Execute the script file with a filename as its argument:
```
sh genome_stats.sh P.bonniea_bbqs433.nanopore.fasta
```
#### Run the script for every genome file to complete the table above.
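#### Instead of running the script once per file, a loop can produce one row per genome. The following is only a sketch under the assumption that all genomes end in `.fasta` or `.fna`; `genome_table` is a hypothetical name, and the strain names and contig counts still come from the headers.

```shell
# Hypothetical helper: print "file<TAB>genome size (bp)<TAB>GC fraction"
# for each .fasta and .fna file in the directory given as $1.
genome_table() {
    dir="$1"
    for f in "$dir"/*.fasta "$dir"/*.fna; do
        [ -e "$f" ] || continue   # skip a pattern that matched no files
        size=$(grep -v ">" "$f" | tr -d -c GCATgcat | wc -c)
        gc=$(grep -v ">" "$f" | tr -d -c GCgc | wc -c)
        # awk formats the row and divides GC count by genome size.
        awk -v name="$(basename "$f")" -v size="$size" -v gc="$gc" \
            'BEGIN { printf "%s\t%d\t%.4f\n", name, size, (size + 0 > 0 ? gc / size : 0) }'
    done
}
```

#### For example, `genome_table /courses/bi278/Course_Materials/lab_01b` would print one tab-separated row per genome file.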