# BI278 Lab 1 Kirsten Pastore
### 1.1 Connect to the BI278 Unix Environment
Connect to the course environment:
```
#secure shell (ssh) allows us to connect to a remote computing environment
ssh klpast23@bi278
```
### 1.2 Navigate and find your Colby home
The working directory is shown in the prompt (here `~`, the home directory):

`[klpast23@vcacbi278 ~]$`

Begin navigating directories:
```
#pwd (print working directory)
pwd
#returns /home/students/k/klpast23
#ls (list the files and subdirectories in the current working directory)
ls
#returns colby.cshrc colby.login colby.logout colby.profile
# -lh flags: -l gives a long (detailed) listing, -h prints file sizes in human-readable units (K, M, G)
ls -lh
#returns file permissions, owner, group, file size, and date last modified
```
Navigate to your folder on the course file server:
```
cd /courses/bi278/klpast23
```
Note: `~` is shorthand for your home directory (here `/home/students/k/klpast23`).
### 1.3 Organize your files
Note: don't use spaces in file or directory names.
```
#navigate to the home directory (~)
cd ~
#make two new directories
mkdir lab_01a lab_01b
#copy the entire contents of the course's lab_01a folder into your new lab_01a directory
cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a
```
Useful commands:
```
#make a directory named name
mkdir name
#remove a directory (only works if it is empty; use rm -r for a non-empty directory)
rmdir name
#copy file a to b; can also be used to change the location and filename
cp a b
#move a file from a to b; also used for renaming files
mv a b
#remove a file - cannot be reversed
rm filename
#concatenate, or print to screen the entire contents of a file as long as it is a text file
cat filename
#display the contents of a file one screen length at a time, exit by typing q
less filename
#print to screen the top 10 lines of a file
head filename
#print to screen the bottom 10 lines of a file
tail filename
#manual page for most Unix commands (exit by typing q)
man command
```
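A quick practice run of the commands above, as a minimal sketch: it works inside a temporary directory made with `mktemp -d`, and every name in it (`notes.txt`, `practice/`, `renamed.txt`) is a hypothetical example, not a course file.

```shell
#!/bin/bash
# Practice the basic file commands in a throwaway directory.
# All file and directory names here are hypothetical examples.
workdir=$(mktemp -d)                  # temporary scratch directory
cd "$workdir" || exit 1

mkdir practice                        # make a directory
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > notes.txt   # 12-line text file
cp notes.txt practice/notes_copy.txt  # copy into practice/ under a new name
mv notes.txt renamed.txt              # rename in place

head -n 3 renamed.txt                 # first 3 lines (head shows 10 by default)
tail -n 2 renamed.txt                 # last 2 lines (tail shows 10 by default)
first_line=$(head -n 1 renamed.txt)
last_line=$(tail -n 1 renamed.txt)

rm practice/notes_copy.txt            # permanent: there is no undo
rmdir practice                        # works now because practice/ is empty
```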
#### Organizing / Moving Files
```
#made a directory for figures
mkdir figures
#move files starting with fig into figures (one file shown)
mv fig1_all_variation.eps figures
#repeated this for each file starting with fig
#made a directory for raw preference
mkdir raw_preference
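#move all raw preference files into the raw_preference folder
mv raw_preference_* raw_preference
#*pref* also matches the raw_preference directory itself, so mv reports an error for it but still moves the other matching files
mv *pref* raw_preference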
#made a directory for achybrid files
mkdir achybrid
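#move all achybrid files into the achybrid folder
#(*achybrid* also matches the achybrid directory itself; mv reports an error for it but still moves the files)
mv *achybrid* achybrid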
```
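Because `*achybrid*` also matches the `achybrid` directory, one way to avoid that error is to move only regular files. This is a sketch with hypothetical file names, using a `for` loop and the `-f` test; it is an alternative to the command above, not what was run in lab.

```shell
#!/bin/bash
# Move only regular files matching a pattern, so the glob cannot pick up
# the destination directory itself. File names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch sample_achybrid_1.txt sample_achybrid_2.txt other.txt
mkdir achybrid

for f in *achybrid*; do
    # -f is true only for regular files, so the directory achybrid is skipped
    if [ -f "$f" ]; then
        mv "$f" achybrid/
    fi
done

moved=$(ls achybrid | wc -l)
echo "moved $moved files into achybrid/"
```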
## Exercise 2. Collect Basic Genome Statistics for Multiple Genomes
### 2.1 Get Basic Information About Genomes
Additional commands:
```
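#print the lines of a file that contain a pattern
grep pattern filename
#count lines, words, and characters in a file; use -l for lines only or -c for characters (bytes) only
wc filename
#translate or delete sets of characters; tr reads from standard input, so its input is usually piped in with |
tr set1 set2
# | sends the output of one command into the next command as its input
# > sends the output of the command that precedes it to the file named after it (overwriting that file)
some command > filename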
```
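A small sketch of these commands working together on a made-up FASTA file (`demo.fa` and its sequences are hypothetical), including `grep -c` and the pipe `|`:

```shell
#!/bin/bash
# Try grep, wc, tr, | and > on a tiny hypothetical FASTA file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>seq1\nACGTAC\n>seq2\nGGCCTA\n' > demo.fa

grep ">" demo.fa                 # prints the two header lines
grep ">" demo.fa > headers.txt   # > writes that same output to a file instead
wc -l headers.txt                # -l counts lines: 2 headers
echo "ACGT" | tr ACGT TGCA       # tr reads stdin and maps A->T, C->G, G->C, T->A

n_headers=$(grep -c ">" demo.fa) # -c prints the number of matching lines
complement=$(echo "ACGT" | tr ACGT TGCA)
```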
1. Navigate to lab_01b and find what kind of sequences are listed in each file
```
#performed for each file in the folder lab_01b
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_00756045.1_ASM75604v1_genomic.fna
#each header line marks one contig, so counting the headers gives the contig count (grep -c ">" filename prints the count directly)
```
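Instead of running grep on one file at a time, a loop can count the headers in every genome file. This sketch assumes the genomes are `.fna` files in the current directory; the two files it creates are hypothetical stand-ins, and `'^>'` anchors the match to the start of the line so only header lines are counted.

```shell
#!/bin/bash
# Count FASTA headers (one per contig) in every .fna file.
# genomeA.fna and genomeB.fna are hypothetical stand-ins.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>chr1\nACGT\n>chr2\nGGCC\n' > genomeA.fna
printf '>contig1\nAC\n>contig2\nGT\n>contig3\nCC\n' > genomeB.fna

for f in *.fna; do
    # grep -c prints how many lines start with ">", i.e. the contig count
    echo "$f $(grep -c '^>' "$f")"
done

count_b=$(grep -c '^>' genomeB.fna)
```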
2. Fill out the table:

| Organism | Strain | Contig Count | Genome Size (bp) | GC% |
| ------------- | ------------ | ------------ | --- | --- |
| P. agricolaris | baqs159 | 2 | 8721420 | 61.6% |
| P. bonniea | bbqs859 | 2 | 4098182 | 58.7% |
| P. bonniea | bbqs395 | 2 | 4009285 | 58.8% |
| P. bonniea | bbqs433 | 2 | 4013203 | 58.8% |
| P. fungorum | ATCC BAA-463 | 4 | 9058983 | 61.8% |
| P. hayleyella | bhqs11 | 2 | 4125700 | 59.2% |
| P. hayleyella | bhqs155 | 2 | 4118676 | 59.2% |
| P. hayleyella | bhqs21 | 35 | 4088512 | 59.2% |
| P. hayleyella | bhqs22 | 45 | 4084312 | 59.2% |
| P. hayleyella | bhqs23 | 36 |4090401 | 59.2% |
| P. hayleyella | bhqs171 | 35 | 4088457 | 59.2% |
| P. hayleyella | bhqs530 | 2 | 4118722 | 59.2% |
| P. hayleyella | bhqs69 | 2 | 4125852 | 59.2% |
| P. sprentiae | WSM5005 | 5 | 7829542 | 63.2% |
| P. terrae | DSM 17804 | 4 | 10062489 | 61.9% |
| P. xenovorans | LB400 | 3 | 9702951 | 62.6% |

3. Answer the questions for test.fa, starting with its length:
```
wc test.fa
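#returns 411 as the character count (the third number wc prints); this includes the ">test" header and the end-of-line characters, so the sequence itself is 400 bases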
```
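To get the sequence length directly instead of subtracting the header and end-of-line characters by hand, the header can be dropped with `grep -v` and the newlines removed with `tr -d '\n'`. A sketch on a hypothetical `toy.fa` (not the course test.fa):

```shell
#!/bin/bash
# Count only sequence characters, excluding the header and newlines.
# toy.fa is a hypothetical example file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>test\nACGTACGTAC\nGGCCAA\n' > toy.fa

wc -c toy.fa                                          # every byte: header and newlines included
seq_len=$(grep -v ">" toy.fa | tr -d '\n' | wc -c)    # bases only
echo "sequence length: $seq_len"
```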
What is the GC%?
```
grep ">" test.fa
#returns >test
grep -v ">" test.fa
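#-v inverts the match: returns every line that does NOT contain ">" (the sequence lines)
grep -v ">" test.fa | tr -d -c GCgc
#-d deletes the characters in SET1 and -c uses the complement of SET1, so together they delete everything except G, C, g, and c
grep -v ">" test.fa | tr -d -c GCgc | wc -c
#returns 253 (the number of G and C bases)
awk 'BEGIN {print(253/400) }'
#divides the GC count by the 400-base sequence length; returns 0.6325, i.e. 63.25% GC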
```
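The G/C count and the total length can also be computed in one awk pass, which avoids typing the numbers into awk by hand. A sketch, assuming a hypothetical `toy.fa`; it counts only A, C, G, and T (case-insensitive), so N's and other characters are left out of the total.

```shell
#!/bin/bash
# Compute GC% in a single awk pass over a FASTA file.
# toy.fa is a hypothetical example file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>toy\nGGCCAT\nGCAT\n' > toy.fa

gc_percent=$(awk '
    /^>/ { next }                          # skip header lines
    {
        seq = toupper($0)                  # count lowercase bases too
        total += gsub(/[ACGT]/, "&", seq)  # gsub returns the number of matches
        gc    += gsub(/[GC]/, "&", seq)    # replacing each match with itself leaves seq unchanged
    }
    END { if (total > 0) printf "%.2f\n", 100 * gc / total }
' toy.fa)
echo "GC%: $gc_percent"
```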
### 2.2 Write and Run a UNIX Script to Automate Your Collection of Genome Stats
```
# open the nano text editor
nano
# the first line of the script tells Unix to run it with the bash shell:
#!/bin/bash
```
Write a script that finds the total number of bases (bp) and the number of G and C bases:
```
#!/bin/bash
#print the header line(s), which tell me which strain the file refers to
grep ">" "$1"
#count the total number of bases (A, C, G, T in upper or lower case)
grep -v ">" "$1" | tr -d -c GCATgcat | wc -c
#count the G and C bases
grep -v ">" "$1" | tr -d -c GCgc | wc -c
```
Saved the script as BPCounter.sh.

Run it with a FASTA filename as its argument:
```
sh BPCounter.sh fasta_filename
```
Ran the script for each genome to complete the table above.
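A sketch of extending BPCounter.sh's approach to print one table row (file, contig count, genome size, GC%) per genome, so the table can be filled in a single run. It assumes the genomes are `.fna` files in the current directory; the two files it creates are hypothetical stand-ins, and `rows.txt` is a hypothetical output name.

```shell
#!/bin/bash
# Print one markdown table row per genome: file | contigs | size | GC%.
# genomeA.fna and genomeB.fna are hypothetical stand-ins.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>c1\nGGCC\n>c2\nATAT\n' > genomeA.fna
printf '>c1\nGCGCGCAT\n' > genomeB.fna

for f in *.fna; do
    contigs=$(grep -c '^>' "$f")                           # one header per contig
    size=$(grep -v '^>' "$f" | tr -d -c ACGTacgt | wc -c)  # total bases
    gc=$(grep -v '^>' "$f" | tr -d -c GCgc | wc -c)        # G + C bases
    # awk does the decimal division that bash integer arithmetic cannot
    pct=$(awk -v gc="$gc" -v n="$size" 'BEGIN { if (n > 0) printf "%.1f", 100 * gc / n; else printf "NA" }')
    echo "| $f | $contigs | $size | $pct% |"
done | tee rows.txt
```

Each printed row can be pasted into the table; organism and strain names still have to be filled in from the headers.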