# LAB 01
bi278 lab #1
1. Course Unix environment
1.1 Connect to the bi278 Unix environment
```
ssh colbyid@bi278
exit
```
1.2 Navigate and find your colbyhome
Here I first tried to ‘print working directory’:
```
pwd #shows your current position
```
and also checked which files are in this directory:
```
ls # this lists everything in your current working directory
```
Then I used the flags -lh to ask for a more detailed list of the directory contents, with file sizes in human-readable units:
```
ls -lh
```
Then I learned how to go back to my home directory after I cd into personal/xsi25:
```
cd ~
```
1.3 Organize your files
First I created two new directories:
```
mkdir lab_01a lab_01b
```
Then I copied over the entire contents of this first lab folder (= directory) using a command that says to copy all the files (*) in this week’s lab folder to lab_01a:
```
cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a
```
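
The wildcard copy can be tried safely on throwaway directories first. This is only a minimal sketch: the function `copy_all` and the names `src_demo`, `dest_demo`, `a.txt`, and `b.txt` are hypothetical and are not part of the lab materials.

```shell
# Hypothetical sandbox; none of these names come from the lab materials.
copy_all() {
    # Copy every non-hidden file in directory $1 into directory $2.
    # The shell expands * to the list of matching file names before cp runs.
    cp "$1"/* "$2"
}

sandbox=$(mktemp -d)
mkdir "$sandbox/src_demo" "$sandbox/dest_demo"
touch "$sandbox/src_demo/a.txt" "$sandbox/src_demo/b.txt"

copy_all "$sandbox/src_demo" "$sandbox/dest_demo"
ls "$sandbox/dest_demo"   # lists the two copied files
```

Note that `*` does not match hidden files (names starting with a dot), and plain `cp` without `-r` skips subdirectories.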
Then I moved into lab_01a and checked the files in this directory:
```
cd lab_01a
ls
```
Then I tried different ways to manage and view files:
```
#remove this file from this directory
rm raw_preference_f1_control.txt
#print the content of this file in terminal
cat raw_preference_prelim_ca.txt
#to view the content of this file one screen at a time
less raw_preference_prelim_ca.txt
```
I also tried to look up the options of one command:
```
man less #use q to exit
```
2. Collect genome statistics using a Unix script
2.1 Get basic information about genomes
First I moved into the lab material directory and checked the files in this directory:
```
cd /courses/bi278/Course_Materials/lab_01b
ls
```
Then I looked at the sequence header lines (the lines containing ">") of these files one by one. For example:
```
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_000756045.1_ASM75604v1_genomic.fna
```
The result is:
>NZ_CP008760.1 Paraburkholderia xenovorans LB400 chromosome 1, complete sequence
>NZ_CP008762.1 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NZ_CP008761.1 Paraburkholderia xenovorans LB400 chromosome 3, complete sequence
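
Since each sequence in a FASTA file has one header line starting with ">", counting those lines gives the contig count directly. A small sketch of that idea, using a made-up two-record FASTA file (the function name `count_contigs` and the demo records are hypothetical):

```shell
# Hypothetical two-record FASTA, written to a temporary file for illustration.
demo_fa=$(mktemp)
printf '>contig_1 demo\nACGT\n>contig_2 demo\nGGCC\n' > "$demo_fa"

count_contigs() {
    # grep -c prints the number of matching lines instead of the lines themselves;
    # '^>' matches only lines that begin with ">".
    grep -c '^>' "$1"
}

count_contigs "$demo_fa"   # prints 2
```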
Then I tried the following commands on test.fa (each `#->` line shows the output):
```
grep ">" PATH/test.fa
#->test
grep -v ">" PATH/test.fa
#-> ACTGGCTCAGTTACTGGCACTGCGCCCGCATATCCGTAACGAAGTGCGTCGTGTACTGACGAGCGGCGACATAGCCCTCG
#   TCCTGCTCGACTGGACCCTCAACGGGACACTACCGGACGGCCGGGAGCACGAGGAGCGCGGTACCGCTACACAGGTCATG
#   GAAAGGGGCCGCGACGGTGGGTGGAAGCTGCGGATCTCCAATCCGTCCGGGCTGAACTGAATGGCATCGAAGGTGCGCCG
#   CCGTAAACGACCGGCAAGGATGTTCAAGCTGCTCCTGTCCGGCTCGAGCTTGATGCCGGCATCGCCGGGACATCGAACTG
#   CCCCGCGATGCGCGGCCACTGGCGCCAGGCAATTGCCTTCCCGGAAATCTGGTTGCGTGTGACTGCCCTGGCATATCTGA
grep -v ">" PATH/test.fa | tr -d -c GCgc
#->CGGCCGCGGCCGCGCCCGCCCGCGGGCGCGGCGCGGCGGCGCGCCCCGCCGCCGCGGCCCCCGGGCCCCGGCGGCCGGGGCCGGGGCGCGGCCGCCCGGCGGGGGGCCGCGCGGGGGGGGCGCGGCCCCCGCCGGGCGCGGGCCGGGGCGCCGCCGCGCCGGCGGGCGCGCCCGCCGGCCGGCGGCCGGCCGCCGGGCCGCGCCCCGCGGCGCGGCCCGGCGCCGGCGCCCCCGGCGGGCGGGCGCCCGGCCG
grep -v ">" PATH/test.fa | tr -d -c GCgc | wc -c
#->253
awk 'BEGIN {print (253/400)}'
#->0.6325
```
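
The steps above (drop the header lines, keep only G/C, count, then divide by the total number of bases) can be wrapped into one function. This is a sketch rather than the lab's official method: the function name `gc_fraction` and the demo record are hypothetical, and it assumes the file contains at least one A/T/G/C base.

```shell
gc_fraction() {
    # $1 is a FASTA file. Header lines are dropped before counting.
    gc=$(grep -v '>' "$1" | tr -d -c 'GCgc' | wc -c)
    total=$(grep -v '>' "$1" | tr -d -c 'ATGCatgc' | wc -c)
    # awk does the floating-point division that the shell cannot do itself.
    awk -v gc="$gc" -v total="$total" 'BEGIN { printf "%.4f\n", gc / total }'
}

# Hypothetical 8-base record: 6 of the 8 bases are G or C.
demo_fa=$(mktemp)
printf '>demo\nGGCCAT\nGC\n' > "$demo_fa"

gc_fraction "$demo_fa"   # prints 0.7500
```

Because `tr -d -c 'ATGCatgc'` deletes everything except those letters, newlines and any ambiguous bases such as N are left out of the total.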
2.2 Write and run a Unix script to automate your collection of genome statistics
First I typed nano to open an editor within the Unix shell, and used Control + X to exit the editor.
In nano, I wrote a script that counts the G/C bases and then all A/T/G/C bases in test.fa:
```
#!/bin/bash
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc | wc -c
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c ATatGCgc | wc -c
```
To check each organism one by one, I changed the script in nano so that it takes the FASTA file as its first argument ($1):
```
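# print the sequence lines (everything except the ">" header lines)
grep -v ">" "$1"
# count the A/T/G/C bases (genome size); the last number wc prints is the character count
grep -v ">" "$1" | tr -d -c ATatGCgc | wc
# count only the G/C bases
grep -v ">" "$1" | tr -d -c GCgc | wc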
```
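
To collect all of the statistics in one pass, the same commands can be run in a loop over several files. The sketch below is one possible way to do it, not the script used for this lab: the function name `genome_stats`, the tab-separated output layout, and the demo FASTA are all assumptions made for illustration.

```shell
genome_stats() {
    # For each FASTA file given as an argument, print one tab-separated line:
    # file name, contig count, genome size, GC count, GC fraction.
    for fasta in "$@"; do
        contigs=$(grep -c '>' "$fasta")
        size=$(grep -v '>' "$fasta" | tr -d -c 'ATGCatgc' | wc -c)
        gc=$(grep -v '>' "$fasta" | tr -d -c 'GCgc' | wc -c)
        awk -v f="$fasta" -v n="$contigs" -v s="$size" -v g="$gc" \
            'BEGIN { printf "%s\t%d\t%d\t%d\t%.2f\n", f, n, s, g, g / s }'
    done
}

# Hypothetical demo file: 2 contigs, 8 bases, 6 of them G or C.
demo_fa=$(mktemp)
printf '>c1\nGGCC\n>c2\nATGC\n' > "$demo_fa"

genome_stats "$demo_fa"
```

On the course server this could presumably be called with a wildcard, for example `genome_stats /courses/bi278/Course_Materials/lab_01b/*.fna`, assuming all genome files in that directory end in `.fna`.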
The resulting table:
| organism | strain | contig count | genome size (bp) | GC fraction (GC count) |
| ------------- | ------------ | ------------ | ----------- | ---- |
| P. agricolaris | baqs159 | 2 | 8721420 | 0.62(5375334) |
| P. bonniea | bbqs859 | 2 | 4098182 | 0.59(2406657) |
| P. bonniea | bbqs359 | 2 | 4009285 | 0.59(2357500) |
| P. bonniea | bhqs433 | 2 | 4013203 | 0.59(2359065) |
| P. fungorum | ATCC BAA-463 | 4 | 9058983 | 0.62(5593928) |
| P. hayleyella | bhqs11 | 2 | 4125700 | 0.59(2444079) |
| P. hayleyella | BHQS155 | 2 | 4118676 | 0.59(2439862) |
| P. hayleyella | BHQS171 | 35 | 4088457 | 0.59(2421259) |
| P. hayleyella | bhqs21 | 35 | 4088512 | 0.59(2421281) |
| P. hayleyella | bhqs22 | 45 | 4084312 | 0.59(2418627) |
| P. hayleyella | bhqs23 | 36 | 4090401 | 0.59(2422298) |
| P. hayleyella | bhqs530 | 2 | 4118722 | 0.59(2439957) |
| P. hayleyella | bhqs69 | 2 | 4125852 | 0.59(2444184) |
| P. sprentiae | WSM5005 | 5 | 7829542 | 0.63(4948909) |
| P. terrae | DSM17804 | 4 | 10062489 | 0.62(6230535) |
| P. xenovorans | LB400 | 3 | 9702951 | 0.63(6077288) |
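
Each value in the last column is the GC count (in parentheses) divided by the genome size, rounded to two decimals. As a quick check, that arithmetic can be repeated with awk for any row of the table; the helper name `gc_ratio` is hypothetical.

```shell
gc_ratio() {
    # $1 = GC count, $2 = genome size; prints the ratio rounded to two decimals.
    awk -v gc="$1" -v size="$2" 'BEGIN { printf "%.2f\n", gc / size }'
}

gc_ratio 5375334 8721420   # P. agricolaris baqs159, prints 0.62
gc_ratio 6077288 9702951   # P. xenovorans LB400, prints 0.63
```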