# LAB 01

bi278 lab #1

## 1. Course Unix environment

### 1.1 Connect to the bi278 Unix environment

```
ssh colbyid@bi278
exit
```

### 1.2 Navigate and find your colbyhome

First I printed the working directory:

```
pwd # shows your current position
```

Then I checked which files are in this directory:

```
ls # this lists everything in your current working directory
```

Then I used the flags -lh to ask for a more detailed list of the directory contents:

```
ls -lh
```

Then I learned how to get back to my home directory after I had moved into personal/xsi25 with cd:

```
cd ~
```

### 1.3 Organize your files

First I created two new directories:

```
mkdir lab_01a lab_01b
```

Then I copied over the entire contents of this first lab folder (= directory), using a command that says to copy all the files (`*`) in this week's lab folder to lab_01a:

```
cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a
```

Then I moved into lab_01a and checked the files in this directory:

```
cd lab_01a
ls
```

I then tried several file-management commands:

```
# remove this file from this directory
rm raw_preference_f1_control.txt
# print the content of this file in the terminal
cat raw_preference_prelim_ca.txt
# view the content of this file one screen at a time
less raw_preference_prelim_ca.txt
```

I also looked up the options for a command:

```
man less # press q to exit
```

## 2. Collect genome statistics using a Unix script

### 2.1 Get basic information about genomes

First I moved into the lab material directory and checked the files in this directory:

```
cd /courses/bi278/Course_Materials/lab_01b
ls
```

I then checked these files one by one.
For example:

```
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_000756045.1_ASM75604v1_genomic.fna
```

The result is:

- `>NZ_CP008760.1 Paraburkholderia xenovorans LB400 chromosome 1, complete sequence`
- `>NZ_CP008762.1 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence`
- `>NZ_CP008761.1 Paraburkholderia xenovorans LB400 chromosome 3, complete sequence`

Then I tried the following commands on test.fa, building up step by step to its GC content:

```
# print the header line
grep ">" PATH/test.fa
#->test

# print everything except the header line, i.e. the sequence
grep -v ">" PATH/test.fa
#-> ACTGGCTCAGTTACTGGCACTGCGCCCGCATATCCGTAACGAAGTGCGTCGTGTACTGACGAGCGGCGACATAGCCCTCG
#   TCCTGCTCGACTGGACCCTCAACGGGACACTACCGGACGGCCGGGAGCACGAGGAGCGCGGTACCGCTACACAGGTCATG
#   GAAAGGGGCCGCGACGGTGGGTGGAAGCTGCGGATCTCCAATCCGTCCGGGCTGAACTGAATGGCATCGAAGGTGCGCCG
#   CCGTAAACGACCGGCAAGGATGTTCAAGCTGCTCCTGTCCGGCTCGAGCTTGATGCCGGCATCGCCGGGACATCGAACTG
#   CCCCGCGATGCGCGGCCACTGGCGCCAGGCAATTGCCTTCCCGGAAATCTGGTTGCGTGTGACTGCCCTGGCATATCTGA

# delete every character that is not G or C (this also removes the newlines)
grep -v ">" PATH/test.fa | tr -d -c GCgc
#->CGGCCGCGGCCGCGCCCGCCCGCGGGCGCGGCGCGGCGGCGCGCCCCGCCGCCGCGGCCCCCGGGCCCCGGCGGCCGGGGCCGGGGCGCGGCCGCCCGGCGGGGGGCCGCGCGGGGGGGGCGCGGCCCCCGCCGGGCGCGGGCCGGGGCGCCGCCGCGCCGGCGGGCGCGCCCGCCGGCCGGCGGCCGGCCGCCGGGCCGCGCCCCGCGGCGCGGCCCGGCGCCGGCGCCCCCGGCGGGCGGGCGCCCGGCCG

# count the remaining G/C characters
grep -v ">" PATH/test.fa | tr -d -c GCgc | wc -c
#->253

# GC fraction = G/C count divided by the sequence length (5 lines of 80 bases = 400)
awk 'BEGIN {print (253/400)}'
#->0.6325
```

### 2.2 Write and run a Unix script to automate your collection of genome statistics

First I typed `nano`, which opened a text editor within the Unix shell; Control + X exits the editor. In nano, I wrote a script that counts the G/C bases and then the total number of bases in test.fa:
```
#!/bin/bash
# number of G/C bases
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c GCgc | wc -c
# total number of A/T/G/C bases
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa | tr -d -c ATatGCgc | wc -c
```

To check each organism one by one, I wrote a script in nano that takes the genome file as its first argument (`$1`):

```
# contig count: number of header lines
grep -c ">" "$1"
# genome size: total number of A/T/G/C bases
grep -v ">" "$1" | tr -d -c ATatGCgc | wc -c
# number of G/C bases
grep -v ">" "$1" | tr -d -c GCgc | wc -c
```

The resulting table (GC content is given as a fraction of genome size, with the raw G/C count in parentheses):

| organism       | strain       | contig count | genome size (bp) | GC content (GC count) |
| -------------- | ------------ | ------------ | ---------------- | --------------------- |
| P. agricolaris | baqs159      | 2            | 8721420          | 0.62 (5375334)        |
| P. bonniea     | bbqs859      | 2            | 4098182          | 0.59 (2406657)        |
| P. bonniea     | bbqs359      | 2            | 4009285          | 0.59 (2357500)        |
| P. bonniea     | bhqs433      | 2            | 4013203          | 0.59 (2359065)        |
| P. fungorum    | ATCC BAA-463 | 4            | 9058983          | 0.62 (5593928)        |
| P. hayleyella  | bhqs11       | 2            | 4125700          | 0.59 (2444079)        |
| P. hayleyella  | BHQS155      | 2            | 4118676          | 0.59 (2439862)        |
| P. hayleyella  | BHQS171      | 35           | 4088457          | 0.59 (2421259)        |
| P. hayleyella  | bhqs21       | 35           | 4088512          | 0.59 (2421281)        |
| P. hayleyella  | bhqs22       | 45           | 4084312          | 0.59 (2418627)        |
| P. hayleyella  | bhqs23       | 36           | 4090401          | 0.59 (2422298)        |
| P. hayleyella  | bhqs530      | 2            | 4118722          | 0.59 (2439957)        |
| P. hayleyella  | bhqs69       | 2            | 4125852          | 0.59 (2444184)        |
| P. sprentiae   | WSM5005      | 5            | 7829542          | 0.63 (4948909)        |
| P. terrae      | DSM17804     | 4            | 10062489         | 0.62 (6230535)        |
| P. xenovorans  | LB400        | 3            | 9702951          | 0.63 (6077288)        |
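
The per-genome steps could also be folded into one function that prints a full table row per file, so the GC fraction no longer has to be computed by hand with awk. This is only a minimal sketch, not the script that produced the table above: the function name `genome_stats` and the tiny demo FASTA are made up for illustration, and it assumes header lines are the only lines containing `>`.

```shell
#!/bin/bash
# Sketch: print "contigs<TAB>size<TAB>gc_count<TAB>gc_fraction" for one FASTA file.
# genome_stats is a hypothetical helper name, not part of the lab materials.
genome_stats() {
    local fasta="$1"
    local contigs size gc
    contigs=$(grep -c ">" "$fasta")                            # header lines = contigs
    size=$(grep -v ">" "$fasta" | tr -d -c ATatGCgc | wc -c)   # A/T/G/C bases
    gc=$(grep -v ">" "$fasta" | tr -d -c GCgc | wc -c)         # G/C bases only
    # Let awk do the division; guard against an empty sequence.
    awk -v c="$contigs" -v s="$size" -v g="$gc" \
        'BEGIN { printf "%d\t%d\t%d\t%.2f\n", c, s, g, (s > 0 ? g / s : 0) }'
}

# Made-up two-contig FASTA so the sketch runs without the course files:
# 20 bases in total, 10 of them G or C.
demo=$(mktemp)
printf '>contig_1 demo\nGGCCAATT\nGC\n>contig_2 demo\nATATGCGC\nAT\n' > "$demo"

result=$(genome_stats "$demo")
echo "$result"    # 2, 20, 10 and 0.50, separated by tabs
rm -f "$demo"
```

On the course server, the same function could be called in a loop over the genome files, for example `for f in /courses/bi278/Course_Materials/lab_01b/*.fna; do genome_stats "$f"; done`, assuming all genome files in that directory end in `.fna` like the one shown earlier.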