# BI278 Lab 1

Kirsten Pastore

## 1.1 Connect to the BI278 Unix Environment

Connect to the course environment:

```
#secure shell (ssh) allows us to connect to a remote computing environment
ssh klpast23@bi278
```

## 1.2 Navigate and find your Colby home

The shell prompt shows the working directory: `[klpast23@vcacbi278 ~]$`

Begin navigating directories:

```
#pwd (print working directory)
pwd
#returns /home/students/k/klpast23

#ls (list the files and subdirectories in the current working directory)
ls
#returns colby.cshrc colby.login colby.logout colby.profile

#the -lh flag asks for a more detailed list of directory contents
ls -lh
#returns file permissions, file size, and date last modified
```

Navigate to the courses fileserver:

```
cd /courses/bi278/klpast23
```

Note: `~` = `/home/colbyid`

## 1.3 Organize your files

*Note: don't use spaces in file or directory names.*

```
#navigate to bi278 home directory
cd ~
#make two new directories
mkdir lab_01a lab_01b
#copy over the entire contents of the first lab folder
cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a
```

Commands:

```
#make a directory named name
mkdir name
#remove a specific directory (only works if it is empty; use rm -r if it is not empty)
rmdir name
#copy file a to b; can be used to change location and filename
cp a b
#move a file from a to b; also used for renaming files
mv a b
#remove a file - cannot be reversed
rm filename
#concatenate, or print to screen the entire contents of a file, as long as it is a text file
cat filename
#display the contents of a file one screen length at a time; exit by typing q
less filename
#print to screen the top 10 lines of a file
head filename
#print to screen the bottom 10 lines of a file
tail filename
#manual for most Unix commands
man command
```

Organizing / Moving Files

```
#made a directory for figures
mkdir figures
#move a file starting with fig to figures
mv fig1_all_variation.eps figures
#did this for every file starting with fig

#made a directory for raw preference
mkdir raw_preference
#move all raw_preference files to the raw_preference folder
mv raw_preference_* raw_preference
#*pref* also matches the raw_preference directory itself; mv reports an error for that entry and still moves the files
mv *pref* raw_preference

#made a directory for achybrid files
mkdir achybrid
#move all achybrid files to the achybrid folder (the glob matches the directory itself too, as above)
mv *achybrid* achybrid
```

## Exercise 2. Collect Basic Genome Statistics for Multiple Genomes

### 2.1 Get Basic Information About Genomes

Additional commands:

```
#finds a specific pattern within a file
grep pattern filename
#counts words in a file; can also count lines (-l) or characters (-c)
wc filename
#translate or delete sets of characters
tr
# > sends the results of the command that preceded it to a file that you specify after it
some command > filename
```

1. Navigate to lab_01b and find what kind of sequences are listed in each file

```
#performed for each file in the folder lab_01b
grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_00756045.1_ASM75604v1_genomic.fna
#counted the header lines to find the number of contigs
```

2. Fill out the table

| Organism | Strain | Contig Count | Genome Size (bp) | GC% |
| -------------- | ------------ | ------------ | ---------------- | ----- |
| P. agricolaris | baqs159 | 2 | 8721420 | 61.6% |
| P. bonniea | bbqs859 | 2 | 4098182 | 58.7% |
| P. bonniea | bbqs395 | 2 | 4009285 | 58.8% |
| P. bonniea | bbqs433 | 2 | 4013203 | 58.8% |
| P. fungorum | ATCC BAA-463 | 4 | 9058983 | 61.8% |
| P. hayleyella | bhqs11 | 2 | 4125700 | 59.2% |
| P. hayleyella | bhqs155 | 2 | 4118676 | 59.2% |
| P. hayleyella | bhqs21 | 35 | 4088512 | 59.2% |
| P. hayleyella | bhqs22 | 45 | 4084312 | 59.2% |
| P. hayleyella | bhqs23 | 36 | 4090401 | 59.2% |
| P. hayleyella | bhqs171 | 35 | 4088457 | 59.2% |
| P. hayleyella | bhqs530 | 2 | 4118722 | 59.2% |
| P. hayleyella | bhqs69 | 2 | 4125852 | 59.2% |
| P. sprentiae | WSM5005 | 5 | 7829542 | 63.2% |
| P. terrae | DSM 17804 | 4 | 10062489 | 61.9% |
| P. xenovorans | LB400 | 3 | 9702951 | 62.6% |

3. Answer questions for test.fa

```
wc test.fa
#reports 411 characters, because the count includes the >test header and end-of-line characters
```

What is your GC%?

```
grep ">" test.fa
#returns >test
grep -v ">" test.fa
#returns the bases (every line that does not contain ">")
grep -v ">" test.fa | tr -d -c GCgc
#-d deletes the characters in SET1 and -c takes the complement of SET1, so everything except G, C, g, and c is deleted
grep -v ">" test.fa | tr -d -c GCgc | wc -c
#returns 253
awk 'BEGIN { print(253/400) }'
#returns 0.6325, i.e. 63.25% GC (253 GC bases out of 400 total bases)
```

### 2.2 Write and Run a UNIX Script to Automate Your Collection of Genome Stats

```
# open nano
nano
#tell Unix that this is a bash shell script
#!/bin/bash
```

Write a script that finds the total number of base pairs and the total number of GC bases:

```
#tells me which strain the file refers to
grep ">" "$1"
#tells me how many total base pairs there are (upper and lower case)
grep -v ">" "$1" | tr -d -c GCATgcat | wc -c
#tells me how many GCs there are
grep -v ">" "$1" | tr -d -c GCgc | wc -c
```

Named the file BPCounter.sh. Run the file with the fasta filename:

```
sh BPCounter.sh fasta_filename
```

Ran the script for each genome to complete the table above.
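
The three counts collected by BPCounter.sh, plus the contig count from 2.1, could also be produced as one table row per genome. The block below is a minimal sketch of that idea, not the script used in the lab: `genome_stats` is a hypothetical function name, and it assumes each file is a plain nucleotide FASTA in which only header lines contain `>`.

```shell
#!/bin/bash
# Sketch: print contig count, total bases, GC bases, and GC fraction
# for each FASTA file given on the command line.
# genome_stats is a hypothetical helper name chosen for illustration.

genome_stats() {
    local fasta="$1"
    local contigs total gc

    # number of header lines = number of contigs
    contigs=$(grep -c ">" "$fasta")
    # keep only A, C, G, T (either case) from the sequence lines, then count them
    total=$(grep -v ">" "$fasta" | tr -d -c 'ACGTacgt' | wc -c)
    # keep only G and C (either case), then count them
    gc=$(grep -v ">" "$fasta" | tr -d -c 'GCgc' | wc -c)

    # awk does the division; the fraction is reported to three decimal places
    awk -v n="$contigs" -v t="$total" -v g="$gc" \
        'BEGIN { printf "%d\t%d\t%d\t%.3f\n", n, t, g, (t > 0 ? g / t : 0) }'
}

# one tab-separated row per file: contigs, total bases, GC bases, GC fraction
for f in "$@"; do
    genome_stats "$f"
done
```

Saved under an assumed name such as `genome_stats.sh`, it could be run as `bash genome_stats.sh /courses/bi278/Course_Materials/lab_01b/*.fna`, assuming the genome files in that folder end in `.fna`. Ambiguous bases such as N are excluded from the total, which matches how BPCounter.sh counts base pairs.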