# BI278 lab 1 – Unix, genome files, and shell scripts
1.1 Connect to bi Unix environment
ssh cgsnow25@bi278
exit
1.2 Navigate and find colby home
When you log in, the prompt should read
[cgsnow25@vcacbi278 ~]$
The '~' in the prompt changes as you move around the directory structure
pwd
pwd prints the working directory. Right after you log in it prints /home/cgsnow25, and from anywhere else, cd ~ followed by pwd shows you are back home
ls
ls ~
ls /home/students/d/cgsnow25
ls .
Each of these commands lists the files and directories (folders) in the given location
ls -lh
This command asks for a more detailed list of directory contents (-l for the long format, -h for human-readable file sizes)
ls colbyhome
ls /personal/cgsnow25
This is a link to your personal directory on Colby's fileserver, Filer
You can mount this folder on your Mac by going to the desktop, pressing [command K], entering smb://filer.colby.edu, and using Finder to navigate to the folder
ls /courses/bi278
Allows you to access the Courses fileserver
cd
'Change directory to'
cd colbyhome
cd /personal/cgsnow25
Either command takes you to colbyhome, where you can type the "ls" command to find your files
Once you're inside colbyhome, ls no longer shows it as a destination because you are already in it, but you can still move around the directory structure using cd
cd ..
pwd
cd .. moves you up one level, to the parent directory. '.' is the current directory, so
cd .
pwd
will put you in the same place
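As a quick illustration of '.' and '..', the sketch below works in a throwaway directory made with mktemp, so nothing in your real home directory is touched. The directory names parent and child are made up for the example.

```shell
# Scratch area (hypothetical names, not part of the lab folders).
scratch=$(mktemp -d)
mkdir -p "$scratch/parent/child"

cd "$scratch/parent/child"
start=$(pwd)          # ends in /parent/child

cd .                  # '.' is the current directory, so nothing changes
same=$(pwd)

cd ..                 # '..' is the parent directory
up=$(pwd)             # ends in /parent

echo "start:          $start"
echo "after 'cd .':   $same"
echo "after 'cd ..':  $up"
```

Running pwd after every cd, as above, is a cheap way to confirm where you ended up.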
1.3 Organize your files
Don't use spaces in directory or file names. Example: a name with a space, such as My Documents, has to be typed in Unix with a backslash before the space
ls My\ Documents
You should instead use an underscore, a period, or capitalization to type
my_documents
my.documents
myDocuments
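To see why spaces are a nuisance, the sketch below counts how many arguments the shell passes along for each way of typing the name. The helper count_args is a hypothetical function written only for this demonstration.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args My Documents)      # the space splits the name in two
escaped=$(count_args My\ Documents)      # backslash keeps it as one name
quoted=$(count_args "My Documents")      # quotes also keep it as one name
no_space=$(count_args my_documents)      # nothing to escape

echo "unquoted: $unquoted, escaped: $escaped, quoted: $quoted, no space: $no_space"
```

With an underscore in the name there is nothing to escape, which is the point of the naming advice above.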
Example: make two directories and copy over the entire contents of the lab folder/directory
mkdir lab_01a lab_01b
cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a
Go to the directory
cd lab_01a
ls
Organize directory
mkdir name #make a directory named whatever you list after this command
rmdir name #remove a specific (empty) directory
rm -r name #remove a directory that is not empty, along with everything in it
cp a b #copy file a to b; b can be a new location, a new file name, or both
mv a b #move (or rename) a file from a to b, both must be specified
rm filename #rm (remove) whatever you list after it, irreversible
cat filename #concatenate, or print to screen the entire contents of a file
less filename #display contents of a file one screen length at a time
head filename #print to screen the top 10 lines of a file
tail filename #print to screen the bottom 10 lines of a file
man command #manual for most Unix commands, shows how the command works
command --help #access usage information
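A small sketch of what head and tail return, using a throwaway 25-line file of numbers. The file is a stand-in created with mktemp and is not one of the lab files.

```shell
# Build a scratch file containing the numbers 1 to 25, one per line.
numbers=$(mktemp)
seq 1 25 > "$numbers"

head "$numbers"        # prints lines 1 through 10
tail "$numbers"        # prints lines 16 through 25

# Pull out single lines to confirm where each command stops and starts.
head_last=$(head "$numbers" | tail -n 1)
tail_first=$(tail "$numbers" | head -n 1)
echo "last line shown by head: $head_last"
echo "first line shown by tail: $tail_first"
```

Both commands accept -n to change how many lines are shown, for example head -n 3 filename.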
Useful shortcuts:
. (current location)
.. (one directory above)
~ (home)
* (a wildcard that will match any string of characters)
You can step back through your most recent commands with the up arrow key, or list them by typing 'history'
Autocomplete saves time: if you start typing a command or filename and press Tab, the Unix shell will complete the name for you.
Example: moving all eps files into a folder
ls *eps #lists all files whose names end in eps
mkdir eps_files #creates new directory
mv *eps eps_files #moves all files whose names end in eps into the new directory
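The same wildcard move can be tried safely in a scratch directory first. The file names below (fig1.eps, fig2.eps, notes.txt) are invented for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical files: two that end in eps and one that does not.
touch fig1.eps fig2.eps notes.txt

ls *eps                # matches fig1.eps and fig2.eps only
mkdir eps_files
mv *eps eps_files      # eps_files itself does not END in eps, so it is not matched

ls eps_files           # fig1.eps fig2.eps
ls                     # eps_files notes.txt
```

The wildcard is expanded by the shell before mv runs, so a directory whose own name ended in eps would be matched too. Naming the target eps_files avoids that.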
2. Collect basic genome statistics for multiple genomes
Don't copy the genome files out of the course folder; refer to each one by specifying where the file is (its PATH)
grep pattern filename #find a specific pattern within a file, put the pattern in quotes ('')
wc filename #count the words in a file; can be used to count lines (-l) or characters (-c)
tr #translate or delete sets of characters
In Unix, '>' sends the results of the command that preceded it into a file that you specify after it, so use quotes if you want to use '>' as a pattern
grep ">" /courses/bi278/Course_Materials/lab_01b/filename
Run the command for a given genome to see what kinds of sequences are in each file (example: to see all GCF files, you would type in GCF* for filename)
For a GCF file, the number of header lines (the lines that start with >) tells you how many chromosomes there are, and each header tells you the organism and the strain
Other files have the organism and strain identity in the file name but not in the header
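For the contig count column, one option is to let grep count the header lines with -c instead of counting them by eye. The sketch below uses a made-up two-record FASTA file, so the headers and sequences are placeholders rather than real genome data.

```shell
# Toy FASTA file with two records (invented content).
toy=$(mktemp)
printf '>contig1 Example organism strain X\nATGC\n>contig2 Example organism strain X\nGGCC\n' > "$toy"

# The quotes matter: an unquoted > would be treated as output redirection.
grep ">" "$toy"                  # print the header lines

contigs=$(grep -c ">" "$toy")    # -c prints the number of matching lines
echo "contig count: $contigs"
```

On the lab files the same idea would be grep -c ">" followed by the PATH to the genome file.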
| Organism | Strain | Contig count | Genome size (bp) | GC count (bp) |
| -------------- | ------- | ------------ | ---------------- | ---- |
| P. agricolaris | baqs159 | 2 | 8721420 | 5375334 |
| P. bonniea | bbqs859 | 2 | 4098182 | 2406657 |
| P. bonniea | bbqs395 | 2 | 9058983 | 5593928 |
| P. bonniea | bbqs433 | 2 | 7829542 | 4948909 |
| P. fungorum | ATCC BAA-463 | 4 | 9058983 | 5593928 |
| P. hayleyella | bbqs155 | 2 | 10062489 | 6230535 |
| P. hayleyella | bhqs171 | 35 | 4088457 | 2421259 |
| P. hayleyella | bhqs21 | 35 | 4088512 | 2421281 |
| P. hayleyella | bhqs22 | 45 | 4084312 | 2418627 |
| P. hayleyella | bhqs23 | 36 | 4090401 | 2422298 |
| P. hayleyella | bhqs530 | 2 | 4118722 | 2439957 |
| P. hayleyella | bhqs69 | 2 | 4125852 | 2444184 |
| P. terrae | DSM17804 | 4 | 10062489 | 6230535 |
| P. xenovorans | LB400 | 3 | 9702951 | 6077288 |
| P. sprentiae | WSM505 | 5 | 7829542 | 4948909 |
| P. hayleyella | bhqs11 | 2 | 4125700 | 2444079 |
What is the size of your genome (how many total bases?)
grep -v ">" PATH/test.fa | tr -d -c ATGCatgc | wc -c
What is your GC%?
awk 'BEGIN {print (253/400)}'
Commands that help:
grep "v" PATH/test.fa #test
grep -v ">" PATH/test.fa #sequence lines only (-v drops the header lines)
grep -v ">" PATH/test.fa | tr -d -c GCgc #only Gs and Cs
grep -v ">" PATH/test.fa | tr -d -c GCgc | wc -c #G and C count: 258
awk 'BEGIN {print (253/400)}' #GC%, 0.6325
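Rather than retyping the two counts into awk by hand, the counts can be stored in shell variables and passed to awk with -v. This is a sketch on a made-up FASTA file; the variable names (total, gc, gc_frac) and the sequences are assumptions for illustration.

```shell
# Toy FASTA file (invented content): 18 bases in total, 10 of them G or C.
fasta=$(mktemp)
printf '>seq1 toy record\nATGCGC\nGGCC\n>seq2 toy record\nAATT\natgc\n' > "$fasta"

# Same pipelines as above; tr -d ' ' strips the padding some versions of wc add.
total=$(grep -v ">" "$fasta" | tr -d -c ATGCatgc | wc -c | tr -d ' ')
gc=$(grep -v ">" "$fasta" | tr -d -c GCgc | wc -c | tr -d ' ')

# -v copies the shell values into awk variables before BEGIN runs.
gc_frac=$(awk -v gc="$gc" -v total="$total" 'BEGIN { printf "%.4f", gc / total }')

echo "genome size: $total"
echo "G+C count:   $gc"
echo "GC fraction: $gc_frac"
```

Multiply the fraction by 100 if you want a percentage.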
Write and run a unix script to automate your collection of genome statistics
To bundle commands together
To open an editor within the Unix shell, type
nano
You can use control X to exit back to the normal terminal
#!/bin/bash
Then type the commands below it (the #!/bin/bash line goes first in the script)
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa |tr -d -c ATGCatgc | wc -c
grep -v ">" /courses/bi278/Course_Materials/lab_01b/test.fa |tr -d -c GCgc | wc -c
#awk 'BEGIN {print (253/400)}'
When you exit with control X, nano asks for a file name; type in the filename (GC%_genomesize.sh)
To save it in the folder lab_01b, put this path in front of the filename
~/lab_01b/
To execute:
sh GC%_genomesize.sh #only works if you're in the directory that holds the script, ~/lab_01b
Change the filename into a variable that can be designated at execution
nano GC%_genomesize.sh #to go back and edit
Change the filename in each line (test.fa) to $1, then save and exit
grep -v ">" /courses/bi278/Course_Materials/lab_01b/$1 |tr -d -c ATGCatgc | wc -c
grep -v ">" /courses/bi278/Course_Materials/lab_01b/$1 |tr -d -c GCgc | wc -c
#awk 'BEGIN {print (253/400)}'
To execute and get the numbers to calculate %GC, type at the prompt:
sh GC%_genomesize.sh fasta_filename
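To show how $1 picks up the file name given at the prompt, the sketch below writes a small script into a scratch directory and runs it on a made-up FASTA file. The script name gc_stats.sh, the file toy.fa, and the extra awk line that prints the fraction are all assumptions for illustration, not the lab's script.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Toy FASTA file (invented content): 10 bases, 8 of them G or C.
printf '>seq1 toy record\nATGCGC\nGGCC\n' > toy.fa

# Write the script. The quoted 'EOF' keeps $1 from being expanded here,
# so it is only filled in when the script runs.
cat > gc_stats.sh <<'EOF'
#!/bin/bash
# $1 is the first argument on the command line: the path to a FASTA file.
total=$(grep -v ">" "$1" | tr -d -c ATGCatgc | wc -c)
gc=$(grep -v ">" "$1" | tr -d -c GCgc | wc -c)
awk -v gc="$gc" -v total="$total" 'BEGIN { printf "%d %d %.4f\n", total, gc, gc / total }'
EOF

# Run it with the FASTA file as the argument: prints size, G+C count, fraction.
result=$(sh gc_stats.sh toy.fa)
echo "$result"
```

Here the script takes a full path as its argument instead of assuming the lab_01b folder, which is a design choice of this sketch; either approach works as long as the script and the prompt agree on what $1 means.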
When done working in bi278, type
exit