**LAB/WEEK ONE NOTES**
# 1.1 CONNECTING TO UNIX ENVIRONMENT
To connect to the Unix environment, use ssh (secure shell) with my alphanumeric Colby ID (cibene26) followed by @bi278. It should look like cibene26@bi278; then enter the password.
# **1.2 FIND YOUR COLBY HOME AND COURSES DIRECTORIES**
-will end up in home directory or folder
When done with work use command: exit
~ =home directory
*To get the full path (the address within the file system of where you are), use the command that stands for print working directory:
pwd
typing pwd when at ~ should give me /home2/cibene26
ls = list all the files and folders (directories) in the current working directory
*Current directory is usually denoted as .
-the dot means here
eg. if I were in the home directory, the following commands would all give me the same thing (listing everything in home):
ls ~
ls /home/cibene26
ls .
To get more descriptive info about what is in the current working directory, use the flags -lh to ask for a more detailed list (l) of directory contents, including file permissions, file size, and date last modified, in human-readable (h) form:
ls -lh
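Quick sketch to convince myself that . and the full path point to the same place. The sandbox directory and file names are made up for practice (created with mktemp), not part of the lab.

```shell
# Throwaway practice directory so nothing in my real home is touched (hypothetical names)
sandbox=$(mktemp -d)
cd "$sandbox"
touch notes.txt data.fa

here=$(pwd)               # full path of where I am
list_dot=$(ls .)          # list using the dot shortcut
list_full=$(ls "$here")   # list using the full path
echo "$list_dot"
echo "$list_full"         # same listing as the one above

ls -lh                    # long, human-readable listing (permissions, size, date)
```

Both echo lines print the same two file names, since . and the output of pwd are the same location.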
ls colbyhome = list everything in the colbyhome directory, which is a symbolic link to my personal directory on Colby's fileserver called Filer
^Can access my files as long as I put them in this folder (directory)
^To attach (mount) this folder directly to my computer, go to the Mac desktop and type [command-K]. Then enter the following address and use the Mac’s Finder to navigate to the same folder:
smb://filer.colby.edu
^***For when you need to move files from your desktop to the server and vice versa***
The other fileserver that Colby maintains is Courses. All material from our course will be available to me via the Courses fileserver and the specific course directory in it. ***I can copy files out of this directory (can't edit or write to them)***
To make sure I have access to the Courses fileserver: ls /courses/bi278
Given files to work with = /courses/bi278/Course_Materials/
Personal directory to work in = /courses/bi278/cibene26 (gives me private when I type the command)
# **1.3 NAVIGATING IN AND OUT OF DIRECTORIES**
I can go to any location on the fileserver by specifying its path
cd = change directory, to change location
colbyhome stands for /personal/cibene26, so I could also change directories into it using: cd /personal/colbyid
cd ..= parent directory
pwd to check address of where you are
cd .= current directory
#### = sharps, which are used to comment in code; everything after a sharp is ignored. Useful for adding a comment on the same line as a command
eg:
ls # this lists everything in your current working directory
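Small practice run of cd, cd . and cd .. with pwd as the check. The parent/child directories are hypothetical and live in a temporary sandbox, so this is only a sketch of the idea, not the course layout.

```shell
sandbox=$(mktemp -d)             # hypothetical practice area
mkdir -p "$sandbox/parent/child"

cd "$sandbox/parent/child"       # go to a location by giving its path
start=$(pwd)                     # check the address of where I am
cd .                             # . is the current directory, so nothing changes
same=$(pwd)
cd ..                            # .. moves up to the parent directory
up=$(pwd)

echo "$start"   # ends in /parent/child
echo "$same"    # identical to the line above
echo "$up"      # ends in /parent
```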
# **1.4 ORGANIZE SOME FILES**
/courses/bi278/Course_Materials/lab_01a
^directory that contains a bunch of files from Prof Noh's previous projects
-figure files-->(*.eps)
###### *Don't use spaces in directory or file names. If a name has one, escape it with a backslash before the space, eg: My Documents --> ls My\ Documents
^use an underscore, period, or capital letters instead, such as my_documents, my.documents, myDocuments, etc.
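To see the escaping in action, here is a sketch in a throwaway sandbox (the directory and file names are made up). Quoting the name is not from the lab notes, but it is a standard shell alternative to the backslash.

```shell
sandbox=$(mktemp -d)                  # hypothetical practice area
cd "$sandbox"
mkdir "My Documents"                  # a name with a space (avoid this in real work)
touch "My Documents/a.txt"

with_backslash=$(ls My\ Documents)    # the backslash escapes the space
with_quotes=$(ls "My Documents")      # quotes do the same job
echo "$with_backslash"
echo "$with_quotes"

mkdir my_documents                    # easier: no space, so nothing to escape
```

Without the backslash or quotes, ls would treat My and Documents as two separate names.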
mkdir= command for making new directories
rmdir= remove directories
#### cp (copy); * (wildcard: matches everything in the directory)
history = list of every command I have used; or press the up arrow
to autocomplete a command or file name, press Tab
##### mkdir lab_01a lab_01b
##### cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a
^-cp takes the path of the material I am copying, then the destination I want to copy to. So this takes all the contents of the lab_01a folder in Course_Materials (the given files) and copies them to my own lab_01a directory, which I just created using mkdir (mkdir lab_01a)
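A self-contained version of the same mkdir and cp steps, to practice without touching the course files. The given/lab_01a folder and its files are stand-ins I made up for /courses/bi278/Course_Materials/lab_01a.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# Stand-in for the Course_Materials folder (made-up files)
mkdir -p given/lab_01a
touch given/lab_01a/fig1.eps given/lab_01a/fig2.eps given/lab_01a/readme.txt

mkdir lab_01a lab_01b                         # make two new directories at once
cp given/lab_01a/* ./lab_01a                  # * matches every file in the source folder
copied=$(ls lab_01a | wc -l | tr -d ' ')      # how many files arrived
echo "$copied"

ls given/lab_01a/*.eps                        # *.eps matches only the figure files
```

The originals stay in place, because cp copies rather than moves.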
#### Some Unix commands (and quick descriptions):
**mkdir** name (make a directory named whatever you list after this command)
**rmdir** name (remove a specific directory, only works if it’s empty; must use rm -r if the directory is not empty)
**cp a b** (copy file a to b; can be used to not only change location but also filename at same time)
**mv a b** (move a file from a to b; both a and b must be specified; also used for renaming files)
**rm filename** (rm whatever you list after it; cannot be reversed)
**cat filename** (concatenate, or print to screen the entire contents of a file as long as it is a text file)
**less filename** (display the contents of a file one screen length at a time; use arrows to scroll up or down one line at a time; exit by typing q)
**head filename** (print to screen the top 10 lines of a file)
**tail filename** (print to screen the bottom 10 lines of a file)
**man command** (manual for most Unix commands)
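A short run-through of several commands from this list, as a sketch. The file numbers.txt and the other names are invented for practice inside a temporary sandbox.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# Build a made-up text file with 25 lines
i=1
while [ "$i" -le 25 ]; do echo "line $i"; i=$((i + 1)); done > numbers.txt

cp numbers.txt copy.txt                           # copy, changing the name at the same time
mv copy.txt renamed.txt                           # mv also renames
total=$(cat renamed.txt | wc -l | tr -d ' ')      # cat prints the whole file
first=$(head renamed.txt | wc -l | tr -d ' ')     # head prints only the top 10 lines
last=$(tail renamed.txt | wc -l | tr -d ' ')      # tail prints only the bottom 10 lines
echo "$total $first $last"

rm renamed.txt                                    # cannot be reversed
mkdir empty_dir
rmdir empty_dir                                   # works because the directory is empty
```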
**SHORTCUTS**
. (current location)
..(one directory above) it's two dots, not three
~ (home, this will take you to whichever folder you started in when you connected to bi278)
*(a wildcard that will match any string of characters: A-z, 0-9, etc.)
press up arrow key to find most recent command. Could also type history to see every recent command used
if typing a command or filename, press the tab key, and Unix will complete the name for you:
eg. if I type: ls /courses/bi278/Course_Materials/lab_01b/GCF and press tab, I have to press tab again to show all the matching files, because there are many files that start with GCF
**2. BEGIN TO WORK WITH GENOME FILES**
The genomes are stored as text files. I can find the ones we will mainly be working with, along with a smaller FASTA (standard sequence file format) file (test.fa), in:
/courses/bi278/Course_Materials/lab_01b/
^To go here, I started at home (~), then I cd to /courses/bi278/Course_Materials, then I ls lab_01b. ^^There is a quicker way, but I did it this way to gain fluency
# 2.1 GET BASIC INFORMATION ABOUT GENOMES
GENOME FILES ARE USUALLY IN FASTA FORM (this is a text file that uses a specific format to describe DNA (or protein) sequences). Individual chromosomes/contigs or genes are labeled in a FASTA file by this symbol: >
2. Complete Table
P. bonniea strain: bbqs395
P. bonniea strain: bbqs433
P. fungorum strain: BAA-463
using grep ">" filename helped me obtain the info within the file
to avoid retyping the PATH each time, I did cd PATH first (from my home directory)
ADDITIONAL UNIX COMMANDS (and quick description):
**grep pattern filename** (find a specific pattern within a file; if the pattern is complicated use quotes **(‘,”)** to contain the pattern; grep has lots of options so check out its manual, in particular the option **-v**)
**wc filename** (count the words in a file; can be used to count lines **(-l)** or characters **(-c)** in a file)
**tr** -translate or delete sets of characters
**grep -v ">" PATH/test.fa** gives me all the sequence lines in the file (everything except the header lines)
**BTW**
In Unix, **>** sends the results of the command before it to a file that you specify after it. It will write over that file if it already exists
eg: **some command > filename**
so use quotes if you want to use > as a pattern, eg:
**grep ">" filename**
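A sketch showing both meanings of > side by side: unquoted it redirects (and overwrites), quoted it is just a pattern for grep. The files out.txt and tiny.fa are made up for practice.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

echo "first" > out.txt               # > creates out.txt and writes into it
echo "second" > out.txt              # > again replaces the old contents
contents=$(cat out.txt)
echo "$contents"                     # only the second line is left

printf '>seq1\nACGT\n' > tiny.fa     # made-up FASTA file
headers=$(grep ">" tiny.fa)          # quoted, so > is a pattern, not a redirect
echo "$headers"
```

Forgetting the quotes in the grep line would make the shell treat > as a redirect and overwrite the file named after it.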
##### This command, run on a given genome file, lets me see what kinds of sequences are included in that file: grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_0096...
eg. one of the GCF files looks like:
>NZ_CP008760.1 Paraburkholderia xenovorans LB400 chromosome 1, complete sequence
>NZ_CP008762.1 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NZ_CP008761.1 Paraburkholderia xenovorans LB400 chromosome 3, complete sequence
^this tells me that this sequence file contains three chromosomes (one per >), since this is a genome file. The DNA sequence for each chromosome is preceded by its header (the line that starts with >). These genome files were downloaded from NCBI, and the header also tells you which organism and strain the sequence belongs to.
Some files have the organism and strain identity in the file name but not in the header.
EG. a P. bonniea file looks like this:
>1 length=3124304 depth=1.00x
>2 length=884981 depth=0.94x circular=true
Tip#2: Autocomplete is a great time-saving feature in Unix. If you start typing a command or filename and click on the [tab] key, your Unix shell will complete the name for you.
**grep -v ">" PATH/test.fa | tr -d -c GCgc**
^prints only the C and G nucleotides in the sequence (everything else is deleted); add | wc -c to the end to get the total count
The -v prints all the lines that do not match the pattern grep is set to look for (grep is a pattern-seeking command)
so grep -v ">" prints the lines without > (the sequence), while grep ">" with no -v prints the lines that have > (the headers)
**contigs count**: grep ">" PATH/file | wc -l (l stands for lines)
^ if | wc -c, that would count characters instead
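To check the contig count logic on something small, here is a sketch with a made-up two-contig FASTA file (toy.fa is not one of the course files).

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# Made-up FASTA file with two contigs, standing in for a real genome
printf '>1 length=8\nACGTACGT\n>2 length=4\nGGCC\n' > toy.fa

grep ">" toy.fa                                   # header lines only
grep -v ">" toy.fa                                # sequence lines only
contigs=$(grep ">" toy.fa | wc -l | tr -d ' ')    # one header line per contig
echo "$contigs"
```

The extra tr -d ' ' only strips the padding some versions of wc put in front of the number.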
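**awk 'BEGIN {print (253/400)}'**
for GC % (GC count divided by total nucleotide count; this prints the fraction 0.6325, so multiply by 100 for a percentage)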
**To get genome size**: grep -v ">" /Path/file | tr -d -c ATGCatgc | wc -c
^keeping ATGCatgc gives the total count of all the nucleotides (note that the -v belongs to grep, to skip the header lines; tr has no -v)
eg. grep -v ">" ./GCF_020419785.1_ASM2041978v1_genomic.fna | tr -d -c ATGCatgc | wc -c
**grep -v ">" PATH/test.fa | tr -d -c GCgc | wc -c**
^ gives me the character count of the total C and G nucleotides
then
awk 'BEGIN {print (253/400)}' will give me the GC content as a fraction (0.6325, which is 63.25%)
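Putting the pieces together, a sketch of the whole GC calculation on a made-up FASTA file (toy.fa). Passing the counts into awk with -v is my own addition so I don't have to retype the numbers; the lab only used the literal numbers.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
printf '>1 length=8\nACGTACGT\n>2 length=4\nGGCC\n' > toy.fa   # made-up FASTA file

# Genome size: drop the headers, keep only nucleotides, count characters
size=$(grep -v ">" toy.fa | tr -d -c ATGCatgc | wc -c | tr -d ' ')
# GC count: same idea, but keep only G and C
gc=$(grep -v ">" toy.fa | tr -d -c GCgc | wc -c | tr -d ' ')
# Hand both numbers to awk; multiplying by 100 turns the fraction into a percentage
gc_percent=$(awk -v gc="$gc" -v size="$size" 'BEGIN {print (gc/size)*100}')

echo "size=$size gc=$gc gc_percent=$gc_percent"
```

For a real genome, only the file name changes; the three steps stay the same.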