**LAB/WEEK ONE NOTES**

# 1.1 CONNECTING TO UNIX ENVIRONMENT

Connect to the Unix environment with ssh (secure shell). The username is my alphanumeric Colby ID (cibene26) followed by @bi278, so the command should look like this:

    ssh cibene26@bi278

Then enter the password.

# **1.2 FIND YOUR COLBY HOME AND COURSES DIRECTORIES**

-After logging in, I end up in my home directory (folder). When done with work, use the command: exit

~ = home directory

*To get the full path (the address within the file system of where you are), use the command that stands for "print working directory": pwd

Typing pwd when at ~ should give me /home2/cibene26

ls = list all the files and folders (directories) in the current working directory

*The current directory is usually denoted as . (the dot means "here"). E.g., if I were in my home directory, the following commands would all give me the same thing (a listing of everything in home). The second one uses the full path that pwd printed:

    ls ~
    ls /home2/cibene26
    ls .

To get more descriptive information about what is in the current working directory, use the flags -lh to ask for a more detailed list (l) of the directory contents, including file permissions, file size, and date last modified, in human-readable (h) form:

    ls -lh

ls colbyhome lists everything in the colbyhome directory, which is a symbolic link to my personal directory on Colby's fileserver, called Filer.

^I can access my files as long as I put them in this folder (directory).

^To attach (mount) this folder directly to your computer, go to the Mac desktop and type [command-K]. Then enter the following address and use your Mac's Finder to navigate to the same folder: smb://filer.colby.edu

^***This is for when you need to move files from your desktop to the server and vice versa.***

The other fileserver that Colby maintains is Courses. All material from our course will be available to me via the Courses fileserver and our specific course directory in it.
***I can copy files out of the course directory, but I can't edit or write to them.***

To make sure I have access to the Courses fileserver:

    ls /courses/bi278

Given files to work with = /courses/bi278/Course_Materials/

Personal directory to work in = /courses/bi278/cibene26 (gives me private when I type the command)

# **1.3 NAVIGATING IN AND OUT OF DIRECTORIES**

I can go to any location on the fileserver by specifying its path.

cd = change directory (change where I am). For example, colbyhome stands for /personal/cibene26, so I could change directories into it using: cd /personal/colbyid

cd .. = go to the parent directory (use pwd to check the address of where you are)

cd . = current directory (stays where you are)

The sharp (#) is used for comments in code: everything after a sharp is ignored. You need it to add a comment on the same line as a command, e.g.:

    ls # this lists everything in your current working directory

# **1.4 ORGANIZE SOME FILES**

/courses/bi278/Course_Materials/lab_01a

^directory that contains a bunch of files from Prof. Noh's previous projects

-figure files --> (*.eps)

###### *Don't use spaces in directory or file names.

If a name does contain spaces, escape each space with a backslash, e.g.: My Documents --> ls My\ Documents

^Instead, use an underscore, period, or capitalization, such as my_documents, my.documents, or myDocuments.

mkdir = command for making new directories

rmdir = command for removing directories

#### cp (copy); * (wildcard: matches everything, so all the files in a directory)

history = list of every command I have used (or use the up arrow to step back through them)

To autocomplete a command or file name, press Tab.

    mkdir lab_01a lab_01b
    cp /courses/bi278/Course_Materials/lab_01a/* ./lab_01a

^cp takes the path of the material you are copying first, followed by the destination that you want to copy it to.
In other words, the cp command above takes all the contents of the lab_01a folder in Course_Materials (the files we were given) and copies them into my own lab_01a directory, which I had just created with mkdir (mkdir lab_01a).

#### SOME UNIX COMMANDS

Some Unix commands (and quick descriptions):

**mkdir** name (make a directory named whatever you list after this command)

**rmdir** name (remove a specific directory; only works if it's empty; must use rm -r if the directory is not empty)

**cp a b** (copy file a to b; can be used to change not only the location but also the filename at the same time)

**mv a b** (move a file from a to b; both a and b must be specified; also used for renaming files)

**rm filename** (remove whatever you list after it; cannot be reversed)

**cat filename** (concatenate, or print to screen, the entire contents of a file, as long as it is a text file)

**less filename** (display the contents of a file one screen length at a time; use arrows to scroll up or down one line at a time; exit by typing q)

**head filename** (print to screen the top 10 lines of a file)

**tail filename** (print to screen the bottom 10 lines of a file)

**man command** (manual for most Unix commands)

**SHORTCUTS**

    .    current location
    ..   one directory above (it's two dots, not three)
    ~    home (this takes you to whichever folder you started in when you connected to bi278)
    *    a wildcard that will match any string of characters (A-z; 0-9; etc.)

Press the up arrow key to find the most recent command. I could also type history to see every recent command used.

When typing a command or filename, press the Tab key and Unix will complete the name for you. E.g., if I type:

    ls /courses/bi278/Course_Materials/lab_01b/GCF

and press Tab, I have to press Tab again to show me all the matching files, because there are many files that start with GCF.

**2. BEGIN TO WORK WITH GENOME FILES**

The genomes are stored as text files. The ones we will mainly be working with, plus a smaller FASTA (standard sequence file format) file (test.fa), are in:

    /courses/bi278/Course_Materials/lab_01b/

^To go here, I started at home (~), then I cd to /courses/bi278/Course_Materials, then I ls lab_01b. ^^There is a quicker way, but I did it this way to gain fluency.

# 2.1 GET BASIC INFORMATION ABOUT GENOMES

Genome files are usually in FASTA format. This is a text file that uses a specific format to describe DNA (or protein) sequences. Individual chromosomes/contigs or genes are labeled in a FASTA file by this symbol: >

2. Complete Table

P. bonniea strain: bbqs395

P. bonniea strain: bbqs433

P. fungorum strain: BAA-463

Using grep ">" filename helped me get this information from each file. To avoid retyping the PATH each time, I first did cd PATH (when I was in my home directory).

ADDITIONAL UNIX COMMANDS (and quick descriptions):

**grep pattern filename** (find a specific pattern within a file; if the pattern is complicated, use quotes **(', ")** to contain the pattern; grep has lots of options, so check out its manual, in particular the option **-v**)

**wc filename** (count the words in a file; can be used to count lines **(-l)** or characters **(-c)** in a file)

**tr** (translate or delete sets of characters)

**grep -v ">" PATH/test.fa** prints every line that does not contain >, so it gives me all of the sequence in the file without the headers.

**BTW** In Unix, **>** sends the results of the command before it to a file that you specify after it. It will overwrite that file if it already exists, e.g.: **some command > filename**. So use quotes if you want to use > as a pattern, e.g.: **grep ">" filename**

##### This command, for a given genome, will let me see what kinds of sequences are included in each file:

    grep ">" /courses/bi278/Course_Materials/lab_01b/GCF_0096...

e.g.
one of the GCF files looks like:

    >NZ_CP008760.1 Paraburkholderia xenovorans LB400 chromosome 1, complete sequence
    >NZ_CP008762.1 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
    >NZ_CP008761.1 Paraburkholderia xenovorans LB400 chromosome 3, complete sequence

> ^This tells me that this sequence file contains three chromosomes, one per >, since this is a genome file. The DNA sequence for each chromosome is preceded by its header (the line that starts with >). These genome files were downloaded from NCBI, and the header also tells you which organism and strain the sequence belongs to.

> Some files have the organism and strain identity in the file name but not in the header. E.g., a P. bonniea file looks like this:

    >1 length=3124304 depth=1.00x
    >2 length=884981 depth=0.94x circular=true

Tip#2: Autocomplete is a great time-saving feature in Unix. If you start typing a command or filename and press the [tab] key, your Unix shell will complete the name for you.

The -v flag prints all the lines that do not match the pattern you give grep (grep is a pattern-seeking command). So grep -v ">" prints the lines that do not have >, and grep ">" (without -v) prints the lines that do have >.

**Contig count**:

    grep ">" PATH/file | wc -l

(l stands for lines) ^With | wc -c instead, it would count characters.

**To get genome size**:

    grep -v ">" /Path/file | tr -d -c ATGCatgc | wc -c

^tr -d -c ATGCatgc deletes every character that is not one of ATGCatgc (including the line breaks), so wc -c gives the total count of all the nucleotides.

^(Note that the contig count command has no -v, but this one does.)

e.g.

    grep -v ">" ./GCF_020419785.1_ASM2041978v1_genomic.fna | tr -d -c ATGCatgc | wc -c

**GC content**:

    grep -v ">" PATH/test.fa | tr -d -c GCgc

^keeps only the C and G nucleotides in the sequence. Adding | wc -c gives me the character count, which is the total number of C and G nucleotides:

    grep -v ">" PATH/test.fa | tr -d -c GCgc | wc -c

Then

    awk 'BEGIN {print (253/400)}'

divides the GC count by the total nucleotide count (here 253/400 = 0.6325) to give me the GC fraction; multiply by 100 for the percentage (63.25%).
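
To practice the whole sequence of steps in one go, here is a minimal sketch that runs the contig count, genome size, GC count, and GC percentage on a tiny made-up FASTA file. The file name demo.fa, its two short contigs, and the variable names are all assumptions for illustration; they are not course files. It also passes the two counts into awk with -v instead of typing the numbers by hand, which is just one possible way to do it.

```shell
#!/usr/bin/env bash
# Sketch only: demo.fa and its sequences are hypothetical, not course data.
set -euo pipefail

workdir=$(mktemp -d)
fasta="$workdir/demo.fa"

# Write a tiny two-contig FASTA file (made-up sequences; N is a non-ATGC character).
printf '>contig1\nATGCGC\nGGAT\n>contig2\natgcNN\n' > "$fasta"

# Contig count: count the header lines (the ones containing >).
contigs=$(grep ">" "$fasta" | wc -l | tr -d ' ')

# Genome size: drop headers, delete everything except A/T/G/C, count characters.
genome_size=$(grep -v ">" "$fasta" | tr -d -c 'ATGCatgc' | wc -c | tr -d ' ')

# GC count: same pipeline, but keep only G and C.
gc_count=$(grep -v ">" "$fasta" | tr -d -c 'GCgc' | wc -c | tr -d ' ')

# GC percentage: awk does the floating-point division.
gc_percent=$(awk -v gc="$gc_count" -v total="$genome_size" \
  'BEGIN { printf "%.2f", 100 * gc / total }')

echo "contigs=$contigs genome_size=$genome_size gc_count=$gc_count gc_percent=$gc_percent"
# prints: contigs=2 genome_size=14 gc_count=8 gc_percent=57.14

rm -r "$workdir"
```

The extra tr -d ' ' after wc only strips the padding spaces that some versions of wc print, so the numbers can be reused cleanly. To try this on a real genome, the path in fasta would be swapped for one of the files in lab_01b.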