
# Bi278 Lab Num 1
### By Lee Ferenc 9/19/2022

## Exercise 1. Organization

### Connecting to the Unix Environment
##### In the upper right, search "terminal" and input the following (the Colby ID changes depending on the person)
> ssh enfere24@bi278
> exit

#### After logging in, you are in your working directory. Here are basic commands
##### (Outside of this notebook I have a Google Doc with examples and links that I'll share with whoever asks, but I didn't put it here because I wasn't sure about the sharing capabilities of HackMD).
>pwd #print working directory
>cd #change directory
>cd .. #move up one directory
>ls #list all files in a directory (add -lh for a more detailed description)
>mkdir (name) #make directory
>rmdir (name) #remove an empty directory
>cp a b #copy a to b

#### Lab files are kept under the following directory, separated into one folder per lab
>personal/enfere24/Genomics

#### Example: making a folder and copying the lab_01a materials into it
>mkdir lab_01a
>cp /courses/bi278/Course_Materials/lab_01a/* personal/enfere24/Genomics/lab_01a

## Exercise 2. Collect basic genome statistics for multiple genomes

### 1. An Example
#### Getting basic information about a genome
>grep ">" /courses/bi278/Course_Materials/lab_01b/"filename"

#### Example:
>grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs530.nanopore.fasta

#### This outputs:
>1 length=3288159 depth=1.00x circular=true
>2 length=830527 depth=0.97x circular=true

#### Instead of running the command on each file individually (and after failing to write a loop), I asked Professor Noh, who showed me the wildcard (*), which matches every file in the directory
> grep ">" /courses/bi278/Course_Materials/lab_01b/*

### 2 Table:
![](https://i.imgur.com/dshnrYx.png)

#### To fill out the table:
* Col 2: I used the file names, and grep ">" for files that were not labeled
* Col 3: grep ">" /courses/bi278/Course_Materials/lab_01b/* and counted the number of lines
* Col 4: created a nano script that had "wc -m $1"
* Col 5: created a nano script that had "grep -v ">" $1 | tr -d -c GCgc | wc -c" and then divided the result by col 4.
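The counts behind columns 3 to 5 can also be gathered with a loop instead of one command per file. The following is only a sketch under some assumptions: `fasta_counts` is a hypothetical helper name, header lines are assumed to start with ">", and the base count here strips headers and line endings, so it will come out smaller than what "wc -m $1" reports.

```shell
#!/bin/bash
# fasta_counts (hypothetical helper): print file name, number of sequences,
# number of bases, and number of G/C bases, separated by tabs.
fasta_counts() {
    f="$1"
    # Header lines start with ">", so counting them counts the sequences.
    nseq=$(grep -c "^>" "$f")
    # Drop headers and line endings so only sequence characters are counted.
    bases=$(grep -v "^>" "$f" | tr -d '\n\r' | wc -c | tr -d ' ')
    # tr -d -c deletes everything that is NOT G or C (either case).
    gc=$(grep -v "^>" "$f" | tr -d -c GCgc | wc -c | tr -d ' ')
    printf '%s\t%s\t%s\t%s\n' "$(basename "$f")" "$nseq" "$bases" "$gc"
}

# Directory used in this lab; another one can be given as the first argument.
dir="${1:-/courses/bi278/Course_Materials/lab_01b}"

# The wildcard expands to every file in the directory, one per loop pass.
for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip when nothing matched or it is not a file
    fasta_counts "$f"
done
```

Saved under a name of your choice (for example `all_stats.sh`, also hypothetical), it would be run as `sh all_stats.sh` and print one tab-separated row per file, which can be pasted into the table.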
###### I used awk 'BEGIN{print(GC/total)}' at first, but then found it easier to put the numbers in a table and divide across the rows. I would prefer to find a way to calculate this automatically in a nano script and spent a decent portion of time on it, but wasn't successful because nano/Unix felt very abstract compared to other languages I've used

### 3 Test.fa
#### A is 400. To get the length of the genome I used wc (word count), as wc + pathway + /test.fa, but on test.fa it returned 411 instead of 400 (I did cat + file and then put the output into an online character counter to get 400). That's because wc also counts the newline character at the end of every line, plus the first (header) line. I tried for a decent amount of time to get wc to return 400, to no avail.

#### B. 0.6325. I used the nano script from the table and then divided by A. The script for the number of GC is below, but parts of it are not intuitive, because some options are not needed or mean different things in each command (in tr, -c means "everything except these characters"; in wc, -c is a byte count rather than a character count)
>grep -v ">" PATH/test.fa | tr -d -c GCgc | wc -c

### Nano
#### Opening:
>nano

#### Exit: Press Control + X, press Y if you wish to save, then type the file name and press Enter:
>"your_script".sh

###### Note: I needed to use the right control key

#### To code: On the first line input the following:
>#!/bin/bash

#### You may then add commands; wherever a file name is needed, insert $1 (the first argument given when the script is run)

#### To edit script:
>nano "your_script".sh

##### To run:
>sh "your_script".sh filename.fasta
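Putting the pieces of this lab together, a script written in nano could compute both the length and the GC fraction in one go. This is a minimal sketch, not the course's method: the file name `gc_content.sh` and the function names are made up for illustration, and header lines are assumed to start with ">". The likely reason awk 'BEGIN{print(GC/total)}' did not work on its own is that awk cannot see shell values unless they are passed in, which `-v` does here. Stripping the header and the line endings before counting is also what should avoid the 411 versus 400 mismatch.

```shell
#!/bin/bash
# gc_content.sh (hypothetical name)
# Usage: sh gc_content.sh filename.fasta

# Count sequence characters only: drop header lines, then drop line endings.
count_bases() {
    grep -v "^>" "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Count only G and C (either case): tr -d -c deletes everything NOT in the set.
count_gc() {
    grep -v "^>" "$1" | tr -d -c GCgc | wc -c | tr -d ' '
}

# Divide inside awk; -v copies the two shell values into awk variables.
gc_fraction() {
    awk -v gc="$1" -v total="$2" \
        'BEGIN { if (total > 0) printf "%.4f\n", gc / total; else print "NA" }'
}

# Run only when a file name was given as the first argument ($1).
if [ -n "${1:-}" ]; then
    total=$(count_bases "$1")
    gc=$(count_gc "$1")
    echo "length: $total"
    echo "GC fraction: $(gc_fraction "$gc" "$total")"
fi
```

As a check on the arithmetic, `gc_fraction 253 400` prints 0.6325, where 253 is the G/C count implied by answers A and B above rather than a number taken from the file itself.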