# Bi278 Lab Num 1
### By Lee Ferenc 9/19/2022
## Exercise 1. Organization
### Connecting to the Unix Environment
##### In the upper right, search for "terminal" and enter the following (replace the Colby ID with your own)
> ssh enfere24@bi278
> exit
#### This is your working directory. Here are basic commands
##### (Outside of this notebook I have a google doc with examples and links that I'll share with whoever asks but I didn't put it here because I wasn't sure on the sharing capabilities of HackMD).
>pwd #print working directory
>cd #change directory
>cd .. #move back
>ls #list all files in a directory (add -lh for a more detailed description)
>mkdir (name) #make directory
>rmdir (name) #remove directory
>cp a b #copy
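As a quick practice run, the commands above can be tried together in a throwaway directory. This is only a sketch: the scratch directory comes from `mktemp`, the names `lab_demo`, `a.txt`, and `b.txt` are made up, and `rm` (not in the list above) is used to delete the two practice files.

```shell
# Practice run in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"
pwd                      # print the scratch directory's path
mkdir lab_demo           # make a directory
cd lab_demo              # move into it
echo "hello" > a.txt     # create a small file to copy
cp a.txt b.txt           # copy a.txt to b.txt
ls -lh                   # list both files with sizes
cd ..                    # move back up one level
rm lab_demo/a.txt lab_demo/b.txt   # delete the practice files
rmdir lab_demo           # rmdir only removes empty directories
```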
#### My lab work is stored under the following path, separated into one folder per lab
>personal/enfere24/Genomics
#### Example: making a folder and copying the lab_01a materials into it
>mkdir lab_01a
>cp /courses/bi278/Course_Materials/lab_01a/* personal/enfere24/Genomics/lab_01a
## Exercise 2. Collect basic genome statistics for multiple genomes
### 1. An Example
#### Getting basic information about a genome
>grep ">" /courses/bi278/Course_Materials/lab_01b/"filename"
#### Example:
>grep ">" /courses/bi278/Course_Materials/lab_01b/P.hayleyella_bhqs530.nanopore.fasta
#### This outputs:
1 length=3288159 depth=1.00x circular=true
2 length=830527 depth=0.97x circular=true
#### Instead of applying the command to each file individually (and after failing to write a loop), I asked Professor Noh, who showed me the wildcard (*), which runs the command on every file in the folder
> grep ">" /courses/bi278/Course_Materials/lab_01b/*
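For comparison, here is a minimal sketch of the kind of loop that could do the same job as the wildcard. The function name `list_headers` is made up for illustration; it prints each file name followed by that file's header lines.

```shell
#!/bin/bash
# list_headers DIR: print the FASTA header lines of every file in DIR.
list_headers() {
    local dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip anything that is not a regular file
        echo "== $f"              # show which file the headers belong to
        grep ">" "$f"             # header lines start with ">"
    done
}
```

It would be called as `list_headers /courses/bi278/Course_Materials/lab_01b`. Swapping `grep ">"` for `grep -c ">"` prints the number of header lines per file instead of the lines themselves.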
### 2. Table:

#### To fill out the table:
* Col 2: I used the file names and grep ">" for non-labeled
* Col 3: grep ">" /courses/bi278/Course_Materials/lab_01b/* and counted the number of lines
* Col 4: created a nano script that had "wc -m $1"
* Col 5: created a nano script that had "grep -v ">" $1 | tr -d -c GCgc | wc -c" and then divided the result by col 4.
###### I used awk 'BEGIN{print(GC/total)}' at first, but then found it easier to put the numbers in a table and divide across the rows. I would prefer to calculate this automatically in a nano script and spent a decent portion of time trying, but wasn't successful because nano/Unix feels very abstract compared to other languages I've used
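One possible way to automate the division is to capture both counts in shell variables and hand them to awk. This is a sketch under assumptions rather than the method used for the table: the function name `gc_fraction` is made up, and the total here counts sequence characters only (header lines and newlines removed), so it will be smaller than what a plain `wc -m` on the file reports.

```shell
#!/bin/bash
# gc_fraction FILE
# Prints three tab-separated values: GC count, total bases, GC fraction.
gc_fraction() {
    local file="$1" gc total
    # G and C count: drop header lines, delete every character except G/C/g/c.
    gc=$(grep -v ">" "$file" | tr -d -c 'GCgc' | wc -c)
    # Total bases: drop header lines, delete newlines, count what is left.
    total=$(grep -v ">" "$file" | tr -d '\n' | wc -c)
    # awk does the floating-point division that plain shell arithmetic cannot.
    awk -v gc="$gc" -v total="$total" \
        'BEGIN { if (total > 0) printf "%d\t%d\t%.4f\n", gc, total, gc / total }'
}
```

Saved in a script, the function could be followed by a line such as `gc_fraction "$1"` so that it runs on whatever file name is given on the command line.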
### 3. Test.fa
#### A is 400. To get the length of the genome I used wc (word count) followed by the path to test.fa, but it returned 411 instead of 400 (I ran cat on the file and put the output into an online character counter). The extra characters come from the newline character at the end of every line plus the header line. I tried for a decent amount of time to get wc to return 400 directly, to no avail.
#### B. 0.6325 I used the nano script from the table and then divided by A. The script for the number of GC is below. The flags are confusing because -c means something different in each command: in tr, -d deletes characters and -c takes the complement of the set, so tr -d -c GCgc deletes everything except G and C; in wc, -c counts bytes rather than characters
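The gap between the two counts can be reproduced on a tiny made-up file (not test.fa). The sketch below counts the file once as-is and once with the header line and the newlines stripped out.

```shell
# Build a small demo FASTA file: one header line and two sequence lines.
demo=$(mktemp)
printf '>demo\nACGT\nGGCC\n' > "$demo"

# Raw count: header characters, bases, and one newline per line.
raw=$(wc -m < "$demo")
# Sequence only: drop the header line, delete newlines, then count.
seq_only=$(grep -v ">" "$demo" | tr -d '\n' | wc -m)

echo "raw=$((raw)) sequence_only=$((seq_only))"
```

Here the raw count is 16 (6 for the header line including its newline, plus 5 for each sequence line) while the sequence-only count is 8, the same kind of mismatch as 411 versus 400.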
>grep -v ">" PATH/test.fa | tr -d -c GCgc | wc -c
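To see what each piece of that pipeline does, it can be run on a short made-up string instead of a file. In this sketch `tr -d -c 'GCgc'` deletes every character that is not G, C, g, or c (including the newline), and `wc -c` counts the bytes that remain.

```shell
# Input has 9 letters plus a newline; only C, G, c, and g survive tr.
count=$(printf 'ACGTNacgt\n' | tr -d -c 'GCgc' | wc -c)
echo "GC count: $((count))"   # prints "GC count: 4"
```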
### Nano
#### Opening:
>nano
#### Exit: Press Control + X, press Y if you wish to save, then type the file name and press Enter:
>sh "your_script".sh
###### Note: I needed to use the right control key
#### To code: On the first line input the following:
>#!/bin/bash
#### You may then add commands; wherever a file name is needed, insert $1 (the first argument given when the script is run)
#### To edit script:
>nano "your_script".sh
#### To run:
>sh "your_script".sh filename.fasta
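Putting these pieces together, a complete script could look like the sketch below. The file name `count_gc.sh` is hypothetical. The block writes the script with a heredoc only so that it can be pasted in one piece; typing the same three lines into nano gives the same file.

```shell
# Write a small script to count_gc.sh (a hypothetical name).
cat > count_gc.sh <<'EOF'
#!/bin/bash
# $1 is the first argument given on the command line: the FASTA file to read.
grep -v ">" "$1" | tr -d -c GCgc | wc -c
EOF

# Run it on a file, for example:
# sh count_gc.sh filename.fasta
```

Quoting `"$1"` inside the script keeps it working when the file path contains spaces.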