# Notebook 5 Genomics Lab
`export
BLASTDB="$BLASTDB:/courses/bi278/Course_Materials/blastdb"`
set value for BLASTDB (tell where the databse is)
`echo $BLASTDB` gets the value.
### 1.1 Run command line BLASTP
`awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">"$0}'
ORS="" PROKKA_inputfile`
What the command above does is:
1.Read the file you designate at the end of the command by whatever you tell it to use as a record separator (RS=">")
2.Match a particular pattern (/pattern/) you give it in each record, such as the word ‘hypothetical’
3.When a record contains that pattern, it then prints the symbol ‘>’ followed by the
record ($0)
4.When it prints, it doesn’t use an additional output record separator (ORS="")
Simply, it gets all records with the /pattern/ separated by ">".
`awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">"$0}'
ORS="" PROKKA_inputfile > output_file_name`
Ouput the records to a new file.
`awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}'
ORS="" hypothetical_inputfile`
Gets a random part of the input file.
`blastp -task blastp-fast -num_threads 2 -db database_name -
evalue 1e-6 -query faa_file`
Note: all the dashes (-) belong right next to each option name and the option you choose should be separated from the option name with a space. If you are happy with the results, you should send the results of these searches to a file with the > operator so you can work with them.
The evalue threshold (-evalue 1e-6) determines the stringency of your search. If you don't get any matches, try removing this option. But when you look at your results, note that good e-values for BLAST searches are considered to be those that are 1e-6 or smaller (including 0).
Add`-outfmt "6 std ppos qcovs stitle sscinames staxid" -
max_target_seqs 10` and `-outfmt 3 -num_descriptions 10 -num_alignments 10`to the command above like this
```
blastp -task blastp-fast -num_threads 2 -db
refseq_select_prot -evalue 1e-6 -outfmt "6 std ppos qcovs
stitle sscinames staxid" -max_target_seqs 10 -query
faa_file > subset_blastp_fmt6.txt
blastp -task blastp-fast -num_threads 2 -db
refseq_select_prot -evalue 1e-6 -outfmt 3 -num_descriptions
10 -num_alignments 10 -query faa_file >
subset_blastp_fmt3.txt
```
`awk '$1=="BO395_02501" {print $2}' subset_blastp_fmt6.txt | sort | uniq > subset_blastp_BO395_02501.sseqids.txt `
This line of code take the formate 6 file and get all second column data of specific query of BO395_02501.
What is in the format 6 file is as following.
**"qseqid sseqid pident length mismatch gapopen
qstart qend sstart send evalue bitscore"**
`blastdbcmd -db refseq_select_prot -entry_batch
example_blastp_BO395_01177.sseqids.txt`
This code above gets all entries with subject ids found from the last code.
### 1.2 BLASTN
`blastn -task megablast -num_threads 2 -query ffn_file -db
nt -evalue 1e-6`
get the nucleotide BLAST search.
Then retrieve the fasta sequence from the nucleotide sequence file for query sequence get from 1.1 with code below.
`samtools faidx PROKKA_09162022.ffn BO395_02501 BO395_00780 BO398_00049`
### 1.3 PARSE OUTPUT OF BLAST RESULT
`awk -F "\t" '{print $1,$2,$15,$16,$13,$14,$11}' OFS="\t"
example_blastp_outfmt6.txt`
Retrieves the entries indicated from entries below.
`qseqid sseqid pident length mismatch gapopen
qstart qend sstart send evalue bitscore ppos
qcovs stitle sscinames staxid`
#### Q2
| Parameters | Deafult value BLASTN | Deafult value BLASTP |
|:---------------- | -------------------- |:-------------------- |
| Matrix | BLOSUM 62 | BLOSUM62 |
| evalue | 10.0 | 10.0 |
| max_target_seqs | 500 | 500 |
| num_descriptions | 500 | 500 |
| num_alignments | 250 | 250 |
For matrix, I would change the value when I don't want to use the BLOSUM62 matrix.
For evalue, I would change the value if I don't want to make the expect value for saving hits to be 10.
For max_target_seqs, I would change the value when I want to keep less or more aligned sequences than 500.
For num_descriptions, I would change it if I want descriptions to appear more frequently than one description per 500 database sequences.
For num_aligments, I would change it if I want to see a different number of alignments from 250.