# String Alignment
- Data file: random_lines.txt (emailed on 9/1/20)
- Download the file and place it in a new working directory
- We will be working in Julia (CRC or local machine, wherever you've been working)

In Julia:
```
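# Install the packages. In the REPL this is the same as pressing ] and typing
# `add StringDistances`, `add CSV`, then backspace or ctrl+c to leave package mode.
using Pkg
Pkg.add(["StringDistances", "CSV", "DataFrames"])
using StringDistances
using CSV
using DataFrames
# Read the file with no header row (recent CSV.jl versions need the DataFrame sink argument)
df = CSV.read("random_lines.txt", DataFrame; header=false)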
# Compare two strings that are spelled slightly differently
evaluate(Levenshtein(), "martha", "mathra")
# The output is the number of single-character edits (substitutions, insertions, deletions) between the strings
# You can also try this with DNA characters in quotation marks
# Try adding insertions/deletions and watch the distance change
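# Convert the data frame to a matrix of strings so it can be searched
df1 = Matrix(df)
# Find the entry closest to the query sequence
# (newer StringDistances releases call this function findnearest)
findmax("ACACCAATAGCAGTCCAGATGACCAAATTGGCTACTACCGAAGAGCTACCAGACGAATCC", df1, Levenshtein(), min_score = 0.0)
# Useful when looking for a sequence match in a database (e.g., you have an unknown bacterial sequence and want to find its match in a database of bacterial sequences)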
# Other metrics are available - try Jaccard; compare returns a similarity score between 0 and 1
compare("martha", "marthe", Jaccard(2))
# The number inside the parentheses is the length of the substrings (q-grams) used for the comparison
```
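To see what these functions are measuring, here is a minimal sketch in plain Julia (no packages needed). The names `levenshtein`, `qgrams`, `jaccard_similarity`, and `nearest` are made up for illustration, and the toy sequences are hypothetical; this is not how StringDistances implements them, just the same ideas written out.

```julia
# Levenshtein distance by dynamic programming, keeping only two rows of the table.
function levenshtein(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))      # cost of turning "" into each prefix of b
    curr = similar(prev)
    for i in 1:length(s)
        curr[1] = i                  # cost of turning a prefix of a into ""
        for j in 1:length(t)
            sub = prev[j] + (s[i] == t[j] ? 0 : 1)            # match or substitution
            curr[j + 1] = min(sub, prev[j + 1] + 1, curr[j] + 1)  # vs deletion, insertion
        end
        prev, curr = curr, prev      # the row just filled becomes the previous row
    end
    return prev[end]
end

# Set of all substrings of length q (q-grams) in a string.
function qgrams(s::AbstractString, q::Int)
    chars = collect(s)
    return Set(String(chars[i:i + q - 1]) for i in 1:length(chars) - q + 1)
end

# Jaccard similarity: shared q-grams divided by all distinct q-grams.
function jaccard_similarity(a::AbstractString, b::AbstractString, q::Int)
    A, B = qgrams(a, q), qgrams(b, q)
    isempty(A) && isempty(B) && return 1.0
    return length(intersect(A, B)) / length(union(A, B))
end

# Return the candidate with the smallest Levenshtein distance to the query.
function nearest(query::AbstractString, candidates)
    dists = [levenshtein(query, c) for c in candidates]
    d, i = findmin(dists)
    return candidates[i], d
end

levenshtein("martha", "mathra")                  # 2
jaccard_similarity("martha", "marthe", 2)        # 4 shared 2-grams out of 6 distinct
nearest("GATTACA", ["GATTTCA", "CCCC", "GATT"])  # toy database of three sequences
```

The `nearest` function is the brute-force version of a database lookup: it compares the query against every candidate, which is fine for a small file but is the reason tools like BLAST use faster heuristics.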
## Alignments
```
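# Install the package (same as pressing ] and typing `add BioAlignments` in the REPL,
# then backspace or ctrl+c to leave package mode)
using Pkg
Pkg.add("BioAlignments")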
using BioAlignments
# Set up a cost model - this lets you set the 'costs' of mismatches, insertions, and deletions
costmodel = CostModel(match=0, mismatch=1, insertion=1, deletion=1)
pairalign(EditDistance(), "GATCCTAG", "GATCCTAG", costmodel)
# cost of zero - same string
pairalign(EditDistance(), "GATCCTAG", "GATCCTTG", costmodel)
# change string elements to see cost change, insert, delete, etc.
```
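To make the cost model concrete, here is a rough sketch in plain Julia of a minimum-cost alignment with a traceback. `align_with_costs` is a hypothetical helper, not part of BioAlignments, and the convention used here (deletion = a character of the first string paired with a gap, insertion = a character of the second string paired with a gap) is an assumption for this sketch only.

```julia
# Hypothetical helper: minimum-cost global alignment of two strings with traceback.
# Costs must be integers in this sketch.
function align_with_costs(a::AbstractString, b::AbstractString;
                          match=0, mismatch=1, insertion=1, deletion=1)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    D = zeros(Int, m + 1, n + 1)     # D[i+1, j+1] = cost of aligning s[1:i] with t[1:j]
    for i in 1:m; D[i + 1, 1] = i * deletion; end
    for j in 1:n; D[1, j + 1] = j * insertion; end
    for i in 1:m, j in 1:n
        sub = D[i, j] + (s[i] == t[j] ? match : mismatch)
        D[i + 1, j + 1] = min(sub, D[i, j + 1] + deletion, D[i + 1, j] + insertion)
    end
    # Walk back from the bottom-right corner to recover one optimal alignment.
    top, bottom = Char[], Char[]
    i, j = m, n
    while i > 0 || j > 0
        if i > 0 && j > 0 && D[i + 1, j + 1] == D[i, j] + (s[i] == t[j] ? match : mismatch)
            push!(top, s[i]); push!(bottom, t[j]); i -= 1; j -= 1
        elseif i > 0 && D[i + 1, j + 1] == D[i, j + 1] + deletion
            push!(top, s[i]); push!(bottom, '-'); i -= 1
        else
            push!(top, '-'); push!(bottom, t[j]); j -= 1
        end
    end
    return D[end, end], String(reverse(top)), String(reverse(bottom))
end

cost, x, y = align_with_costs("GATCCTAG", "GATCTAG")
println(cost)   # total cost of the alignment
println(x)      # first string, with '-' wherever a gap was placed
println(y)      # second string, with '-' wherever a gap was placed
```

Try raising `mismatch` to 3 while leaving `insertion` and `deletion` at 1: the cheapest alignment should then prefer an insertion plus a deletion over a single mismatch.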
When you use BLAST, you can see similar comparisons in the results (scroll down in the BLAST results). As a test, I copied some of the characters in random_lines.txt and ran them through BLAST, and they were a perfect match to a result in NCBI.

```
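# Install the package (same as pressing ] and typing `add BioSequences` in the REPL,
# then backspace or ctrl+c to leave package mode)
using Pkg
Pkg.add("BioSequences")
using BioSequences
using BioAlignments   # provides EDNAFULL, AffineGapScoreModel, and pairalign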
s1 = dna"CCTAGGAGGA"
s2 = dna"ACCTGGTAAC"
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)
res = pairalign(GlobalAlignment(), s1, s2, scoremodel)
# Edit the numbers in the score model (gap_open, gap_extend) to weight gaps to your specifications
```
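To get a feel for how the numbers in the score model add up, here is a small sketch in plain Julia that scores a pair of strings that are already aligned. `affine_score` is a hypothetical helper, not a BioAlignments function. It rests on two assumptions: that a match scores +5 and a mismatch -4 (which I believe is what EDNAFULL gives for plain A/C/G/T), and that a gap of length k scores gap_open + k * gap_extend. Check both against the BioAlignments docs before relying on them.

```julia
# Hypothetical helper: score two already-aligned strings, where '-' marks a gap.
# Assumed scoring: match +5, mismatch -4, gap of length k = gap_open + k * gap_extend.
function affine_score(top::AbstractString, bottom::AbstractString;
                      match=5, mismatch=-4, gap_open=-5, gap_extend=-1)
    x, y = collect(top), collect(bottom)
    length(x) == length(y) || throw(ArgumentError("aligned strings must have equal length"))
    score = 0
    prev_gap = :none                 # which row held a gap in the previous column
    for (a, b) in zip(x, y)
        a == '-' && b == '-' && throw(ArgumentError("a column cannot be all gaps"))
        if a == '-' || b == '-'
            row = a == '-' ? :top : :bottom
            # gap_open is charged once per run of gaps; gap_extend for every gap column
            score += (row == prev_gap ? 0 : gap_open) + gap_extend
            prev_gap = row
        else
            score += a == b ? match : mismatch
            prev_gap = :none
        end
    end
    return score
end

affine_score("ACGT", "ACGT")       # four matches
affine_score("AC--GT", "ACTTGT")   # four matches and one gap of length 2
affine_score("A-C-G", "ATCTG")     # three matches and two separate gaps of length 1
```

Under these assumptions, one gap of length 2 is penalized less than two separate gaps of length 1, which is the point of an affine gap model: opening a gap is expensive, extending it is cheap.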

- Ended here 9/1/20
---
## External links
- [Reading for today](https://www.nature.com/scitable/topicpage/basic-local-alignment-search-tool-blast-29096/)
- Other distances available in the StringDistances package: https://github.com/matthieugomez/StringDistances.jl
- More information on alignments in Julia: https://biojulia.net/BioAlignments.jl/v0.3/alignments.html
