# String Alignment

random_lines.txt (emailed on 9/1/20)

Download the file and place it in a new working directory.

We will be working in Julia (CRC or local machine, wherever you've been working).

In Julia:

```
] add StringDistances
add CSV
# press ctrl+c to leave package mode
using StringDistances
using CSV
# header=0: the file has no header row
# (newer CSV.jl releases also need a sink, e.g. CSV.read("random_lines.txt", DataFrame, header=0) after `using DataFrames`)
df = CSV.read("random_lines.txt", header=0)

# compare two strings that are spelled slightly differently
evaluate(Levenshtein(), "martha", "mathra")
# the output is the number of differences (edits) between the strings
# can also try this with DNA characters in quotation marks
# try with insertions/deletions

# newer DataFrames releases use Matrix(df) instead
df1 = convert(Matrix,df)
# find the entry of df1 closest to the query sequence
# (newer StringDistances releases call this function findnearest)
findmax("ACACCAATAGCAGTCCAGATGACCAAATTGGCTACTACCGAAGAGCTACCAGACGAATCC", df1, Levenshtein(), min_score = 0.0)
# consider using this when looking for a sequence match in a database of sequences
# (e.g., you have a random bacteria sequence and want to find its match in a database of bacteria sequences)

# distances can be calculated via other metrics - try Jaccard distance
compare("martha", "marthe", Jaccard(2))
# number inside parentheses - length of substring used for comparison
```

## Alignments

```
] add BioAlignments
# press ctrl+c to leave package mode
using BioAlignments

# set up cost model - lets you set the 'costs' of mismatches, insertions, and deletions
costmodel = CostModel(match=0, mismatch=1, insertion=1, deletion=1)
pairalign(EditDistance(), "GATCCTAG", "GATCCTAG", costmodel)
# cost of zero - same string
pairalign(EditDistance(), "GATCCTAG", "GATCCTTG", costmodel)
# change string elements to see the cost change: substitute, insert, delete, etc.
```

When you use BLAST, you can see similar comparisons in the results (scroll down in the BLAST results).
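To make the cost model above more concrete, here is a minimal sketch of how an edit-distance cost can be computed by hand with dynamic programming. The function `edit_cost` is a hypothetical helper written for these notes, not part of BioAlignments or StringDistances; its keyword names only mirror the `CostModel` call, and the packages may compute the result differently internally.

```julia
# Hypothetical helper: cheapest cost of turning string a into string b,
# using the same four costs as the CostModel example.
function edit_cost(a::AbstractString, b::AbstractString;
                   match=0, mismatch=1, insertion=1, deletion=1)
    x, y = collect(a), collect(b)
    m, n = length(x), length(y)
    # D[i + 1, j + 1] = cost of turning the first i characters of a
    # into the first j characters of b
    D = zeros(Int, m + 1, n + 1)
    for i in 1:m
        D[i + 1, 1] = i * deletion      # delete everything seen so far
    end
    for j in 1:n
        D[1, j + 1] = j * insertion     # insert everything needed so far
    end
    for i in 1:m, j in 1:n
        sub = x[i] == y[j] ? match : mismatch
        D[i + 1, j + 1] = min(D[i, j] + sub,            # match or mismatch
                              D[i, j + 1] + deletion,   # drop x[i]
                              D[i + 1, j] + insertion)  # add y[j]
    end
    return D[m + 1, n + 1]
end

edit_cost("GATCCTAG", "GATCCTAG")  # 0 - same string
edit_cost("GATCCTAG", "GATCCTTG")  # 1 - one mismatch
edit_cost("martha", "mathra")      # 2 - one deletion plus one insertion
```

Raising one of the costs, for example `edit_cost("GATC", "GAC"; deletion=3)`, shows how the weights change which differences are penalized most.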
Here I just copied and blasted some of the characters in random_lines.txt, and it was a perfect match to a result in NCBI.

![](https://i.imgur.com/7Z4v9pL.png)

```
] add BioSequences
# press ctrl+c to leave package mode
using BioSequences
using BioAlignments  # provides pairalign, AffineGapScoreModel, EDNAFULL

s1 = dna"CCTAGGAGGA"
s2 = dna"ACCTGGTAAC"
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)
res = pairalign(GlobalAlignment(), s1, s2, scoremodel)
# can edit the numbers in the score model to weight differences to your specifications
```

![](https://i.imgur.com/9V6NC1H.png)

- Ended here 9/1/20

---

## External links

[reading for today](https://www.nature.com/scitable/topicpage/basic-local-alignment-search-tool-blast-29096/)

Other distances available in the StringDistances package: https://github.com/matthieugomez/StringDistances.jl

More information on alignments in Julia: https://biojulia.net/BioAlignments.jl/v0.3/alignments.html

![](https://i.imgur.com/vepzDK1.png)