In an editor, write a program that calculates the distance between two sequences.
A simple program (without functions or modules) is sufficient.
Compare the distance between the first and second sequences with the distance between the first and third sequences, and print the alignment with the smaller distance. If the distances are equal, print the alignment of the first and second sequences.
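As a rough starting point, the task above could be sketched as follows. This is only an illustration under assumptions: the distance is taken to be the number of positions where the bases differ, the three sequences are made-up placeholders of equal length, and the alignment layout (two sequences with a line of '|' marking matches) is a guess at the intended format.

```python
# Placeholder sequences (hypothetical; substitute the ones given in the exercise).
seq1 = "ACGTAC"
seq2 = "ACGTTC"
seq3 = "TCGAAC"

# Assumed distance: the number of positions where the two bases differ.
dist12 = 0
for a, b in zip(seq1, seq2):
    if a != b:
        dist12 += 1

dist13 = 0
for a, b in zip(seq1, seq3):
    if a != b:
        dist13 += 1

# On a tie the first/second pair is chosen, as the exercise requires.
if dist12 <= dist13:
    other, best = seq2, dist12
else:
    other, best = seq3, dist13

# Assumed alignment layout: '|' under matching positions, a space otherwise.
markers = ""
for a, b in zip(seq1, other):
    markers += "|" if a == b else " "

print(seq1)
print(markers)
print(other)
print("distance:", best)
```

The use of `<=` rather than `<` in the comparison is what makes the first and second sequence win when both distances are equal.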
Test your program with the following sequences:
Open an editor and save your new program. In this program we will create a few functions.
1.1 Define the two functions similarity and distance:
Note: Purines are A and G, pyrimidines are C and T.
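The following sketch shows one way such a pair of functions could look. The scores are an assumption made purely for illustration (1.0 for identical bases, 0.5 for two purines or two pyrimidines, 0.0 otherwise, and distance as 1 minus similarity); use the values given in the exercise definition instead.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def similarity(base1, base2):
    """Score two bases. The scores are assumed, not taken from the exercise."""
    if base1 == base2:
        return 1.0
    # Both bases belong to the same chemical class (purine or pyrimidine).
    if {base1, base2} <= PURINES or {base1, base2} <= PYRIMIDINES:
        return 0.5
    return 0.0


def distance(base1, base2):
    """Assumed to be the complement of the similarity score."""
    return 1.0 - similarity(base1, base2)
```

With these assumed scores, similarity("A", "G") gives 0.5 and distance("A", "C") gives 1.0.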
1.2 Write two functions, sequence_similarity and sequence_distance, which calculate the similarity and distance of two whole sequences.
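A minimal sketch of the two sequence-level functions, assuming that the score of two whole sequences is the sum of the per-position scores and that both sequences have the same length. The per-base functions included here use placeholder scores (1.0 identical, 0.5 same purine/pyrimidine class, 0.0 otherwise) only so that the block runs on its own.

```python
def similarity(base1, base2):
    # Placeholder per-base scores (an assumption, for illustration only).
    if base1 == base2:
        return 1.0
    if {base1, base2} <= {"A", "G"} or {base1, base2} <= {"C", "T"}:
        return 0.5
    return 0.0


def distance(base1, base2):
    return 1.0 - similarity(base1, base2)


def sequence_similarity(seq1, seq2):
    """Sum the per-base similarity over all aligned positions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(similarity(a, b) for a, b in zip(seq1, seq2))


def sequence_distance(seq1, seq2):
    """Sum the per-base distance over all aligned positions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(distance(a, b) for a, b in zip(seq1, seq2))
```

Raising an error for unequal lengths is a design choice of this sketch; without the check, zip would silently ignore the extra bases of the longer sequence.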
1.3 Calculate the similarity and distance for the following sequences.
Read these sequences from the command line and print out their similarity and distance.
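Reading the two sequences from the command line can be done with sys.argv, roughly as sketched below. The script name compare.py, the main helper and the per-base scores are all hypothetical choices for this illustration.

```python
import sys


def similarity(base1, base2):
    # Assumed per-base scores, for illustration only.
    if base1 == base2:
        return 1.0
    same_class = {base1, base2} <= {"A", "G"} or {base1, base2} <= {"C", "T"}
    return 0.5 if same_class else 0.0


def sequence_similarity(seq1, seq2):
    return sum(similarity(a, b) for a, b in zip(seq1, seq2))


def sequence_distance(seq1, seq2):
    return sum(1.0 - similarity(a, b) for a, b in zip(seq1, seq2))


def main(argv):
    """Return the report text for the sequences in argv[1] and argv[2]."""
    # argv[0] is the script name, so two sequences mean three entries.
    if len(argv) != 3:
        return "usage: python compare.py SEQ1 SEQ2"
    seq1, seq2 = argv[1].upper(), argv[2].upper()
    if len(seq1) != len(seq2):
        return "error: sequences must have the same length"
    return "similarity: {}\ndistance: {}".format(
        sequence_similarity(seq1, seq2), sequence_distance(seq1, seq2)
    )


if __name__ == "__main__":
    print(main(sys.argv))
```

It would be run as, for example, python compare.py ATC ACC. Keeping the logic in main and passing sys.argv in makes the same code easy to try from an interactive session with a plain list.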
In this exercise we will write three different programs.
2.1 Write a new Python file (module) called sequence_tools.py which contains the two functions similarity and distance as defined previously.
2.2 Write another Python file that calculates the similarity and distance for each combination of two sequences stored in the list l, using the module defined previously.
l = ["ATCCGGT", "GCGTTAC", "CTACTGC", "TTGCAGT", "AGTCACC"]
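One possible way to visit every combination of two sequences is itertools.combinations, as in the sketch below. In the actual exercise the two per-base functions would be imported from sequence_tools.py; they are inlined here, with assumed scores, only so that the example runs by itself.

```python
from itertools import combinations

# In the exercise these would come from the module instead:
#     from sequence_tools import similarity, distance
# The scores below are assumptions made for illustration.
def similarity(base1, base2):
    if base1 == base2:
        return 1.0
    if {base1, base2} <= {"A", "G"} or {base1, base2} <= {"C", "T"}:
        return 0.5
    return 0.0


def distance(base1, base2):
    return 1.0 - similarity(base1, base2)


l = ["ATCCGGT", "GCGTTAC", "CTACTGC", "TTGCAGT", "AGTCACC"]

# combinations(l, 2) yields each unordered pair exactly once.
results = []
for seq1, seq2 in combinations(l, 2):
    sim = sum(similarity(a, b) for a, b in zip(seq1, seq2))
    dist = sum(distance(a, b) for a, b in zip(seq1, seq2))
    results.append((seq1, seq2, sim, dist))
    print(seq1, seq2, "similarity:", sim, "distance:", dist)
```

For five sequences this produces ten pairs. Two nested for loops over the list indices would work just as well if itertools has not been introduced yet.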
2.3 Extend your program. Determine the combination of sequences with the highest similarity among all sequences stored in the list l. Write these two sequences and their alignment into a new file called similar_sequences.txt.
For example, for the two given sequences "ATC" and "ACC", the alignment would be:
And this alignment should be written to a new output file.
Hint: A line break in Python can be made by adding '\n' to the end of the line.
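Selecting the most similar pair and writing it to a file might look like the sketch below. The helper names (most_similar_pair, format_alignment, write_alignment) are hypothetical, and the three-line layout with '|' under matching positions is an assumed alignment format, not necessarily the one intended by the exercise.

```python
from itertools import combinations


def most_similar_pair(sequences, score):
    """Return the pair with the highest score(seq1, seq2).

    `score` is any function of two sequences, for example a
    sequence similarity function. The first pair wins a tie.
    """
    return max(combinations(sequences, 2), key=lambda pair: score(*pair))


def format_alignment(seq1, seq2):
    """Assumed layout: seq1, a marker line ('|' = match), seq2."""
    markers = "".join("|" if a == b else " " for a, b in zip(seq1, seq2))
    # Each line ends with '\n' so the next one starts on a new line.
    return seq1 + "\n" + markers + "\n" + seq2 + "\n"


def write_alignment(path, seq1, seq2):
    """Write the formatted alignment to the file at `path`."""
    with open(path, "w") as handle:
        handle.write(format_alignment(seq1, seq2))
```

A call such as write_alignment("similar_sequences.txt", *most_similar_pair(l, sequence_similarity)) would then combine both steps, assuming l and a sequence_similarity function are defined elsewhere in the program.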