# Topics in bioinformatics HW assignment

Your task is to write a `Python` script that implements a simple Neighbor Joining (NJ) algorithm for phylogeny inference based on the Jukes-Cantor distance and that runs reasonably fast for about 20 sequences. Do not use "prepackaged" routines from `BioPython`, other than to read sequence files.

#### Input: a FASTA multiple sequence alignment

#### Output: inferred phylogenetic tree with branch lengths (you can choose the format, but Newick format is standard)

For example

```
$ python3 NJ.py --msa test.fas
...
(((human:0.01, chimp: 0.02): 0.03, gorilla : 0.03, orangutan : 0.03) : 0.01, gibbon : 0.05)
```

Program tasks:

1. Read a FASTA alignment of N sequences
2. Compute the pairwise distance matrix (N×N) for the input sequences using the JC69 distance (transformed p-distance)
3. Run the neighbor joining algorithm on the matrix: decide which sequences to join, in what order, and compute branch lengths
4. Convert the result of the NJ algorithm to something human readable and print it to the console

I will check:

1. Program correctness (as a suggestion, you may use one of the many existing implementations of NJ to check that your program works as expected)
2. Program readability (does your code look OK?)
3. Tolerance to errors (what happens if I input sequences that are not aligned, etc.)
4. Readability of output

You may use this file as a test alignment: https://www.dropbox.com/s/9q2sqlo530mjgsp/brown.fas?dl=0

**Please submit the Python 3 script as a single file**
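
As a hint for task 2, the JC69 distance is the p-distance (the fraction of differing sites, $p$) transformed as $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$. The sketch below shows one possible way to compute it for a single pair of aligned sequences. The function name `jc69_distance`, the choice to skip columns containing a gap, and the choice to return infinity when $p \ge 0.75$ are all assumptions made for illustration, not requirements of the assignment.

```python
import math


def jc69_distance(seq_a, seq_b):
    """JC69 distance between two aligned sequences (hypothetical helper)."""
    if len(seq_a) != len(seq_b):
        # Unaligned input: the sequences must have the same length.
        raise ValueError("sequences must have the same aligned length")

    # Assumption: columns where either sequence has a gap ('-') are skipped.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")

    # p-distance: fraction of compared sites that differ.
    p = sum(a != b for a, b in pairs) / len(pairs)

    # Assumption: the JC69 formula is undefined for p >= 3/4,
    # so such saturated pairs are reported as infinitely distant.
    if p >= 0.75:
        return math.inf

    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Calling `jc69_distance("ACGT", "ACGA")` uses $p = 0.25$, so the result is slightly larger than the raw p-distance, which is the expected effect of correcting for multiple substitutions at the same site.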