# COVID-19 BioHackathon: Machine Learning topic

## Project Idea #1

### Short description

Apply Markov Clustering (MCL) to the SARS-CoV-2 sequences currently available in GenBank to identify potential groupings beyond the traditional phylogenetic ones. Apply it at both the nucleotide (NT) and the amino acid (AA) level, using a number of distance metrics (including e-value, string distance, etc.).

### Available Resources

_Retrieved from the [Data and Resources wiki page](https://github.com/virtual-biohackathons/covid-19-bh20/blob/master/datasets_and_tools.md)_

#### Data

- [EBI Data](https://www.ebi.ac.uk/ena/pathogens/covid-19)
- [SARS-CoV-2 sequences GenBank](https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/)
- [nCoV sequences GISAID](https://www.gisaid.org/epiflu-applications/next-hcov-19-app/)
- Please be aware of the licenses of each data source.

#### Tools

- (**new**) [MCL - a cluster algorithm for graphs](https://micans.org/mcl/)

#### Models

#### R packages

### Participants interested in this

- Fotis Psomopoulos

### Skills needed

- Clustering (experience with MCL is a plus)
- Sequence manipulation
- Programming (possibly R, bash, Python, or other languages)

## Project Idea #2