# COVID-19 BioHackathon: Machine Learning topic
## Project Idea #1
### Short description
Apply Markov Clustering (MCL) to the SARS-CoV-2 sequences currently available in GenBank in order to identify potential groupings beyond the traditional phylogenetic ones. Apply it at both the nucleotide (NT) and the amino acid (AA) level, using a number of distance metrics (including e-value, string distance, etc.).
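A toy-scale sketch may help newcomers see what MCL does. The Python below is a minimal, standard-library-only illustration, not the `mcl` program itself. It builds a similarity graph from normalized Levenshtein distance (one possible "string distance") and then alternates expansion (squaring the column-stochastic matrix) and inflation (element-wise power followed by column re-normalization) until the matrix stops changing. The `THRESHOLD` value, the inflation of 2.0 and the toy sequences are illustrative assumptions; real SARS-CoV-2 genomes (~30 kb) would call for the optimized `mcl` tool and a cheaper metric.

```python
"""Minimal Markov Clustering (MCL) sketch on a toy sequence-similarity graph.

Illustrative assumptions (not part of the project description):
- similarity = 1 - normalized Levenshtein distance (one possible string metric);
- edges below a hypothetical THRESHOLD are dropped to sparsify the graph;
- inflation = 2.0, a commonly used value.
"""
from typing import List

Matrix = List[List[float]]
THRESHOLD = 0.5  # hypothetical cut-off; would need tuning on real data


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def similarity_graph(seqs: List[str], threshold: float = THRESHOLD) -> Matrix:
    """Weighted adjacency matrix; the diagonal (self-loops) is 1.0."""
    n = len(seqs)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            longest = max(len(seqs[i]), len(seqs[j]), 1)
            sim = 1.0 - levenshtein(seqs[i], seqs[j]) / longest
            graph[i][j] = sim if sim >= threshold else 0.0
    return graph


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column in place so it sums to 1 (column-stochastic)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def mcl(graph: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Return clusters as lists of node indices."""
    n = len(graph)
    m = normalize_columns([row[:] for row in graph])
    for _ in range(max_iter):
        # Expansion: square the matrix (random walks of length 2).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power, then re-normalize columns.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractor rows keep non-zero entries; their non-zero columns form a cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


if __name__ == "__main__":
    toy = ["ACACACAC", "ACACACAA", "ACACACCC", "GTGTGTGT", "GTGTGTGG"]
    print(mcl(similarity_graph(toy)))
```

Because the two toy groups share no characters, their similarity is 0 and MCL can never place them in the same cluster; within a connected group the outcome depends on the inflation value, which is the main parameter controlling cluster granularity.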
### Available Resources
_Retrieved from [Data and Resources wiki page](https://github.com/virtual-biohackathons/covid-19-bh20/blob/master/datasets_and_tools.md)_
#### Data
- [EBI Data](https://www.ebi.ac.uk/ena/pathogens/covid-19)
- [SARS-CoV-2 sequences GenBank](https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/)
- [nCoV sequences GISAID](https://www.gisaid.org/epiflu-applications/next-hcov-19-app/)
- Please be aware of each data source's license and terms of use
#### Tools
- (**new**) [MCL - a cluster algorithm for graphs](https://micans.org/mcl/)
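For the actual runs, the `mcl` program reads a weighted graph in its label ("ABC") format: one `labelA<TAB>labelB<TAB>weight` line per edge, enabled with `--abc`. The sketch below shows one way to produce that file and read the result back. It assumes the sequences are already loaded into a dict keyed by accession, and it uses `difflib.SequenceMatcher.ratio()` from the Python standard library purely as a stand-in similarity; BLAST e-values, as listed in the project description, would normally be transformed (e.g. with -log10) before being used as weights. `MIN_SIM` and the helper names are hypothetical.

```python
"""Sketch: write pairwise similarities in mcl's ABC format and parse its output.

Assumptions: sequences are in a dict {accession: sequence}; difflib's ratio is a
stand-in metric; MIN_SIM is a hypothetical threshold.
"""
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List

MIN_SIM = 0.5  # hypothetical threshold for keeping an edge


def to_abc_lines(seqs: Dict[str, str], min_sim: float = MIN_SIM) -> List[str]:
    """One 'labelA<TAB>labelB<TAB>weight' line per retained pair."""
    lines = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(seqs.items()), 2):
        # autojunk=False avoids difflib's popularity heuristic on long sequences.
        sim = SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()
        if sim >= min_sim:
            lines.append(f"{id_a}\t{id_b}\t{sim:.4f}")
    return lines


def parse_mcl_output(text: str) -> List[List[str]]:
    """With --abc, mcl writes one cluster per line, members separated by tabs."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]
```

The lines can then be written to a file (say `seqs.abc`) and clustered with `mcl seqs.abc --abc -I 2.0 -o seqs.clusters`, where `-I` sets the inflation; the output file is what `parse_mcl_output` expects.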
#### Models
#### R packages
### Participants interested in this
- Fotis Psomopoulos
-
### Skills needed
- Clustering (experience with MCL is a plus)
- Sequence manipulation
- Programming (e.g. R, bash, Python, or other languages)
-
## Project Idea #2