# MSc Summer 2024 - Abhash and shapelets

###### tags: `aeon-msc`

__Contributor:__ Abhash Shrestha

__Project:__ Shapelet Algorithms for Time Series Analysis

__Project length:__ 12 Weeks

__Mentors:__ Tony Bagnall, Matthew Middlehurst

__Mid-project evaluation:__ ??

__Thesis submission:__ Monday, August 26

__Regular meeting time:__ Friday, ??

## Project Summary

## Project Timeline

### June

### July

### August

## Getting started tasks

- [ ] Introduce yourself in the community Slack channels. Use __#introductions__ to introduce yourself to the wider community if you have not already, and __#summer-2024__ to introduce yourself and your project to other students and mentors.
- [ ] Go through the contributor guide on the _aeon_ website (https://www.aeon-toolkit.org/en/stable/contributing.html).
- [ ] Set up a development environment, including the _pytest_ and _pre-commit_ dependencies. This will make development a lot easier, as your code must pass the PR tests to be merged (https://www.aeon-toolkit.org/en/stable/developer_guide/dev_installation.html).
- [ ] Review some of the important dependencies for developing aeon at a basic level:
    - [ ] __scikit-learn__: the interface aeon estimators extend from. We aim to stay as compatible as possible with sklearn tools.
    - [ ] __pytest__: for unit testing. Any code added will have to be covered by tests.
    - [ ] __sphinx/myst__: for documentation. New functions and classes will have to be added to the API docs.
    - [ ] __numba__: for writing efficient functions.
- [ ] Make some basic Pull Requests (PRs) to gain experience contributing to _aeon_ through GitHub.
- [ ] Add the project timeline items to this document.

# Make notes of progress here

This is just an informal place to make notes on objectives and progress.

## Week 1: 2nd June

### Initial email from Tony with background reading on shapelets

Please focus on these papers:
- https://link.springer.com/article/10.1007/S10618-016-0483-9
- https://link.springer.com/article/10.1007/s10618-024-01022-1
- https://link.springer.com/article/10.1007/s10618-013-0322-1
- I had forgotten about this one: https://link.springer.com/chapter/10.1007/978-3-642-32639-4_58

Broadly, your tasks are:

1. Background reading: familiarise yourself with aeon and open source, make a contribution, and learn how to use the tsml-eval toolkit.
2. Understand how the shapelet transform (ST) currently works, and run the shapelet transform classifier (STC) on the Southampton HPC.
3. Implement experimental improvements to ST in the tsml-eval toolkit, and evaluate whether they improve performance.
4. Implement our final system in aeon and make a PR.

It might be possible to get a research paper out of this if it goes well, but let us not worry about that for now.

The improvements I would like to assess are:

1. Alternative distance measures (as in the original paper). Are they faster and/or more accurate?
2. Using dilation.
3. An alternative merge method to avoid repetitions of the same shapelet.
4. Multivariate alternatives, and the potential for channel selection through ST.

I am not expecting you to do them all; it is a wish list 😊

## Meeting 7/6/24

Reading papers on TSC and shapelets. Good to get more familiar with aeon for next week.

Good first issue list: https://github.com/aeon-toolkit/aeon/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22

Should analyse different selection methods for shapelets on the expanded UCR archive: is information gain still the best?

Need to apply for HPC: https://www.southampton.ac.uk/isolutions/staff/lyceum.page

https://github.com/aeon-toolkit/aeon/issues/186

To do:

1. Tony to apply for HPC accounts.
2. Abhash to make first PR to aeon.
3. Read up on quality measures, start an Overleaf doc, and describe them there.
4. Set up on tsml-eval.
5. Implement quality measures in tsml-eval.
6. Run experiments and compare.
7. Port final version into aeon.

## 14/6/24

First PR made. Proceed with tsml-eval and read up on shapelets.
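While reading up on shapelets, it may help to see how information gain, the quality measure questioned in the 7/6/24 meeting, scores a single candidate. The sketch below is a hedged illustration only: it takes the candidate's distances to each training series, sorts them into an orderline, tries a threshold between each pair of adjacent distinct distances, and keeps the largest drop in class entropy. The names `entropy` and `information_gain` are hypothetical, and this plain multi-class version is not aeon's implementation, which may differ (for example, by evaluating one class against the rest).

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (base 2) of an array of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(distances, labels):
    """Best information gain over all binary splits of the orderline.

    distances: distance from one candidate shapelet to each training series.
    labels: class label of each training series.
    (Hypothetical helper for illustration, not an aeon function.)
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    # Sort series by their distance to the shapelet (the orderline)
    order = np.argsort(distances, kind="stable")
    d, y = distances[order], labels[order]
    n = len(y)
    base = entropy(y)
    best = 0.0
    # Only split between adjacent distances that actually differ
    for i in range(1, n):
        if d[i] == d[i - 1]:
            continue
        left, right = y[:i], y[i:]
        split_entropy = (len(left) / n) * entropy(left) + (
            len(right) / n
        ) * entropy(right)
        best = max(best, base - split_entropy)
    return best
```

For example, `information_gain([0.1, 0.2, 0.9, 1.0], [0, 0, 1, 1])` gives 1.0, because one threshold separates the two classes perfectly. Alternative quality measures can be compared against this by swapping out the scoring function while keeping the same distance inputs.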
To do:

1. Tony to apply for HPC accounts.
2. Tony to set up a shapelet branch on tsml-eval (branch set up, called `shapelets`).
3. Abhash to work on tsml-eval and read up on the shapelet transform.
4. Implement the F-statistic as a standalone function.

## 21/6/24

Done: Tony set up the branch and applied for Lyceum accounts.

1. Measures to implement as functions:
    - Kruskal-Wallis
    - F-statistic
    - Mood's median
2. Write a test function; use a numba wrapper.
3. Integrate into ST, with help from Tony and Matt.

## 28/6/24

First PR made to aeon. Still working on distance functions. Tony has set up the structure to include them, and stressed the need to include test functions for each.

## 12/7/24

In tsml-eval:

- Adapt Shapelet Transform to use other quality measures
- Add classifier to the `set_classifier` framework
- Upload and run on HPC cluster
- Evaluate results

## 2/8/24

- Experiments on univariate and multivariate mostly complete
- In terms of accuracy, little significant difference
- In terms of run time, to be confirmed

Next stages:

- Sort out run time experiments
- Find and implement new tests of two distributions:
    - https://www.stat.cmu.edu/~larry/=sml/Opt.pdf
    - https://en.wikipedia.org/wiki/Two-sample_hypothesis_testing
    - Wasserstein distance
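The 21/6/24 notes list quality measures to implement as standalone functions. Below is a minimal sketch of two of them, assuming each measure takes a candidate shapelet's distances to the training series plus their class labels and returns a score where larger means better class separation. The names `f_stat`, `average_ranks` and `kruskal_wallis` are hypothetical, not existing aeon or tsml-eval functions. The Kruskal-Wallis version omits the tie correction, and Mood's median is not sketched here. The code sticks to simple NumPy loops so it could later be wrapped with numba as the notes suggest, though it may need small changes to compile under `njit`.

```python
import numpy as np


def f_stat(distances, labels):
    """One-way ANOVA F-statistic of shapelet distances grouped by class."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, k = len(distances), len(classes)
    grand_mean = distances.mean()
    ss_between = 0.0
    ss_within = 0.0
    for c in classes:
        group = distances[labels == c]
        ss_between += len(group) * (group.mean() - grand_mean) ** 2
        ss_within += np.sum((group - group.mean()) ** 2)
    if ss_within == 0.0:
        # No within-class spread: perfect separation, or all values equal
        return np.inf if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def average_ranks(x):
    """Ranks 1..n, with tied values given the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        # Extend j to the end of the run of values tied with sorted_x[i]
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis(distances, labels):
    """Kruskal-Wallis H statistic (no tie correction)."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    n = len(distances)
    ranks = average_ranks(distances)
    total = 0.0
    for c in np.unique(labels):
        r = ranks[labels == c]
        total += r.sum() ** 2 / len(r)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
```

In the spirit of the 28/6/24 note about test functions, a pytest test could check these against `scipy.stats.f_oneway` and `scipy.stats.kruskal`, which should return the same statistics on data without ties; `scipy.stats.median_test` could serve as a reference for a Mood's median version.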