GSoC 2024 - Clustering DL Aadya

# GSoC 2024 - Clustering DL Aadya

###### tags: `aeon-gsoc`

__Contributor:__ Aadya Chinubhai

__GSoC page:__ https://summerofcode.withgoogle.com/programs/2024/projects/Hvd0DfkD

__Project:__ Developing Deep Learning Framework and Implementations for Time Series Clustering

__Project length:__ 14 weeks

__Mentors:__ Matthew Middlehurst, Ali Ismail-Fawaz, Tony Bagnall

__Mid-project evaluation:__ Friday, July 19

__Final evaluation:__ Monday, September 9

__Blog link:__ https://medium.com/@aadyachinubhai

__Regular meeting time:__ Friday, 13:00 UTC

## Project Summary

Time series clustering involves grouping similar time series together based on specific features or patterns. Deep learning algorithms have become increasingly popular for clustering. However, aeon's deep clustering module currently lacks several deep learning-based algorithms. The aim of this project is to implement and benchmark some of the top-performing and most interesting algorithms from a recent comparison of deep learning for time series clustering. The project also includes further developing the aeon deep learning networks module, documenting the package publicly so users can explore it, and testing it well to ease future maintenance of the deep learning implementations.

## Project Timeline

### Week 1-2

- Write framework for the following network classes in the style of AEResNetNetwork and AEFCNNetwork:
    - AEDRNNNetwork
    - AEBiGRUNetwork
    - AEAttentionBiGRUNetwork
    - AEDCNNNetwork
    - DCNNNetwork
- Rework and expand on test_all_networks.py to achieve better test coverage for the implemented networks

### Week 3-4

- Add documentation for the implemented networks and add them to the API webpage.
- Implement the following deep learning clusterers extending from BaseDeepClusterer in the style of AEResNetClusterer and AEFCNClusterer:
    - AEDRNNClusterer
    - AEBiGRUClusterer
    - AEBiGRUAttClusterer
    - InceptionClusterer?

### Week 5-6

- Add documentation for the implemented clusterers and add them to the API webpage.
- Ensure the clusterers are covered by unit testing by introducing tests for all deep clusterers and individual clusterers where necessary.

### Week 7

- Improve the networks notebook and start to create a notebook specifically for deep learning clustering.

### Mid-project Deliverables

- 3 network classes
- 3 clustering deep learners

### Week 8-9

- Implement the multi_rec loss function and add the functionality to the deep learners; this requires an edit to all AE-based networks
- Document the function and add a test to ensure the loss output is correct
- Edit the deep learner documentation, making it clear in the docstring how to choose between the losses and what the difference between them is

### Week 10-11

- Benchmark all of the deep clusterers over the UCR archive and compare their performance to the original bake off

### Week 12-13

- Add the different losses functionality to the deep clustering notebook
- Add some latent space visualization in the deep clustering notebook using TSNE, PCA, DBSCAN etc.

### Week 14

- Add a saving/loading mechanism to deep clusterers, similar to the one available for deep classifiers and regressors
- Add the documentation necessary for the saving/loading of deep clusterers and examples to the notebook

### Final Deliverables

- New deep clustering models with two different ways of using the pretext loss with sufficient testing
- Documentation and API pages detailing the different parameters of all deep clusterers and a new deep clustering notebook
- Saving/loading mechanisms for the deep clustering module
- Benchmark results over all deep clusterers using the UCR archive datasets

## Community Bonding Period

- [x] Introduce yourself in the community Slack channels. Use __#introductions__ to introduce yourself to the wider community if you have not already and __#summer-2024__ to introduce yourself and your project to other students and mentors.
- [x] Go through the contributor guide on the _aeon_ website (https://www.aeon-toolkit.org/en/stable/contributing.html).
- [x] Set up a development environment, including _pytest_ and _pre-commit_ dependencies. This will make development a lot easier for you, as you must pass the PR tests to have your code merged (https://www.aeon-toolkit.org/en/stable/developer_guide/dev_installation.html).
- [x] Review some of the important dependencies for developing aeon at a basic level:
    - [x] __scikit-learn__ the interface aeon estimators extend from. We aim to keep as compatible as possible with sklearn tools.
    - [x] __pytest__ for unit testing. Any code added will have to be covered by tests.
    - [x] __sphinx/myst__ for documentation. Adding new functions and classes will have to be added to the API docs.
    - [x] __numba__ for writing efficient functions.
    - [x] __tensorflow__ is the current package used for all of our deep learning algorithms.
- [x] Make some basic Pull Requests (PRs) to gain some experience with contributing to _aeon_ through GitHub. Some suggestions:
    - [ ] Update regression API docs [#1500](https://github.com/aeon-toolkit/aeon/issues/1500)
    - [x] Add a dummy clusterer [#1528](https://github.com/aeon-toolkit/aeon/issues/1528)
- [x] Read up on the subject of your project (DL Clustering). We will provide some literature, but we encourage you to go beyond that and ask any questions you have.
    - [Comparison of DL clusterers for time series](https://link.springer.com/article/10.1007/s10618-021-00796-y)
- [x] Decide on a project length (https://developers.google.com/open-source/gsoc/help/project-dates).
- [x] Refine the project timeline and deliverables with the project mentors. Agree on some milestones for both mid-project and final evaluations.
- [x] Update the GSoC webpage project to better match any new directions after discussions with mentors.
- [x] Select a tracking/blogging medium to write down and track progress made on the project.
Agree on a frequency of updates.
- [x] Set up regular meeting days and times to discuss the project and track progress.

## Week 1: 27th May

Present: AC, MM, AF

3 PRs have been created for networks. We have discussed the PRs and left some review comments.

test_all_networks does not work currently, causing a lot of missing coverage for the modules:

https://github.com/aeon-toolkit/aeon/blob/main/aeon/networks/tests/test_all_networks.py

https://app.codecov.io/gh/aeon-toolkit/aeon/blob/main/aeon%2Fnetworks%2F_encoder.py

It would be good to figure out why and repair the test. Individual test_x.py files should be added for networks, testing parameterisation.

Blog post: https://medium.com/@aadyachinubhai/gsoc-24-numfocus-week-1-038aa6a91469

## Week 2: 3rd June

Meeting moved to Monday. Discussion took place on Slack and progress is satisfactory.

Blog post: https://medium.com/@aadyachinubhai/gsoc-24-numfocus-week-2-617c4b4cd863

## Week 3: 10th June

Discussed expectations for mid-term evaluation: MM to clarify exact deliverables.

For now, continue implementing the clusterers using the implemented networks. Make sure it is all documented and covered by unit tests.

To discuss with Ali: TensorFlow upper bound and tensorflow-addons.

Blog post: https://medium.com/@aadyachinubhai/greetings-14b323b3e332

## Week 4: 17th June

Suggested plan for #1631:

- Remove tag from registry
- Implement a `_config` dictionary class variable for the network base class and any network which uses configuration other than the default
- Update the structure of the assertion and dependency checking portion to use new tags (see https://github.com/aeon-toolkit/aeon/pull/1631#pullrequestreview-2131464473 for suggestions)
- Add `_config` to auto encoders with all the network configuration, i.e. code snippet below. By default "auto_encoder" should be False in the base class.

```
_config = {
    "python_dependencies": "tensorflow",
    "python_version": "<3.12",
    "auto_encoder": True,
}
```

- Create a new test to check the `_config` variable in all networks. Make sure all of them (a) have `_config`, (b) have the correct 3 config items, and (c) have config items of the correct type (str for the first 2, bool for auto_encoder).
- Update the PR title and description to match new contents.

#1576, #1577, #1583, #1608, #1702 (after #1631):

- Update to use `BaseDeepLearningNetwork`
- Add `_config`
- Implement remaining suggestions by Ali

Blog post: https://medium.com/@aadyachinubhai/gsoc-24-numfocus-week-4-e58b77e59e0b

## Week 5: 24th June

Blog post: https://medium.com/@aadyachinubhai/gsoc-24-numfocs-week-5-f838f4a98cf7

## Week 6: 1st July

Blog post: https://medium.com/@aadyachinubhai/gsoc-24-numfocus-week-6-b86a70272948

## Week 7: 8th July

Blog post: https://medium.com/@aadyachinubhai/gsoc-24-numfocus-week-7-98d6eb276f73

Todo:

- Create notebook for DL clusterers
- Create structure for evaluating DL clusterers

## Week 8: 15th July

Blog post:

For multi_rec loss:

Adding the multi_rec loss functionality is tricky, given that all the AE-based networks will need to be adapted to return the output of each encoder/decoder layer (output activations only, of course). This is followed by adding the loss functionality to each of the associated deep clusterers in the deep clustering module. As a first step, it would be better to adapt only one of the existing models. I propose to start with AEFCNNetwork, as it is the smallest and simplest model, then redo the work for the rest if possible.

This would be a flag set in the deep clusterer to use either the default reconstruction loss (the one used now in the current deep clusterers) or the multi-reconstruction loss. This part should be highlighted in the documentation, as we should detail what the difference is between using the default reconstruction loss and the multi-reconstruction loss.
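
To make the idea of a flag-selected loss concrete, here is a minimal NumPy sketch. It is not the aeon implementation. It assumes the multi-reconstruction loss is the usual input reconstruction error plus the mean squared error between each encoder layer's activations and those of its mirrored decoder layer. The names `mse`, `reconstruction_loss`, `encoder_outputs`, `decoder_outputs` and `multi_rec` are hypothetical and chosen only for illustration.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two equally shaped arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))


def reconstruction_loss(x, x_hat, encoder_outputs=(), decoder_outputs=(),
                        multi_rec=False):
    """Hypothetical reconstruction loss with an optional multi_rec mode.

    x, x_hat        : input batch and its reconstruction.
    encoder_outputs : sequence of encoder activations, ordered input -> latent.
    decoder_outputs : sequence of decoder activations, ordered latent -> output.
    multi_rec       : if False, only the default loss mse(x, x_hat) is used.

    Assumption: the i-th encoder layer mirrors the i-th decoder layer counted
    from the end, so the two have matching shapes.
    """
    loss = mse(x, x_hat)
    if not multi_rec:
        return loss
    if len(encoder_outputs) != len(decoder_outputs):
        raise ValueError("encoder and decoder must expose the same number of layers")
    # Pair the first encoder layer with the last decoder layer, and so on.
    for enc, dec in zip(encoder_outputs, reversed(decoder_outputs)):
        loss += mse(enc, dec)
    return loss


# Toy activations for a batch of 2 series of length 4.
x = np.ones((2, 4, 1))
x_hat = np.ones((2, 4, 1))
enc_acts = [np.zeros((2, 4, 3)), np.zeros((2, 4, 5))]
dec_acts = [np.ones((2, 4, 5)), np.zeros((2, 4, 3))]

default_loss = reconstruction_loss(x, x_hat)
multi_loss = reconstruction_loss(x, x_hat, enc_acts, dec_acts, multi_rec=True)
```

In a real network the activations would come from the adapted AE network (AEFCNNetwork first, as proposed above) and the loss would be written with TensorFlow ops so that gradients can flow. The sketch only shows how one flag can switch between the two loss definitions.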

Testing the different reconstruction loss functionality for all of the deep clusterers.

## Week 9: 22nd July

## Week 10: 29th July

Working on loss function docs notebook.
