# PhD 2024 - Chuanhang (Chris) Qiu

###### tags: `aeon-research`

__Project:__ Rare class classification

__Supervisors:__ Tony Bagnall, Matthew Middlehurst and Daniel Clark

__Weekly meeting time:__ Friday 10:30

**Project proposal**

The primary objective of this research is to improve the accuracy of time series classification (TSC) in few-shot scenarios, where labelled data is scarce. The proposal also aims to build an open-source framework that incorporates a range of deep learning-based TSC methods, so that these methods can be summarised systematically and compared fairly with non-deep learning approaches. Finally, the research will explore how to train a large-scale model that generalises well across TSC benchmarks.

## Getting started tasks

- [ ] Go through the contributor guide on the _aeon_ website (https://www.aeon-toolkit.org/en/stable/contributing.html).
- [ ] Set up a development environment (https://www.aeon-toolkit.org/en/stable/developer_guide/dev_installation.html).
    - fork:
    - clone locally:
    - create environment:
    - `pip install --editable .[all_extras,dev]`
- [ ] Review some of the important dependencies for developing aeon at a basic level:
    - [ ] __scikit-learn__: the interface aeon estimators extend from. We aim to stay as compatible as possible with sklearn tools.
    - [ ] __pytest__ for unit testing. Any code added will have to be covered by tests.
    - [ ] __sphinx/myst__ for documentation. New functions and classes will have to be added to the API docs.
    - [ ] __numba__ for writing efficient functions.
- [ ] Make a basic Pull Request (PR) to gain some experience with contributing to _aeon_ through GitHub.
- [ ] Add the project timeline objectives to this document.

## Project overview

TSC with class imbalance

The basic idea of predictive modelling with only a few "positive" cases is an old one, but it has never been addressed for time series classification. It is related to few-shot learning and to anomaly prediction.

### Year 1 objectives

1. Define the TSC class imbalance (TSC-ci) problem
2. Specify a set of data to test algorithms for TSC-ci on
3. Evaluate current SOTA on TSC for TSC-ci
4. Specify the "anomaly prediction" use case, find real-world example data and do a case study
5. Write a comparative study paper for the end of year 1

### Year 2 objectives

1. Develop better algorithms to handle ci: ensemble weighting schemes, recalibration etc.
2. Relate this work to the related deep learning research, focusing on deep learning TSC variants
3. Develop anomaly prediction as a new field, developed from anomaly detection
4. Write an original contribution paper for the end of year 2

**First meeting 3/10/24**

Items

1. Logistics: desk, money, meetings, pgrmanager etc.
2. Getting started tasks

Tasks

1. Get up to speed with aeon
2. Do the same with tsml-eval https://github.com/time-series-machine-learning/tsml-eval
3. Background reading on class imbalance in machine learning
4. Install this and read the docs: https://github.com/scikit-learn-contrib/imbalanced-learn
5. Set up on Overleaf, start a draft background

**To do**

Tony to get some background references

## 8/10/24

1. Assess training needs
    - 1.9 Coding skills. Programming: audit first year programming module
    - 1.4. English for academic purposes (international PGRs)
    - 1.5. Presentation skills
    - 9.4. Teaching/other career development opportunities

Next stages:

1. Class imbalance problem:
2. Write a basic literature review: SMOTE, class imbalance in general, and the relationship between class imbalance and few-shot learning
3. Use imbalanced-learn
4. Create some class imbalance/few-shot learning datasets and test some TSC algorithms on them

Weekly meeting: Friday 10:30am

## 10/10/24

Weekly objective: understanding SMOTE and

## 18/10/24

Chris's work this week:

1. Understanding SMOTE.
2. Implement the method in https://github.com/scikit-learn-contrib/imbalanced-learn using iridis
3. Write a short note on Overleaf https://www.overleaf.com/8512297167stkwycfrhmnp#b7314e

Chris wants to discuss:

1. Current understanding of imbalanced data classification, with a focus on data augmentation.
2. Whether the first paper should be a survey or something else.
3. Computer and monitor.

Tony to do:

1. Ask about the monitor
2. Bring down the Mac
3. Set up a tsml-eval working area for imbalanced learning

Chris to do:

1. UCR data processing (download, classify into 2 categories)
2. Make a benchmark with SMOTE and use some Elastic Distance Functions for Time Series Clustering
3. Create a BibTeX file and collate references on using SMOTE for time series (https://www.overleaf.com/4191343229fkgrynbzxpkq#ba9ff0)

## 25/3/25

TSC with Class Imbalance – Research Plan

### 1. Data

#### (1) Univariate Imbalanced Data

- **Deadline:** Easter 2025
- Key questions:
    - Do we need to remove any datasets?
    - Set up resampling methods in `tsml-eval`

#### (2) Multivariate Imbalanced Data

- **Deadline:** Summer 2025
- Tasks:
    - Make an imbalanced multivariate archive

#### (3) Big Data Imbalance

- **Timeline:** Year 2
- Focus: Handling large-scale, imbalanced time series data using downsampling

### 2. Algorithms

#### (1) SMOTE / Distance-based Extensions

- **Timeline:** Summer
- Focus areas:
    - Neighbour selection in MSM space
    - Sample generation guided by the MSM path

#### (2) Generative Models

- **Timeline:** Year 2
- Focus: Explore generative approaches (GANs, diffusion models, etc.) for imbalanced TSC using softMSM

#### (3) HIVE-COTE

- **Timeline:** Year 2
- Key questions:
    - Investigate why HIVE-COTE performs worse with imbalanced data and why MultiRocket-Hydra does not
    - Find components to compensate

#### (4) Downsampling Algorithm

- **Timeline:** Year 2
- For big data imbalance
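
**Illustrative sketch: SMOTE with a pluggable distance**

For reference alongside the SMOTE / distance-based extensions item in the research plan, the following is a minimal sketch of the basic SMOTE step: pick a minority series, choose one of its k nearest minority neighbours, and interpolate between the two. It is written from scratch in standard-library Python and is not the imbalanced-learn implementation. The function name `smote_oversample`, its parameters and the toy data are hypothetical. The `distance` argument defaults to Euclidean distance as a stand-in; an elastic distance such as MSM could be passed instead, which would change neighbour selection only. The linear interpolation shown here is plain SMOTE, not the MSM-path-guided generation proposed in the plan.

```python
import random
from typing import Callable, List, Sequence

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length series (stand-in distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote_oversample(
    minority: List[Series],
    n_new: int,
    k: int = 3,
    distance: Callable[[Series, Series], float] = euclidean,
    seed: int = 0,
) -> List[List[float]]:
    """Generate n_new synthetic minority series by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority series to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other cases
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority series by distance to the base series.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: distance(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Linear interpolation at a random point between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy example: 3 minority series against 6 majority series of length 3.
minority = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5], [1.0, 2.0, 3.0]]
majority = [[5.0, 5.0, 5.0] for _ in range(6)]
new_cases = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_cases
print(len(balanced_minority), len(majority))
```

Keeping the distance as a parameter means the same function can be reused to compare neighbour selection under different distances without touching the generation step.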