PhD 2024 - Chuanhang (Chris) Qiu

tags: `aeon-research`

Project: Rare class classification
Supervisors: Tony Bagnall, Matthew Middlehurst and Daniel Clark
Weekly meeting time: Friday 10:30

Project proposal
The primary objective of this research is to enhance the accuracy of TSC in few-shot scenarios, characterized by scarce labeled data. Additionally, this proposal aims to build an open-source framework incorporating various deep learning-based TSC methods. The goal is to systematically summarize these methods and make fair comparisons with non-deep learning approaches. Furthermore, the research will explore the methodology of training a large-scale model with high generalization in TSC benchmarks.

Getting started tasks

Go through the contributor guide on the aeon website (https://www.aeon-toolkit.org/en/stable/contributing.html).
Set up a development environment (https://www.aeon-toolkit.org/en/stable/developer_guide/dev_installation.html).

Steps: fork the repository, clone it locally, create an environment, then install in editable mode:
pip install --editable .[all_extras,dev]

Review some of the important dependencies for developing aeon at a basic level:
- scikit-learn: the interface aeon estimators extend from. We aim to stay as compatible as possible with sklearn tools.
- pytest for unit testing. Any code added will have to be covered by tests.
- sphinx/myst for documentation. Adding new functions and classes will have to be added to the API docs.
- numba for writing efficient functions.
Make a basic Pull Request (PR) to gain some experience with contributing to aeon through GitHub.
Add the project timeline objectives to this document.

Project overview: TSC with class imbalance

The basic idea of predictive modelling with only a few "positive" cases is an old one, but it has not been addressed for time series classification. It is related to few-shot learning and to anomaly prediction.

Year 1 objectives

1. Define TSC class imbalance problem
2. Specify a set of data to test algorithms for TSC-ci on
3. Evaluate current SOTA on TSC for TSC-ci
4. Specify the "anomaly prediction" use case, find real world example data and do a case study
5. Write comparative study paper for end of year 1
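
When evaluating classifiers for objective 3, plain accuracy can look good on imbalanced data even when the rare class is never predicted. The sketch below illustrates this with balanced accuracy (mean per-class recall). The function name and the toy labels are hypothetical and only for illustration; the project's actual choice of metrics is not fixed in these notes.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)   # number of true cases per class
    correct = Counter()         # number of correctly predicted cases per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    return sum(correct[c] / support[c] for c in support) / len(support)


# Hypothetical example: 95 negatives, 5 positives, and a classifier
# that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```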

Year 2 objectives

1. Develop better algorithms to handle class imbalance (ci), e.g. ensemble weighting schemes and recalibration
2. Relate this work to the related deep learning research, focus on deep learning TSC variants
3. Develop anomaly prediction as a new field, developed from anomaly detection
4. Write original contribution paper for end of year 2.

First meeting 3/10/24

items

Logistics: desk, money, meetings, pgrmanager etc
Getting started tasks

Tasks
1. Get up to speed with aeon
2. Do the same with tsml-eval
https://github.com/time-series-machine-learning/tsml-eval
3. Background reading on class imbalance with machine learning
4. Install this, read the docs
https://github.com/scikit-learn-contrib/imbalanced-learn
5. Set up on Overleaf and start a draft background section

to do
Tony to get some background references

8/10/24

Assess training needs
1.9. Coding skills. Programming: audit first year programming module
1.4. English for academic purposes (international PGRs)
1.5. Presentation skills
9.4. Teaching/other career development opportunities

Next stages:

Class imbalance problem:
Write a basic literature review covering SMOTE, class imbalance in general, and the relationship between class imbalance and few-shot learning
Use imbalanced-learn (the scikit-learn-contrib package)
Create some class-imbalanced/few-shot learning datasets and test some TSC algorithms on them
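
As a reminder of what SMOTE does, here is a minimal pure-Python sketch of its core step: pick a minority case, pick one of its k nearest minority neighbours, and interpolate between them. This is an illustration under simplifying assumptions (Euclidean distance, equal-length univariate series, hypothetical function names and toy data), not the imbalanced-learn implementation, which is available as `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic series by interpolating minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base series (excluding itself)
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: euclidean(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class: four series of length 3.
minority = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5], [1.0, 2.0, 3.0], [0.2, 1.1, 2.3]]
new_series = smote_sketch(minority, n_new=6)
print(len(new_series), len(new_series[0]))  # 6 3
```

Because each synthetic value is a convex combination of two real values at the same time point, the generated series stay inside the per-time-point range of the minority class.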

weekly meeting: Friday 10:30am

10/10/24

Weekly objective: understanding SMOTE and

18/10/24

Chris's work this week:
1. Understand SMOTE.
2. Implement the method in https://github.com/scikit-learn-contrib/imbalanced-learn using iridis
3. Write a short note on Overleaf https://www.overleaf.com/8512297167stkwycfrhmnp#b7314e

Chris wants to discuss:
1. Current understanding of imbalanced data classification, with a focus on data augmentation.
2. Whether the first paper should be a survey or something else.
3. Computer and monitor.

Tony to do:

Ask about a monitor
Bring down the Mac
Set up a tsml-eval working area for imbalanced learning

Chris to do:

UCR data processing (download, classify into 2 categories)
Make a benchmark with SMOTE and use some Elastic Distance Functions for Time Series Clustering
Create a BibTeX file and collate references on using SMOTE for time series (https://www.overleaf.com/4191343229fkgrynbzxpkq#ba9ff0)
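
One possible reading of the UCR processing task is to turn a multi-class problem into a two-class imbalanced one: a chosen class becomes the rare positive class and everything else becomes negative. The sketch below assumes that reading. The function name, the `imbalance_ratio` parameter, and the toy data are all hypothetical; loading the actual UCR archive is left out.

```python
import random
from collections import Counter


def make_binary_imbalanced(X, y, positive_label, imbalance_ratio=9, seed=0):
    """Relabel to positive (1) vs rest (0), then subsample the positives.

    imbalance_ratio is the (assumed) target number of negatives per positive.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == positive_label]
    neg = [x for x, label in zip(X, y) if label != positive_label]
    # Keep at least one positive, and never more than are available.
    n_pos = max(1, min(len(pos), len(neg) // imbalance_ratio))
    pos = rng.sample(pos, n_pos)
    return neg + pos, [0] * len(neg) + [1] * len(pos)


# Hypothetical toy data: 30 short series in three balanced classes.
X = [[float(i), float(i + 1)] for i in range(30)]
y = ["a"] * 10 + ["b"] * 10 + ["c"] * 10

X_bin, y_bin = make_binary_imbalanced(X, y, positive_label="a")
print(Counter(y_bin))  # 20 negatives, 2 positives
```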

25/3/25 TSC with Class Imbalance – Research Plan

1. Data

(1) Univariate Imbalanced Data

Deadline: Easter 2025
Key questions:
- Do we need to remove any datasets?
- Set up resampling methods in tsml-eval

(2) Multivariate Imbalanced Data

Deadline: Summer 2025
Tasks:
- Make an imbalanced multivariate archive

(3) Big Data Imbalance

Timeline: Year 2
Focus: Handling large-scale, imbalanced time series data using downsampling

2. Algorithms

(1) SMOTE / Distance-based Extensions

Timeline: Summer
Focus areas:
- Neighbours selection in MSM space
- Sample generation guided by MSM path
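
For reference, a minimal pure-Python sketch of the Move-Split-Merge (MSM) distance and of neighbour selection under it is given below. It follows the standard dynamic-programming formulation for univariate series with a constant split/merge cost `c`. It is not aeon's implementation, the helper names are hypothetical, and generating samples along the MSM path is not covered here.

```python
def msm_cost(new, prev, other, c):
    """Cost of a split or merge step in MSM."""
    if prev <= new <= other or prev >= new >= other:
        return c
    return c + min(abs(new - prev), abs(new - other))


def msm_distance(x, y, c=1.0):
    """MSM distance between two univariate series via dynamic programming."""
    n, m = len(x), len(y)
    D = [[0.0] * m for _ in range(n)]
    D[0][0] = abs(x[0] - y[0])
    for i in range(1, n):
        D[i][0] = D[i - 1][0] + msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):
        D[0][j] = D[0][j - 1] + msm_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            D[i][j] = min(
                D[i - 1][j - 1] + abs(x[i] - y[j]),               # move
                D[i - 1][j] + msm_cost(x[i], x[i - 1], y[j], c),  # split/merge on x
                D[i][j - 1] + msm_cost(y[j], x[i], y[j - 1], c),  # split/merge on y
            )
    return D[n - 1][m - 1]


def nearest_neighbours(query, candidates, k=2, c=1.0):
    """Indices of the k candidates closest to query under MSM."""
    order = sorted(range(len(candidates)),
                   key=lambda i: msm_distance(query, candidates[i], c))
    return order[:k]


# Hypothetical toy series.
query = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0], [5.0, 5.0, 5.0]]
print(msm_distance(query, candidates[1]))     # 1.0
print(nearest_neighbours(query, candidates))  # [0, 1]
```

In a SMOTE-style extension, `nearest_neighbours` would replace the Euclidean neighbour search, so that minority neighbours are chosen by elastic rather than point-wise similarity.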

(2) Generative Models

Timeline: Year 2
Focus: Explore generative approaches (GANs, diffusion models, etc.) for imbalanced TSC using softMSM

(3) HIVE-COTE

Timeline: Year 2
Key questions:
- Investigate why HIVE-COTE performs worse with imbalanced data and why MultiRocket-Hydra does not
- Find components to compensate

(4) Downsampling Algorithm

Timeline: Year 2
For Big data imbalance
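
As a baseline for this item, the sketch below shows random undersampling: every class is cut down to the size of the smallest one. It assumes "downsampling" here means removing majority-class cases rather than shortening the series themselves. The function name and toy data are hypothetical, and the algorithm to be developed in Year 2 is not specified in these notes.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly keep as many cases per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    n_keep = min(len(cases) for cases in by_class.values())
    X_out, y_out = [], []
    for label, cases in by_class.items():
        for x in rng.sample(cases, n_keep):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 90 majority cases and 10 minority cases.
X = [[float(i), float(i + 1)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_small, y_small = random_undersample(X, y)
print(Counter(y_small))  # 10 cases of each class
```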