
PhD 2024 - Chuanhang (Chris) Qiu

tags: aeon-research

Project: Rare class classification
Supervisors: Tony Bagnall, Matthew Middlehurst and Daniel Clark
Weekly meeting time: Friday 10:30

Project proposal
The primary objective of this research is to enhance the accuracy of TSC in few-shot scenarios, characterized by scarce labeled data. Additionally, this proposal aims to build an open-source framework incorporating various deep learning-based TSC methods. The goal is to systematically summarize these methods and make fair comparisons with non-deep learning approaches. Furthermore, the research will explore the methodology of training a large-scale model with high generalization in TSC benchmarks.

Getting started tasks

Fork the aeon repository, clone it locally, then create an environment and install in editable mode:
pip install --editable .[all_extras,dev]

  • Review some of the important dependencies for developing aeon at a basic level:
    • scikit-learn: the interface that aeon estimators extend. We aim to stay as compatible as possible with sklearn tools.
    • pytest for unit testing. Any code added will have to be covered by tests.
    • sphinx/myst for documentation. New functions and classes will have to be added to the API docs.
    • numba for writing efficient functions.
  • Make a basic Pull Request (PR) to gain some experience with contributing to aeon through GitHub.
  • Add the project timeline objectives to this document.

Project overview: TSC with class imbalance

The basic idea of predictive modelling with only a few "positive" cases is an old one, but it has never been addressed for time series classification. It is related to few-shot learning and to anomaly prediction.

Year 1 objectives

1. Define the TSC class imbalance (TSC-ci) problem
2. Specify a set of datasets on which to test TSC-ci algorithms
3. Evaluate the current SOTA in TSC on TSC-ci problems
4. Specify the "anomaly prediction" use case, find real-world example data and do a case study
5. Write a comparative study paper by the end of year 1

Year 2 objectives

1. Develop better algorithms to handle class imbalance (ci): ensemble weighting schemes, recalibration, etc.
2. Relate this work to the relevant deep learning research, focusing on deep learning TSC variants
3. Develop anomaly prediction as a new field, building on anomaly detection
4. Write an original contribution paper by the end of year 2

First meeting 3/10/24

items

  1. Logistics: desk, money, meetings, pgrmanager etc
  2. Getting started tasks

Tasks
1. Get up to speed with aeon
2. Do the same with tsml-eval
https://github.com/time-series-machine-learning/tsml-eval
3. Background reading on class imbalance in machine learning
4. Install imbalanced-learn and read the docs
https://github.com/scikit-learn-contrib/imbalanced-learn
5. Set up on Overleaf and start drafting the background section

to do
Tony to get some background references

8/10/24

  1. Assess training needs
    1.9 Coding skills. Programming: audit first year programming module
    1.4. English for academic purposes (international PGRs)
    1.5. Presentation skills
    9.4. Teaching/other career development opportunities

Next stages:

  1. Class imbalance problem:
  2. Write a basic literature review: SMOTE and class imbalance in general; the relationship between class imbalance and few-shot learning
  3. Use imbalanced-learn (the scikit-learn-contrib package)
  4. Create some class imbalance/few-shot learning datasets and test some TSC algorithms on them

weekly meeting: Friday 10:30am

10/10/24

Weekly objective: understanding SMOTE and

18/10/24

Chris's work this week:
1. Understand SMOTE.
2. Run the method implemented in https://github.com/scikit-learn-contrib/imbalanced-learn on iridis
3. Write a short note on Overleaf https://www.overleaf.com/8512297167stkwycfrhmnp#b7314e

Chris wants to discuss:
1. Current understanding of imbalanced data classification, with a focus on data augmentation.
2. Whether the first paper should be a survey or something else.
3. Computer and monitor.
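
As a note on item 1, the core SMOTE step can be sketched in a few lines. This is only an illustration of the interpolation idea, not the imbalanced-learn implementation: the function name `smote_sketch`, its parameters, and the toy series are all hypothetical, and plain Euclidean distance is assumed for neighbour ranking.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic series by interpolating minority-class neighbours.

    minority: list of equal-length lists of floats (one series per entry).
    All names here are illustrative, not taken from any library.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority series to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority series by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four short series (made-up values).
minority = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5], [1.0, 2.0, 3.0], [0.2, 1.1, 2.3]]
new_series = smote_sketch(minority, n_new=5)
print(len(new_series), "synthetic series generated")
```

Because each synthetic series is a convex combination of two real ones, every time point stays within the range already seen in the minority class at that time point.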

Tony to do:

  1. To ask about monitor
  2. bring down the mac
  3. Set up tsml-eval for imbalanced learning working area

Chris to do:

  1. UCR data processing (download, convert to two-class problems)
  2. Make a benchmark with SMOTE, using some of the elastic distance functions for time series clustering
  3. Create a BibTeX file and collate references on using SMOTE for time series (https://www.overleaf.com/4191343229fkgrynbzxpkq#ba9ff0)
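
One possible reading of the data processing step is sketched below, assuming a one-vs-rest relabelling followed by thinning of the positive class. The helper `make_imbalanced_binary`, its `n_positive` parameter, and the toy data are hypothetical; the actual protocol for the UCR datasets is still to be agreed.

```python
import random
from collections import Counter


def make_imbalanced_binary(series, labels, positive_label, n_positive, seed=0):
    """Relabel a multi-class problem as one-vs-rest and keep few positives.

    positive_label becomes class 1, every other label becomes class 0.
    n_positive (hypothetical parameter) is how many positive cases to keep.
    """
    rng = random.Random(seed)
    pos = [s for s, lab in zip(series, labels) if lab == positive_label]
    neg = [s for s, lab in zip(series, labels) if lab != positive_label]
    # Never ask for more positives than exist, and keep at least one.
    n_keep = max(1, min(n_positive, len(pos)))
    pos = rng.sample(pos, n_keep)
    X = neg + pos
    y = [0] * len(neg) + [1] * len(pos)
    return X, y


# Toy three-class data set: 10 cases per class (made-up values).
series = [[float(i), float(i + 1)] for i in range(30)]
labels = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
X, y = make_imbalanced_binary(series, labels, positive_label="a", n_positive=2)
print(Counter(y))  # 20 negatives, 2 positives
```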

25/3/25 TSC with Class Imbalance – Research Plan

1. Data

(1) Univariate Imbalanced Data

  • Deadline: Easter 2025
  • Key questions:
    • Do we need to remove any datasets?
    • Set up resampling methods in tsml-eval

(2) Multivariate Imbalanced Data

  • Deadline: Summer 2025
  • Tasks:
    • Make an imbalanced multivariate archive

(3) Big Data Imbalance

  • Timeline: Year 2
  • Focus: Handling large-scale, imbalanced time series data using downsampling

2. Algorithms

(1) SMOTE / Distance-based Extensions

  • Timeline: Summer
  • Focus areas:
    • Neighbours selection in MSM space
    • Sample generation guided by MSM path
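
A minimal sketch of what neighbour selection in MSM space could look like, assuming the standard Move-Split-Merge recurrence for univariate series with a constant split/merge cost `c`. The function names, the default `c=1.0`, and the toy series are illustrative only; aeon's distances module provides its own MSM implementation, which should be preferred in practice. Sample generation along the MSM path is not covered here.

```python
def msm_cost(new, prev, other, c):
    """Cost of a split/merge step in the MSM recurrence."""
    if prev <= new <= other or prev >= new >= other:
        return c
    return c + min(abs(new - prev), abs(new - other))


def msm_distance(x, y, c=1.0):
    """Move-Split-Merge distance between two univariate series (lists of floats)."""
    m, n = len(x), len(y)
    cost = [[0.0] * n for _ in range(m)]
    cost[0][0] = abs(x[0] - y[0])
    for i in range(1, m):
        cost[i][0] = cost[i - 1][0] + msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, n):
        cost[0][j] = cost[0][j - 1] + msm_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, m):
        for j in range(1, n):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(x[i] - y[j]),  # move
                cost[i - 1][j] + msm_cost(x[i], x[i - 1], y[j], c),
                cost[i][j - 1] + msm_cost(y[j], x[i], y[j - 1], c),
            )
    return cost[m - 1][n - 1]


def msm_neighbours(query_index, minority, k=2, c=1.0):
    """Indices of the k minority series closest to the query under MSM."""
    query = minority[query_index]
    others = [j for j in range(len(minority)) if j != query_index]
    others.sort(key=lambda j: msm_distance(query, minority[j], c))
    return others[:k]


# Toy minority class (made-up values).
minority = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 1.0, 2.0, 1.1],
    [5.0, 5.0, 5.0, 5.0],
    [0.0, 1.0, 2.2, 1.0],
]
print(msm_neighbours(0, minority, k=2))
```

These neighbour indices could then replace the Euclidean neighbours in a SMOTE-style oversampler.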

(2) Generative Models

  • Timeline: Year 2
  • Focus: Explore generative approaches (GANs, diffusion models, etc.) for imbalanced TSC using softMSM

(3) HIVE-COTE

  • Timeline: Year 2
  • Key questions:
    • Investigate why HIVE-COTE performs worse with imbalanced data and why MultiRocket-Hydra does not
    • Find components to compensate

(4) Downsampling Algorithm

  • Timeline: Year 2
  • For Big data imbalance
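
As a baseline to compare any new downsampling algorithm against, random undersampling can be sketched as below. This assumes "downsampling" here means reducing the number of majority-class cases rather than shortening the series themselves; the function name `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop cases so every class has as many as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for series, label in zip(X, y):
        by_class.setdefault(label, []).append(series)
    n_keep = min(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        for series in rng.sample(items, n_keep):
            X_out.append(series)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced problem: 20 majority cases, 2 minority cases (made-up values).
X = [[float(i), float(i) * 2.0] for i in range(22)]
y = [0] * 20 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))
```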