# Intern Summer 2024 - Adam and forecasting
###### tags: `aeon-intern`
__Contributor:__ Adam Unal
__Project:__ forecasting with aeon (TBC)
__Project length:__ TBC
__Mentors:__ Tony Bagnall, Matthew Middlehurst
__Start date:__ 17/06/2024
__End date:__ TBC
__Regular meeting time:__ 12:30, Friday
## Project Summary
This project will investigate algorithms for forecasting based on traditional machine learning (tree-based) and time series machine learning (transformation-based). Note this project will not involve deep-learning-based forecasting. It will involve helping develop the aeon framework to work more transparently with ML algorithms, evaluating regression algorithms already in aeon [1] for forecasting problems, and implementing at least one algorithm from the literature not already in aeon, such as SETAR-Tree [3].
[1] Guijo-Rubio, D., Middlehurst, M., Arcencio, G., Furtado, D. and Bagnall, A. Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression, arXiv:2305.01429, 2023
[2] https://forecasters.org/resources/time-series-data/
[3] Godahewa, R., Webb, G.I., Schmidt, D. et al. SETAR-Tree: a novel and accurate tree algorithm for global time series forecasting. Mach Learn 112, 2555–2591 (2023). https://link.springer.com/article/10.1007/s10994-023-06316-x
## Project Timeline
Project Stages:
1. Learn about aeon best practices, coding standards and testing policies.
2. Adapt the M competition set-up [2] into an ML experimental framework to assess time series regression algorithms [1].
3. Implement a machine learning forecasting algorithm [3].
## Getting started tasks
- [x] Introduce yourself in the community Slack channels. Use __#introductions__ to introduce yourself to the wider community if you have not already and __#summer-2024__ to introduce yourself and your project to other students and mentors.
- [x] Go through the contributor guide on the _aeon_ website (https://www.aeon-toolkit.org/en/stable/contributing.html).
- [x] Set up a development environment, including _pytest_ and _pre-commit_ dependencies. This will make development a lot easier for you, as you must pass the PR tests to have your code merged (https://www.aeon-toolkit.org/en/stable/developer_guide/dev_installation.html).
- [x] Review some of the important dependencies for developing aeon at a basic level:
    - [x] __scikit-learn__, the interface aeon estimators extend. We aim to stay as compatible as possible with scikit-learn tools.
- [x] __pytest__ for unit testing. Any code added will have to be covered by tests.
    - [x] __sphinx/myst__ for documentation. New functions and classes will have to be added to the API docs.
- [x] __numba__ for writing efficient functions.
- [x] Make some basic Pull Requests (PRs) to gain some experience with contributing to _aeon_ through GitHub.
- [ ] Add the project timeline objectives to this document.
# Make notes of progress here
This is just an informal place to make notes on objectives and progress.
## Meeting 21/6/24
Discussed plans to remove the existing forecasting module. Short-term goals:
1. Learn how to use Nixtla.
2. Form a list of M4 competition datasets to use in evaluation.
3. Design a preliminary experiment in tsml-eval.
## 12/7/24
Ported Hyndman's `etscalc` function and compared its output to Nixtla's implementation.
Short-term goals: write a test function for `fit_ets`, set up a timing experiment for different input lengths, and compare performance with and without Numba.
## 26/7/24
Accomplished the short-term goals recently set: writing test functions for `fit_ets` and setting up a timing experiment to determine the effectiveness of using Numba.
Two test functions were written: one checking the shape and type of the function's output (`test_fit_etsc`), and the other checking the correctness of the output for a given input against the output of Nixtla's implementation (`test_fit_etsc_output`).
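The testing pattern (one output-contract test, one known-value test) can be illustrated with a toy stand-in. `simple_es` below is a hypothetical placeholder for this sketch, not the ported `fit_ets`:

```python
def simple_es(y, alpha=0.5):
    """Toy simple exponential smoothing: returns the fitted level at each step."""
    level = y[0]
    fitted = [level]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
        fitted.append(level)
    return fitted


def test_shape_and_type():
    # output contract: a list the same length as the input
    y = [3.0, 5.0, 4.0, 6.0]
    fitted = simple_es(y)
    assert isinstance(fitted, list)
    assert len(fitted) == len(y)


def test_known_output():
    # with alpha=0.5, the level after [2, 4] is 0.5 * 4 + 0.5 * 2 = 3.0
    assert simple_es([2.0, 4.0], alpha=0.5) == [2.0, 3.0]


test_shape_and_type()
test_known_output()
print("tests passed")
```

In the real tests, the known-value reference comes from Nixtla's implementation rather than a hand-computed expectation.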
The timing experiment measured the runtime of the function for different input sizes, with and without Numba, to compare the speed difference. The results showed that Numba reduced runtime by roughly a factor of 100 for all input sizes (see Figure below).
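The measurement side of such an experiment can be sketched with the standard library alone. The `smooth` function here is an illustrative stand-in workload, not the actual aeon code; in the real experiment the same loop would be timed twice, once on the plain function and once on its `@njit`-compiled version:

```python
import time


def smooth(y, alpha=0.3):
    """Stand-in workload: a simple exponential smoothing pass, returns the final level."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


for n in (1_000, 10_000, 100_000):
    y = [float(i % 7) for i in range(n)]
    start = time.perf_counter()
    smooth(y)
    elapsed = time.perf_counter() - start
    print(f"n={n:>7}: {elapsed * 1e3:.3f} ms")
```

When timing a numba function, the first (compiling) call should be excluded, e.g. by calling the function once before starting the timer.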
