
# Forecasting module notes

###### tags: `aeon-research`

Some notes on the design of the new forecasting module. The driving principle is to separate fit and predict as much as possible and to conform to standard use cases. We start with a review of other packages.

## nixtla

https://github.com/Nixtla/statsforecast

The code is generated from notebooks using nbdev (the notebooks live under `nbs/`). The quick start says to use this:

```python
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import AirPassengersDF

df = AirPassengersDF
sf = StatsForecast(
    models = [AutoARIMA(season_length = 12)],
    freq = 'M'
)
sf.fit(df)
sf.predict(h=12, level=[95])
```

This will only work with a dataframe formatted in their specific way, and not with a numpy array. From their docs

https://nixtlaverse.nixtla.io/statsforecast/docs/getting-started/getting_started_complete.html

The input to `StatsForecast` is always a dataframe in long format with three columns: `unique_id`, `ds` and `y`. The design principle is to train multiple models at once (on one or more series, identified by `unique_id`), so `models` must be a list. `freq` is also a required parameter.

You can try to build a model directly. Looking at `ETS`: it is due for deprecation and extends `AutoETS`.

```python
from statsforecast.models import ETS
from aeon.datasets import load_airline

airline = load_airline()
a = airline.to_numpy()  # plain 1D numpy array
ets = ETS(season_length=4)
ets.fit(a)
x = ets.predict(h=1)  # the horizon h is required
print(x)
print(type(x))
```

*Note*

1. Must have horizon `h` in `predict`.
2. Returns a dictionary: the output is `{'mean': array([414.736988])}` with type `<class 'dict'>`.

Tracing the execution to the code generated by nbs/src/core/models.ipynb, `fit` has a more understandable interface:

```python
def fit(
    self,
    y: np.ndarray,
    X: Optional[np.ndarray] = None,
):
    r"""Fit the Exponential Smoothing model.

    Fit an Exponential Smoothing model to a time series (numpy array) `y`
    and optionally exogenous variables (numpy array) `X`.

    Parameters
    ----------
    y : numpy.array
        Clean time series of shape (t, ).
    X : array-like
        Optional exogenous of shape (t, n_x).

    Returns
    -------
    self :
        Exponential Smoothing fitted model.
    """
```

The class contains a `model_` object, created in `fit` by a call to `ets_f`:

```python
    y = _ensure_float(y)
    self.model_ = ets_f(
        y, m=self.season_length, model=self.model, damped=self.damped, phi=self.phi
    )
    self.model_["actual_residuals"] = y - self.model_["fitted"]
    self._store_cs(y=y, X=X)
    return self
```

The function `ets_f` is in the file `ets`, generated by nbs/src/ets.ipynb. Notes on it:

- It does not use njit.
- It has all the ETS parameters as arguments, set to infinity if not passed, and it checks for a constant series.
- It has a weird `model` argument, which I will ignore for now. It defaults to none; I think it is for updating from a previous fit.
- It does lots of validation and conversion of parameter arguments into ETS arguments.

It then loops through parameter combinations:

```python
for etype in errortype:
    for ttype in trendtype:
        for stype in seasontype:
            for dtype in damped:
                ...  # fit one candidate model (see the call below)
```

Ultimately it comes down to this call:

```python
fit = etsmodel(
    y,
    m,
    etype,
    ttype,
    stype,
    dtype,
    alpha,
    beta,
    gamma,
    phi,
    lower=lower,
    upper=upper,
    opt_crit=opt_crit,
    nmse=nmse,
    bounds=bounds,
    maxit=maxit,
)
```

All of the generated code is undocumented. `etsmodel` is not njit. It takes 19 parameters (the last three are ignored in the call above) and returns a dictionary with 16 entries. I strongly suspect it is adapted from `etscalc`, but it is very confusing. The core structure seems to be:

```python
# Initialise parameters
par_ = initparam(
    alpha, beta, gamma, phi, trendtype, seasontype, damped, lower, upper, m, bounds
)
# Initialise state
init_state = initstate(y, m, trendtype, seasontype)
# Do something to fred?
fred = optimize_ets_target_fn(...)
# Fit the actual model?
amse, e, states, lik = pegelsresid_C(
    y,
    m,
    init_state,
    errortype,
    trendtype,
    seasontype,
    damped,
    alpha,
    beta,
    gamma,
    phi,
    nmse,
)
# Update stats.
pegelsresid_C(...)
```

Here we finally get to actually fitting a model.
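
To make the initialise / optimise / compute-residuals structure above concrete, here is a minimal sketch for the simplest case only, ETS(A,N,N) (simple exponential smoothing). It is not the statsforecast code: the names `ses_pass`, `fit_ses` and `predict_ses` are hypothetical, a plain grid search over `alpha` stands in for their optimiser, the initial level is simply the first observation, and the sum of squared one-step errors is the only criterion.

```python
def ses_pass(y, alpha, level0):
    """Run the ETS(A,N,N) recursion once; return (residuals, final level)."""
    level = level0
    residuals = []
    for obs in y:
        error = obs - level            # one-step-ahead forecast error
        residuals.append(error)
        level = level + alpha * error  # level update
    return residuals, level


def fit_ses(y, grid_size=99):
    """Pick alpha by grid search on the sum of squared one-step errors."""
    level0 = y[0]  # assumption: crude initial state, not statsforecast's initstate
    best = None
    for i in range(1, grid_size + 1):
        alpha = i / (grid_size + 1)  # candidate alpha strictly inside (0, 1)
        residuals, level = ses_pass(y, alpha, level0)
        sse = sum(e * e for e in residuals)
        if best is None or sse < best["sse"]:
            best = {"alpha": alpha, "sse": sse, "level": level, "residuals": residuals}
    return best


def predict_ses(model, h):
    """SES forecasts are flat: every horizon gets the final level."""
    return [model["level"]] * h


y = [1.0, 2.0, 3.0, 4.0, 5.0]
model = fit_ses(y)
forecast = predict_ses(model, h=3)
print(model["alpha"], forecast)
```

Everything `predict_ses` needs is stored in the fitted dictionary, so prediction takes only the horizon `h`. That is the kind of fit/predict separation these notes are aiming for.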
In statsforecast, the call that does the work is:

```python
lik = _ets.calc(
    x,
    e,
    amse,
    nmse,
    y,
    switch(errortype),
    switch(trendtype),
    switch(seasontype),
    alpha,
    beta,
    gamma,
    phi,
    m,
)
```

`_ets.calc` is almost certainly the adapted Hyndman function `etscalc`. However, at this point they obfuscate their code beyond my desire to look further. I think it is using `pybind11_builtins` to hide the true source.

## statsmodels

## sktime