# Streamlining Time Series Analysis with sktime

楊仁樞 Ren-Shu Yang

## Introduction

Time series analysis involves examining data points collected or observed over successive time intervals. This process aims to uncover underlying patterns, trends, and seasonal variations to predict future values and understand the mechanisms generating the data. However, analyzing time series data presents unique challenges that require specialized tools and techniques.

### Challenges in Time Series Analysis

* **Non-stationarity:** A common hurdle in time series analysis is non-stationarity, where the statistical properties of the data, such as mean and variance, change over time. This violates the assumptions of many traditional statistical methods, making them less effective.
* **Noise and Irregularities:** Time series data often contains noise, outliers, and irregular fluctuations that can obscure underlying patterns and make accurate analysis difficult.

### sktime to the Rescue!

To address these challenges, we can leverage the power of machine learning with the `sktime` library. `sktime` is a user-friendly and versatile Python library specifically designed for tackling machine learning problems involving time series data. Think of it as the "scikit-learn" for time series, offering a unified and consistent interface for various time series tasks.

#### Key Features and Capabilities

* **Comprehensive Algorithms:** `sktime` provides a wide range of algorithms for time series classification, regression, clustering, forecasting, and more. This includes both classical statistical methods and modern machine learning approaches.
* **Modular Design:** `sktime` follows a modular design, allowing you to easily combine different components (transformers, estimators, pipelines) to build complex workflows. This flexibility enables you to customize your analysis to suit your specific needs.
* **Interoperability with scikit-learn:** `sktime` seamlessly integrates with scikit-learn, leveraging its familiar API and tools for tasks like hyperparameter tuning, model selection, and pipeline construction.
* **Growing Ecosystem:** `sktime` is actively developed and maintained, with a growing community and expanding functionality.

## Time Series Models

This section explores two fundamental time series models: ETS (Error, Trend, Seasonality) and ARIMA (Autoregressive Integrated Moving Average). We'll examine their components, provide code implementations with `sktime`, and illustrate their application using the airline dataset.

### ETS Model

The ETS model decomposes a time series into three core components: **Error**, **Trend**, and **Seasonality**. This decomposition helps us understand the underlying patterns and forecast future values.

* **Error:** Represents the random fluctuations or noise in the series.
* **Trend:** Captures the long-term direction of the series, such as upward or downward movement.
* **Seasonality:** Identifies repeating patterns with a fixed periodicity, like daily, weekly, or yearly cycles.

#### Exponential Smoothing: A Building Block

Exponential Smoothing is a foundational technique within the ETS framework. It forecasts future values by assigning exponentially decreasing weights to past observations. The simplest form uses the following formula:

$$ \hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t $$

where:

* $\hat{y}_{t+1}$ is the forecast for the next period, and $\hat{y}_t$ is the forecast for the current period.
* $y_t$ is the actual value at the current period.
* $\alpha$ is the smoothing factor ($0 \leq \alpha \leq 1$), controlling the weight given to the most recent observation. A higher $\alpha$ gives more weight to recent data.

#### Code Example: ETS with sktime

This code snippet demonstrates how to use `sktime`'s `AutoETS` to fit an ETS model with an explicitly specified additive seasonal component (`sp=12` for monthly data) to the airline dataset and generate forecasts. Setting `auto=True` instead would let `AutoETS` select the best ETS configuration automatically.

```python=
import warnings
import pandas as pd
from sktime.datasets import load_airline
from sktime.forecasting.ets import AutoETS
from sktime.forecasting.model_selection import temporal_train_test_split

# Suppress warnings
warnings.filterwarnings('ignore')

# Load the airline dataset
y = load_airline()

# Split data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=10)

# Initialize and fit the AutoETS model with explicit seasonal periodicity
forecaster = AutoETS(auto=False, seasonal="additive", sp=12)
forecaster.fit(y_train)

# Create a forecasting horizon
forecast_horizon = [i + 1 for i in range(len(y_test))]

# Generate predictions
y_pred = forecaster.predict(fh=forecast_horizon)

# Combine actual and predicted values into a DataFrame
output = pd.DataFrame({
    "Actual": y_test.values,
    "Predicted": y_pred.values
}, index=y_test.index)

# Print the results
print("=== Actual vs Predicted Values ===")
print(output)
```

![d527a445-25d3-4a29-9948-64fa9dab32e1](https://hackmd.io/_uploads/Hk2fADZL1e.png)

```
=== Actual vs Predicted Values ===
         Actual   Predicted
Period
1960-03   419.0  451.994293
1960-04   461.0  438.698238
1960-05   472.0  457.430750
1960-06   535.0  511.220195
1960-07   622.0  578.053105
1960-08   606.0  578.834752
1960-09   508.0  474.773403
1960-10   461.0  416.029592
1960-11   390.0  368.198395
1960-12   432.0  404.993453
```

### ARIMA Model

ARIMA stands for **Autoregressive Integrated Moving Average**. It's a versatile model that combines three components:

* **Autoregressive (AR):** Models the relationship between an observation and a specified number of lagged observations.
* **Integrated (I):** Incorporates differencing to make the time series stationary, a requirement for ARIMA modeling.
* **Moving Average (MA):** Models the relationship between an observation and past forecast errors (residuals).

#### Mathematical Representation

* **$\rm{AR}(p)$ model:**
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t$$
where $p$ is the order of the AR model, $\phi_i$ are the AR coefficients, and $w_t$ is white noise.
* **$\rm{MA}(q)$ model:**
$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}$$
where $q$ is the order of the MA model, $\theta_i$ are the MA coefficients, and $w_t$ is white noise.
* **$\rm{ARMA}(p, q)$ model:** Combines $\rm{AR}(p)$ and $\rm{MA}(q)$ models.
$$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q}$$
* **$\rm{ARIMA}(p, d, q)$ model:** Includes the order of differencing $d$ to achieve stationarity.

#### Code Example: ARIMA with sktime

This code demonstrates how to apply an ARIMA model to the airline dataset using `sktime`. You can experiment with different values for `p`, `d`, and `q` to find the best ARIMA configuration for your data.
```python=
import warnings
import pandas as pd
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.base import ForecastingHorizon
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split

# Suppress warnings
warnings.filterwarnings('ignore')

# Load the airline dataset
y = load_airline()

# Split data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=6)

# Define the forecasting horizon
forecast_horizon = ForecastingHorizon(y_test.index, is_relative=False)

# Initialize and fit the ARIMA model with specified parameters
forecaster = ARIMA(order=(2, 1, 2))  # Specify ARIMA(p, d, q)
forecaster.fit(y_train)

# Generate predictions
y_pred = forecaster.predict(fh=forecast_horizon)

# Combine actual and predicted values into a DataFrame
results = pd.DataFrame({
    "Actual": y_test.values,
    "Predicted": y_pred.values
}, index=y_test.index)

# Output the results as text
print("=== Actual vs Predicted Values ===")
print(results)
```

![d49ee66b-05b8-4f8f-8034-38e541024f24](https://hackmd.io/_uploads/Sk-sAwWIyx.png)

```
=== Actual vs Predicted Values ===
         Actual   Predicted
Period
1960-07   622.0  548.679570
1960-08   606.0  539.754674
1960-09   508.0  513.095688
1960-10   461.0  477.792642
1960-11   390.0  444.427132
1960-12   432.0  422.184452
```

## Advanced Concepts in Time Series Analysis

This section delves into advanced concepts that enhance the accuracy and sophistication of time series analysis.

### Exogenous Features

Exogenous features, also known as exogenous variables or external regressors, are variables that are not part of the time series itself but can influence its behavior. These are external factors that provide additional information or context to the time series. Examples include:

* Economic indicators (GDP, inflation, interest rates) for stock prices
* Weather data (temperature, rainfall, snowfall) for sales volume
* Social media trends for sales volume
* Promotional events or holidays for sales volume

#### Importance

Incorporating exogenous features into time series analysis can significantly improve model accuracy and provide valuable insights. Here's why:

* **Enhanced Predictive Power:** Exogenous features can capture variations in the time series that are not explained by its own past values. This leads to more accurate predictions.
* **Causal Insights:** By including relevant exogenous features, you can uncover potential causal relationships between external factors and the time series. For example, you might find that increased marketing spending leads to higher sales.
* **Improved Robustness:** Models that incorporate exogenous features are often more robust to changes in the underlying patterns of the time series. This is because they are less reliant on the assumption that past patterns will perfectly repeat in the future.

#### Example: Longley Dataset

* **Motivation:** The Longley dataset is a classic example in statistics and econometrics, often used to illustrate the challenges of multicollinearity (high correlation between predictor variables) and small sample sizes. It's helpful for demonstrating how exogenous features can improve model performance even with these challenges.
* **Variables:** The Longley dataset includes annual economic data for the United States from 1947 to 1962.
The variables are: * `TOTEMP`: Total employment (in thousands of persons) * `GNPDEFL`: GNP deflator (a measure of price inflation) * `GNP`: Gross National Product (in millions of dollars) * `UNEMP`: Number of unemployed (in thousands of persons) * `ARMED`: Number of people in the armed forces (in thousands) * `POP`: Noninstitutionalized population over 14 years of age (in thousands) ```python= import warnings import pandas as pd from sktime.datasets import load_longley from sktime.performance_metrics.forecasting import mean_absolute_percentage_error from sktime.forecasting.arima import ARIMA from sktime.forecasting.model_selection import temporal_train_test_split # Suppress warnings warnings.filterwarnings('ignore') # Load the Longley dataset y, X = load_longley() # Split the data into training and testing sets y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=3) # Initialize ARIMA models with and without exogenous features forecaster_without_X = ARIMA() forecaster_with_X = ARIMA() # Define the forecasting horizon forecast_horizon = [1, 2, 3] # Fit the models forecaster_without_X.fit(y_train, fh=forecast_horizon) forecaster_with_X.fit(y_train, X=X_train, fh=forecast_horizon) # Make predictions y_pred_no_X = forecaster_without_X.predict() y_pred_with_X = forecaster_with_X.predict(X=X_test) # Evaluate performance (using Mean Absolute Percentage Error) mape_no_X = mean_absolute_percentage_error(y_test, y_pred_no_X) mape_with_X = mean_absolute_percentage_error(y_test, y_pred_with_X) # Print the MAPE results print(f"MAPE without exogenous features: {mape_no_X:.5f}") print(f"MAPE with exogenous features: {mape_with_X:.5f}") # Combine actual and predicted values into a DataFrame for comparison comparison = pd.DataFrame({ "Actual": y_test.values, "Predicted (No X)": y_pred_no_X.values, "Predicted (With X)": y_pred_with_X.values }, index=y_test.index) # Print the comparison table print("\n=== Actual vs Predicted Values ===") print(comparison) ``` ![B1cQcKtG1l](https://hackmd.io/_uploads/HJXuJOW8ke.png) ``` MAPE without exogenous features: 0.02825 MAPE with exogenous features: 0.02605 === Actual vs Predicted Values === Actual Predicted (No X) Predicted (With X) Period 1960 69564.0 68216.665473 70371.821285 1961 69331.0 67823.598319 70872.801273 1962 70551.0 67471.123718 73677.403123 ``` ### Probabilistic Forecasting Probabilistic forecasting goes beyond providing a single, definitive prediction. Instead, it offers a range of possible future outcomes, each with an associated probability. This approach acknowledges the inherent uncertainty in predicting the future and provides a more complete picture of what might happen. #### Key Concepts * **Probability Distributions:** Probabilistic forecasts often present results in the form of probability distributions. These distributions show the likelihood of different outcomes. For example, a forecast might indicate that there's a 60% chance of sales falling between $1 million and $1.2 million next quarter. * **Interval Forecasts:** These forecasts provide a range of values where the future outcome is likely to fall, along with a confidence level (e.g., a 95% prediction interval). The confidence level indicates how certain we are that the actual value will fall within that range. * **Scenarios and Risk Assessment:** Probabilistic forecasting allows you to explore different scenarios and assess the associated risks. 
By considering a range of possibilities, you can better prepare for different outcomes and make more informed decisions.

#### Benefits

* **Improved Decision-Making:** Probabilistic forecasts provide decision-makers with a more complete understanding of the uncertainties involved. This allows for more informed and robust decision-making, as strategies can be evaluated based on the likelihood of different outcomes.
* **Risk Management:** By considering a range of possibilities and their probabilities, probabilistic forecasting helps with risk assessment and mitigation. You can identify potential risks and their likelihood, allowing you to take proactive steps to minimize their impact.
* **Transparency and Realism:** Probabilistic forecasting acknowledges that the future is uncertain. This transparency provides a more realistic view of potential outcomes and avoids overconfidence in predictions.
* **Enhanced Accuracy:** Probabilistic forecasting methods often use more sophisticated models that can capture complex patterns and dependencies in the data. This can lead to more accurate predictions compared to traditional point forecasts.

#### Example: Prediction Intervals

A prediction interval is a range of values where we expect the future value of a time series to fall with a certain level of confidence. For example, a 95% prediction interval means we are 95% confident that the actual future value will be within that interval.

This code demonstrates how to generate forecasts with prediction intervals using `sktime`'s `AutoARIMA` model. The `predict_interval` method returns the prediction intervals, and `coverage=[0.9]` specifies a 90% coverage level; the lower and upper bounds are then combined with the point forecasts into a single results table.
```python= import warnings import numpy as np import pandas as pd from sktime.forecasting.arima import AutoARIMA from sktime.datasets import load_airline from sktime.forecasting.model_selection import temporal_train_test_split # Suppress warnings warnings.filterwarnings('ignore') # Load the airline dataset y = load_airline() # Split into training and testing sets y_train, y_test = temporal_train_test_split(y, test_size=12) # Initialize and fit the AutoARIMA model forecaster = AutoARIMA(sp=12, suppress_warnings=True) # sp=12 for monthly seasonality forecaster.fit(y_train) # Define the forecasting horizon forecast_horizon = np.arange(1, len(y_test) + 1) # Generate predictions and prediction intervals y_pred = forecaster.predict(fh=forecast_horizon) prediction_intervals = forecaster.predict_interval(fh=forecast_horizon, coverage=[0.9]) pred_interval_90 = prediction_intervals['Number of airline passengers'][0.9] # Combine actual values, predicted values, and prediction intervals into a DataFrame results = pd.DataFrame({ "Actual": y_test.values, "Predicted": y_pred.values, "Lower Bound (90%)": pred_interval_90['lower'].values, "Upper Bound (90%)": pred_interval_90['upper'].values }, index=y_test.index) # Print the results print("=== Forecast Results with 90% Prediction Intervals ===") print(results) ``` ![rJjv02Xzyg](https://hackmd.io/_uploads/BJE6y_-UJl.png) ``` === Forecast Results with 90% Prediction Intervals === Actual Predicted Lower Bound (90%) Upper Bound (90%) Period 1960-01 417.0 419.967172 403.428671 436.505674 1960-02 391.0 399.849851 379.615533 420.084168 1960-03 419.0 457.994382 434.223590 481.765174 1960-04 461.0 444.474156 418.904153 470.044159 1960-05 472.0 464.789179 437.942467 491.635891 1960-06 535.0 514.125441 486.525883 541.724999 1960-07 622.0 587.804170 559.706498 615.901842 1960-08 606.0 597.010823 568.602835 625.418811 1960-09 508.0 499.531269 470.921824 528.140714 1960-10 461.0 442.359679 413.622040 471.097318 1960-11 390.0 396.410217 367.589784 425.230650 1960-12 432.0 438.651595 409.778046 467.525145 ``` ### Forecasting Pipelines A forecasting pipeline is a structured and sequential workflow that automates the process of analyzing time series data and generating predictions. It's like an assembly line for time series analysis, where each step prepares the data or model for the next step. This approach is widely used in various fields, including finance, supply chain management, and weather forecasting, to improve the efficiency and reliability of predictions. #### Standard Steps A typical forecasting pipeline consists of the following steps: 1. **Data Collection:** Gathering the relevant historical time series data. 2. **Data Preprocessing:** Cleaning and preparing the data, which includes handling missing values, smoothing outliers, and transforming the data into a suitable format for modeling. 3. **Feature Engineering:** Creating new features from the raw data to improve model accuracy. This might involve adding lagged variables, calculating rolling statistics, or incorporating external data sources. 4. **Model Selection:** Choosing an appropriate forecasting model based on the characteristics of the data and the forecasting task. 5. **Training and Validation:** Training the chosen model on a portion of the data and validating its performance on a separate held-out set. This helps ensure the model generalizes well to new, unseen data. 6. 
**Evaluation:** Assessing the model's performance using appropriate metrics (e.g., Mean Absolute Error, Root Mean Squared Error). #### Advantages Using forecasting pipelines offers several benefits: * **Efficiency and Automation:** Pipelines automate repetitive tasks, saving time and effort. This allows you to quickly experiment with different models and configurations. * **Consistency and Accuracy:** By following a structured process, pipelines minimize the risk of errors and ensure that each step is performed consistently. This leads to more reliable and reproducible results. * **Reproducibility:** Pipelines make it easy to reproduce your analysis and share it with others. This is essential for collaboration and ensuring the integrity of your results. #### Example: Airline Dataset **Goal:** To demonstrate a forecasting pipeline, we'll use the `sktime` library to forecast the number of airline passengers. This code demonstrates a simple forecasting pipeline in `sktime`. It includes steps for log-transforming the data, removing seasonality, and applying an ARIMA model. The `TransformedTargetForecaster` makes it easy to chain these steps together. ```python= import warnings import pandas as pd from sktime.forecasting.compose import TransformedTargetForecaster from sktime.transformations.series.boxcox import LogTransformer from sktime.transformations.series.detrend import Deseasonalizer from sktime.datasets import load_airline from sktime.forecasting.model_selection import temporal_train_test_split from sktime.forecasting.arima import AutoARIMA # Suppress warnings warnings.filterwarnings('ignore') # Load the airline dataset y = load_airline() # Split the data into training and testing sets y_train, y_test = temporal_train_test_split(y, test_size=12) # Define the forecasting horizon forecast_horizon = list(range(1, len(y_test) + 1)) # Create a forecasting pipeline forecaster = TransformedTargetForecaster([ ("log", LogTransformer()), # Log transform the data ("deseasonalize", Deseasonalizer(sp=12, model="multiplicative")), # Remove seasonality ("arima", AutoARIMA(sp=12, suppress_warnings=True)) # Apply an ARIMA model ]) # Fit the forecaster to the training data forecaster.fit(y_train) # Generate forecasts y_pred = forecaster.predict(fh=forecast_horizon) # Combine actual and predicted values into a DataFrame results = pd.DataFrame({ "Date": y_test.index, "Actual": y_test.values, "Predicted": y_pred.values }) # Print the results print("=== Forecast Results ===") print(results) ``` ![Bkhok_mMke](https://hackmd.io/_uploads/Sk_1xuZ81g.png) ``` === Forecast Results === Date Actual Predicted 0 1960-01 417.0 419.786274 1 1960-02 391.0 410.730031 2 1960-03 419.0 485.746060 3 1960-04 461.0 468.291993 4 1960-05 472.0 478.422005 5 1960-06 535.0 551.858214 6 1960-07 622.0 625.563411 7 1960-08 606.0 628.724658 8 1960-09 508.0 540.088068 9 1960-10 461.0 466.099623 10 1960-11 390.0 402.834730 11 1960-12 432.0 462.846638 ``` ### Parameter Estimation Parameter estimation is the process of finding the optimal values for the parameters of a time series model. These parameters control the model's behavior and how it captures patterns in the data. Accurate parameter estimation is essential for: * **Model Fit and Accuracy:** The estimated parameters determine how well the model fits the historical data. Accurate estimation leads to a model that closely reflects the underlying patterns, resulting in lower errors and better predictions. 
* **Forecasting Precision:** Parameters influence how the model extrapolates patterns into the future. Well-estimated parameters improve the accuracy and reliability of forecasts. * **Stability and Robustness:** Models with accurately estimated parameters are more stable and less prone to overfitting (fitting the noise in the data) or underfitting (failing to capture important patterns). This makes the model more generalizable to new, unseen data. #### Point Estimation This approach provides a single, "best-guess" value for each parameter. It's like finding the most likely value based on the available data. * **Pros:** * Simple to understand and interpret. * Computationally efficient. * **Cons:** * Doesn't provide information about the uncertainty associated with the estimate. * Can be misleading if the sample size is small or the model assumptions are violated. #### Interval Estimation This method provides a range of plausible values for a parameter, along with a confidence level. This range, called a confidence interval, quantifies the uncertainty in the estimate. * **Pros:** * Provides a measure of uncertainty, giving a more complete picture of the parameter. * Useful for decision-making and risk assessment, as it allows you to consider a range of possibilities. * **Cons:** * Can be more computationally intensive. * Requires careful interpretation of confidence intervals. ##### Comparison and Use Cases * **Point estimation** is often used when a quick and easily interpretable estimate is needed, and you have high confidence in the data and model assumptions. * **Interval estimation** is more appropriate when it's important to understand the uncertainty in the parameter estimates, especially when dealing with limited data or high-stakes decisions. #### Code Example This code demonstrates how to access the estimated parameters of an `AutoARIMA` model in `sktime` using `get_fitted_params()`. It also shows how to calculate the MAPE to evaluate the model's forecasting accuracy. The estimated parameters and the MAPE provide insights into how well the model has learned from the data and its ability to generate accurate forecasts. 
```python=
import warnings
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.arima import AutoARIMA

# Suppress warnings
warnings.filterwarnings('ignore')

# Load the airline dataset
y = load_airline()

# Split the data into training and test sets
y_train, y_test = temporal_train_test_split(y, test_size=12)

# Initialize and fit an AutoARIMA model
forecaster = AutoARIMA(sp=12, suppress_warnings=True)
forecaster.fit(y_train)

# Access the estimated parameters
fitted_params = forecaster.get_fitted_params()
print("Estimated parameters:", fitted_params)

# Define the forecasting horizon
forecast_horizon = list(range(1, len(y_test) + 1))

# Generate forecasts
y_pred = forecaster.predict(fh=forecast_horizon)

# Calculate the Mean Absolute Percentage Error (MAPE)
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"Mean Absolute Percentage Error (MAPE): {mape:.2f}")
```

```
Estimated parameters: {'intercept': 5.534100348886643, 'ar.L1': 0.7048918277963387, 'ar.L2': 0.25742151823961595, 'ar.L3': -0.14344774118809428, 'sigma2': 101.09689149992465, 'order': (3, 0, 0), 'seasonal_order': (0, 1, 0, 12), 'aic': 905.6860046595817, 'aicc': 906.2123204490554, 'bic': 919.623463373492, 'hqic': 911.3460709571818}
Mean Absolute Percentage Error (MAPE): 0.03
```

### Parameter Tuning

Parameter tuning is the process of finding the optimal values for a model's hyperparameters to improve its performance on a given task. Unlike model parameters, which are learned from the training data, hyperparameters are set before training and control how the model learns or its structure.

#### Key Concepts in Parameter Tuning

* **Hyperparameters:** These are values that govern the behavior of the learning algorithm or the architecture of the model. Examples include:
    * Learning rate in machine learning models.
    * Number of lags in an ARIMA model for time series.
    * Depth of decision trees in ensemble models like Random Forest.
* **Tuning Process:** The goal of tuning is to optimize hyperparameters to minimize a model's error or maximize its performance on a validation dataset. Popular techniques include:
    * Grid Search: Exhaustively tests combinations of hyperparameter values.
    * Random Search: Randomly samples hyperparameter combinations for testing (a minimal sketch follows the advantages list below).

#### Advantages of Parameter Tuning

* **Improved Model Performance:** Properly tuned hyperparameters can significantly enhance a model's accuracy, precision, or other performance metrics. In time series forecasting, well-tuned parameters can better capture trends and seasonality, improving forecast reliability.
* **Reduced Overfitting and Underfitting:** Tuning allows finding the right balance between model complexity and generalization. For instance, in decision trees, depth tuning prevents overfitting on training data.
* **Better Resource Utilization:** Optimizing hyperparameters ensures that computational resources (e.g., training time and memory) are used efficiently by reducing unnecessary experimentation with suboptimal settings.
* **Adaptation to Specific Problems:** Tailoring hyperparameters to the dataset or specific problem ensures the model is appropriately configured, leading to domain-specific insights or performance.
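Grid search is demonstrated in the code implementation below. For the random-search alternative mentioned above, `sktime` provides `ForecastingRandomizedSearchCV`; the following is a minimal sketch using the same shampoo-sales setup as the grid-search example, with a reduced parameter space and `n_iter=5` chosen arbitrarily for illustration.

```python=
import warnings
from sktime.datasets import load_shampoo_sales
from sktime.forecasting.ets import AutoETS
from sktime.forecasting.model_selection import ForecastingRandomizedSearchCV
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError
from sktime.split import ExpandingWindowSplitter, temporal_train_test_split

warnings.filterwarnings('ignore')

# Load and split the shampoo sales dataset
y = load_shampoo_sales()
y_train, y_test = temporal_train_test_split(y, test_size=6)

# Candidate values are sampled randomly instead of tested exhaustively
param_distributions = {
    "error": ["add", "mul"],
    "trend": ["add", "mul"],
    "sp": [4, 6, 12],
}

# Expanding-window cross-validation, as in the grid-search example below
cv = ExpandingWindowSplitter(fh=range(1, 7), initial_window=12, step_length=1)

rscv = ForecastingRandomizedSearchCV(
    forecaster=AutoETS(auto=False),
    cv=cv,
    param_distributions=param_distributions,
    n_iter=5,  # number of random draws; much cheaper than the full grid
    scoring=MeanAbsolutePercentageError(symmetric=True),
    random_state=0,
)

# Fit the search and inspect the best sampled configuration
rscv.fit(y_train, fh=range(1, 7))
print("Best parameters:", rscv.best_params_)
```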
#### Summary

Parameter tuning is a critical step in developing high-performing models, allowing them to maximize predictive power, adapt to data characteristics, and efficiently utilize resources.

#### Code Implementation

```python=
import warnings
from sktime.split import ExpandingWindowSplitter
from sktime.datasets import load_shampoo_sales
from sktime.split import temporal_train_test_split
from sktime.forecasting.ets import AutoETS
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError

# Suppress warnings
warnings.filterwarnings('ignore')

# Load the shampoo sales dataset
y = load_shampoo_sales()

# Split the data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=6)

# Initialize the forecaster (AutoETS with manual parameter search)
forecaster = AutoETS(auto=False)

# Define the parameter grid for manual search
param_grid = {
    "error": ["add", "mul"],
    "trend": ["add", "mul"],
    "damped_trend": [True, False],
    "seasonal": ["add", "mul"],
    "sp": [4, 6, 12],
}

# Set up the expanding window cross-validation splitter
cv = ExpandingWindowSplitter(fh=range(1, 7), initial_window=12, step_length=1)

# Define the scoring metric (symmetric MAPE)
smape = MeanAbsolutePercentageError(symmetric=True)

# Perform grid search cross-validation
gscv = ForecastingGridSearchCV(
    forecaster=forecaster,
    cv=cv,
    param_grid=param_grid,
    scoring=smape,
    n_jobs=-1
)

# Fit the grid search on the training data
gscv.fit(y_train, fh=range(1, 7))

# Extract the best score and corresponding parameters
best_score = gscv.best_score_
best_params = gscv.best_params_

# Print results
print("=== Grid Search Results ===")
print(f"Best symmetric MAPE score: {best_score:.6f}")
print("Best parameters:")
for param, value in best_params.items():
    print(f"{param}: {value}")
```

![HyQpMMnmyx](https://hackmd.io/_uploads/Hyamx_W81e.png)

```
=== Expanding Window Splits ===
Split 1:
Train indices: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Test indices: [12 13 14 15 16 17]
...
Split 13:
Train indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Test indices: [24 25 26 27 28 29]
```

#### Prediction results

```
=== Grid Search Results ===
Best symmetric MAPE score: 0.080676
Best parameters:
damped_trend: False
error: add
seasonal: add
sp: 12
trend: mul

=== Prediction Comparison ===
      Date  Actual  Predicted (Auto)  Predicted (Tuned)
0  1993-07   575.5        462.347728         483.652040
1  1993-08   407.6        481.534237         535.507483
2  1993-09   682.0        500.720745         522.489273
3  1993-10   475.3        519.907254         569.305179
4  1993-11   581.3        539.093762         611.663984
5  1993-12   646.9        558.280271         586.641418

=== Performance Metrics ===
sMAPE (Auto): 0.17%
sMAPE (Tuned): 0.17%
```

### Graphical Pipelines

A graphical pipeline refers to a visual, structured approach to designing, configuring, and executing data workflows or processes. It represents the sequence of operations in a graph format, where nodes symbolize tasks, processes, or computations, and edges define the dependencies or data flow between these tasks.

#### Key Components of Graphical Pipelines

* **Nodes:** Represent individual tasks or steps (e.g., data preprocessing, model training).
* **Edges:** Show the flow of data or dependencies between nodes.
* **Directed Acyclic Graphs (DAGs):** The pipelines are often structured as DAGs to ensure there are no cyclic dependencies, which guarantees that a valid execution order exists (see the sketch below).
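To make the DAG idea concrete, here is a small, library-agnostic sketch in plain Python (the step names are hypothetical). It stores a pipeline as an adjacency list and derives a valid execution order with Kahn's algorithm, which is the kind of dependency resolution graphical-pipeline frameworks perform internally.

```python=
from collections import deque

# Pipeline DAG: each node maps to the nodes that consume its output
# (hypothetical step names for illustration)
edges = {
    "load": ["clean"],
    "clean": ["features", "split"],
    "features": ["train"],
    "split": ["train"],
    "train": ["evaluate"],
    "evaluate": [],
}

# Count unmet dependencies for every node
indegree = {node: 0 for node in edges}
for targets in edges.values():
    for t in targets:
        indegree[t] += 1

# Kahn's algorithm: repeatedly schedule nodes whose dependencies are met
ready = deque(node for node, deg in indegree.items() if deg == 0)
order = []
while ready:
    node = ready.popleft()
    order.append(node)
    for t in edges[node]:
        indegree[t] -= 1
        if indegree[t] == 0:
            ready.append(t)

# A cycle would leave nodes unscheduled; acyclicity guarantees a full order
assert len(order) == len(edges)
print(" -> ".join(order))  # load -> clean -> features -> split -> train -> evaluate
```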
#### Benefits of Graphical Pipelines

* **Visual Clarity:** Complex workflows are easier to understand, manage, and debug with a visual representation. This facilitates communication between technical and non-technical stakeholders.
* **Modularity and Reusability:** Tasks (nodes) can often be reused across different workflows. Modular design allows easier updates or scaling of specific components.
* **Ease of Debugging:** Errors or bottlenecks in a workflow can be quickly identified by analyzing the flow graph.
* **Scalability:** Many graphical pipeline tools allow for distributed computation, automatically managing dependencies and parallelism.
* **Integration with Automation:** Pipelines can be automated for tasks like scheduling jobs, retraining models, or updating dashboards. Triggers and conditional logic allow dynamic workflows.
* **Flexibility:** Graphical pipelines can accommodate diverse use cases, from ETL (Extract, Transform, Load) processes to machine learning workflows. They support branching and looping, offering a high degree of customization.
* **Lower Barrier to Entry:** Visual interfaces are intuitive for beginners and reduce the learning curve for adopting complex workflows.

#### Code Implementation

**How to build a Graphical Pipeline**

Let us first visualise the simple forecasting pipeline we want to construct:

![forecasting_pipeline](https://hackmd.io/_uploads/SynYtuh7Je.png)

##### Method 1: Pass all steps to the pipeline during initialisation, as for sequential pipelines.

```python=
import warnings
from sktime.forecasting.sarimax import SARIMAX
from sktime.pipeline.pipeline import Pipeline
from sktime.transformations.series.difference import Differencer

# Suppress warnings
warnings.filterwarnings('ignore')

# Initialize the differencer transformation
differencer = Differencer()

# Create the general pipeline
general_pipeline = Pipeline(
    [
        {
            "skobject": differencer,
            "name": "differencer",
            "edges": {"X": "y"}
        },
        {
            "skobject": SARIMAX(),
            "name": "sarimax",
            "edges": {"X": "X", "y": "differencer"}
        },
        {
            "skobject": differencer,
            "name": "differencer_inv",
            "edges": {"X": "sarimax"},
            "method": "inverse_transform"
        },
    ]
)
```

##### Method 2: Create a pipeline object and add the steps one by one.

```python=
import warnings
from sktime.pipeline.pipeline import Pipeline
from sktime.transformations.series.difference import Differencer
from sktime.forecasting.sarimax import SARIMAX

# Suppress warnings
warnings.filterwarnings('ignore')

# Initialize the pipeline
general_pipeline = Pipeline()

# Initialize the differencer transformation
differencer = Differencer()

# Add the differencing step to the pipeline
general_pipeline = general_pipeline.add_step(
    differencer,
    name="differencer",
    edges={"X": "y"}
)

# Add the SARIMAX forecasting step to the pipeline
general_pipeline = general_pipeline.add_step(
    SARIMAX(),
    name="sarimax",
    edges={"X": "X", "y": "differencer"}
)

# Add the inverse differencing step to the pipeline
general_pipeline = general_pipeline.add_step(
    differencer,
    name="differencer_inv",
    edges={"X": "sarimax"},
    method="inverse_transform"
)
```

**Parameter explanation**

* `skobject`: The sktime object added to the pipeline.
* `name`: The name of the step.
* `edges`: The keys ($X$ or $y$) of the dictionary indicate the input arguments of the skobject, and the values name the steps whose outputs are connected to those arguments.
    * For example, in `{"X": "y"}` the differencer's `X` input receives the original input series `y` (the target to be differenced).
    * In `{"X": "X", "y": "differencer"}`, the SARIMAX step's `y` (target) input, as shown in the picture, is the output of the `differencer` step, while its `X` (exogenous) input is the original exogenous data `X`.
* `method`: The skobject's method that should be called. If not provided, the default method is inferred from the added skobject. Here it is used to call `inverse_transform`. Optional.
* `kwargs`: Additional keyword arguments passed to the sktime object. Optional.

##### Prediction

```python=
import warnings
import pandas as pd
from sktime.forecasting.sarimax import SARIMAX
from sktime.pipeline.pipeline import Pipeline
from sktime.transformations.series.difference import Differencer
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split

# Suppress warnings
warnings.filterwarnings('ignore')

# Initialize the differencer transformation
differencer = Differencer()

# Define the pipeline
general_pipeline = Pipeline(
    [
        {
            "skobject": differencer,
            "name": "differencer",
            "edges": {"X": "y"}
        },
        {
            "skobject": SARIMAX(),
            "name": "sarimax",
            "edges": {"X": "X", "y": "differencer"}
        },
        {
            "skobject": differencer,
            "name": "differencer_inv",
            "edges": {"X": "sarimax"},
            "method": "inverse_transform"
        },
    ]
)

# Load the Longley dataset
y, X = load_longley()

# Split the data into training and testing sets
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

# Fit the pipeline on the training data
general_pipeline.fit(y=y_train, X=X_train, fh=[1, 2, 3, 4])

# Generate predictions
y_pred = general_pipeline.predict(X=X_test)

# Combine true and predicted values into a DataFrame
results_df = pd.DataFrame({
    "True Values": y_test.values,
    "Predicted Values": y_pred.values
})

# Output the results as a table
print("True and Predicted Values:")
print(results_df)
```

```
True and Predicted Values:
   True Values  Predicted Values
0      68655.0      67213.229394
1      69564.0      68326.992339
2      69331.0      68736.176458
3      70551.0      71320.567457
```

##### An interesting example

![graphical_pipeline_example](https://hackmd.io/_uploads/Bkk_Q4a7Jl.png)

```python=
import warnings
from sktime.pipeline.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.transformations.series.detrend import Deseasonalizer
from sklearn.linear_model import Ridge
from sktime.forecasting.compose import make_reduction
from sktime.datasets import load_macroeconomic
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split

# Suppress warnings
warnings.filterwarnings("ignore")

# Initialize the pipeline
pipe = Pipeline()

# Add steps to the pipeline
pipe = pipe.add_step(
    TabularToSeriesAdaptor(StandardScaler()),
    name="scaler",
    edges={"X": "X__unemp"},
)
pipe = pipe.add_step(
    Deseasonalizer(sp=4),
    name="deseasonalizer",
    edges={"X": "X__realgdp_realdpi"},
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
    name="forecaster_gdp",
    edges={"y": "deseasonalizer__realgdp"},
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
    name="forecaster_dpi",
    edges={"y": "deseasonalizer__realdpi"},
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
    name="forecaster_unemp",
    edges={
        "y": "scaler__unemp",
        "X": [
            "forecaster_gdp",
            "forecaster_dpi",
        ],
    },
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
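    # This final node takes the outputs of two upstream forecasters as its
    # exogenous inputs - such branching dependencies are what make the
    # pipeline a DAG rather than a linear chain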
    name="forecaster_inflation",
    edges={"X": ["forecaster_dpi", "forecaster_unemp"], "y": "y"},
)

# Load the macroeconomic dataset
data = load_macroeconomic()

# Prepare features and target
X = data[["realgdp", "realdpi", "unemp"]]
y = data[["infl"]]

# Define forecasting horizon
fh = ForecastingHorizon([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Split data into training and testing sets
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X=X, fh=fh)

# Fit the pipeline
pipe.fit(y=y_train, X=X_train, fh=fh)

# Predict using the pipeline
result = pipe.predict(X=None, fh=y_test.index)

# Output results in programmatic form
print("Predicted Values:")
print(result)
```

```
Predicted Values:
            infl
Period
2006Q4  3.090428
2007Q1  1.676421
2007Q2  0.219586
2007Q3  1.570087
2007Q4  0.350137
2008Q1  0.438966
2008Q2  0.615457
2008Q3  0.119022
2008Q4  0.257887
2009Q1  0.129785
2009Q2 -0.056094
2009Q3 -0.066123
```

## Advanced Modelling Features

### Parallelism for multivariate and hierarchical broadcasting

Broadcasting is a method in computer programming, particularly in numerical computing and parallel processing, that allows operations to be applied to datasets with different shapes or dimensions by automatically expanding smaller datasets to match the shape of larger ones.

**Multivariate Broadcasting:** Multivariate broadcasting refers to extending the concept of broadcasting to operations involving multiple variables (datasets) of differing shapes. Instead of just aligning one smaller dataset to another, multiple smaller datasets are expanded to align with the largest dimensions among them.

#### Mechanism

* All datasets are aligned by expanding (not copying) their dimensions to match the target shape.
* Broadcasting rules are applied across all dimensions simultaneously.
* Operations are executed element-wise on the expanded datasets (illustrated in the NumPy sketch later in this section).

#### Parallelism in Multivariate Broadcasting

Parallel processing can split the computations across processors. Each processor handles a subset of the expanded datasets, enabling simultaneous computation and reducing the overall execution time.

##### Advantages

* Reduces memory usage, since data expansion is virtual.
* Speeds up computation by leveraging multiple processors.
* Simplifies coding for large-scale numerical problems, such as matrix algebra or multivariate statistics.

#### Hierarchical Broadcasting

Hierarchical broadcasting extends broadcasting to datasets organized in a hierarchy, such as nested arrays, tensors, or multi-dimensional structures where lower-level datasets need to align with higher-level structures.

##### Mechanism

* Operate within each level of the hierarchy independently.
* Broadcast the data across levels as necessary, ensuring compatibility between parent and child structures.
* Combine results from lower levels hierarchically to form the final output.

#### Parallelism in Hierarchical Broadcasting

Hierarchical structures can be divided across processors. Each processor focuses on one part of the hierarchy, allowing computations at different levels to proceed independently and concurrently.

##### Advantages

* Enables efficient computation for complex structures like tensors, neural networks, and hierarchical datasets.
* Exploits nested patterns for operations, such as in multi-resolution analysis or multi-level data processing.
* Allows seamless scaling to distributed systems, making it suitable for big data applications.
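To make the expansion mechanism concrete, here is a minimal NumPy sketch; NumPy is used purely to illustrate the general concept, while the sktime-specific broadcasting over series columns and hierarchy levels is shown in the code implementations that follow.

```python=
import numpy as np

# Two arrays of different shapes: a (3, 1) column and a (1, 4) row
a = np.arange(3).reshape(3, 1)
b = np.arange(4).reshape(1, 4)

# Broadcasting aligns both to a common (3, 4) shape and
# applies the operation element-wise
result = a + b
print(result.shape)  # (3, 4)

# The expansion is virtual: broadcast_to returns a read-only view,
# so no data is copied
expanded = np.broadcast_to(a, (3, 4))
print(expanded.base is not None)  # True - a view on `a`, not a copy
```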
##### Benefits of Parallelism in Broadcasting

* **Enhanced Performance:**
    * By distributing workloads across multiple processors, parallelism accelerates computations.
    * Ideal for large datasets and high-dimensional problems.
* **Memory Efficiency:**
    * Virtual broadcasting avoids unnecessary memory duplication, even when parallelism is applied.
* **Simplified Coding:**
    * Parallel broadcasting abstracts the complexity of aligning and distributing data, allowing programmers to focus on the logic of their operations.
* **Scalability:**
    * Works effectively with distributed computing systems, such as clusters or cloud-based environments.
* **Flexibility in Applications:**
    * Supports a wide range of applications, including numerical simulation, machine learning, and hierarchical data analysis.

#### Summary

By combining broadcasting with parallelism, both multivariate and hierarchical computations become more efficient, scalable, and user-friendly, catering to diverse domains in scientific computing and data science.

#### Code Implementation

```python=
import warnings
import pandas as pd
from sktime.forecasting.arima import ARIMA
from sktime.datasets import load_longley

# Suppress warnings
warnings.filterwarnings("ignore")

# Initialize the ARIMA forecaster
forecaster = ARIMA()

# Use the exogenous variables of the Longley dataset as a multivariate
# series, keeping only two columns; sktime broadcasts the univariate
# ARIMA forecaster over each column
_, y = load_longley()
y = y.drop(columns=["UNEMP", "ARMED", "POP"])

# Set parallel backend configuration
forecaster.set_config(**{"backend:parallel": "loky"})  # or "multiprocessing" / "dask"
forecaster.set_config(**{"backend:parallel:params": {"n_jobs": 2}})

# Fit the forecaster on the data
forecaster.fit(y, fh=[1, 2, 3])

# Access the per-column forecasters fitted under the hood
forecasters = forecaster.forecasters_

# Convert forecasters to a pandas DataFrame for better visualization
forecasters_df = pd.DataFrame(
    [(key, str(value)) for key, value in forecasters.items()],
    columns=["Variable", "Forecaster"]
)

# Output the forecasters as a DataFrame
print("Forecasters DataFrame:")
print(forecasters_df)
```

```
Forecasters DataFrame:
  Variable                                         Forecaster
0  GNPDEFL  forecasters    ARIMA()\nName: GNPDEFL, dtype: ...
1      GNP    forecasters    ARIMA()\nName: GNP, dtype: object
```

#### Hierarchy Example

![hierarchy](https://hackmd.io/_uploads/r1rEVH_V1l.png)

```python=
import numpy as np
import pandas as pd
from sktime.forecasting.ets import AutoETS
from sktime.utils._testing.hierarchical import _make_hierarchical
from sktime.forecasting.model_selection import temporal_train_test_split


def load_product_hierarchy():
    """
    Generate hierarchical sales data with two levels of hierarchy
    and apply preprocessing steps.

    Returns a DataFrame of monthly sales with a
    (Product line, Product group, Date) row MultiIndex.
    """
    # Generate 5 years of daily sales data
    n_years = 5
    y = (
        _make_hierarchical(
            hierarchy_levels=(2, 4),
            min_timepoints=365 * n_years,
            max_timepoints=365 * n_years,
            random_state=0,
        )
        .drop(
            index=[
                ("h0_0", "h1_2"),
                ("h0_0", "h1_3"),
                ("h0_1", "h1_0"),
                ("h0_1", "h1_1"),
            ]
        )
        .rename(
            index={
                "h0_0": "Food preparation",
                "h0_1": "Food preservation",
                "h1_0": "Hobs",
                "h1_1": "Ovens",
                "h1_2": "Fridges",
                "h1_3": "Freezers",
            }
        )
        .reset_index()
        .rename(
            columns={
                "h0": "Product line",
                "h1": "Product group",
                "time": "Date",
                "c0": "Sales",
            }
        )
    )

    # Convert date to monthly frequency and aggregate sales
    y["Date"] = y["Date"].dt.to_period("M")
    y = y.groupby(by=["Product line", "Product group", "Date"]).sum()

    # Add random noise to simulate variability
    noise = np.random.RandomState(seed=0).normal(1, 0.3, np.shape(y))
    y = (y * noise).round(0)

    return y


# Load the generated product hierarchy data
y = load_product_hierarchy()

# Split data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=4)

# Display the first few rows of the training set
print(y_train.head())

# Query sales data for January 2000
result = y.loc[(slice(None), slice(None), "2000-01")]

# Display the queried data
print("Sales data for January 2000:")
print(result)

# Fit a forecaster on the hierarchical data; sktime broadcasts AutoETS
# over each (Product line, Product group) series automatically
forecaster = AutoETS(auto=True)
forecaster.fit(y_train, fh=[1, 2, 3, 4])
y_pred = forecaster.predict()

# Convert prediction to a DataFrame and display
y_pred_df = y_pred.reset_index()
print("Predicted Sales Data:")
print(
    pd.DataFrame({
        "Product line": y_pred_df["Product line"].tolist(),
        "Product group": y_pred_df["Product group"].tolist(),
        "Date": y_pred_df["Date"].tolist(),
        "Predicted Sales": y_pred_df["Sales"].round(2).tolist()
    }).to_string(index=False)
)
```

```
                                        Sales
Product line     Product group Date
Food preparation Hobs          2000-01  245.0
                               2000-02  144.0
                               2000-03  184.0
                               2000-04  265.0
                               2000-05  236.0
Sales data for January 2000:
                                 Sales
Product line      Product group
Food preparation  Hobs           245.0
                  Ovens          114.0
Food preservation Freezers       164.0
                  Fridges        136.0
Predicted Sales Data:
     Product line Product group    Date  Predicted Sales
 Food preparation          Hobs 2004-09           135.50
 Food preparation          Hobs 2004-10           135.49
 Food preparation          Hobs 2004-11           135.47
 Food preparation          Hobs 2004-12           135.46
 Food preparation         Ovens 2004-09           182.82
 Food preparation         Ovens 2004-10           183.98
 Food preparation         Ovens 2004-11           185.14
 Food preparation         Ovens 2004-12           186.30
Food preservation      Freezers 2004-09           148.41
Food preservation      Freezers 2004-10           148.41
Food preservation      Freezers 2004-11           148.41
Food preservation      Freezers 2004-12           148.41
Food preservation       Fridges 2004-09           139.04
Food preservation       Fridges 2004-10           139.04
Food preservation       Fridges 2004-11           139.04
Food preservation       Fridges 2004-12           139.04
```

### `skpro`: An Alternative for Probabilistic Forecasting

#### Overview

`skpro` is a Python library designed for probabilistic forecasting, offering tools to predict, evaluate, and handle uncertainty in outcomes. It extends the familiar scikit-learn API to enable users to incorporate probabilistic modeling seamlessly.

##### Features

* **Probabilistic Model Framework:**
    * Models generate probability distributions as predictions.
    * Supports various distribution types and custom configurations.
* **Evaluation Metrics:**
    * Includes probabilistic-specific metrics like the Continuous Ranked Probability Score (CRPS) to evaluate forecast quality.
    * These metrics assess both accuracy and uncertainty estimation.
* **Flexible Pipelines:**
    * Users can build modular pipelines for probabilistic forecasting, leveraging scikit-learn's interoperability.
* **Applications:**
    * Stock price distribution prediction for risk analysis.
    * Probabilistic weather models for emergency planning.
    * Demand forecasting for supply chain optimization.

##### Benefits of Probabilistic Forecasting with `skpro`

* **Incorporates Uncertainty:** Provides decision-makers with a better understanding of risks and variances, aiding in robust planning.
* **Improves Decision Quality:** Enables strategies like hedging against worst-case scenarios in finance or supply chain buffering.
* **Broad Applicability:** Suitable for time series forecasting, regression tasks, and real-world problems involving uncertain outcomes.
* **Scalable and Flexible:** Leverages scikit-learn's ecosystem, allowing users to integrate probabilistic models into existing workflows with minimal learning curve.
* **Enhanced Evaluation:** By using metrics tailored to probabilistic outputs, models can be rigorously assessed beyond point prediction accuracy.

##### Summary

Probabilistic forecasting, with its focus on distribution outputs, represents a leap forward from deterministic predictions by embracing uncertainty. Tools like `skpro` empower practitioners to model and evaluate uncertainty, making them invaluable for applications requiring risk management and strategic decision-making.

##### Code Implementation

```python=
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from skpro.regression.residual import ResidualDouble
from skpro.utils.plotting import plot_crossplot_interval

# Load the diabetes dataset
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Define the base and residual regressors
reg_mean = LinearRegression()
reg_resid = RandomForestRegressor()

# Create the probabilistic regression model
reg_proba = ResidualDouble(reg_mean, reg_resid)

# Fit the probabilistic model on the training data
reg_proba.fit(X_train, y_train)

# Predict probabilistic distributions on the test set
y_pred_proba = reg_proba.predict_proba(X_test)

# Plot the prediction intervals and the y=x line
plot_crossplot_interval(y_test, y_pred_proba, coverage=0.9)
x = np.linspace(min(y_test), max(y_test), 100)
plt.plot(x, x, color='black', linestyle='--', label='y=x')
plt.legend()
plt.show()
```

![b67a7583-8d6b-4f41-921b-4f3b2a6cd8cf](https://hackmd.io/_uploads/rJlA3JgHkl.png)

### Benchmarking - Comparing Estimator Performance

#### Overview

Benchmarking in the context of estimator performance refers to systematically evaluating and comparing the performance of different machine learning models (estimators) on the same dataset(s). The goal is to identify which model performs best for a specific task, or to assess whether a proposed method offers improvements over established techniques.

#### Process of Benchmarking

1. **Dataset Selection:** Choose a representative dataset (or datasets) that reflects the problem you aim to solve.
2. **Model Selection:** Include a variety of estimators. These could range from simple baseline models (e.g., linear regression) to advanced techniques (e.g., ensemble methods or deep learning models).
3. 
**Performance Metrics:** Define the metrics relevant to your task. For example:
    * Classification tasks: Accuracy, F1-score, AUC-ROC.
    * Regression tasks: Mean Absolute Error (MAE), Mean Squared Error (MSE).
4. **Cross-Validation:** Use cross-validation techniques (e.g., k-fold cross-validation) to ensure the results are not biased by the specific data split.
5. **Fair Comparisons:** Ensure all models are trained on the same training data and evaluated on the same validation or test data.

##### Advantages of Benchmarking

* **Fair and Objective Evaluation:** Provides a systematic way to compare models under the same conditions, ensuring objectivity.
* **Insight into Model Suitability:** Helps identify which model is best suited for a specific dataset or task.
* **Improved Model Development:** Highlights strengths and weaknesses of different approaches, guiding future improvements.
* **Efficiency in Model Selection:** Simplifies the process of selecting an appropriate model by providing a direct comparison.
* **Validation of New Methods:** Offers a mechanism to assess whether a new algorithm genuinely outperforms existing methods.
* **Transparency and Reproducibility:** A well-documented benchmarking process ensures that others can reproduce the results and verify claims.

##### Practical Example

Imagine you are predicting housing prices and have three models to compare:

1. Linear Regression
2. Random Forest
3. XGBoost

##### Benchmarking Process

1. Use the same dataset split into training and testing sets.
2. Train each model on the training set.
3. Evaluate each model on the test set using MSE and R² metrics.
4. Use k-fold cross-validation to assess performance consistency.
5. Compile the results:
    * Linear Regression: MSE = 3000, R² = 0.7
    * Random Forest: MSE = 2000, R² = 0.85
    * XGBoost: MSE = 1800, R² = 0.9

**Insights:** From the benchmarking results, XGBoost performs best in terms of both metrics, making it the preferred model for this task (a minimal scikit-learn sketch of this process appears below).

#### Summary

Benchmarking helps ensure that model comparisons are meaningful, fair, and reproducible, enabling practitioners to make informed decisions when selecting or improving machine learning models.
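Before the sktime-specific benchmark below, here is a minimal scikit-learn sketch of the regression benchmarking process just described. It uses a synthetic dataset standing in for housing data, and `GradientBoostingRegressor` standing in for XGBoost to avoid an extra dependency; the MSE/R² numbers quoted above are illustrative, not the output of this code.

```python=
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for a housing dataset
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Using the same folds for every model ensures a fair comparison
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for name, model in models.items():
    mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: MSE = {mse.mean():.1f} (+/- {mse.std():.1f}), R^2 = {r2.mean():.3f}")
```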
#### Code Implementation

```python=
from sktime.benchmarking.forecasting import ForecastingBenchmark
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import MeanSquaredPercentageError

# Initialize the forecasting benchmark
benchmark = ForecastingBenchmark()

# Add competing estimators
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="mean", sp=12),
    estimator_id="NaiveForecaster-mean-v1"
)
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="last", sp=12),
    estimator_id="NaiveForecaster-last-v1"
)

# Define cross-validation splitter
cv_splitter = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    fh=12
)

# Define scoring metrics
scorers = [MeanSquaredPercentageError()]

# Define dataset loaders
dataset_loaders = [load_airline]

# Add tasks for each dataset
for dataset_loader in dataset_loaders:
    benchmark.add_task(
        dataset_loader=dataset_loader,
        cv_splitter=cv_splitter,
        scorers=scorers
    )

# Run the benchmark and save results to a CSV file
results_df = benchmark.run("./forecasting_results.csv")

# Display the results transposed for better readability
print("Benchmark results:")
print(results_df.T)
```

| | 0 | 1 |
|---|---|---|
| validation_id | `[dataset=load_airline]_[cv_splitter=ExpandingW...` | `[dataset=load_airline]_[cv_splitter=ExpandingW...` |
| model_id | NaiveForecaster-last-v1 | NaiveForecaster-mean-v1 |
| runtime_secs | 0.211127 | 0.088294 |
| MeanSquaredPercentageError_fold_0_test | 0.024532 | 0.049681 |
| MeanSquaredPercentageError_fold_1_test | 0.020831 | 0.0737 |
| MeanSquaredPercentageError_fold_2_test | 0.001213 | 0.05352 |
| MeanSquaredPercentageError_fold_3_test | 0.01495 | 0.081063 |
| MeanSquaredPercentageError_fold_4_test | 0.031067 | 0.138163 |
| MeanSquaredPercentageError_fold_5_test | 0.008373 | 0.145125 |
| MeanSquaredPercentageError_fold_6_test | 0.007972 | 0.154337 |
| MeanSquaredPercentageError_fold_7_test | 0.000009 | 0.123298 |
| MeanSquaredPercentageError_fold_8_test | 0.028191 | 0.185644 |
| MeanSquaredPercentageError_fold_9_test | 0.003906 | 0.184654 |
| MeanSquaredPercentageError_mean | 0.014104 | 0.118918 |
| MeanSquaredPercentageError_std | 0.011451 | 0.051265 |

Comparing the mean MSPE values, NaiveForecaster-last-v1 achieves 0.014104 while NaiveForecaster-mean-v1 achieves 0.118918, so the `last` strategy clearly outperforms the `mean` strategy on this dataset.

## Conclusion

The use of machine learning, particularly with the `sktime` library, offers significant benefits in time series analysis. Machine learning models provide flexibility in handling non-stationary data and capturing complex patterns. `sktime` streamlines the process with its user-friendly interface, comprehensive algorithms, and modular design, enabling efficient prototyping and analysis.
Looking ahead, several areas hold potential for further exploration:

* **Deep Learning Approaches:** Investigating the capabilities of LSTMs, GRUs, and Transformers in capturing long-term dependencies and non-linear patterns.
* **Hybrid Models:** Combining statistical and machine learning models to leverage the strengths of both approaches.
* **Anomaly Detection:** Developing robust methods for identifying unusual events or patterns in time series data.

Advanced topics, such as probabilistic forecasting and forecasting pipelines, highlighted the importance of managing uncertainty and automating workflows for efficiency. Comparative analysis showed that AutoETS excels with seasonal data, whereas AutoARIMA performs better on non-stationary datasets.

## References

* Löning, M., Bagnall, A., Ganesh, S., Kazakov, V., Lines, J., & Király, F. J. (2019). sktime: A Unified Interface for Machine Learning with Time Series. arXiv:1909.07872. [Link](https://arxiv.org/pdf/1909.07872)
* Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts: Melbourne, Australia. [Link](https://otexts.com/fpp3/)
* James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer. [Link](https://www.statlearning.com/)
* Moniz, N., & Cerqueira, V. (2023). sktime-tutorial-pydata-global-2023. GitHub Repository. [Link](https://github.com/sktime/sktime-tutorial-pydata-global-2023/blob/main/notebooks/01_introduction.ipynb)
* `sktime` GitHub Repository. [Link](https://github.com/sktime/sktime)
{"description":"Introduction to mechine learning","lang":"zh-TW","showTags":"true","breaks":false,"title":"Time Series Models","contributors":"[{\"id\":\"155e4ae3-957a-4ddd-a560-3f8806fb59fd\",\"add\":214834,\"del\":133070},{\"id\":\"9e38ee55-7b6f-408d-a9e3-a3a99f1fde0e\",\"add\":34294,\"del\":53823}]"}