# Streamlining Time Series Analysis with sktime
楊仁樞 Ren-Shu	Yang
## Introduction
Time series analysis involves examining data points collected or observed over successive time intervals. This process aims to uncover underlying patterns, trends, and seasonal variations to predict future values and understand the mechanisms generating the data. However, analyzing time series data presents unique challenges that require specialized tools and techniques.
### Challenges in Time Series Analysis
* **Non-stationarity:** A common hurdle in time series analysis is non-stationarity, where the statistical properties of the data, such as mean and variance, change over time. This violates the assumptions of many traditional statistical methods, making them less effective.
* **Noise and Irregularities:** Time series data often contains noise, outliers, and irregular fluctuations that can obscure underlying patterns and make accurate analysis difficult.
### sktime to the Rescue!
To address these challenges, we can leverage the power of machine learning with the `sktime` library.  `sktime` is a user-friendly and versatile Python library specifically designed for tackling machine learning problems involving time series data. Think of it as the "scikit-learn" for time series, offering a unified and consistent interface for various time series tasks.
#### Key Features and Capabilities
* **Comprehensive Algorithms:** `sktime` provides a wide range of algorithms for time series classification, regression, clustering, forecasting, and more. This includes both classical statistical methods and modern machine learning approaches.
* **Modular Design:** `sktime` follows a modular design, allowing you to easily combine different components (transformers, estimators, pipelines) to build complex workflows. This flexibility enables you to customize your analysis to suit your specific needs.
* **Interoperability with scikit-learn:** `sktime` seamlessly integrates with scikit-learn, leveraging its familiar API and tools for tasks like hyperparameter tuning, model selection, and pipeline construction.
* **Growing Ecosystem:** `sktime` is actively developed and maintained, with a growing community and expanding functionality.
## Time Series Models
This section explores two fundamental time series models: ETS (Error, Trend, Seasonality) and ARIMA (Autoregressive Integrated Moving Average). We'll examine their components, provide code implementations with `sktime`, and illustrate their application using the airline dataset.
### ETS Model
The ETS model decomposes a time series into three core components: **Error**, **Trend**, and **Seasonality**. This decomposition helps us understand the underlying patterns and forecast future values.
* **Error:** Represents the random fluctuations or noise in the series.
* **Trend:** Captures the long-term direction of the series, such as upward or downward movement.
* **Seasonality:** Identifies repeating patterns with a fixed periodicity, like daily, weekly, or yearly cycles.
#### Exponential Smoothing: A Building Block
Exponential Smoothing is a foundational technique within the ETS framework. It forecasts future values by assigning exponentially decreasing weights to past observations.  The simplest form uses the following formula:
$$
\hat{y}_{t+1} = \alpha  y_t + (1 - \alpha) \hat{y}_t
$$
where:
* $\hat{y}_{t+1},\hat{y}_t$ is the forecast for the next period.
* $y_t$ is the actual value at the current period.
* $\alpha$ is the smoothing factor ($0 \leq \alpha \leq 1$), controlling the weight given to the most recent observation. A higher α gives more weight to recent data.
#### Code Example: ETS with sktime
This code snippet demonstrates how to use `sktime`'s `AutoETS` to automatically select the best ETS model for the airline dataset and generate forecasts.
```python=
import warnings
import pandas as pd
from sktime.datasets import load_airline
from sktime.forecasting.ets import AutoETS
from sktime.forecasting.model_selection import temporal_train_test_split
# Suppress warnings
warnings.filterwarnings('ignore')
# Load the airline dataset
y = load_airline()
# Split data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=10)
# Initialize and fit the AutoETS model with explicit seasonal_periods
forecaster = AutoETS(auto=False, seasonal="additive", sp=12)
forecaster.fit(y_train)
# Create a forecasting horizon
forecast_horizon = [i + 1 for i in range(len(y_test))]
# Generate predictions
y_pred = forecaster.predict(fh=forecast_horizon)
# Combine actual and predicted values into a DataFrame
output = pd.DataFrame({
    "Actual": y_test.values,
    "Predicted": y_pred.values
}, index=y_test.index)
# Print the results
print("=== Actual vs Predicted Values ===")
print(output)
```

```
=== Actual vs Predicted Values ===
         Actual   Predicted
Period                     
1960-03   419.0  451.994293
1960-04   461.0  438.698238
1960-05   472.0  457.430750
1960-06   535.0  511.220195
1960-07   622.0  578.053105
1960-08   606.0  578.834752
1960-09   508.0  474.773403
1960-10   461.0  416.029592
1960-11   390.0  368.198395
1960-12   432.0  404.993453
```
### ARIMA Model
ARIMA stands for **Autoregressive Integrated Moving Average**. It's a versatile model that combines three components:
* **Autoregressive (AR):** Models the relationship between an observation and a specified number of lagged observations.  
* **Integrated (I):** Incorporates differencing to make the time series stationary, a requirement for ARIMA modeling.
* **Moving Average (MA):** Models the relationship between an observation and past forecast errors (residuals).
#### Mathematical Representation
* **$\rm{AR}(p)$ model:** $$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t$$ where $p$ is the order of the AR model, $\phi_i$ are the AR coefficients, and $w_t$ is white noise.
* **$\rm{MA}(q)$ model:** $$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} +  \dots + \theta_q w_{t-q}$$ where $q$ is the order of the MA model, $\theta_i$ are the MA coefficients, and $w_t$ is white noise.
* **$\rm{ARMA}(p, q)$ model:** Combines $\rm{AR}(p)$ and $\rm{MA}(q)$ models. $$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q}$$
* $\rm{ARIMA}(p, d, q)$ model:  Includes the order of differencing $d$ to achieve stationarity.
#### Code Example: ARIMA with sktime
This code demonstrates how to apply an ARIMA model to the airline dataset using `sktime`. You can experiment with different values for `p`, `d`, and `q` to find the best ARIMA configuration for your data.
``` python=
import warnings
import pandas as pd
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.base import ForecastingHorizon
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.utils.plotting import plot_series
import matplotlib.pyplot as plt
# Suppress warnings
warnings.filterwarnings('ignore')
# Load the airline dataset
y = load_airline()
# Split data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=6)
# Define the forecasting horizon
forecast_horizon = ForecastingHorizon(y_test.index, is_relative=False)
# Initialize and fit the ARIMA model with specified parameters
forecaster = ARIMA(order=(2, 1, 2))  # Specify ARIMA(p, d, q)
forecaster.fit(y_train)
# Generate predictions
y_pred = forecaster.predict(fh=forecast_horizon)
# Combine actual and predicted values into a DataFrame
results = pd.DataFrame({
    "Actual": y_test.values,
    "Predicted": y_pred.values
}, index=y_test.index)
# Output the results as text
print("=== Actual vs Predicted Values ===")
print(results)
```

```
=== Actual vs Predicted Values ===
         Actual   Predicted
Period                     
1960-07   622.0  548.679570
1960-08   606.0  539.754674
1960-09   508.0  513.095688
1960-10   461.0  477.792642
1960-11   390.0  444.427132
1960-12   432.0  422.184452
```
## Advanced Concepts in Time Series Analysis
This section delves into advanced concepts that enhance the accuracy and sophistication of time series analysis.
### Exogenous Features
Exogenous features, also known as exogenous variables or external regressors, are variables that are not part of the time series itself, but can influence its behavior. These are external factors that provide additional information or context to the time series. 
Examples include:
* Economic indicators (GDP, inflation, interest rates) for stock
* Weather data (temperature, rainfall, snowfall) for sales volume
* Social media trends for sales volume
* Promotional events or holidays for sales volume
#### Importance 
Incorporating exogenous features into time series analysis can significantly improve model accuracy and provide valuable insights. 
Here's why:
* **Enhanced Predictive Power:** Exogenous features can capture variations in the time series that are not explained by its own past values. This leads to more accurate predictions.
* **Causal Insights:** By including relevant exogenous features, you can uncover potential causal relationships between external factors and the time series. For example, you might find that increased marketing spending leads to higher sales.
* **Improved Robustness:** Models that incorporate exogenous features are often more robust to changes in the underlying patterns of the time series. This is because they are less reliant on the assumption that past patterns will perfectly repeat in the future.
#### Example: Longley Dataset
* **Motivation:** The Longley dataset is a classic example in statistics and econometrics, often used to illustrate the challenges of multicollinearity (high correlation between predictor variables) and small sample sizes. It's helpful for demonstrating how exogenous features can improve model performance even with these challenges.
* **Variables:** The Longley dataset includes annual economic data for the United States from 1947 to 1962. The variables are:
    * `TOTEMP`: Total employment (in thousands of persons)
    * `GNPDEFL`: GNP deflator (a measure of price inflation)
    * `GNP`: Gross National Product (in millions of dollars)
    * `UNEMP`: Number of unemployed (in thousands of persons)
    * `ARMED`: Number of people in the armed forces (in thousands)
    * `POP`: Noninstitutionalized population over 14 years of age (in thousands)
```python=
import warnings
import pandas as pd
from sktime.datasets import load_longley
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.model_selection import temporal_train_test_split
# Suppress warnings
warnings.filterwarnings('ignore')
# Load the Longley dataset
y, X = load_longley()
# Split the data into training and testing sets
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=3)
# Initialize ARIMA models with and without exogenous features
forecaster_without_X = ARIMA()
forecaster_with_X = ARIMA()
# Define the forecasting horizon
forecast_horizon = [1, 2, 3]
# Fit the models
forecaster_without_X.fit(y_train, fh=forecast_horizon)
forecaster_with_X.fit(y_train, X=X_train, fh=forecast_horizon)
# Make predictions
y_pred_no_X = forecaster_without_X.predict()
y_pred_with_X = forecaster_with_X.predict(X=X_test)
# Evaluate performance (using Mean Absolute Percentage Error)
mape_no_X = mean_absolute_percentage_error(y_test, y_pred_no_X)
mape_with_X = mean_absolute_percentage_error(y_test, y_pred_with_X)
# Print the MAPE results
print(f"MAPE without exogenous features: {mape_no_X:.5f}")
print(f"MAPE with exogenous features: {mape_with_X:.5f}")
# Combine actual and predicted values into a DataFrame for comparison
comparison = pd.DataFrame({
    "Actual": y_test.values,
    "Predicted (No X)": y_pred_no_X.values,
    "Predicted (With X)": y_pred_with_X.values
}, index=y_test.index)
# Print the comparison table
print("\n=== Actual vs Predicted Values ===")
print(comparison)
```

```
MAPE without exogenous features: 0.02825
MAPE with exogenous features: 0.02605
=== Actual vs Predicted Values ===
         Actual  Predicted (No X)  Predicted (With X)
Period                                               
1960    69564.0      68216.665473        70371.821285
1961    69331.0      67823.598319        70872.801273
1962    70551.0      67471.123718        73677.403123
```
### Probabilistic Forecasting
Probabilistic forecasting goes beyond providing a single, definitive prediction. Instead, it offers a range of possible future outcomes, each with an associated probability. This approach acknowledges the inherent uncertainty in predicting the future and provides a more complete picture of what might happen.
#### Key Concepts
* **Probability Distributions:** Probabilistic forecasts often present results in the form of probability distributions. These distributions show the likelihood of different outcomes. For example, a forecast might indicate that there's a 60% chance of sales falling between $1 million and $1.2 million next quarter.
* **Interval Forecasts:** These forecasts provide a range of values where the future outcome is likely to fall, along with a confidence level (e.g., a 95% prediction interval). The confidence level indicates how certain we are that the actual value will fall within that range.
* **Scenarios and Risk Assessment:** Probabilistic forecasting allows you to explore different scenarios and assess the associated risks. By considering a range of possibilities, you can better prepare for different outcomes and make more informed decisions.
#### Benefits
* **Improved Decision-Making:** Probabilistic forecasts provide decision-makers with a more complete understanding of the uncertainties involved. This allows for more informed and robust decision-making, as strategies can be evaluated based on the likelihood of different outcomes.
* **Risk Management:** By considering a range of possibilities and their probabilities, probabilistic forecasting helps with risk assessment and mitigation. You can identify potential risks and their likelihood, allowing you to take proactive steps to minimize their impact.
* **Transparency and Realism:** Probabilistic forecasting acknowledges that the future is uncertain. This transparency provides a more realistic view of potential outcomes and avoids overconfidence in predictions.
* **Enhanced Accuracy:** Probabilistic forecasting methods often use more sophisticated models that can capture complex patterns and dependencies in the data. This can lead to more accurate predictions compared to traditional point forecasts.
#### Example: Prediction Intervals
A prediction interval is a range of values where we expect the future value of a time series to fall with a certain level of confidence. For example, a 95% prediction interval means we are 95% confident that the actual future value will be within that interval.
This code demonstrates how to generate forecasts with prediction intervals using `sktime`'s `AutoARIMA` model. The `return_pred_int=True` argument tells the `predict` method to return prediction intervals, and `coverage=[0.9]` specifies a 90% confidence level. The `plot_series` function is used to visualize the results, including the prediction intervals.
```python=
import warnings
import numpy as np
import pandas as pd
from sktime.forecasting.arima import AutoARIMA
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
# Suppress warnings
warnings.filterwarnings('ignore')
# Load the airline dataset
y = load_airline()
# Split into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=12)
# Initialize and fit the AutoARIMA model
forecaster = AutoARIMA(sp=12, suppress_warnings=True)  # sp=12 for monthly seasonality
forecaster.fit(y_train)
# Define the forecasting horizon
forecast_horizon = np.arange(1, len(y_test) + 1)
# Generate predictions and prediction intervals
y_pred = forecaster.predict(fh=forecast_horizon)
prediction_intervals = forecaster.predict_interval(fh=forecast_horizon, coverage=[0.9])
pred_interval_90 = prediction_intervals['Number of airline passengers'][0.9]
# Combine actual values, predicted values, and prediction intervals into a DataFrame
results = pd.DataFrame({
    "Actual": y_test.values,
    "Predicted": y_pred.values,
    "Lower Bound (90%)": pred_interval_90['lower'].values,
    "Upper Bound (90%)": pred_interval_90['upper'].values
}, index=y_test.index)
# Print the results
print("=== Forecast Results with 90% Prediction Intervals ===")
print(results)
```

```
=== Forecast Results with 90% Prediction Intervals ===
         Actual   Predicted  Lower Bound (90%)  Upper Bound (90%)
Period                                                           
1960-01   417.0  419.967172         403.428671         436.505674
1960-02   391.0  399.849851         379.615533         420.084168
1960-03   419.0  457.994382         434.223590         481.765174
1960-04   461.0  444.474156         418.904153         470.044159
1960-05   472.0  464.789179         437.942467         491.635891
1960-06   535.0  514.125441         486.525883         541.724999
1960-07   622.0  587.804170         559.706498         615.901842
1960-08   606.0  597.010823         568.602835         625.418811
1960-09   508.0  499.531269         470.921824         528.140714
1960-10   461.0  442.359679         413.622040         471.097318
1960-11   390.0  396.410217         367.589784         425.230650
1960-12   432.0  438.651595         409.778046         467.525145
```
### Forecasting Pipelines
A forecasting pipeline is a structured and sequential workflow that automates the process of analyzing time series data and generating predictions. It's like an assembly line for time series analysis, where each step prepares the data or model for the next step. This approach is widely used in various fields, including finance, supply chain management, and weather forecasting, to improve the efficiency and reliability of predictions.
#### Standard Steps 
A typical forecasting pipeline consists of the following steps:
1. **Data Collection:** Gathering the relevant historical time series data.
2. **Data Preprocessing:** Cleaning and preparing the data, which includes handling missing values, smoothing outliers, and transforming the data into a suitable format for modeling.
3. **Feature Engineering:** Creating new features from the raw data to improve model accuracy. This might involve adding lagged variables, calculating rolling statistics, or incorporating external data sources.
4. **Model Selection:** Choosing an appropriate forecasting model based on the characteristics of the data and the forecasting task.
5. **Training and Validation:** Training the chosen model on a portion of the data and validating its performance on a separate held-out set. This helps ensure the model generalizes well to new, unseen data.
6. **Evaluation:** Assessing the model's performance using appropriate metrics (e.g., Mean Absolute Error, Root Mean Squared Error).
#### Advantages
Using forecasting pipelines offers several benefits:
* **Efficiency and Automation:** Pipelines automate repetitive tasks, saving time and effort. This allows you to quickly experiment with different models and configurations.
* **Consistency and Accuracy:** By following a structured process, pipelines minimize the risk of errors and ensure that each step is performed consistently. This leads to more reliable and reproducible results.
* **Reproducibility:** Pipelines make it easy to reproduce your analysis and share it with others. This is essential for collaboration and ensuring the integrity of your results.
#### Example: Airline Dataset
**Goal:** To demonstrate a forecasting pipeline, we'll use the `sktime` library to forecast the number of airline passengers.
This code demonstrates a simple forecasting pipeline in `sktime`. It includes steps for log-transforming the data, removing seasonality, and applying an ARIMA model. The `TransformedTargetForecaster` makes it easy to chain these steps together.
```python=
import warnings
import pandas as pd
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.boxcox import LogTransformer
from sktime.transformations.series.detrend import Deseasonalizer
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.arima import AutoARIMA
# Suppress warnings
warnings.filterwarnings('ignore')
# Load the airline dataset
y = load_airline()
# Split the data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=12)
# Define the forecasting horizon
forecast_horizon = list(range(1, len(y_test) + 1))
# Create a forecasting pipeline
forecaster = TransformedTargetForecaster([
    ("log", LogTransformer()),  # Log transform the data
    ("deseasonalize", Deseasonalizer(sp=12, model="multiplicative")),  # Remove seasonality
    ("arima", AutoARIMA(sp=12, suppress_warnings=True))  # Apply an ARIMA model
])
# Fit the forecaster to the training data
forecaster.fit(y_train)
# Generate forecasts
y_pred = forecaster.predict(fh=forecast_horizon)
# Combine actual and predicted values into a DataFrame
results = pd.DataFrame({
    "Date": y_test.index,
    "Actual": y_test.values,
    "Predicted": y_pred.values
})
# Print the results
print("=== Forecast Results ===")
print(results)
```

```
=== Forecast Results ===
       Date  Actual   Predicted
0   1960-01   417.0  419.786274
1   1960-02   391.0  410.730031
2   1960-03   419.0  485.746060
3   1960-04   461.0  468.291993
4   1960-05   472.0  478.422005
5   1960-06   535.0  551.858214
6   1960-07   622.0  625.563411
7   1960-08   606.0  628.724658
8   1960-09   508.0  540.088068
9   1960-10   461.0  466.099623
10  1960-11   390.0  402.834730
11  1960-12   432.0  462.846638
```
### Parameter Estimation
Parameter estimation is the process of finding the optimal values for the parameters of a time series model. These parameters control the model's behavior and how it captures patterns in the data. Accurate parameter estimation is essential for:
* **Model Fit and Accuracy:** The estimated parameters determine how well the model fits the historical data. Accurate estimation leads to a model that closely reflects the underlying patterns, resulting in lower errors and better predictions.
* **Forecasting Precision:** Parameters influence how the model extrapolates patterns into the future. Well-estimated parameters improve the accuracy and reliability of forecasts.
* **Stability and Robustness:** Models with accurately estimated parameters are more stable and less prone to overfitting (fitting the noise in the data) or underfitting (failing to capture important patterns). This makes the model more generalizable to new, unseen data.
#### Point Estimation 
This approach provides a single, "best-guess" value for each parameter. It's like finding the most likely value based on the available data.
* **Pros:**
    * Simple to understand and interpret.
    * Computationally efficient.
* **Cons:**
    * Doesn't provide information about the uncertainty associated with the estimate.
    * Can be misleading if the sample size is small or the model assumptions are violated.
#### Interval Estimation 
This method provides a range of plausible values for a parameter, along with a confidence level. This range, called a confidence interval, quantifies the uncertainty in the estimate.
* **Pros:**
    * Provides a measure of uncertainty, giving a more complete picture of the parameter.
    * Useful for decision-making and risk assessment, as it allows you to consider a range of possibilities.
* **Cons:**
    * Can be more computationally intensive.
    * Requires careful interpretation of confidence intervals.
##### Comparison and Use Cases
* **Point estimation** is often used when a quick and easily interpretable estimate is needed, and you have high confidence in the data and model assumptions.
* **Interval estimation** is more appropriate when it's important to understand the uncertainty in the parameter estimates, especially when dealing with limited data or high-stakes decisions.
#### Code Example
This code demonstrates how to access the estimated parameters of an `AutoARIMA` model in `sktime` using `get_fitted_params()`. It also shows how to calculate the MAPE to evaluate the model's forecasting accuracy. The estimated parameters and the MAPE provide insights into how well the model has learned from the data and its ability to generate accurate forecasts.
```python=
import warnings
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.arima import AutoARIMA
# Suppress warnings
warnings.filterwarnings('ignore')
# Load the airline dataset
y = load_airline()
# Split the data into training and test sets
y_train, y_test = temporal_train_test_split(y, test_size=12)
# Initialize and fit an AutoARIMA model
forecaster = AutoARIMA(sp=12, suppress_warnings=True)
forecaster.fit(y_train)
# Access the estimated parameters
fitted_params = forecaster.get_fitted_params()
print("Estimated parameters:", fitted_params)
# Define the forecasting horizon
forecast_horizon = list(range(1, len(y_test) + 1))
# Generate forecasts
y_pred = forecaster.predict(fh=forecast_horizon)
# Calculate the Mean Absolute Percentage Error (MAPE)
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"Mean Absolute Percentage Error (MAPE): {mape:.2f}")
```
```
Estimated parameters: {'intercept': 5.534100348886643, 'ar.L1': 0.7048918277963387, 'ar.L2': 0.25742151823961595, 'ar.L3': -0.14344774118809428, 'sigma2': 101.09689149992465, 'order': (3, 0, 0), 'seasonal_order': (0, 1, 0, 12), 'aic': 905.6860046595817, 'aicc': 906.2123204490554, 'bic': 919.623463373492, 'hqic': 911.3460709571818}
Mean Absolute Percentage Error (MAPE): 0.03
```
### Parameter Tuning
Parameter tuning is the process of finding the optimal values for a model's hyperparameters to improve its performance on a given task. Unlike model parameters, which are learned from the training data,hyperparameters are set before training and control how the model learns or its structure.
#### Key Concepts in Parameter Tuning
* **Hyperparameters:** These are values that govern the behavior of the learning algorithm or the architecture of the model. 
  Examples include:
  * Learning rate in machine learning models.
  * Number of lags in an ARIMA model for time series.
  * Depth of decision trees in ensemble models like Random Forest.
* **Tuning Process:** The goal of tuning is to optimize hyperparameters to minimize a model's error or maximize its performance on a validation dataset. 
  
  Popular techniques include:
  * Grid Search: Exhaustively tests combinations of hyperparameter values.
  * Random Search: Randomly samples hyperparameter combinations for testing.
  
#### Advantages of Parameter Tuning
* **Improved Model Performance:** Properly tuned hyperparameters can significantly enhance a  model’s accuracy, precision, or other performance metrics.
In time series forecasting, well-tuned parameters can better capture trends and seasonality, improving forecast reliability.
* **Reduced Overfitting and Underfitting:**
Tuning allows finding the right balance between model complexity and generalization. For instance, in decision trees, depth tuning prevents overfitting on training data.
* **Better Resource Utilization:** Optimizing hyperparameters ensures that computational resources (e.g., training time and memory) are used efficiently by reducing unnecessary experimentation with suboptimal settings.
* **Adaptation to Specific Problems:**
Tailoring hyperparameters to the dataset or specific problem ensures the model is appropriately configured, leading to domain-specific insights or performance.
#### Summary
parameter tuning is a critical step in developing high-performing models, allowing them to maximize predictive power, adapt to data characteristics, and 
efficiently utilize resources.
#### Code Implementation
```python=
import warnings
from sktime.split import ExpandingWindowSplitter
from sktime.utils import plot_windows
from sktime.datasets import load_shampoo_sales
from sktime.split import temporal_train_test_split
from sktime.forecasting.ets import AutoETS
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError
# Suppress warnings
warnings.filterwarnings('ignore')
# Load the shampoo sales dataset
y = load_shampoo_sales()
# Split the data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=6)
# Initialize the forecaster (AutoETS with manual parameter search)
forecaster = AutoETS(auto=False)
# Define the parameter grid for manual search
param_grid = {
    "error": ["add", "mul"],
    "trend": ["add", "mul"],
    "damped_trend": [True, False],
    "seasonal": ["add", "mul"],
    "sp": [4, 6, 12],
}
# Set up the expanding window cross-validation splitter
cv = ExpandingWindowSplitter(fh=range(1, 7), initial_window=12, step_length=1)
# Define the scoring metric (symmetric MAPE)
smape = MeanAbsolutePercentageError(symmetric=True)
# Perform grid search cross-validation
gscv = ForecastingGridSearchCV(
    forecaster=forecaster,
    cv=cv,
    param_grid=param_grid,
    scoring=smape,
    n_jobs=-1
)
# Fit the grid search on the training data
gscv.fit(y_train, fh=range(1, 7))
# Extract the best score and corresponding parameters
best_score = gscv.best_score_
best_params = gscv.best_params_
# Print results
print("=== Grid Search Results ===")
print(f"Best symmetric MAPE score: {best_score:.6f}")
print("Best parameters:")
for param, value in best_params.items():
    print(f"{param}: {value}")
```

```
=== Expanding Window Splits ===
Split 1:
Train indices: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Test indices: [12 13 14 15 16 17]
...
Split 13:
Train indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Test indices: [24 25 26 27 28 29]
```
#### Prediction results
```
=== Grid Search Results ===
Best symmetric MAPE score: 0.080676
Best parameters:
damped_trend: False
error: add
seasonal: add
sp: 12
trend: mul
=== Prediction Comparison ===
      Date  Actual  Predicted (Auto)  Predicted (Tuned)
0  1993-07   575.5        462.347728         483.652040
1  1993-08   407.6        481.534237         535.507483
2  1993-09   682.0        500.720745         522.489273
3  1993-10   475.3        519.907254         569.305179
4  1993-11   581.3        539.093762         611.663984
5  1993-12   646.9        558.280271         586.641418
=== Performance Metrics ===
sMAPE (Auto): 0.17%
sMAPE (Tuned): 0.17%
```
### Graphical Pipelines
A graphical pipeline refers to a visual, structured approach to designing, configuring, and executing data workflows or processes. It represents the sequence of operations in a graph format, where nodes symbolize tasks, processes, or computations, and edges define the dependencies or data flow between these tasks. 
#### Key component of Graphical Piplines
* **Nodes:** Represent individual tasks or steps (e.g., data preprocessing, model training).
* **Edges:** Show the flow of data or dependencies between nodes.
* **Directed Acyclic Graphs (DAGs):** The pipelines are often structured as DAGs to ensure there are no cyclic dependencies.
#### Benefits of Graphical Pipelines
* **Visual Clarity:** Complex workflows are easier to understand, manage, and debug with a visual representation.
Facilitates communication between technical and non-technical stakeholders.
* **Modularity and Reusability:** Tasks (nodes) can often be reused across different workflows.
Modular design allows easier updates or scaling of specific components.
* **Ease of Debugging:** Errors or bottlenecks in a workflow can be quickly identified by analyzing the flow graph.
* **Scalability:** Many graphical pipeline tools allow for distributed computation, automatically managing dependencies and parallelism.
* **Integration with Automation:**
Pipelines can be automated for tasks like scheduling jobs, retraining models, or updating dashboards.
Triggers and conditional logic allow dynamic workflows.
* **Flexibility:** Graphical pipelines can accommodate diverse use cases, from ETL (Extract, Transform, Load) processes to machine learning workflows.
They support branching and looping, offering a high degree of customization.
* **Lower Barrier to Entry:** Visual interfaces are intuitive for beginners and reduce the learning curve for adopting complex workflows.
#### Code Implementation
**How to build a Graphical Pipline**
Let us first visualise a simple forecasting pipeline, and we want to construct:

##### Method 1:
Pass all steps to the pipeline during initialisation as for the sequential pipelines.
```python=
import warnings
from sktime.forecasting.sarimax import SARIMAX
from sktime.pipeline.pipeline import Pipeline
from sktime.transformations.series.difference import Differencer
# Suppress warnings
warnings.filterwarnings('ignore')
# Initialize the differencer transformation
differencer = Differencer()
# Create the general pipeline
general_pipeline = Pipeline(
    [
        {
            "skobject": differencer,
            "name": "differencer",
            "edges": {"X": "y"}
        },
        {
            "skobject": SARIMAX(),
            "name": "sarimax",
            "edges": {"X": "X", "y": "differencer"}
        },
        {
            "skobject": differencer,
            "name": "differencer_inv",
            "edges": {"X": "sarimax"},
            "method": "inverse_transform"
        },
    ]
)
```
**A more interesting example**
##### Method 2:
Create a pipeline object and add the steps one by one.
```python=
import warnings
from sktime.pipeline.pipeline import Pipeline
from sktime.transformations.series.difference import Differencer
from sktime.forecasting.sarimax import SARIMAX
# Suppress warnings
warnings.filterwarnings('ignore')
# Initialize the pipeline
general_pipeline = Pipeline()
# Initialize the differencer transformation
differencer = Differencer()
# Add the differencing step to the pipeline
general_pipeline = general_pipeline.add_step(
    differencer,
    name="differencer",
    edges={"X": "y"}
)
# Add the SARIMAX forecasting step to the pipeline
general_pipeline = general_pipeline.add_step(
    SARIMAX(),
    name="sarimax",
    edges={"X": "X", "y": "differencer"}
)
# Add the inverse differencing step to the pipeline
general_pipeline = general_pipeline.add_step(
    differencer,
    name="differencer_inv",
    edges={"X": "sarimax"},
    method="inverse_transform"
)
```
**Parameter explanation**
* `skobject`: The sktime object added to the pipeline
* `name`: The name of the step
* `edges`: The keys $(X\,\text{or}\,  y)$ of the dictionary indicate the input of the skobject, and the values are the names of the steps that should be connected to the input argument. 
  * We can think $\{X:y\}$ as for $X$ means the Exogenous, and $y$ means the input data. As for $\{\text{"X": "X", "y": "differencer"}\}$, we can think the $y$ in $\{\text{"y":"differencer"}\}$ as the Target input as picture shown. Also, the input is the result from differencer.
* `method`: The skobject’s method that should be called. If not provided, the default method would be inferred based on the added skobject. This parameter is used for the inverse_transform method. Optional.
* `kwargs`: Additional keyword arguments passed to the sktime object. Optional.
##### Prediction
```python=
import warnings
import pandas as pd
from sktime.forecasting.sarimax import SARIMAX
from sktime.pipeline.pipeline import Pipeline
from sktime.transformations.series.difference import Differencer
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split
# Suppress warnings
warnings.filterwarnings('ignore')
# Initialize the differencer transformation
differencer = Differencer()
# Define the pipeline
general_pipeline = Pipeline(
    [
        {
            "skobject": differencer,
            "name": "differencer",
            "edges": {"X": "y"}
        },
        {
            "skobject": SARIMAX(),
            "name": "sarimax",
            "edges": {"X": "X", "y": "differencer"}
        },
        {
            "skobject": differencer,
            "name": "differencer_inv",
            "edges": {"X": "sarimax"},
            "method": "inverse_transform"
        },
    ]
)
# Load the Longley dataset
y, X = load_longley()
# Split the data into training and testing sets
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
# Fit the pipeline on the training data
general_pipeline.fit(y=y_train, X=X_train, fh=[1, 2, 3, 4])
# Generate predictions
y_pred = general_pipeline.predict(X=X_test)
# Combine true and predicted values into a DataFrame
results_df = pd.DataFrame({
    "True Values": y_test.values,
    "Predicted Values": y_pred.values
})
# Output the results as a table
print("True and Predicted Values:")
print(results_df)
```
```
True and Predicted Values:
   True Values  Predicted Values
0      68655.0      67213.229394
1      69564.0      68326.992339
2      69331.0      68736.176458
3      70551.0      71320.567457
```
##### An interesting example

```python=
import warnings
from sktime.pipeline.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.transformations.series.detrend import Deseasonalizer
from sklearn.linear_model import Ridge
from sktime.forecasting.compose import make_reduction
from sktime.datasets import load_macroeconomic
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
# Suppress warnings
warnings.filterwarnings("ignore")
# Initialize the pipeline
pipe = Pipeline()
# Add steps to the pipeline
pipe = pipe.add_step(
    TabularToSeriesAdaptor(StandardScaler()),
    name="scaler",
    edges={"X": "X__unemp"},
)
pipe = pipe.add_step(
    Deseasonalizer(sp=4),
    name="deseasonalizer",
    edges={"X": "X__realgdp_realdpi"},
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
    name="forecaster_gdp",
    edges={"y": "deseasonalizer__realgdp"},
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
    name="forecaster_dpi",
    edges={"y": "deseasonalizer__realdpi"},
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
    name="forecaster_unemp",
    edges={
        "y": "scaler__unemp",
        "X": [
            "forecaster_gdp",
            "forecaster_dpi",
        ],
    },
)
pipe = pipe.add_step(
    make_reduction(Ridge(), windows_identical=False, window_length=5),
    name="forecaster_inflation",
    edges={"X": ["forecaster_dpi", "forecaster_unemp"], "y": "y"},
)
# Load the macroeconomic dataset
data = load_macroeconomic()
# Prepare features and target
X = data[["realgdp", "realdpi", "unemp"]]
y = data[["infl"]]
# Define forecasting horizon
fh = ForecastingHorizon([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Split data into training and testing sets
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X=X, fh=fh)
# Fit the pipeline
pipe.fit(y=y_train, X=X_train, fh=fh)
# Predict using the pipeline
result = pipe.predict(X=None, fh=y_test.index)
# Output results in programmatic form
print("Predicted Values:")
print(result)
```
```
Predicted Values:
            infl
Period          
2006Q4  3.090428
2007Q1  1.676421
2007Q2  0.219586
2007Q3  1.570087
2007Q4  0.350137
2008Q1  0.438966
2008Q2  0.615457
2008Q3  0.119022
2008Q4  0.257887
2009Q1  0.129785
2009Q2 -0.056094
2009Q3 -0.066123
```
## Advanced Modelling Features
### Parallelism for multivariate and hierarchical broadcasting
Broadcasting is a method in computer programming, particularly in numerical computing and parallel processing, that allows operations to be applied to datasets with different shapes or dimensions by automatically expanding smaller datasets to match the shape of larger ones.
**Multivariate Broadcasting:**
Multivariate broadcasting refers to extending the concept of broadcasting to operations involving multiple variables (datasets) of differing shapes. Instead of just aligning one smaller dataset to another, multiple smaller datasets are expanded to align with the largest dimensions among them.
#### Mechanism
* All datasets are aligned by expanding (not copying) their dimensions to match the target shape.
* Broadcasting rules are applied across all dimensions simultaneously.
* Operations are executed element-wise on the expanded datasets.
#### Parallelism in Multivariate Broadcasting: 
Parallel processing can split the computations across processors. Each processor handles a subset of the expanded datasets, enabling simultaneous computation and reducing the overall execution time.
##### Advantages
* Reduces memory usage since data expansion is virtual .
* Speeds up computation by leveraging multiple processors.
* Simplifies coding for large-scale numerical problems, such as matrix algebra or multivariate statistics.
#### Hierarchical Broadcasting
Hierarchical broadcasting extends broadcasting to datasets organized in a hierarchy, such as nested arrays, tensors, or multi-dimensional structures where lower-level datasets need to align with higher-level structures.
##### Mechanism
* Operate within each level of the hierarchy independently.
* Broadcast the data across levels as necessary, ensuring compatibility between parent and child structures.
* Combine results from lower levels hierarchically to form the final output.
#### Parallelism in Hierarchical Broadcasting
Hierarchical structures can be divided across processors. Each processor focuses on one part of the hierarchy, allowing computations at different levels to proceed independently and concurrently.
##### Advantages
* Enables efficient computation for complex structures like tensors, neural networks, and hierarchical datasets.
* Exploits nested patterns for operations, such as in multi-resolution analysis or multi-level data processing.
* Allows seamless scaling to distributed systems, making it suitable for big data applications.
##### Benefits of Parallelism in Broadcasting
*  **Enhanced Performance:**
   * By distributing workloads across multiple processors, parallelism accelerates computations.
   * Ideal for large datasets and high-dimensional problems.
*  **Memory Efficiency:**
   * Virtual broadcasting avoids unnecessary memory duplication, even when parallelism is applied.
* **Simplified Coding:**
   * Parallel broadcasting abstracts the complexity of aligning and distributing data, allowing programmers to focus on the logic of their operations.
* **Scalability:**
  * Works effectively with distributed computing systems, such as clusters or cloud-based environments.
* **Flexibility in Applications:**
   * Supports a wide range of applications, including numerical simulation, machine learning, and hierarchical data analysis.
#### Summary
By combining broadcasting with parallelism, both multivariate and hierarchical computations become more efficient, scalable, and user-friendly, catering to diverse domains in scientific computing and data science.
#### Code Implementation
```python=
import warnings
import pandas as pd
from sktime.forecasting.arima import ARIMA
from sktime.datasets import load_longley
# Suppress warnings
warnings.filterwarnings("ignore")
# Initialize the ARIMA forecaster
forecaster = ARIMA()
# Load the Longley dataset and drop unnecessary columns
_, y = load_longley()
y = y.drop(columns=["UNEMP", "ARMED", "POP"])
# Set parallel backend configuration
forecaster.set_config(**{"backend:parallel": "loky"})  # or "multiprocessing" / "dask"
forecaster.set_config(**{"backend:parallel:params": {"n_jobs": 2}})
# Fit the forecaster on the data
forecaster.fit(y, fh=[1, 2, 3])
# Access the forecasters used
forecasters = forecaster.forecasters_
# Convert forecasters to pandas DataFrame for better visualization
forecasters_df = pd.DataFrame(
    [(key, str(value)) for key, value in forecasters.items()],
    columns=["Variable", "Forecaster"]
)
# Output the forecasters as a DataFrame
print("Forecasters DataFrame:")
print(forecasters_df)
```
```
Forecasters DataFrame:
  Variable                                         Forecaster
0  GNPDEFL  forecasters    ARIMA()\nName: GNPDEFL, dtype: ...
1      GNP   forecasters    ARIMA()\nName: GNP, dtype: object
```
#### Hierarchy Example

```python=
import numpy as np
import pandas as pd
from sktime.forecasting.ets import AutoETS
from sktime.utils._testing.hierarchical import _make_hierarchical
from sktime.forecasting.model_selection import temporal_train_test_split
def load_product_hierarchy():
    """
    Generate hierarchical sales data with two levels of hierarchy
    and apply preprocessing steps.
    """
    # Generate 5 years of daily sales data
    n_years = 5
    y = (
        _make_hierarchical(
            hierarchy_levels=(2, 4),
            min_timepoints=365 * n_years,
            max_timepoints=365 * n_years,
            random_state=0,
        )
        .drop(
            index=[
                ("h0_0", "h1_2"),
                ("h0_0", "h1_3"),
                ("h0_1", "h1_0"),
                ("h0_1", "h1_1"),
            ]
        )
        .rename(
            index={
                "h0_0": "Food preparation",
                "h0_1": "Food preservation",
                "h1_0": "Hobs",
                "h1_1": "Ovens",
                "h1_2": "Fridges",
                "h1_3": "Freezers",
            }
        )
        .reset_index()
        .rename(
            columns={
                "h0": "Product line",
                "h1": "Product group",
                "time": "Date",
                "c0": "Sales",
            }
        )
    )
    # Convert date to monthly frequency and aggregate sales
    y["Date"] = y["Date"].dt.to_period("M")
    y = y.groupby(by=["Product line", "Product group", "Date"]).sum()
    # Add random noise to simulate variability
    noise = np.random.RandomState(seed=0).normal(1, 0.3, np.shape(y))
    y = (y * noise).round(0)
    return y
# Load the generated product hierarchy data
y = load_product_hierarchy()
# Split data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=4)
# Display the first few rows of the training set
print(y_train.head())
# Query sales data for January 2000
result = y.loc[(slice(None), slice(None), "2000-01")]
# Display the queried data
print("Sales data for January 2000:")
print(result)
# Assuming `y_pred` is the prediction result from a forecaster
# Example forecaster
forecaster = AutoETS(auto=True)
forecaster.fit(y_train, fh=[1, 2, 3, 4])
y_pred = forecaster.predict()
# Convert prediction to a DataFrame and display
y_pred_df = y_pred.reset_index()
print("Predicted Sales Data:")
print(
    pd.DataFrame({
        "Product line": y_pred_df["Product line"].tolist(),
        "Product group": y_pred_df["Product group"].tolist(),
        "Date": y_pred_df["Date"].tolist(),
        "Predicted Sales": y_pred_df["Sales"].round(2).tolist()
    }).to_string(index=False)
)
```
```
Sales
Product line     Product group Date          
Food preparation Hobs          2000-01  245.0
                               2000-02  144.0
                               2000-03  184.0
                               2000-04  265.0
                               2000-05  236.0
Sales data for January 2000-01:
                                 
Product line      Product group  Sales     
Food preparation  Hobs           245.0
                  Ovens          114.0
Food preservation Freezers       164.0
                  Fridges        136.0  
Predicted Sales Data:
     Product line Product group    Date  Predicted Sales
 Food preparation          Hobs 2004-09           135.50
 Food preparation          Hobs 2004-10           135.49
 Food preparation          Hobs 2004-11           135.47
 Food preparation          Hobs 2004-12           135.46
 Food preparation         Ovens 2004-09           182.82
 Food preparation         Ovens 2004-10           183.98
 Food preparation         Ovens 2004-11           185.14
 Food preparation         Ovens 2004-12           186.30
Food preservation      Freezers 2004-09           148.41
Food preservation      Freezers 2004-10           148.41
Food preservation      Freezers 2004-11           148.41
Food preservation      Freezers 2004-12           148.41
Food preservation       Fridges 2004-09           139.04
Food preservation       Fridges 2004-10           139.04
Food preservation       Fridges 2004-11           139.04
Food preservation       Fridges 2004-12           139.04
```
### `skpro`-Alternative method for statistic forecasting
#### Overview
`skpro` is a Python library designed for probabilistic forecasting, offering tools to predict, evaluate, and handle uncertainty in outcomes. It extends the familiar Scikit-learn API to enable users to incorporate probabilistic modeling seamlessly.
##### Features
* **Probabilistic Model Framework:**
  * Models generate probability distributions as predictions.
  * Supports various distribution types and custom configurations.
* **Evaluation Metrics:**
  * Includes probabilistic-specific metrics like Continuous Ranked Probability Score (CRPS) to evaluate forecast quality.
  * These metrics assess both accuracy and uncertainty estimation.
* **Flexible Pipelines:**
  * Users can build modular pipelines for probabilistic forecasting, leveraging Scikit-learn's interoperability.
* **Applications:**
  * Stock price distribution prediction for risk analysis.
  * Probabilistic weather models for emergency planning.
  * Demand forecasting for supply chain optimization.
##### Benefits of Probabilistic Forecasting with `skpro`
* **Incorporates Uncertainty:** Provides decision-makers with a better understanding of risks and variances, aiding in robust planning.
* **Improves Decision Quality:** Enables strategies like hedging against worst-case scenarios in finance or supply chain buffering.
* **Broad Applicability:** Suitable for time series forecasting, regression tasks, and real-world problems involving uncertain outcomes.
* **Scalable and Flexible:** Leverages Scikit-learn's ecosystem, allowing users to integrate probabilistic models into existing workflows with minimal learning curve.
* **Enhanced Evaluation:** By using metrics tailored to probabilistic outputs, models can be rigorously assessed beyond point prediction accuracy.
##### Summary:
Probabilistic forecasting, with its focus on distribution outputs, represents a leap forward from deterministic predictions by embracing uncertainty. Tools like skpro empower practitioners to model and evaluate uncertainty, making them invaluable for applications requiring risk management and strategic decision-making.
##### Code Implement
```python=
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from skpro.regression.residual import ResidualDouble
from skpro.utils.plotting import plot_crossplot_interval
# Load the diabetes dataset
X, y = load_diabetes(return_X_y=True, as_frame=True)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Define the base and residual regressors
reg_mean = LinearRegression()
reg_resid = RandomForestRegressor()
# Create the probabilistic regression model
reg_proba = ResidualDouble(reg_mean, reg_resid)
# Fit the probabilistic model on the training data
reg_proba.fit(X_train, y_train)
# Predict the probabilistic intervals on the test set
y_pred_proba = reg_proba.predict_proba(X_test)
# Plot the prediction intervals and the y=x line
plot_crossplot_interval(y_test, y_pred_proba, coverage=0.9)
x = np.linspace(min(y_test), max(y_test), 100)
plt.plot(x, x, color='black', linestyle='--', label='y=x')
plt.legend()
plt.show()
```

### Benchmarking - comparing estimator performance
#### Overview
Benchmarking in the context of estimator performance refers to systematically evaluating and comparing the performance of different machine learning models (estimators) on the same dataset(s). <br>
The goal is to identify which model performs best for a specific task or to assess whether a proposed method offers improvements over established techniques.
#### Process of Benchmarking
1. **Dataset Selection:** Choose a representative dataset (or datasets) that reflects the problem you aim to solve.
2. **Model Selection:** Include a variety of estimators. These could range from simple baseline models (e.g., linear regression) to advanced techniques (e.g., ensemble methods or deep learning models).
3. **Performance Metrics:** Define the metrics relevant to your task. For example:
   * Classification tasks: Accuracy, F1-score, AUC-ROC.
   * Regression tasks: Mean Absolute Error (MAE), Mean Squared Error (MSE).
4. **Cross-Validation:** Use cross-validation techniques (e.g., k-fold cross-validation) to ensure the results are not biased by the specific data split.
5. **Fair Comparisons:** Ensure all models are trained on the same training data and evaluated on the same validation or test data.
##### Advantages of Benchmarking
* **Fair and Objective Evaluation:** Provides a systematic way to compare models under the same conditions, ensuring objectivity.
* **Insight into Model Suitability:** Helps identify which model is best suited for a specific dataset or task.
* **Improved Model Development:** Highlights strengths and weaknesses of different approaches, guiding future improvements.
* **Efficiency in Model Selection:** Simplifies the process of selecting an appropriate model by providing a direct comparison.
* **Validation of New Methods:** Offers a mechanism to assess whether a new algorithm genuinely outperforms existing methods.
* **Transparency and Reproducibility:** A well-documented benchmarking process ensures that others can reproduce the results and verify claims.
##### Practical Example:
Imagine you are predicting housing prices and have three models to compare:
1. Linear Regression
2. Random Forest
3. XGBoost
##### Benchmarking Process:
1. Use the same dataset split into training and testing sets.
2. Train each model on the training set.
3. Evaluate each model on the test set using MSE and R² metrics.
4. Use k-fold cross-validation to assess performance consistency.
5. Compile the results:
    * Linear Regression: MSE = 3000, R² = 0.7
    * Random Forest: MSE = 2000, R² = 0.85
    * XGBoost: MSE = 1800, R² = 0.9
**Insights:**
From the benchmarking results, XGBoost performs best in terms of both metrics, making it the preferred model for this task.
#### Summary:
Benchmarking helps ensure that model comparisons are meaningful, fair, and reproducible, enabling practitioners to make informed decisions when selecting or improving machine learning models.
#### Code Implementation
```python=
from sktime.benchmarking.forecasting import ForecastingBenchmark
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import MeanSquaredPercentageError
# Initialize the forecasting benchmark
benchmark = ForecastingBenchmark()
# Add competing estimators
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="mean", sp=12),
    estimator_id="NaiveForecaster-mean-v1"
)
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="last", sp=12),
    estimator_id="NaiveForecaster-last-v1"
)
# Define cross-validation splitter
cv_splitter = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    fh=12
)
# Define scoring metrics
scorers = [MeanSquaredPercentageError()]
# Define dataset loaders
dataset_loaders = [load_airline]
# Add tasks for each dataset
for dataset_loader in dataset_loaders:
    benchmark.add_task(
        dataset_loader=dataset_loader,
        cv_splitter=cv_splitter,
        scorers=scorers
    )
# Run the benchmark and save results to a CSV file
results_df = benchmark.run("./forecasting_results.csv")
# Display the results transposed for better readability
print("Benchmark results:")
print(results_df.T)
```
<div>
<style scoped>
    
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>0</th>
      <th>1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>validation_id</th>
      <td>[dataset=load_airline]_[cv_splitter=ExpandingW...</td>
      <td>[dataset=load_airline]_[cv_splitter=ExpandingW...</td>
    </tr>
    <tr>
      <th>model_id</th>
      <td>NaiveForecaster-last-v1</td>
      <td>NaiveForecaster-mean-v1</td>
    </tr>
    <tr>
      <th>runtime_secs</th>
      <td>0.211127</td>
      <td>0.088294</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_0_test</th>
      <td>0.024532</td>
      <td>0.049681</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_1_test</th>
      <td>0.020831</td>
      <td>0.0737</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_2_test</th>
      <td>0.001213</td>
      <td>0.05352</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_3_test</th>
      <td>0.01495</td>
      <td>0.081063</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_4_test</th>
      <td>0.031067</td>
      <td>0.138163</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_5_test</th>
      <td>0.008373</td>
      <td>0.145125</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_6_test</th>
      <td>0.007972</td>
      <td>0.154337</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_7_test</th>
      <td>0.000009</td>
      <td>0.123298</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_8_test</th>
      <td>0.028191</td>
      <td>0.185644</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_fold_9_test</th>
      <td>0.003906</td>
      <td>0.184654</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_mean</th>
      <td>0.014104</td>
      <td>0.118918</td>
    </tr>
    <tr>
      <th>MeanSquaredPercentageError_std</th>
      <td>0.011451</td>
      <td>0.051265</td>
    </tr>
  </tbody>
</table>
</div>
By MSPE for NaiveForecaster-last-v1 MSPE is 0.014104,and NaiveForecaster-mean-v1 MSPE is 0.118918 ,we can conclude that last is better than mean.
## Conclusion
The use of machine learning, particularly with the `sktime` library, offers significant benefits in time series analysis. Machine learning models provide flexibility in handling non-stationary data and capturing complex patterns. `sktime` streamlines the process with its user-friendly interface, comprehensive algorithms, and modular design, enabling efficient prototyping and analysis.
Looking ahead, several areas hold potential for further exploration:
* **Deep Learning Approaches:** Investigating the capabilities of LSTMs, GRUs, and Transformers in capturing long-term dependencies and non-linear patterns.
* **Hybrid Models:** Combining statistical and machine learning models to leverage the strengths of both approaches.
* **Anomaly Detection:** Developing robust methods for identifying unusual events or patterns in time series data.
Advanced topics, such as probabilistic forecasting and forecasting pipelines, highlighted the importance of managing uncertainty and automating workflows for efficiency. Comparative analysis showed that AutoETS excels with seasonal data, whereas AutoARIMA performs better on non-stationary datasets.
## Reference
* Cerqueira, V., Torgo, L., Moniz, N., & Silva, F. (2023). Sktime: A Unified Python Library for Time Series Machine Learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1842-1853). [Link](https://arxiv.org/pdf/1909.07872)
* Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: Principles and Practice (3rd ed). OTexts: Melbourne, Australia. [Link]([07872](https://otexts.com/fpp3/))
* James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer. [Link](https://www.statlearning.com/)
* Moniz, N., & Cerqueira, V. (2023). sktime-tutorial-pydata-global-2023. GitHub Repository. [Link](https://github.com/sktime/sktime-tutorial-pydata-global-2023/blob/main/notebooks/01_introduction.ipynb)
* `sktime` GitHub Repository. [Link](https://github.com/sktime/sktime)
             
            {"description":"Introduction to mechine learning","lang":"zh-TW","showTags":"true","breaks":false,"title":"Time Series Models","contributors":"[{\"id\":\"155e4ae3-957a-4ddd-a560-3f8806fb59fd\",\"add\":214834,\"del\":133070},{\"id\":\"9e38ee55-7b6f-408d-a9e3-a3a99f1fde0e\",\"add\":34294,\"del\":53823}]"}