# Two-State Market Regime Detection and 0/1 Trading Strategy

This example uses several models to identify market regimes (bull and bear) and then applies a 0/1 trading strategy based on the regime predictions.

## Package Installation and Imports

```python=
# Jupyter Notebook Setup
%matplotlib inline
%load_ext autoreload
%autoreload 2

# Imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import yfinance as yf
import pandas_datareader as pdr
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import cvxportfolio as cvx
from jumpmodels import filter_date_range, raise_arr_to_pd_obj, plot_regime
```

## Data Loading and Feature Engineering

### Dataset

The dataset contains historical prices for the Dow Jones Industrial Average (DJIA) from Yahoo Finance, downloaded from January 1, 1999, to July 30, 2024. The analysis itself starts on January 6, 2000, so the first year serves as warm-up for the moving-window features. The 3-month Treasury bill rate (`DTB3`) from FRED is included as the cash return.

```python=
# Constants
START_DATE = '1999-01-01'
END_DATE = '2024-07-30'
TRAIN_START, TRAIN_END = "2000-01-06", "2020-01-06"
TEST_START, TEST_END = "2020-01-06", "2024-01-06"

# Function Definitions
def load_data(start_date, end_date):
    """Load DJIA data from Yahoo Finance and interest rate data from FRED."""
    # auto_adjust=False keeps the 'Adj Close' column on newer yfinance versions
    djia_data = yf.download('^DJI', start=start_date, end=end_date, auto_adjust=False)
    # squeeze() turns a one-column frame (newer yfinance) into a Series
    adj_close = djia_data['Adj Close'].squeeze()
    returns = pd.DataFrame({
        'prc': adj_close,
        'DJI': adj_close.pct_change()
    })
    # DTB3 is quoted in annualized percent; convert to a daily decimal rate
    returns["USDOLLAR"] = pdr.get_data_fred('DTB3', start=start_date, end=end_date) / (252 * 100)
    # Forward-fill gaps and drop the first row, whose return is undefined
    return returns.ffill().iloc[1:]

# Data Loading
returns = load_data(START_DATE, END_DATE)

# Plotting DJIA Close Price
plt.figure(figsize=(12, 6))
plt.plot(returns.prc, label='Close Price')
plt.title('DJIA Close Price Chart')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.show()
```

![output](https://hackmd.io/_uploads/BkslsXga0.png)

### Feature Engineering

The following features are constructed to aid in regime detection:

* **Volatility Features:** Exponential moving standard deviation of returns, calculated with half-lives of 5, 15, and 45 days. Shorter half-lives emphasize recent volatility, while longer ones give a smoother, long-term view.
* **Return Features:** Exponential moving average of returns, computed with the same half-lives (5, 15, 45 days) to capture short-term, medium-term, and long-term return trends.

These features are then lagged by one day, so the features for a given day use only returns up to the previous day and are available for prediction at each time step.

```python=
def load_features(ret_ser: pd.Series) -> pd.DataFrame:
    """Load volatility and return features with different half-lives."""
    feat_dict = {}
    for hl in [5, 15, 45]:
        feat_dict[f"vol_{hl}"] = ret_ser.ewm(halflife=hl).std()
        feat_dict[f"ret_{hl}"] = ret_ser.ewm(halflife=hl).mean()
    return pd.DataFrame(feat_dict)

# Feature Engineering
X = load_features(returns.DJI).shift(1).dropna()
X = filter_date_range(X, TRAIN_START, TEST_END)

# Display the first few rows of the features
X.head()
```

```
               vol_5    vol_15    vol_45     ret_5    ret_15    ret_45
Date
2000-01-06  0.013612  0.010945  0.010400 -0.001800  0.000210  0.000508
2000-01-07  0.013547  0.010966  0.010413 -0.000046  0.000731  0.000680
2000-01-10  0.015145  0.011772  0.010722  0.003057  0.001778  0.001035
2000-01-11  0.014137  0.011516  0.010648  0.003219  0.001892  0.001085
2000-01-12  0.013518  0.011353  0.010595  0.002119  0.001568  0.000988
```

## Data Processing

### Train-Test Split

The dataset is divided into training and testing sets to evaluate the models' performance on unseen data.

* Training set: 2000-01-06 to 2020-01-06
* Testing set: 2020-01-06 to 2024-01-06

The `filter_date_range` function originates from the [jump-models library](https://github.com/Yizhan-Oliver-Shu/jump-models/tree/master).
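For readers without the jump-models helper at hand, the date-based split can be sketched with plain pandas label slicing. This is only an assumed equivalent: `slice_by_date` is a hypothetical stand-in, and whether `filter_date_range` treats its end date as inclusive is not confirmed here. The toy frame below also shows that, with inclusive bounds, the shared boundary date 2020-01-06 lands in both the training and the testing set.

```python
import numpy as np
import pandas as pd

def slice_by_date(obj, start, end):
    """Hypothetical stand-in for filter_date_range: inclusive label slice on a sorted DatetimeIndex."""
    return obj.loc[start:end]

# Toy business-day frame covering the boundary between the two periods
idx = pd.bdate_range("2019-12-30", "2020-01-10")
toy = pd.DataFrame({"feature": np.arange(len(idx), dtype=float)}, index=idx)

toy_train = slice_by_date(toy, "2019-12-30", "2020-01-06")
toy_test = slice_by_date(toy, "2020-01-06", "2020-01-10")

# With inclusive bounds the boundary date appears in both sets
overlap = toy_train.index.intersection(toy_test.index)
print(list(overlap.strftime("%Y-%m-%d")))
```

If a strictly disjoint split is wanted, starting the test slice one business day after the training end would be one way to achieve it.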
```python=
# Train-Test Split
X_train = filter_date_range(X, TRAIN_START, TRAIN_END)
X_test = filter_date_range(X, TEST_START, TEST_END)
ret_ser_train = filter_date_range(returns.DJI, TRAIN_START, TRAIN_END)
ret_ser_test = filter_date_range(returns.DJI, TEST_START, TEST_END)

print(f"Training set shape: {X_train.shape}")
print(f"Testing set shape: {X_test.shape}")
```

```
Training set shape: (5031, 6)
Testing set shape: (1008, 6)
```

### Standardization

The features are standardized using `StandardScaler` to ensure that all features contribute equally during the K-means clustering process. This prevents features with larger scales from dominating the clustering algorithm.

```python=
# Standardization
scaler = StandardScaler()
X_train_standard = pd.DataFrame(
    scaler.fit_transform(X_train),
    index=X_train.index,
    columns=X_train.columns
)
X_test_standard = pd.DataFrame(
    scaler.transform(X_test),
    index=X_test.index,
    columns=X_test.columns
)

print("Standardized training data:")
print(X_train_standard.head())
```

```
Standardized training data:
               vol_5    vol_15    vol_45     ret_5    ret_15    ret_45
Date
2000-01-06  0.670313  0.196530  0.041669 -0.785051 -0.026743  0.342959
2000-01-07  0.659351  0.200406  0.044297 -0.112465  0.345880  0.568258
2000-01-10  0.926212  0.348831  0.109143  1.078394  1.095470  1.034543
2000-01-11  0.757969  0.301578  0.093510  1.140514  1.177211  1.100175
2000-01-12  0.654578  0.271650  0.082486  0.718307  0.945420  0.972485
```

## Regime Detection Models

This section explores various models to detect distinct market regimes (states), typically categorized as "bull" (upward trend) or "bear" (downward trend) markets. These models leverage the engineered features to identify patterns and transitions between these regimes.

### Moving Average (MA)

A simple yet effective technique that utilizes a 252-day moving average of risk-adjusted returns to generate trading signals. Returns are normalized by the trailing 20-day exponential moving standard deviation to account for risk.
If the moving average is positive, it signals a long (buy) position; otherwise, a hold (cash) position is maintained.

```python=
# Moving Average (MA) Model
def calculate_ma_signal(returns):
    MA = pd.DataFrame()
    MA['exp_moving_std'] = returns['DJI'].ewm(span=20, adjust=False).std().shift(1)
    MA['normalized_returns'] = returns['DJI'] / MA['exp_moving_std']
    MA['moving_avg'] = MA['normalized_returns'].rolling(window=252).mean().shift(1)
    MA['signal'] = MA['moving_avg'].apply(lambda x: 1 if x > 0 else 0)
    return MA.dropna()

MA = calculate_ma_signal(returns)
MA_train = filter_date_range(MA, TRAIN_START, TRAIN_END)
MA_test = filter_date_range(MA, TEST_START, TEST_END)
```

### Hidden Markov Model (HMM)

A probabilistic model that assumes the market has two hidden states ("bull" and "bear") and transitions between these states according to certain probabilities. The model learns these transition probabilities and the probability distributions of returns within each state from the training data.

```python=
from hmmlearn import hmm

# Hidden Markov Model (HMM)
def fit_hmm(X_train, X_test):
    hmm_model = hmm.GaussianHMM(n_components=2, random_state=42)
    hmm_model.fit(X_train)
    return hmm_model.predict(X_train), hmm_model.predict(X_test)

hmm_states_train, hmm_states_test = fit_hmm(X_train_standard, X_test_standard)
```

### K-Means Clustering

A clustering algorithm that partitions the data into two clusters based on feature similarity. The idea is that the two clusters correspond to the two market regimes.

```python=
from sklearn.cluster import KMeans

# K-Means Clustering
def fit_kmeans(X_train, X_test):
    kmeans = KMeans(n_clusters=2, max_iter=1000, random_state=42)
    kmeans.fit(X_train)
    return kmeans.predict(X_train), kmeans.predict(X_test)

kmeans_labels_train, kmeans_labels_test = fit_kmeans(X_train_standard, X_test_standard)
```

### Sparse K-Means

A variant of K-Means that performs feature selection to identify the most relevant features for regime detection.
This can improve interpretability and potentially reduce overfitting. Here it is built as a `SparseJumpModel` with the jump penalty set to zero.

```python=
from jumpmodels import SparseJumpModel

# Sparse K-Means (Sparse Jump Model with zero jump penalty)
def fit_sparse_kmeans_model(X_train, X_test):
    s_kmeans_model = SparseJumpModel(2, n_feats=6, jump_penalty=0, cont=False, mode_loss=False, random_state=42)
    s_kmeans_model.fit(X_train)
    return s_kmeans_model.predict(X_train), s_kmeans_model.predict(X_test)

sparse_kmeans_labels_train, sparse_kmeans_labels_test = fit_sparse_kmeans_model(X_train_standard, X_test_standard)
```

### Jump Model

A clustering-based model that adds a fixed penalty each time the regime label changes from one day to the next. The penalty favors persistent regimes over frequent switching, so the model reacts to sustained shifts rather than to day-to-day noise.

```python=
from jumpmodels import JumpModel

# Jump Model
def fit_jump_model(X_train, X_test):
    jump_model = JumpModel(2, 1, cont=False, mode_loss=False, random_state=42)
    jump_model.fit(X_train)
    return jump_model.predict(X_train), jump_model.predict(X_test)

jump_labels_train, jump_labels_test = fit_jump_model(X_train_standard, X_test_standard)
```

### Sparse Jump Model

A version of the Jump Model that incorporates feature selection to identify the most informative features for regime detection.

```python=
from jumpmodels import SparseJumpModel

# Sparse Jump Model
def fit_sparse_jump_model(X_train, X_test):
    s_jump_model = SparseJumpModel(2, jump_penalty=1, n_feats=6, cont=False, mode_loss=False, random_state=42)
    s_jump_model.fit(X_train)
    return s_jump_model.predict(X_train), s_jump_model.predict(X_test)

sparse_jump_labels_train, sparse_jump_labels_test = fit_sparse_jump_model(X_train_standard, X_test_standard)
```

### Continuous Jump Model

An extension of the Jump Model that assigns each day a probability vector over the regimes instead of a single hard label, providing a more nuanced view of market dynamics.
```python=
# Continuous Jump Model
def fit_continuous_jump_model(X_train, X_test):
    c_jump_model = JumpModel(2, 1, cont=True, mode_loss=False, random_state=42)
    c_jump_model.fit(X_train)
    return c_jump_model.predict(X_train), c_jump_model.predict(X_test), c_jump_model.labels_

cont_jump_labels_train, cont_jump_labels_test, c_jump_model_labels = fit_continuous_jump_model(X_train_standard, X_test_standard)
```

### Comparison of Regime Detection Methods

To compare the regime detection methods visually, a helper function plots the regimes identified by each model alongside the DJIA price data.

```python=
# Plotting function
def plot_all_regimes(returns, ret_ser, start, end, labels_dict, title):
    n_models = len(labels_dict)
    fig, axes = plt.subplots(n_models, 1, figsize=(15, 5*n_models), sharex=True)

    for i, (model_name, labels) in enumerate(labels_dict.items()):
        ax = axes[i] if n_models > 1 else axes
        filter_date_range(pd.DataFrame({"DJIA": returns.prc}), start, end).plot(ax=ax)
        ax.set(ylabel="Price (log scale)", yscale="log")
        ax2 = ax.twinx()
        plot_regime(raise_arr_to_pd_obj(labels, ret_ser), ax=ax2, title=f"{model_name} Regime Labels")

    plt.suptitle(title, fontsize=16)
    plt.tight_layout()
    plt.show()
```

#### Training Phase Comparison

```python=
# Create a dictionary of training model labels
train_labels = {
    "MA": MA_train['signal'],
    "HMM": hmm_states_train,
    "K-Means": kmeans_labels_train,
    "Sparse K-Means": sparse_kmeans_labels_train,
    "Jump Model": jump_labels_train,
    "Sparse Jump Model": sparse_jump_labels_train,
    "Continuous Jump Model": cont_jump_labels_train
}

plot_all_regimes(returns, ret_ser_train, TRAIN_START, TRAIN_END, train_labels, "Regime Detection Comparison (Training Set)")
```

![output29](https://hackmd.io/_uploads/ryY3rQaaR.png)

#### Testing Phase Comparison

```python=
# Create a dictionary of testing model labels
test_labels = {
    "MA": MA_test['signal'],
    "HMM": hmm_states_test,
    "K-Means": kmeans_labels_test,
    "Sparse K-Means": sparse_kmeans_labels_test,
    "Jump Model": jump_labels_test,
    "Sparse Jump Model": sparse_jump_labels_test,
    "Continuous Jump Model": cont_jump_labels_test
}

# Plot all regimes (the title argument is required by plot_all_regimes)
plot_all_regimes(returns, ret_ser_test, TEST_START, TEST_END, test_labels, "Regime Detection Comparison (Testing Set)")
```

![output30](https://hackmd.io/_uploads/HkcoB76pA.png)

## 0/1 Trading Strategy

This section implements a straightforward trading strategy based on the predicted market regimes. The strategy follows a simple rule:

1. **Identify the "Long" Regime:**
    * For each regime detection model, calculate the mean return within each of the two identified regimes (typically "bull" and "bear").
    * The regime with the higher mean return is designated as the "long" regime, indicating a favorable market condition for buying or holding the asset.
2. **Generate Trading Signals:**
    * If the model predicts the current regime to be the "long" regime, a trading signal of 1 is generated: the portfolio is fully invested in the asset.
    * If the model predicts the other ("bear") regime, a trading signal of 0 is generated: the position is closed and the portfolio is held in cash.

This strategy aims to capitalize on upward market trends (bull regimes) while mitigating losses during downward trends (bear regimes). Note that some models may label the bear market as 1 and the bull market as 0, so these labels must be inverted for consistency across all models.

The following sections cover the implementation of this strategy with the `Cvxportfolio` library, hyperparameter tuning for certain models, and backtesting against a benchmark.

### Data Preparation for Backtesting

Before running backtests, the data is prepared for the backtesting framework. This involves:

* **Fetching risk-free rate data:** The risk-free rate, representing the return on a theoretically risk-free investment, is obtained from the FRED API.
  This data is essential for calculating risk-adjusted performance metrics.
* **Preparing out-of-sample returns:** Returns for the testing or holdout period (out-of-sample) are extracted. These returns will be used to evaluate the performance of the trading strategy on unseen data.
* **Label inversion (if necessary):** As mentioned earlier, some models may label the bear market as 1 and the bull market as 0. A function is defined to invert these labels, ensuring consistency across all models.

```python=
def prepare_data_for_backtesting(returns):
    """
    Prepare data for backtesting by extracting relevant information from returns.
    """
    ret_ = returns.drop('prc', axis=1)
    market_data = cvx.UserProvidedMarketData(returns=ret_, cash_key='USDOLLAR')
    return market_data

def invert_labels(labels, index):
    """
    Invert labels to ensure consistency (bear=0, bull=1).
    """
    arr = np.where(labels == 0, 1, 0)
    cash = labels
    return pd.DataFrame({'DJI': arr, 'USDOLLAR': cash}, index=index)
```

### Trading Policy Implementation

This subsection focuses on defining the trading policy and the associated costs:

* **Transaction and holding cost models:**
  * A transaction cost model is created using the `TcostModel` from the `Cvxportfolio` library. This model incorporates a half-spread cost to account for the bid-ask spread in the market.
  * A holding cost model is defined using the `HcostModel`. It includes a borrow fee based on the risk-free rate, reflecting the cost of borrowing assets for short selling.
* **Market simulator:**
  * A `MarketSimulator` is instantiated to simulate the trading environment. It takes the prepared market data (asset returns and risk-free rate) and the cost models as inputs.
  * The `simulator` is configured for daily trading frequency.

With the data prepared and the trading policy defined, the next step is to perform hyperparameter tuning for specific models and then proceed with the backtesting process.
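To give a feel for what a half-spread cost means in a 0/1 strategy, the arithmetic can be sketched as follows. This is a simplified illustration that assumes a purely linear cost (half-spread times absolute traded value); `linear_tcost` and the portfolio size are hypothetical, and the cost model inside `Cvxportfolio` may account for more than this.

```python
import numpy as np

def linear_tcost(trades_usd, half_spread):
    """Linear transaction cost: half-spread times the absolute dollar amount traded."""
    return half_spread * np.abs(np.asarray(trades_usd, dtype=float)).sum()

# Moving a hypothetical 1,000,000 portfolio fully from cash into the index
# and back out again, with a half-spread of 10e-3 (that is, 1% per trade)
half_spread = 10e-3
round_trip = [1_000_000, -1_000_000]
cost = linear_tcost(round_trip, half_spread)
print(cost)
```

Because every regime switch trades the whole portfolio, each switch is expensive under this assumption, which is why models that switch less often have an advantage in the backtests.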
```python=
# Half of the bid-ask spread as a fraction of traded value (10E-3 = 0.01, i.e. 1%)
HALF_SPREAD = 10E-3

def create_cost_models(market_data):
    """
    Create transaction and holding cost models.
    """
    # Trailing 250-day mean of returns, lagged two days
    r_hat_with_cash = market_data.returns.rolling(window=250).mean().shift(2).dropna()
    # The last column is the cash (USDOLLAR) rate, used as the borrow fee
    borrow_fee = r_hat_with_cash.iloc[:, -1]

    tcost_model = cvx.TcostModel(a=HALF_SPREAD, b=None)
    hcost_model = cvx.HcostModel(short_fees=borrow_fee)

    return tcost_model, hcost_model

def create_market_simulator(market_data, cost_models):
    """
    Create a market simulator for daily trading.
    """
    return cvx.MarketSimulator(
        market_data=market_data,
        costs=cost_models
    )

market_data = prepare_data_for_backtesting(returns)
tcost_model, hcost_model = create_cost_models(market_data)
market_sim = create_market_simulator(market_data, [tcost_model, hcost_model])
```

### Hyperparameter Tuning

Hyperparameters are parameters that are not directly learned from the data but are set before training a machine learning model. They significantly influence the model's performance. This section focuses on tuning hyperparameters for the Jump Model, Sparse Jump Model, and Continuous Jump Model to optimize their performance in the context of the 0/1 trading strategy. The primary metric used for optimization is the Sharpe ratio, a measure of risk-adjusted return.
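As a rough illustration of the tuning metric, the Sharpe ratio can be computed from daily returns as below. This is a minimal sketch under stated assumptions: `annualized_sharpe` is a hypothetical helper, it annualizes with 252 periods and the sample standard deviation, and the `sharpe_ratio` reported by the backtester may differ in such details. The return values are made up.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(returns, cash_returns, periods_per_year=252):
    """Mean excess return over its standard deviation, scaled to annual terms."""
    excess = pd.Series(returns) - pd.Series(cash_returns)
    return np.sqrt(periods_per_year) * excess.mean() / excess.std()

# Toy daily returns, purely illustrative
strategy = pd.Series([0.010, -0.005, 0.007, 0.002, -0.003])
cash = pd.Series([0.0001] * 5)
print(round(annualized_sharpe(strategy, cash), 2))
```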
```python=
def transfer_data(labels, index):
    """Transfer (bear=1, bull=0) into (bear=0, bull=1)"""
    arr = np.where(labels == 0, 1, 0)
    cash = labels
    return pd.DataFrame({'DJI': arr, 'USDOLLAR': cash}, index=index)

def evaluate_model(model, X, start_time, end_time):
    labels = model.predict(X)
    d = transfer_data(labels, X.index)
    policy = cvx.FixedWeights(d)
    return market_sim.run_backtest(policy, start_time=start_time, end_time=end_time).sharpe_ratio

def plot_model_results(model, X, ret_ser, title, START, END):
    labels = model.predict(X)
    fig, ax = plt.subplots(figsize=(12, 6))
    filter_date_range(pd.DataFrame({"DJIA": returns.prc}), START, END).plot(ax=ax)
    ax.set(ylabel="Price (log scale)", yscale="log")
    ax2 = ax.twinx()
    plot_regime(raise_arr_to_pd_obj(labels, ret_ser), ax=ax2, title=title)
    plt.show()
```

#### Optimizing the Jump Model and Sparse Jump Model

* The `lambda` hyperparameter, representing the jump penalty, is tuned. Higher values of `lambda` discourage frequent regime shifts, potentially leading to more stable predictions and reduced transaction costs.
* A range of `lambda` values is explored, and the models are trained and evaluated for each value.
* The `lambda` that yields the highest Sharpe ratio is selected as the optimal value.
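The role of the jump penalty described above can be stated compactly as an objective. The following is a sketch in the spirit of the statistical jump model literature listed in the references; the notation is an assumption made here and is not taken from the code. Given standardized feature vectors $x_0, \dots, x_{T-1}$, the model chooses cluster centers $\Theta = \{\theta_k\}$ and a state sequence $S = (s_0, \dots, s_{T-1})$ to solve

$$
\min_{\Theta,\, S} \; \sum_{t=0}^{T-1} \frac{1}{2}\,\lVert x_t - \theta_{s_t} \rVert_2^2 \;+\; \lambda \sum_{t=1}^{T-1} \mathbb{1}\{ s_{t-1} \neq s_t \}
$$

With $\lambda = 0$ the second term vanishes and the problem reduces to K-means, which is consistent with the Sparse K-Means model being built with `jump_penalty=0`. As $\lambda$ grows, each regime switch costs more, so the fitted state sequence switches less often.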
```python=
def optimize_jump_params(X, lambdas, TRAIN_START, TRAIN_END):
    best_sharp = -np.inf
    best_lambda = None
    for lambda_ in lambdas:
        jump_model = JumpModel(2, lambda_, cont=False, mode_loss=False, random_state=42)
        jump_model.fit(X)
        sharp = evaluate_model(jump_model, X, TRAIN_START, TRAIN_END)
        print(f"Lambda: {lambda_}, Sharpe ratio: {sharp}")
        if sharp > best_sharp:
            best_sharp, best_lambda = sharp, lambda_
    return best_lambda, best_sharp

lambdas = [0, 0.1, 1, 10, 100]
best_lambda, best_sharp = optimize_jump_params(X_train_standard, lambdas, TRAIN_START, TRAIN_END)
print(f"Best lambda: {best_lambda}, Best Sharpe ratio: {best_sharp}")

o_jump_model = JumpModel(2, best_lambda, cont=False, mode_loss=False, random_state=42)
o_jump_model.fit(X_train_standard)
o_jump_labels_train = o_jump_model.predict(X_train_standard)
o_jump_labels_test = o_jump_model.predict(X_test_standard)

plot_model_results(o_jump_model, X_train_standard, ret_ser_train, "Optimized Jump Model Clustering Regime Labels", TRAIN_START, TRAIN_END)
```

```
Lambda:0 Best Sharpe ratio:-0.5331725149959207
Lambda:0.1 Best Sharpe ratio:-0.3818304253979579
Lambda:1 Best Sharpe ratio:0.07086604114063068
Lambda:10 Best Sharpe ratio:0.4146420994478566
Lambda:100 Best Sharpe ratio:0.35580412735334294
Best lambda:10 Best Sharpe ratio:0.4146420994478566
```

![output5](https://hackmd.io/_uploads/S1SWm-dTR.png)

```python=
def optimize_sparse_jump_params(X, lambdas, START, END):
    best_sharp = -np.inf
    best_lambda = None
    for lambda_ in lambdas:
        jump_model = SparseJumpModel(2, jump_penalty=lambda_, n_feats=6, cont=False, mode_loss=False, random_state=42)
        jump_model.fit(X)
        sharp = evaluate_model(jump_model, X, START, END)
        print(f"Lambda: {lambda_}, Sharpe ratio: {sharp}")
        if sharp > best_sharp:
            best_sharp, best_lambda = sharp, lambda_
    return best_lambda, best_sharp

best_lambda, best_sharp = optimize_sparse_jump_params(X_train_standard, lambdas, TRAIN_START, TRAIN_END)
print(f"Best lambda: {best_lambda}, Best Sharpe ratio: {best_sharp}")

o_s_jump_model = SparseJumpModel(2, jump_penalty=best_lambda, n_feats=6, cont=False, mode_loss=False, random_state=42)
o_s_jump_model.fit(X_train_standard)
o_sparse_jump_labels_train = o_s_jump_model.predict(X_train_standard)
o_sparse_jump_labels_test = o_s_jump_model.predict(X_test_standard)

plot_model_results(o_s_jump_model, X_train_standard, ret_ser_train, "Optimized Sparse Jump Model Clustering Regime Labels", TRAIN_START, TRAIN_END)
```

```
Lambda:0 Best Sharpe ratio:-0.17992049454505907
Lambda:0.1 Best Sharpe ratio:0.14405501384844405
Lambda:1 Best Sharpe ratio:0.2810170936921112
Lambda:10 Best Sharpe ratio:0.32361325886607356
Lambda:100 Best Sharpe ratio:0.32607070819098544
Best lambda:100 Best Sharpe ratio:0.32607070819098544
```

![output7](https://hackmd.io/_uploads/S12yuZuTA.png)

#### Optimizing the Continuous Jump Model

* In addition to `lambda`, the `threshold` hyperparameter is also tuned for the Continuous Jump Model. This threshold is used to convert the continuous probability vectors into discrete regime labels (0 or 1).
* A grid search is performed over a range of `lambda` and `threshold` values.
* The combination that results in the highest Sharpe ratio is chosen as the optimal configuration.
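The grid search described in the list above can be sketched independently of any model. In this illustration `grid_search` and `toy_score` are hypothetical names: `toy_score` stands in for the real "fit, threshold, backtest, return the Sharpe ratio" step, and its shape is invented so that the best cell is known in advance.

```python
from itertools import product

def grid_search(score_fn, lambdas, thresholds):
    """Return (best_lambda, best_threshold, best_score) over the full grid."""
    best = (None, None, float("-inf"))
    for lam, thr in product(lambdas, thresholds):
        score = score_fn(lam, thr)
        # Strict comparison keeps the first cell seen when scores tie
        if score > best[2]:
            best = (lam, thr, score)
    return best

# Hypothetical stand-in for "fit the model, threshold, backtest, return Sharpe"
def toy_score(lam, thr):
    return 1.0 - abs(lam - 10) / 100 - abs(thr - 0.5)

best_lam, best_thr, best_score = grid_search(
    toy_score, [0, 0.1, 1, 10, 100], [0.1, 0.3, 0.5, 0.7, 0.9]
)
print(best_lam, best_thr, best_score)
```

In the real setting the model fit depends only on `lambda`, so fitting once per `lambda` and reusing the probabilities for every threshold avoids redundant work.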
```python=
from types import SimpleNamespace

def optimize_continuous_jump_model_params_with_threshold(X, lambdas, thresholds, START, END):
    best_sharp, best_lambda, best_threshold = -np.inf, None, None
    for lambda_ in lambdas:
        # The fit does not depend on the threshold, so fit once per lambda
        model = JumpModel(2, jump_penalty=lambda_, cont=True, mode_loss=False, random_state=42)
        model.fit(X)
        proba = model.proba_
        for threshold in thresholds:
            labels = (proba.iloc[:, 1] > threshold).astype(int)
            d = transfer_data(labels, X.index)
            policy = cvx.FixedWeights(d)
            sharp = market_sim.run_backtest(policy, start_time=START, end_time=END).sharpe_ratio
            if sharp > best_sharp:
                best_sharp, best_lambda, best_threshold = sharp, lambda_, threshold
    return best_lambda, best_threshold, best_sharp

lambdas = [0, 0.1, 1, 10, 100]
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
best_lambda, best_threshold, best_sharp = optimize_continuous_jump_model_params_with_threshold(X_train_standard, lambdas, thresholds, TRAIN_START, TRAIN_END)
print(f"Best lambda: {best_lambda}, Best threshold: {best_threshold}, Best Sharpe ratio: {best_sharp}")

o_c_jump_model = JumpModel(2, best_lambda, cont=True, mode_loss=False, random_state=42)
o_c_jump_model.fit(X_train_standard)

def threshold_labels(proba, threshold):
    """Turn regime probabilities into 0/1 labels using the probability in column 1."""
    return (np.asarray(proba)[:, 1] > threshold).astype(int)

o_cont_jump_labels_train = threshold_labels(o_c_jump_model.proba_, best_threshold)
# predict() returns labels only, so the test probabilities come from predict_proba()
o_cont_jump_labels_test = threshold_labels(o_c_jump_model.predict_proba(X_test_standard), best_threshold)

# plot_model_results calls model.predict, so wrap the thresholded prediction in a small object
thresholded_model = SimpleNamespace(
    predict=lambda x: threshold_labels(o_c_jump_model.predict_proba(x), best_threshold)
)
plot_model_results(thresholded_model, X_train_standard, ret_ser_train, "Optimized Continuous Jump Model Clustering Regime Labels", TRAIN_START, TRAIN_END)
```

```
Best lambda: 100, Best threshold: 0.5, Best Sharpe ratio: 0.5796858615234578
```

![output6](https://hackmd.io/_uploads/r1HTv-O6C.png)

The optimized models are then used for further analysis and backtesting. Hyperparameter tuning is performed on the training data only, so the testing data plays no part in model selection and remains a genuine out-of-sample check.
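The "long" regime rule from the strategy description (pick the regime with the higher mean return) is not spelled out in the code above, where `transfer_data` treats label 0 as the bull regime for every model. A minimal sketch of the explicit rule follows; `find_long_label` and `to_weights` are hypothetical helpers and the data is a toy example. In practice the long label would be determined on the training period and then reused unchanged on the testing period.

```python
import numpy as np
import pandas as pd

def find_long_label(labels, returns):
    """Return the regime label whose days have the higher mean return."""
    labels = pd.Series(np.asarray(labels), index=returns.index)
    return returns.groupby(labels).mean().idxmax()

def to_weights(labels, long_label, index):
    """Map regime labels to 0/1 weights: fully in the asset in the long regime, cash otherwise."""
    long_signal = (np.asarray(labels) == long_label).astype(int)
    return pd.DataFrame({"DJI": long_signal, "USDOLLAR": 1 - long_signal}, index=index)

# Toy example: label 1 marks the days with the higher average return
idx = pd.date_range("2020-01-01", periods=6, freq="D")
toy_returns = pd.Series([0.01, 0.02, -0.03, -0.01, 0.015, -0.02], index=idx)
toy_labels = np.array([1, 1, 0, 0, 1, 0])

long_label = find_long_label(toy_labels, toy_returns)
weights = to_weights(toy_labels, long_label, idx)
print(long_label)
print(weights)
```

A mapping of this kind avoids relying on each model's arbitrary label numbering, which matters for models such as the Moving Average signal, where 1 already means long.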
### Backtesting

Backtesting is a crucial step in evaluating the performance of a trading strategy using historical data. It simulates the strategy's trades and calculates various performance metrics to assess its effectiveness. In this section, both in-sample and out-of-sample backtesting are conducted for the different regime detection models combined with the 0/1 trading strategy. The buy-and-hold strategy serves as a benchmark for comparison.

```python=
def backtest(X, labels, start_time, end_time):
    d = transfer_data(labels, X.index)
    policy = cvx.FixedWeights(d)
    return market_sim.run_backtest(policy, start_time=start_time, end_time=end_time)
```

#### Performance Metrics

* Annualized Return: The average yearly return of the strategy.
* Volatility: The standard deviation of the strategy's returns, indicating its risk.
* Sharpe Ratio: A measure of risk-adjusted return, calculated as the excess return (over the risk-free rate) divided by volatility.
* Maximum Drawdown: The largest peak-to-trough decline in portfolio value, representing the worst-case loss.
* Turnover: The average percentage of the portfolio that is traded each period, reflecting transaction costs.
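The metrics listed above can be illustrated on a toy portfolio value series. This is a sketch under assumptions: `max_drawdown` and `annualized_stats` are hypothetical helpers, annualization uses 252 periods, and the backtester's own figures may be computed differently. Drawdown is returned as a negative fraction, matching the sign used in the result tables.

```python
import numpy as np
import pandas as pd

def max_drawdown(values):
    """Largest peak-to-trough decline of a value series, as a negative fraction."""
    values = pd.Series(values, dtype=float)
    running_peak = values.cummax()
    return (values / running_peak - 1.0).min()

def annualized_stats(values, periods_per_year=252):
    """Annualized mean return and volatility from a portfolio value series."""
    rets = pd.Series(values, dtype=float).pct_change().dropna()
    return rets.mean() * periods_per_year, rets.std() * np.sqrt(periods_per_year)

# Toy value path: rises to 120, falls to 90, recovers to 110
toy_values = [100, 120, 90, 110]
print(max_drawdown(toy_values))
print(annualized_stats(toy_values))
```

Here the worst decline is from the peak of 120 to the trough of 90, a drawdown of 25%.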
```python=
def print_metrics(results, title, bench):
    print(f"\n{title}")
    print("-" * 100)
    print(f"{'Strategy':<35} {'Return':<10} {'Volatility':<15} {'Sharpe Ratio':<15} {'Max Drawdown':<15} {'Turnover':<10}")
    print("-" * 100)
    for name, result in results.items():
        print(f"{name:<30} {result.annualized_average_return * 100:10.2f}% {result.annualized_volatility * 100:10.2f}% {result.sharpe_ratio:15.2f} {result.drawdown.min() * 100:15.2f}% {result.turnover.mean() * 100:12.3f}%")
    print(f"{'Buy and Hold':<30} {bench.annualized_average_return * 100:10.2f}% {bench.annualized_volatility * 100:10.2f}% {bench.sharpe_ratio:15.2f} {bench.drawdown.min() * 100:15.2f}% {bench.turnover.mean() * 100:12.3f}%")
    print("-" * 100)

def plot_performance(results, benchmark, title):
    plt.figure(figsize=(12, 8))
    for name, result in results.items():
        result.v.plot(label=name)
    benchmark.v.plot(label='Buy and Hold')
    plt.xlabel('Date')
    plt.ylabel('Portfolio Value')
    plt.title(title)
    plt.legend()
    plt.show()
```

#### Define Models and Benchmark

```python=
models = {
    'Moving Average': (MA_train['signal'], MA_test['signal']),
    'K-Means': (kmeans_labels_train, kmeans_labels_test),
    'Sparse K-Means': (sparse_kmeans_labels_train, sparse_kmeans_labels_test),
    'HMM': (hmm_states_train, hmm_states_test),
    'Jump Model': (jump_labels_train, jump_labels_test),
    'Sparse Jump Model': (sparse_jump_labels_train, sparse_jump_labels_test),
    'Continuous Jump Model': (cont_jump_labels_train, cont_jump_labels_test),
    'Optimized Jump Model': (o_jump_labels_train, o_jump_labels_test),
    'Optimized Sparse Jump Model': (o_sparse_jump_labels_train, o_sparse_jump_labels_test),
    'Optimized Continuous Jump Model': (o_cont_jump_labels_train, o_cont_jump_labels_test)
}

# Benchmark: Buy and Hold
d_target_weights = pd.Series({'DJI': 1.0, 'USDOLLAR': 0.0})
bench = market_sim.run_backtest(
    cvx.FixedWeights(d_target_weights),
    start_time=TRAIN_START,
    end_time=TRAIN_END
)
bench_out = market_sim.run_backtest(
    cvx.FixedWeights(d_target_weights),
    start_time=TEST_START,
    end_time=TEST_END
)
```

#### In-sample Backtesting

* In-sample backtesting evaluates the performance of the models on the same data used for training.
* Portfolio values are plotted for each model and the benchmark.
* Performance metrics, including annualized return, volatility, Sharpe ratio, maximum drawdown, and turnover, are calculated and compared.

```python=
in_sample_results = {name: backtest(X_train_standard, labels[0], TRAIN_START, TRAIN_END) for name, labels in models.items()}
plot_performance(in_sample_results, bench, 'In-Sample Performance Comparison')
print_metrics(in_sample_results, "In-Sample Results", bench)
```

![output27](https://hackmd.io/_uploads/SJFRBXpTA.png)

```
In-Sample Results
----------------------------------------------------------------------------------------------------
Strategy                            Return     Volatility      Sharpe Ratio    Max Drawdown    Turnover
----------------------------------------------------------------------------------------------------
Moving Average                      -1.39%     11.25%          -0.27           -58.68%         1.229%
K-Means                             -4.96%     12.28%          -0.54           -80.24%         1.748%
Sparse K-Means                      -0.73%     11.79%          -0.20           -60.92%         1.009%
HMM                                 4.02%      8.74%           0.27            -16.83%         0.450%
Jump Model                          2.43%      11.94%          0.07            -44.64%         0.809%
Sparse Jump Model                   4.79%      11.41%          0.28            -34.65%         0.330%
Continuous Jump Model               2.34%      11.97%          0.06            -46.15%         0.849%
Optimized Jump Model                6.56%      11.89%          0.41            -30.18%         0.310%
Optimized Sparse Jump Model         6.46%      14.82%          0.33            -40.64%         0.050%
Optimized Continuous Jump Model     8.28%      11.48%          0.58            -14.21%         0.250%
Buy and Hold                        6.27%      17.80%          0.26            -53.78%         0.010%
----------------------------------------------------------------------------------------------------
```

#### Out-of-Sample Backtesting

* Out-of-sample backtesting assesses the models' performance on unseen data (the testing set).
* This is a more realistic evaluation of how the models would perform in real-world trading.
* Portfolio values and performance metrics are again calculated and compared.
```python=
out_of_sample_results = {name: backtest(X_test_standard, labels[1], TEST_START, TEST_END) for name, labels in models.items()}
plot_performance(out_of_sample_results, bench_out, 'Out-of-Sample Performance Comparison')
print_metrics(out_of_sample_results, "Out-of-Sample Results", bench_out)
```

![output28](https://hackmd.io/_uploads/SJB1Im66R.png)

```
Out-of-Sample Results
----------------------------------------------------------------------------------------------------
Strategy                            Return     Volatility      Sharpe Ratio    Max Drawdown    Turnover
----------------------------------------------------------------------------------------------------
Moving Average                      -4.54%     12.72%          -0.51           -30.46%         1.546%
K-Means                             -6.23%     13.60%          -0.60           -37.43%         1.945%
Sparse K-Means                      -0.54%     11.79%          -0.18           -59.36%         0.989%
HMM                                 -1.75%     8.96%           -0.40           -22.80%         0.849%
Jump Model                          2.66%      12.98%          0.06            -19.95%         0.947%
Sparse Jump Model                   2.35%      12.38%          0.04            -17.45%         0.549%
Continuous Jump Model               1.87%      13.08%          -0.00           -19.95%         0.947%
Optimized Jump Model                10.13%     12.67%          0.50            -11.98%         0.350%
Optimized Sparse Jump Model         6.94%      14.94%          0.34            -21.94%         0.150%
Optimized Continuous Jump Model     10.14%     12.62%          0.50            -12.79%         0.350%
Buy and Hold                        9.01%      22.51%          0.32            -37.09%         0.050%
----------------------------------------------------------------------------------------------------
```

### Results and Discussion

The performance of the models, combined with the 0/1 trading strategy, was evaluated through in-sample and out-of-sample backtesting. The results show how well each model identifies market regimes and turns them into profitable trades.

#### In-Sample Backtesting

* **Jump Models' Advantage:** The optimized Jump Models achieved the highest in-sample Sharpe ratios (0.41, 0.33, and 0.58, against 0.26 for buy-and-hold), while the untuned Jump Model and Continuous Jump Model stayed close to zero (0.07 and 0.06). The penalty on frequent regime shifts resulted in lower turnover, which translates into reduced transaction costs.
* **K-Means and Moving Average:** The K-Means, Sparse K-Means, and Moving Average models underperformed, with negative returns and negative Sharpe ratios, so their returns did not compensate for the risk taken.
* **Volatility:** Strategy volatility ranged from 8.74% (HMM) to 14.82% (Optimized Sparse Jump Model), in every case below the 17.80% of buy-and-hold.

#### Out-of-Sample Backtesting

* **Optimized Models' Continued Dominance:** The Optimized Jump Model and Optimized Continuous Jump Model maintained their superior performance in the out-of-sample period, achieving the highest returns (10.13% and 10.14%) and Sharpe ratios (0.50 each). This suggests that they generalize reasonably well to unseen data.
* **K-Means and Moving Average:** The K-Means and Moving Average models deteriorated further out of sample, with Sharpe ratios of -0.60 and -0.51 compared with -0.54 and -0.27 in sample, and remained far behind the optimized models.
* **Effective Bear Market Prediction:** The models, particularly the optimized ones, showed an ability to identify bear markets and exit positions promptly, thus mitigating significant losses.
* **Risk Control:** The optimized models held maximum drawdown to between -11.98% and -21.94%, compared with -37.09% for the buy-and-hold strategy.

## Conclusion

This study explores the application of various regime detection models, including Moving Average, Hidden Markov Model, K-Means clustering, and Jump Models, in conjunction with a simple 0/1 trading strategy. The results demonstrate the potential of these models, particularly the Optimized Jump Model and Optimized Continuous Jump Model, to identify market regimes and generate profitable trades.

#### Key Takeaways

* **0/1 Trading Strategy's Effectiveness:** The simple 0/1 trading strategy can yield favorable results when combined with well-performing regime detection models.
* **Hyperparameter Tuning's Importance:** Hyperparameter tuning is critical for extracting the best performance from the models.
* **Out-of-Sample Validation:** Out-of-sample backtesting is essential for assessing a model's real-world predictive power and avoiding overfitting to the training data.

While this study provides valuable insights, it also opens avenues for future research. More sophisticated trading strategies, incorporating additional features or alternative risk management techniques, could be explored to further enhance performance. Furthermore, the application of these models to other asset classes or markets could be investigated.

## References

* Statistical Jump Models (JMs): https://github.com/Yizhan-Oliver-Shu/jump-models
* Shu, Y., Yu, C., & Mulvey, J. M. (2024). Regime-aware asset allocation: A statistical jump model approach. arXiv preprint arXiv:2402.05272.
* Shu, Y., Yu, C., & Mulvey, J. M. (2024). Dynamic Asset Allocation with Asset-Specific Regime Forecasts. arXiv preprint arXiv:2406.09578.