# Fixed/Moving-band Statistical Arbitrage via Convex Optimization
## Introduction
Statistical arbitrage (stat-arb) is a quantitative trading strategy exploiting temporary price discrepancies between related assets. While early stat-arb strategies like pairs trading focused on two assets, modern approaches often use portfolios of multiple mean-reverting assets. The extensive literature on stat-arbs covers various aspects, including identifying potential stat-arbs, modeling their mean-reverting dynamics, and developing effective trading strategies.
### Related Work
Existing stat-arb research can be broadly classified into four categories:
1. **Finding Stat-arbs:** Techniques range from simple distance-based approaches (identifying pairs with historically close prices) to more sophisticated co-integration methods and machine learning algorithms.
2. **Modeling the Stat-arb Spread:** Common approaches include stochastic control theory (using the Ornstein-Uhlenbeck process) and time series analysis. Copulas have also been used to model the joint distribution of the spread.
3. **Trading Stat-arbs:** Strategies vary from basic threshold-based methods to stochastic control approaches, deriving optimal trading decisions from the modeled spread.
4. **Exiting a Stat-arb:** Exit strategies typically rely on thresholds, time periods, or a combination of factors and often involve a gradual position reduction.
### Price Bands and Bollinger Bands
Price bands, such as Bollinger Bands, are widely used in technical analysis. A Bollinger Band is a dynamic interval around a moving average, typically two standard deviations wide, adjusting to market volatility.
$$
I_t = [\mu_t - k\sigma_t,\ \mu_t + k\sigma_t]
$$
where:
* $M$ is the lookback period (commonly 21 days)
* $\mu_t$ is the moving average at time $t$
* $\sigma_t$ is the standard deviation
* $k$ is a multiplier (typically 2)
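As a quick illustration, the band can be computed with pandas rolling statistics (the price series and seed below are synthetic, purely for demonstration):

```python
import numpy as np
import pandas as pd

# Hypothetical price series; M and k follow the common defaults above.
prices = pd.Series(100 + np.cumsum(np.random.default_rng(0).normal(0, 1, 252)))
M, k = 21, 2

mu = prices.rolling(M).mean()    # moving average mu_t over the last M days
sigma = prices.rolling(M).std()  # rolling standard deviation sigma_t
lower, upper = mu - k * sigma, mu + k * sigma

# The band is 2*k*sigma_t wide and recenters as mu_t drifts.
bands = pd.DataFrame({"lower": lower, "mid": mu, "upper": upper})
```

The first $M-1$ rows are `NaN` because the rolling window is not yet full.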
### Goals
This study examines Boyd et al.'s novel approaches to stat-arb identification:
1. **Formulation as an optimization problem:** Stat-arb identification is framed as an optimization problem, directly optimizing for mean-reversion and high variance, enabling scalability to large asset universes.
2. **Introduction of moving-band stat-arbs:** The concept of fixed price bands is extended to moving-band stat-arbs, allowing the price band to adapt over time, potentially increasing profitability.
3. **Managing a portfolio of moving-band statistical arbitrages (MBSAs):** This study also explores a Markowitz-inspired framework for managing a dynamic basket of MBSAs, aiming to optimize portfolio performance.
## Identifying Fixed-Band Statistical Arbitrages
### Notation
* $P_t\in \mathbb{R}^n$: Vector time series of prices for $n$ assets, denominated in USD per share, over time periods $t=1,2,\dots,T$.
* $s\in \mathbb{R}^n$: Portfolio of these assets, denominated in shares.
* $p_t=s^T P_t$: Net value of the portfolio at time $t$.
* $\bar{P}=\frac{1}{T}\sum^T_{t=1}P_t$: Average price vector of the assets.
The objective is to construct a portfolio where the value $p_t$ remains within a predefined price band while also exhibiting consistent variation within that band.
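A toy example with hypothetical numbers shows the notation in action:

```python
import numpy as np

# Hypothetical data: T=4 periods, n=2 assets.
P = np.array([[10.0, 20.0],
              [11.0, 19.0],
              [10.5, 20.5],
              [ 9.5, 21.0]])   # row t holds P_t
s = np.array([2.0, -1.0])      # long 2 shares of asset 1, short 1 share of asset 2

p = P @ s                      # p_t = s^T P_t for each t
P_bar = P.mean(axis=0)         # average price vector
notional = np.abs(s) @ P_bar   # |s|^T P_bar, used in the leverage constraint
```

Here `p` is `[0.0, 3.0, 0.5, -2.0]`: the portfolio value oscillates around zero even though both assets have large prices.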
### Formulation as a Convex-Concave Optimization Problem
Given that the portfolio value $p_t$ fluctuates around a midpoint $\mu$, which can be the mean of $p_t$, the problem can be formulated as:
$$
\begin{split}
\text{maximize}\quad& \sum^T_{t=2}(p_t-p_{t-1})^2\\
\text{subject to}\quad& -1 \leq p_t-\mu \leq 1, \quad p_t = s^T P_t, \quad t=1,2,\dots,T\\
&|s|^T \bar{P}\leq L, \quad \mu \geq 0
\end{split}
$$
where $L$ is the leverage limit. This formulation seeks to maximize the portfolio value's variation over time, constrained to remain within a band around the midpoint and adhere to a leverage constraint.
### Interpretation through a Simple Trading Policy
Consider a scenario where we hold a quantity $q_t$ of the portfolio, with $q_0 = q_T = 0$, meaning no holdings at the beginning or end. The total profit can be expressed as:
$$
\sum^{T-1}_{t=1}q_t(p_{t+1}-p_t)
$$
A simple trading policy is to set $q_t = \mu - p_t$. Under this policy, the total profit is exactly
$$
\frac{1}{2}\sum^T_{t=2}(p_t-p_{t-1})^2+\frac{(p_1- \mu)^2-(p_T-\mu)^2}{2}
\geq \frac{1}{2}\left(\sum^T_{t=2}(p_t-p_{t-1})^2-1\right),
$$
where the inequality uses $(p_1-\mu)^2 \geq 0$ and the band constraint $|p_T-\mu| \leq 1$. Thus, whenever the objective value exceeds one, this simple linear policy is guaranteed to generate a profit.
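The profit identity and its lower bound can be checked numerically on a synthetic in-band series (the data below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 5.0
p = mu + rng.uniform(-1, 1, size=50)   # p_t kept inside the band |p_t - mu| <= 1

# Profit of the linear policy q_t = mu - p_t
q = mu - p[:-1]
profit = np.sum(q * (p[1:] - p[:-1]))

# Closed-form expression for the same profit, and the lower bound from the text
squares = np.sum(np.diff(p) ** 2)
closed_form = 0.5 * squares + 0.5 * ((p[0] - mu) ** 2 - (p[-1] - mu) ** 2)
lower_bound = 0.5 * (squares - 1)
```

The identity holds exactly (up to floating-point error), and the realized profit never falls below the bound as long as the series stays inside the band.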
### Solution Method: Convex-Concave Procedure
To solve the optimization problem, a convex-concave procedure is employed. In each iteration $k$, the objective function is linearized around the current portfolio value $p^k_t$ using the affine approximation:
$$
\hat{f}(p;p^k) = f(p^k) + \nabla f(p^k)^T(p - p^k) = \nabla f(p^k)^T p + c
$$
where:
* $f(p) = \sum^T_{t=2} (p_t - p_{t-1})^2$
* $\nabla f(p^k)$ is the gradient of $f$ evaluated at $p^k$: $$(\nabla f(p))_t= \begin{cases}2(p_1-p_2) &t=1 \\ 2(2p_t-p_{t-1}-p_{t+1}) &t=2, \dots, T-1 \\ 2(p_T-p_{T-1}) &t=T \end{cases}$$
This simplifies the original problem to a linearized version:
$$
\begin{split}
\text{maximize}\quad&\hat{f}(p;p^k)\\
\text{subject to}\quad& -1 \leq p_t-\mu \leq 1,\quad p_t = s^TP_t,\quad t=1,2,\dots,T\\
&|s|^T \bar{P}\leq L, \quad \mu \geq 0
\end{split}
$$
Solving this linearized problem yields the next iteration's portfolio $s^{k+1}$ and midpoint $\mu^{k+1}$. The algorithm iterates until convergence, usually within a few iterations.
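As a sanity check, the gradient formula can be verified against central finite differences on a random toy vector (this check is not part of the original code):

```python
import numpy as np

def f(p):
    # Objective: sum of squared one-step differences
    return np.sum(np.diff(p) ** 2)

def grad_f(p):
    # Analytic gradient, matching the three cases in the text
    g = np.empty_like(p)
    g[0] = 2 * (p[0] - p[1])
    g[1:-1] = 2 * (2 * p[1:-1] - p[:-2] - p[2:])
    g[-1] = 2 * (p[-1] - p[-2])
    return g

rng = np.random.default_rng(0)
p = rng.normal(size=10)
eps = 1e-6
numeric = np.array([(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                    for e in np.eye(len(p))])
```

Since $f$ is quadratic, central differences agree with the analytic gradient to floating-point precision.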
### Cleanup Phase and Implementation Details
After convergence, a cleanup phase is applied to enhance the solution's sparsity and numerical stability. This involves:
* **Removing negligible weights:** Assets whose notional falls below a fraction $\eta$ of the total portfolio value, i.e. $|s_i|\bar{P}_i \leq \eta\,|s|^T\bar{P}$ (e.g., $\eta = 5\%$), are eliminated.
* **Scaling prices:** Prices are scaled so that the average price vector is all ones, i.e. $\bar{P}= \mathbf{1}$, which simplifies the leverage constraint.
* **Scaling gradient/objective:** Scaling these to be on the order of one further improves numerical stability.
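The weight-thresholding step can be sketched as follows (a minimal sketch; the function name `cleanup` and the example numbers are hypothetical):

```python
import numpy as np

def cleanup(s, P_bar, eta=0.05):
    """Zero out assets whose notional |s_i| * P_bar_i is below a
    fraction eta of the portfolio's total notional |s|^T P_bar."""
    total = np.abs(s) @ P_bar
    keep = np.abs(s) * P_bar > eta * total
    return np.where(keep, s, 0.0)

s = np.array([3.0, -0.01, -1.5, 0.02])
P_bar = np.array([1.0, 1.0, 1.0, 1.0])  # after price scaling, P_bar = 1
s_clean = cleanup(s, P_bar)             # the two tiny positions are dropped
```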
### Initialization and Multiple Stat-arbs
The final portfolio depends on the initial portfolio $s_1$. By employing various random initializations, multiple stat-arbs can be discovered within the same asset universe, thus broadening the potential for profitable trading strategies.
## Finding Moving-Band Stat-Arbs
### Moving-Band Stat-Arbs
In practical scenarios, it's often more realistic to allow the midpoint of the price band to vary over time. To achieve this, we define the midpoint at time $t$ as the moving average of the portfolio values over the past $M$ periods:
$$
\mu_t = \frac{1}{M} \sum_{\tau=t-M+1}^t p_\tau
$$
This adjustment leads to a modified trading policy:
$$
q_t = \mu_t - p_t, \quad t=1,\dots,T-1
$$
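In pandas, the moving midpoint and the resulting trading quantity are one-liners (the portfolio-value series below is synthetic, for illustration only):

```python
import numpy as np
import pandas as pd

M = 21
rng = np.random.default_rng(2)
# Hypothetical mean-reverting portfolio value
p = pd.Series(np.sin(np.arange(252) / 5) + rng.normal(0, 0.1, 252))

mu = p.rolling(M).mean()  # mu_t: mean of the last M portfolio values
q = mu - p                # hold q_t units of the portfolio
```

The first $M-1$ entries of `q` are `NaN`, reflecting that the midpoint is undefined until $M$ observations are available.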
### Revised Optimization Formulation
Incorporating the moving midpoint, the optimization problem becomes:
$$
\begin{split}
\text{maximize}\quad&\sum^T_{t=2}(p_t-p_{t-1})^2\\
\text{subject to}\quad& -1 \leq p_t-\mu_t \leq 1, \quad t=M,M+1,\dots,T\\
& p_t = s^TP_t, \quad t=1,2,\dots,T\\
&|s|^T \bar{P}\leq L, \quad \mu_t=\frac{1}{M}\sum^t_{\tau=t-M+1}p_\tau
\end{split}
$$
Note that the band constraint only applies from $t=M$ onward, since $\mu_t$ requires $M$ past portfolio values.
### Solution Approach
The same convex-concave procedure used for fixed-band stat-arbs can be applied to solve this revised formulation. The key difference lies in the inclusion of time-varying midpoint constraints, which remain convex. This allows for efficient optimization and the identification of portfolios that exhibit mean-reversion around a dynamic midpoint, potentially leading to improved trading performance.
## Numerical Experiment
This section presents the numerical experiment in four parts:
* **Experiment Setup:** Import necessary packages and data.
* **Stat-arb Search:** Construct two classes `_State` and `StatArb` to search for stat-arbs.
* **Backtesting:** With the retrieved stat-arbs, backtest them through the entire date range to compare their performance.
* **Metrics:** Calculate relevant performance metrics.
### Experiment Setup
#### Importing Libraries
```python=
import seaborn as sns
import multiprocessing as mp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
import warnings
import cvxpy as cp
from dataclasses import dataclass, field
from collections import namedtuple
from tqdm import tqdm
from cvx.simulator.builder import Builder
from cvx.simulator.portfolio import Portfolio
from cvx.covariance.ewma import iterated_ewma, _ewma_cov
from cvx.covariance.combination import from_sigmas
from matplotlib.ticker import ScalarFormatter, MaxNLocator, FuncFormatter
```
#### Dataset Description
* **Data:** Adjusted closing prices of 100 large-cap US stocks
* **Time period:** July 31st, 2014, to July 31st, 2024
* **Stock list source:** Wikipedia
```python=
import yfinance as yf
stocks_list = ['AAPL', 'MSFT', 'NVDA', 'GOOGL', 'GOOG', 'AMZN', 'META', 'LLY', 'AVGO', 'TSLA', 'JPM', 'WMT', 'UNH', 'XOM', 'V', 'MA', 'PG', 'JNJ', 'COST', 'ORCL', 'HD', 'ABBV', 'BAC', 'KO', 'MRK', 'NFLX', 'CVX', 'PEP', 'CRM', 'ADBE', 'TMO', 'TMUS', 'AMD', 'LIN', 'ACN', 'MCD', 'ABT', 'DHR', 'CSCO', 'WFC', 'PM', 'QCOM', 'GE', 'INTU', 'IBM', 'TXN', 'VZ', 'PFE', 'AMGN', 'AXP', 'NOW', 'ISRG', 'BX', 'CAT', 'NEE', 'DIS', 'AMAT', 'GS', 'MS', 'RTX', 'SPGI', 'CMCSA', 'UNP', 'UBER', 'T', 'LOW', 'HON', 'LMT', 'PGR', 'BLK', 'COP', 'TJX', 'SYK', 'ELV', 'VRTX', 'REGN', 'BKNG', 'SCHW', 'PLD', 'ETN', 'NKE', 'C', 'BSX', 'CB', 'UPS', 'MMC', 'ADP', 'AMT', 'MDT', 'ADI', 'ANET', 'KKR', 'BA', 'MU', 'LRCX', 'PANW', 'KLAC', 'BMY', 'DE', 'SO']
prices_full = yf.download(stocks_list, start="2014-07-31", end="2024-07-31")['Adj Close']
spreads = prices_full.copy()
spreads.loc[:, :] = 1e-4  # 1 bps bid-ask spread
```
#### Parameters
- **Spread:** $1$ bps
- **Shorting costs:** $0.5\%$ per year
- **Leverage limit:** $L = 50$
### Stat-arb Search Algorithm
This section details the core components of the statistical arbitrage search algorithm. These components work together to identify potential stat-arbs using convex-concave optimization, represent them as `StatArb` objects, and validate their effectiveness.
#### Convex-Concave Optimization: The `_State` Class
The `_State` class serves as the engine for constructing stat-arb portfolios. It formulates and solves the convex-concave optimization problem, which is crucial for identifying portfolios exhibiting mean-reversion behavior.
* **Key Responsibilities:**
    * **Problem Setup:** Initializes the optimization problem based on input parameters like `prices`, `P_max` (leverage limit), `spread_max` (allowed deviation from the midpoint), `midpoint_memory`, and `moving_midpoint` (whether to use a dynamic or fixed midpoint). It sets up the decision variables (`s` for portfolio weights and `mu` for the midpoint), the objective function to maximize variation within the price band, and constraints for leverage and deviation limits.
    * **Iteration:** The `iterate()` method performs one step of the convex-concave procedure, updating the portfolio weights and midpoint based on the linearized objective function and constraints. It attempts to solve the optimization problem and resets to a random feasible point if the solver fails.
    * **Gradient Calculation:** The `_get_grad_g(pk)` method computes the gradient of the objective function at the current portfolio value `pk`, which is used to linearize the objective in each iteration.
    * **StatArb Construction:** The `build()` method constructs a `StatArb` object encapsulating the optimized portfolio weights, midpoint, and other relevant parameters after the optimization process converges.
```python=
class _State:
    def __init__(
        self,
        prices,
        P_max=None,
        spread_max=1,
        solver="CLARABEL",
        midpoint_memory=None,
        moving_midpoint=True,
    ):
        self.T, self.n = prices.shape
        self.assets = prices.columns
        self.s = cp.Variable((self.n, 1), name="s")
        self.midpoint_memory = midpoint_memory
        self.moving_midpoint = moving_midpoint
        self.solver = solver
        if moving_midpoint:
            self.mu = cp.Variable((self.T, 1))
        else:
            self.mu = cp.Variable(nonneg=True)
        self.p = cp.Variable((self.T, 1), name="p")
        self.P_max = P_max
        self.prices = prices
        self.P_bar = prices.mean(axis=0).values.reshape(-1, 1)
        self.spread_max = spread_max
        # Set up optimization problem
        self.grad_g = cp.Parameter((self.T, 1), name="grad_g")
        self.obj = cp.Maximize(self.grad_g[midpoint_memory:].T @ self.p[midpoint_memory:])
        # Define constraints
        self.cons = self._setup_constraints()
        self.prob = cp.Problem(self.obj, self.cons)

    def _setup_constraints(self):
        cons = [self.p == self.prices.values @ self.s]
        if self.moving_midpoint:
            M = self.midpoint_memory
            cons += [cp.abs(self.p[M:] - self.mu[M:]) <= self.spread_max]
            # Tie the midpoint to the trailing M-period average of p
            for t in range(M, self.T):
                cons += [self.mu[t] == cp.sum(self.p[t - M + 1 : t + 1]) / M]
        else:
            cons += [cp.abs(self.p - self.mu) <= self.spread_max]
        if self.P_max is not None:
            cons += [cp.abs(self.s).T @ self.P_bar <= self.P_max]
        return cons

    def reset(self):
        # Restart from a random portfolio (used at initialization and on solver failure)
        self.s.value = np.random.randn(self.n, 1) / self.n
        if self.moving_midpoint:
            self.mu.value = np.zeros((self.T, 1))
        else:
            self.mu.value = 0.0

    def iterate(self):
        p = self.prices.values @ self.s.value
        self.grad_g.value = self._get_grad_g(p)
        try:
            self.prob.solve(solver=self.solver, verbose=False)
        except cp.SolverError:
            print("Solver failed, resetting...")
            self.reset()
        return self

    def _get_grad_g(self, pk):
        # Gradient of the sum of squared differences, applied from `start` onward
        start = self.midpoint_memory if self.moving_midpoint else 0
        grad_g = np.zeros(pk.shape)
        grad_g[start + 0] = pk[start + 0] - pk[start + 1]
        grad_g[-1] = pk[-1] - pk[-2]
        grad_g[start + 1 : -1] = 2 * pk[start + 1 : -1] - pk[start:-2] - pk[start + 2 :]
        # Rescale so the gradient has average magnitude ~100, for numerical stability
        return 2 * grad_g / np.mean(np.abs(grad_g)) * 100

    def build(self):
        assets_dict = dict(zip(self.assets, self.s.value.flatten()))
        return StatArb(
            assets=assets_dict,
            mu=self.mu.value,
            midpoint_memory=self.midpoint_memory,
            moving_midpoint=self.moving_midpoint,
            P_max=self.P_max,
            spread_max=self.spread_max,
            entry_date=self.prices.index[-1],
        )
```
#### `StatArb` Class
The `StatArb` class represents a single statistical arbitrage strategy. It stores the asset weights, midpoint information, and parameters used to construct the strategy. It also provides methods for calculating trading quantities (`get_q`), asset positions (`get_positions`), and performance metrics (`metrics`).
```python=
@dataclass(frozen=True)
class StatArb:
    assets: dict
    mu: float
    midpoint_memory: int = None
    moving_midpoint: bool = True
    P_max: float = None
    spread_max: float = 1
    entry_date: pd.Timestamp = None

    @property
    def asset_names(self):
        return list(self.assets.keys())

    @property
    def stocks(self):
        """Share holdings as a Series indexed by asset name."""
        return pd.Series(self.assets)

    @property
    def n(self):
        return len(self.assets)

    def evaluate(self, prices: pd.DataFrame):
        """Portfolio value p_t = s^T P_t."""
        return prices[self.asset_names] @ self.stocks

    def leverage(self, prices: pd.DataFrame):
        """Absolute notional |s|^T P_t of one unit of the stat-arb."""
        return prices[self.asset_names] @ self.stocks.abs()

    def get_q(self, prices: pd.DataFrame, mu=None, T_max=500):
        if mu is None:
            assert not self.moving_midpoint, "mu must be specified for moving-midpoint stat-arb"
            mu = self.mu
        p = self.evaluate(prices)
        q = mu - p
        q.name = "q"
        exit_length = 21
        exit_index = min(T_max - 1, len(q) - 1)
        # Linear exit strategy: wind the position down to zero over exit_length days
        weights = np.array([i / exit_length for i in range(exit_length)][::-1])
        length = len(q.iloc[exit_index : exit_index + exit_length])
        q.iloc[exit_index : exit_index + exit_length] = (
            q.iloc[exit_index : exit_index + exit_length] * weights[:length]
        )
        q.iloc[exit_index + exit_length :] = 0
        q.iloc[-1] = 0
        exit_trigger = q.index[min(exit_index, len(q) - 1)]
        exit_date = q.index[min(exit_index + exit_length - 1, len(q) - 1)]
        return q, exit_trigger, exit_date

    def get_positions(self, prices: pd.DataFrame, mu=None, T_max=500):
        q, _, _ = self.get_q(prices, mu, T_max=T_max)
        q = pd.concat([q] * self.n, axis=1)
        stocks = self.stocks.values.reshape(-1, 1)
        positions = q * (stocks.T)
        positions.columns = self.asset_names
        return positions

    def metrics(self, prices: pd.DataFrame, mu: pd.Series = None, T_max=500):
        p = self.evaluate(prices)
        q, exit_trigger, exit_date = self.get_q(prices, mu, T_max=T_max)
        price_changes = p.ffill().diff()
        previous_position = q.shift(1)
        profits = previous_position * price_changes
        profits.iloc[0] = np.nan
        entry_date = q.index[0]
        return Metrics(
            daily_profit=profits.dropna(),
            entry_date=entry_date,
            exit_trigger=exit_trigger,
            exit_date=exit_date,
        )
```
#### Constructing Stat-arbs
The `construct_stat_arb` function utilizes the `_State` class to construct a single stat-arb portfolio. It handles data preprocessing, initializes the `_State` object, runs the optimization loop, and performs a second pass for refinement if needed.
The `construct_stat_arbs` function leverages multiprocessing to construct multiple stat-arbs in parallel, each with a different random seed to explore a diverse set of potential strategies.
```python=
def construct_stat_arb(
    prices,
    P_max=None,
    spread_max=1,
    moving_midpoint=True,
    midpoint_memory=21,
    s_init=None,
    mu_init=None,
    seed=None,
    solver="CLARABEL",
    second_pass=False,
):
    if seed is not None:
        np.random.seed(seed)
    prices = prices.dropna(axis=1)
    P_bar = prices.mean(axis=0).values.reshape(-1, 1)
    prices = prices / P_bar.T  # normalize so the average price vector is all ones
    state = _State(
        prices,
        P_max=P_max,
        spread_max=spread_max,
        midpoint_memory=midpoint_memory,
        moving_midpoint=moving_midpoint,
        solver=solver,
    )
    if s_init is None or mu_init is None:
        state.reset()
    else:
        state.s.value = s_init
        state.mu.value = mu_init

    # Optimization loop
    obj_old, obj_new = 1, 10
    for i in range(5):
        state.iterate()
        if state.prob.status in ["optimal", "optimal_inaccurate"]:
            obj_old, obj_new = obj_new, state.prob.value
            if abs(obj_new - obj_old) / abs(obj_old) <= 1e-3:
                break
        else:
            print("Solver failed... resetting")
            state.reset()
            obj_old, obj_new = 1, 10

    # Second pass for refinement
    if not second_pass:
        s_times_P_bar = np.abs(state.s.value) * state.P_bar
        s_at_P_bar = np.abs(state.s.value).T @ state.P_bar
        non_zero_inds = np.where(np.abs(s_times_P_bar) > 0.5e-1 * s_at_P_bar)[0]
        if len(non_zero_inds) == 0:
            return None
        prices = prices * P_bar.T  # undo the normalization
        prices_new = prices.iloc[:, non_zero_inds]
        s_init = state.s.value[non_zero_inds]
        mu_init = state.mu.value
        return construct_stat_arb(
            prices_new,
            P_max=P_max,
            spread_max=spread_max,
            moving_midpoint=moving_midpoint,
            midpoint_memory=midpoint_memory,
            s_init=s_init,
            mu_init=mu_init,
            seed=None,
            second_pass=True,
        )
    else:
        # Convert holdings back to share units before building the stat-arb
        state.s.value = state.s.value / P_bar
        return state.build()


def construct_stat_arbs(
    prices,
    K=1,
    P_max=None,
    spread_max=1,
    moving_midpoint=True,
    midpoint_memory=None,
    s_init=None,
    mu_init=None,
    seed=None,
    solver="CLARABEL",
    verbose=True,
):
    if seed is not None:
        np.random.seed(seed)
    all_seeds = list(np.random.choice(range(10 * K), K, replace=False))
    all_args = zip(
        [prices] * K,
        [P_max] * K,
        [spread_max] * K,
        [moving_midpoint] * K,
        [midpoint_memory] * K,
        [s_init] * K,
        [mu_init] * K,
        all_seeds,
        [solver] * K,
    )
    with mp.Pool() as pool:
        if verbose:
            iterator = tqdm(pool.imap_unordered(construct_stat_arb_helper, all_args), total=K)
        else:
            iterator = pool.imap_unordered(construct_stat_arb_helper, all_args)
        all_stat_arbs = [x for x in iterator if x is not None]
    return all_stat_arbs


def construct_stat_arb_helper(args):
    """
    Helper for use with multiprocessing's imap_unordered.
    :param args: tuple of arguments to pass to construct_stat_arb
    """
    return construct_stat_arb(*args)
```
#### Validation and Filtering
The `_find_and_filter_stat_arbs` function identifies and filters stat-arbs to ensure uniqueness and validity. It constructs multiple stat-arbs using the `construct_stat_arbs` function, filters out duplicates based on their asset composition, and evaluates their performance metrics on the test data. It can optionally construct portfolios for the valid stat-arbs.
```python=
# Container pairing a stat-arb with its evaluation data
StatArbResult = namedtuple("StatArbResult", ["stat_arb", "metrics", "prices_train", "prices_test"])


def _find_and_filter_stat_arbs(
    prices_train,
    prices_test,
    K,
    P_max,
    T_max,
    spread_max,
    moving_midpoint,
    midpoint_memory,
    seed,
    solver,
    verbose,
    construct_portfolios=False,
):
    stat_arbs = construct_stat_arbs(
        prices_train,
        K=K,
        P_max=P_max,
        spread_max=spread_max,
        moving_midpoint=moving_midpoint,
        midpoint_memory=midpoint_memory,
        seed=seed,
        solver=solver,
        verbose=False,
    )
    stat_arb_results = []
    asset_names_found = []
    portfolios = []
    for stat_arb in stat_arbs:
        # Skip duplicates with the same asset composition
        if set(stat_arb.asset_names) in asset_names_found:
            continue
        asset_names_found.append(set(stat_arb.asset_names))
        prices_train_test = pd.concat([prices_train, prices_test], axis=0)
        p = prices_train_test[stat_arb.asset_names] @ stat_arb.stocks
        if stat_arb.moving_midpoint:
            mu = p.rolling(stat_arb.midpoint_memory).mean()
            mu = mu.iloc[-len(prices_test):]
        else:
            mu = stat_arb.mu
        m = stat_arb.metrics(prices_test, mu, T_max=T_max)
        if m is not None:
            stat_arb_results.append(StatArbResult(stat_arb, m, prices_train, prices_test))
            if construct_portfolios:
                positions = stat_arb.get_positions(prices_test, mu, T_max=T_max)
                portfolios.append(Portfolio(prices_test, units=positions, aum=1e6))
    return (stat_arb_results, portfolios) if construct_portfolios else (stat_arb_results, None)
```
These core components provide the foundation for systematically searching, constructing, and validating statistical arbitrage strategies. They enable the exploration of a wide range of potential stat-arbs and the selection of the most promising ones for further analysis and backtesting.
### Backtesting and Simulation
This section outlines the framework used to evaluate statistical arbitrage strategies in a simulated trading environment. The framework consists of two key functions: `run_finding_backtest` and `simulate`.
#### Backtesting with `run_finding_backtest`
The `run_finding_backtest` function conducts a systematic backtest of the stat-arb identification and filtering process. It utilizes a rolling window approach, iteratively constructing and evaluating potential stat-arbs over historical data. The key steps within each iteration include:
1. **Data Splitting:** The data is divided into training and testing sets based on the current rolling window position.
2. **Stat-arb Construction and Filtering:** The `_find_and_filter_stat_arbs` function is used to identify and filter potential stat-arbs using the training data.
3. **Performance Evaluation:** The filtered stat-arbs are evaluated on the testing data to assess their performance.
4. **Time Progression:** The rolling window advances by a specified `update_freq`, repeating the process until the entire historical dataset is covered.
The function returns two lists: `results`, which stores the performance metrics and details of each evaluated stat-arb, and `portfolios`, which contains the constructed portfolios for the valid stat-arbs.
```python=
def run_finding_backtest(
    prices_full,
    stock_lists,
    P_max,
    moving_midpoint,
    midpoint_memory,
    T_max,
):
    np.random.seed(0)  # for reproducibility
    remaining_to_stop = 125
    K = 10  # number of stat-arbs to construct in each iteration
    spread_max = 1
    train_len = 500 + 21
    update_freq = 21
    solver = "CLARABEL"
    t_start = 0
    t_end = len(prices_full)
    portfolios = []
    results = []
    n_iters = int((t_end - (train_len + 1) - t_start - remaining_to_stop) / update_freq)
    seeds = [np.random.randint(9999) for _ in range(2 * n_iters)]
    for i in range(1, n_iters + 1):
        if i % 10 == 0:
            print(f"{i/n_iters:.0%}", end=" ")
        t_split = t_start + train_len  # end of the training window
        prices = prices_full[stock_lists]
        # Split data into training and testing sets
        prices_train = prices.iloc[t_start : t_start + train_len]
        prices_test = prices.iloc[t_split:]
        # Find stat-arbs
        seed = seeds[i]
        new_stat_arb_results, new_portfolios = _find_and_filter_stat_arbs(
            prices_train=prices_train,
            prices_test=prices_test,
            K=K,
            P_max=P_max,
            T_max=T_max,
            spread_max=spread_max,
            moving_midpoint=moving_midpoint,
            midpoint_memory=midpoint_memory,
            seed=seed,
            solver=solver,
            verbose=False,
            construct_portfolios=True,
        )
        results += new_stat_arb_results
        portfolios += new_portfolios
        # Move forward in time
        t_start += update_freq
    print(f"\nFinished after looking for stat-arbs {i} times")
    return results, portfolios
```
#### Simulating Real-World Trading with `simulate`
The `simulate` function provides a realistic simulation of a stat-arb strategy's trading performance. It considers practical constraints such as trading costs and risk management rules. Key aspects of the simulation include:
1. **Portfolio Construction:** Builds a portfolio based on the stat-arb's asset weights and an initial cash allocation.
2. **Cost Calculation:** Computes shorting costs and trading costs based on the portfolio's positions and the provided spreads.
3. **NAV Tracking:** Calculates and monitors the Net Asset Value (NAV) of the portfolio, incorporating costs and adjusting for early exits if triggered.
4. **Risk Management:** Checks for conditions that might necessitate an early exit, such as a significant drop in NAV or a violation of short position limits.
5. **Performance Evaluation:** Computes and returns a range of performance metrics to comprehensively assess the strategy's effectiveness and risk profile.
```python=
def simulate(res, portfolio, spreads, lev_fraction):
    stat_arb = res.stat_arb
    exit_date = res.metrics.exit_date
    assets = portfolio.units.columns
    times = portfolio.units.loc[:exit_date].index
    lev0 = stat_arb.leverage(portfolio.prices).iloc[0]

    # Construct prices and stocks (units), prepending the last training day
    prices_train = res.prices_train
    prices_0 = prices_train.iloc[-1]
    prices_temp = pd.concat([pd.DataFrame(prices_0).T, portfolio.prices])
    stocks_0 = portfolio.units.iloc[0] * 0
    stocks_0.name = prices_0.name
    stocks_temp = pd.concat([pd.DataFrame(stocks_0).T, portfolio.units])

    # Construct stat-arb portfolio
    initial_cash = lev_fraction * lev0
    portfolio_new = Portfolio(
        prices_temp,
        units=stocks_temp,
        aum=initial_cash,
    )

    # Calculate shorting costs
    short_rate = 0.5 / 100 / 252  # half a percent per year
    short_costs = (portfolio.units[portfolio.units < 0].abs() * portfolio.prices).sum(axis=1) * short_rate

    # Calculate trading costs
    trading_costs = compute_trading_costs(
        portfolio.trades_currency[assets].loc[times], spreads[assets].loc[times]
    )

    # Calculate NAVs net of costs
    navs = (portfolio_new.nav - short_costs.cumsum() - trading_costs.cumsum()).loc[:exit_date]
    positions = portfolio_new.units
    absolute_notionals = positions.abs() * portfolio_new.prices
    long_positions = absolute_notionals[positions > 0].sum(axis=1)
    short_positions = absolute_notionals[positions < 0].sum(axis=1)
    cash = portfolio_new.nav - portfolio_new.equity.sum(axis=1)

    # Check for early exit conditions
    cash0 = cash.iloc[0]
    bust_times1 = navs[navs < 0.25 * cash0]
    bust_times2 = navs[long_positions + cash < short_positions]
    bust_time, bust_sort = determine_bust_time(bust_times1, bust_times2)
    if bust_time is not None:
        report_early_exit(bust_time, bust_sort)

    # Compute profits, freezing the portfolio after a bust
    profits = (portfolio.profit - short_costs - trading_costs).loc[:exit_date]
    if bust_time is not None:
        navs.loc[bust_time:] = navs.loc[bust_time]
        profits.loc[bust_time:] = 0
    returns = navs.pct_change().loc[:exit_date]

    # Calculate performance metrics
    mean = returns.mean() * 250
    stdev = returns.std() * np.sqrt(250)
    sharpe = returns.mean() / returns.std() * np.sqrt(250)
    min_nav = navs.min()
    mean_profit = profits.sum()
    min_cum_prof = profits.cumsum().min()
    drawdown = compute_drawdowns(navs).max()
    went_bust = bust_time is not None
    return mean, stdev, sharpe, mean_profit, min_nav, min_cum_prof, drawdown, went_bust


def determine_bust_time(bust_times1, bust_times2):
    bust_time1 = bust_times1.index[0] if len(bust_times1) > 0 else None
    bust_time2 = bust_times2.index[0] if len(bust_times2) > 0 else None
    if bust_time1 and bust_time2:
        bust_time = min(bust_time1, bust_time2)
        bust_sort = 1 if bust_time1 < bust_time2 else 2
    elif bust_time1:
        bust_time, bust_sort = bust_time1, 1
    elif bust_time2:
        bust_time, bust_sort = bust_time2, 2
    else:
        bust_time, bust_sort = None, None
    return bust_time, bust_sort


def report_early_exit(bust_time, bust_sort):
    # NAV and profits are frozen at bust_time by the caller; this helper
    # only reports why the early exit occurred.
    print(f"\nPortfolio exited early at {bust_time}")
    if bust_sort == 1:
        print("NAV fell below 0.25 * initial cash")
    elif bust_sort == 2:
        print("Long positions + cash fell below short positions")
```
### Performance Metrics
This section details the various performance and risk metrics employed to assess the effectiveness of the statistical arbitrage strategies.
#### Profit and Return Calculations
##### Trading Costs Computation
Trading costs, an inherent part of real-world trading, are factored into the performance evaluation. The `compute_trading_costs` function calculates these costs based on the volume of trades and the bid-ask spreads:
```python=
def compute_trading_costs(trades, spreads):
    volume = trades.abs()
    if isinstance(volume, pd.Series):
        assert isinstance(spreads, pd.Series), "spreads and trades must be of the same type"
        assert (trades.index == spreads.index).all(), "Index of trades and spreads must be the same"
        return 0.5 * (spreads * volume).sum()
    elif isinstance(volume, pd.DataFrame):
        assert isinstance(spreads, pd.DataFrame), "spreads and trades must be of the same type"
        assert (trades.index == spreads.index).all(), "Index of trades and spreads must be the same"
        assert (trades.columns == spreads.columns).all(), "Columns of trades and spreads must be the same"
        return 0.5 * (spreads * volume).sum(axis=1)
    else:
        raise ValueError("trades and spreads must be a pd.Series or pd.DataFrame")
```
* **Functionality:** Computes trading costs based on trade volume and spreads.
* **Flexibility:** Handles both individual asset trades (Series) and portfolio-level trades (DataFrame).
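As a quick sanity check of the cost formula (half the spread times traded volume, summed across assets), restated inline here so the snippet runs standalone on hypothetical trades:

```python
import numpy as np
import pandas as pd

# Hypothetical trades (in currency) and a flat 1 bps spread
trades = pd.DataFrame({"A": [100.0, -50.0], "B": [0.0, 200.0]})
spreads = pd.DataFrame({"A": [1e-4, 1e-4], "B": [1e-4, 1e-4]})

# Same formula as the DataFrame branch above: 0.5 * spread * |trade|
costs = 0.5 * (spreads * trades.abs()).sum(axis=1)
```

On day one only 100 of asset A trades, costing `0.5 * 1e-4 * 100 = 0.005`; on day two the 250 of total volume costs `0.0125`.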
##### `Metrics` Class
The `Metrics` class provides a structured way to calculate and store various profit-related metrics:
```python=
@dataclass
class Metrics:
    daily_profit: pd.Series
    entry_date: pd.Timestamp
    exit_trigger: pd.Timestamp
    exit_date: pd.Timestamp

    @property
    def mean_profit(self):
        """Mean daily profit"""
        return self.daily_profit.mean()

    @property
    def std_profit(self):
        """Standard deviation of daily profit"""
        return self.daily_profit.std()

    @property
    def total_profit(self):
        """Total cumulative profit"""
        return self.daily_profit.sum()

    @property
    def sr_profit(self):
        """Sharpe ratio of daily profit"""
        return self.mean_profit / self.std_profit

    @property
    def annualized_return(self):
        """Annualized return, assuming 250 trading days per year"""
        return self.mean_profit * 250

    @property
    def annualized_volatility(self):
        """Annualized volatility, assuming 250 trading days per year"""
        return self.std_profit * np.sqrt(250)

    @property
    def annualized_sharpe_ratio(self):
        """Annualized Sharpe ratio, assuming 250 trading days per year"""
        return self.sr_profit * np.sqrt(250)
```
* **Metrics Calculation:** Includes mean profit, standard deviation, total profit, Sharpe ratio, and their annualized counterparts.
* **Data Storage:** Stores daily profit, entry date, exit trigger date, and exit date for further analysis.
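A small sketch of the annualization arithmetic on a made-up alternating daily P&L:

```python
import numpy as np
import pandas as pd

# Hypothetical daily P&L alternating between 1 and 3 cents for 250 days
daily_profit = pd.Series([0.01, 0.03] * 125)

annualized_return = daily_profit.mean() * 250                              # mean * 250
annualized_volatility = daily_profit.std() * np.sqrt(250)                  # std * sqrt(250)
annualized_sharpe = daily_profit.mean() / daily_profit.std() * np.sqrt(250)
```

With a mean of 0.02 per day, the annualized return is exactly 5.0.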
#### Risk Measures
##### Drawdown Calculations
Drawdowns, representing the peak-to-trough decline in portfolio value, are a crucial risk metric. The provided functions calculate drawdowns from both NAVs and returns:
```python=
def compute_drawdowns(navs):
    """Computes drawdowns from a time series of NAVs"""
    max_nav = navs.cummax()
    drawdowns = -(navs - max_nav) / max_nav
    return drawdowns


def compute_drawdowns_from_returns(returns):
    """Computes drawdowns from a time series of returns"""
    navs = (1 + returns).cumprod()
    return compute_drawdowns(navs)


def maximum_drawdown(navs):
    """Computes the maximum drawdown from a time series of NAVs"""
    return compute_drawdowns(navs).max()
```
* `compute_drawdowns`: Calculates drawdowns from a time series of NAVs.
* `compute_drawdowns_from_returns`: Calculates drawdowns from a time series of returns.
* `maximum_drawdown`: Extracts the largest drawdown from a series of NAVs.
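A worked example on a hypothetical NAV path, with the drawdown logic restated inline so the snippet is self-contained:

```python
import numpy as np
import pandas as pd

navs = pd.Series([100.0, 120.0, 90.0, 130.0, 110.0])

# Running peak, then relative decline from it
max_nav = navs.cummax()                  # [100, 120, 120, 130, 130]
drawdowns = -(navs - max_nav) / max_nav
```

The worst point is the drop from 120 to 90, a drawdown of `(120 - 90) / 120 = 0.25`.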
##### Value at Risk (VaR) and Conditional Value at Risk (CVaR)
VaR and CVaR are additional risk measures that quantify potential losses:
```python=
def value_at_risk(returns, confidence_level=0.95):
    """Computes the Value at Risk (VaR) for a given confidence level"""
    return np.percentile(returns, 100 * (1 - confidence_level))


def conditional_value_at_risk(returns, confidence_level=0.95):
    """Computes the Conditional Value at Risk (CVaR) for a given confidence level"""
    var = value_at_risk(returns, confidence_level)
    return returns[returns <= var].mean()
```
* `value_at_risk`: Estimates the potential loss at a specified confidence level (default 95%).
* `conditional_value_at_risk`: Estimates the expected loss given that the loss exceeds the VaR.
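A numeric example on a hypothetical return sample, with the formulas restated inline:

```python
import numpy as np

# Ten hypothetical daily returns, sorted for readability
returns = np.array([-0.04, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06])

var_95 = np.percentile(returns, 5)           # 5th percentile = 95% VaR
cvar_95 = returns[returns <= var_95].mean()  # mean loss beyond the VaR
```

With linear interpolation, the 5th percentile lies between -0.04 and -0.02 at -0.031; the only return at or below it is -0.04, so the CVaR is -0.04.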
#### Performance Evaluation
The `compute_performance_metrics` function provides a comprehensive evaluation of a strategy's performance and risk:
```python=
def compute_performance_metrics(returns, navs):
    """Computes a comprehensive set of performance metrics"""
    metrics = {
        "Total Return": navs.iloc[-1] / navs.iloc[0] - 1,
        "Annualized Return": returns.mean() * 250,
        "Annualized Volatility": returns.std() * np.sqrt(250),
        "Sharpe Ratio": (returns.mean() / returns.std()) * np.sqrt(250),
        "Maximum Drawdown": maximum_drawdown(navs),
        "Value at Risk (95%)": value_at_risk(returns),
        "Conditional Value at Risk (95%)": conditional_value_at_risk(returns),
        "Positive Days Ratio": (returns > 0).mean(),
        "Negative Days Ratio": (returns < 0).mean(),
        "Best Day Return": returns.max(),
        "Worst Day Return": returns.min(),
    }
    return pd.Series(metrics)
```
* **Comprehensive Metrics:** Calculates a wide range of metrics, including total return, annualized return and volatility (using a 250-trading-day convention), the Sharpe ratio, maximum drawdown, VaR, CVaR, positive/negative day ratios, and best/worst day returns.
* **Holistic View:** Offers a complete picture of a strategy's profitability, risk profile, and distributional characteristics.
These performance and risk metrics, coupled with the backtesting and simulation framework, enable a thorough analysis and comparison of fixed-band and moving-band statistical arbitrage strategies, facilitating informed decision-making in strategy selection and implementation.
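The annualization conventions used in `compute_performance_metrics` can be illustrated on a small deterministic series (a self-contained sketch with toy numbers; here the NAV path starts from the first compounded return rather than from observed prices):

```python
import numpy as np
import pandas as pd

returns = pd.Series([0.01, -0.005, 0.02, 0.0])
navs = (1 + returns).cumprod()  # NAVs implied by compounding the returns

ann_return = returns.mean() * 250       # 250-trading-day convention
ann_vol = returns.std() * np.sqrt(250)
sharpe = ann_return / ann_vol           # same as (mean / std) * sqrt(250)
total_return = navs.iloc[-1] - 1        # compounded total return

print(ann_return, ann_vol, sharpe, total_return)
```

Note that the Sharpe ratio computed from the annualized quantities agrees with the `(mean / std) * sqrt(250)` form used in the function above.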
### Visualization Tools: `plot_stat_arb`
The `plot_stat_arb` function provides a comprehensive visualization of the performance and characteristics of a statistical arbitrage strategy. It generates three informative plots:
1. **Price Evolution:** This plot displays the price trajectory of the stat-arb strategy over time, both in-sample (training period) and out-of-sample (testing period). It also visualizes the midpoint (fixed or moving) and trading bounds, offering insights into the strategy's behavior and adherence to the price band constraints.
2. **Deviation from Midpoint:** This plot illustrates how the strategy's price deviates from its midpoint over time. It highlights the mean-reversion tendencies of the strategy and potential opportunities for profitable trades.
3. **Cumulative Profit:** This plot depicts the accumulated profit of the strategy over time, considering trading costs and short-selling costs. It provides a clear picture of the strategy's profitability and potential drawdowns.
```python=
def plot_stat_arb(stat_arb_tuple, insample_bound, outsample_bound, spreads, legend=True):
    stat_arb = stat_arb_tuple.stat_arb
    midpoint_memory = stat_arb.midpoint_memory
    asset_names = stat_arb.asset_names
    stocks = stat_arb.stocks

    prices_train = stat_arb_tuple.prices_train.iloc[:]
    prices_train = pd.concat([prices_train, stat_arb_tuple.prices_test.iloc[0:1]], axis=0)
    prices_test = stat_arb_tuple.prices_test
    prices_train_test = pd.concat([prices_train, prices_test], axis=0)

    p_train = prices_train[asset_names] @ stocks
    if stat_arb.moving_midpoint:
        mu_train = p_train.rolling(midpoint_memory).mean().dropna()
    else:
        mu_train = stat_arb.mu

    p_test = prices_test[asset_names] @ stocks
    if stat_arb.moving_midpoint:
        # Include the last midpoint_memory - 1 training days so the rolling
        # midpoint is defined from the first out-of-sample day onward
        prices_train_test = pd.concat([prices_train, prices_test], axis=0).iloc[-len(prices_test) - midpoint_memory + 1:]
        mu_test = (prices_train_test[asset_names] @ stocks).rolling(midpoint_memory).mean().dropna()
    else:
        mu_test = stat_arb.mu

    exit_trigger = stat_arb_tuple.metrics.exit_trigger
    exit_date = stat_arb_tuple.metrics.exit_date
    entry_date = stat_arb_tuple.metrics.entry_date
    elim_end = exit_date + pd.Timedelta(days=100)

    # Plot 1: Price evolution
    plt.figure(figsize=(12, 6))
    plt.plot(p_train, color="b", label="In-sample")
    if stat_arb.moving_midpoint:
        plt.plot(mu_train, color="g")
    plt.plot(p_test.loc[:elim_end], color="r", label="Out-of-sample")
    if stat_arb.moving_midpoint:
        plt.plot(mu_test.loc[:elim_end], color="g", label="Midpoint" + r" $(\mu_t)$")
    else:
        plt.plot(
            [prices_train.index[0], prices_test.index[-1]],
            [stat_arb.mu, stat_arb.mu],
            color="g",
            label="Midpoint" + r" $(\mu)$",
        )
    plt.axvline(prices_test.index[0], color="k", linewidth=2)

    stocks_str = "+".join([f"{np.round(s, 1)}×{a}" for s, a in zip(stocks, asset_names)])
    print("stat-arb: ", stocks_str)

    plt.gcf().autofmt_xdate()
    xlim_start = prices_train.index[21]
    if legend:
        plt.xlabel("Date")
        plt.ylabel("Stat-arb price")
    plt.xlim(xlim_start, elim_end)

    if exit_trigger is not None:
        plt.axvline(exit_trigger, linestyle="--", color="k", label="Exit period", linewidth=2)
        plt.axvline(exit_date, linestyle="--", color="k", linewidth=2)

    if stat_arb.moving_midpoint:
        band_label = r"$\mu_t\pm 1$"
        if outsample_bound != np.inf:  # use != for float comparison, not "is not"
            plt.plot(mu_test.loc[:elim_end] + outsample_bound, linestyle=":", color="k", linewidth=2, label=band_label)
            plt.plot(mu_test.loc[:elim_end] - outsample_bound, linestyle=":", color="k", linewidth=2)
        else:
            sigma = 1 / 4.2
            mu_train_test = pd.concat([mu_train, mu_test], axis=0)
            plt.plot(mu_train_test.loc[:elim_end] + 4.2 * sigma, linestyle=":", color="k", linewidth=2, label=band_label)
            plt.plot(mu_train_test.loc[:elim_end] - 4.2 * sigma, linestyle=":", color="k", linewidth=2)
    else:
        plt.plot([prices_train.index[0], prices_train.index[-1]], [stat_arb.mu + insample_bound, stat_arb.mu + insample_bound], linestyle=":", color="k", linewidth=2, label=r"$\mu\pm 1$")
        plt.plot([prices_train.index[0], prices_train.index[-1]], [stat_arb.mu - insample_bound, stat_arb.mu - insample_bound], linestyle=":", color="k", linewidth=2)
        plt.plot([prices_test.index[0], prices_test.index[-1]], [stat_arb.mu + insample_bound, stat_arb.mu + insample_bound], linestyle=":", color="k", linewidth=2)
        plt.plot([prices_test.index[0], prices_test.index[-1]], [stat_arb.mu - insample_bound, stat_arb.mu - insample_bound], linestyle=":", color="k", linewidth=2)

    if legend:
        plt.legend(bbox_to_anchor=(0.5, 1.225), loc="upper center", ncol=3, borderaxespad=0)
    plt.title("Stat-arb Price Evolution")
    plt.show()

    # Plot 2: Deviation from midpoint
    plt.figure(figsize=(12, 6))
    plt.plot(p_train - mu_train, color="b", label="In-sample")
    if stat_arb.moving_midpoint:
        plt.plot(p_test.loc[:elim_end] - mu_test.loc[:elim_end], color="r", label="Out-of-sample")
    else:
        plt.plot(p_test.loc[:elim_end] - mu_test, color="r", label="Out-of-sample")
    plt.axvline(prices_test.index[0], color="k", linewidth=2)
    plt.gcf().autofmt_xdate()
    plt.xlim(xlim_start, elim_end)
    plt.ylabel(r"$p_t-\mu_t$")
    plt.title("Deviation from Midpoint")
    plt.legend()
    plt.show()

    # Plot 3: Cumulative profit
    plt.figure(figsize=(12, 6))
    prices_train_test = pd.concat([prices_train, prices_test], axis=0)
    if stat_arb.moving_midpoint:
        mu = stat_arb.evaluate(prices_train_test).rolling(midpoint_memory, min_periods=1).mean()
    else:
        mu = stat_arb.mu
    positions = stat_arb.get_positions(prices_train_test, mu, T_max=1e6)
    initial_cash = 1e6
    aum = initial_cash + (prices_train_test.iloc[0] * positions).sum(axis=1)
    portfolio = Portfolio(prices_train_test, units=positions, aum=aum)
    profit = portfolio.profit
    times = portfolio.units.index
    assets = portfolio.units.columns
    trading_costs = compute_trading_costs(portfolio.trades_currency[assets], spreads[assets].loc[times])
    short_rate = 0.5 / 100 / 252  # half a percent per year
    short_costs = (portfolio.units[portfolio.units < 0].abs() * portfolio.prices).sum(axis=1) * short_rate
    profit = profit - short_costs - trading_costs

    profits_train = profit.loc[: prices_train.index[-1]]
    profits_test = profit.loc[prices_test.index[0] :]
    profits_test.loc[exit_date:] = 0

    plt.plot(profits_train.loc[xlim_start:].cumsum(), color="b", label="In-sample")
    plt.plot(profits_test.cumsum(), color="r", label="Out-of-sample")
    plt.axvline(prices_test.index[0], color="k", linewidth=2)
    plt.gcf().autofmt_xdate()
    plt.xlim(xlim_start, elim_end)
    if legend:
        plt.ylabel("Cumulative profit")
        plt.xlabel("Date")
    plt.title("Cumulative Profit")
    plt.legend()
    plt.show()
```
#### Parameters
* `stat_arb_tuple`: A namedtuple bundling the `StatArb` object, the training and testing prices, and the backtest metrics (entry and exit dates).
* `insample_bound`: The boundary for the price band in the in-sample period.
* `outsample_bound`: The boundary for the price band in the out-of-sample period.
* `spreads`: The bid-ask spreads for the assets involved in the strategy.
* `legend`: A boolean flag to control the display of the legend in the plots.
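For reference, the moving midpoint that `plot_stat_arb` draws is just a trailing mean of the stat-arb price $p_t = s^T P_t$. A minimal sketch on toy data (the asset names, weights, and 3-day memory below are hypothetical, for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical two-asset stat-arb: p_t = 0.3 * A_t - 0.2 * B_t
prices = pd.DataFrame({
    "A": [10.0, 11.0, 12.0, 11.0, 10.0],
    "B": [20.0, 19.0, 21.0, 22.0, 20.0],
})
stocks = np.array([0.3, -0.2])

p = prices @ stocks                # stat-arb price series
mu = p.rolling(3).mean().dropna()  # moving midpoint with memory 3

print(p.round(2).tolist())   # [-1.0, -0.5, -0.6, -1.1, -1.0]
print(mu.round(2).tolist())  # [-0.7, -0.73, -0.9]
```

The first `midpoint_memory - 1` observations have no midpoint, which is why the plotting code prepends trailing training days before taking the rolling mean out-of-sample.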
### Results and Comparative Analysis
This section provides a comprehensive analysis of the backtesting results, comparing the performance of fixed-band and moving-band statistical arbitrage strategies across various metrics and visualizations.
#### Fixed-Band Stat-arbs
##### Performance Summary
Key performance statistics for the fixed-band stat-arbs are presented, offering insights into the central tendency, dispersion, and overall distribution of metrics like annualized return, Sharpe ratio, total profit, and maximum drawdown.
```python=
results, portfolios = run_finding_backtest(
    prices_full,
    stocks_list,
    P_max=50,
    moving_midpoint=False,
    midpoint_memory=None,
    T_max=63,
)

all_traded_assets = []
for portfolio in portfolios:
    all_traded_assets += portfolio.units.columns.tolist()
all_traded_assets = list(set(all_traded_assets))

means = []
stdevs = []
sharpes = []
profits = []
min_navs = []
min_cum_profs = []
drawdowns = []
n_busts = 0
for i, portfolio in enumerate(portfolios):
    res = results[i]
    (
        mean,
        stdev,
        sharpe,
        mean_profit,
        min_nav,
        min_cum_prof,
        drawdown,
        went_bust,
    ) = simulate(res, portfolio, spreads[all_traded_assets], lev_fraction=0.5)
    means.append(mean)
    stdevs.append(stdev)
    sharpes.append(sharpe)
    profits.append(mean_profit)
    min_navs.append(min_nav)
    min_cum_profs.append(min_cum_prof)
    drawdowns.append(drawdown)
    n_busts += went_bust

fixed_band_metrics = pd.DataFrame({
    "means": means,
    "stdevs": stdevs,
    "sharpes": sharpes,
    "profits": profits,
    "min_navs": min_navs,
    "min_cum_profs": min_cum_profs,  # use the accumulated list, not the last loop value
    "drawdowns": drawdowns,
})
```
There are four busts:
```
Portfolio exited early at 2020-02-12 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2020-03-18 00:00:00
Long positions + cash fell below short position
Portfolio exited early at 2020-09-01 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2024-03-04 00:00:00
NAV fell below 0.5 * cash
```
##### Distribution of Key Metrics
Kernel density estimation plots visualize the distribution of crucial performance metrics, allowing for a visual assessment of the strategies' performance characteristics and potential risks.
```python=
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
sns.kdeplot(fixed_band_metrics['means'], ax=axes[0, 0])
axes[0, 0].set_title('Distribution of Annualized Returns')
axes[0, 0].set_xlabel('Annualized Return')
sns.kdeplot(fixed_band_metrics['sharpes'], ax=axes[0, 1])
axes[0, 1].set_title('Distribution of Sharpe Ratios')
axes[0, 1].set_xlabel('Sharpe Ratio')
sns.kdeplot(fixed_band_metrics['profits'], ax=axes[1, 0])
axes[1, 0].set_title('Distribution of Total Profits')
axes[1, 0].set_xlabel('Total Profit')
sns.kdeplot(fixed_band_metrics['drawdowns'], ax=axes[1, 1])
axes[1, 1].set_title('Distribution of Maximum Drawdowns')
axes[1, 1].set_xlabel('Maximum Drawdown')
plt.tight_layout()
plt.show()
```

##### Example Stat-arbs
The performance of fixed-band strategies at the 20th and 80th percentiles of annualized returns is illustrated, providing concrete examples of the range of outcomes observed.
```python=
quantile_low = 0.2
quantile_high = 0.8
quantiles = fixed_band_metrics['means'].quantile([quantile_low, quantile_high])
res_low = results[fixed_band_metrics['means'][(fixed_band_metrics['means'] - quantiles[quantile_low]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_low, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
res_high = results[fixed_band_metrics['means'][(fixed_band_metrics['means'] - quantiles[quantile_high]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_high, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
```
###### 20th percentile fixed-band stat-arb
```
stat-arb: 0.3×WMT + 0.1×QCOM + 0.1×GE + 0.1×PANW
```



###### 80th percentile fixed-band stat-arb
```
stat-arb: 0.1×MRK + 0.1×TXN - 0.1×MS - 0.1×UBER
```



#### Moving-Band Stat-arbs
The same analysis is replicated for moving-band stat-arbs, facilitating a direct comparison between the two approaches.
##### Performance Summary
```python=
# Store moving-band outputs under distinct names so the fixed-band
# results above remain available for the comparative analysis
results_moving, portfolios_moving = run_finding_backtest(
    prices_full,
    stocks_list,
    P_max=100,
    moving_midpoint=True,
    midpoint_memory=21,
    T_max=125,
)

all_traded_assets = []
for portfolio in portfolios_moving:
    all_traded_assets += portfolio.units.columns.tolist()
all_traded_assets = list(set(all_traded_assets))

means_moving = []
stdevs_moving = []
sharpes_moving = []
profits_moving = []
min_navs_moving = []
min_cum_profs_moving = []
drawdowns_moving = []
n_busts = 0
for i, portfolio in enumerate(portfolios_moving):
    res = results_moving[i]
    (
        mean,
        stdev,
        sharpe,
        mean_profit,
        min_nav,
        min_cum_prof,
        drawdown,
        went_bust,
    ) = simulate(res, portfolio, spreads[all_traded_assets], lev_fraction=0.5)
    means_moving.append(mean)
    stdevs_moving.append(stdev)
    sharpes_moving.append(sharpe)
    profits_moving.append(mean_profit)
    min_navs_moving.append(min_nav)
    min_cum_profs_moving.append(min_cum_prof)
    drawdowns_moving.append(drawdown)
    n_busts += went_bust

moving_band_metrics = pd.DataFrame({
    "means": means_moving,
    "stdevs": stdevs_moving,
    "sharpes": sharpes_moving,
    "profits": profits_moving,
    "min_navs": min_navs_moving,
    "min_cum_profs": min_cum_profs_moving,
    "drawdowns": drawdowns_moving,
})
```
There are three busts:
```
Portfolio exited early at 2020-03-18 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2023-09-11 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2024-06-03 00:00:00
NAV fell below 0.5 * cash
```
##### Distribution of Key Metrics
```python=
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
sns.kdeplot(moving_band_metrics['means'], ax=axes[0, 0])
axes[0, 0].set_title('Distribution of Annualized Returns (Moving-Band)')
axes[0, 0].set_xlabel('Annualized Return')
sns.kdeplot(moving_band_metrics['sharpes'], ax=axes[0, 1])
axes[0, 1].set_title('Distribution of Sharpe Ratios (Moving-Band)')
axes[0, 1].set_xlabel('Sharpe Ratio')
sns.kdeplot(moving_band_metrics['profits'], ax=axes[1, 0])
axes[1, 0].set_title('Distribution of Total Profits (Moving-Band)')
axes[1, 0].set_xlabel('Total Profit')
sns.kdeplot(moving_band_metrics['drawdowns'], ax=axes[1, 1])
axes[1, 1].set_title('Distribution of Maximum Drawdowns (Moving-Band)')
axes[1, 1].set_xlabel('Maximum Drawdown')
plt.tight_layout()
plt.show()
```

##### Example Stat-arbs
```python=
quantiles_moving = moving_band_metrics['means'].quantile([quantile_low, quantile_high])
res_low_moving = results_moving[moving_band_metrics['means'][(moving_band_metrics['means'] - quantiles_moving[quantile_low]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_low_moving, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
res_high_moving = results_moving[moving_band_metrics['means'][(moving_band_metrics['means'] - quantiles_moving[quantile_high]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_high_moving, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
```
###### 20th percentile moving-band stat-arb
```
stat-arb: 0.1×MSFT - 0.1×WFC - 0.1×ISRG + 0.2×PLD + 0.1×ETN
```



###### 80th percentile moving-band stat-arb
```
stat-arb: 0.1×TXN - 0.1×PANW
```



#### Comparative Analysis
##### Performance Comparison
A tabular and visual comparison of key metrics between fixed-band and moving-band strategies is presented.
```python=
comparison_metrics = pd.DataFrame({
"Metric": ["Return", "Risk", "Sharpe", "Drawdown"],
"Fixed-Band": [
fixed_band_metrics['means'].mean(),
fixed_band_metrics['stdevs'].mean(),
fixed_band_metrics['sharpes'].mean(),
fixed_band_metrics['drawdowns'].mean()
],
"Moving-Band": [
moving_band_metrics['means'].mean(),
moving_band_metrics['stdevs'].mean(),
moving_band_metrics['sharpes'].mean(),
moving_band_metrics['drawdowns'].mean()
]
})
print(comparison_metrics)
comparison_metrics_melted = pd.melt(comparison_metrics, id_vars=['Metric'], var_name='Approach', value_name='Value')
plt.figure(figsize=(12, 6))
sns.barplot(x="Metric", y="Value", hue="Approach", data=comparison_metrics_melted)
plt.title("Comparison of Fixed-Band vs Moving-Band Stat-Arbs")
plt.ylabel("Value")
plt.xticks(rotation=45)
plt.legend(title="Approach")
plt.tight_layout()
plt.show()
```
###### Fixed-band stat-arbs metrics
| Metric | Mean | Median | 75th Percentile | 25th Percentile | Fraction Positive |
| :-------------- | :-------- | :-------- | :-------------- | :-------------- | :---------------- |
| Return | 0.01 | 0.13 | 0.31 | -0.18 | 0.62 |
| Risk | 0.36 | 0.28 | 0.43 | 0.15 | N/A |
| Sharpe | 0.48 | 0.62 | 1.62 | -0.59 | 0.62 |
| Drawdown | 0.18 | 0.13 | 0.24 | 0.06 | N/A |
###### Moving-band stat-arbs metrics
| Metric | Mean | Median | 75th Percentile | 25th Percentile | Fraction Positive |
| :-------------- | :-------- | :-------- | :-------------- | :-------------- | :---------------- |
| Return | <font color='#33FF44'>0.11 (+0.10)</font> | 0.13 | <font color='#D41D1A'>0.21 (-0.10)</font> | <font color='#33FF44'>-0.08 (+0.17)</font> | <font color='#33FF44'>0.75 (+0.13)</font> |
| Risk | <font color='#D41D1A'>0.20 (-0.16)</font> | <font color='#D41D1A'>0.14 (-0.14)</font> | <font color='#D41D1A'>0.22 (-0.21)</font> | <font color='#D41D1A'>0.10 (-0.05)</font> | N/A |
| Sharpe | <font color='#33FF44'>0.60 (+0.12)</font> | <font color='#D41D1A'>0.61 (-0.01)</font> | <font color='#D41D1A'>1.35 (-0.27)</font> | <font color='#33FF44'>-0.04 (+0.55)</font> | <font color='#33FF44'>0.75 (+0.13)</font> |
| Drawdown | <font color='#D41D1A'>0.12 (-0.06)</font> | <font color='#D41D1A'>0.13 (-0.04)</font> | <font color='#D41D1A'>0.14 (-0.14)</font> | 0.06 | N/A |

##### Statistical Significance
T-tests are performed to assess the statistical significance of the observed differences in performance metrics between the two approaches.
```python=
from scipy import stats
def perform_ttest(metric_name):
    fixed_band_data = fixed_band_metrics[metric_name]
    moving_band_data = moving_band_metrics[metric_name]
    t_stat, p_value = stats.ttest_ind(fixed_band_data, moving_band_data)
    return f"{metric_name}: t-statistic = {t_stat:.4f}, p-value = {p_value:.4f}"
print(perform_ttest('means'))
print(perform_ttest('sharpes'))
print(perform_ttest('drawdowns'))
```
```
means: t-statistic = -1.1498, p-value = 0.2505
sharpes: t-statistic = -0.6276, p-value = 0.5304
drawdowns: t-statistic = 5.9960, p-value = 0.0000
```
Only the difference in maximum drawdowns is statistically significant; the differences in mean returns and Sharpe ratios cannot be distinguished from noise at conventional levels.
##### Active Strategies Over Time
The number of active strategies over time is plotted for both fixed-band and moving-band approaches, highlighting potential differences in strategy longevity and adaptability to changing market conditions.
```python=
def count_active_strategies(results, date):
    return sum(1 for r in results if r.metrics.entry_date <= date <= r.metrics.exit_date)

dates = pd.date_range(
    start=min(r.metrics.entry_date for r in results),
    end=max(r.metrics.exit_date for r in results),
    freq='D',
)
active_fixed = [count_active_strategies(results, date) for date in dates]
active_moving = [count_active_strategies(results_moving, date) for date in dates]
plt.figure(figsize=(12, 6))
plt.plot(dates, active_fixed, label='Fixed-Band')
plt.plot(dates, active_moving, label='Moving-Band')
plt.title('Number of Active Strategies Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Active Strategies')
plt.legend()
plt.tight_layout()
plt.show()
```

##### Asset Composition Analysis
The composition of assets within the strategies is analyzed, revealing insights into the diversity of asset selection and potential concentration risks.
```python=
def analyze_composition(results):
    all_assets = set()
    asset_counts = {}
    for r in results:
        strategy_assets = set(r.stat_arb.assets.keys())
        all_assets.update(strategy_assets)
        for asset in strategy_assets:
            asset_counts[asset] = asset_counts.get(asset, 0) + 1
    return all_assets, asset_counts
fixed_assets, fixed_counts = analyze_composition(results)
moving_assets, moving_counts = analyze_composition(results_moving)
print(f"Number of unique assets in Fixed-Band strategies: {len(fixed_assets)}")
print(f"Number of unique assets in Moving-Band strategies: {len(moving_assets)}")
plt.figure(figsize=(24, 12))
plt.bar(fixed_counts.keys(), fixed_counts.values(), alpha=0.5, label='Fixed-Band')
plt.bar(moving_counts.keys(), moving_counts.values(), alpha=0.5, label='Moving-Band')
plt.title('Asset Frequency in Strategies')
plt.xlabel('Asset')
plt.ylabel('Frequency')
plt.legend()
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
```
```
Number of unique assets in Fixed-Band strategies: 93
Number of unique assets in Moving-Band strategies: 96
```

This comprehensive results analysis provides a multifaceted view of the performance and characteristics of fixed-band and moving-band statistical arbitrage strategies. It facilitates a deeper understanding of their strengths and weaknesses, enabling informed decision-making in the context of practical trading and risk management.
## Reference
* K. Johansson, T. Schmelzer, and S. Boyd, Finding Moving-Band Statistical Arbitrages via Convex-Concave Optimization
* K. Johansson, T. Schmelzer, and S. Boyd, A Markowitz Approach to Managing a Dynamic Basket of Moving-Band Statistical Arbitrages
* [cvxstatarb](https://github.com/cvxgrp/cvxstatarb/tree/main)