# Fixed/Moving-band Statistical Arbitrage via Convex Optimization
## Introduction
Statistical arbitrage (stat-arb) is a quantitative trading strategy exploiting temporary price discrepancies between related assets. While early stat-arb strategies like pairs trading focused on two assets, modern approaches often use portfolios of multiple mean-reverting assets. The extensive literature on stat-arbs covers various aspects, including identifying potential stat-arbs, modeling their mean-reverting dynamics, and developing effective trading strategies.
### Related Work
Existing stat-arb research can be broadly classified into four categories:
1. **Finding Stat-arbs:** Techniques range from simple distance-based approaches (identifying pairs with historically close prices) to more sophisticated co-integration methods and machine learning algorithms.
2. **Modeling the Stat-arb Spread:** Common approaches include stochastic control theory (using the Ornstein-Uhlenbeck process) and time series analysis. Copulas have also been used to model the joint distribution of the spread.
3. **Trading Stat-arbs:** Strategies vary from basic threshold-based methods to stochastic control approaches, deriving optimal trading decisions from the modeled spread.
4. **Exiting a Stat-arb:** Exit strategies typically rely on thresholds, time periods, or a combination of factors and often involve a gradual position reduction.
### Price Bands and Bollinger Bands
Price bands, such as Bollinger Bands, are widely used in technical analysis. A Bollinger Band is a dynamic interval around a moving average, typically two standard deviations wide, adjusting to market volatility.
$$
I_t = [\mu_t - k\sigma_t,\ \mu_t + k\sigma_t]
$$
where:
* $M$ is the lookback period (commonly 21 days)
* $\mu_t$ is the moving average at time $t$
* $\sigma_t$ is the standard deviation
* $k$ is a multiplier (typically 2)
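As a quick illustration, the band can be computed with pandas rolling statistics (the price series and seed below are synthetic, purely for demonstration):

```python
import numpy as np
import pandas as pd

# Hypothetical price series; M and k follow the common defaults above.
prices = pd.Series(100 + np.cumsum(np.random.default_rng(0).normal(0, 1, 252)))
M, k = 21, 2

mu = prices.rolling(M).mean()    # moving average mu_t over the last M days
sigma = prices.rolling(M).std()  # rolling standard deviation sigma_t
lower, upper = mu - k * sigma, mu + k * sigma

# The band is 2*k*sigma_t wide and recenters as mu_t drifts.
bands = pd.DataFrame({"lower": lower, "mid": mu, "upper": upper})
```

The first $M-1$ rows are `NaN` because the rolling window is not yet full.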
### Goals
This study examines Boyd et al.'s novel approaches to stat-arb identification:
1. **Formulation as an optimization problem:** Stat-arb identification is framed as an optimization problem, directly optimizing for mean-reversion and high variance, enabling scalability to large asset universes.
2. **Introduction of moving-band stat-arbs:** The concept of fixed price bands is extended to moving-band stat-arbs, allowing the price band to adapt over time, potentially increasing profitability.
3. **Managing a portfolio of moving-band statistical arbitrages (MBSAs):** This study also explores a Markowitz-inspired framework for managing a dynamic basket of MBSAs, aiming to optimize portfolio performance.
## Identifying Fixed-Band Statistical Arbitrages
### Notation
* $P_t\in \mathbb{R}^n$: Vector time series of prices for $n$ assets, denominated in USD per share, over time periods $t=1,2,\dots,T$.
* $s\in \mathbb{R}^n$: Portfolio of these assets, denominated in shares.
* $p_t=s^T P_t$: Net value of the portfolio at time $t$.
* $\bar{P}=\frac{1}{T}\sum^T_{t=1}P_t$: Average price vector of the assets.
The objective is to construct a portfolio where the value $p_t$ remains within a predefined price band while also exhibiting consistent variation within that band.
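A toy example with hypothetical numbers shows the notation in action:

```python
import numpy as np

# Hypothetical data: T=4 periods, n=2 assets.
P = np.array([[10.0, 20.0],
              [11.0, 19.0],
              [10.5, 20.5],
              [ 9.5, 21.0]])   # row t holds P_t
s = np.array([2.0, -1.0])      # long 2 shares of asset 1, short 1 share of asset 2

p = P @ s                      # p_t = s^T P_t for each t
P_bar = P.mean(axis=0)         # average price vector
notional = np.abs(s) @ P_bar   # |s|^T P_bar, used in the leverage constraint
```

Here `p` is `[0.0, 3.0, 0.5, -2.0]`: the portfolio value oscillates around zero even though both assets have large prices.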
### Formulation as a Convex-Concave Optimization Problem
Given that the portfolio value $p_t$ fluctuates around a midpoint $\mu$, which can be the mean of $p_t$, the problem can be formulated as:
$$
\begin{split}
\text{maximize}\quad& \sum^T_{t=2}(p_t-p_{t-1})^2\\
\text{subject to}\quad& -1 \leq p_t-\mu \leq 1, \quad p_t = s^T P_t, \quad t=1,2,\dots,T\\
&|s|^T \bar{P}\leq L, \quad \mu \geq 0
\end{split}
$$
where $L$ is the leverage limit. This formulation seeks to maximize the portfolio value's variation over time, constrained to remain within a band around the midpoint and adhere to a leverage constraint.
### Interpretation through a Simple Trading Policy
Consider a scenario where we hold a quantity $q_t$ of the portfolio, with $q_0 = q_T = 0$, meaning no holdings at the beginning or end. The total profit can be expressed as:
$$
\sum^{T-1}_{t=1}q_t(p_{t+1}-p_t)
$$
A simple trading policy is to set $q_t = \mu - p_t$. Under this policy, the total profit is exactly
$$
\frac{1}{2}\sum^T_{t=2}(p_t-p_{t-1})^2+\frac{(p_1- \mu)^2-(p_T-\mu)^2}{2}
\geq \frac{1}{2}\left(\sum^T_{t=2}(p_t-p_{t-1})^2-1\right),
$$
where the inequality uses $(p_1-\mu)^2 \geq 0$ and the band constraint $|p_T-\mu| \leq 1$. Thus, whenever the objective value exceeds one, this simple linear policy is guaranteed to generate a profit.
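The profit identity and its lower bound can be checked numerically on a synthetic in-band series (the data below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 5.0
p = mu + rng.uniform(-1, 1, size=50)   # p_t kept inside the band |p_t - mu| <= 1

# Profit of the linear policy q_t = mu - p_t
q = mu - p[:-1]
profit = np.sum(q * (p[1:] - p[:-1]))

# Closed-form expression for the same profit, and the lower bound from the text
squares = np.sum(np.diff(p) ** 2)
closed_form = 0.5 * squares + 0.5 * ((p[0] - mu) ** 2 - (p[-1] - mu) ** 2)
lower_bound = 0.5 * (squares - 1)
```

The identity holds exactly (up to floating-point error), and the realized profit never falls below the bound as long as the series stays inside the band.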
### Solution Method: Convex-Concave Procedure
To solve the optimization problem, a convex-concave procedure is employed. In each iteration $k$, the objective function is linearized around the current portfolio value $p^k_t$ using the affine approximation:
$$
\hat{f}(p;p^k) = f(p^k) + \nabla f(p^k)^T(p - p^k) = \nabla f(p^k)^T p + c
$$
where:
* $f(p) = \sum^T_{t=2} (p_t - p_{t-1})^2$
* $\nabla f(p^k)$ is the gradient of $f$ evaluated at $p^k$: $$(\nabla f(p))_t= \begin{cases}2(p_1-p_2) &t=1 \\ 2(2p_t-p_{t-1}-p_{t+1}) &t=2, \dots, T-1 \\ 2(p_T-p_{T-1}) &t=T \end{cases}$$
This simplifies the original problem to a linearized version:
$$
\begin{split}
\text{maximize}\quad&\hat{f}(p;p^k)\\
\text{subject to}\quad& -1 \leq p_t-\mu \leq 1,\quad p_t = s^TP_t,\quad t=1,2,\dots,T\\
&|s|^T \bar{P}\leq L, \quad \mu \geq 0
\end{split}
$$
Solving this linearized problem yields the next iteration's portfolio $s^{k+1}$ and midpoint $\mu^{k+1}$. The algorithm iterates until convergence, usually within a few iterations.
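As a sanity check, the gradient formula can be verified against central finite differences on a random toy vector (this check is not part of the original code):

```python
import numpy as np

def f(p):
    # Objective: sum of squared one-step differences
    return np.sum(np.diff(p) ** 2)

def grad_f(p):
    # Analytic gradient, matching the three cases in the text
    g = np.empty_like(p)
    g[0] = 2 * (p[0] - p[1])
    g[1:-1] = 2 * (2 * p[1:-1] - p[:-2] - p[2:])
    g[-1] = 2 * (p[-1] - p[-2])
    return g

rng = np.random.default_rng(0)
p = rng.normal(size=10)
eps = 1e-6
numeric = np.array([(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                    for e in np.eye(len(p))])
```

Since $f$ is quadratic, central differences agree with the analytic gradient to floating-point precision.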
### Cleanup Phase and Implementation Details
After convergence, a cleanup phase is applied to enhance the solution's sparsity and numerical stability. This involves:
* **Removing negligible weights:** Assets whose notional falls below a fraction $\eta$ of the total portfolio value, i.e. $|s_i|\bar{P}_i \leq \eta\,|s|^T\bar{P}$ (e.g., $\eta = 5\%$), are eliminated.
* **Scaling prices:** Prices are scaled so that the average price vector is all ones, i.e. $\bar{P}= \mathbf{1}$, which simplifies the leverage constraint.
* **Scaling gradient/objective:** Scaling these to be on the order of one further improves numerical stability.
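The weight-thresholding step can be sketched as follows (a minimal sketch; the function name `cleanup` and the example numbers are hypothetical):

```python
import numpy as np

def cleanup(s, P_bar, eta=0.05):
    """Zero out assets whose notional |s_i| * P_bar_i is below a
    fraction eta of the portfolio's total notional |s|^T P_bar."""
    total = np.abs(s) @ P_bar
    keep = np.abs(s) * P_bar > eta * total
    return np.where(keep, s, 0.0)

s = np.array([3.0, -0.01, -1.5, 0.02])
P_bar = np.array([1.0, 1.0, 1.0, 1.0])  # after price scaling, P_bar = 1
s_clean = cleanup(s, P_bar)             # the two tiny positions are dropped
```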
### Initialization and Multiple Stat-arbs
The final portfolio depends on the initial portfolio $s_1$. By employing various random initializations, multiple stat-arbs can be discovered within the same asset universe, thus broadening the potential for profitable trading strategies.
## Finding Moving-Band Stat-Arbs
### Moving-Band Stat-Arbs
In practical scenarios, it's often more realistic to allow the midpoint of the price band to vary over time. To achieve this, we define the midpoint at time $t$ as the moving average of the portfolio values over the past $M$ periods:
$$
\mu_t = \frac{1}{M} \sum_{\tau=t-M+1}^t p_\tau
$$
This adjustment leads to a modified trading policy:
$$
q_t = \mu_t - p_t, \quad t=1,\dots,T-1
$$
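In pandas, the moving midpoint and the resulting trading quantity are one-liners (the portfolio-value series below is synthetic, for illustration only):

```python
import numpy as np
import pandas as pd

M = 21
rng = np.random.default_rng(2)
# Hypothetical mean-reverting portfolio value
p = pd.Series(np.sin(np.arange(252) / 5) + rng.normal(0, 0.1, 252))

mu = p.rolling(M).mean()  # mu_t: mean of the last M portfolio values
q = mu - p                # hold q_t units of the portfolio
```

The first $M-1$ entries of `q` are `NaN`, reflecting that the midpoint is undefined until $M$ observations are available.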
### Revised Optimization Formulation
Incorporating the moving midpoint, the optimization problem becomes:
$$
\begin{split}
\text{maximize}\quad&\sum^T_{t=2}(p_t-p_{t-1})^2\\
\text{subject to}\quad& -1 \leq p_t-\mu_t \leq 1, \quad t=M,M+1,\dots,T\\
& p_t = s^TP_t, \quad t=1,2,\dots,T\\
&|s|^T \bar{P}\leq L, \quad \mu_t=\frac{1}{M}\sum^t_{\tau=t-M+1}p_\tau
\end{split}
$$
Note that the band constraint only applies from $t=M$ onward, since $\mu_t$ requires $M$ past portfolio values.
### Solution Approach
The same convex-concave procedure used for fixed-band stat-arbs can be applied to solve this revised formulation. The key difference lies in the inclusion of time-varying midpoint constraints, which remain convex. This allows for efficient optimization and the identification of portfolios that exhibit mean-reversion around a dynamic midpoint, potentially leading to improved trading performance.
## Numerical Experiment
This section presents the numerical experiment in four parts:
* **Experiment Setup:** Import necessary packages and data.
* **Stat-arb Search:** Construct two classes `_State` and `StatArb` to search for stat-arbs.
* **Backtesting:** With the retrieved stat-arbs, backtest them through the entire date range to compare their performance.
* **Metrics:** Calculate relevant performance metrics.
### Experiment Setup
#### Importing Libraries
```python=
import seaborn as sns
import multiprocessing as mp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
import warnings
import cvxpy as cp
from dataclasses import dataclass, field
from collections import namedtuple
from tqdm import tqdm
from cvx.simulator.builder import Builder
from cvx.simulator.portfolio import Portfolio
from cvx.covariance.ewma import iterated_ewma, _ewma_cov
from cvx.covariance.combination import from_sigmas
from matplotlib.ticker import ScalarFormatter, MaxNLocator, FuncFormatter
```
#### Dataset Description
* **Data:** Adjusted closing prices of 100 large-cap US stocks
* **Time period:** July 31st, 2014, to July 31st, 2024
* **Stock list source:** Wikipedia
```python=
import yfinance as yf
stocks_list = ['AAPL', 'MSFT', 'NVDA', 'GOOGL', 'GOOG', 'AMZN', 'META', 'LLY', 'AVGO', 'TSLA', 'JPM', 'WMT', 'UNH', 'XOM', 'V', 'MA', 'PG', 'JNJ', 'COST', 'ORCL', 'HD', 'ABBV', 'BAC', 'KO', 'MRK', 'NFLX', 'CVX', 'PEP', 'CRM', 'ADBE', 'TMO', 'TMUS', 'AMD', 'LIN', 'ACN', 'MCD', 'ABT', 'DHR', 'CSCO', 'WFC', 'PM', 'QCOM', 'GE', 'INTU', 'IBM', 'TXN', 'VZ', 'PFE', 'AMGN', 'AXP', 'NOW', 'ISRG', 'BX', 'CAT', 'NEE', 'DIS', 'AMAT', 'GS', 'MS', 'RTX', 'SPGI', 'CMCSA', 'UNP', 'UBER', 'T', 'LOW', 'HON', 'LMT', 'PGR', 'BLK', 'COP', 'TJX', 'SYK', 'ELV', 'VRTX', 'REGN', 'BKNG', 'SCHW', 'PLD', 'ETN', 'NKE', 'C', 'BSX', 'CB', 'UPS', 'MMC', 'ADP', 'AMT', 'MDT', 'ADI', 'ANET', 'KKR', 'BA', 'MU', 'LRCX', 'PANW', 'KLAC', 'BMY', 'DE', 'SO']
prices_full = yf.download(stocks_list, start="2014-07-31", end="2024-07-31")['Adj Close']
spreads = prices_full.copy()
spreads.loc[:, :] = 1e-4  # 1 bps bid-ask spread
```
#### Parameters
- **Spread:** $1$ bps
- **Shorting costs:** $0.5\%$ per year
- **Leverage limit:** $L = 50$
### Stat-arb Search Algorithm
This section details the core components of the statistical arbitrage search algorithm. These components work together to identify potential stat-arbs using convex-concave optimization, represent them as `StatArb` objects, and validate their effectiveness.
#### Convex-Concave Optimization: The `_State` Class
The `_State` class serves as the engine for constructing stat-arb portfolios. It formulates and solves the convex-concave optimization problem, which is crucial for identifying portfolios exhibiting mean-reversion behavior.
* **Key Responsibilities:**
    * **Problem Setup:** Initializes the optimization problem based on input parameters like `prices`, `P_max` (leverage limit), `spread_max` (allowed deviation from the midpoint), `midpoint_memory`, and `moving_midpoint` (whether to use a dynamic or fixed midpoint). It sets up the decision variables (`s` for portfolio weights and `mu` for the midpoint), the objective function to maximize variation within the price band, and constraints for leverage and deviation limits.
    * **Iteration:** The `iterate()` method performs one step of the convex-concave procedure, updating the portfolio weights and midpoint based on the linearized objective function and constraints. It attempts to solve the optimization problem and resets to a random feasible point if the solver fails.
    * **Gradient Calculation:** The `_get_grad_g(pk)` method computes the gradient of the objective function at the current portfolio value `pk`, which is used to linearize the objective in each iteration.
    * **StatArb Construction:** The `build()` method constructs a `StatArb` object encapsulating the optimized portfolio weights, midpoint, and other relevant parameters after the optimization process converges.
```python=
class _State:
    def __init__(
        self,
        prices,
        P_max=None,
        spread_max=1,
        solver="CLARABEL",
        midpoint_memory=None,
        moving_midpoint=True,
    ):
        self.T, self.n = prices.shape
        self.assets = prices.columns
        self.s = cp.Variable((self.n, 1), name="s")
        self.midpoint_memory = midpoint_memory
        self.moving_midpoint = moving_midpoint
        self.solver = solver
        if moving_midpoint:
            self.mu = cp.Variable((self.T, 1))
        else:
            self.mu = cp.Variable(nonneg=True)
        self.p = cp.Variable((self.T, 1), name="p")
        self.P_max = P_max
        self.prices = prices
        self.P_bar = prices.mean(axis=0).values.reshape(-1, 1)
        self.spread_max = spread_max
        # Set up optimization problem
        self.grad_g = cp.Parameter((self.T, 1), name="grad_g")
        self.obj = cp.Maximize(self.grad_g[midpoint_memory:].T @ self.p[midpoint_memory:])
        # Define constraints
        self.cons = self._setup_constraints()
        self.prob = cp.Problem(self.obj, self.cons)

    def _setup_constraints(self):
        cons = [self.p == self.prices.values @ self.s]
        if self.moving_midpoint:
            M = self.midpoint_memory
            cons += [cp.abs(self.p[M:] - self.mu[M:]) <= self.spread_max]
            # Tie the midpoint to the trailing M-period average of p
            for t in range(M, self.T):
                cons += [self.mu[t] == cp.sum(self.p[t - M + 1 : t + 1]) / M]
        else:
            cons += [cp.abs(self.p - self.mu) <= self.spread_max]
        if self.P_max is not None:
            cons += [cp.abs(self.s).T @ self.P_bar <= self.P_max]
        return cons

    def reset(self):
        # Restart from a random portfolio (used at initialization and on solver failure)
        self.s.value = np.random.randn(self.n, 1) / self.n
        if self.moving_midpoint:
            self.mu.value = np.zeros((self.T, 1))
        else:
            self.mu.value = 0.0

    def iterate(self):
        p = self.prices.values @ self.s.value
        self.grad_g.value = self._get_grad_g(p)
        try:
            self.prob.solve(solver=self.solver, verbose=False)
        except cp.SolverError:
            print("Solver failed, resetting...")
            self.reset()
        return self

    def _get_grad_g(self, pk):
        # Gradient of the sum of squared differences, applied from `start` onward
        start = self.midpoint_memory if self.moving_midpoint else 0
        grad_g = np.zeros(pk.shape)
        grad_g[start + 0] = pk[start + 0] - pk[start + 1]
        grad_g[-1] = pk[-1] - pk[-2]
        grad_g[start + 1 : -1] = 2 * pk[start + 1 : -1] - pk[start:-2] - pk[start + 2 :]
        # Rescale so the gradient has average magnitude ~100, for numerical stability
        return 2 * grad_g / np.mean(np.abs(grad_g)) * 100

    def build(self):
        assets_dict = dict(zip(self.assets, self.s.value.flatten()))
        return StatArb(
            assets=assets_dict,
            mu=self.mu.value,
            midpoint_memory=self.midpoint_memory,
            moving_midpoint=self.moving_midpoint,
            P_max=self.P_max,
            spread_max=self.spread_max,
            entry_date=self.prices.index[-1],
        )
```
#### `StatArb` Class
The `StatArb` class represents a single statistical arbitrage strategy. It stores the asset weights, midpoint information, and parameters used to construct the strategy. It also provides methods for calculating trading quantities (`get_q`), asset positions (`get_positions`), and performance metrics (`metrics`).
```python=
@dataclass(frozen=True)
class StatArb:
    assets: dict
    mu: float
    midpoint_memory: int = None
    moving_midpoint: bool = True
    P_max: float = None
    spread_max: float = 1
    entry_date: pd.Timestamp = None

    @property
    def asset_names(self):
        return list(self.assets.keys())

    @property
    def stocks(self):
        """Share holdings as a Series indexed by asset name."""
        return pd.Series(self.assets)

    @property
    def n(self):
        return len(self.assets)

    def evaluate(self, prices: pd.DataFrame):
        """Portfolio value p_t = s^T P_t."""
        return prices[self.asset_names] @ self.stocks

    def leverage(self, prices: pd.DataFrame):
        """Absolute notional |s|^T P_t of one unit of the stat-arb."""
        return prices[self.asset_names] @ self.stocks.abs()

    def get_q(self, prices: pd.DataFrame, mu=None, T_max=500):
        if mu is None:
            assert not self.moving_midpoint, "mu must be specified for moving-midpoint stat-arb"
            mu = self.mu
        p = self.evaluate(prices)
        q = mu - p
        q.name = "q"
        exit_length = 21
        exit_index = min(T_max - 1, len(q) - 1)
        # Linear exit strategy: wind the position down to zero over exit_length days
        weights = np.array([i / exit_length for i in range(exit_length)][::-1])
        length = len(q.iloc[exit_index : exit_index + exit_length])
        q.iloc[exit_index : exit_index + exit_length] = (
            q.iloc[exit_index : exit_index + exit_length] * weights[:length]
        )
        q.iloc[exit_index + exit_length :] = 0
        q.iloc[-1] = 0
        exit_trigger = q.index[min(exit_index, len(q) - 1)]
        exit_date = q.index[min(exit_index + exit_length - 1, len(q) - 1)]
        return q, exit_trigger, exit_date

    def get_positions(self, prices: pd.DataFrame, mu=None, T_max=500):
        q, _, _ = self.get_q(prices, mu, T_max=T_max)
        q = pd.concat([q] * self.n, axis=1)
        stocks = self.stocks.values.reshape(-1, 1)
        positions = q * (stocks.T)
        positions.columns = self.asset_names
        return positions

    def metrics(self, prices: pd.DataFrame, mu: pd.Series = None, T_max=500):
        p = self.evaluate(prices)
        q, exit_trigger, exit_date = self.get_q(prices, mu, T_max=T_max)
        price_changes = p.ffill().diff()
        previous_position = q.shift(1)
        profits = previous_position * price_changes
        profits.iloc[0] = np.nan
        entry_date = q.index[0]
        return Metrics(
            daily_profit=profits.dropna(),
            entry_date=entry_date,
            exit_trigger=exit_trigger,
            exit_date=exit_date,
        )
```
#### Constructing Stat-arbs
The `construct_stat_arb` function utilizes the `_State` class to construct a single stat-arb portfolio. It handles data preprocessing, initializes the `_State` object, runs the optimization loop, and performs a second pass for refinement if needed.
The `construct_stat_arbs` function leverages multiprocessing to construct multiple stat-arbs in parallel, each with a different random seed to explore a diverse set of potential strategies.
```python=
def construct_stat_arb(
    prices,
    P_max=None,
    spread_max=1,
    moving_midpoint=True,
    midpoint_memory=21,
    s_init=None,
    mu_init=None,
    seed=None,
    solver="CLARABEL",
    second_pass=False,
):
    if seed is not None:
        np.random.seed(seed)
    prices = prices.dropna(axis=1)
    P_bar = prices.mean(axis=0).values.reshape(-1, 1)
    prices = prices / P_bar.T  # normalize so the average price vector is all ones
    state = _State(
        prices,
        P_max=P_max,
        spread_max=spread_max,
        midpoint_memory=midpoint_memory,
        moving_midpoint=moving_midpoint,
        solver=solver,
    )
    if s_init is None or mu_init is None:
        state.reset()
    else:
        state.s.value = s_init
        state.mu.value = mu_init

    # Optimization loop
    obj_old, obj_new = 1, 10
    for i in range(5):
        state.iterate()
        if state.prob.status in ["optimal", "optimal_inaccurate"]:
            obj_old, obj_new = obj_new, state.prob.value
            if abs(obj_new - obj_old) / abs(obj_old) <= 1e-3:
                break
        else:
            print("Solver failed... resetting")
            state.reset()
            obj_old, obj_new = 1, 10

    # Second pass for refinement
    if not second_pass:
        s_times_P_bar = np.abs(state.s.value) * state.P_bar
        s_at_P_bar = np.abs(state.s.value).T @ state.P_bar
        non_zero_inds = np.where(np.abs(s_times_P_bar) > 0.5e-1 * s_at_P_bar)[0]
        if len(non_zero_inds) == 0:
            return None
        prices = prices * P_bar.T  # undo the normalization
        prices_new = prices.iloc[:, non_zero_inds]
        s_init = state.s.value[non_zero_inds]
        mu_init = state.mu.value
        return construct_stat_arb(
            prices_new,
            P_max=P_max,
            spread_max=spread_max,
            moving_midpoint=moving_midpoint,
            midpoint_memory=midpoint_memory,
            s_init=s_init,
            mu_init=mu_init,
            seed=None,
            second_pass=True,
        )
    else:
        # Convert holdings back to share units before building the stat-arb
        state.s.value = state.s.value / P_bar
        return state.build()


def construct_stat_arbs(
    prices,
    K=1,
    P_max=None,
    spread_max=1,
    moving_midpoint=True,
    midpoint_memory=None,
    s_init=None,
    mu_init=None,
    seed=None,
    solver="CLARABEL",
    verbose=True,
):
    if seed is not None:
        np.random.seed(seed)
    all_seeds = list(np.random.choice(range(10 * K), K, replace=False))
    all_args = zip(
        [prices] * K,
        [P_max] * K,
        [spread_max] * K,
        [moving_midpoint] * K,
        [midpoint_memory] * K,
        [s_init] * K,
        [mu_init] * K,
        all_seeds,
        [solver] * K,
    )
    with mp.Pool() as pool:
        if verbose:
            iterator = tqdm(pool.imap_unordered(construct_stat_arb_helper, all_args), total=K)
        else:
            iterator = pool.imap_unordered(construct_stat_arb_helper, all_args)
        all_stat_arbs = [x for x in iterator if x is not None]
    return all_stat_arbs


def construct_stat_arb_helper(args):
    """
    Helper for use with multiprocessing's imap_unordered.
    :param args: tuple of arguments to pass to construct_stat_arb
    """
    return construct_stat_arb(*args)
```
#### Validation and Filtering
The `_find_and_filter_stat_arbs` function identifies and filters stat-arbs to ensure uniqueness and validity. It constructs multiple stat-arbs using the `construct_stat_arbs` function, filters out duplicates based on their asset composition, and evaluates their performance metrics on the test data. It can optionally construct portfolios for the valid stat-arbs.
```python=
# Container pairing a stat-arb with its evaluation data
StatArbResult = namedtuple("StatArbResult", ["stat_arb", "metrics", "prices_train", "prices_test"])


def _find_and_filter_stat_arbs(
    prices_train,
    prices_test,
    K,
    P_max,
    T_max,
    spread_max,
    moving_midpoint,
    midpoint_memory,
    seed,
    solver,
    verbose,
    construct_portfolios=False,
):
    stat_arbs = construct_stat_arbs(
        prices_train,
        K=K,
        P_max=P_max,
        spread_max=spread_max,
        moving_midpoint=moving_midpoint,
        midpoint_memory=midpoint_memory,
        seed=seed,
        solver=solver,
        verbose=False,
    )
    stat_arb_results = []
    asset_names_found = []
    portfolios = []
    for stat_arb in stat_arbs:
        # Skip duplicates with the same asset composition
        if set(stat_arb.asset_names) in asset_names_found:
            continue
        asset_names_found.append(set(stat_arb.asset_names))
        prices_train_test = pd.concat([prices_train, prices_test], axis=0)
        p = prices_train_test[stat_arb.asset_names] @ stat_arb.stocks
        if stat_arb.moving_midpoint:
            mu = p.rolling(stat_arb.midpoint_memory).mean()
            mu = mu.iloc[-len(prices_test):]
        else:
            mu = stat_arb.mu
        m = stat_arb.metrics(prices_test, mu, T_max=T_max)
        if m is not None:
            stat_arb_results.append(StatArbResult(stat_arb, m, prices_train, prices_test))
            if construct_portfolios:
                positions = stat_arb.get_positions(prices_test, mu, T_max=T_max)
                portfolios.append(Portfolio(prices_test, units=positions, aum=1e6))
    return (stat_arb_results, portfolios) if construct_portfolios else (stat_arb_results, None)
```
These core components provide the foundation for systematically searching, constructing, and validating statistical arbitrage strategies. They enable the exploration of a wide range of potential stat-arbs and the selection of the most promising ones for further analysis and backtesting.
### Backtesting and Simulation
This section outlines the framework used to evaluate statistical arbitrage strategies in a simulated trading environment. The framework consists of two key functions: `run_finding_backtest` and `simulate`.
#### Backtesting with `run_finding_backtest`
The `run_finding_backtest` function conducts a systematic backtest of the stat-arb identification and filtering process. It utilizes a rolling window approach, iteratively constructing and evaluating potential stat-arbs over historical data. The key steps within each iteration include:
1. **Data Splitting:** The data is divided into training and testing sets based on the current rolling window position.
2. **Stat-arb Construction and Filtering:** The `_find_and_filter_stat_arbs` function is used to identify and filter potential stat-arbs using the training data.
3. **Performance Evaluation:** The filtered stat-arbs are evaluated on the testing data to assess their performance.
4. **Time Progression:** The rolling window advances by a specified `update_freq`, repeating the process until the entire historical dataset is covered.
The function returns two lists: `results`, which stores the performance metrics and details of each evaluated stat-arb, and `portfolios`, which contains the constructed portfolios for the valid stat-arbs.
```python=
def run_finding_backtest(
    prices_full,
    stock_lists,
    P_max,
    moving_midpoint,
    midpoint_memory,
    T_max,
):
    np.random.seed(0)  # for reproducibility
    remaining_to_stop = 125
    K = 10  # number of stat-arbs to construct in each iteration
    spread_max = 1
    train_len = 500 + 21
    update_freq = 21
    solver = "CLARABEL"
    t_start = 0
    t_end = len(prices_full)
    portfolios = []
    results = []
    n_iters = int((t_end - (train_len + 1) - t_start - remaining_to_stop) / update_freq)
    seeds = [np.random.randint(9999) for _ in range(2 * n_iters)]
    for i in range(1, n_iters + 1):
        if i % 10 == 0:
            print(f"{i/n_iters:.0%}", end=" ")
        t_split = t_start + train_len  # end of the training window
        prices = prices_full[stock_lists]
        # Split data into training and testing sets
        prices_train = prices.iloc[t_start : t_start + train_len]
        prices_test = prices.iloc[t_split:]
        # Find stat-arbs
        seed = seeds[i]
        new_stat_arb_results, new_portfolios = _find_and_filter_stat_arbs(
            prices_train=prices_train,
            prices_test=prices_test,
            K=K,
            P_max=P_max,
            T_max=T_max,
            spread_max=spread_max,
            moving_midpoint=moving_midpoint,
            midpoint_memory=midpoint_memory,
            seed=seed,
            solver=solver,
            verbose=False,
            construct_portfolios=True,
        )
        results += new_stat_arb_results
        portfolios += new_portfolios
        # Move forward in time
        t_start += update_freq
    print(f"\nFinished after looking for stat-arbs {i} times")
    return results, portfolios
```
#### Simulating Real-World Trading with `simulate`
The `simulate` function provides a realistic simulation of a stat-arb strategy's trading performance. It considers practical constraints such as trading costs and risk management rules. Key aspects of the simulation include:
1. **Portfolio Construction:** Builds a portfolio based on the stat-arb's asset weights and an initial cash allocation.
2. **Cost Calculation:** Computes shorting costs and trading costs based on the portfolio's positions and the provided spreads.
3. **NAV Tracking:** Calculates and monitors the Net Asset Value (NAV) of the portfolio, incorporating costs and adjusting for early exits if triggered.
4. **Risk Management:** Checks for conditions that might necessitate an early exit, such as a significant drop in NAV or a violation of short position limits.
5. **Performance Evaluation:** Computes and returns a range of performance metrics to comprehensively assess the strategy's effectiveness and risk profile.
```python=
def simulate(res, portfolio, spreads, lev_fraction):
    stat_arb = res.stat_arb
    exit_date = res.metrics.exit_date
    assets = portfolio.units.columns
    times = portfolio.units.loc[:exit_date].index
    lev0 = stat_arb.leverage(portfolio.prices).iloc[0]

    # Construct prices and stocks (units), prepending the last training day
    prices_train = res.prices_train
    prices_0 = prices_train.iloc[-1]
    prices_temp = pd.concat([pd.DataFrame(prices_0).T, portfolio.prices])
    stocks_0 = portfolio.units.iloc[0] * 0
    stocks_0.name = prices_0.name
    stocks_temp = pd.concat([pd.DataFrame(stocks_0).T, portfolio.units])

    # Construct stat-arb portfolio
    initial_cash = lev_fraction * lev0
    portfolio_new = Portfolio(
        prices_temp,
        units=stocks_temp,
        aum=initial_cash,
    )

    # Calculate shorting costs
    short_rate = 0.5 / 100 / 252  # half a percent per year
    short_costs = (portfolio.units[portfolio.units < 0].abs() * portfolio.prices).sum(axis=1) * short_rate

    # Calculate trading costs
    trading_costs = compute_trading_costs(
        portfolio.trades_currency[assets].loc[times], spreads[assets].loc[times]
    )

    # Calculate NAVs net of costs
    navs = (portfolio_new.nav - short_costs.cumsum() - trading_costs.cumsum()).loc[:exit_date]
    positions = portfolio_new.units
    absolute_notionals = positions.abs() * portfolio_new.prices
    long_positions = absolute_notionals[positions > 0].sum(axis=1)
    short_positions = absolute_notionals[positions < 0].sum(axis=1)
    cash = portfolio_new.nav - portfolio_new.equity.sum(axis=1)

    # Check for early exit conditions
    cash0 = cash.iloc[0]
    bust_times1 = navs[navs < 0.25 * cash0]
    bust_times2 = navs[long_positions + cash < short_positions]
    bust_time, bust_sort = determine_bust_time(bust_times1, bust_times2)
    if bust_time is not None:
        report_early_exit(bust_time, bust_sort)

    # Compute profits, freezing the portfolio after a bust
    profits = (portfolio.profit - short_costs - trading_costs).loc[:exit_date]
    if bust_time is not None:
        navs.loc[bust_time:] = navs.loc[bust_time]
        profits.loc[bust_time:] = 0
    returns = navs.pct_change().loc[:exit_date]

    # Calculate performance metrics
    mean = returns.mean() * 250
    stdev = returns.std() * np.sqrt(250)
    sharpe = returns.mean() / returns.std() * np.sqrt(250)
    min_nav = navs.min()
    mean_profit = profits.sum()
    min_cum_prof = profits.cumsum().min()
    drawdown = compute_drawdowns(navs).max()
    went_bust = bust_time is not None
    return mean, stdev, sharpe, mean_profit, min_nav, min_cum_prof, drawdown, went_bust


def determine_bust_time(bust_times1, bust_times2):
    bust_time1 = bust_times1.index[0] if len(bust_times1) > 0 else None
    bust_time2 = bust_times2.index[0] if len(bust_times2) > 0 else None
    if bust_time1 and bust_time2:
        bust_time = min(bust_time1, bust_time2)
        bust_sort = 1 if bust_time1 < bust_time2 else 2
    elif bust_time1:
        bust_time, bust_sort = bust_time1, 1
    elif bust_time2:
        bust_time, bust_sort = bust_time2, 2
    else:
        bust_time, bust_sort = None, None
    return bust_time, bust_sort


def report_early_exit(bust_time, bust_sort):
    # NAV and profits are frozen at bust_time by the caller; this helper
    # only reports why the early exit occurred.
    print(f"\nPortfolio exited early at {bust_time}")
    if bust_sort == 1:
        print("NAV fell below 0.25 * initial cash")
    elif bust_sort == 2:
        print("Long positions + cash fell below short positions")
```
### Performance Metrics
This section details the various performance and risk metrics employed to assess the effectiveness of the statistical arbitrage strategies.
#### Profit and Return Calculations
##### Trading Costs Computation
Trading costs, an inherent part of real-world trading, are factored into the performance evaluation. The `compute_trading_costs` function calculates these costs based on the volume of trades and the bid-ask spreads:
```python=
def compute_trading_costs(trades, spreads):
    volume = trades.abs()
    if isinstance(volume, pd.Series):
        assert isinstance(spreads, pd.Series), "spreads and trades must be of the same type"
        assert (trades.index == spreads.index).all(), "Index of trades and spreads must be the same"
        return 0.5 * (spreads * volume).sum()
    elif isinstance(volume, pd.DataFrame):
        assert isinstance(spreads, pd.DataFrame), "spreads and trades must be of the same type"
        assert (trades.index == spreads.index).all(), "Index of trades and spreads must be the same"
        assert (trades.columns == spreads.columns).all(), "Columns of trades and spreads must be the same"
        return 0.5 * (spreads * volume).sum(axis=1)
    else:
        raise ValueError("trades and spreads must be a pd.Series or pd.DataFrame")
```
* **Functionality:** Computes trading costs based on trade volume and spreads.
* **Flexibility:** Handles both individual asset trades (Series) and portfolio-level trades (DataFrame).
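As a quick sanity check of the cost formula (half the spread times traded volume, summed across assets), restated inline here so the snippet runs standalone on hypothetical trades:

```python
import numpy as np
import pandas as pd

# Hypothetical trades (in currency) and a flat 1 bps spread
trades = pd.DataFrame({"A": [100.0, -50.0], "B": [0.0, 200.0]})
spreads = pd.DataFrame({"A": [1e-4, 1e-4], "B": [1e-4, 1e-4]})

# Same formula as the DataFrame branch above: 0.5 * spread * |trade|
costs = 0.5 * (spreads * trades.abs()).sum(axis=1)
```

On day one only 100 of asset A trades, costing `0.5 * 1e-4 * 100 = 0.005`; on day two the 250 of total volume costs `0.0125`.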
##### `Metrics` Class
The `Metrics` class provides a structured way to calculate and store various profit-related metrics:
```python=
@dataclass
class Metrics:
    daily_profit: pd.Series
    entry_date: pd.Timestamp
    exit_trigger: pd.Timestamp
    exit_date: pd.Timestamp

    @property
    def mean_profit(self):
        """Mean daily profit"""
        return self.daily_profit.mean()

    @property
    def std_profit(self):
        """Standard deviation of daily profit"""
        return self.daily_profit.std()

    @property
    def total_profit(self):
        """Total cumulative profit"""
        return self.daily_profit.sum()

    @property
    def sr_profit(self):
        """Sharpe ratio of daily profit"""
        return self.mean_profit / self.std_profit

    @property
    def annualized_return(self):
        """Annualized return, assuming 250 trading days per year"""
        return self.mean_profit * 250

    @property
    def annualized_volatility(self):
        """Annualized volatility, assuming 250 trading days per year"""
        return self.std_profit * np.sqrt(250)

    @property
    def annualized_sharpe_ratio(self):
        """Annualized Sharpe ratio, assuming 250 trading days per year"""
        return self.sr_profit * np.sqrt(250)
```
* **Metrics Calculation:** Includes mean profit, standard deviation, total profit, Sharpe ratio, and their annualized counterparts.
* **Data Storage:** Stores daily profit, entry date, exit trigger date, and exit date for further analysis.
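A small sketch of the annualization arithmetic on a made-up alternating daily P&L:

```python
import numpy as np
import pandas as pd

# Hypothetical daily P&L alternating between 1 and 3 cents for 250 days
daily_profit = pd.Series([0.01, 0.03] * 125)

annualized_return = daily_profit.mean() * 250                              # mean * 250
annualized_volatility = daily_profit.std() * np.sqrt(250)                  # std * sqrt(250)
annualized_sharpe = daily_profit.mean() / daily_profit.std() * np.sqrt(250)
```

With a mean of 0.02 per day, the annualized return is exactly 5.0.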
#### Risk Measures
##### Drawdown Calculations
Drawdowns, representing the peak-to-trough decline in portfolio value, are a crucial risk metric. The provided functions calculate drawdowns from both NAVs and returns:
```python=
def compute_drawdowns(navs):
    """Computes drawdowns from a time series of NAVs"""
    max_nav = navs.cummax()
    drawdowns = -(navs - max_nav) / max_nav
    return drawdowns


def compute_drawdowns_from_returns(returns):
    """Computes drawdowns from a time series of returns"""
    navs = (1 + returns).cumprod()
    return compute_drawdowns(navs)


def maximum_drawdown(navs):
    """Computes the maximum drawdown from a time series of NAVs"""
    return compute_drawdowns(navs).max()
```
* `compute_drawdowns`: Calculates drawdowns from a time series of NAVs.
* `compute_drawdowns_from_returns`: Calculates drawdowns from a time series of returns.
* `maximum_drawdown`: Extracts the largest drawdown from a series of NAVs.
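A worked example on a hypothetical NAV path, with the drawdown logic restated inline so the snippet is self-contained:

```python
import numpy as np
import pandas as pd

navs = pd.Series([100.0, 120.0, 90.0, 130.0, 110.0])

# Running peak, then relative decline from it
max_nav = navs.cummax()                  # [100, 120, 120, 130, 130]
drawdowns = -(navs - max_nav) / max_nav
```

The worst point is the drop from 120 to 90, a drawdown of `(120 - 90) / 120 = 0.25`.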
##### Value at Risk (VaR) and Conditional Value at Risk (CVaR)
VaR and CVaR are additional risk measures that quantify potential losses:
```python=
def value_at_risk(returns, confidence_level=0.95):
    """Computes the Value at Risk (VaR) for a given confidence level"""
    return np.percentile(returns, 100 * (1 - confidence_level))


def conditional_value_at_risk(returns, confidence_level=0.95):
    """Computes the Conditional Value at Risk (CVaR) for a given confidence level"""
    var = value_at_risk(returns, confidence_level)
    return returns[returns <= var].mean()
```
* `value_at_risk`: Estimates the potential loss at a specified confidence level (default 95%).
* `conditional_value_at_risk`: Estimates the expected loss given that the loss exceeds the VaR.
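A numeric example on a hypothetical return sample, with the formulas restated inline:

```python
import numpy as np

# Ten hypothetical daily returns, sorted for readability
returns = np.array([-0.04, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06])

var_95 = np.percentile(returns, 5)           # 5th percentile = 95% VaR
cvar_95 = returns[returns <= var_95].mean()  # mean loss beyond the VaR
```

With linear interpolation, the 5th percentile lies between -0.04 and -0.02 at -0.031; the only return at or below it is -0.04, so the CVaR is -0.04.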
#### Performance Evaluation
The `compute_performance_metrics` function provides a comprehensive evaluation of a strategy's performance and risk:
```python=
def compute_performance_metrics(returns, navs):
    """Computes a comprehensive set of performance metrics"""
    metrics = {
        "Total Return": navs.iloc[-1] / navs.iloc[0] - 1,
        "Annualized Return": returns.mean() * 250,
        "Annualized Volatility": returns.std() * np.sqrt(250),
        "Sharpe Ratio": (returns.mean() / returns.std()) * np.sqrt(250),
        "Maximum Drawdown": maximum_drawdown(navs),
        "Value at Risk (95%)": value_at_risk(returns),
        "Conditional Value at Risk (95%)": conditional_value_at_risk(returns),
        "Positive Days Ratio": (returns > 0).mean(),
        "Negative Days Ratio": (returns < 0).mean(),
        "Best Day Return": returns.max(),
        "Worst Day Return": returns.min(),
    }
    return pd.Series(metrics)
```
* **Comprehensive Metrics:** Calculates a wide range of metrics, including total return, annualized return and volatility (using a 250-trading-day convention), the Sharpe ratio, maximum drawdown, VaR, CVaR, positive/negative day ratios, and best/worst day returns.
* **Holistic View:** Offers a complete picture of a strategy's profitability, risk profile, and distributional characteristics.
These performance and risk metrics, coupled with the backtesting and simulation framework, enable a thorough analysis and comparison of fixed-band and moving-band statistical arbitrage strategies, facilitating informed decision-making in strategy selection and implementation.
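The annualization conventions used in `compute_performance_metrics` can be illustrated on a small deterministic series (a self-contained sketch with toy numbers; here the NAV path starts from the first compounded return rather than from observed prices):

```python
import numpy as np
import pandas as pd

returns = pd.Series([0.01, -0.005, 0.02, 0.0])
navs = (1 + returns).cumprod()  # NAVs implied by compounding the returns

ann_return = returns.mean() * 250       # 250-trading-day convention
ann_vol = returns.std() * np.sqrt(250)
sharpe = ann_return / ann_vol           # same as (mean / std) * sqrt(250)
total_return = navs.iloc[-1] - 1        # compounded total return

print(ann_return, ann_vol, sharpe, total_return)
```

Note that the Sharpe ratio computed from the annualized quantities agrees with the `(mean / std) * sqrt(250)` form used in the function above.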
### Visualization Tools: `plot_stat_arb`
The `plot_stat_arb` function provides a comprehensive visualization of the performance and characteristics of a statistical arbitrage strategy. It generates three informative plots:
1. **Price Evolution:** This plot displays the price trajectory of the stat-arb strategy over time, both in-sample (training period) and out-of-sample (testing period). It also visualizes the midpoint (fixed or moving) and trading bounds, offering insights into the strategy's behavior and adherence to the price band constraints.
2. **Deviation from Midpoint:** This plot illustrates how the strategy's price deviates from its midpoint over time. It highlights the mean-reversion tendencies of the strategy and potential opportunities for profitable trades.
3. **Cumulative Profit:** This plot depicts the accumulated profit of the strategy over time, considering trading costs and short-selling costs. It provides a clear picture of the strategy's profitability and potential drawdowns.
```python=
def plot_stat_arb(stat_arb_tuple, insample_bound, outsample_bound, spreads, legend=True):
    stat_arb = stat_arb_tuple.stat_arb
    midpoint_memory = stat_arb.midpoint_memory
    asset_names = stat_arb.asset_names
    stocks = stat_arb.stocks

    prices_train = stat_arb_tuple.prices_train.iloc[:]
    prices_train = pd.concat([prices_train, stat_arb_tuple.prices_test.iloc[0:1]], axis=0)
    prices_test = stat_arb_tuple.prices_test
    prices_train_test = pd.concat([prices_train, prices_test], axis=0)

    p_train = prices_train[asset_names] @ stocks
    if stat_arb.moving_midpoint:
        mu_train = p_train.rolling(midpoint_memory).mean().dropna()
    else:
        mu_train = stat_arb.mu

    p_test = prices_test[asset_names] @ stocks
    if stat_arb.moving_midpoint:
        # Include the last midpoint_memory - 1 training days so the rolling
        # midpoint is defined from the first out-of-sample day onward
        prices_train_test = pd.concat([prices_train, prices_test], axis=0).iloc[-len(prices_test) - midpoint_memory + 1:]
        mu_test = (prices_train_test[asset_names] @ stocks).rolling(midpoint_memory).mean().dropna()
    else:
        mu_test = stat_arb.mu

    exit_trigger = stat_arb_tuple.metrics.exit_trigger
    exit_date = stat_arb_tuple.metrics.exit_date
    entry_date = stat_arb_tuple.metrics.entry_date
    elim_end = exit_date + pd.Timedelta(days=100)

    # Plot 1: Price evolution
    plt.figure(figsize=(12, 6))
    plt.plot(p_train, color="b", label="In-sample")
    if stat_arb.moving_midpoint:
        plt.plot(mu_train, color="g")
    plt.plot(p_test.loc[:elim_end], color="r", label="Out-of-sample")
    if stat_arb.moving_midpoint:
        plt.plot(mu_test.loc[:elim_end], color="g", label="Midpoint" + r" $(\mu_t)$")
    else:
        plt.plot(
            [prices_train.index[0], prices_test.index[-1]],
            [stat_arb.mu, stat_arb.mu],
            color="g",
            label="Midpoint" + r" $(\mu)$",
        )
    plt.axvline(prices_test.index[0], color="k", linewidth=2)

    stocks_str = "+".join([f"{np.round(s, 1)}×{a}" for s, a in zip(stocks, asset_names)])
    print("stat-arb: ", stocks_str)

    plt.gcf().autofmt_xdate()
    xlim_start = prices_train.index[21]
    if legend:
        plt.xlabel("Date")
        plt.ylabel("Stat-arb price")
    plt.xlim(xlim_start, elim_end)

    if exit_trigger is not None:
        plt.axvline(exit_trigger, linestyle="--", color="k", label="Exit period", linewidth=2)
        plt.axvline(exit_date, linestyle="--", color="k", linewidth=2)

    if stat_arb.moving_midpoint:
        band_label = r"$\mu_t\pm 1$"
        if outsample_bound != np.inf:  # use != for float comparison, not "is not"
            plt.plot(mu_test.loc[:elim_end] + outsample_bound, linestyle=":", color="k", linewidth=2, label=band_label)
            plt.plot(mu_test.loc[:elim_end] - outsample_bound, linestyle=":", color="k", linewidth=2)
        else:
            sigma = 1 / 4.2
            mu_train_test = pd.concat([mu_train, mu_test], axis=0)
            plt.plot(mu_train_test.loc[:elim_end] + 4.2 * sigma, linestyle=":", color="k", linewidth=2, label=band_label)
            plt.plot(mu_train_test.loc[:elim_end] - 4.2 * sigma, linestyle=":", color="k", linewidth=2)
    else:
        plt.plot([prices_train.index[0], prices_train.index[-1]], [stat_arb.mu + insample_bound, stat_arb.mu + insample_bound], linestyle=":", color="k", linewidth=2, label=r"$\mu\pm 1$")
        plt.plot([prices_train.index[0], prices_train.index[-1]], [stat_arb.mu - insample_bound, stat_arb.mu - insample_bound], linestyle=":", color="k", linewidth=2)
        plt.plot([prices_test.index[0], prices_test.index[-1]], [stat_arb.mu + insample_bound, stat_arb.mu + insample_bound], linestyle=":", color="k", linewidth=2)
        plt.plot([prices_test.index[0], prices_test.index[-1]], [stat_arb.mu - insample_bound, stat_arb.mu - insample_bound], linestyle=":", color="k", linewidth=2)

    if legend:
        plt.legend(bbox_to_anchor=(0.5, 1.225), loc="upper center", ncol=3, borderaxespad=0)
    plt.title("Stat-arb Price Evolution")
    plt.show()

    # Plot 2: Deviation from midpoint
    plt.figure(figsize=(12, 6))
    plt.plot(p_train - mu_train, color="b", label="In-sample")
    if stat_arb.moving_midpoint:
        plt.plot(p_test.loc[:elim_end] - mu_test.loc[:elim_end], color="r", label="Out-of-sample")
    else:
        plt.plot(p_test.loc[:elim_end] - mu_test, color="r", label="Out-of-sample")
    plt.axvline(prices_test.index[0], color="k", linewidth=2)
    plt.gcf().autofmt_xdate()
    plt.xlim(xlim_start, elim_end)
    plt.ylabel(r"$p_t-\mu_t$")
    plt.title("Deviation from Midpoint")
    plt.legend()
    plt.show()

    # Plot 3: Cumulative profit
    plt.figure(figsize=(12, 6))
    prices_train_test = pd.concat([prices_train, prices_test], axis=0)
    if stat_arb.moving_midpoint:
        mu = stat_arb.evaluate(prices_train_test).rolling(midpoint_memory, min_periods=1).mean()
    else:
        mu = stat_arb.mu
    positions = stat_arb.get_positions(prices_train_test, mu, T_max=1e6)
    initial_cash = 1e6
    aum = initial_cash + (prices_train_test.iloc[0] * positions).sum(axis=1)
    portfolio = Portfolio(prices_train_test, units=positions, aum=aum)
    profit = portfolio.profit
    times = portfolio.units.index
    assets = portfolio.units.columns
    trading_costs = compute_trading_costs(portfolio.trades_currency[assets], spreads[assets].loc[times])
    short_rate = 0.5 / 100 / 252  # half a percent per year
    short_costs = (portfolio.units[portfolio.units < 0].abs() * portfolio.prices).sum(axis=1) * short_rate
    profit = profit - short_costs - trading_costs

    profits_train = profit.loc[: prices_train.index[-1]]
    profits_test = profit.loc[prices_test.index[0] :]
    profits_test.loc[exit_date:] = 0

    plt.plot(profits_train.loc[xlim_start:].cumsum(), color="b", label="In-sample")
    plt.plot(profits_test.cumsum(), color="r", label="Out-of-sample")
    plt.axvline(prices_test.index[0], color="k", linewidth=2)
    plt.gcf().autofmt_xdate()
    plt.xlim(xlim_start, elim_end)
    if legend:
        plt.ylabel("Cumulative profit")
        plt.xlabel("Date")
    plt.title("Cumulative Profit")
    plt.legend()
    plt.show()
```
#### Parameters
* `stat_arb_tuple`: A namedtuple bundling the `StatArb` object, the training and testing prices, and the backtest metrics (entry and exit dates).
* `insample_bound`: The boundary for the price band in the in-sample period.
* `outsample_bound`: The boundary for the price band in the out-of-sample period.
* `spreads`: The bid-ask spreads for the assets involved in the strategy.
* `legend`: A boolean flag to control the display of the legend in the plots.
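For reference, the moving midpoint that `plot_stat_arb` draws is just a trailing mean of the stat-arb price $p_t = s^T P_t$. A minimal sketch on toy data (the asset names, weights, and 3-day memory below are hypothetical, for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical two-asset stat-arb: p_t = 0.3 * A_t - 0.2 * B_t
prices = pd.DataFrame({
    "A": [10.0, 11.0, 12.0, 11.0, 10.0],
    "B": [20.0, 19.0, 21.0, 22.0, 20.0],
})
stocks = np.array([0.3, -0.2])

p = prices @ stocks                # stat-arb price series
mu = p.rolling(3).mean().dropna()  # moving midpoint with memory 3

print(p.round(2).tolist())   # [-1.0, -0.5, -0.6, -1.1, -1.0]
print(mu.round(2).tolist())  # [-0.7, -0.73, -0.9]
```

The first `midpoint_memory - 1` observations have no midpoint, which is why the plotting code prepends trailing training days before taking the rolling mean out-of-sample.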
### Results and Comparative Analysis
This section provides a comprehensive analysis of the backtesting results, comparing the performance of fixed-band and moving-band statistical arbitrage strategies across various metrics and visualizations.
#### Fixed-Band Stat-arbs
##### Performance Summary
Key performance statistics for the fixed-band stat-arbs are presented, offering insights into the central tendency, dispersion, and overall distribution of metrics like annualized return, Sharpe ratio, total profit, and maximum drawdown.
```python=
results, portfolios = run_finding_backtest(
    prices_full,
    stocks_list,
    P_max=50,
    moving_midpoint=False,
    midpoint_memory=None,
    T_max=63,
)

all_traded_assets = []
for portfolio in portfolios:
    all_traded_assets += portfolio.units.columns.tolist()
all_traded_assets = list(set(all_traded_assets))

means = []
stdevs = []
sharpes = []
profits = []
min_navs = []
min_cum_profs = []
drawdowns = []
n_busts = 0
for i, portfolio in enumerate(portfolios):
    res = results[i]
    (
        mean,
        stdev,
        sharpe,
        mean_profit,
        min_nav,
        min_cum_prof,
        drawdown,
        went_bust,
    ) = simulate(res, portfolio, spreads[all_traded_assets], lev_fraction=0.5)
    means.append(mean)
    stdevs.append(stdev)
    sharpes.append(sharpe)
    profits.append(mean_profit)
    min_navs.append(min_nav)
    min_cum_profs.append(min_cum_prof)
    drawdowns.append(drawdown)
    n_busts += went_bust

fixed_band_metrics = pd.DataFrame({
    "means": means,
    "stdevs": stdevs,
    "sharpes": sharpes,
    "profits": profits,
    "min_navs": min_navs,
    "min_cum_profs": min_cum_profs,  # use the accumulated list, not the last loop value
    "drawdowns": drawdowns,
})
```
There are four busts:
```
Portfolio exited early at 2020-02-12 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2020-03-18 00:00:00
Long positions + cash fell below short position
Portfolio exited early at 2020-09-01 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2024-03-04 00:00:00
NAV fell below 0.5 * cash
```
##### Distribution of Key Metrics
Kernel density estimation plots visualize the distribution of crucial performance metrics, allowing for a visual assessment of the strategies' performance characteristics and potential risks.
```python=
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
sns.kdeplot(fixed_band_metrics['means'], ax=axes[0, 0])
axes[0, 0].set_title('Distribution of Annualized Returns')
axes[0, 0].set_xlabel('Annualized Return')
sns.kdeplot(fixed_band_metrics['sharpes'], ax=axes[0, 1])
axes[0, 1].set_title('Distribution of Sharpe Ratios')
axes[0, 1].set_xlabel('Sharpe Ratio')
sns.kdeplot(fixed_band_metrics['profits'], ax=axes[1, 0])
axes[1, 0].set_title('Distribution of Total Profits')
axes[1, 0].set_xlabel('Total Profit')
sns.kdeplot(fixed_band_metrics['drawdowns'], ax=axes[1, 1])
axes[1, 1].set_title('Distribution of Maximum Drawdowns')
axes[1, 1].set_xlabel('Maximum Drawdown')
plt.tight_layout()
plt.show()
```

##### Example Stat-arbs
The performance of fixed-band strategies at the 20th and 80th percentiles of annualized returns is illustrated, providing concrete examples of the range of outcomes observed.
```python=
quantile_low = 0.2
quantile_high = 0.8
quantiles = fixed_band_metrics['means'].quantile([quantile_low, quantile_high])
res_low = results[fixed_band_metrics['means'][(fixed_band_metrics['means'] - quantiles[quantile_low]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_low, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
res_high = results[fixed_band_metrics['means'][(fixed_band_metrics['means'] - quantiles[quantile_high]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_high, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
```
###### 20th percentile fixed-band stat-arb
```
stat-arb: 0.3×WMT + 0.1×QCOM + 0.1×GE + 0.1×PANW
```



###### 80th percentile fixed-band stat-arb
```
stat-arb: 0.1×MRK + 0.1×TXN - 0.1×MS - 0.1×UBER
```



#### Moving-Band Stat-arbs
The same analysis is replicated for moving-band stat-arbs, facilitating a direct comparison between the two approaches.
##### Performance Summary
```python=
# Store moving-band outputs under distinct names so the fixed-band
# results above remain available for the comparative analysis
results_moving, portfolios_moving = run_finding_backtest(
    prices_full,
    stocks_list,
    P_max=100,
    moving_midpoint=True,
    midpoint_memory=21,
    T_max=125,
)

all_traded_assets = []
for portfolio in portfolios_moving:
    all_traded_assets += portfolio.units.columns.tolist()
all_traded_assets = list(set(all_traded_assets))

means_moving = []
stdevs_moving = []
sharpes_moving = []
profits_moving = []
min_navs_moving = []
min_cum_profs_moving = []
drawdowns_moving = []
n_busts = 0
for i, portfolio in enumerate(portfolios_moving):
    res = results_moving[i]
    (
        mean,
        stdev,
        sharpe,
        mean_profit,
        min_nav,
        min_cum_prof,
        drawdown,
        went_bust,
    ) = simulate(res, portfolio, spreads[all_traded_assets], lev_fraction=0.5)
    means_moving.append(mean)
    stdevs_moving.append(stdev)
    sharpes_moving.append(sharpe)
    profits_moving.append(mean_profit)
    min_navs_moving.append(min_nav)
    min_cum_profs_moving.append(min_cum_prof)
    drawdowns_moving.append(drawdown)
    n_busts += went_bust

moving_band_metrics = pd.DataFrame({
    "means": means_moving,
    "stdevs": stdevs_moving,
    "sharpes": sharpes_moving,
    "profits": profits_moving,
    "min_navs": min_navs_moving,
    "min_cum_profs": min_cum_profs_moving,
    "drawdowns": drawdowns_moving,
})
```
There are three busts:
```
Portfolio exited early at 2020-03-18 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2023-09-11 00:00:00
NAV fell below 0.5 * cash
Portfolio exited early at 2024-06-03 00:00:00
NAV fell below 0.5 * cash
```
##### Distribution of Key Metrics
```python=
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
sns.kdeplot(moving_band_metrics['means'], ax=axes[0, 0])
axes[0, 0].set_title('Distribution of Annualized Returns (Moving-Band)')
axes[0, 0].set_xlabel('Annualized Return')
sns.kdeplot(moving_band_metrics['sharpes'], ax=axes[0, 1])
axes[0, 1].set_title('Distribution of Sharpe Ratios (Moving-Band)')
axes[0, 1].set_xlabel('Sharpe Ratio')
sns.kdeplot(moving_band_metrics['profits'], ax=axes[1, 0])
axes[1, 0].set_title('Distribution of Total Profits (Moving-Band)')
axes[1, 0].set_xlabel('Total Profit')
sns.kdeplot(moving_band_metrics['drawdowns'], ax=axes[1, 1])
axes[1, 1].set_title('Distribution of Maximum Drawdowns (Moving-Band)')
axes[1, 1].set_xlabel('Maximum Drawdown')
plt.tight_layout()
plt.show()
```

##### Example Stat-arbs
```python=
quantiles_moving = moving_band_metrics['means'].quantile([quantile_low, quantile_high])
res_low_moving = results_moving[moving_band_metrics['means'][(moving_band_metrics['means'] - quantiles_moving[quantile_low]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_low_moving, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
res_high_moving = results_moving[moving_band_metrics['means'][(moving_band_metrics['means'] - quantiles_moving[quantile_high]).abs().argsort()[:1]].index[0]]
plot_stat_arb(res_high_moving, insample_bound=1, outsample_bound=np.inf, spreads=spreads)
```
###### 20th percentile moving-band stat-arb
```
stat-arb: 0.1×MSFT - 0.1×WFC - 0.1×ISRG + 0.2×PLD + 0.1×ETN
```



###### 80th percentile moving-band stat-arb
```
stat-arb: 0.1×TXN - 0.1×PANW
```



#### Comparative Analysis
##### Performance Comparison
A tabular and visual comparison of key metrics between fixed-band and moving-band strategies is presented.
```python=
comparison_metrics = pd.DataFrame({
"Metric": ["Return", "Risk", "Sharpe", "Drawdown"],
"Fixed-Band": [
fixed_band_metrics['means'].mean(),
fixed_band_metrics['stdevs'].mean(),
fixed_band_metrics['sharpes'].mean(),
fixed_band_metrics['drawdowns'].mean()
],
"Moving-Band": [
moving_band_metrics['means'].mean(),
moving_band_metrics['stdevs'].mean(),
moving_band_metrics['sharpes'].mean(),
moving_band_metrics['drawdowns'].mean()
]
})
print(comparison_metrics)
comparison_metrics_melted = pd.melt(comparison_metrics, id_vars=['Metric'], var_name='Approach', value_name='Value')
plt.figure(figsize=(12, 6))
sns.barplot(x="Metric", y="Value", hue="Approach", data=comparison_metrics_melted)
plt.title("Comparison of Fixed-Band vs Moving-Band Stat-Arbs")
plt.ylabel("Value")
plt.xticks(rotation=45)
plt.legend(title="Approach")
plt.tight_layout()
plt.show()
```
###### Fixed-band stat-arbs metrics
| Metric | Mean | Median | 75th Percentile | 25th Percentile | Fraction Positive |
| :-------------- | :-------- | :-------- | :-------------- | :-------------- | :---------------- |
| Return | 0.01 | 0.13 | 0.31 | -0.18 | 0.62 |
| Risk | 0.36 | 0.28 | 0.43 | 0.15 | N/A |
| Sharpe | 0.48 | 0.62 | 1.62 | -0.59 | 0.62 |
| Drawdown | 0.18 | 0.13 | 0.24 | 0.06 | N/A |
###### Moving-band stat-arbs metrics
| Metric | Mean | Median | 75th Percentile | 25th Percentile | Fraction Positive |
| :-------------- | :-------- | :-------- | :-------------- | :-------------- | :---------------- |
| Return | <font color='#33FF44'>0.11 (+0.10)</font> | 0.13 | <font color='#D41D1A'>0.21 (-0.10)</font> | <font color='#33FF44'>-0.08 (+0.17)</font> | <font color='#33FF44'>0.75 (+0.13)</font> |
| Risk | <font color='#D41D1A'>0.20 (-0.16)</font> | <font color='#D41D1A'>0.14 (-0.14)</font> | <font color='#D41D1A'>0.22 (-0.21)</font> | <font color='#D41D1A'>0.10 (-0.05)</font> | N/A |
| Sharpe | <font color='#33FF44'>0.60 (+0.12)</font> | <font color='#D41D1A'>0.61 (-0.01)</font> | <font color='#D41D1A'>1.35 (-0.27)</font> | <font color='#33FF44'>-0.04 (+0.55)</font> | <font color='#33FF44'>0.75 (+0.13)</font> |
| Drawdown | <font color='#D41D1A'>0.12 (-0.06)</font> | <font color='#D41D1A'>0.13 (-0.04)</font> | <font color='#D41D1A'>0.14 (-0.14)</font> | 0.06 | N/A |

##### Statistical Significance
T-tests are performed to assess the statistical significance of the observed differences in performance metrics between the two approaches.
```python=
from scipy import stats
def perform_ttest(metric_name):
    fixed_band_data = fixed_band_metrics[metric_name]
    moving_band_data = moving_band_metrics[metric_name]
    t_stat, p_value = stats.ttest_ind(fixed_band_data, moving_band_data)
    return f"{metric_name}: t-statistic = {t_stat:.4f}, p-value = {p_value:.4f}"
print(perform_ttest('means'))
print(perform_ttest('sharpes'))
print(perform_ttest('drawdowns'))
```
```
means: t-statistic = -1.1498, p-value = 0.2505
sharpes: t-statistic = -0.6276, p-value = 0.5304
drawdowns: t-statistic = 5.9960, p-value = 0.0000
```
Only the difference in maximum drawdowns is statistically significant; the differences in mean returns and Sharpe ratios cannot be distinguished from noise at conventional levels.
##### Active Strategies Over Time
The number of active strategies over time is plotted for both fixed-band and moving-band approaches, highlighting potential differences in strategy longevity and adaptability to changing market conditions.
```python=
def count_active_strategies(results, date):
    return sum(1 for r in results if r.metrics.entry_date <= date <= r.metrics.exit_date)

dates = pd.date_range(
    start=min(r.metrics.entry_date for r in results),
    end=max(r.metrics.exit_date for r in results),
    freq='D',
)
active_fixed = [count_active_strategies(results, date) for date in dates]
active_moving = [count_active_strategies(results_moving, date) for date in dates]
plt.figure(figsize=(12, 6))
plt.plot(dates, active_fixed, label='Fixed-Band')
plt.plot(dates, active_moving, label='Moving-Band')
plt.title('Number of Active Strategies Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Active Strategies')
plt.legend()
plt.tight_layout()
plt.show()
```

##### Asset Composition Analysis
The composition of assets within the strategies is analyzed, revealing insights into the diversity of asset selection and potential concentration risks.
```python=
def analyze_composition(results):
    all_assets = set()
    asset_counts = {}
    for r in results:
        strategy_assets = set(r.stat_arb.assets.keys())
        all_assets.update(strategy_assets)
        for asset in strategy_assets:
            asset_counts[asset] = asset_counts.get(asset, 0) + 1
    return all_assets, asset_counts
fixed_assets, fixed_counts = analyze_composition(results)
moving_assets, moving_counts = analyze_composition(results_moving)
print(f"Number of unique assets in Fixed-Band strategies: {len(fixed_assets)}")
print(f"Number of unique assets in Moving-Band strategies: {len(moving_assets)}")
plt.figure(figsize=(24, 12))
plt.bar(fixed_counts.keys(), fixed_counts.values(), alpha=0.5, label='Fixed-Band')
plt.bar(moving_counts.keys(), moving_counts.values(), alpha=0.5, label='Moving-Band')
plt.title('Asset Frequency in Strategies')
plt.xlabel('Asset')
plt.ylabel('Frequency')
plt.legend()
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
```
```
Number of unique assets in Fixed-Band strategies: 93
Number of unique assets in Moving-Band strategies: 96
```

This comprehensive results analysis provides a multifaceted view of the performance and characteristics of fixed-band and moving-band statistical arbitrage strategies. It facilitates a deeper understanding of their strengths and weaknesses, enabling informed decision-making in the context of practical trading and risk management.
## Reference
* K. Johansson, T. Schmelzer, and S. Boyd, Finding Moving-Band Statistical Arbitrages via Convex-Concave Optimization
* K. Johansson, T. Schmelzer, and S. Boyd, A Markowitz Approach to Managing a Dynamic Basket of Moving-Band Statistical Arbitrages
* [cvxstatarb](https://github.com/cvxgrp/cvxstatarb/tree/main)