# Stratified Models for Portfolio Construction
## Introduction
Stratified models offer a robust machine learning technique to improve prediction accuracy by customizing models to specific subgroups or strata within your data. By segmenting your data based on categorical features (e.g., age, location, customer type), you can create specialized models that capture the unique patterns and relationships within each group.
### Key Concepts
* **Stratified Models:** Models that adapt their behavior based on a chosen stratification feature, while maintaining a simple (often linear) relationship with other features. This allows for capturing complex interactions with the stratification feature while keeping the model interpretable.
* **Example:** A stratified model for predicting customer behavior could have separate models for different age groups, recognizing that preferences and purchasing habits can vary with age. Each age-specific model would learn patterns unique to its group.
### Hybrid Nature
Stratified models combine aspects of both complex and simple models:
* **Complex Dependence on Stratification Feature:** Each model within the stratified model can capture nuanced, non-linear relationships specific to its assigned group. This allows for a deep understanding of how the chosen feature influences the outcome.
* **Simple Dependence on Other Features:** The relationship between the model and other features (e.g., income, education level) is often linear, making it easier to interpret how changes in these features affect the prediction within a given stratum.
## Stratified Models: A Technical Deep Dive
Stratified models are a powerful technique in machine learning that leverages customization to enhance predictive capabilities. The core idea is to fit a base model, but with distinct parameter values tailored to each unique value or level of a selected categorical feature, often termed the stratification feature $z$. This approach enables the model to adapt and fine-tune its predictions for specific subgroups within your dataset.
### Structure of the Data
In a stratified modeling framework, each data point is represented by a triplet:
* **$z$ (Stratification Feature):** This is a categorical variable that defines the subgroups or strata. Common examples include demographic categories (e.g., age group, gender), geographic locations, or customer segments.
* **$x$ (Other Features):** These can be either numerical or categorical variables that provide additional information for predicting the outcome.
* **$y$ (Outcome):** This is the variable you are trying to predict, and it can be continuous (for regression tasks) or categorical (for classification tasks).
### The Base Model and Its Parameters
The foundation of a stratified model is the base model. This model encapsulates the relationship between the features $x$ and the outcome $y$. Crucially, in a stratified model, we allow the parameters of this base model (denoted by the vector $\theta$) to vary across different strata.
For instance, if $z$ represents age groups, we would have different parameter values $\theta$ for young adults, middle-aged individuals, and seniors. This customization empowers the model to learn distinct patterns and relationships that are specific to each age group.
#### Local Loss and Regularization
The process of fitting a stratified model revolves around minimizing a *regularized empirical loss*.
* **Local Loss $\ell$:** For each stratum $k$, the local loss $\ell_k(\theta_k)$ quantifies how well the model with parameters $\theta_k$ performs on the data points belonging to that stratum. It is the sum of the individual loss function $l$ evaluated at each data point within stratum $k$: $$\ell_k(\theta_k) = \sum_{i:z_i=k} l(\theta_k, x_i, y_i)$$
* **Regularization:** To prevent overfitting, a regularization term $r$ is added to the loss function. This term penalizes overly complex models, thereby enhancing their ability to generalize to unseen data.
* **Laplacian Regularization $\mathcal{L}$:** This specific type of regularization encourages the parameter values across different strata to be similar or "smooth": $$\mathcal{L}(\theta) = \mathcal{L}(\theta_1, ..., \theta_K) = \frac{1}{2} \sum_{i=1}^K \sum_{j<i} W_{ij} || \theta_i -\theta_j ||_2^2 = \frac12 \text{Tr}(\theta L \theta^T)$$ The degree of similarity is controlled by a graph structure (the regularization graph), where the nodes represent strata and the edges indicate relationships between them. The Laplacian matrix $L$, derived from this graph, plays a central role in calculating the regularization term: $$L_{ij} = \begin{cases} -W_{ij} \ & i\ne j\\ \sum_{k=1}^K W_{ik}\ & i=j \end{cases}$$
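As a concrete check of this identity, here is a minimal NumPy sketch (the function names are ours, not from the referenced papers) that builds $L$ from a weight matrix $W$ and evaluates $\mathcal{L}(\theta)$ both as the pairwise sum and as the trace form:

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W for a symmetric, nonnegative weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_reg(Theta, W):
    """Laplacian regularization; column k of Theta is theta_k."""
    K = W.shape[0]
    pairwise = 0.5 * sum(W[i, j] * np.sum((Theta[:, i] - Theta[:, j]) ** 2)
                         for i in range(K) for j in range(i))
    trace_form = 0.5 * np.trace(Theta @ graph_laplacian(W) @ Theta.T)
    assert np.isclose(pairwise, trace_form)  # the two forms agree
    return trace_form
```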
#### Optimization: Finding the Best Parameters
The optimal parameters for the stratified model are determined by minimizing the overall objective function:
$$
F(\theta_1, \dots, \theta_K) = \sum_{k=1}^K (\ell_k(\theta_k) + r(\theta_k)) + \mathcal{L}(\theta_1, \dots, \theta_K)
$$
Here, $r$ represents an additional local regularization function that can be applied at the stratum level. By minimizing this objective, we strike a balance between fitting the data well (minimizing local losses) and maintaining smoothness across strata (through Laplacian regularization).
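To make this objective concrete, here is a small CVXPY sketch (the toy data, weights, and regularization strengths are illustrative assumptions) that fits a stratified mean model, i.e., the regression-without-$x$ case with square loss and a chain regularization graph:

```python
import cvxpy as cp
import numpy as np

K = 5
rng = np.random.default_rng(0)
y = [rng.normal(loc=k, size=20) for k in range(K)]  # toy data, one array per stratum
W = 10.0 * (np.diag(np.ones(K - 1), 1) + np.diag(np.ones(K - 1), -1))  # chain graph

theta = cp.Variable(K)
local = sum(cp.sum_squares(theta[k] - y[k]) for k in range(K))   # sum_k ell_k(theta_k)
local_reg = 0.1 * cp.sum_squares(theta)                          # sum_k r(theta_k)
laplacian = 0.5 * sum(W[i, j] * cp.square(theta[i] - theta[j])
                      for i in range(K) for j in range(i))       # L(theta_1, ..., theta_K)
cp.Problem(cp.Minimize(local + local_reg + laplacian)).solve()
print(theta.value)  # per-stratum estimates, pulled toward their graph neighbors
```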
#### Convexity and Hyperparameter Tuning
Under the assumption that the loss $l$ and the local regularization $r$ are convex, the overall objective is convex (the Laplacian term is a convex quadratic), so a global minimum can be found efficiently and reliably. Hyperparameters, which control model complexity and the strength of regularization, are tuned using methods like grid search with cross-validation to ensure good out-of-sample performance.
### Data Models: Adapting to Different Tasks
The choice of data model within the stratified framework depends on the specific prediction task you're tackling and the nature of your outcome variable $y$. Different models have corresponding loss functions and predictors tailored to specific scenarios, such as regression, classification, or distributional modeling.
Here's a summary of the different data models discussed in the context of stratified models:
| Model Name | Loss Function | Predictor |
|------------|---------------|-----------|
| Regression | $l(\theta,x,y)=p(x^T\theta-y)$ | $f_z(x)=x^T\theta_z$ |
| Boolean Classification | $l(\theta,x,y)=p(yx^T\theta)$ | $f_z(x)=\text{sign}(x^T\theta_z)$ |
| Multi-Class Classification | $l(\theta,x,y)=\sum_{i:i\neq y}((x^T\theta)_i-(x^T\theta)_y+1)_+$ or $l(\theta,x,y)=\log(\sum_{j=1}^M\exp(x^T\theta)_j)-(x^T\theta)_y$ | $f_z(x)=\arg\max_i(x^T\theta_z)_i$ |
| Regression (no $x$) | $l(\theta,y)=p(\theta-y)$ | $\theta_z$ |
| Boolean Classification (no $x$) | $l(\theta,y)=p(y\theta)$ | $\text{sign}(\theta_z)$ |
| Logistic Regression | $l(\theta,x,y)=\log(1+\exp(-y\theta^Tx))$ | $\text{Prob}(y=1\mid x)=\frac{1}{1+\exp(-\theta^Tx)}$ |
| Multinomial Logistic Regression | $l(\theta,x,y=i)=\log(\sum_{j=1}^M\exp(x^T\theta)_j)-(x^T\theta)_i$ | $\text{Prob}(y=i\mid x)=\frac{\exp((x^T\theta)_i)}{\sum_{j=1}^M\exp((x^T\theta)_j)}$ |
| Exponential Regression | $l(\theta,x,y)=-x^T\theta+\exp(x^T\theta)y$ | $\text{Prob}(y\mid x)=\exp(x^T\theta)\exp(-y\exp(x^T\theta))$ |
| Gaussian Distribution | $l(\theta,y)=-\log \det S+y^TSy-2y^T\Gamma+\Gamma^TS^{-1}\Gamma$ where $\theta=(S,\Gamma)=(\Sigma^{-1},\Sigma^{-1}\mu)$ | $p(y\mid \theta)=\frac{1}{(2\pi)^{n/2}(\det \Sigma)^{1/2}}\exp(-\frac{1}{2}(y-\mu)^T\Sigma^{-1}(y-\mu))$ |
| Bernoulli Distribution | $l(\theta,y)=-(1^Ty)\log(\theta)-(n-1^Ty)\log(1-\theta)$ | $p(y\mid \theta)=\theta^y(1-\theta)^{1-y}$ |
| Poisson Distribution | $l(\theta,y=k)=-k\log \theta+\theta$ | $p(y=k\mid \theta)=\frac{\theta^k\exp(-\theta)}{k!}$ |
| Non-Parametric Discrete Distribution | $l(\theta,y=k)=-\log(\theta_k)$ | $p(y=k\mid \theta)=\theta_k$ |
In these formulas:
* $\theta$ represents the model parameters.
* $x$ represents the features.
* $y$ represents the outcome.
* $z$ represents the stratification feature.
* $p(u)$ is a penalty function in regression and classification models.
* $M$ is the number of classes in multi-class classification.
* $n$ is the dimension of the outcome in the Bernoulli distribution.
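As a concrete illustration of how the table entries translate to code, here is a minimal NumPy sketch (the helper names are ours) of two of the losses above, the Poisson negative log-likelihood and the multinomial logistic loss:

```python
import numpy as np

def poisson_loss(theta, k):
    """-k log(theta) + theta: Poisson NLL at y = k, dropping the constant log(k!)."""
    return -k * np.log(theta) + theta

def multinomial_logit_loss(Theta, x, y):
    """log sum_j exp((x^T Theta)_j) - (x^T Theta)_y, for class index y."""
    scores = x @ Theta                      # Theta is (n, M), x is (n,)
    return np.log(np.exp(scores).sum()) - scores[y]

# Tiny usage example with illustrative numbers:
print(poisson_loss(theta=2.0, k=3))
print(multinomial_logit_loss(np.eye(3), x=np.array([1.0, 0.0, 0.0]), y=0))
```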
## Portfolio Construction: Leveraging Stratified Models for Adaptive Trading
This section details a sophisticated method for constructing a dynamic trading policy that responds to observable market conditions and current portfolio holdings. The core of this approach lies in developing models for asset return mean and covariance that are explicitly conditioned on the prevailing market regime. These models then drive a trading policy that intelligently adjusts portfolio weights based on the current market environment.
### The Adaptive Trading Policy
The trading policy, denoted by $T$, is a function that determines the optimal portfolio weights $w_t$ at each time point $t$. It takes into account two crucial inputs:
1. **$z_t$ (Market Conditions):** This captures the observable state of the market at time $t$, which could be derived from various indicators like volatility, inflation, and interest rates.
2. **$w_{t-1}$ (Previous Weights):** This represents the portfolio allocation from the previous time period, providing a context for the current decision.
Mathematically, the trading policy can be expressed as:
$$
T: \{1,\cdots,K\}\times \mathbb{R}^n \to \mathbb{R}^n
$$
where:
* $w_t = T(z_t, w_{t-1})$, indicating that the current weights are a function of both the market conditions and the previous weights.
* The policy $T$ itself is based on a refined version of the classic Markowitz portfolio optimization framework, incorporating a Laplacian regularized stratified model for the asset return mean and covariance.
### The Laplacian Regularized Stratified Model: A Deeper Look
This model consists of several key components:
* **Stratified Model:** This model is fit by minimizing the following objective function: $$\mathop{\text{minimize}}_{\theta_1, \dots,\theta_K} \sum_{k=1}^K (\ell_k(\theta_k) + r(\theta_k)) + \mathcal{L}(\theta_1, \dots, \theta_K)$$ where $\ell_k$ represents the local loss for stratum $k$, $r$ is a local regularization function, and $\mathcal{L}$ is the Laplacian regularization term.
* **Gaussian Assumption:** The model assumes that asset returns, given the market condition $z$, follow a Gaussian distribution with mean $\mu_z$ and covariance matrix $\Sigma_z$.
* **Parameter Estimation:** For each market condition, the model estimates the mean return vector $\mu$ and covariance matrix $\Sigma$. This estimation is typically achieved by minimizing the negative log-likelihood of the observed data under the Gaussian assumption.
* **Laplacian Regularization $\mathcal{L}$:** This regularization technique encourages parameter estimates for similar market conditions to be close to each other, effectively borrowing strength from neighboring conditions and enhancing the model's ability to handle unseen market scenarios.
* **Hyperparameter Tuning:** The performance of the model is sensitive to various hyperparameters, such as the strength of the regularization. These are carefully tuned using a validation dataset to optimize model performance.
### Defining Stratified Market Conditions
Each data point in the dataset is associated with a market condition $z$, which is known at the close of the previous trading day. This market condition is derived from three key real-valued market indicators:
1. **Market Implied Volatility:** Measured using a 15-day moving average of the CBOE Volatility Index (VIX), lagged by one day.
2. **Inflation Rate:** Calculated as the percentage change in the Consumer Price Index (CPI).
3. **30-Year US Mortgage Rates:** Measured as the 8-week rolling percent change in the 30-year US mortgage rate.
#### Discretization and Regularization Graph
To handle the vast number of possible market conditions, each of these indicators is discretized into deciles, so the stratification feature takes one of $10 \times 10 \times 10 = 1000$ possible values. Given that many of these combinations may not be present in the training data, Laplacian regularization becomes crucial for borrowing information from similar conditions.
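A pandas sketch of this discretization step (the column names and random data are illustrative assumptions, not the chapter's actual series) might look like:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical daily indicator values; column names are illustrative.
df = pd.DataFrame({
    "vix_ma15": rng.random(500),
    "cpi_change": rng.random(500),
    "mortgage_change": rng.random(500),
})
deciles = df.apply(lambda s: pd.qcut(s, 10, labels=False))  # decile codes 0..9
# Encode the decile triple as a single stratum index z in {0, ..., 999}.
df["z"] = (100 * deciles["vix_ma15"]
           + 10 * deciles["cpi_change"]
           + deciles["mortgage_change"])
```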
The relationship between different market conditions is defined by a *regularization graph*. Each node in this graph represents a unique market condition, and edges connect nodes considered similar. The weights of these edges, which are hyperparameters, quantify the strength of the relationship between connected nodes.
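One natural construction, sketched below under our own naming, is the Cartesian product of three 10-node chain graphs (one per indicator), whose Laplacian is the Kronecker sum of the chain Laplacians; the per-indicator edge weights play the role of the hyperparameters mentioned above, and the values here are illustrative:

```python
import numpy as np

def chain_laplacian(m):
    """Laplacian of a path (chain) graph on m nodes."""
    W = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
    return np.diag(W.sum(axis=1)) - W

m = 10                                   # deciles per indicator
I, Lc = np.eye(m), chain_laplacian(m)
w_vix, w_cpi, w_mort = 1.0, 1.0, 1.0     # edge-weight hyperparameters (illustrative)
L = (w_vix * np.kron(np.kron(Lc, I), I)
     + w_cpi * np.kron(np.kron(I, Lc), I)
     + w_mort * np.kron(np.kron(I, I), Lc))   # (1000, 1000) graph Laplacian
```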
### Stratified Models for Return and Risk
* **Stratified Return Model:** This model estimates a return vector for each of the 1000 market conditions, using a Huber loss function for robustness and an L2 regularization term for local smoothing:
$$
\begin{split}
\ell_k(\mu_k) &= \sum_{t:z_t=k} \mathbf{1}^T H(\mu_k - y_t) \\
H(u) &= \begin{cases}u^2, &\ |u|\le M \\ 2M|u| - M^2, &\ |u| > M\end{cases} \\
r(\mu_k) &= \gamma_{ret,loc}\|\mu_k\|^2_2
\end{split}
$$
where $H$ is applied elementwise and
* $M > 0$ is the half-width, fixed at $M = 0.01$.
* $\gamma_{ret,loc}$ is a hyperparameter.
* **Stratified Risk Model:** This model estimates the covariance matrix for each market condition, using the Gaussian negative log-likelihood loss in the natural parameter $\theta_k = \Sigma_k^{-1}$: $$\ell_k (\theta_k) = \text{Tr}(S_k \Sigma_k^{-1}) - \log \det(\Sigma_k^{-1})$$ where $S_k = \frac{1}{n_k} \sum_{t:z_t=k}y_t y_t^T$ is the empirical covariance matrix of the data points with $z=k$. Both local objectives are sketched in code below.
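The following CVXPY sketch (our own function names; data shapes and the hyperparameter value are assumptions) writes down the local objective for one stratum of each model. Note that `cp.huber(u, M)` implements exactly the Huber function $H$ above:

```python
import cvxpy as cp
import numpy as np

def return_local_objective(Y_k, M=0.01, gamma_ret_loc=0.1):
    """Huber return loss plus local L2 term; rows of Y_k are daily return vectors.
    gamma_ret_loc is a hyperparameter (the value here is illustrative)."""
    mu_k = cp.Variable(Y_k.shape[1])
    loss = sum(cp.sum(cp.huber(mu_k - y_t, M)) for y_t in Y_k)
    return mu_k, loss + gamma_ret_loc * cp.sum_squares(mu_k)

def risk_local_objective(Y_k):
    """Gaussian NLL in the natural parameter Theta_k = Sigma_k^{-1}."""
    n = Y_k.shape[1]
    S_k = Y_k.T @ Y_k / len(Y_k)              # empirical covariance for the stratum
    Theta_k = cp.Variable((n, n), PSD=True)
    return Theta_k, cp.trace(S_k @ Theta_k) - cp.log_det(Theta_k)
```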
### Optimizing the Trading Policy
The trading policy leverages the stratified return and risk models to construct a portfolio. At the start of each day, the previous day's market conditions are used to determine the current portfolio weights by solving a Markowitz-inspired optimization problem. This optimization aims to maximize the portfolio's expected return while considering constraints such as risk limits, leverage limits, and position limits:
$$
\begin{split}
\text{maximize} &\quad \mu_{z_t}^T w - \gamma_{sc} \kappa^T (w)_- - \gamma_{tc} \tau_t^T |w - w_{t-1}| \\
\text{subject to} &\quad w^T \Sigma_{z_t} w \leq \sigma_{tar}^2, \quad \mathbf{1}^T w = 1, \\
&\quad \|w\|_1 \leq L_{max}, \quad w_{min} \leq w \leq w_{max}
\end{split}
$$
Here $(w)_-$ denotes the elementwise negative part of $w$, $\kappa$ and $\tau_t$ are shorting-cost and transaction-cost vectors, and $\gamma_{sc}$ and $\gamma_{tc}$ are hyperparameters controlling shorting aversion and turnover aversion.
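A CVXPY sketch of this policy (the function and parameter names are ours; all inputs come from the fitted stratified models and the chosen constraint settings) could read:

```python
import cvxpy as cp

def trading_policy(mu_z, Sigma_z, w_prev, kappa, tau_t,
                   gamma_sc, gamma_tc, sigma_tar, L_max, w_min, w_max):
    """Solve the Markowitz-style problem above for the day's weights."""
    w = cp.Variable(len(mu_z))
    objective = (mu_z @ w
                 - gamma_sc * (kappa @ cp.neg(w))            # shorting cost
                 - gamma_tc * (tau_t @ cp.abs(w - w_prev)))  # transaction cost
    constraints = [cp.quad_form(w, Sigma_z) <= sigma_tar ** 2,  # risk limit
                   cp.sum(w) == 1,                              # fully invested
                   cp.norm(w, 1) <= L_max,                      # leverage limit
                   w_min <= w, w <= w_max]                      # position limits
    cp.Problem(cp.Maximize(objective), constraints).solve()
    return w.value
```

In a backtest loop, `mu_z` and `Sigma_z` would be looked up from the stratified return and risk models using the market condition $z_t$ observed at the previous close, and `w_prev` is the prior day's allocation.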
### Backtests
The trading policy's hyperparameters, controlling shorting aversion and turnover, are fine-tuned using backtests on the training data. The final policy is then evaluated on a separate test dataset to assess its performance in a realistic setting.
## Reference
* J. Tuck, S. Barratt, and S. Boyd, Portfolio Construction Using Stratified Models, Chapter 17 in Machine Learning in Financial Markets: A Guide to Contemporary Practice, A. Capponi and C.-A. Lehalle, editors, Cambridge University Press, pages 317–339, 2023.
* J. Tuck, S. Barratt, and S. Boyd, A Distributed Method for Fitting Laplacian Regularized Stratified Models, Journal of Machine Learning Research, 22:1–37, 2021.
* [lrsm_portfolio](https://github.com/cvxgrp/lrsm_portfolio)