# Research Proposal: Modern Machine Learning for Adaptive Quantitative Trading
Financial markets constantly shift between different states, or regimes, characterized by distinct patterns of volatility, momentum, and correlation. A trading strategy that performs well in one regime may fail in another. While traditional methods like *Hidden Markov Models (HMMs)* have been used to identify these regimes, they often oversimplify the complex, non-linear dependencies in financial data.
The core problem is twofold:
1. **Regime Identification:** How can we automatically identify the underlying, latent market regimes from a wide range of market features $\mathbf{x}_t$ and the performance of multiple trading strategies (or risky assets) $\mathbf{y}_t$?
2. **Portfolio Optimization:** Once we understand the market regime, how can we use this information to construct a robust, multi-step portfolio that maximizes risk-adjusted returns?
We propose a novel, three-part framework to address these challenges:
1. A **Self-Supervised Regime Detection** model to identify market states.
2. A **Generative Model** to forecast the full PnL distribution conditioned on the identified regimes.
3. A **Model Predictive Control (MPC)** system to optimize portfolio construction based on these forecasts.
## Proposed Solutions
### Self-Supervised Regime Detection
We formulate regime detection as a self-supervised learning task. The goal is to train a model, a function $f$, that maps market features $\mathbf{x}_t \in \mathbb{R}^n$ to a soft assignment over $K$ potential regimes, i.e., a point in the probability simplex, $f(\mathbf{x}_t) \in \Delta_K$.
The model minimizes the within-cluster variance of trading strategy performance over a class $\mathcal{D}$ of candidate assignment functions:
$$
\underset{f \in \mathcal{D}}{\arg\min} \ \sum_{t} \langle f(\mathbf{x}_t), g_f(\mathbf{y}_t) \rangle
$$
where $g_f(\mathbf{y})$ is the vector of squared Euclidean distances from the PnL vector $\mathbf{y}$ to the regime means:
$$
g_f(\mathbf{y}) = \left( \| \mathbf{y} - \pmb{\mu}^f_1 \|^2_2, \dots, \| \mathbf{y} - \pmb{\mu}^f_K \|^2_2 \right)^\top \in \mathbb{R}^K
$$
and the regime means $\pmb{\mu}^f_i$ are dynamically defined by the function $f$ itself:
$$
\pmb{\mu}^f_i = \frac{\sum_{t \in T} f_i(\mathbf{x}_t) \mathbf{y}_t}{\sum_{t \in T} f_i(\mathbf{x}_t)}
$$
This approach generalizes the **K-means clustering algorithm**: when $f(\mathbf{x}_t)$ is restricted to hard (one-hot) assignments, $\langle f(\mathbf{x}_t), g_f(\mathbf{y}_t) \rangle$ reduces to the squared distance from $\mathbf{y}_t$ to its assigned regime mean, recovering the classical within-cluster sum of squares. Allowing soft, continuous assignments yields a more flexible regime model that directly targets trading strategy performance. When $f$ belongs to a parameterized family (e.g., a neural network), the objective can be minimized with gradient-based optimization in frameworks such as `PyTorch` or `JAX`. A key challenge will be to extend the concept of a single "regime mean" to a full "regime distribution."
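As a concrete illustration, the following is a minimal PyTorch sketch of this objective, assuming a small softmax-headed MLP for $f$; the module name `RegimeNet`, the layer sizes, and the synthetic data are illustrative assumptions rather than the proposed final design.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed architecture): an MLP mapping market features
# x_t in R^n to a soft regime assignment f(x_t) on the K-simplex.
class RegimeNet(nn.Module):
    def __init__(self, n_features: int, n_regimes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_regimes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(x), dim=-1)  # each row lies on the K-simplex


def regime_loss(assign: torch.Tensor, pnl: torch.Tensor) -> torch.Tensor:
    """Soft within-cluster variance: sum_t <f(x_t), g_f(y_t)>.

    assign: (T, K) soft assignments f(x_t); pnl: (T, M) strategy PnL vectors y_t.
    """
    # Regime means mu_k = sum_t f_k(x_t) y_t / sum_t f_k(x_t)   -> (K, M)
    weights = assign / assign.sum(dim=0, keepdim=True).clamp_min(1e-8)
    mu = weights.t() @ pnl
    # Squared distances g_f(y_t)_k = ||y_t - mu_k||_2^2          -> (T, K)
    dists = ((pnl.unsqueeze(1) - mu.unsqueeze(0)) ** 2).sum(dim=-1)
    return (assign * dists).sum()


# Example usage on synthetic data (shapes only; not real market data).
T, n, M, K = 256, 10, 5, 3
x, y = torch.randn(T, n), torch.randn(T, M)
model = RegimeNet(n, K)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = regime_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
```

In practice the regime means would likely be estimated from the full training window (or a large running buffer) rather than a single mini-batch, and the open question of replacing the mean $\pmb{\mu}^f_i$ with a full regime distribution remains.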
### Generative Models for Strategy PnL
Instead of forecasting a single point value, this component will model the entire **probability distribution** of PnL, conditioned on a low-dimensional latent variable $\mathbf{z}_t$ produced by an encoder. The framework consists of two components (a minimal skeleton follows this list):
* An **Encoder** that distills high-dimensional features into a low-dimensional latent representation $\mathbf{z}_t$.
* A **Generator** that takes $\mathbf{z}_t$ as input and produces samples from the PnL distribution.
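A minimal skeleton of this encoder/generator split is sketched below, assuming an MLP encoder and a conditional Gaussian sampler as a placeholder for the eventual generator; all module names, shapes, and layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Minimal skeleton (assumed names and shapes). The Gaussian sampler is only a
# placeholder for a diffusion or flow-matching generator.
class Encoder(nn.Module):
    def __init__(self, n_features: int, latent_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, latent_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # latent representation z_t


class Generator(nn.Module):
    """Maps z_t to a conditional PnL distribution and draws samples from it."""
    def __init__(self, latent_dim: int, n_strategies: int):
        super().__init__()
        self.mean = nn.Linear(latent_dim, n_strategies)
        self.log_std = nn.Linear(latent_dim, n_strategies)

    def sample(self, z: torch.Tensor, n_samples: int) -> torch.Tensor:
        mu, std = self.mean(z), self.log_std(z).exp()
        eps = torch.randn(n_samples, *mu.shape)
        return mu + std * eps  # (n_samples, batch, n_strategies)
```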
We will explore state-of-the-art models like *diffusion* models and *flow-matching* models for this task. The primary challenge is data scarcity: for any specific combination of feature values, few historical observations are available from which to estimate the conditional PnL distribution.
* A simple prototype is to first apply an unsupervised clustering method to obtain a discrete regime label $z_t$, and then train a separate generative model for each regime.
* Our proposed solution, however, is to train the encoder and generator jointly with a customized loss function that encodes an assumed relationship between the latent variable and the PnL distribution.
This end-to-end approach bypasses the need for discrete clustering and allows the model to learn from all available data. One candidate formulation is to model the PnL distribution for a given $\mathbf{x}_t$ as a mixture of $K$ component PnL distributions, with mixture weights given by a function that maps the features $\mathbf{x}_t$ to the $K$-simplex. The central research question here is how to define a proper loss function for such a generative model.
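As one hedged illustration of such a loss, the sketch below writes the conditional PnL density as a mixture of $K$ Gaussian components whose weights come from a gating network mapping $\mathbf{x}_t$ to the $K$-simplex, trained by negative log-likelihood; the Gaussian components and parameter names are simplifying assumptions, not the intended final generative model.

```python
import torch
import torch.nn as nn

class MixturePnLModel(nn.Module):
    """Sketch: p(y | x) = sum_k w_k(x) N(y; mu_k, diag(sigma_k^2)).

    The gate maps features x_t to the K-simplex; each component k has a
    global mean and log-std over the M strategies (an assumed simplification).
    """
    def __init__(self, n_features: int, n_strategies: int, n_regimes: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                  nn.Linear(64, n_regimes))
        self.mu = nn.Parameter(torch.zeros(n_regimes, n_strategies))
        self.log_std = nn.Parameter(torch.zeros(n_regimes, n_strategies))

    def loss(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        log_w = torch.log_softmax(self.gate(x), dim=-1)        # (T, K)
        comp = torch.distributions.Normal(self.mu, self.log_std.exp())
        # log N(y_t; mu_k, sigma_k), summed over strategies      -> (T, K)
        log_p = comp.log_prob(y.unsqueeze(1)).sum(dim=-1)
        return -torch.logsumexp(log_w + log_p, dim=-1).mean()   # NLL
```

Replacing each Gaussian component with a diffusion or flow-matching sampler, while keeping the simplex-valued gate, is where the loss-design question above becomes non-trivial.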
### Model Predictive Control (MPC) for Portfolio Construction
This final component uses the generative model as a **world model** for multi-step planning. The system will operate in a closed-loop fashion:
1. **World Model:** At each time step, the generative model will forecast a distribution of potential future PnLs conditioned on the current market features.
2. **Action Network:** A policy network will be trained to make portfolio allocation decisions at each time step that maximize the trader's utility (e.g., risk-adjusted return), based on the distributions provided by the world model.
This MPC system will learn to make forward-looking portfolio decisions under uncertainty. A straightforward pipeline is to train the world model (the multi-step generator) first and then train the action network separately. An open question is whether all components of the system (the encoder, the generator, and the action network) can instead be trained jointly.
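As a hedged sketch of the decision layer, the function below scores candidate allocations by a one-step mean-variance utility estimated from samples drawn from the world model; `sample_pnl` is a hypothetical interface, and the grid of candidate weights stands in for the eventual action network.

```python
import torch

def plan_allocation(world_model, x_t: torch.Tensor, candidates: torch.Tensor,
                    n_samples: int = 512, risk_aversion: float = 1.0) -> torch.Tensor:
    """Choose the candidate weight vector with the highest sampled utility.

    world_model.sample_pnl(x_t, n_samples) is an assumed interface returning a
    (n_samples, n_strategies) tensor of simulated next-step PnL vectors;
    candidates is a (C, n_strategies) tensor of candidate portfolio weights.
    """
    with torch.no_grad():
        pnl = world_model.sample_pnl(x_t, n_samples)   # (S, M) scenario PnLs
        port = pnl @ candidates.t()                    # (S, C) portfolio PnL per candidate
        utility = port.mean(dim=0) - risk_aversion * port.var(dim=0)
    return candidates[utility.argmax()]                # best-scoring weights
```

In a receding-horizon loop, this selection would be repeated at every time step on newly observed features; training a policy network on such sampled scenarios, or jointly with the encoder and generator, is the extension discussed above.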