# Nonparametric Portfolio Selection

*Generalizing Breeden-Litzenberger for Nonparametric Portfolio Selection through Multivariate Probability Density Estimation via High-Dimensional Options Tomography for Expected CRRA Utility Maximization*

## Preamble

This work isn't necessarily intended as a suggestion for a profitable trading strategy. If I were specifically optimizing for profitability, I would structure the proposed model differently, and I likely wouldn't circulate a document outlining the details. The motivation here is aesthetic. My intention is to build a purely forward-looking, nonparametric model of asset returns that makes no distributional or stationarity-related assumptions.

## Introduction

The history of portfolio selection roughly begins with the seminal work of Harry Markowitz (1952) and his mean-variance optimization framework. The story continues with Sharpe's CAPM (1964), the Fama-French Three-Factor Model (1993), and a host of others, all of which make strong assumptions about the distribution of asset returns, particularly that returns are Gaussian stochastic processes.

While these models have been important contributions to the field of finance, a persistent criticism has been their assumption of normally distributed returns. Figures like Mandelbrot and Taleb have championed the idea that financial returns exhibit fat tails, meaning that extreme events occur more frequently than Gaussian models predict. Various patches for capturing the fat-tailed nature of returns have been tried, like jump-diffusion models, Lévy stable distributions, or making the parameters of normal distributions themselves stochastic—as in conditional volatility models like GARCH. However, all of these models are parametric; they rely on specific assumptions about the distribution of asset returns.

What I attempt in this work is to build a portfolio selection framework in a purely nonparametric, model-free way. I do not want to make assumptions about the underlying probability distribution of the data. Parametric methods, like Brownian motion and Black-Scholes, assume a specific distribution (e.g., Gaussian). I want to bring as few assumptions into this model as possible—to do proper inference and let the data tell us what it wants to tell us. Instead of assuming a particular mathematical form for the distribution, I'm interested in nonparametric procedures for reconstructing probability densities directly from options market data.

Most nonparametric models rely on historical data to approximate the typical distribution of asset returns, which perhaps yields more accurate-than-Gaussian estimates of how prices will behave, but still assumes a degree of stationarity in the random variables we're analyzing. That is, under these models, we expect assets to behave in the future the way they've behaved in the past. In this paper, I don't want to use historical data in my analysis or assume stationarity in my model. This is the second fundamental constraint guiding the following exploration.

Now, to optimally select a portfolio, we need to construct one that is optimal with respect to some objective function. In this paper, we maximize expected utility. I'll discuss the specific utility function later, but before that, in order to compute expected utility, we need a distribution to compute the expectation over. Suppose I had a good estimate of the joint probability density function over all assets in our universe for the next period.
Then I'd be able to choose portfolio weights that maximize our expected (CRRA) utility. This is the portfolio I want. So how do we get an estimate of the joint probability distribution over all assets in our universe? We can impute marginal risk-neutral probability density functions for each asset in our universe using Breeden-Litzenberger and then use Steve Ross's Recovery Theorem to back out the natural probability distribution. Then we could use a copula to represent the dependency structure between the assets, but this introduces model assumptions, and I think we can do better. If we look at basket options and swap options, we can get more information about the dependencies.

## Developing the Model

### Defining Optimality

How can I come up with an optimal portfolio? What exactly does it mean to have an optimal portfolio? Optimal with respect to what? We have to optimize it with respect to some objective (utility) function. We're optimizing a vector of portfolio weights with respect to an explicit utility function—so what are the constraints? We bound the set of solutions to the polytope described by the maximum and minimum position sizes.

Let's say our objective function is to maximize the geometric growth rate of our portfolio, which is equivalent to maximizing our log wealth—the utility function underlying what Kelly calls the "growth optimal portfolio." We can extend this to CRRA utility functions and beyond, but log wealth is an easier place to start.

Now, how do we handle the *expected* part of the expected utility maximization problem? Well, the expectation must be computed over a probability distribution (of utility, i.e., of log wealth). How do we impute a probability distribution of log wealth given a set of portfolio weights? We need to estimate a joint probability density function over the assets in our universe. Given a joint pdf for the returns of the assets in our universe, a set of position size constraints, an objective function, and a sufficiently powerful optimizer, we can compute an optimal portfolio allocation with respect to our expected utility.

### Mathematical Formalism

Let's add a bit more formalism. We're looking for an optimal vector of portfolio weights $\mathbf{w}$, where the weight of asset $i$ is denoted $w_i$, over a universe of $k$ assets. We suppose we have a starting bankroll $B_0$ and a final bankroll $B_T$ at time $T$. We optimize our portfolio allocation $\mathbf{w}$ with respect to an expected utility represented as $E[U(B_T|B_0)]$. That is, given a starting bankroll, we want to maximize our expected log wealth at the end of the period:

$$
\underset{\mathbf{w}}{\max} \, E[\log(B_T)].
$$

Assuming no leverage, our starting bankroll is fully invested:

$$
B_0 = \sum_{i=1}^{k} w_i B_0.
$$

We consider asset $i$'s return $r_i$ to be its log return:

$$
r_i = \ln\left(\frac{S_{T,i}}{S_{0,i}}\right),
$$

where $S_{T,i}$ is the price of asset $i$ at time $T$, and $S_{0,i}$ is the initial price. The value of each position at time $T$ will be the initial investment in that asset times its growth factor:

$$
e^{r_i} w_i B_0.
$$

Therefore, the final bankroll is:

$$
B_T = \sum_{i=1}^{k} e^{r_i} w_i B_0 = B_0 \sum_{i=1}^{k} e^{r_i} w_i.
$$

Our optimization problem becomes:

$$
\underset{\mathbf{w}}{\max} \, E\left[\log\left(B_0 \sum_{i=1}^{k} e^{r_i} w_i\right)\right].
$$

Since $\log(B_0)$ is constant with respect to $\mathbf{w}$, we can simplify:

$$
\underset{\mathbf{w}}{\max} \, E\left[\log\left(\sum_{i=1}^{k} e^{r_i} w_i\right)\right].
$$

This is our objective function.
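To make the objective concrete, here is a minimal sketch of the sample-average version of this problem, assuming we already had draws from the joint return distribution (we don't yet; the `return_samples` passed in below are synthetic stand-ins, and all bounds and numbers are hypothetical):

```python
# A minimal sketch, not the final method: sample-average approximation of
# max_w E[log(sum_i w_i * e^{r_i})] under the simplex and box constraints.
import numpy as np
from scipy.optimize import minimize

def neg_expected_log_wealth(w, return_samples):
    growth = np.exp(return_samples) @ w          # terminal growth factor per scenario
    return -np.mean(np.log(growth))              # negate because scipy minimizes

def solve_weights(return_samples, w_min=0.0, w_max=0.6):
    k = return_samples.shape[1]
    w0 = np.full(k, 1.0 / k)                     # start from equal weights
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(w_min, w_max)] * k                # the position-size polytope
    res = minimize(neg_expected_log_wealth, w0, args=(return_samples,),
                   method="SLSQP", bounds=bounds, constraints=constraints)
    return res.x

# Stand-in joint return samples; the rest of this essay is about how to obtain
# such a joint distribution without historical data or parametric assumptions.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=[0.01, 0.02, 0.015],
                                  cov=np.diag([0.01, 0.04, 0.02]),
                                  size=10_000)
print(solve_weights(samples))
```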
### The Need for the Joint Probability Distribution

To evaluate this expectation, we note that $r_i$ is a random variable with probability density function $f(r_i)$. Crucially, the returns of each asset are not independent, so we can't treat each pdf individually. What we're really looking for is the joint probability density function $f_{\mathbf{r}}(\mathbf{r})$, where $\mathbf{r} = (r_1, r_2, \ldots, r_k)$.

When we set up our optimization problem, we also need a set of constraints. We know that the weights must sum to 1:

$$
\sum_{i=1}^{k} w_i = 1.
$$

We also use the position size limits to form a polytope bounding our solution set:

$$
w_{\text{min}} \leq w_i \leq w_{\text{max}},
$$

where $w_{\text{min}}$ and $w_{\text{max}}$ are the minimum and maximum position sizes, respectively.

### Challenges with Marginal Distributions

Why don't the marginal distributions fully determine the joint distribution? Essentially, the marginal distributions tell us the probabilities of the values AAPL might take next month, and the values BRK might take next month, but not the probability that AAPL is at $a$ and simultaneously BRK is at $b$. If the prices of AAPL and BRK were independent, then $p(A = a, B = b) = p(A = a)\,p(B = b)$. In reality, Berkshire Hathaway literally owns some Apple stock, so the two are clearly going to have some dependence.

Here's a more visual example of why we can't reconstruct the joint distribution from the marginals. Imagine a two-dimensional surface like a hill, which in our case represents the joint pdf of AAPL and BRK. The marginal pdf for AAPL would be like the shadow of the hill against one wall, and the BRK marginal pdf would be the shadow on another wall. These shadows are helpful for understanding the shape of the hill, but they don't give us all the information required to uniquely describe it. For instance, if the shadows are triangles, the hill could be a pyramid or a cone—both cast triangular shadows, but they have different shapes.

![Visualization of Marginal Distributions](https://hackmd.io/_uploads/rym1yzVfR.jpg)

### Why Not Use Copulas or Kernel Density Estimation?

For estimating the joint probability distribution of asset returns without relying on historical data or making parametric assumptions, two tools come to mind: copulas and kernel density estimation (KDE). While both are helpful for constructing joint distributions, their limitations make them unsuitable under the aesthetic constraints we imposed: a completely forward-looking model.

#### Copulas

Copulas couple multivariate distribution functions to their one-dimensional marginal distribution functions. They allow us to construct a joint distribution by "stitching together" the marginal distributions of individual assets while capturing the dependence structure between them.

Using parametric copulas introduces specific assumptions about the dependency structure. For example, a Gaussian copula encodes dependence entirely through a correlation matrix and exhibits no tail dependence, which may not accurately reflect the complexities and tail co-movements observed in financial markets. This conflicts with our goal of building a fully nonparametric, model-free system that infers the distribution entirely from forward-looking empirical data.
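To make concrete the kind of parametric shortcut we're rejecting, here is a minimal sketch of sampling a joint distribution by stitching two assumed marginals together with a Gaussian copula; the correlation and the lognormal marginals are hypothetical choices, and they are exactly the sort of assumptions we want to avoid:

```python
# A minimal sketch of the parametric copula route (shown only for contrast).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.6                                        # assumed linear dependence
cov = np.array([[1.0, rho], [rho, 1.0]])

# 1. Correlated standard normals pushed through the normal CDF give
#    correlated uniforms (the Gaussian copula itself).
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)
u = stats.norm.cdf(z)

# 2. Map the uniforms through the inverse CDFs of the assumed marginals
#    (two arbitrary lognormal gross-return distributions).
aapl_growth = stats.lognorm(s=0.20).ppf(u[:, 0])
brk_growth = stats.lognorm(s=0.15).ppf(u[:, 1])
joint_samples = np.column_stack([aapl_growth, brk_growth])
```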
Nonparametric copula estimation techniques offer more flexibility by not imposing a predefined functional form on the dependency structure. Empirical copulas, for example, model the dependence structure based on observed data without assuming a particular parametric model. However, these methods still rely heavily on historical data to estimate the copula, implicitly assuming that past dependency patterns will persist into the future. This reliance introduces an assumption of stationarity that I don't want to make. While in practice rigorous backtesting could help provide insight into the stability of these structures, our objective is to develop a forward-looking model that doesn't depend on historical patterns, so we move on.

#### Kernel Density Estimation (KDE)

Multivariate kernel density estimation (KDE) is a nonparametric method for estimating the probability density function of a random variable. The approach places a kernel function at each data point and sums these functions to approximate the underlying density. By doing so, KDE attempts to model complex distributions without assuming a specific functional form, making it useful in scenarios where we want to avoid imposing rigid distributional assumptions.

The primary advantages of KDE lie in its flexibility and nonparametric nature. KDE does not require any predefined structure for the distribution, which allows it to capture complex, multimodal behavior in the data. Additionally, because it doesn't rely on a fixed set of parameters that define the distribution, KDE aligns well with our objective of minimizing assumptions. This flexibility is especially valuable when the data exhibits patterns that might not fit traditional distributional forms, allowing for a more nuanced representation of the underlying distribution.

However, KDE also has notable limitations. A critical challenge is the selection of the kernel function and bandwidth parameter, which introduces subjectivity and potential bias. The bandwidth, for example, determines the smoothness of the density estimate: a larger bandwidth yields a smoother, potentially oversimplified estimate, while a smaller bandwidth captures more detail but risks overfitting to noise in the data. This requirement to choose a parametrized kernel and bandwidth somewhat detracts from KDE's fully nonparametric claim, introducing assumptions we prefer to avoid. Furthermore, KDE relies on historical data to estimate the density, which implicitly assumes that past patterns will continue into the future—an assumption that conflicts with our goal of a forward-looking model. In higher dimensions, KDE also becomes computationally demanding: the curse of dimensionality means the volume of data needed to maintain accuracy grows exponentially with the number of assets.

To improve KDE's application in our context, we could consider using the Hessian matrix for local density approximation. The Hessian provides information about the second-order curvature of the pdf, potentially assisting in bandwidth selection by dynamically adjusting the smoothing parameter according to the density's local characteristics. In our optimization step, which seeks to maximize expected utility over the joint pdf of asset returns, the Hessian matrix also offers valuable insight into the local curvature of the utility function, potentially speeding up convergence and improving optimization accuracy by refining step size and direction. However, utilizing the Hessian matrix still requires a sufficient amount of data for accurate computation, which reintroduces dependence on historical data.

KDE presents a nonparametric method for joint pdf estimation directly from observed data.
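For reference, here is a minimal sketch of what a two-asset KDE fit looks like in practice using SciPy's Gaussian KDE; the `historical_returns` input is a synthetic stand-in, and needing such data at all is precisely the backward-looking dependence we're trying to avoid:

```python
# A minimal sketch of a two-asset joint density estimate via multivariate KDE.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
historical_returns = rng.multivariate_normal(    # stand-in for real history
    mean=[0.01, 0.02],
    cov=[[0.04, 0.01], [0.01, 0.03]],
    size=500,
)

# gaussian_kde expects shape (n_dims, n_samples); the bandwidth defaults to
# Scott's rule, which is itself a modeling choice.
kde = gaussian_kde(historical_returns.T)
print(kde([[0.0], [0.0]]))   # estimated joint density at (r_AAPL, r_BRK) = (0, 0)
```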
Even so, its reliance on historical data, combined with the assumptions introduced through kernel and bandwidth choices, limits its suitability for our particular forward-looking, nonparametric model. While KDE is certainly a reasonable method in practice, it doesn't fully align with our objective of building a model entirely free from parametric assumptions and historical dependence.

#### Final Thoughts on Copulas and KDE

Both copulas and KDE have their merits in estimating joint distributions, but neither fully satisfies our criteria of being both nonparametric and forward-looking without reliance on historical data. We want a method that constructs the joint probability distribution of asset returns using live, forward-looking options data, thereby avoiding the pitfalls associated with historical data dependence and parametric assumptions.

This leads us to explore alternative financial instruments, such as basket options, rainbow options, and swaptions. These instruments might provide the necessary information to compute the joint pdf from current market data. For instance, basket options can offer insights into the dependencies between assets by reflecting conditional probabilities or specific "slices" of the joint distribution. By analyzing various basket options with different compositions, we could potentially piece together these slices to fully determine the joint pdf.

By shifting our focus to methods that leverage current market instruments, we align more closely with our goal of developing a fully nonparametric, forward-looking model. This approach allows us to infer the necessary probability distributions directly from market expectations embedded in options prices, without resorting to historical data or introducing unwanted assumptions.

### Leveraging Basket Options and Tomography

My core hypothesis is that there's a way to obtain the information we need to compute the joint pdf from live, forward-looking options data—specifically by utilizing financial instruments like basket options, rainbow options, or swap options. By carefully analyzing these contracts, we can extract the necessary data without relying on historical information.

[I realized after finishing this essay that we can just use ETFs, which are super common and liquid. I should probably rewrite this section to incorporate my final conclusion from the start. Or maybe I should have the reader discover it alongside me?]

#### Using Basket Options

Basket options, in particular, might provide the insights we're looking for. The payoff of a basket option is defined as:

$$
\text{Payoff} = \max\left(0, \sum_{i=1}^{n} w_i S_i(T) - K\right),
$$

where:

- $S_i(T)$ is the price of asset $i$ at expiration time $T$,
- $w_i$ is the weight or proportion of asset $i$ in the basket,
- $K$ is the strike price of the basket option.

By analyzing basket options with different weights $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ and strike prices $K$, we can infer different "slices" of the joint pdf, which can then be combined to reconstruct the full joint distribution.

**Example: Dual-Asset Basket Option**

Let's work through a dual-asset basket option example where our universe is {AAPL, BRK}, weighted equally in the basket. The value of this basket option can be expressed as:

$$
C(K, T) = e^{-rT} \, \mathbb{E}\left[ \max\left(0, \frac{1}{2} S_{\text{AAPL}}(T) + \frac{1}{2} S_{\text{BRK}}(T) - K \right) \right],
$$

where $C(K, T)$ is the price of the basket option, and $r$ is the risk-free interest rate.
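Before formalizing this expectation as a double integral, here is a minimal numerical sketch of evaluating it on a discretized grid, assuming for the moment that we already knew the joint density at expiration; the bivariate lognormal placeholder and every number in it are purely illustrative:

```python
# A minimal sketch: Riemann-sum pricing of the equal-weight basket call,
# given a placeholder joint density for (S_AAPL, S_BRK) at expiry.
import numpy as np
from scipy.stats import multivariate_normal

r, T, K = 0.05, 0.25, 300.0
s_aapl = np.linspace(1.0, 500.0, 400)            # price grids at expiry
s_brk = np.linspace(1.0, 800.0, 400)
A, B = np.meshgrid(s_aapl, s_brk, indexing="ij")

# Placeholder joint density: lognormal prices with some dependence.
log_density = multivariate_normal(
    mean=[np.log(190.0), np.log(410.0)],
    cov=[[0.04, 0.015], [0.015, 0.03]],
).pdf(np.dstack([np.log(A), np.log(B)]))
joint_pdf = log_density / (A * B)                # change of variables to price space

payoff = np.maximum(0.0, 0.5 * A + 0.5 * B - K)  # basket call payoff on the grid
dA, dB = s_aapl[1] - s_aapl[0], s_brk[1] - s_brk[0]
price = np.exp(-r * T) * np.sum(payoff * joint_pdf) * dA * dB
print(price)
```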
To take the expectation of a function of two dependent variables, we need a double integral of the form $E[X+Y] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x + y) f(x, y) \, dx \, dy$. So, to compute the expected payoff in our case (abbreviating $S_{\text{AAPL}}(T)$ as $S_{\text{A}}(T)$ and $S_{\text{BRK}}(T)$ as $S_{\text{B}}(T)$), we perform a double integral over the joint distribution of $S_{\text{A}}(T)$ and $S_{\text{B}}(T)$:

$$
C(K, T) = e^{-rT} \int_{0}^{\infty} \int_{0}^{\infty} \max\left(0, \frac{1}{2} S_{\text{A}}(T) + \frac{1}{2} S_{\text{B}}(T) - K \right) f\left( S_{\text{A}}(T), S_{\text{B}}(T) \right) \, dS_{\text{A}}(T) \, dS_{\text{B}}(T),
$$

where $f\left( S_{\text{A}}(T), S_{\text{B}}(T) \right)$ is the joint pdf of the asset prices at expiration. The payoff is positive when

$$
\frac{1}{2} S_{\text{A}}(T) + \frac{1}{2} S_{\text{B}}(T) > K,
$$

and zero otherwise.

![Payoff regions of the dual-asset basket option in the price plane](https://hackmd.io/_uploads/By2Mlq4GC.png)

Geometrically, this inequality defines a line in the $S_{\text{A}}(T)$–$S_{\text{B}}(T)$ plane, dividing it into two regions:

1. **Positive Payoff Region**: Above the line, where the weighted sum exceeds the strike price $K$, the payoff equals $\frac{1}{2} S_{\text{A}}(T) + \frac{1}{2} S_{\text{B}}(T) - K$.
2. **Zero Payoff Region**: Below the line, where the weighted sum is less than or equal to $K$, the payoff is zero.

By analyzing the option prices across various strike prices $K$, we can extract information about the joint distribution along the line defined by the equation $\frac{1}{2} S_{\text{A}}(T) + \frac{1}{2} S_{\text{B}}(T) = K$.

#### Extending Breeden-Litzenberger to Basket Options

The key insight is that basket options encapsulate information about the joint distribution of their underlying assets. By extending the Breeden-Litzenberger approach to basket options with different weights and strike prices, we can infer different "slices" of the joint probability density function (pdf). These slices can then be combined to reconstruct the full joint distribution.

Consider a general basket option on $n$ assets, where the payoff at expiration $T$ is given by:

$$
\text{Payoff} = \max\left(0, \sum_{i=1}^{n} w_i S_i(T) - K\right),
$$

where:

- $w_i$ represents the weight of asset $i$ in the basket,
- $S_i(T)$ is the price of asset $i$ at expiration $T$,
- $K$ is the strike price.

The price of this basket option, denoted $C(K, T, \mathbf{w})$ where $\mathbf{w} = (w_1, \ldots, w_n)$ is the vector of weights, can be expressed as:

$$
C(K, T, \mathbf{w}) = e^{-rT} \, \mathbb{E}\left[ \max\left(0, \sum_{i=1}^{n} w_i S_i(T) - K \right) \right],
$$

where $r$ is the risk-free interest rate, and $\mathbb{E}[\cdot]$ denotes the expectation under the risk-neutral measure.

To extend the Breeden-Litzenberger approach to this setting, we take the second partial derivative of the basket option price with respect to the strike price $K$. This yields:

$$
\frac{\partial^2 C(K, T, \mathbf{w})}{\partial K^2} = e^{-rT} \, f_{\sum w_i S_i(T)}(K),
$$

where $f_{\sum w_i S_i(T)}(K)$ is the pdf of the weighted sum $\sum_{i=1}^{n} w_i S_i(T)$ evaluated at $K$. In other words, the second derivative of the basket option price with respect to the strike price gives us the density of the weighted sum of the asset prices at expiration.
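Here is a minimal sketch of the numerical side of this relationship for a single fixed weight vector: difference basket call quotes twice on an even strike grid and rescale by $e^{rT}$. The quotes below are synthetic, generated from a known normal density for the basket value so the recovered curve can be checked against it; real option chains would need cleaning, smoothing, and interpolation first.

```python
# A minimal sketch of extracting f_{sum w_i S_i(T)}(K) from basket call quotes.
import numpy as np
from scipy.stats import norm

def implied_basket_density(strikes, prices, r, T):
    """e^{rT} * d^2C/dK^2 on an evenly spaced strike grid (central differences)."""
    strikes = np.asarray(strikes, dtype=float)
    prices = np.asarray(prices, dtype=float)
    dK = strikes[1] - strikes[0]
    second_deriv = (prices[2:] - 2.0 * prices[1:-1] + prices[:-2]) / dK**2
    return strikes[1:-1], np.exp(r * T) * second_deriv

# Synthetic quotes: pretend the basket value at expiry is N(300, 20^2), so the
# "market" call price is the discounted expectation of max(basket - K, 0).
r, T, mu, sigma = 0.05, 0.25, 300.0, 20.0
strikes = np.arange(220.0, 381.0, 2.0)
d = (mu - strikes) / sigma
quotes = np.exp(-r * T) * ((mu - strikes) * norm.cdf(d) + sigma * norm.pdf(d))

ks, density = implied_basket_density(strikes, quotes, r=r, T=T)
# `density` should now track the N(300, 20^2) pdf evaluated at `ks`.
```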
By observing basket option prices for different weight vectors $\mathbf{w}$ and strike prices $K$, we can numerically estimate $\frac{\partial^2 C(K, T, \mathbf{w})}{\partial K^2}$, thus obtaining slices of the joint pdf along hyperplanes defined by $\sum_{i=1}^{n} w_i S_i(T) = K$. To illustrate, suppose we fix a weight vector $\mathbf{w}$ and vary $K$. For each $K$, computing $\frac{\partial^2 C(K, T, \mathbf{w})}{\partial K^2}$ provides the density of the weighted sum at that point. By varying $\mathbf{w}$ across different directions in the asset space, we obtain densities along different hyperplanes in the $n$-dimensional space of asset prices.

#### Reconstructing the Joint PDF Using Tomography

To reconstruct the full joint pdf $f_{S_1, S_2, \ldots, S_n}(S_1, S_2, \ldots, S_n)$, we can employ techniques from tomography, specifically the Radon transform. The Radon transform maps a function to its integrals over hyperplanes. In our case, the Radon transform of the joint pdf corresponds to the marginal densities of the weighted sums obtained from basket options (after normalizing each $\mathbf{w}$ to unit length, with $K$ and the density rescaled accordingly).

Mathematically, the Radon transform $\mathcal{R}f$ of a function $f$ is defined as:

$$
\mathcal{R}f(s, \mathbf{w}) = \int_{\sum_{i=1}^{n} w_i S_i = s} f(S_1, S_2, \ldots, S_n) \, d\Sigma,
$$

where:

- $s = K$ is the value of the weighted sum,
- $\mathbf{w}$ is the weight vector,
- $d\Sigma$ denotes the measure on the hyperplane $\sum_{i=1}^{n} w_i S_i = s$.

In our case, the second derivative $\frac{\partial^2 C(K, T, \mathbf{w})}{\partial K^2}$ provides $e^{-rT} f_{\sum w_i S_i(T)}(K)$, which is directly related to the Radon transform of the joint pdf evaluated at $s = K$ in direction $\mathbf{w}$. By collecting these marginal densities for various $\mathbf{w}$ and $K$, we effectively gather projections (Radon transforms) of the joint pdf from multiple angles. Applying the inverse Radon transform allows us to reconstruct the original joint pdf $f_{S_1, S_2, \ldots, S_n}(S_1, S_2, \ldots, S_n)$ from these projections.

This method is analogous to how computed tomography (CT) scans reconstruct images of internal structures by combining projections taken from different angles. In our context, the "internal structure" is the joint distribution of asset prices, and the "projections" are the marginal densities of weighted sums obtained from basket options.

**Practical Steps:**

1. **Data Collection**: Collect basket option prices $C(K, T, \mathbf{w})$ for a variety of weight vectors $\mathbf{w}$ and strike prices $K$.
2. **Numerical Differentiation**: Numerically estimate the second partial derivatives $\frac{\partial^2 C(K, T, \mathbf{w})}{\partial K^2}$ with respect to $K$ for each $\mathbf{w}$.
3. **Obtain Marginal Densities**: Use the relationship $\frac{\partial^2 C(K, T, \mathbf{w})}{\partial K^2} = e^{-rT} \, f_{\sum w_i S_i(T)}(K)$ to extract the marginal pdfs $f_{\sum w_i S_i(T)}(K)$ for each $\mathbf{w}$.
4. **Reconstruct the Joint PDF**: Apply the inverse Radon transform to the set of marginal pdfs $f_{\sum w_i S_i(T)}(K)$ to reconstruct the joint pdf $f_{S_1, S_2, \ldots, S_n}(S_1, S_2, \ldots, S_n)$, as sketched below.

By following these steps, we can nonparametrically estimate the joint pdf of asset prices using forward-looking options data, without relying on historical data or imposing parametric assumptions.
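Here is a minimal sketch of step 4 for the two-asset case using scikit-image's Radon routines. Since we have no real basket-implied projections here, the sinogram is synthesized by applying the forward Radon transform to a stand-in "true" joint density so the round trip can be sanity-checked; in a real application each column would come from the second-derivative densities above, and only directions with non-negative weights would be observable in the market.

```python
# A minimal sketch of tomographic reconstruction of a two-asset joint density.
import numpy as np
from skimage.transform import radon, iradon

n = 128
x, y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n), indexing="ij")

# Stand-in joint density on a normalized (S_A, S_B) grid: a correlated bump.
true_joint = np.exp(-((x - 0.1) ** 2 / 0.02 + (y + 0.2) ** 2 / 0.03 + x * y / 0.04))
true_joint /= true_joint.sum()

# One projection per direction; each column plays the role of f_{w.S}(K).
angles = np.linspace(0.0, 180.0, 90, endpoint=False)
sinogram = radon(true_joint, theta=angles, circle=False)

# Filtered back-projection (inverse Radon transform) recovers the joint pdf.
recovered = iradon(sinogram, theta=angles, circle=False, output_size=n)
recovered = np.clip(recovered, 0.0, None)        # clip small negative artifacts
recovered /= recovered.sum()                     # renormalize to a density
```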
### Solving the Portfolio Optimization Problem

Once we have estimated the joint pdf $f_{S_1, S_2, \ldots, S_n}(S_1, S_2, \ldots, S_n)$, we can integrate it into our portfolio optimization framework. Our goal is to maximize expected utility, such as expected logarithmic wealth. The optimization problem becomes:

$$
\underset{\mathbf{w}}{\max} \, \mathbb{E}\left[ \log\left( \sum_{i=1}^{n} e^{r_i} w_i \right) \right] = \underset{\mathbf{w}}{\max} \int_{\mathbb{R}^n} \log\left( \sum_{i=1}^{n} e^{r_i} w_i \right) f_{\mathbf{r}}(\mathbf{r}) \, d\mathbf{r},
$$

subject to the constraints:

- $\sum_{i=1}^{n} w_i = 1$ (weights sum to 1),
- $w_{\text{min}} \leq w_i \leq w_{\text{max}}$ (position size limits).

Here:

- $r_i = \ln\left( \frac{S_i(T)}{S_i(0)} \right)$ is the log return of asset $i$,
- $f_{\mathbf{r}}(\mathbf{r})$ is the joint pdf of the returns, derived from $f_{S_1, \ldots, S_n}(S_1, \ldots, S_n)$.

This optimization problem can be approached using numerical integration and optimization techniques. The integral in the objective function may be high-dimensional, but modern computational methods—such as Monte Carlo integration or quasi-Monte Carlo methods—render it tractable. By leveraging the reconstructed joint pdf, we can compute the expected utility for any portfolio weight vector $\mathbf{w}$. The optimization algorithm can then search over the feasible set defined by the constraints to find the portfolio weights that maximize the expected utility.

Tying everything together: extending the Breeden-Litzenberger framework to basket options and employing tomographic techniques like the inverse Radon transform lets us nonparametrically estimate the joint pdf of asset prices from forward-looking options data, without relying on historical data or parametric assumptions. This, in turn, enables us to solve the portfolio optimization problem in a way that is fully forward-looking and model-free.

## Conclusion

By generalizing the Breeden-Litzenberger approach to basket options and employing tomographic techniques like the inverse Radon transform, we can nonparametrically estimate the joint probability density function of asset prices using forward-looking options data. This approach allows us to reconstruct the joint distribution without relying on historical data or imposing parametric assumptions. Once we have the joint pdf, we can incorporate it into our portfolio optimization framework to find the growth-optimal portfolio that maximizes expected utility.

Basket options are not rare exotic derivatives: an option on a fixed-weight exchange-traded fund (ETF) is effectively a basket option on its constituents, and ETFs are among the most liquidly traded securities, with extensive options data available. This model therefore allows us to combine data from the options chains of individual equities and non-dynamic, fixed-weight ETFs into a unified framework. By applying the generalized Breeden-Litzenberger technique to ETF options, we obtain additional slices of the joint distribution, which we can stitch together via the inverse Radon transform to reconstruct the full joint pdf.

By leveraging options market data and these tomographic techniques, we can estimate the joint probability distribution of asset returns and optimize our portfolio in a way that is both forward-looking and model-free.