Optimal_control_AMM

# Optimal Control for Uniswap v3 Liquidity Provision

This document proposes a stochastic control framework to analyze and optimize liquidity provision on Uniswap v3, a popular decentralized exchange (DEX). We focus on two key optimization problems:

1. **Optimal Stopping:** Determining the optimal time for a liquidity provider (LP) to withdraw their liquidity to maximize profit.
2. **Optimal Market Making:** Dynamically adjusting the price range and liquidity level to maximize returns over time.

## Uniswap v3 Dynamics

Unlike traditional constant product market makers, Uniswap v3 allows LPs to concentrate their liquidity within a specific price range. This "concentrated liquidity" mechanism introduces complexities in the relationship between trading activity, liquidity, and price changes.

### Concentrated Liquidity

An LP position in Uniswap v3 is defined by:

* **Price Range:** A specified interval $[p_l, p_r]$, where $0<p_l<p_r$, within which the LP provides liquidity.
* **Liquidity Level:** A fixed amount of liquidity, denoted by $L$, allocated to the specified price range.

The amount of each asset held by the LP is a function of the current price $p$ and the chosen price range:

$$
\begin{split}
x &= L\left\{ \left(\frac1{\sqrt p} - \frac1{\sqrt{p_r}} \right)^+ - \left(\frac1{\sqrt p} - \frac1{\sqrt{p_l}} \right)^+ \right\} = L\left\{ \left(\frac1q - \frac1{q_r} \right)^+ - \left(\frac1q - \frac1{q_l} \right)^+ \right\} \\
y &= L\left\{ \left(\sqrt p - \sqrt{p_l} \right)^+ - \left(\sqrt p - \sqrt{p_r} \right)^+ \right\} = L\left\{ \left(q - q_l \right)^+ - \left(q - q_r \right)^+ \right\}
\end{split}
$$

where:

* $x$: Quantity of token0 held by the LP.
* $y$: Quantity of token1 (numéraire) held by the LP.
* $p$: Current spot price of token0 denominated in token1 (the numéraire).
* $q = \sqrt{p}$: The root price, which simplifies the expressions. In particular, $q_l = \sqrt{p_l}$ and $q_r = \sqrt{p_r}$.
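The holdings formulas above can be checked with a short Python sketch. The helper name `lp_holdings` is hypothetical (it is not part of any Uniswap library); it simply evaluates the root-price expressions for $x$ and $y$.

```python
import math


def lp_holdings(L, p, p_l, p_r):
    """Token holdings (x, y) of a concentrated-liquidity position at price p.

    Hypothetical helper implementing the root-price formulas
    x = L[(1/q - 1/q_r)^+ - (1/q - 1/q_l)^+],  y = L[(q - q_l)^+ - (q - q_r)^+].
    """
    q, q_l, q_r = math.sqrt(p), math.sqrt(p_l), math.sqrt(p_r)

    def pos(z):
        return max(z, 0.0)  # positive part (.)^+

    x = L * (pos(1.0 / q - 1.0 / q_r) - pos(1.0 / q - 1.0 / q_l))
    y = L * (pos(q - q_l) - pos(q - q_r))
    return x, y


print(lp_holdings(1.0, 0.50, 0.81, 1.21))  # below p_l
print(lp_holdings(1.0, 1.00, 0.81, 1.21))  # inside the range
print(lp_holdings(1.0, 1.44, 0.81, 1.21))  # above p_r
```

Outside the range the position is entirely in one token: all token0 below $p_l$ and all token1 above $p_r$.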
## Evolution of Liquidity

For an LP who actively manages their position $(L_t, [p_{l,t}, p_{r,t}])$ over time, the changes in their token holdings are given by:

$$
\begin{split}
dx_t &= - \frac12 L_t p_t^{-3/2} \pmb{1}_{[p_l, p_r]}(p_t)\, dp_t = - L_t q_t^{-2} \pmb{1}_{[q_l, q_r]}(q_t)\, dq_t \\
dy_t &= \frac12 L_t p_t^{-1/2} \pmb{1}_{[p_l, p_r]}(p_t)\, dp_t = L_t \pmb{1}_{[q_l, q_r]}(q_t)\, dq_t
\end{split}
$$

where:

* $x_t$, $y_t$: Quantities of token0 and token1 held by the LP at time $t$, respectively.
* $p_t$: Spot price at time $t$.
* $q_t = \sqrt{p_t}$: The root price at time $t$.

These equations describe how the LP's token holdings evolve as the price fluctuates within their active liquidity range. They require $p_t$ (equivalently $q_t$) to be of finite variation: the chain rule then applies pathwise, with no second-order (Itô) correction term.

<font color=red>Remark. This analysis adopts the perspective of the liquidity provider (LP), not the Automated Market Maker (AMM) itself. Consequently, terms involving changes in the liquidity level ($dL_t$) do not appear in the LP's wealth dynamics: any adjustment to the LP's liquidity transfers assets between the pool and the LP's own wallet, which does not by itself change the LP's overall wealth.</font>

### Incorporating Fees

Uniswap v3 charges a fee, denoted by $1 - \gamma$ (with $0 < \gamma < 1$), on every trade. This fee is collected from the trader and distributed to LPs. The fee-adjusted dynamics of the LP's position are:

$$
\begin{split}
dx_t &= \frac{1}{2} L_t p_t^{-3/2} \pmb{1}_{[p_l, p_r]}(p_t) \left[ \gamma^{-1} dp^-_t - dp^+_t \right] = L_t q^{-2}_t \pmb{1}_{[q_l, q_r]}(q_t) \left[\gamma^{-1} dq_t^- - dq_t^+\right], \\
dy_t &= \frac{1}{2} L_t p_t^{-1/2} \pmb{1}_{[p_l, p_r]}(p_t) \left[ \gamma^{-1} dp^+_t - dp^-_t \right] = L_t \pmb{1}_{[q_l, q_r]}(q_t) \left[\gamma^{-1} dq_t^+ - dq^-_t\right],
\end{split}
$$

where $p^+_t$, $p^-_t$ (resp. $q^+_t$, $q^-_t$) are the nondecreasing processes in the decomposition $p_t = p_0 + p^+_t - p^-_t$ (resp. $q_t = q_0 + q^+_t - q^-_t$) of the finite-variation price path:

* $dp^+_t$ (resp. $dq^+_t$): Upward price changes (in $p$ and $q$, respectively).
* $dp^-_t$ (resp. $dq^-_t$): Downward price changes, counted positively (in $p$ and $q$, respectively).
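The fee-adjusted dynamics can be integrated exactly over a single monotone move of the root price. The sketch below assumes the observed price path is monotone between consecutive observations; the helper name `swap_step` is hypothetical. Only the part of the move inside $[q_l, q_r]$ trades against the position, and the incoming token is grossed up by $\gamma^{-1}$.

```python
def swap_step(L, q_old, q_new, q_l, q_r, gamma):
    """Change (dx, dy) in the LP's holdings when the root price moves q_old -> q_new.

    Hypothetical helper: integrates dx = L q^{-2} 1_{[q_l,q_r]}(q) [dq^-/gamma - dq^+]
    and dy = L 1_{[q_l,q_r]}(q) [dq^+/gamma - dq^-] over one monotone move.
    """
    lo = max(min(q_old, q_new), q_l)  # in-range part of the move
    hi = min(max(q_old, q_new), q_r)
    if hi <= lo:
        return 0.0, 0.0  # move lies entirely outside [q_l, q_r] (or no move)
    x_amount = L * (1.0 / lo - 1.0 / hi)  # integral of L q^{-2} dq over [lo, hi]
    y_amount = L * (hi - lo)  # integral of L dq over [lo, hi]
    if q_new > q_old:
        # Price up: traders pay in token1 (grossed up by the fee), take out token0.
        return -x_amount, y_amount / gamma
    # Price down: traders pay in token0 (grossed up by the fee), take out token1.
    return x_amount / gamma, -y_amount


# Round trip 1.00 -> 1.05 -> 1.00 inside [0.9, 1.1] with a 30 bps fee (gamma = 0.997).
path = [1.00, 1.05, 1.00]
dx = dy = 0.0
for a, b in zip(path[:-1], path[1:]):
    ddx, ddy = swap_step(1.0, a, b, 0.9, 1.1, 0.997)
    dx, dy = dx + ddx, dy + ddy
print(dx, dy)  # both positive: the LP keeps the fees
```

With $\gamma = 1$ the increments telescope to the change in the static holdings $(x, y)$; with $\gamma < 1$ a round trip leaves the LP with more of both tokens, which is the accumulated fee income.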
Because $\gamma^{-1} > 1$, fees inflate the amount of the incoming token that the LP receives on every trade, whichever way the price moves, while the amount of the outgoing token is unchanged.

## Optimal Stopping for Uniswap v3 Liquidity Provision

This section analyzes the optimal time for a liquidity provider (LP) to withdraw their liquidity from a Uniswap v3 pool so as to maximize their profit. We assume the LP employs a passive strategy, depositing liquidity once and withdrawing it all at once at a time of their choosing.

### Scenario

Consider an LP who opens a liquidity position $(L, [p_l, p_r])$ (or equivalently, $(L, [q_l, q_r])$ in the transformed price space) at time 0 and aims to determine the optimal time to exit this position. The LP does not adjust their liquidity provision until closing the whole position at the chosen time.

### Price and Liquidity Dynamics

The following system of stochastic differential equations (SDEs) governs the price and liquidity dynamics:

$$
\begin{split}
\frac{d S_t}{S_t} &= \mu\, dt + \sigma\, dW_t \\
\frac{d Q_t}{Q_t} &= (u^+_t - u^-_t)\, dt \\
u^+_t &= \kappa \left(-\ln \gamma - \ln \frac{S_t}{P_t} \right)^- + \beta^+ = \kappa \left(-\ln \gamma - \ln S_t + 2 \ln Q_t \right)^- + \beta^+ \\
u^-_t &= \kappa \left(\ln \gamma - \ln \frac{S_t}{P_t} \right)^+ + \beta^- = \kappa \left(\ln \gamma - \ln S_t + 2 \ln Q_t \right)^+ + \beta^- \\
df_t &= L \frac{1-\gamma}{\gamma} \pmb{1}_{[q_l, q_r]}(Q_t) \left(S_t Q_t^{-2} u^-_t + u^+_t \right) Q_t\, dt \\
&= L \frac{1-\gamma}{\gamma} \pmb{1}_{[q_l, q_r]}(Q_t) \left( \kappa S_t Q_t^{-1} \left( \ln \gamma - \ln S_t + 2 \ln Q_t \right)^+ + \kappa Q_t \left(-\ln \gamma - \ln S_t + 2 \ln Q_t \right)^- + \beta^- S_t Q_t^{-1} + \beta^+ Q_t \right) dt
\end{split}
$$

where:

* $S_t$: Fair market price of the asset, modeled as a geometric Brownian motion with drift $\mu$ and volatility $\sigma$.
* $Q_t = \sqrt{P_t}$: Root price of the asset within the Uniswap v3 pool, where $P_t$ is the pool price.
* $u^+_t$ and $u^-_t$: Buy and sell order flow rates (orders buying token0 from and selling token0 to the pool), respectively, including arbitrage activity.
* $\kappa>0$: Speed of arbitrage order flow.
* $\beta^+$ and $\beta^-$: Baseline buy and sell order flow from noise traders.
* $f_t$: Cumulative fees earned by the LP up to time $t$; fees collected in token0 (on downward moves of $Q_t$) are valued at the fair price $S_t$.

#### Model Intuition

This model aims to capture the interplay between the fair price $S_t$, the pool price $P_t = Q_t^2$, and the order flow $u^+_t$, $u^-_t$. The log price ratio

$$R_t := \ln (S_t / P_t) = \ln S_t - 2 \ln Q_t$$

is designed to approximate an Ornstein-Uhlenbeck process mean-reverting to the no-arbitrage interval $[\ln \gamma, -\ln \gamma]$. Indeed, Itô's formula gives $dR_t = (\mu - \frac{\sigma^2}{2})\,dt + \sigma\, dW_t - 2(u^+_t - u^-_t)\,dt$, and outside $[\ln\gamma, -\ln\gamma]$ the arbitrage part of $-2(u^+_t - u^-_t)$ equals $-2\kappa$ times the signed distance from $R_t$ to the interval. Specifically, the model incorporates:

* **Mean Reversion:** The drift of $R_t$ pushes it back towards the no-arbitrage interval, inside which arbitrage is unprofitable because of fees.
* **Arbitrage Activity:** Significant buy (sell) orders are triggered when the pool price falls below (rises above) the fee-adjusted fair price, i.e. when $R_t > -\ln\gamma$ ($R_t < \ln\gamma$).
* **Noise Trading:** Baseline order flow $(\beta^+, \beta^-)$ represents trading activity unrelated to arbitrage.

#### Connection to G3Ms Growth Rate

This model aligns with the findings of the "G3Ms growth rate" paper, where the pool price exhibits finite variation and is driven by the local times of the fair price process. As the arbitrage speed $\kappa$ increases, the pool price should closely track the fair price, with the buy/sell order flow rates converging to the upward/downward local times of the fair price process at the boundaries of the no-arbitrage interval.

### Optimal Stopping Problem

The LP's objective is to find the optimal stopping time $\tau$ that maximizes their expected terminal wealth, including accumulated fees:

$$
\max_{\tau \in \mathcal{A}_{0, T}} \ \mathbb{E} \left[g(S_{\tau}, Q_{\tau}) + \int_0^{\tau} h(S_t,Q_t)\, dt \right]
$$

where:

* $\mathcal{A}_{t, T}$: Set of admissible stopping times taking values in $[t, T]$.
* $g(s,q)$: Terminal wealth function with a liquidation cost $c$:
  $$g(s,q) = cs L\left\{\left(\frac1q - \frac1{q_r}\right)^+ - \left(\frac1q - \frac1{q_l}\right)^+\right\} + L\left\{\left(q - q_l\right)^+ - \left(q - q_r\right)^+\right\}$$
* $h(s,q)$: Rate at which the LP earns fees, with the order flow rates $u^\pm$ evaluated at $(s,q)$:
  $$h(s,q) = L \frac{1-\gamma}{\gamma} (s q^{-1} u^- + q u^+) \pmb{1}_{[q_l, q_r]}(q)$$

#### Hamilton-Jacobi-Bellman (HJB) Equation

The optimal stopping problem can be analyzed through the HJB variational inequality:

$$
\begin{cases}
\max \left\{v_t + \mathcal{L}v + h,\ g-v \right\} = 0, & (t,s,q) \in [0, T) \times \mathbb{R}^2_+ \\
v(T,s,q) = g(s,q)
\end{cases}
$$

where:

* $v(t,s,q)$: Value function, representing the maximum expected wealth achievable starting from state $(s,q)$ at time $t$.
* $\mathcal{L} v(t,s,q)$: Infinitesimal generator of the state process, given by:
  $$\mathcal{L} v(t,s,q) = \frac12 \sigma^2 s^2 v_{ss} + \mu s v_s + (u^+ - u^-) q v_q$$

This HJB equation characterizes the value function, which in turn determines the optimal stopping rule.

### Penalty Approximation

To solve the HJB equation, we employ a penalty approximation method, as outlined in Dai-Sun-Xu-Zhou. This involves considering the following penalized version of the HJB equation:

$$
(*) \quad
\begin{cases}
v_t + \mathcal{L}v + h + K(g-v)^+ = 0, & (t,s,q) \in [0, T) \times \mathbb{R}^2_+ \\
v(T,s,q) = g(s,q)
\end{cases}
$$

where:

* $K>0$: Penalty factor, a large positive constant.
* $(g-v)^+ = \max(g-v, 0)$: Positive part of the difference between the terminal wealth function $g$ and the value function $v$.

The penalty term $K(g-v)^+$ is active only where $v < g$, where it pushes $v$ upward; it therefore penalizes violations of the constraint $v \geq g$, which the original HJB equation enforces exactly.
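To see the arbitrage mechanism behind $(S_t, Q_t, f_t)$ at work, the SDE system can be simulated on a time grid. This is a minimal sketch under stated assumptions: an exact geometric-Brownian-motion step for $S_t$, a log-Euler step for $Q_t$, and a left-point rule for the fee integral $f_t$. The order flow rates are taken so that arbitrage pushes $R_t = \ln(S_t/P_t)$ back into $[\ln\gamma, -\ln\gamma]$: $u^+$ is activated when $R_t > -\ln\gamma$ and $u^-$ when $R_t < \ln\gamma$, with token0 fees (earned on downward moves of $Q_t$) valued at $S_t$. The function name `simulate_pool` and the example parameter values are illustrative only.

```python
import math
import random


def simulate_pool(T, n_steps, s0, q0, mu, sigma, kappa, gamma,
                  beta_plus, beta_minus, L, q_l, q_r, seed=0):
    """Simulate (S_t, Q_t, f_t) on a uniform time grid (hypothetical helper).

    S uses the exact geometric-Brownian-motion step, Q a log-Euler step of
    dQ/Q = (u^+ - u^-) dt, and the fee integral f a left-point rule.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    log_gamma = math.log(gamma)  # negative, since 0 < gamma < 1
    S, Q, f = [s0], [q0], [0.0]
    for _ in range(n_steps):
        s, q = S[-1], Q[-1]
        R = math.log(s) - 2.0 * math.log(q)  # R = ln(S / P) with P = Q^2
        # Arbitrage buying lifts Q when the pool is too cheap: R > -ln(gamma).
        u_plus = kappa * max(R + log_gamma, 0.0) + beta_plus
        # Arbitrage selling lowers Q when the pool is too expensive: R < ln(gamma).
        u_minus = kappa * max(log_gamma - R, 0.0) + beta_minus
        active = 1.0 if q_l <= q <= q_r else 0.0
        # Token0 fees (down-moves, valued at S) plus token1 fees (up-moves).
        fee_rate = L * (1.0 - gamma) / gamma * active * (s / q * u_minus + q * u_plus)
        f.append(f[-1] + fee_rate * dt)
        Q.append(q * math.exp((u_plus - u_minus) * dt))
        z = rng.gauss(0.0, 1.0)
        S.append(s * math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z))
    return S, Q, f


# Example: one day on an hourly clock with a 5 bps fee tier (illustrative values).
S, Q, f = simulate_pool(T=24.0, n_steps=24_000, s0=1.0, q0=1.0, mu=0.0, sigma=0.005,
                        kappa=50.0, gamma=0.9995, beta_plus=0.1, beta_minus=0.1,
                        L=1.0, q_l=0.98, q_r=1.02)
print(f"final S={S[-1]:.4f}, final P={Q[-1] ** 2:.4f}, fees={f[-1]:.6f}")
```

Setting `sigma=0` and starting outside the band is a quick consistency check: $R_t$ should then converge monotonically to the nearest edge of the band.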
#### Properties

* **Convergence:** As the penalty factor $K$ increases, the solution to the penalized equation $(*)$ converges to the solution of the original HJB equation: a larger $K$ imposes a stronger penalty, forcing the solution of $(*)$ to stay closer to the constraint $v \geq g$.
* **Stochastic Control Interpretation:** The penalized equation $(*)$ can be reinterpreted as a stochastic control problem. Introducing a control variable $u \in \{0,1\}$, we can rewrite $(*)$ as
  $$v_t + h + \max_{u \in \{0,1\}} \{ \mathcal{L}v - Kuv + Kug \} = 0.$$
  Choosing $u = 1$ means stopping at rate $K$, i.e. at the first jump of a Poisson clock with intensity $K$, and collecting $g$; choosing $u = 0$ means continuing.

### Entropy Regularization of the Penalty Approximation

To further enhance the numerical stability and efficiency of the penalty approximation, we introduce an entropy regularization technique. Randomizing the control variable leads to the regularized HJB (rHJB) equation

$$
v_t + \mathcal{L}v + h + \max_{\pi \in [0,1]} \left\{ \pi K(g-v) - \frac{1}{\beta}\left[\pi\ln\pi + (1-\pi)\ln(1 - \pi)\right] \right\} = 0,
$$

where:

* $\pi$: Randomized control variable, representing the probability of applying the penalty (i.e. of choosing $u=1$).
* $\beta>0$: Regularization parameter; the entropy term is weighted by $1/\beta$, so a smaller $\beta$ means stronger regularization.

The term $-\frac{1}{\beta}\left[\pi\ln\pi + (1-\pi)\ln(1 - \pi)\right]$ is $1/\beta$ times the entropy of a Bernoulli($\pi$) variable. It rewards randomized policies and thus penalizes deterministic ones (where $\pi$ is close to 0 or 1), which encourages exploration, promotes smoother solutions, and can improve the convergence of numerical methods.

#### Derivation of the Semilinear PDE

The maximization problem in the rHJB equation can be solved analytically.
The maximum is attained at

$$ \pi^* = \frac{e^{\beta K(g-v)}}{1 + e^{\beta K(g-v)}}, $$

and the corresponding maximum value is

$$ \frac{1}{\beta} \ln\left\{ e^{\beta K(g-v)} + 1 \right\}. $$

Substituting this back into the rHJB equation, we obtain the following semilinear PDE:

$$ v_t + \mathcal{L}v + h + \frac{1}{\beta} \ln\left\{ e^{\beta K(g-v)} + 1 \right\} = 0, $$

with the terminal condition $v(T, s, q) = g(s, q)$.

#### Convergence to the Original Problem

The solution to this regularized problem converges to the solution of the original optimal stopping problem as the regularization parameter $\beta$ and then the penalty factor $K$ tend to infinity. More precisely, the value function of the original problem is given by the iterated limit

$$ \lim_{K\to\infty}\lim_{\beta\to\infty} v_{K, \beta}(t, s, q), $$

where $v_{K, \beta}$ denotes the solution to the semilinear PDE above with parameters $K$ and $\beta$. The optimal stopping time is the first time the state $(t, S_t, Q_t)$ enters the stopping region, where this limit equals the terminal wealth function:

$$ \lim_{K\to\infty}\lim_{\beta\to\infty} v_{K,\beta}(t,s,q) = g(s,q). $$

### Numerical Scheme

To solve the semilinear PDE derived from the entropy-regularized penalty approximation, we discretize it over a finite domain and apply an appropriate numerical method.

#### Feasible Domain

* **Reference Market Price $s$:** While theoretically $s \in (0,\infty)$, we can restrict the domain to a finite interval around the initial price $s_0$ without significant loss of accuracy. A reasonable choice is an interval covering approximately 6 standard deviations of the log-price $\ln S_T$ over the time horizon $[0,T]$.
This leads to:
  * If $\mu - \frac{\sigma^2}{2} > 0$ (upward drift): $s \in [s_0 e^{-6 \sigma \sqrt{T}}, s_0 e^{(\mu - \frac{\sigma^2}{2}) T + 6 \sigma \sqrt{T}}]$
  * If $\mu - \frac{\sigma^2}{2} \leq 0$ (downward or no drift): $s \in [s_0 e^{(\mu - \frac{\sigma^2}{2}) T - 6 \sigma \sqrt{T}}, s_0 e^{6 \sigma \sqrt{T}}]$
  * In practice, it suffices to choose $s \in [s_0 (1-k), s_0(1+k)]$ with $0 < k < 1$ large enough for this interval to contain the range above.
* **Transformed Pool Price $q$:** The transformed pool price $q$ should remain in a neighborhood of $\sqrt{s}$. We define the domain for $q$ similarly to that of $s$, ensuring it covers a sufficiently wide range around the initial pool price and the LP's price range $[q_l,q_r]$:
  * If $\mu - \frac{\sigma^2}{2} > 0$ (upward drift): $q \in [\sqrt{s_0}\, e^{-3 \sigma \sqrt{T}}, \sqrt{s_0}\, e^{\frac12 (\mu - \frac{\sigma^2}{2}) T + 3 \sigma \sqrt{T}}]$
  * If $\mu - \frac{\sigma^2}{2} \leq 0$ (downward or no drift): $q \in [\sqrt{s_0}\, e^{\frac12 (\mu - \frac{\sigma^2}{2}) T - 3 \sigma \sqrt{T}}, \sqrt{s_0}\, e^{3 \sigma \sqrt{T}}]$
  * In practice, it suffices to choose $q \in [\sqrt{s_0} (1-\frac{k}{2}), \sqrt{s_0}(1+\frac{k}{2})]$ with $k$ as above.

#### Boundary Conditions

* **Boundary Conditions for $v(t,s,q)$:** We apply Dirichlet boundary conditions, setting the value function $v(t,s,q)$ equal to the terminal wealth function $g(s,q)$ at the boundaries of the domain for $s$ and $q$. This treats reaching the edge of the truncated domain as a forced liquidation, an approximation whose effect shrinks as the domain grows.
* **Terminal Condition:** The terminal condition is given by $v(T,s,q) = g(s,q)$ for all $(s,q)$ in the domain.

#### Parameter Values

For numerical experiments, we suggest the following parameter values, inspired by historical data for ETH:

* **Drift $\mu$:** Hourly drift values in the range $\mu \in [-0.01, 0.01]$.
* **Volatility $\sigma$:** Hourly volatility values in the range $\sigma \in [0.002,0.008]$.
* **Arbitrage Speed $\kappa$:** Chosen such that $\sigma / \sqrt{\kappa} < 1-\gamma$ (the fee rate) to ensure reasonable price deviations between the pool and the reference market.
* **Initial Reference Price $s_0$:** Normalized to $s_0 =1$.
* **Initial Pool Price $q_0$:** Within the no-arbitrage band around the reference price: $q_0 \in [\sqrt{\gamma}, 1/\sqrt{\gamma}]$.
* **Fee Tier $1-\gamma$:** A representative value of $1-\gamma = 0.0005$ (5 bps), i.e. $\gamma = 0.9995$.
* **LP Price Range $[q_l,q_r]$:** Defined as $[q_l, q_r] = [\frac1{\sqrt{1 + k(1-\gamma)}}, \frac1{\sqrt{1 - k(1-\gamma)}}]$, where $k$ controls the width of the range (e.g., $0 < k \leq 10$).
* **Liquidity Level $L$:** Set to $L=1$ for simplicity.
* **Liquidation Cost $c$:** Set to $c = \gamma$, i.e. token0 is liquidated at the fair price net of the fee.

These values provide a starting point for exploring the model's behavior and the impact of different parameters on the optimal stopping strategy.

## Optimal Market Making for Uniswap v3 Liquidity Provision

This section tackles the challenge of optimal market making in Uniswap v3. We analyze an LP who actively manages their liquidity provision to maximize their expected risk-adjusted terminal wealth. This dynamic strategy contrasts with the passive "set and forget" approach examined in the previous section.

### LP Strategies

The LP controls their capital through two key actions:

* **Liquidity Provision:** The LP chooses to provide liquidity $L_t = \ell_t L$, where $\ell_t \in \{0,1\}$ represents their active state:
  * $\ell_t = 1$: Actively provide liquidity, concentrating capital around the current pool price.
  * $\ell_t = 0$: Temporarily withdraw liquidity from the pool.
* **Risk Management:** The LP implements two risk mitigation measures:
  * **Price Range Constraint:** A predefined price range $[q_l, q_r]$ in the transformed price space limits risk. If the root price $q_t$ hits either boundary, the LP liquidates their entire position (stopping time $\tau^1_t$).
  * **Inventory Constraint:** A maximum absolute inventory level $M$ for the risky asset controls exposure. If the inventory $X_t$ reaches $\pm M$, the LP liquidates their position (stopping time $\tau^2_t$).
The overall liquidation time is $\tau_t = \min(\tau^1_t, \tau^2_t) \wedge T$, so the position is closed at the earliest of hitting the price range boundary, hitting the inventory limit, or the terminal time $T$.

### Price and Liquidity Dynamics

We retain the price dynamics of the optimal stopping problem, with a driftless fair price ($\mu = 0$):

$$
\begin{split}
\frac{d S_t}{S_t} &= \sigma\, dW_t \\
\frac{d Q_t}{Q_t} &= (u^+_t - u^-_t)\, dt \\
u^+_t &= \kappa \left(-\ln \gamma - \ln \frac{S_t}{P_t} \right)^- + \beta^+ = \kappa \left(-\ln \gamma - \ln S_t + 2 \ln Q_t \right)^- + \beta^+ \\
u^-_t &= \kappa \left(\ln \gamma - \ln \frac{S_t}{P_t} \right)^+ + \beta^- = \kappa \left(\ln \gamma - \ln S_t + 2 \ln Q_t \right)^+ + \beta^-
\end{split}
$$

However, the LP's inventory $X_t$ and cash holdings $Y_t$ now evolve according to their liquidity provision strategy:

$$
\begin{split}
dX_t &= \ell_t L (\gamma^{-1} u^-_t - u^+_t) \pmb{1}_{[q_l, q_r]}(Q_t) Q_t^{-1}\, dt \\
dY_t &= \ell_t L (\gamma^{-1} u^+_t - u^-_t) \pmb{1}_{[q_l, q_r]}(Q_t) Q_t\, dt
\end{split}
$$

### Optimal Market Making Problem

The LP aims to maximize their expected risk-adjusted terminal wealth, considering both the final portfolio value and a penalty for holding inventory:

$$
\max_{\ell \in \mathcal{A}_{0, T}} \mathbb{E} \left[ g(S_\tau, X^{\ell}_{\tau}, Y^{\ell}_{\tau}) + \int_0^{\tau} h(t,S_t,X^{\ell}_t,Y^{\ell}_t)\, dt \right]
$$

where:

* $\ell = (\ell_t)_{0 \leq t \leq T}$: The LP's dynamic liquidity provision strategy.
* $g(s,x,y) = y + \gamma s x^+ - \gamma^{-1} s x^-$: Terminal wealth function, liquidating the remaining inventory at the fair price net of the fee $1-\gamma$ (for $x \geq 0$ this is $y + c s x$ with liquidation cost $c = \gamma$).
* $h(t,s,x,y) = - \phi x^2$: Inventory penalty function with parameter $\phi$ reflecting inventory risk aversion.
* $X^{\ell}_t$, $Y^{\ell}_t$: Inventory and cash holdings under strategy $\ell$.
* $\mathcal{A}_{0, T}$: Set of admissible strategies.
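A simple Monte Carlo baseline for the market-making problem is to freeze the control at $\ell_t \equiv 1$ or $\ell_t \equiv 0$ and simulate the state until the liquidation time $\tau$. The sketch below does this with an exact step for $S_t$, a log-Euler step for $Q_t$, and explicit Euler steps for $X_t$, $Y_t$ and the running penalty $\phi X_t^2$. It assumes the terminal liquidation $g(s,x,y) = y + \gamma s x^+ - \gamma^{-1} s x^-$ and the arbitrage-activated order flow convention ($u^+$ active when $R_t > -\ln\gamma$, $u^-$ when $R_t < \ln\gamma$); the name `run_constant_strategy` and the parameter values are hypothetical.

```python
import math
import random


def run_constant_strategy(ell, T, n_steps, s0, q0, sigma, kappa, gamma,
                          beta_plus, beta_minus, L, q_l, q_r, M, phi, seed=0):
    """Simulate the market-making state under a constant control ell in {0, 1}.

    Hypothetical helper. Returns (tau, reward), where tau is the liquidation
    time (exit from (q_l, q_r), |X| reaching M, or T) and
    reward = g(S_tau, X_tau, Y_tau) - phi * int_0^tau X_t^2 dt, assuming
    g(s, x, y) = y + gamma * s * x^+ - s * x^- / gamma.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    log_gamma = math.log(gamma)
    s, q, x, y = s0, q0, 0.0, 0.0
    t, penalty = 0.0, 0.0
    for _ in range(n_steps):
        # Liquidation triggers: root price leaves (q_l, q_r) or |inventory| hits M.
        if not (q_l < q < q_r) or abs(x) >= M:
            break
        R = math.log(s) - 2.0 * math.log(q)
        u_plus = kappa * max(R + log_gamma, 0.0) + beta_plus
        u_minus = kappa * max(log_gamma - R, 0.0) + beta_minus
        penalty += phi * x * x * dt  # running cost phi x^2
        x += ell * L * (u_minus / gamma - u_plus) / q * dt  # dX_t
        y += ell * L * (u_plus / gamma - u_minus) * q * dt  # dY_t
        q *= math.exp((u_plus - u_minus) * dt)  # dQ/Q = (u^+ - u^-) dt
        s *= math.exp(-0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        t += dt
    terminal = y + gamma * s * max(x, 0.0) - s * max(-x, 0.0) / gamma
    return t, terminal - penalty


# Same market and same noise (seed), with and without liquidity provision.
params = dict(T=24.0, n_steps=24_000, s0=1.0, q0=1.0, sigma=0.005, kappa=50.0,
              gamma=0.9995, beta_plus=0.1, beta_minus=0.1, L=1.0,
              q_l=0.98, q_r=1.02, M=10.0, phi=0.01, seed=7)
print(run_constant_strategy(1, **params), run_constant_strategy(0, **params))
```

Averaging the reward over many seeds gives a crude estimate of the value of each constant strategy; up to discretization and sampling error, the HJB-optimal feedback control $\ell^*$ should do at least as well.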
#### Hamilton-Jacobi-Bellman (HJB) Equation

The optimal market-making problem can be expressed as an HJB equation:

$$
\begin{cases}
v_t + \max_{\ell \in \{0,1\}} \mathcal{L}^{\ell}v + h = 0, & (t,s,q,x,y) \in [0, T) \times \mathbb{R}^2_+ \times \mathbb{R}^2 \\
v(T,s,q,x,y) = g(s,x,y) \\
v(t,s,q_l,x,y) = v(t,s,q_r,x,y) = g(s,x,y) \\
v(t,s,q,M,y) = g(s,M,y) \\
v(t,s,q,-M,y) = g(s,-M,y)
\end{cases}
$$

where:

* $v(t,s,q,x,y)$: The value function, representing the maximum expected risk-adjusted wealth achievable starting from state $(s,q,x,y)$ at time $t$.
* $\mathcal{L}^{\ell}$: The infinitesimal generator of the state process, given by:
  $$
  \begin{split}
  \mathcal{L}^{\ell} v(t,s,q,x,y) &= \frac12 \sigma^2 s^2 v_{ss} + (u^+ - u^-) q v_q \\
  &\quad + \ell L \left\{ (\gamma^{-1} u^- - u^+) q^{-1} v_x + (\gamma^{-1} u^+ - u^-) q v_y \right\}
  \end{split}
  $$

#### Optimal Control

Using the fact that $\max_{\ell\in\{0,1\}} \{\ell X\} = X_+$ for any $X \in \mathbb{R}$, the HJB equation can be rewritten as:

$$
\begin{split}
0 = v_t &+ \frac12 \sigma^2 s^2 v_{ss} + (u^+ - u^-)q v_q - \phi x^2 \\
&+ L \left\{(\gamma^{-1} u^- - u^+) q^{-1} v_x + (\gamma^{-1} u^+ - u^-) q v_y \right\}_+
\end{split}
$$

The optimal control $\ell^*$ is then given in feedback form by:

$$
\ell^* = \pmb{1}_{\{(\gamma^{-1} u^- - u^+) q^{-1} v_x + (\gamma^{-1} u^+ - u^-) q v_y > 0\}}
$$

#### Ansatz

To further simplify the HJB equation, we employ the following ansatz:

$$
v(t,s,q,x,y) = y + c x s + w(t,s,q,x)
$$

This ansatz decomposes the value function into:

* the portfolio value $y + c x s$,
* and an additional component $w(t,s,q,x)$ that captures the value of optimally controlling liquidity provision.
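The bang-bang feedback rule $\ell^* = \pmb{1}_{\{G > 0\}}$ can be evaluated pointwise once the derivatives of the value function are available. The sketch below uses hypothetical helper names and also checks numerically that, under the ansatz ($v_x = cs + w_x$, $v_y = 1$), the rule agrees with its form in terms of $w$, obtained by multiplying the condition by $q > 0$.

```python
import random


def liquidity_gain(v_x, v_y, q, u_plus, u_minus, gamma):
    """G[v] = (u^-/gamma - u^+) v_x / q + (u^+/gamma - u^-) q v_y."""
    return (u_minus / gamma - u_plus) * v_x / q + (u_plus / gamma - u_minus) * q * v_y


def feedback_control(v_x, v_y, q, u_plus, u_minus, gamma):
    """l* = 1 if G[v] > 0 else 0: provide liquidity only when it raises the value."""
    return 1 if liquidity_gain(v_x, v_y, q, u_plus, u_minus, gamma) > 0 else 0


def feedback_control_ansatz(w_x, s, q, c, u_plus, u_minus, gamma):
    """Same rule in terms of w, i.e. q * G[v] with v_x = c s + w_x and v_y = 1."""
    lhs = (u_minus / gamma - u_plus) * (w_x + c * s) + q**2 * (u_plus / gamma - u_minus)
    return 1 if lhs > 0 else 0


# Random spot checks that both forms of the rule agree.
rng = random.Random(1)
gamma, c = 0.9995, 0.9995
for _ in range(1000):
    w_x, s, q = rng.uniform(-1.0, 1.0), rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)
    u_p, u_m = rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)
    assert feedback_control(c * s + w_x, 1.0, q, u_p, u_m, gamma) == \
        feedback_control_ansatz(w_x, s, q, c, u_p, u_m, gamma)
print("both forms of the feedback rule agree on all samples")
```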
Substituting this ansatz into the HJB equation (so that $v_y = 1$, $v_x = cs + w_x$ and $v_{ss} = w_{ss}$) leads to a simplified equation for $w(t,s,q,x)$:

$$
\begin{split}
0 = w_t &+ \frac12 \sigma^2 s^2 w_{ss} + (u^+ - u^-) q w_q - \phi x^2 \\
&+ L \left\{(\gamma^{-1} u^- - u^+) (w_x + cs) q^{-1} + (\gamma^{-1} u^+ - u^-) q \right\}_+
\end{split}
$$

with the terminal and boundary conditions:

$$
\begin{cases}
w(T,s,q,x) = 0\\
w(t,s,q_l,x) = w(t,s,q_r,x) = 0 \\
w(t,s,q,M) = w(t,s,q,-M) = 0
\end{cases}
$$

Multiplying the feedback condition by $q > 0$, the optimal control can then be expressed as:

$$
\ell^* = \pmb{1}_{\{(\gamma^{-1} u^- - u^+) (w_x + cs) + q^2 (\gamma^{-1} u^+ - u^-) > 0\}}
$$

### Entropy Regularized Control

To enhance the numerical stability and efficiency of solving the optimal market-making problem, we introduce an entropy-regularized version. This involves considering a randomized control strategy and adding an entropy bonus to the objective function:

$$
\max_{\pi\in\mathcal{A}[0, \tau]} \mathbb{E} \left[g(S_\tau, X_\tau, Y_\tau) + \int_0^\tau \left\{ h(t, S_t, X_t, Y_t) + \frac1\beta H(\pi_t) \right\} dt \right]
$$

where:

* $\pi = (\pi_t)_{0 \leq t \leq \tau}$: The LP's randomized liquidity provision strategy. At each time $t$, $\pi_t$ is a probability distribution over the action space $\{0,1\}$, giving the probabilities of providing liquidity ($\ell=1$) or not ($\ell=0$).
* $H(\pi)$: The entropy of the distribution $\pi$, which measures the randomness of the strategy. For the binary action space, the entropy is
  $$H(\pi) = -p\ln p - (1-p)\ln(1-p),$$
  where $p = \mathbb{P}[\ell = 1]$ is the probability of providing liquidity.
* $\beta>0$: The inverse temperature parameter, which controls the strength of the entropy regularization. A smaller $\beta$ gives more weight to the entropy bonus; as $\beta \to \infty$ the regularization vanishes.

The entropy regularization encourages exploration by penalizing strategies that are too deterministic (i.e., those where $\pi_t$ puts all the probability mass on a single action).
This can lead to smoother solutions and improved numerical stability.

#### Regularized HJB (rHJB) Equation

The value function for the entropy-regularized problem is defined as:

$$
v(t, s, q, x, y) := \max_{\pi\in\mathcal{A}[t, \tau]} \mathbb{E}_{t,s,q,x,y} \left[g(S_\tau, X_\tau, Y_\tau) + \int_t^\tau \left\{ h(r, S_r, X_r, Y_r) + \frac1\beta H(\pi_r) \right\} dr \right]
$$

The dynamic programming principle leads to the following rHJB equation:

$$
\begin{split}
0 =& v_t + \frac12 \sigma^2 s^2 v_{ss} + (u^+ - u^-)q v_q - \phi x^2 \\
& + L \max_{\pi} \left\{\mathbb{E}_{\pi}\left[\ell (\gamma^{-1} u^- - u^+) q^{-1} v_x + \ell (\gamma^{-1} u^+ - u^-) q v_y \right] + \frac1\beta H(\pi) \right\}
\end{split}
$$

subject to the boundary conditions:

$$
v(t,s,q,x,y) = y + c x s \quad \text{if } q = q_l,\ q = q_r,\ x=M,\ x=-M \text{ or } t=T.
$$

Writing $p = \mathbb{P}_\pi[\ell = 1]$, the maximization problem within the rHJB equation becomes

$$
\begin{split}
& \max_\pi \left\{ \mathbb{E}_{\pi}\left[\ell (\gamma^{-1} u^- - u^+) q^{-1} v_x + \ell (\gamma^{-1} u^+ - u^-) q v_y\right] + \frac1\beta H(\pi) \right\} \\
=& \max_{p \in [0,1]} \left\{ p\, G[v] - \frac1\beta \left( p\ln p + (1-p)\ln(1-p) \right) \right\},
\end{split}
$$

with

$$
G[v] = (\gamma^{-1} u^- - u^+) q^{-1} v_x + (\gamma^{-1} u^+ - u^-) q v_y.
$$

Since the objective is strictly concave in $p$, the first-order condition $G[v] - \frac1\beta \ln\frac{p}{1-p} = 0$ implies that the unique maximum is attained at

$$
p^* = \frac{e^{\beta G[v]}}{1 + e^{\beta G[v]}} = \frac12 \left\{1 + \tanh\left( \frac\beta2 G[v] \right) \right\}.
$$

Substituting this optimal control back into the rHJB equation, we obtain the following nonlinear PDE:

$$
0 = v_t + \frac12 \sigma^2 s^2 v_{ss} + (u^+ - u^-)q v_q - \phi x^2 + \frac L \beta \ln (1 + e^{\beta G[v]}).
$$

##### Smoothing Effect

The entropy regularization effectively smooths the control problem.
As

$$
\lim_{\beta\to\infty} \frac1\beta \ln (1 + e^{\beta G[v]}) = \left\{\begin{array}{ll} G[v] & \mbox{ if } G[v] > 0 \\ 0 & \mbox{ if } G[v] \leq 0 \end{array}\right. = (G[v])_+,
$$

the entropy term converges to $(G[v])_+$, recovering the original HJB equation. For finite $\beta$ the term is smooth in $G[v]$ and satisfies $(G[v])_+ \le \frac1\beta \ln(1 + e^{\beta G[v]}) \le (G[v])_+ + \frac{\ln 2}{\beta}$; this smoothing can improve the analytical and numerical tractability of the problem.

### Numerical Scheme

To solve the nonlinear PDE derived from the entropy-regularized optimal market-making problem, we discretize it over a bounded domain and apply a suitable numerical method.

#### Feasible Domain

To implement a numerical solution for the optimal market-making problem, we need to define a suitable domain for the state variables:

* **Reference Market Price $s$** and **Transformed Pool Price $q$:** We use the same domains for $s$ and $q$ as in the optimal stopping problem.
* **Inventory $x$:** The inventory is restricted to the bounded interval $x \in [-M,M]$, reflecting the inventory constraint in the LP's risk management strategy.

#### Boundary Conditions

Accurate and stable numerical solutions require careful treatment of the boundary conditions.

* **Boundary Conditions for $w(t,s,q,x)$:** We apply Dirichlet boundary conditions by setting $w(t,s,q,x)=0$ at the boundaries of the domain for $s$, $q$, and $x$. At $q \in \{q_l, q_r\}$ and $x = \pm M$ this is exact, since the LP liquidates their position there and no option value from continued liquidity provision remains; at the truncation edges of the $s$-domain it is an approximation.
* **Terminal Condition:** The terminal condition $w(T,s,q,x)=0$ is enforced for all $(s,q,x)$ within the feasible domain. This signifies that no option value remains at the terminal time.
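The smoothing effect, and the logistic form of the optimal randomized control, can be checked numerically. The sketch below (hypothetical helper names) evaluates $\frac1\beta\ln(1+e^{\beta G})$ in an overflow-safe form together with the optimal probability $p^* = \frac12\{1+\tanh(\frac\beta2 G)\}$; the same formulas apply to the optimal stopping problem with $G$ replaced by $K(g-v)$.

```python
import math


def soft_plus_term(gain, beta):
    """(1/beta) ln(1 + e^{beta * gain}), written to avoid overflow for large beta*gain."""
    z = beta * gain
    # ln(1 + e^z) = max(z, 0) + ln(1 + e^{-|z|})
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def regularized_policy(gain, beta):
    """Optimal probability of providing liquidity: p* = (1 + tanh(beta * gain / 2)) / 2."""
    return 0.5 * (1.0 + math.tanh(0.5 * beta * gain))


for beta in (1.0, 10.0, 100.0, 1000.0):
    print(f"beta={beta:>6}: soft(+0.3)={soft_plus_term(0.3, beta):.6f}, "
          f"soft(-0.3)={soft_plus_term(-0.3, beta):.6f}, "
          f"p*(0.3)={regularized_policy(0.3, beta):.6f}")
```

As $\beta$ grows, the soft-plus term approaches $G_+$ and $p^*$ approaches the bang-bang control $\pmb{1}_{\{G>0\}}$.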
#### Parameter Values

For numerical experiments and to illustrate the optimal market-making strategy, we propose the following parameter values:

* **Consistent with Optimal Stopping:** We keep the same values of $\sigma$, $\kappa$, $\gamma$, $s_0$, $q_0$, $[q_l, q_r]$ and $L$ as in the optimal stopping problem (the drift is set to $\mu = 0$ here). This allows for direct comparison and highlights the impact of dynamic liquidity management.
* **Initial Conditions:** The LP starts with no initial inventory and zero cash: $x_0 = y_0 = 0$.
* **Inventory Constraints:** The inventory bound is set to $M=10$, reflecting the LP's risk tolerance.
* **Inventory Risk Parameters:** The inventory cost parameter is set to $\phi=0.01$.

## References

* Dai, M., Sun, Y., Xu, Z. Q., & Zhou, X. Y. (2024). Learning to optimally stop diffusion processes, with financial applications. SSRN. http://dx.doi.org/10.2139/ssrn.4928749
* Guilbaud, F., & Pham, H. (2015). Optimal high-frequency trading in a pro rata microstructure with predictive information. Mathematical Finance, 25(3), 545–575. https://doi.org/10.1111/mafi.12042
* Cartea, Á., Jaimungal, S., & Penalva, J. (2015). Algorithmic and high-frequency trading. Cambridge University Press.
* Guéant, O. (2017). Optimal market making. Applied Mathematical Finance, 24(2), 112–154. https://doi.org/10.1080/1350486X.2017.1342552
* Wang, M., & Wang, T. H. (2023). Relative entropy-regularized robust optimal order execution. arXiv preprint arXiv:2311.06476.