# Order-Book Trading Algorithms

## Understanding the Order Book

The order book is the cornerstone of electronic financial markets, providing a real-time, transparent view of market supply and demand. It acts as a central ledger where buyers and sellers submit orders, facilitating price discovery and efficient trade execution.

### Double Auction Mechanism

In a **double auction**, both buyers and sellers submit their orders to the market. Orders consist of bids from buyers and asks (or offers) from sellers, each specifying a price and quantity. The auction then matches these bids and asks based on their prices, leading to trades when compatible bids and asks are found.

The key features of a double auction include:

* **Multiple Buyers and Sellers**: The system accommodates numerous participants on both the buying and selling sides, enhancing liquidity and depth in the market.
* **Price Discovery**: The auction determines a clearing price $P$ at which trades are executed. This price reflects the balance between supply (asks) and demand (bids) at the moment.
* **Market Clearing**: Ideally, the auction clears the market by matching all bids above or equal to $P$ with all asks below or equal to $P$, optimizing the volume of trades.

### Order Book Structure

* **Bids and Asks**: The order book is divided into two main sections: *bids (buy orders)* and *asks (sell orders)*, each sorted by price. The highest bid and the lowest ask are prominently displayed as they represent the best available prices.
* **Depth of Market**: The order book shows the depth of the market by listing the quantity available at each price level beyond the best bid and ask. This depth is crucial for understanding the liquidity and potential price impact of large orders.
* **Price Levels**: Each unique price at which orders are submitted forms a price level in the order book. Orders at the same price are aggregated, showing the total quantity available at that price.
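As a rough illustration of the structure described above, the following sketch aggregates individual orders into price levels and reads off the best bid and best ask. The helper `build_levels` and the sample orders are hypothetical, chosen only for illustration; prices are integer ticks to keep the arithmetic exact.

```python
from collections import defaultdict


def build_levels(orders):
    """Aggregate (side, price, size) orders into sorted price levels.

    Returns (bids, asks) as lists of (price, total_size):
    bids sorted by descending price, asks by ascending price.
    """
    totals = {"buy": defaultdict(int), "sell": defaultdict(int)}
    for side, price, size in orders:
        totals[side][price] += size  # orders at the same price are aggregated
    bids = sorted(totals["buy"].items(), reverse=True)
    asks = sorted(totals["sell"].items())
    return bids, asks


# Hypothetical sample orders: (side, price in ticks, size)
orders = [
    ("buy", 99, 10), ("buy", 99, 5), ("buy", 98, 20),
    ("sell", 101, 7), ("sell", 102, 12), ("sell", 101, 3),
]
bids, asks = build_levels(orders)
best_bid, best_ask = bids[0][0], asks[0][0]
print(bids)                # [(99, 15), (98, 20)]
print(asks)                # [(101, 10), (102, 12)]
print(best_bid, best_ask)  # 99 101
```

Keeping each side as a list sorted from the best price outward means the first entry is always the best quote, and the rest of the list is the visible depth.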
#### Limit Orders

* **Limit Orders (LO)** specify a trade price $P$ and size $N$.
* **Buy LOs (Bids)**: $\{(P_i^b, N_i^b) \mid 0 \leq i < m\}$ where $P_i^b > P_j^b$ for $i < j$.
* **Sell LOs (Asks)**: $\{(P_i^a, N_i^a) \mid 0 \leq i < n\}$ where $P_i^a < P_j^a$ for $i < j$.
* The order book aggregates order sizes for each unique price.

**Definitions**

* **Best Bid** $P_0^b$: Highest buy order price.
* **Best Ask** $P_0^a$: Lowest sell order price.
* **Mid Price** $\frac{P_0^a + P_0^b}{2}$: Average of the Best Bid and Best Ask.
* **Spread** $P_0^a - P_0^b$: Difference between the Best Ask and Best Bid.
* **Market Depth** $P_{n-1}^a - P_{m-1}^b$: Price difference between the deepest ask and the deepest bid.

![LOB](https://hackmd.io/_uploads/HkqSNiaR6.png)

```
% OrderBook class
```

#### Market Orders

A **Market Order (MO)** states the intent to buy or sell $N$ shares at the best price(s) available in the order book (OB) at the time of submission.

### The Matching Process

The order book continuously matches bids and asks following these principles:

* **Price-Time Priority**: Orders with the highest bid (buy) or lowest ask (sell) are prioritized. If prices are equal, orders submitted earlier are filled first.
* **Trade Execution**: When compatible orders are matched, a trade occurs at the agreed-upon price. The order book updates to reflect the new state of the market.

### Order Book Dynamics

The order book is constantly changing as traders submit, cancel, and modify orders.
Here's how it reacts to different order types:

* New Sell LO $(P, N)$:
    * Potential Removal of Best Bids: $$ \text{Removal}: \left\{ \left(P^b_i, \min\left(N^b_i, \max\left(0, N - \sum^{i-1}_{j=0} N^b_j\right)\right)\right) \mid i: P^b_i \geq P \right\} $$
    * Addition to Asks: $$ \left(P, \max \left(0, N - \sum_{i: P^b_i \geq P} N^b_i\right)\right) $$
* New Buy LO $(P, N)$:
    * Potential Removal of Best Asks: $$ \text{Removal}: \left\{ \left(P^a_i, \min\left(N^a_i, \max\left(0, N - \sum^{i-1}_{j=0} N^a_j\right)\right)\right) \mid i: P^a_i \leq P \right\} $$
    * Addition to Bids: $$ \left(P, \max \left(0, N - \sum_{i: P^a_i \leq P} N^a_i\right)\right) $$
* Sell Market Order $N$: Removes shares from the best bids downward. $$ \text{Removal}: \left\{ \left(P^b_i, \min\left(N^b_i, \max\left(0, N - \sum^{i-1}_{j=0} N^b_j\right)\right)\right) \mid 0 \leq i < m \right\} $$
* Buy Market Order $N$: Removes shares from the best asks upward. $$ \text{Removal}: \left\{ \left(P^a_i, \min\left(N^a_i, \max\left(0, N - \sum^{i-1}_{j=0} N^a_j\right)\right)\right) \mid 0 \leq i < n \right\} $$

In each removal set, the quantity taken from level $i$ is whatever part of the order is left after all better-priced levels have been consumed, capped at the size $N_i$ resting at that level.

### Price Impact: How Market Orders Affect Prices

**Price impact** refers to the degree to which a trade, especially a large one, influences the market price of a security or asset. This effect is particularly significant when market orders are used for substantial trades.

#### How Price Impact Works

1. **Large Market Order**: A sizable market order to buy or sell can quickly consume the liquidity available at the best prices, so the rest of the order is filled at progressively worse prices. This drives the price up (for a buy order) or down (for a sell order).
2. **Widening Spread**: As the available shares at the best prices are depleted, the spread (the difference between the best ask and the best bid) widens. This means the market becomes temporarily less liquid.
3. **Order Book Replenishment**: The widened spread can attract new limit orders from other traders seeking to profit from the shifted price levels. This replenishment helps narrow the spread, but the price may settle at a new level, either higher or lower than before the large order.
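The removal rule for a sell market order, and the price impact it causes, can be sketched as follows. This is a minimal illustration, not a full matching engine: the function name `sell_market_order` and the sample book are hypothetical, and each price level is treated as one aggregated quantity (time priority within a level is ignored).

```python
def sell_market_order(bids, n):
    """Match a sell market order of size n against resting bids.

    bids: list of (price, size) levels, best (highest) price first.
    Returns (fills, remaining_bids), where the quantity taken from level i
    is min(N_i, max(0, n - sum of sizes at better levels)).
    """
    fills, remaining = [], []
    depth_above = 0  # total size resting at levels better than the current one
    for price, size in bids:
        take = min(size, max(0, n - depth_above))
        if take > 0:
            fills.append((price, take))
        if size > take:
            remaining.append((price, size - take))
        depth_above += size
    return fills, remaining


# Hypothetical book: best bid 99 with 15 shares, then 98 and 97
bids = [(99, 15), (98, 20), (97, 30)]
fills, remaining = sell_market_order(bids, 25)
filled = sum(size for _, size in fills)
avg_price = sum(price * size for price, size in fills) / filled
print(fills)      # [(99, 15), (98, 10)]
print(remaining)  # [(98, 10), (97, 30)]
print(avg_price)  # 98.6
```

In this example the order exhausts the best level, so the best bid drops from 99 to 98 and the average fill price of 98.6 is below the pre-trade best bid. The gap between the two is the cost of walking the book.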
#### Key Factors

* **Order Size vs. Market Depth**: The magnitude of price impact depends on how big the order is relative to the available liquidity (number of shares offered) in the order book.
* **Buy vs. Sell**: Large buy orders push prices up; large sell orders push prices down.
* **Temporary vs. Permanent**: Price impact can be temporary if the market rebalances quickly. However, if the large order reflects fundamental information that wasn't previously reflected in the price, the impact could be longer-lasting.

## Optimal Execution of a Market Order

### Objective

To liquidate a large number of shares, denoted by $N$, within a preset timeframe of $T$ discrete intervals.

#### Constraints

* The trading strategy is restricted to the use of **Market Orders** only.
* Both the **Temporary** and the **Permanent Price Impact** of each sale on the asset's market price must be taken into account.

**Goal**: Maximize the expected total utility derived from the proceeds of the sale.

#### Strategic Insights

* **Rapid Sales**: Executing large sales quickly can depress prices because the market must absorb the increased supply, resulting in lower proceeds.
* **Gradual Sales**: Spreading the sale more evenly over time reduces the price impact of each trade, but it prolongs the exposure to market volatility, which makes the total proceeds less predictable.

This trade-off between immediate market impact and exposure to longer-term price fluctuations can be framed and analyzed as a **Markov Decision Process (MDP)**.

### Setup

* **Time Division**: The process is segmented into discrete intervals, labeled by $t = 0, 1, \dots, T$.
* **Best Bid Price** at the start of each interval is denoted by $P_t$.
* **Number of Shares Sold** within each interval is denoted by $N_t$.
* **Remaining Shares** to be sold at the beginning of each interval are denoted by $R_t$, calculated as $R_t = N - \sum^{t-1}_{i=0} N_i$, where:
    * Initially, $R_0 = N$.
    * For each subsequent interval, $R_{t+1} = R_t - N_t$ for $0 \leq t < T$.
    * At the final step, $N_{T-1} = R_{T-1}$, so that the sale is complete ($R_T = 0$).
* **Price Dynamics** are modeled as $P_{t+1} = f_t(P_t, N_t, \epsilon_t)$, incorporating:
    * The Permanent Price Impact resulting from the sale of $N_t$ shares.
    * Variations in the Best Bid Price independent of the sale's impact.
    * Random elements influencing price movements, denoted by $\epsilon_t$.
* **Sales Proceeds** for each interval $t$ are $$ N_t \cdot Q_t = N_t \cdot (P_t - g_t(P_t, N_t)), $$ where $g_t$ is the Temporary Price Impact, so $Q_t$ is the effective price received per share.
* **Utility Function** $U$ quantifies the satisfaction derived from the sales proceeds.

### Markov Decision Process Formulation

This scenario is structured as a finite-horizon, discrete-time MDP:

* MDP Horizon: $T$
* Order of Events at time step $t$ for $t=0,1,\dots,T-1$:
    * Observe State $s_t = (P_t, R_t) \in \mathcal{S}_t$
    * Perform Action $a_t = N_t \in \mathcal{A}_t$
    * Receive Reward $r_{t+1} = U(N_t \cdot Q_t)$
    * Experience Price Dynamics $P_{t+1} = f_t(P_t, N_t, \epsilon_t)$ as a consequence of the action and set $R_{t+1} = R_t - N_t$.

The goal is to find the **optimal policy** $\pi^* = (\pi^*_0, \pi^*_1, \dots, \pi^*_{T-1})$ (defined as $\pi^*_t(P_t, R_t) = N^*_t$) that maximizes:

$$
\mathbb{E} \left[\sum^{T-1}_{t=0} \gamma^t U(N_t \cdot Q_t) \right]
$$

where $\gamma$ denotes the discount factor.

```
#optimal order execution
```

### A Simple Linear Price Impact Model with No Risk-Aversion

In this model, we explore the execution of a market order with the intention to sell a large quantity of shares, $N$, within a finite timeframe, partitioned into $T$ discrete intervals.
The focus is on understanding the implications of executing such an order, especially in terms of price impact, under a simplified linear model.

#### Model Assumptions

* The price dynamics are governed by a linear equation: $P_{t+1} = P_t - \alpha N_t + \epsilon_t$, where:
    * $\alpha \in \mathbb{R}$ represents the coefficient of permanent price impact.
    * $\epsilon_t$ is a random variable representing exogenous price movements, assumed to be independent and identically distributed with an expected value of 0 given the current state and action.
* The temporary price impact of selling $N_t$ shares is captured by a linear function, $\beta N_t$, leading to sales proceeds of $N_t (P_t - \beta N_t)$ per time step.
* The utility function $U$ is the identity function, reflecting a risk-neutral preference.
* The discount factor is $\gamma=1$.

#### Optimal Value Function and Bellman Equation

We use the following definitions to characterize the optimal value function:

* **Value Function Under Policy $\pi$**: the expected proceeds from time $t$ onward, given the state at time $t$, $$ V^{\pi}_t(P_t,R_t) = \mathbb{E}_{\pi} \left[ \sum^{T-1}_{i=t} N_i (P_i - \beta N_i) \,\middle|\, (P_t,R_t) \right]. $$
* **Optimal Value Function**: $$ V^*_t(P_t,R_t) = \max_{\pi} V^{\pi}_t(P_t,R_t). $$
* **Bellman Equation**: For each time step $0 \leq t < T-1$, $$ V^*_t(P_t,R_t) = \max_{N_t} \{ N_t (P_t - \beta N_t) + \mathbb{E}[V^*_{t+1}(P_{t+1},R_{t+1})] \}, \quad (*) $$ with the terminal condition at $T-1$, where all remaining shares must be sold:

$$
V^*_{T-1}(P_{T-1},R_{T-1}) = N_{T-1} (P_{T-1} - \beta N_{T-1}) = R_{T-1} (P_{T-1} - \beta R_{T-1}).
$$

#### Derivation of the Optimal Policy and Value Function

For the second-to-last time step, $T-2$, substituting the terminal condition into the Bellman equation gives

\begin{split}
V^*_{T-2}(P_{T-2},R_{T-2}) &= \max_{N_{T-2}} \{ N_{T-2} (P_{T-2} - \beta N_{T-2}) + \mathbb{E}[ R_{T-1} (P_{T-1} - \beta R_{T-1})] \} \\
&= \max_{N_{T-2}} \{ N_{T-2} (P_{T-2} - \beta N_{T-2}) + \mathbb{E}[ (R_{T-2}-N_{T-2}) (P_{T-1} - \beta (R_{T-2} - N_{T-2}) ) ] \} \\
&= \max_{N_{T-2}} \{ R_{T-2}P_{T-2} - \beta R^2_{T-2} + (\alpha-2\beta) (N_{T-2}^2 - N_{T-2} R_{T-2}) \},
\end{split}

where the last line uses $\mathbb{E}[P_{T-1}] = P_{T-2} - \alpha N_{T-2}$.

#### Case $\alpha \geq 2 \beta$

* **Optimal Sales Strategy**: Since $\alpha - 2\beta \geq 0$ and $N_{T-2}^2 - N_{T-2} R_{T-2} \leq 0$ for $0 \leq N_{T-2} \leq R_{T-2}$, the maximum is attained at an endpoint: either sell everything now, $N^*_{T-2} = R_{T-2}$, or sell nothing now, $N^*_{T-2} = 0$, and sell everything at $T-1$.
* **Value Function Insight**: Either choice makes the last term vanish, so $$ V^*_{T-2}(P_{T-2},R_{T-2}) = R_{T-2}(P_{T-2}-\beta R_{T-2}), $$ which is the expected value of selling the whole remaining position in a single step.
* **Backward Induction Outcome**: Repeatedly applying this reasoning for each preceding time step confirms that
    * the optimal action $N^*_t$ is binary: sell all remaining shares or nothing.
    * the expected total proceeds under this strategy equal $N(P_0 - \beta N)$.

#### Case $\alpha < 2 \beta$

* **First-Order Condition**: The objective is now strictly concave in $N_{T-2}$, so setting its derivative $(\alpha - 2\beta)(2N_{T-2} - R_{T-2})$ to zero gives $N^*_{T-2} = \frac{R_{T-2}}{2}$.
* **Value Function Adjustment**: With $N^*_{T-2}$ determined, the value function becomes: $$ V^*_{T-2}(P_{T-2},R_{T-2}) = R_{T-2}P_{T-2} - R^2_{T-2} \left(\frac{\alpha + 2 \beta}{4}\right). $$
* **Generalization for All Time Steps**: Extending this logic backward across all time steps, the derived strategy implies
    * selling an equal portion of the remaining shares at each interval, i.e., $N^*_t = \frac{R_t}{T-t}$.
    * the expected proceeds from time $t$ onward under this strategy equal $V^*_t(P_t,R_t) = R_tP_t - \frac{R^2_t}{2} \left(\frac{2\beta + \alpha(T-t-1)}{T-t}\right)$.

**Conclusion**

* **Uniform Split**: Starting from $R_0 = N$, the rule $N^*_t = \frac{R_t}{T-t}$ sells the same quantity $N^*_t = \frac{N}{T}$ at every time step.
* **Expected Sale Proceeds**: Under this uniform strategy, the expected total proceeds are $V^*_0(P_0, N) = N P_0 - \frac{N^2}{2} \left(\alpha + \frac{2\beta-\alpha}{T}\right)$.
* **Implementation Shortfall Analysis**: The implementation shortfall, i.e. the gap between the benchmark proceeds $N P_0$ and the expected proceeds, equals $\frac{N^2}{2} \left(\alpha + \frac{2\beta-\alpha}{T}\right)$. This is the cost of market impact over the planned execution horizon.
    * The shortfall stays above zero whenever $\alpha > 0$: as $T \to \infty$ it tends to $\frac{\alpha N^2}{2}$, a persistent cost due to permanent price impact regardless of how long the execution period is.
    * Conversely, when price impact is purely temporary ($\alpha = 0$), the shortfall is $\frac{\beta N^2}{T}$, which vanishes as $T \to \infty$.

This outcome underscores the significance of distinguishing between temporary and permanent price impacts when planning market order executions.

### Navigating Real-world Trade Order Execution Challenges

Executing large orders optimally in real-world markets involves navigating numerous challenges:

* **Unpredictable Dynamics**:
    * Price movements can be arbitrary, influenced by factors beyond just the order being executed (news, sentiment, etc.). They are better captured by general, time-dependent stochastic dynamics $f_t$ than by a simple linear model.
    * Temporary price impact can be time-dependent and non-linear, represented by $g_t$.
* **Market Frictions and Constraints**:
    * Prices and order sizes are discrete, not continuous.
    * Trading rules impose constraints on allowable prices and order sizes.
    * Fees add another layer of cost.
* **Vast Information**: Including nuanced market factors in the state representation (e.g., the entire order book) can lead to an unmanageably large state space.
* **Cross-Asset Impact**: Large trades in one asset can impact prices in correlated assets.

#### The Simulation Approach

A practical solution is to develop a high-fidelity market simulator that captures these complexities. Here's how this approach works:

1. **Data-Driven Dynamics**: The simulator learns price dynamics and price impact models from historical market data. This can be enhanced with insights from market microstructure research.
2. **Realistic Constraints**: The simulator enforces market frictions like discrete prices, trading rules, and fees.
3. **Focused State Representation**: While the full order book could be included, designing a compact yet informative state representation is crucial for computational efficiency.

#### Reinforcement Learning (RL) with the Simulator

* **Neural Network Approximation**: To handle large state spaces, neural networks can approximate value functions or policies within the RL framework.
* **Simulator Interactions**: The RL agent interacts with the simulator, receiving states, taking actions, and observing rewards. This allows it to learn optimal execution strategies without the risks of real-world experimentation.

## Optimal Market-Making

### Introduction

**Market-making** is an essential function in financial markets, performed by entities or individuals known as **market-makers**. These participants actively quote both bid (buy) and ask (sell) prices for financial assets, maintaining a ready stance to execute trades at these prices. Through this activity, market-makers aim to profit from the spread (the difference between the ask and bid prices) while managing their inventory of assets.
#### Roles and Dynamics of Market-Making

* **Liquidity Provision**: Market-makers serve as the cornerstone of market liquidity, ensuring that buy and sell limit orders (LOs) are consistently available to market participants.
* **Interaction with Market Participants**: The ecosystem includes **liquidity takers**, who engage with the market through market orders (MOs), directly impacting the demand and supply dynamics captured in the order book (OB).

The relationship between liquidity providers, including market-makers, and liquidity takers is intricate, governed by the flow of MOs and LOs within the market environment. Understanding and predicting these dynamics requires a deep analysis of OB activity and the broader market context.

#### Objectives of Market-Making

The primary aim for a market-maker is to optimize the utility of gains over a determined period, balancing the risk and reward inherent in their role:

* **Narrow Spreads**: Setting closely priced buy and sell LOs might lead to smaller, more frequent gains, appealing in highly liquid markets where price volatility is minimal.
* **Wide Spreads**: Alternatively, wider spreads can yield less frequent but potentially larger gains, suitable in scenarios of higher volatility or lower liquidity.
* **Inventory Management**: A critical aspect of market-making is managing inventory risk: holding too large a long or short position can lead to significant losses if the market moves unfavorably.

### Setup

* **Time Division**: The model divides time into discrete intervals, labeled by $t = 0, 1, \dots, T$.
* **Trading Account Value**: At the beginning of each interval, it's represented by $W_t \in \mathbb{R}$.
* **Inventory of Shares**: At the start of each interval, it's represented by $I_t \in \mathbb{Z}$, with an initial inventory $I_0 = 0$.
* **OB Mid Price** at time $t$ is $S_t \in \mathbb{R}_+$.
* **Bid and Ask Prices and Sizes**: At time $t$, the bid price and size are $P^b_t \in \mathbb{R}_+$ and $N^b_t \in \mathbb{N}_+$, respectively; the ask price and size are $P^a_t \in \mathbb{R}_+$ and $N^a_t \in \mathbb{N}_+$.
* **Bid and Ask Spreads** are denoted by $\delta^b_t = S_t - P^b_t$ for bids and $\delta^a_t = P^a_t - S_t$ for asks.
* **Bid-Shares Hit** and **Ask-Shares Lifted** up to time $t$ are modeled as random variables $X^b_t$ and $X^a_t$.
* The market-maker can adjust bids and asks without cost.

Objective: Maximize the expected utility $\mathbb{E}[U(W_T + I_T \cdot S_T)]$ for a given utility function $U$.

### Markov Decision Process Formulation

The scenario is presented as a finite-horizon, discrete-time MDP:

* MDP Horizon: $T$
* Events Sequence at time step $t$ for $t=0,1,\dots,T-1$:
    * Observe the current state $(S_t, W_t, I_t) \in \mathcal{S}_t$.
    * Execute an action $(P^b_t, N^b_t, P^a_t, N^a_t) \in \mathcal{A}_t$.
    * Encounter OB dynamics resulting in:
        * Bid-shares hit $= X^b_{t+1} - X^b_t$
        * Ask-shares lifted $= X^a_{t+1} - X^a_t$
    * Update trading account value to $W_{t+1}$: $$ W_{t+1} = W_t + P^a_t\cdot (X^a_{t+1} - X^a_t) - P^b_t \cdot (X^b_{t+1} - X^b_t) $$
    * Update inventory to $I_{t+1}$ (using $I_0 = 0$): $$ I_{t+1} = X^b_{t+1} - X^a_{t+1} $$
    * Stochastic update from $S_t$ to $S_{t+1}$
    * Earn a reward at the next step: $$ R_{t+1} = \begin{cases} 0 &\text{for } 1 \leq t+1 \leq T-1, \\ U(W_T + I_T \cdot S_T) &\text{for } t+1 = T. \end{cases} $$

The objective is to determine an optimal policy $\pi^* = (\pi^*_0, \pi^*_1, \cdots, \pi^*_{T-1})$ where $\pi^*_t(S_t, W_t, I_t) = (P^b_t, N^b_t, P^a_t, N^a_t)$ to maximize $\mathbb{E}[R_T]$.

### Avellaneda-Stoikov Continuous Time Model

We delve into the seminal work by Avellaneda and Stoikov from 2006, which presents an elegant, straightforward, and intuitive solution in the realm of quantitative finance.
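Before moving to continuous time, the discrete-time bookkeeping of the MDP formulated above can be sketched in a few lines. This is only an illustration under stated assumptions: the function `run_episode`, the starting cash $W_0 = 0$, the identity utility, and the sample quotes and fills are all hypothetical, and the fills are supplied by hand rather than drawn from a model of order flow.

```python
def run_episode(quotes_and_fills, terminal_mid, utility=lambda x: x):
    """Roll the market-maker's cash and inventory forward over one episode.

    quotes_and_fills: per-step tuples (bid_price, ask_price, bid_hits, ask_lifts).
    Returns (W_T, I_T, reward) with reward = U(W_T + I_T * S_T).
    """
    w = 0.0          # assumed starting cash W_0 = 0
    x_b, x_a = 0, 0  # cumulative bid-shares hit and ask-shares lifted
    for bid_price, ask_price, bid_hits, ask_lifts in quotes_and_fills:
        # W_{t+1} = W_t + P^a_t * (ask-shares lifted) - P^b_t * (bid-shares hit)
        w += ask_price * ask_lifts - bid_price * bid_hits
        x_b += bid_hits
        x_a += ask_lifts
    inventory = x_b - x_a  # I_T = X^b_T - X^a_T, since I_0 = 0
    return w, inventory, utility(w + inventory * terminal_mid)


# Hypothetical two-step episode: (bid, ask, bid-shares hit, ask-shares lifted)
steps = [(99, 101, 2, 1), (98, 100, 1, 0)]
w_T, i_T, reward = run_episode(steps, terminal_mid=100)
print(w_T, i_T, reward)  # -195.0 2 5.0
```

Here the market-maker ends long two shares with negative cash; marking the inventory at the terminal mid price of 100 gives a terminal wealth of 5, which is the only nonzero reward of the episode.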
#### Setting Transformation to Continuous Time

The setup moves our previously discrete-time framework into continuous time, closely following the Avellaneda-Stoikov model.

* **Poisson Processes for Order Hits/Lifts**: The processes $X^b_t$ and $X^a_t$ are modeled as Poisson with respective rates $\lambda^b_t$ and $\lambda^a_t$, representing the mean hit and lift rates: $$ dX^b_t \sim \text{Poisson}(\lambda^b_t \cdot dt), \quad dX^a_t \sim \text{Poisson}(\lambda^a_t \cdot dt), $$ where $\lambda^b_t = f^b(\delta^b_t)$ and $\lambda^a_t = f^a(\delta^a_t)$ for decreasing functions $f^b$ and $f^a$.
* **Simplification for Trade Sizes**: Given the infinitesimal nature of the Poisson increments $dX^b_t$ and $dX^a_t$, we can simplify $N^b_t$ and $N^a_t$ to always be $1$. This reduces the action at time $t$ to choosing the bid and ask spreads $(\delta^b_t, \delta^a_t)$.
* **Order Book Mid Price Dynamics**: The mid price $S_t$ follows the stochastic differential equation $dS_t = \sigma \, dz_t$, with $z_t$ a standard Brownian motion.
* **Utility Function**: The model employs an exponential utility function $U(x) = - e^{-\gamma x}$, where $\gamma > 0$ is the risk-aversion coefficient.

The dynamics of the trading account value are thus described by

$$
dW_t = P^a_t dX^a_t - P^b_t d X^b_t
$$

while the inventory is given by

$$
I_t = X^b_t - X^a_t.
$$

#### Hamilton-Jacobi-Bellman Equation for Optimal Market Making

The optimal value function is defined as:

$$
V^*(t,S_t, W_t, I_t) = \max_{\delta^b_u, \delta^a_u: t \leq u < T} \mathbb{E}[-e^{-\gamma (W_T + I_T \cdot S_T)}].
$$

For any sub-interval $[t, t_1]$ within the trading horizon up to time $T$, the value function satisfies a recursive relation:

$$
V^*(t,S_t, W_t, I_t) = \max_{\delta^b_u, \delta^a_u: t \leq u < t_1} \mathbb{E}[V^*(t_1,S_{t_1}, W_{t_1}, I_{t_1})].
$$

Rewriting this in stochastic differential form, we have

\begin{align*}
0 = \max_{\delta^b_t, \delta^a_t} \ &\mathbb{E}[d V^* (t,S_t, W_t, I_t)] \\
= \max_{\delta^b_t, \delta^a_t} \{ &\frac{\partial V^*}{\partial t} dt + \mathbb{E}[\sigma \frac{\partial V^*}{\partial S_t} dz_t + \frac{\sigma^2}{2} \frac{\partial^2 V^*}{\partial S_t^2} (dz_t)^2 ] \\
&+ \lambda^b_t V^* (t,S_t,W_t-S_t+\delta^b_t,I_t+1) \cdot dt \\
&+ \lambda^a_t V^* (t,S_t,W_t+S_t+\delta^a_t,I_t-1) \cdot dt \\
&+ (1 - \lambda^b_t dt - \lambda^a_t dt) V^*(t,S_t,W_t,I_t) - V^* (t,S_t,W_t,I_t) \}.
\end{align*}

Here a hit on the bid buys one share at $P^b_t = S_t - \delta^b_t$, so $W_t$ falls by $S_t - \delta^b_t$ and $I_t$ rises by one; a lift of the ask sells one share at $P^a_t = S_t + \delta^a_t$, so $W_t$ rises by $S_t + \delta^a_t$ and $I_t$ falls by one.

Using $\mathbb{E}[dz_t]=0$ and $\mathbb{E}[(dz_t)^2]=dt$, dividing by $dt$, and substituting $\lambda^b_t = f^b(\delta^b_t)$ and $\lambda^a_t = f^a(\delta^a_t)$, we obtain the HJB equation

\begin{split}
0 =& \frac{\partial V^*}{\partial t} + \frac{\sigma^2}{2} \frac{\partial^2 V^*}{\partial S_t^2} \\
&+ \max_{\delta^b_t} \{ f^b(\delta^b_t) \cdot (V^* (t,S_t,W_t-S_t+\delta^b_t,I_t+1) - V^* (t,S_t,W_t,I_t) ) \} \\
&+ \max_{\delta^a_t} \{ f^a(\delta^a_t) \cdot (V^* (t,S_t,W_t+S_t+\delta^a_t,I_t-1) - V^* (t,S_t,W_t,I_t) ) \}
\end{split}

with terminal condition

$$
V^* (T,S_T,W_T,I_T) = -e^{-\gamma (W_T + I_T \cdot S_T)}.
$$

#### Solving the HJB

To solve the HJB equation for optimal market making, we employ an ansatz for the value function

$$
V^* (t,S_t,W_t,I_t) = -e^{-\gamma (W_t + \theta(t,S_t,I_t))}.
$$

Inserting this ansatz into the HJB equation and dividing by $-\gamma V^*$ (which is positive) yields:

\begin{align}
0 =& \frac{\partial \theta}{\partial t} + \frac{\sigma^2}{2} (\frac{\partial^2 \theta}{\partial S_t^2} - \gamma (\frac{\partial \theta}{\partial S_t})^2 ) \\
&+ \max_{\delta^b_t} \{ \frac{f^b (\delta^b_t)}{\gamma} (1- e^{-\gamma (\delta^b_t - S_t + \theta(t,S_t,I_t+1) - \theta(t,S_t,I_t))}) \} \\
&+ \max_{\delta^a_t} \{ \frac{f^a (\delta^a_t)}{\gamma} (1- e^{-\gamma (\delta^a_t + S_t + \theta(t,S_t,I_t-1) - \theta(t,S_t,I_t))}) \} \quad (1)
\end{align}

with the terminal condition simplifying to

$$
\theta(T,S_T,I_T) = I_T \cdot S_T.
$$

#### Indifference Bid/Ask Price

The **indifference prices** for buying, $Q^b_t= Q^b(t,S_t,I_t)$, and for selling, $Q^a_t= Q^a(t,S_t,I_t)$, are the prices at which changing the inventory by a single unit leaves the market maker's optimal expected utility unchanged:

* For buying (bid price): $$ V^*(t,S_t,W_t-Q^b_t,I_t+1) = V^*(t,S_t,W_t, I_t), $$
* For selling (ask price): $$ V^*(t,S_t,W_t+Q^a_t,I_t-1) = V^*(t,S_t,W_t, I_t). $$

Leveraging the value function ansatz, we establish:

\begin{split}
Q^b_t &= \theta(t,S_t,I_t+1) - \theta(t,S_t,I_t), \\
Q^a_t &= \theta(t,S_t,I_t) - \theta(t,S_t,I_t-1),
\end{split}

and equation $(1)$ can be written as

\begin{align*}
0 =& \frac{\partial \theta}{\partial t} + \frac{\sigma^2}{2} (\frac{\partial^2 \theta}{\partial S_t^2} - \gamma (\frac{\partial \theta}{\partial S_t})^2 ) + \max_{\delta^b_t} g(\delta^b_t) + \max_{\delta^a_t} h(\delta^a_t),
\end{align*}

where

\begin{split}
g(\delta^b_t) &= \frac{f^b(\delta^b_t)}{\gamma} (1-e^{-\gamma (\delta^b_t - S_t + Q^b_t)}), \\
h(\delta^a_t) &= \frac{f^a(\delta^a_t)}{\gamma} (1-e^{-\gamma (\delta^a_t + S_t - Q^a_t)}).
\end{split}

Setting the derivatives of $g$ and $h$ to zero gives implicit equations for the optimal bid spread $\delta^{b*}_t$ and ask spread $\delta^{a*}_t$:

* Optimal bid spread: $$ \delta^{b*}_t = S_t - Q^b_t + \frac{1}{\gamma} \ln \left(1 - \gamma \frac{f^b(\delta^{b*}_t)}{\frac{\partial f^b}{\partial \delta^b_t}(\delta^{b*}_t)}\right). \quad (2) $$
* Optimal ask spread: $$ \delta^{a*}_t = Q^a_t - S_t + \frac{1}{\gamma} \ln \left(1 - \gamma \frac{f^a(\delta^{a*}_t)}{\frac{\partial f^a}{\partial \delta^a_t}(\delta^{a*}_t)}\right). \quad (3) $$

#### Simple Functional Form for Hitting/Lifting Rate

In the context of market making, we consider a simplified yet insightful model for the rates at which market orders hit the market maker's bid and lift the market maker's ask.
This model assumes the hitting and lifting rates decay exponentially with the bid $\delta^b$ and ask $\delta^a$ spreads, formalized as:

$$
f^b(\delta) = f^a(\delta) = c e^{-k \delta},
$$

where $c$ and $k$ are positive constants; $k$ reflects the sensitivity of order flow to spread size.

##### Optimal Spreads Determination

With this functional form, $f / \frac{\partial f}{\partial \delta} = -\frac{1}{k}$, so equations $(2)$ and $(3)$ become explicit. The optimal spreads are the distance between the mid price and the indifference price plus a term that depends only on the risk aversion $\gamma$ and the order flow sensitivity $k$:

\begin{split}
\delta^{b*}_t &= S_t - Q^b_t + \frac{1}{\gamma} \ln (1+\frac{\gamma}{k}), \\
\delta^{a*}_t &= Q^a_t - S_t + \frac{1}{\gamma} \ln (1+\frac{\gamma}{k}).
\end{split}

##### Simplifying the Market Dynamics

Substituting these optimal spreads back into $(1)$ turns the two maximized terms into $\frac{c}{k+\gamma} (e^{-k \delta^{b*}_t} + e^{-k \delta^{a*}_t})$. Linearizing the exponentials, $e^{-k\delta} \approx 1 - k\delta$, gives the tractable form:

\begin{split}
0 = \frac{\partial \theta}{\partial t} + \frac{\sigma^2}{2} (\frac{\partial^2 \theta}{\partial S_t^2} - \gamma (\frac{\partial \theta}{\partial S_t})^2 ) + \frac{c}{k+\gamma} (1 - k \cdot \delta^{b*}_t + 1 - k \cdot \delta^{a*}_t).
\end{split}

Solving this equation approximately (Avellaneda and Stoikov use an asymptotic expansion of $\theta$ in the inventory $I_t$), we arrive at the following expressions:

* indifference bid/ask prices:
\begin{split}
Q^b_t &= S_t - (2I_t+1) \frac{\gamma \sigma^2 (T-t)}{2}, \\
Q^a_t &= S_t - (2I_t-1) \frac{\gamma \sigma^2 (T-t)}{2}.
\end{split}
* optimal bid and ask spreads:
\begin{split}
\delta^{b*}_t &= \frac{(2I_t+1)\gamma \sigma^2 (T-t)}{2} + \frac{1}{\gamma} \ln (1+\frac{\gamma}{k}), \\
\delta^{a*}_t &= \frac{(1 - 2I_t)\gamma \sigma^2 (T-t)}{2} + \frac{1}{\gamma} \ln (1+\frac{\gamma}{k}).
\end{split}
* In particular, the optimal bid-ask spread is given by $$ \delta^{b*}_t + \delta^{a*}_t = \gamma \sigma^2 (T-t) + \frac{2}{\gamma} \ln(1+\frac{\gamma}{k}), $$ which does not depend on the inventory $I_t$: inventory shifts both quotes in the same direction without changing their distance.

### Navigating Real-World Market-Making Challenges

Theoretical market-making models often simplify the complexities of real-world markets.
Key challenges practitioners face include:

* **Non-linear Dynamics**: Price movements and order book responses to trades can be unpredictable and change over time.
* **Market Frictions**: Discrete prices, order size limits, and transaction fees create constraints.
* **Information Overload**: Incorporating all relevant market factors leads to an enormous state space, known as the "curse of dimensionality."

#### The Power of Simulation-Based Market-Making

A powerful approach to tackle these challenges is a sophisticated simulator that replicates the intricate dynamics of real-world markets:

* **Data-Driven Dynamics**: The simulator leverages historical market data to learn realistic models of order book behavior and price responses. This creates a testing ground that closely mirrors the complexities of actual trading.
* **Tackling Complexity with RL**: By integrating neural networks for function approximation, RL algorithms can handle the high-dimensional state space. This allows for learning complex market-making strategies that couldn't be easily derived analytically.
* **Iterative Learning**: Through repeated interactions with the simulator, RL agents can uncover optimal policies that balance liquidity provision, profit, and risk. These strategies are refined without the need for risky live-market experimentation.

## References

- Chapter 10 of the [RLForFinanceBook](https://stanford.edu/~ashlearn/RLForFinanceBook/book.pdf)
- [Order Book Algorithms](https://github.com/coverdrive/technical-documents/blob/master/finance/cme241/Tour-OrderBook.pdf) slides for CME 241: Foundations of Reinforcement Learning with Applications in Finance