# <center> Bayesian Inference on RMM-01 Pool Liquidity Utilization </center>
## Introduction
Statistical inference remains a largely unexplored approach to analyzing DeFi datasets, with little work done to date. In this post, we analyze a DeFi dataset containing historical RMM-01 pool data using Bayesian inference [in this notebook](https://github.com/primitivefinance/pool-analytics/blob/main/notebooks/rmm_bayesian_inference_liquidity_utilization.ipynb). We derive insights into liquidity behavior, providing empirical evidence to answer questions such as:
- What is the optimal liquidity utilization ratio?
- What are the upper bounds of arbitrage swap sizes over time?
- What is the potential slippage/price impact on LPs?
### Replicating Market Makers (RMMs) and RMM-01 Trading Function
RMMs are a [novel class of AMMs](https://arxiv.org/abs/2103.14769) with the key property that a trading function can be explicitly defined to replicate a myriad of financial derivative payoffs. The first implementation of an RMM is [RMM-01](https://primitive.xyz/whitepaper-rmm-01.pdf), which uses a Black-Scholes trading function to replicate the payoff of a covered call for liquidity providers (LPs).
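For reference, the covered-call trading function in the RMM-01 whitepaper takes (up to notation) the form:

$$ y = K\,\Phi\!\left(\Phi^{-1}(1 - x) - \sigma\sqrt{\tau}\right) + k $$

where $x$ and $y$ are the risky and stable reserves per unit of liquidity, $K$ is the strike price, $\sigma$ is the implied volatility, $\tau$ is the time to maturity, $\Phi$ is the standard normal CDF, and $k$ is the pool invariant.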
RMM-01 launched on Ethereum in April 2022, so little is known about the behavior of its pools, with fewer than 500 swaps available in the historical dataset. RMM-01 pools are unusual because they are set with a high swap fee of around 1%, which is high enough that no organic (human) trader would pay it. Instead, all of the RMM-01 swap volume comes from arbitrage bots that can pay the higher fee while still profiting from the trade.
As a result, RMM-01 provides an interesting opportunity to study the intersection between DeFi (providing liquidity), TradFi (replicating a covered call payoff), and MEV (attracting sufficient arbitrage volume).
## Bayesian Inference
Bayesian inference is a powerful probabilistic framework that combines a prior distribution over model parameters with the likelihood of observed data to compute a posterior distribution. The posterior quantifies the remaining uncertainty about the parameters after conditioning on the observed data and can be visualized easily as a highest posterior density interval.
Bayesian inference leverages [Bayes' rule](https://www.psychologyinaction.org/psychology-in-action-1/2012/10/22/bayes-rule-and-bomb-threats) to compute the posterior distribution from the prior and the likelihood of the observed data.
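Concretely, for model parameters $\theta$ and observed data $D$, Bayes' rule states:

$$ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} $$

where $p(\theta)$ is the prior, $p(D \mid \theta)$ is the likelihood, $p(D)$ is the marginal likelihood (evidence), and $p(\theta \mid D)$ is the posterior.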
A good description of the Bayesian procedure is shown below, taken from [here](https://www.stat.cmu.edu/~larry/=sml/Bayes.pdf):

### cmdstanpy
`cmdstanpy` is a Python interface to CmdStan, a state-of-the-art Bayesian inference toolchain that lets users draw samples from a posterior distribution via the [NUTS-HMC](http://www.stat.columbia.edu/~gelman/research/published/nuts.pdf) algorithm.
Hamiltonian Monte Carlo (HMC) is an algorithm for drawing samples from a posterior distribution by simulating Hamiltonian dynamics. Its performance depends strongly on choosing suitable values for the step size $\epsilon$ and the number of leapfrog steps $L$. If $\epsilon$ is too large, the simulation will be inaccurate and yield low acceptance rates; if $\epsilon$ is too small, computation is wasted taking many small steps. If $L$ is too small, successive samples will be close to one another, resulting in undesirable random-walk behavior and slow mixing; if $L$ is too large, HMC will generate trajectories that loop back and retrace their steps.
The No-U-Turn Sampler (NUTS) extends HMC by automatically selecting the step size $\epsilon$ and the number of leapfrog steps $L$. The step size $\epsilon$ is tuned via [stochastic optimization with vanishing adaptation](https://people.eecs.berkeley.edu/~jordan/sail/readings/andrieu-thoms.pdf), using a dual-averaging scheme adapted from [Nesterov's primal-dual algorithm](https://ium.mccme.ru/postscript/s12/GS-Nesterov%20Primal-dual.pdf). NUTS sets the number of leapfrog steps $L$ with a recursive algorithm that builds a balanced binary tree via repeated doubling, which preserves time reversibility by running the Hamiltonian simulation both forward and backward in time. This is shown in the figure below, taken from [the NUTS paper](http://www.stat.columbia.edu/~gelman/research/published/nuts.pdf).

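To make the workflow concrete, below is a minimal sketch of drawing posterior samples with `cmdstanpy`'s NUTS-HMC sampler. The Stan model (a simple Beta likelihood for per-swap liquidity utilization with weakly informative priors), the file names, and the synthetic data are illustrative assumptions, not the exact model used in the linked notebook, and running it requires a working CmdStan installation.

```python
# A minimal sketch of posterior sampling with cmdstanpy's NUTS-HMC sampler.
# The Beta-likelihood model and variable names are illustrative assumptions,
# not the exact model used in the linked notebook.
from pathlib import Path

import numpy as np
from cmdstanpy import CmdStanModel

stan_code = """
data {
  int<lower=0> N;                  // number of swaps
  vector<lower=0, upper=1>[N] u;   // per-swap liquidity utilization
}
parameters {
  real<lower=0> a;                 // Beta shape parameters
  real<lower=0> b;
}
model {
  a ~ exponential(1);              // weakly informative priors
  b ~ exponential(1);
  u ~ beta(a, b);                  // likelihood of observed utilization
}
"""
Path("utilization.stan").write_text(stan_code)

# Hypothetical observed data: ~500 per-swap utilization ratios in (0, 1).
rng = np.random.default_rng(1)
utilization = rng.beta(2, 20, size=500)
data = {"N": len(utilization), "u": utilization.tolist()}

model = CmdStanModel(stan_file="utilization.stan")  # compiles via CmdStan
fit = model.sample(data=data, chains=4, iter_warmup=1000, iter_sampling=1000)

print(fit.summary())    # posterior means, R-hat, effective sample sizes
draws = fit.draws_pd()  # posterior draws as a pandas DataFrame
```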
## Analysis and Results
The model samples two posterior distributions from the prior distributions of two observed quantities: liquidity utilization and $\tau$ (the time between swaps). Liquidity utilization is calculated on a per-swap basis as the swap size divided by the USDC value of the liquidity being provided. The two charts below are kernel density estimate (KDE) plots, which draw contours to visualize different levels of density across these distributions. The high-density contour intervals are defined as 25% (yellow), 50% (green), 75% (teal), and 95% (dark blue).


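As a rough illustration of how the per-swap utilization ratio and a joint KDE of utilization versus $\tau$ can be produced, here is a hedged sketch; the column names and CSV file below are assumptions, not the notebook's actual schema or plotting code.

```python
# A hedged sketch of the per-swap utilization calculation and a joint KDE plot.
# Column names (swap_size_usdc, pool_liquidity_usdc, tau) and the CSV file are
# assumptions about the historical dataset, not the notebook's actual schema.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

swaps = pd.read_csv("rmm01_swaps.csv")  # hypothetical RMM-01 swap history export

# Liquidity utilization per swap: swap size over the USDC value of pool liquidity.
swaps["utilization"] = swaps["swap_size_usdc"] / swaps["pool_liquidity_usdc"]

# Joint KDE of tau (time between swaps) vs. utilization, with contour levels
# loosely mirroring the 25/50/75/95% density intervals described above.
sns.kdeplot(
    data=swaps, x="tau", y="utilization",
    fill=True, levels=[0.05, 0.25, 0.50, 0.75, 1.0], cmap="viridis",
)
plt.xlabel(r"$\tau$ (time between swaps)")
plt.ylabel("liquidity utilization")
plt.show()
```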
The first insight is that the prior distribution of liquidity utilization has very wide intervals, in which liquidity utilization could reach levels upwards of 30-50% with substantial probability density, whereas the posterior distribution does not go past 20% with very high certainty. The second insight is that the posterior distribution shows liquidity utilization is stable over a longer period of time, up to ~28 days ($\tau$=0.08). In contrast, no insight is available from the prior distribution past ~14 days ($\tau$=0.05).
Higher liquidity utilization rates generally imply illiquidity and lead to higher price impact/slippage, making swaps riskier for both LPs and swappers. However, since swap volume is driven entirely by arbitrage bots that only execute swaps for profit, the higher liquidity utilization rates actually hurt liquidity providers, who are not being compensated with a high enough fee for the level of risk they take on. More research is needed on using dynamic fees to compensate LPs more fairly for the risk of LPing. If risk can be priced more accurately using dynamic fees at the swap level, this would provide better risk compensation to LPs as well as discourage liquidity utilization past a level that is too risky for LPs.
The chart below shows a more granular view of the posterior probability densities, along with the distribution curves for both liquidity utilization and $\tau$. The highest density occurs around the 5-10% liquidity utilization range and around the 0.02-0.04 (1-2 week) $\tau$ range.

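For reference, highest density intervals like those discussed above can be computed directly from posterior draws, for example with ArviZ; the draws below are synthetic stand-ins, not the notebook's actual samples.

```python
# A minimal sketch of computing a highest density interval from posterior draws.
# The draws here are synthetic stand-ins for utilization samples that would come
# from a cmdstanpy fit (e.g. fit.draws_pd()).
import arviz as az
import numpy as np

rng = np.random.default_rng(1)
utilization_draws = rng.beta(2, 20, size=4000)  # hypothetical posterior draws

# Narrowest interval containing 95% of the posterior mass.
print(az.hdi(utilization_draws, hdi_prob=0.95))
```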
## Conclusion & Further Research
By using Bayesian inference, we derived key initial insights describing the behavior of RMM-01 pools and illuminated areas of risk for LPs arising from excessively high liquidity utilization. Additionally, Bayesian inference allowed robust analysis to be completed on a small dataset of fewer than 500 swaps.
Some additional research directions could be:
* How can fees be set to dynamically compensate LPs for risk in the form of high liquidity utilization rates?
* Are the initial parameters set by RMM-01 LPs ideal? How do different parameters affect liquidity utilization and swap volume?
* How does RMM-01 pool liquidity utilization compare to other AMM designs such as Uniswap and Curve?
* Is there a way to distribute LP utilization risk across different RMM-01 pools in a way that mitigates LP utilization risk for the entire portfolio?
## References
1. [NUTS-HMC](http://www.stat.columbia.edu/~gelman/research/published/nuts.pdf)
2. [Chapter 12 - Bayesian Inference](https://www.stat.cmu.edu/~larry/=sml/Bayes.pdf)
3. [cmdstanpy, a package for Bayesian inference](https://mc-stan.org/cmdstanpy/)
4. [cmdstanpy examples](https://github.com/stan-dev/cmdstanpy/tree/master/docsrc/examples)
5. [RMM-01 pool Bayesian analysis notebook](https://github.com/primitivefinance/pool-analytics/blob/main/notebooks/rmm_bayesian_inference_liquidity_utilization.ipynb)