# Polymarket Mispricing Strategy: An Alpha-Generating Framework

## Introduction

Prediction markets like Polymarket offer a unique environment for *information aggregation* and financial innovation. Unlike traditional exchanges, the price of a binary option token (YES/NO) directly represents the market's implied probability of a discrete event occurring.

This note proposes a quantitative strategy that seeks alpha by identifying and exploiting temporary mispricings between the market-implied probability $P_M$, derived from Polymarket prices, and an information-based predictive probability $P_R$, generated by a supervised machine learning model (*Logistic Regression*). The strategy profits when the market price converges toward the model's forecast, which presumes that the model is better calibrated than the market.

## Prediction Market Mechanics

Polymarket functions as a decentralized prediction market leveraging blockchain technology. Understanding its mechanics is crucial for execution efficiency and risk control.

### Binary Options Structure

A market consists of two mutually exclusive tokens: **YES** (pays \$1 if the event occurs) and **NO** (pays \$1 if the event does not occur).

* *Pricing*: The YES price and the NO price sum to approximately \$1 (excluding fees and spreads). The YES price is taken as the market-implied probability, $P_M$.
* *Settlement*: Upon market resolution, the winning token settles at \$1, and the losing token settles at \$0.

## The Predictive Model

The core of the strategy is the robust generation of the model probability, $P_R$, using *Logistic Regression (Logit)* for binary classification.

### Model Choice and Output

*Logistic regression* is chosen for its interpretability and its ability to output a probability between 0 and 1, which fits the prediction market context naturally.

$$ P_R = \sigma \left( \frac{\mathbf{w}^{\top} \mathbf{x} + b}{T} \right) $$

Where:

* $\mathbf{x}$ is the feature vector of inputs.
* $\mathbf{w}$ is the vector of trained weights.
* $b$ is the bias term.
* $T$ is the temperature parameter, which rescales the logit ($T > 1$ pulls probabilities toward 0.5; $T < 1$ pushes them toward 0 or 1).
* $\sigma$ is the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$.

### Candidate Input Features $(\mathbf{x})$

The model will be trained on data collected up to the time of prediction, focusing on two categories of information. Time to maturity is written $T_{\text{mat}}$ to keep it distinct from the temperature $T$.

| Category | Input Feature | Rationale |
| :--- | :--- | :--- |
| Market-Specific | *Time to Maturity ($T_{\text{mat}}$)* | Events closer to resolution exhibit higher information flow and market stability. |
| Market-Specific | *Volatility (Price changes over 24h)* | Measures market uncertainty or the arrival rate of new information. |
| Exogenous Information | *Sentiment Score (of related news/social media)* | Quantifies the directional bias of external commentary relevant to the event. |
| Exogenous Information | *Key Data Releases (Indicator variable)* | Flags the proximity to scheduled, high-impact public information releases (e.g., earnings, election results). |

### Training and Validation

The model must be retrained or validated periodically to account for concept drift. A rolling *walk-forward validation* approach is required: train the model on a look-back window (e.g., 60 days), test its performance on the subsequent period, then roll the window forward and repeat.

## Alpha Generation & Trading Strategy

### Defining the Alpha Signal

*Alpha* $\alpha$ is defined as the signed difference between the model's prediction and the current market price (implied probability):

$$ \alpha = P_R - P_M $$

A positive $\alpha$ means the model considers YES underpriced; a negative $\alpha$ means the model considers YES overpriced.

### Trading Signals and Execution

The strategy is fundamentally *mean-reverting*, betting that $P_M$ will converge toward the "more correct" $P_R$.

| Component | Condition | Action |
| :--- | :--- | :--- |
| Entry Signal ($\mathbf{S}_{\text{entry}}$) | When the absolute mispricing $\lvert\alpha\rvert$ exceeds a threshold $\tau_{\alpha} > 0$. | *BUY YES* if $\alpha > \tau_{\alpha}$ (Undervalued)<br>*SELL/SHORT YES* (Buy NO) if $\alpha < -\tau_{\alpha}$ (Overvalued) |
| Exit Signal ($\mathbf{S}_{\text{exit}}$) | When the mispricing is corrected, falling below a tolerance $\epsilon$. | *CLOSE POSITION* if $\lvert\alpha\rvert < \epsilon$ |
| Stop Loss ($\mathbf{SL}$) | If the trade moves against the position by a defined percentage $\delta_{\text{loss}}$. | *IMMEDIATE CLOSE* to limit capital erosion. |

### Position Sizing and Capital Allocation

Position sizing must be dynamic, leveraging a modified Kelly Criterion or a risk-parity approach.

$$ L \propto \frac{\lvert\alpha\rvert}{\text{Expected Volatility}} $$

Where $L$ is the capital allocation for the trade. The magnitude of $\alpha$ sets the size; its sign only determines the direction (YES or NO). Positions should be smaller for:

1. Markets with *high volatility*.
2. Markets with a *long Time to Maturity* ($T_{\text{mat}}$), as they are exposed to information risk for longer periods.

## Risk Management & Performance Metrics

### Risk Mitigation

* *Liquidity Risk*: Limit trade size to a maximum percentage of the available liquidity to prevent catastrophic slippage.
* *Event Risk*: Utilize Stop-Loss ($\mathbf{SL}$) to manage the risk of sudden, large information shocks that invalidate the model's prediction.
* *Overfitting Risk*: Rigorously test the model out-of-sample and avoid excessive feature engineering.

### Performance Metrics

The strategy's success will be evaluated using standard quantitative finance metrics:

* *Sharpe Ratio*: Measures risk-adjusted return (Mean Excess Return over the Risk-Free Rate / Standard Deviation of Return).
* *Maximum Drawdown (MDD)*: The largest peak-to-trough decline during a specific period.
* *Calmar Ratio*: Measures risk-adjusted return based on MDD (Annualized Return / MDD).

## Reference

* https://vitalik.eth.limo/general/2024/11/09/infofinance.html
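
## Appendix: Illustrative Signal Sketch

The model output, the alpha signal, the entry and exit rules, and the volatility-scaled sizing described in the sections above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names (`model_probability`, `entry_signal`, `should_exit`, `position_size`) and the `scale` and `max_size` parameters are hypothetical, the weights are assumed to be already trained, and fees, spreads, and order execution are ignored.

```python
import math


def model_probability(x, w, b, temperature=1.0):
    """P_R = sigmoid((w . x + b) / T), with weights assumed already trained."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = (sum(wi * xi for wi, xi in zip(w, x)) + b) / temperature
    return 1.0 / (1.0 + math.exp(-z))


def entry_signal(p_r, p_m, tau):
    """Compare the signed alpha = P_R - P_M against the entry threshold tau."""
    alpha = p_r - p_m
    if alpha > tau:
        return "BUY_YES"   # model sees YES as undervalued
    if alpha < -tau:
        return "BUY_NO"    # model sees YES as overvalued
    return "HOLD"


def should_exit(p_r, p_m, eps):
    """Close the position once |alpha| falls below the tolerance eps."""
    return abs(p_r - p_m) < eps


def position_size(p_r, p_m, expected_vol, scale, max_size):
    """Size proportional to |alpha| / expected volatility, capped at max_size.

    `scale` and `max_size` are hypothetical knobs: the proportionality
    constant and a liquidity-based cap, respectively.
    """
    if expected_vol <= 0:
        raise ValueError("expected_vol must be positive")
    raw = scale * abs(p_r - p_m) / expected_vol
    return min(raw, max_size)
```

For example, with $P_R = 0.60$, $P_M = 0.50$, and $\tau_{\alpha} = 0.05$, `entry_signal(0.60, 0.50, 0.05)` returns `"BUY_YES"`. The cap in `position_size` reflects the liquidity limit from the risk section, so a large alpha in a thin market does not translate into an oversized order.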