# Polymarket Mispricing Strategy: An Alpha-Generating Framework
## Introduction
Prediction markets like Polymarket offer a unique environment for *information aggregation* and financial innovation. Unlike traditional exchanges, the price of a binary option token (YES/NO) directly represents the market's implied probability of a discrete event occurring.
This note proposes a quantitative strategy that seeks alpha by identifying and exploiting temporary mispricings between the market-implied probability $P_M$, derived from Polymarket prices, and a predictive probability $P_R$ generated by a supervised machine learning model (*Logistic Regression*). The strategy profits only if the market price converges toward the model's forecast, so it rests on the assumption that $P_R$ is better calibrated than $P_M$.
## Prediction Market Mechanics
Polymarket functions as a decentralized prediction market leveraging blockchain technology. Understanding its mechanics is crucial for execution efficiency and risk control.
### Binary Options Structure
A market consists of two mutually exclusive tokens: **YES** (pays $1 if the event occurs) and **NO** (pays $1 if the event does not occur).
* *Pricing*: The YES and NO prices sum to approximately $1 (excluding fees and the bid-ask spread). The YES price is taken as the market-implied probability, $P_M$.
* *Settlement*: Upon market resolution, the winning token settles at $1, and the losing token settles at $0.
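The payoff structure above can be made concrete with a small sketch. The function name `yes_trade_summary` and the argument `believed_prob` are hypothetical, and fees and spread are ignored; the point is only that buying YES at price $P_M$ has an expected profit per token equal to the trader's probability minus the price.

```python
def yes_trade_summary(yes_price: float, believed_prob: float) -> dict:
    """Summarize buying one YES token at yes_price (dollars, strictly in (0, 1)).

    believed_prob is the trader's own probability that the event occurs
    (an illustrative input, ignoring fees and spread).
    """
    if not 0.0 < yes_price < 1.0:
        raise ValueError("yes_price must be strictly between 0 and 1")
    if not 0.0 <= believed_prob <= 1.0:
        raise ValueError("believed_prob must be in [0, 1]")

    profit_if_yes = 1.0 - yes_price   # winning token settles at $1
    loss_if_no = yes_price            # losing token settles at $0
    expected_profit = (
        believed_prob * profit_if_yes - (1.0 - believed_prob) * loss_if_no
    )
    return {
        "implied_prob": yes_price,    # P_M: YES price read as a probability
        "profit_if_yes": profit_if_yes,
        "loss_if_no": loss_if_no,
        "expected_profit": expected_profit,  # simplifies to believed_prob - yes_price
    }


summary = yes_trade_summary(yes_price=0.40, believed_prob=0.55)
```

Algebraically the expected profit reduces to `believed_prob - yes_price`, which is the quantity the rest of this note treats as the mispricing.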
## The Predictive Model
The core of the strategy is the robust generation of the model probability, $P_R$, using *Logistic Regression (Logit)* for binary classification.
### Model Choice and Output
*Logistic regression* is chosen for its interpretability and ability to output a probability between 0 and 1, naturally fitting the prediction market context.
$$
\text{Model Probability } P_R = \sigma \left( \frac{\mathbf{w}^T \mathbf{x} + b}{T} \right)
$$
Where:
* $\mathbf{x}$ is the feature vector of inputs.
* $\mathbf{w}$ is the vector of trained weights.
* $b$ is the bias term.
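* $T$ is the temperature parameter ($T > 0$), which rescales the logit: $T = 1$ recovers standard logistic regression and $T > 1$ pulls $P_R$ toward 0.5. It is distinct from the time to maturity, which is also written $T$ in later sections.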
* $\sigma$ is the sigmoid function $\frac{1}{1 + e^{-z}}$.
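As an illustration of the formula, the following sketch computes $P_R$ from a feature vector using only the standard library. The function names, the default `temperature=1.0`, and the example weights are assumptions for illustration; in practice $\mathbf{w}$ and $b$ would come from a fitted model.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def model_probability(
    x: Sequence[float],
    w: Sequence[float],
    b: float,
    temperature: float = 1.0,
) -> float:
    """Return P_R = sigmoid((w . x + b) / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(logit / temperature)


# Hypothetical weights and features, chosen only to show the effect of T.
p_sharp = model_probability([1.0], [2.0], b=0.0, temperature=1.0)
p_soft = model_probability([1.0], [2.0], b=0.0, temperature=2.0)
```

With the same logit, the higher temperature yields a probability closer to 0.5, which is one way to temper an overconfident model.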
### Candidate Input Features $(\mathbf{x})$
The model will be trained on data collected up to the time of prediction, focusing on two categories of information:
| Category | Input Feature | Rationale |
| :--- | :--- | :--- |
| Market-Specific | *Time to Maturity ($T$)* | Events closer to resolution exhibit higher information flow and market stability. |
| Market-Specific | *Volatility (Price changes over 24h)* | Measures market uncertainty or the arrival rate of new information. |
| Exogenous Information | *Sentiment Score (of related news/social media)* | Quantifies the directional bias of external commentary relevant to the event. |
| Exogenous Information | *Key Data Releases (Indicator variable)* | Flags the proximity to scheduled, high-impact public information releases (e.g., earnings, election results). |
### Training and Validation
The model must be retrained or validated periodically to account for concept drift. A rolling *walk-forward validation* approach is required, training the model on a look-back window (e.g., 60 days) and testing its performance on the subsequent period.
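A rolling walk-forward split can be sketched as an index generator. The name `walk_forward_splits` and the choice to advance the window by one test period per step are assumptions; the 60-sample training window mirrors the example look-back above, with one sample per day.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) for a rolling walk-forward scheme.

    Each test window immediately follows its training window, and the
    whole window then slides forward by test_size samples, so no test
    sample is ever older than the data the model was trained on.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = range(start, start + train_size)
        test_idx = range(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += test_size


# Example: 100 daily observations, 60-day look-back, 10-day test period.
splits = list(walk_forward_splits(n_samples=100, train_size=60, test_size=10))
```

In each fold the model would be refit on `train_idx` and scored on `test_idx`; comparing scores across folds gives a rough view of concept drift.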
## Alpha Generation & Trading Strategy
### Defining the Alpha Signal
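*Alpha* $\alpha$ is defined as the signed difference between the model's prediction and the current market price (implied probability):
$$
\alpha = P_R - P_M
$$
A positive $\alpha$ means the model rates the event as more likely than the market does; a negative $\alpha$ means less likely.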
### Trading Signals and Execution
The strategy resembles a *mean-reverting* trade in which $P_R$ plays the role of the mean: it bets that $P_M$ will converge toward $P_R$, which is assumed to be the better-calibrated estimate.
| Component | Condition | Action |
| :--- | :--- | :--- |
| Entry Signal ($\mathbf{S}_{\text{entry}}$) | When the mispricing exceeds a threshold $\tau_{\alpha}$. | *BUY YES* if $\alpha > \tau_{\alpha}$ (Undervalued)<br>*SELL/SHORT YES* (Buy NO) if $\alpha < -\tau_{\alpha}$ (Overvalued) |
| Exit Signal ($\mathbf{S}_{\text{exit}}$) | When the mispricing is corrected, falling below a tolerance $\epsilon$. | *CLOSE POSITION* if $\|\alpha\| < \epsilon$ |
| Stop Loss ($\mathbf{SL}$) | If the trade moves against the position by a defined percentage $\delta_{\text{loss}}$. | *IMMEDIATE CLOSE* to limit capital erosion. |
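The rules in the table can be sketched as two small functions. The function names, the returned labels, and the choice to check the stop loss before the convergence exit are assumptions; the thresholds `tau_alpha`, `epsilon`, and `delta_loss` correspond to $\tau_{\alpha}$, $\epsilon$, and $\delta_{\text{loss}}$ and must be chosen by the user.

```python
def entry_signal(p_model: float, p_market: float, tau_alpha: float) -> str:
    """Map the mispricing alpha = P_R - P_M to an entry action."""
    alpha = p_model - p_market
    if alpha > tau_alpha:
        return "BUY_YES"   # market looks undervalued relative to the model
    if alpha < -tau_alpha:
        return "BUY_NO"    # market looks overvalued; short YES by buying NO
    return "HOLD"


def exit_decision(
    p_model: float,
    p_market: float,
    epsilon: float,
    entry_price: float,
    current_price: float,
    delta_loss: float,
) -> str:
    """Decide whether to close an open position.

    entry_price and current_price refer to the token actually held
    (YES or NO), so a falling price is always a loss on the position.
    """
    loss_fraction = (entry_price - current_price) / entry_price
    if loss_fraction >= delta_loss:
        return "CLOSE_STOP_LOSS"
    if abs(p_model - p_market) < epsilon:
        return "CLOSE_CONVERGED"
    return "HOLD"


# Hypothetical thresholds for illustration only.
action = entry_signal(p_model=0.60, p_market=0.50, tau_alpha=0.05)
```

Keeping entry and exit logic as pure functions of prices and thresholds makes them straightforward to backtest before any order-placement code is written.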
### Position Sizing and Capital Allocation
Position sizing must be dynamic, leveraging a modified Kelly Criterion or a risk-parity approach.
$$
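L \propto \frac{|\alpha|}{\text{Expected Volatility}}
$$
where $L$ is the capital allocation for the trade; the magnitude $|\alpha|$ is used so that short signals ($\alpha < 0$) also receive a positive allocation. Positions should be smaller for: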
1. Markets with *high volatility*.
2. Markets with a *long Time to Maturity* ($T$), as they are exposed to information risk for longer periods.
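For a binary token bought at price $p$ with win probability $q$, the classical Kelly fraction is $(q - p)/(1 - p)$. The sketch below applies a fractional-Kelly multiplier and a hard cap, which is one common way to "modify" Kelly; the names, the 0.25 multiplier, and the 5% cap are assumptions, and the volatility and time-to-maturity scaling described above is deliberately left out.

```python
def kelly_fraction(win_prob: float, token_price: float) -> float:
    """Full-Kelly bankroll fraction for a binary token paying $1 on a win.

    For the NO side, pass win_prob = 1 - P_R and token_price = NO price.
    Returns 0.0 when there is no positive edge.
    """
    if not 0.0 < token_price < 1.0:
        raise ValueError("token_price must be strictly between 0 and 1")
    edge = win_prob - token_price
    if edge <= 0:
        return 0.0
    return edge / (1.0 - token_price)


def position_size(
    bankroll: float,
    win_prob: float,
    token_price: float,
    kelly_multiplier: float = 0.25,  # assumed: quarter-Kelly
    max_fraction: float = 0.05,      # assumed: cap at 5% of bankroll
) -> float:
    """Dollar allocation using capped fractional Kelly."""
    fraction = kelly_multiplier * kelly_fraction(win_prob, token_price)
    return bankroll * min(fraction, max_fraction)


stake = position_size(bankroll=1_000.0, win_prob=0.60, token_price=0.50)
```

Full Kelly assumes `win_prob` is exactly right; scaling it down is a hedge against model error, which is the main risk in this strategy.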
## Risk Management & Performance Metrics
### Risk Mitigation
* *Liquidity Risk*: Limit trade size to a maximum percentage of the available liquidity (for example, the visible order-book depth near the current price) to avoid severe slippage.
* *Event Risk*: Utilize Stop-Loss ($\mathbf{SL}$) to manage the risk of sudden, large information shocks that invalidate the model's prediction.
* *Overfitting Risk*: Rigorously test the model out-of-sample and avoid excessive feature engineering.
### Performance Metrics
The strategy's success will be evaluated using standard quantitative finance metrics:
* *Sharpe Ratio*: Measures risk-adjusted return (mean return in excess of the risk-free rate / standard deviation of returns).
* *Maximum Drawdown (MDD)*: The largest peak-to-trough decline during a specific period.
* *Calmar Ratio*: Measures risk-adjusted return based on MDD (Annualized Return / MDD).
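These three metrics can be computed from a series of per-period returns and an equity curve, as in the following sketch. The function names and the default of 365 periods per year (daily returns on a market that trades every day) are assumptions.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(
    returns: Sequence[float],
    risk_free: float = 0.0,       # per-period risk-free rate
    periods_per_year: int = 365,  # assumed: daily returns, 7-day trading
) -> float:
    """Annualized Sharpe ratio from per-period returns."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)  # sample standard deviation; needs >= 2 points
    if sd == 0:
        return float("nan")
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar_ratio(annualized_return: float, mdd: float) -> float:
    """Annualized return divided by maximum drawdown."""
    if mdd <= 0:
        return float("inf")
    return annualized_return / mdd


mdd = max_drawdown([100.0, 120.0, 90.0, 130.0, 117.0])
```

For the example equity curve the deepest decline is from 120 to 90, a 25% drawdown; the later dip from 130 to 117 is only 10%.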
## Reference
* https://vitalik.eth.limo/general/2024/11/09/infofinance.html