| Market | P&L |
|---|---|
| Market | Type | Market Price | True Prob | Edge | Ann. Edge | Opp Cost | Time | Kelly | Size | Action |
|---|---|---|---|---|---|---|---|---|---|---|
| Fold | Train End | Test Size | Brier S2 | Brier Mkt | Improvement | McFadden ΔR² |
|---|---|---|---|---|---|---|
| Trade | Side | Entry Price | Current Price | P&L % | Hours Held | Status |
|---|---|---|---|---|---|---|
| Event Type | Brier Stage 1 | Brier Stage 2 | Brier Market | S2 vs S1 | N |
|---|---|---|---|---|---|
| Time | Market | Type | Side | Price | Model Prob | Edge | Kelly | P&L | Cumulative P&L | Status |
|---|---|---|---|---|---|---|---|---|---|---|
| ID | Market | Side | Entry | Size | Realized Edge | P&L | Status | Date |
|---|---|---|---|---|---|---|---|---|
| Type | Trades | Hit Rate | Correlation | MAE | P&L |
|---|---|---|---|---|---|
| No resolved trades yet | |||||
| Time | Markets | Opportunities | Trades | Unrealized P&L | ΔR² | Positions | Bankroll |
|---|---|---|---|---|---|---|---|
Quantitative edge detection for Polymarket prediction markets.
Select a section below to learn more.
Every hour, PolyEdge pulls prices, order books, and volume data from hundreds of active Polymarket markets. Resolved markets (where the outcome is known) become training data.
A two-stage statistical model identifies where crowds make systematic mistakes. Stage 1 captures what the market already knows. Stage 2 corrects for behavioral biases like overvaluing longshots or underreacting to new information.
Before any trading happens, the model must prove it beats market prices on data it has never seen. This out-of-sample testing prevents the system from fooling itself with overfitting.
For each live market, PolyEdge compares its model probability to the market price. The difference is the edge. Position sizes are calculated using the Kelly Criterion (at a conservative 15% fraction) to balance growth with safety.
Open positions are continuously monitored with stop-losses and EV-based exit decisions. An AI supervisor (Claude) performs periodic strategic reviews to catch issues the statistical model might miss.
PolyEdge uses a two-stage approach inspired by academic research on horse racing markets (Benter 1994). Here's how each piece fits together:
Every hour, the system pulls data from 200+ active Polymarket markets and 500+ resolved markets. It records prices, order book depth, volume, and timing information. Resolved markets (where we know the outcome) become our training data.
First, we build a model using just the market price and basic features (volume, liquidity, time to close). This represents what the crowd already knows. Think of it as "the market is mostly right, but how right?"
This is where the edge comes from. Stage 2 uses L2-regularized logistic regression with 48 features: 18 base features plus 30 event-type interaction terms across 6 market categories (political, crypto, sports, entertainment, geopolitical, economics). The regularization strength C is selected by cross-validation, not hardcoded.
A key insight comes from Snowberg & Wolfers (2010), who showed that the favorite-longshot bias (FLB) isn't uniform: it's dramatically stronger at price extremes. A 5-cent longshot doesn't just carry proportionally more FLB than a 30-cent contract; the bias grows nonlinearly as the price falls. Similarly, heavy favorites above 90 cents are systematically underpriced. The model captures this with price-tier spline features (longshot_deep, longshot_zone, favorite_zone, favorite_deep) plus logit_squared for curvature and flb_asymmetric for the asymmetry between longshot and favorite mispricing. Each of these interacts with event type, so crypto longshots get different corrections than sports longshots (following Jullien & Salanié's insight that bias structure is context-dependent).
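As a rough illustration of the price-tier features, here is a minimal sketch. The tier thresholds (10c, 25c, 75c, 90c) and the p²(1-p) form of flb_asymmetric come from this guide; the hinge shape of each tier feature and the function name `price_tier_features` are assumptions made for the example, not the system's confirmed implementation.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def price_tier_features(p: float) -> dict[str, float]:
    """Illustrative price-tier features for a market price p in (0, 1).

    Assumption: each tier is a hinge that is zero outside its zone and
    grows linearly as the price moves deeper into it.
    """
    return {
        "longshot_deep": max(0.0, 0.10 - p),   # active below 10c
        "longshot_zone": max(0.0, 0.25 - p),   # active below 25c
        "favorite_zone": max(0.0, p - 0.75),   # active above 75c
        "favorite_deep": max(0.0, p - 0.90),   # active above 90c
        "logit_squared": logit(p) ** 2,        # curvature term
        "flb_asymmetric": p ** 2 * (1.0 - p),  # longshot/favorite asymmetry
    }
```

With hinges like these, a 5-cent contract activates both longshot features while a 30-cent contract activates neither, which is one simple way to let the correction grow faster at the extremes.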
The ΔR² between Stage 1 and Stage 2 tells us exactly how much value these bias corrections add.
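The sketch below shows one way a ΔR² of this kind could be computed, assuming McFadden's pseudo-R² (the metric named in the validation table) with a base-rate null model. The function names and the clipping constant are illustrative assumptions.

```python
import math


def log_likelihood(probs, outcomes, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def mcfadden_r2(probs, outcomes):
    """1 - LL(model) / LL(null), where the null predicts the base rate.

    Not meaningful if every outcome is identical (the null fits perfectly).
    """
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood([base_rate] * len(outcomes), outcomes)
    return 1.0 - log_likelihood(probs, outcomes) / ll_null


def delta_r2(stage1_probs, stage2_probs, outcomes):
    """Improvement of Stage 2 over Stage 1 on the same observations."""
    return mcfadden_r2(stage2_probs, outcomes) - mcfadden_r2(stage1_probs, outcomes)
```

A positive result means the Stage 2 probabilities explain the outcomes better than the Stage 1 probabilities on that sample.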
The model's raw prediction is blended with the market price in logit space (log-odds), not linearly. The blending weight is optimized by cross-validation alongside the regularization parameter: the system searches over a grid of weights from 10% to 50% model influence and picks the combination with the best out-of-sample Brier score. Currently the data selects a 50/50 blend, meaning the model earns equal weight with the market. This is justified because the model beats the market price in all 5 walk-forward CV folds.
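To make the difference between logit-space and linear blending concrete, here is a minimal sketch. The function name `blend_in_logit_space` and its default weight are assumptions for illustration; only the idea of averaging log-odds comes from the description above.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blend_in_logit_space(p_model: float, p_market: float,
                         model_weight: float = 0.5) -> float:
    """Weighted average of log-odds, mapped back to a probability."""
    blended_logit = (model_weight * logit(p_model)
                     + (1.0 - model_weight) * logit(p_market))
    return sigmoid(blended_logit)
```

For example, blending a model probability of 0.30 with a market price of 0.10 at equal weight gives roughly 0.18, below the 0.20 a linear average would produce, because averaging log-odds pulls less toward the middle at the extremes.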
Some events have many outcomes — "Who will win the Champions League?" might have 32 teams, each with a YES price. If you add up all those prices, they typically sum to more than $1.00 (often $1.30+). That excess is called the overround, and it isn't spread evenly: longshots carry a disproportionate share.
The Shin (1993) probability model, originally developed for horse racing, solves for a single parameter (z) representing the level of informed trading, then transforms all prices simultaneously into true probabilities that sum to exactly 1.0. The result is a coherent probability distribution that accounts for insider effects and the favorite-longshot bias across all outcomes at once.
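A minimal sketch of the Shin transformation follows, assuming the standard closed form p_i = (sqrt(z² + 4(1-z)π_i²/Π) - z) / (2(1-z)), where π_i are the quoted prices and Π is their sum. The solver (plain bisection on z), the fallback for events with no overround, and the function name are assumptions for the example.

```python
import math


def shin_probabilities(prices, iterations=100):
    """Return (probabilities, z) so that the probabilities sum to 1.

    prices: YES prices of mutually exclusive outcomes, each in (0, 1).
    """
    total = sum(prices)
    if total <= 1.0:
        # Assumption: with no overround, fall back to simple normalization.
        return [p / total for p in prices], 0.0

    def implied(z):
        out = []
        for price in prices:
            a = price * price / total
            # Algebraically equal to (sqrt(z^2 + 4(1-z)a) - z) / (2(1-z)),
            # rewritten to stay numerically stable as z approaches 1.
            out.append(2.0 * a / (math.sqrt(z * z + 4.0 * (1.0 - z) * a) + z))
        return out

    lo, hi = 0.0, 1.0  # the sum exceeds 1 at z=0 and falls below 1 at z=1
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(implied(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2.0
    return implied(z), z
```

Calling `shin_probabilities([0.5, 0.4, 0.3])` on a three-outcome event with 20% overround returns probabilities that sum to 1, with the 30-cent outcome scaled down proportionally more than the 50-cent one.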
This step is especially valuable for sports, where the Stage 2 bias correction slightly underperforms Stage 1. Sports events on Polymarket are almost always mutually exclusive multi-outcome markets with high overround and lots of longshots — exactly the structure Shin was designed for. The Shin probability is blended at 30% weight with the logit model prediction, and high-overround events (above 15%) are the primary edge source.
We don't just test once. We use "walk-forward" testing with 5 expanding windows: train on months 1–2, test on month 3. Then train on months 1–3, test on month 4. And so on. Expanding-window temporal splits prevent data leakage by ensuring the model never sees future data during training. This simulates real trading conditions where you only know the past.
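The expanding-window scheme can be sketched as follows. The function name and parameters are illustrative assumptions; the split pattern (train on everything up to a point, test on the next period) is the one described above.

```python
def expanding_window_splits(periods, n_folds=5, min_train=2):
    """Yield (train_periods, test_period) pairs in chronological order.

    periods must already be sorted oldest first. Each fold trains on all
    periods before the test period, so no future data leaks into training.
    """
    splits = []
    for fold in range(n_folds):
        train_end = min_train + fold
        if train_end >= len(periods):
            break  # not enough history left for another fold
        splits.append((periods[:train_end], periods[train_end]))
    return splits
```

With seven monthly periods this produces the five folds from the text: months 1-2 then 3, months 1-3 then 4, and so on up to months 1-6 then 7.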
The system will not trade unless three conditions are met: (1) OOS ΔR² > 0, meaning the model beats market prices on unseen data; (2) OOS Brier improvement > 0, meaning predictions are more accurate; (3) at least 20 test observations, so the results are statistically meaningful. All three must pass.
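The three-part gate reduces to a single boolean check. In this sketch the `OosReport` container and its field names are hypothetical; the thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class OosReport:
    """Hypothetical summary of out-of-sample validation results."""
    delta_r2: float      # OOS improvement of the model over market prices
    brier_model: float   # OOS Brier score of the model (lower is better)
    brier_market: float  # OOS Brier score of raw market prices
    n_test: int          # number of test observations


def trading_allowed(report: OosReport, min_obs: int = 20) -> bool:
    """All three conditions must pass before any trade is placed."""
    brier_improvement = report.brier_market - report.brier_model
    return (report.delta_r2 > 0
            and brier_improvement > 0
            and report.n_test >= min_obs)
```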
For each live market, we compare our model's probability to the market price. The difference is the "edge." We then use Kelly Criterion (at 15% strength) to size positions proportionally to the edge, and apply risk limits to prevent overconcentration.
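As a worked sketch of the sizing step, the code below applies the Kelly formula for a binary contract that pays $1, then scales it by the 15% fraction. The function names and the `max_fraction` cap (a stand-in for the risk limits mentioned above) are assumptions; the cap value is illustrative, not the system's configured limit.

```python
def kelly_fraction(p_model: float, price: float):
    """Full-Kelly bankroll fraction for a binary contract paying $1.

    Buying YES at `price` with win probability p_model gives
    f = (p_model - price) / (1 - price); buying NO is the mirror image.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if p_model > price:
        return "YES", (p_model - price) / (1.0 - price)
    if p_model < price:
        return "NO", (price - p_model) / price
    return "NONE", 0.0


def position_size(p_model, price, bankroll,
                  kelly_multiplier=0.15, max_fraction=0.05):
    """Dollar size at fractional Kelly, capped by a hypothetical risk limit."""
    side, full_kelly = kelly_fraction(p_model, price)
    fraction = min(full_kelly * kelly_multiplier, max_fraction)
    return side, fraction * bankroll
```

For a market priced at 50c where the model says 60%, full Kelly is 20% of bankroll; at 15% strength that becomes 3%, or $30 on a $1,000 bankroll.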
Markets are mostly efficient, but crowds make systematic errors. These errors are our opportunity.
Beyond the classic biases, we track 5 behavioral signals rooted in psychology research that create predictable mispricing.
longshot_deep (below 10c), longshot_zone (below 25c), favorite_zone (above 75c), favorite_deep (above 90c), plus logit_squared for curvature and flb_asymmetric (p²(1-p)) for the asymmetry between longshot and favorite mispricing.
Once a position is open, the system continuously monitors it and decides whether to hold or fully exit. There are no fixed hold-time rules: every decision is driven by math.
Every trade follows this primary status progression (dry-run trades use parallel simulated statuses):
A buy order has been placed on the Polymarket CLOB (Central Limit Order Book). A corresponding limit order record tracks the on-exchange order. The trade is waiting to be filled.
The buy order has been matched. The position is now active and being monitored by the position manager every cycle. Mark-to-market P&L is tracked continuously.
The system has decided to exit and placed a sell limit order on the CLOB. The trade waits for the sell to fill. If the sell order expires without filling, the trade returns to "filled" status and the system retries with increased urgency.
The sell order has filled. Realized P&L is computed, edge accuracy metrics are updated, and the trade enters the feedback loop for model improvement. Alternatively, a trade can reach "resolved" if the market closes and settles at $0 or $1.
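The status progression above can be modeled as a small state machine. The names "filled", "exit_pending", and "resolved" appear in this guide; "pending" (buy order placed) and "closed" (sell filled) are hypothetical labels chosen for the sketch, and the exact set of allowed transitions is an assumption.

```python
# Allowed transitions; "pending" and "closed" are hypothetical status names.
ALLOWED_TRANSITIONS = {
    "pending": {"filled"},                             # buy order matched
    "filled": {"exit_pending", "resolved"},            # exit placed, or market settled
    "exit_pending": {"closed", "filled", "resolved"},  # sell filled, expired, or settled
    "closed": set(),
    "resolved": set(),
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

The `exit_pending` to `filled` edge captures the retry path: an expired sell order puts the trade back into the monitored state.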
Every few minutes, the scanner runs review_positions(), which does the following for each open position:
Each position is evaluated through a strict priority chain. The first rule that triggers wins — no further checks are needed.
Three independent stop-loss triggers protect against different failure modes:
The core exit decision. If stop-loss hasn't triggered, the system asks: "Is the remaining edge in this position worth more than what this capital could earn elsewhere?"
ev_hold = (residual_edge × confidence_factor − exit_cost_at_resolution) × capital_locked
ev_exit = −exit_cost_now × capital_locked + opportunity_cost_annualized × capital_locked × time_remaining_years
If ev_exit > ev_hold, the system exits: the remaining thesis isn't worth the capital, which is better off earning the opportunity cost elsewhere. This is a pure economic comparison, with no hold-time gates and no arbitrary thresholds.
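The two formulas above translate directly into code. The variable names follow the formulas; the function names and the example inputs in the usage note are illustrative assumptions.

```python
def ev_hold(residual_edge, confidence_factor,
            exit_cost_at_resolution, capital_locked):
    """Expected value of holding the position to resolution."""
    return ((residual_edge * confidence_factor - exit_cost_at_resolution)
            * capital_locked)


def ev_exit(exit_cost_now, opportunity_cost_annualized,
            time_remaining_years, capital_locked):
    """Expected value of exiting now and redeploying the capital."""
    return (-exit_cost_now * capital_locked
            + opportunity_cost_annualized * capital_locked * time_remaining_years)


def should_exit(hold_value: float, exit_value: float) -> bool:
    """Exit only when exiting is strictly better than holding."""
    return exit_value > hold_value
```

For instance, with $100 locked, a 1% residual edge at 50% confidence, and no cost at resolution, holding is worth $0.50. Exiting at a 0.2% cost with a 5% annualized opportunity rate and 30 days left is worth about $0.21, so the position is held.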
Once an exit decision is made, the system assigns an urgency level that controls how aggressively it executes:
The urgency level determines which execution method is used:
purpose=exit to track the order lifecycle. The trade status moves to "exit_pending" and P&L is deferred until fill is confirmed.
p_final = 0.6 * p_model + 0.4 * p_market. This acknowledges that market prices contain valuable information and prevents the model from making extreme predictions. Shrinkage is a standard technique in statistical forecasting that trades a small amount of in-sample fit for significantly better out-of-sample stability.
Your command center. The four cards at the top show unrealized P&L, capital deployed, model status (ΔR²), and edge quality. Below that you'll find top positions, OOS validation status, bias detection by event type, and current opportunities. If you're an admin, action buttons for manual cycles also appear here.
Shown on the Overview tab, this table lists markets where the model detects mispricing of at least the configured minimum edge threshold (currently 7%). The "Edge" column is the percentage difference between our model's probability and the market price. "Kelly" shows the optimal bet fraction. "Size" shows the dollar amount. Opportunities are refreshed automatically every 5 minutes by the tactical scheduler.
Ann. Edge — The edge scaled to a yearly rate: edge × (8760 / hours_to_close), capped at 365x turnover. A 2% edge on a market closing in 24 hours is worth 730% annualized, while the same edge on a 30-day market is only 24%. This is why short-duration trades often rank higher.
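The annualization formula is short enough to check by hand. This sketch uses the formula and the 365x cap as stated; the function and constant names are assumptions.

```python
HOURS_PER_YEAR = 8760
MAX_TURNOVER = 365  # cap on how many times per year capital can be recycled


def annualized_edge(edge: float, hours_to_close: float) -> float:
    """edge x (8760 / hours_to_close), with turnover capped at 365x."""
    if hours_to_close <= 0:
        raise ValueError("hours_to_close must be positive")
    turnover = min(HOURS_PER_YEAR / hours_to_close, MAX_TURNOVER)
    return edge * turnover
```

`annualized_edge(0.02, 24)` gives 7.3 (730%), and `annualized_edge(0.02, 720)` gives about 0.243 (24%), matching the two examples in the text. Any market closing in under 24 hours hits the cap and is treated as 365x turnover.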
Opp Cost — The opportunity cost of locking capital in this trade instead of earning the 5% APY risk-free rate. For short-duration trades this is negligible; for multi-week holds it can eat into real returns.
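A minimal sketch of the opportunity-cost calculation, assuming simple (non-compounded) interest at the 5% rate; the function name and the simple-interest choice are assumptions.

```python
HOURS_PER_YEAR = 8760


def opportunity_cost(capital: float, hours_held: float,
                     risk_free_apy: float = 0.05) -> float:
    """Dollars forgone by locking capital instead of earning the risk-free rate.

    Assumption: simple interest, prorated by hours held.
    """
    return capital * risk_free_apy * hours_held / HOURS_PER_YEAR
```

On $1,000, a one-day hold forgoes about $0.14, while a 30-day hold forgoes about $4.11, which is why the cost only starts to matter on multi-week positions.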
Time — Hours or days until the market closes. Combined with Ann. Edge, this tells you how efficiently your capital would be deployed.
Model performance and calibration diagnostics combined. Stage 1/Stage 2 R² and ΔR² show the model improvement from bias correction. OOS (out-of-sample) metrics validate on unseen data. ECE should be under 0.05. The reliability diagram shows calibration visually. Edge decay tracks whether open trades' edges are holding up or deteriorating over time.
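For reference, expected calibration error (ECE) can be computed as below. This is a sketch using equal-width probability bins; the bin count and function name are assumptions, and the system's own binning may differ.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between predicted probability and observed frequency."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_prob - hit_rate)
    return ece
```

A perfectly calibrated set of predictions scores 0; predictions of 0.9 that come true only half the time score 0.4, far above the 0.05 target.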
Every trade the system has placed (or simulated in dry-run). Check the "Model Prob" vs "Price" columns to see the predicted edge. Resolved trades show realized P&L. Invalidated trades (dimmed rows) had edges so extreme they were likely model errors.
Realized performance stats (P&L, win rate, ROI) at the top, followed by open limit orders waiting to fill, and active positions with hold/exit recommendations and EV metrics. The position manager evaluates each trade and suggests whether to hold or exit based on current edge, EV comparison, and risk parameters.
Ann. Edge and Opp Cost columns show each position's annualized edge and the opportunity cost of capital locked in it. Time Rem shows how long until the market closes. Click the magnifying glass button on any position to get a live AI analysis covering position health, feature drivers, capital efficiency, and a hold/exit recommendation with reasoning.
The System Logs and Claude Supervisor tabs are visible only to admin users. System Logs shows edge accuracy metrics, trade execution diagnostics ("Why No Trades?"), cycle history, alerts, data backfill controls, and a live application log stream. The Claude Supervisor tab provides deep AI-powered strategic reviews with a confidence score and interactive Q&A.