| Market | P&L |
|---|---|

| Market | Type | Market Price | True Prob | Edge | Kelly | Size | Action |
|---|---|---|---|---|---|---|---|

| Fold | Train End | Test Size | Brier S2 | Brier Mkt | Improvement | McFadden ΔR² |
|---|---|---|---|---|---|---|

| Trade | Side | Original Edge | Current Edge | Decay Ratio | Hours Held | Direction |
|---|---|---|---|---|---|---|

| Time | Market | Type | Side | Price | Model Prob | Edge | Kelly | P&L | Cumulative P&L | Status |
|---|---|---|---|---|---|---|---|---|---|---|

| ID | Market | Type | Side | Entry | Size | Predicted Edge | Realized Edge | Edge Error | P&L | Resolved |
|---|---|---|---|---|---|---|---|---|---|---|

| Type | Trades | Hit Rate | Correlation | MAE | P&L |
|---|---|---|---|---|---|

| Time | Markets | Opportunities | Trades | Unrealized P&L | ΔR² | Positions | Bankroll |
|---|---|---|---|---|---|---|---|

Before diving into how PolyEdge works, here are the key concepts explained in plain language.
PolyEdge uses a two-stage approach inspired by academic research on horse racing markets (Benter 1994). Here's how each piece fits together:
Every hour, the system pulls data from 200+ active Polymarket markets and 500+ resolved markets. It records prices, order book depth, volume, and timing information. Resolved markets (where we know the outcome) become our training data.
First, we build a model using just the market price and basic features (volume, liquidity, time to close). This represents what the crowd already knows. Think of it as "the market is mostly right, but how right?"
This is where the edge comes from. Stage 2 uses L2-regularized logistic regression (sklearn LogisticRegression, C=0.1) with 6 core features: stage1_logit, price_stage1_diff, depth_imbalance, price_uncertainty, log_time, and flb_correction. The L2 penalty limits overfitting by shrinking coefficients toward zero, and the reduced feature set (down from 13) reduces multicollinearity. The ΔR² between Stage 1 and Stage 2 measures how much explanatory power these bias corrections add beyond what Stage 1 already captures.
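The ΔR² comparison can be sketched in a few lines. This is a minimal illustration, assuming ΔR² is the difference in McFadden pseudo-R² between the two stages (the fold table labels the column "McFadden ΔR²"); the function names and the constant base-rate null model are assumptions, not PolyEdge's actual code.

```python
import math


def log_likelihood(probs, outcomes, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def mcfadden_r2(probs, outcomes):
    """McFadden pseudo-R2: 1 - LL(model) / LL(null).

    The null model here predicts the observed base rate for every market
    (an assumption made for this sketch).
    """
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood([base_rate] * len(outcomes), outcomes)
    ll_model = log_likelihood(probs, outcomes)
    return 1.0 - ll_model / ll_null


def delta_r2(stage1_probs, stage2_probs, outcomes):
    """Improvement of Stage 2 over Stage 1; positive means Stage 2 fits better."""
    return mcfadden_r2(stage2_probs, outcomes) - mcfadden_r2(stage1_probs, outcomes)


# Toy data: Stage 2 is more confident in the right direction than Stage 1.
outcomes = [1, 0, 1, 0]
stage1 = [0.6, 0.4, 0.6, 0.4]
stage2 = [0.8, 0.2, 0.8, 0.2]
improvement = delta_r2(stage1, stage2, outcomes)
```

A positive `improvement` means the bias-correction features add explanatory power on this sample. The same calculation on held-out data gives the out-of-sample ΔR².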
Final predictions blend 60% model probability + 40% market price (shrinkage_factor=0.6). This conservative blending acknowledges that markets are mostly right and prevents the model from making extreme predictions on thin evidence. Shrinkage improves out-of-sample stability.
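As a worked example of the blend, the sketch below applies the stated 60/40 weighting. The function name `blend` is hypothetical; only the weights and the `shrinkage_factor` value come from the description above.

```python
def blend(p_model, p_market, shrinkage_factor=0.6):
    """Shrink the model probability toward the market price.

    shrinkage_factor is the weight on the model; the remainder goes to the market.
    """
    if not 0.0 <= shrinkage_factor <= 1.0:
        raise ValueError("shrinkage_factor must be between 0 and 1")
    return shrinkage_factor * p_model + (1.0 - shrinkage_factor) * p_market


# Model says 70%, market says 50%: the blended forecast is pulled toward the market.
p_final = blend(0.70, 0.50)
```

Here a 20-point disagreement between model and market shrinks to a 12-point edge (0.62 vs 0.50) after blending.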
We don't just test once. We use "walk-forward" testing with 5 expanding windows: train on months 1–2, test on month 3. Then train on months 1–3, test on month 4. And so on. Expanding-window temporal splits prevent data leakage by ensuring the model never sees future data during training. This simulates real trading conditions where you only know the past.
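The expanding-window schedule described above can be generated as follows. This is a sketch that assumes data is bucketed into numbered monthly periods; `expanding_window_splits` and its parameters are illustrative names.

```python
def expanding_window_splits(n_periods, min_train=2):
    """Return (train_periods, test_period) pairs for walk-forward testing.

    Each fold trains on every period before the test period, so the
    training window grows by one period per fold and never includes
    data from the test period or later.
    """
    splits = []
    for test_period in range(min_train + 1, n_periods + 1):
        train_periods = list(range(1, test_period))
        splits.append((train_periods, test_period))
    return splits


# Seven months of data yields the five folds described above.
folds = expanding_window_splits(7)
for train, test in folds:
    print(f"train on months {train[0]}-{train[-1]}, test on month {test}")
```

Because every training period precedes its test period, no fold can leak future information into the fit.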
The system will not trade unless three conditions are met: (1) OOS ΔR² > 0, meaning the model beats market prices on unseen data; (2) OOS Brier improvement > 0, meaning predictions are more accurate; (3) at least 20 test observations, so the results are statistically meaningful. All three must pass.
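The three-part gate can be expressed as a single boolean check. The sketch below assumes "Brier improvement" means the market's Brier score minus the model's (lower Brier is better); the `OosMetrics` container and function names are hypothetical.

```python
from dataclasses import dataclass


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


@dataclass
class OosMetrics:
    delta_r2: float      # out-of-sample Stage 2 minus Stage 1 R-squared
    brier_model: float   # out-of-sample Brier score of the model
    brier_market: float  # out-of-sample Brier score of raw market prices
    n_test: int          # number of out-of-sample observations


def passes_validation_gate(m, min_test_obs=20):
    """True only if all three conditions hold."""
    brier_improvement = m.brier_market - m.brier_model  # assumed sign convention
    return m.delta_r2 > 0 and brier_improvement > 0 and m.n_test >= min_test_obs


ok = passes_validation_gate(
    OosMetrics(delta_r2=0.01, brier_model=0.20, brier_market=0.21, n_test=25)
)
too_few = passes_validation_gate(
    OosMetrics(delta_r2=0.01, brier_model=0.20, brier_market=0.21, n_test=19)
)
```

In this example `ok` is true, while `too_few` fails only because it has 19 test observations.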
For each live market, we compare our model's probability to the market price. The difference is the "edge." We then use the Kelly criterion at 15% strength (a fractional Kelly bet, 15% of the full Kelly fraction) to size positions in proportion to the edge, and apply risk limits to prevent overconcentration.
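A sizing sketch under stated assumptions: each contract pays 1 if it resolves in your favor, fees are ignored, and the textbook Kelly fraction for a binary bet is used. The 15% multiplier comes from the text; the 5% per-position cap stands in for the unspecified risk limits and is purely hypothetical, as are the function names.

```python
def kelly_fraction(p_model, price):
    """Full-Kelly bankroll fraction for a binary contract paying 1.

    Buying YES at `price` with win probability p gives f = (p - price) / (1 - price).
    Buying NO (cost 1 - price, win probability 1 - p) gives f = (price - p) / price.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if p_model > price:
        return "YES", (p_model - price) / (1.0 - price)
    if p_model < price:
        return "NO", (price - p_model) / price
    return None, 0.0


def position_size(p_model, price, bankroll, kelly_multiplier=0.15, max_fraction=0.05):
    """Dollar size using fractional Kelly, capped by a hypothetical per-position limit."""
    side, full_kelly = kelly_fraction(p_model, price)
    fraction = min(kelly_multiplier * full_kelly, max_fraction)
    return side, bankroll * fraction


# Model 62% vs market 50% on a $1,000 bankroll.
side, size = position_size(0.62, 0.50, bankroll=1000.0)
```

With these inputs the full Kelly fraction is 0.24, the 15% multiplier reduces it to 0.036, and the position is a YES bet of about $36.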
Markets are mostly efficient, but crowds make systematic errors. These errors are our opportunity.
Beyond the classic biases, we track 5 behavioral signals rooted in psychology research that create predictable mispricing.
PolyEdge has multiple layers of protection to prevent catastrophic losses.
Validated out-of-sample results demonstrating the model's genuine predictive power.
p_final = 0.6 * p_model + 0.4 * p_market. This acknowledges that market prices contain valuable information and prevents the model from making extreme predictions. Shrinkage is a standard technique in statistical forecasting that trades a small amount of in-sample fit for significantly better out-of-sample stability.
A quick guide to each tab and what to look for.
Your command center. The four cards at the top show bankroll, total P&L, open positions, and OOS validation status. The OOS card is the most important — if it shows "VALIDATED" in green, the model has proven itself on unseen data. Below that, you'll see bias detection by event type and action buttons to trigger manual cycles.
Shown on the Overview tab, this table lists markets where the model detects mispricing of at least 2%. The "Edge" column is the difference, in percentage points, between our model's probability and the market price. "Kelly" shows the optimal bet fraction. "Size" shows the dollar amount. Opportunities are refreshed automatically every 5 minutes by the tactical scheduler.
Model performance and calibration diagnostics combined. Stage 1/Stage 2 R² and ΔR² show the model improvement from bias correction. OOS (out-of-sample) metrics validate on unseen data. ECE (expected calibration error) should be under 0.05. The reliability diagram shows calibration visually. Edge decay tracks whether open trades' edges are holding up over time.
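For reference, ECE is commonly computed by binning predictions and averaging the gap between predicted and observed frequencies. The sketch below assumes 10 equal-width bins; the binning scheme PolyEdge actually uses is not stated here.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average of |mean predicted prob - observed frequency| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_prob - hit_rate)
    return ece


# Five 80% forecasts with four hits: calibrated in this bin.
calibrated = expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0])
# Four 90% forecasts with two hits: overconfident by 0.4.
overconfident = expected_calibration_error([0.9] * 4, [1, 0, 1, 0])
```

The first example is calibrated within its bin, while the second is overconfident by 0.4 and would fall far outside the 0.05 target.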
Every trade the system has made (or simulated in dry-run). Check the "Model Prob" vs "Price" columns to see the predicted edge. Resolved trades show realized P&L. Invalidated trades (dimmed rows) had edges so extreme they were likely model errors.
Choose which AI model powers the supervisor. Available models include Claude (Opus, Sonnet, Haiku) and GPT-4o variants. Flagship models give deeper analysis; fast models are quicker and cheaper.
Deep strategic review by the selected AI model. The confidence score (0–100) summarizes overall system health. Green (70+) means healthy. Yellow (40–69) means needs attention. Red (<40) means critical issues. You can also ask specific strategy questions using the text box.
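The color bands map to score thresholds as in this small sketch; the function name and return labels are illustrative only.

```python
def health_band(score):
    """Map a 0-100 confidence score to the dashboard's color band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "green"   # healthy
    if score >= 40:
        return "yellow"  # needs attention
    return "red"         # critical issues


bands = [health_band(s) for s in (85, 70, 69, 40, 39)]
```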