PolyEdge Documentation

Quantitative edge detection for Polymarket prediction markets.

Getting Started
What PolyEdge does and how it works
What is PolyEdge?
PolyEdge is an automated quantitative trading system that finds mispricings on Polymarket — a platform where you can trade on the outcomes of real-world events (elections, crypto prices, sports, economics, and more). When a market's price doesn't match the true probability of an event happening, that gap is called "edge." PolyEdge detects those gaps and trades on them automatically.

How It Works

1
Collect Data

Every hour, PolyEdge pulls prices, order books, and volume data from hundreds of active Polymarket markets. Resolved markets (where the outcome is known) become training data.

2
Build a Model

A two-stage statistical model identifies where crowds make systematic mistakes. Stage 1 captures what the market already knows. Stage 2 corrects for behavioral biases like overvaluing longshots or underreacting to new information.

3
Validate on Unseen Data

Before any trading happens, the model must prove it beats market prices on data it has never seen. This out-of-sample testing prevents the system from fooling itself with overfitting.

4
Find and Size Opportunities

For each live market, PolyEdge compares its model probability to the market price. The difference is the edge. Position sizes are calculated using the Kelly Criterion (at a conservative 15% fraction) to balance growth with safety.

5
Manage Positions

Open positions are continuously monitored with stop-losses and EV-based exit decisions. An AI supervisor (Claude) performs periodic strategic reviews to catch issues the statistical model might miss.

Dashboard Overview

Overview — Live snapshot of P&L, capital deployed, model quality, current opportunities, and open positions.
Calibration — Model accuracy metrics, out-of-sample validation results, and edge decay tracking.
Trade History — Every trade the system has placed or simulated, with entry prices, model predictions, and realized P&L.
Positions — Active positions with realized stats, open limit orders, and hold/exit recommendations.
Docs — You're here. Deep-dive explanations of every concept used in the system.
PolyEdge runs in dry-run mode by default — all trades are simulated with no real money at risk. You can evaluate performance before deciding to go live.
📚
Foundational Concepts
Key statistical and financial terms explained in plain language
Prediction Market Probability
The price IS the crowd's guess at the chance of something happening.
On Polymarket, if "Will X happen?" trades at $0.70, the crowd thinks there's a 70% chance. If the event actually happens, the contract pays $1.00. If not, it pays $0.00. Our job is to find cases where the crowd's guess is wrong.
ΔR² (Delta R-Squared)
How much better our model is compared to just using market prices alone.
Think of it like a test score improvement. If the market price alone scores 75 at predicting outcomes and our model scores 77, the ΔR² is that 2-percentage-point improvement. Even small improvements matter because they compound over hundreds of trades. An out-of-sample ΔR² above 0% means our model adds real value.
A ΔR² of 2% might sound small, but across 1,000 trades it can mean the difference between losing money and consistent profits.
p-value
How confident we are that our results aren't just luck.
The p-value answers: "If our model had zero skill, what's the chance we'd see results at least this good by random luck?" A p-value of 0.01 means a zero-skill model would produce results this good only 1% of the time. We typically require p < 0.05 before trusting the model.
Flipping a coin and getting 7 heads in a row has a p-value of about 0.008 — unlikely enough that you'd suspect the coin is rigged. That's the same logic we apply to model performance.
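The coin-flip figure is an exact one-sided binomial tail probability. Here is a minimal Python check. The helper name `binomial_p_value` is hypothetical and not part of PolyEdge:

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p-value: probability of at least `successes` wins in `trials`
    independent attempts when each wins with probability `p_null` (zero skill)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Seven heads in seven flips of a fair coin: 0.5**7, i.e. about 0.008.
print(binomial_p_value(7, 7))
```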
Brier Score
A report card for probability predictions — lower is better.
The Brier score measures how close our probability predictions are to what actually happens. If we say there's a 90% chance of something and it happens, that's a good prediction. If we say 90% and it doesn't happen, that's a bad one. The score ranges from 0 (perfect) to 1 (terrible). A weather forecast that's always right gets 0. One that always says 50% gets 0.25.
Our model's Brier score: ~0.06 (very accurate). The raw market: ~0.08. That gap is where our edge comes from.
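Below is a minimal sketch of the Brier score for yes/no outcomes. It reproduces the two reference points above: always right gives 0, and always saying 50% gives 0.25. `brier_score` is an illustrative helper, not PolyEdge's own function.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

outcomes = [1, 0, 1, 1, 0]
print(brier_score([float(o) for o in outcomes], outcomes))  # always right: 0.0
print(brier_score([0.5] * len(outcomes), outcomes))         # always 50%: 0.25
```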
McFadden R²
How well our model explains why outcomes happen.
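Similar to a regular R², but designed for yes/no outcomes like prediction markets. It measures how much the model's log-likelihood improves on a baseline that predicts the same base rate for every market. Values above 0.2 are considered good, and values above 0.4 excellent. Our two-stage model typically achieves an R² of 0.70–0.80, meaning it captures most of the factors that drive outcomes.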
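McFadden R² is 1 − LL(model) / LL(null), where the null model predicts the overall base rate for every market. The sketch below computes it, along with the ΔR² between two sets of predictions. The function names are hypothetical. The sketch assumes the base rate lies strictly between 0 and 1 and that every probability lies in (0, 1).

```python
import math

def log_likelihood(probs, outcomes):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    return sum(math.log(p) if o == 1 else math.log(1 - p) for p, o in zip(probs, outcomes))

def mcfadden_r2(probs, outcomes):
    """1 - LL(model) / LL(null); the null model predicts the base rate everywhere."""
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood([base_rate] * len(outcomes), outcomes)
    return 1 - log_likelihood(probs, outcomes) / ll_null

def delta_r2(model_probs, market_probs, outcomes):
    """How much the model's McFadden R² exceeds that of the market prices."""
    return mcfadden_r2(model_probs, outcomes) - mcfadden_r2(market_probs, outcomes)

outcomes = [1, 0, 1, 1]
market = [0.70, 0.40, 0.60, 0.80]
model = [0.80, 0.30, 0.65, 0.85]
print(round(delta_r2(model, market, outcomes), 4))
```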
Kelly Criterion
A formula that tells you the optimal amount to bet based on your edge.
If you win an even-money coin flip 60% of the time (a 10-point edge over the fair 50% price), Kelly says bet 20% of your bankroll. Bet too much and a losing streak wipes you out. Bet too little and you leave money on the table. We use "fractional Kelly" (15% of the full Kelly amount) to be extra conservative, giving up some profit for much more safety.
Full Kelly says "bet $100." We bet $15 instead. We grow slower but survive the inevitable bad streaks.
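For a binary contract that pays $1, buying YES at price c with true probability p gives net odds (1 − c)/c. The Kelly formula then simplifies to f* = (p − c)/(1 − c). The sketch below uses a hypothetical helper name; the even-money coin flip is the c = 0.5 case. Buying NO is symmetric: use price 1 − c and probability 1 − p.

```python
def kelly_fraction_binary(p_true: float, price: float) -> float:
    """Full-Kelly fraction of bankroll for buying a $1-payout YES contract at `price`.

    Net odds are b = (1 - price) / price, so f* = (b*p - q) / b reduces to
    (p_true - price) / (1 - price). Returns 0 when there is no positive edge."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max(0.0, (p_true - price) / (1.0 - price))

full = kelly_fraction_binary(0.60, 0.50)  # win an even-money flip 60% of the time
print(f"full Kelly: {full:.0%} of bankroll")       # 20%
print(f"15% fractional Kelly: {0.15 * full:.0%}")  # 3%
```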
Out-of-Sample (OOS) Testing
Testing the model on data it has never seen before, to prove it works for real.
It's easy to build a model that "predicts" the past perfectly — that's like memorizing the answers to a test you've already taken. OOS testing is like taking a brand new exam. We train the model on old data, then test it on newer data it's never touched. Only if it passes OOS testing do we allow it to trade.
Expected Calibration Error (ECE)
How honest our probability predictions are — when we say 70%, does it really happen 70% of the time?
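ECE groups all predictions into buckets (e.g., all the times we said 60–70%) and checks whether the actual rate in each bucket matches the average prediction. The gaps are then averaged, weighted by how many predictions fall in each bucket. An ECE near 0 means perfectly calibrated. An ECE of 0.05 means predictions are off by about 5 percentage points on average.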
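Here is a minimal ECE sketch. It assumes ten equal-width buckets (0 to 10%, 10 to 20%, and so on); PolyEdge's actual bucket count is not stated here, and the function name is hypothetical.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between average prediction and observed frequency per bucket."""
    buckets = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the top bucket
        buckets[idx].append((p, o))
    total = len(probs)
    ece = 0.0
    for bucket in buckets:
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        hit_rate = sum(o for _, o in bucket) / len(bucket)
        ece += len(bucket) / total * abs(mean_pred - hit_rate)
    return ece

# Said "70%" ten times and it happened 7 times: well calibrated, ECE near 0.
print(expected_calibration_error([0.7] * 10, [1] * 7 + [0] * 3))
```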
🔍
How the Model Finds Edge
Two-stage logit modeling, walk-forward validation, and the OOS trading gate

PolyEdge uses a two-stage approach inspired by academic research on horse racing markets (Benter 1994). Here's how each piece fits together:

1
Data Collection

Every hour, the system pulls data from 200+ active Polymarket markets and 500+ resolved markets. It records prices, order book depth, volume, and timing information. Resolved markets (where we know the outcome) become our training data.

2
Stage 1: Baseline Model

First, we build a model using just the market price and basic features (volume, liquidity, time to close). This represents what the crowd already knows. Think of it as "the market is mostly right, but how right?"

3
Stage 2: Non-Linear Bias Correction

This is where the edge comes from. Stage 2 uses L2-regularized logistic regression with 48 features: 18 base features plus 30 event-type interaction terms across 6 market categories (political, crypto, sports, entertainment, geopolitical, economics). The regularization strength C is selected by cross-validation, not hardcoded.

A key insight comes from Snowberg & Wolfers (2010), who showed that the favorite-longshot bias isn't uniform: it's dramatically stronger at price extremes. A 5-cent longshot doesn't just have proportionally more FLB than a 30-cent contract; it has nonlinearly more. Similarly, heavy favorites above 90 cents are systematically underpriced. The model captures this with price-tier spline features (longshot_deep, longshot_zone, favorite_zone, favorite_deep) plus logit_squared for curvature and flb_asymmetric for the asymmetry between longshot and favorite mispricing. Each of these interacts with event type, so crypto longshots get different corrections than sports longshots (following Jullien & Salanié's insight that bias structure is context-dependent).

The ΔR² between Stage 1 and Stage 2 tells us exactly how much value these bias corrections add.

4
Prediction Blending

The model's raw prediction is blended with the market price in logit space (log-odds), not linearly. The blending weight is optimized by cross-validation alongside the regularization parameter — the system searches over a grid of weights from 10% to 50% model influence and picks the combination with the best out-of-sample Brier score. Currently the data selects a 50/50 blend, meaning the model earns equal weight with the market. This is justified because the model beats the market price in all 5 walk-forward CV folds consistently.
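Blending in logit space means averaging log-odds and mapping the result back to a probability. The sketch below shows the blend and a simple weight search over a 10% to 50% grid, scored by Brier. Several details are simplifying assumptions: the 10-point grid step, the function names, and searching on a single held-out set rather than jointly with C under cross-validation.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def blend_logit(p_model, p_market, w_model):
    """Blend in log-odds: w * logit(model) + (1 - w) * logit(market), then map back."""
    return sigmoid(w_model * logit(p_model) + (1 - w_model) * logit(p_market))

def pick_weight(model_probs, market_probs, outcomes, grid=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Return the grid weight whose blended predictions have the lowest Brier score."""
    def brier(w):
        preds = [blend_logit(pm, pk, w) for pm, pk in zip(model_probs, market_probs)]
        return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(outcomes)
    return min(grid, key=brier)

# A 50/50 blend of 80% (model) and 50% (market) averages log-odds ln(4) and 0.
print(blend_logit(0.8, 0.5, 0.5))
```

Unlike a linear average, the logit blend treats a move from 90% to 95% as a larger change than a move from 50% to 55%. That matches how odds behave near the extremes.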

5
Multi-Outcome Coherence

Some events have many outcomes — "Who will win the Champions League?" might have 32 teams, each with a YES price. If you add up all those prices, they typically sum to more than $1.00 (often $1.30+). That excess is called the overround, and it isn't spread evenly: longshots carry a disproportionate share.

The Shin (1993) probability model, originally developed for horse racing, solves for a single parameter (z) representing the level of informed trading, then transforms all prices simultaneously into true probabilities that sum to exactly 1.0. The result is a coherent probability distribution that accounts for insider effects and the favorite-longshot bias across all outcomes at once.

This step is especially valuable for sports, where the Stage 2 bias correction slightly underperforms Stage 1. Sports events on Polymarket are almost always mutually exclusive multi-outcome markets with high overround and lots of longshots — exactly the structure Shin was designed for. The Shin probability is blended at 30% weight with the logit model prediction, and high-overround events (above 15%) are the primary edge source.
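The Shin adjustment is usually implemented in two parts. A closed-form mapping converts each price to a probability for a given z, and a one-dimensional root search finds the z that makes the probabilities sum to 1. Below is a sketch of that standard formulation. The function name is hypothetical, and the code is not taken from PolyEdge.

```python
import math

def shin_probabilities(prices, tol=1e-12, max_iter=200):
    """Turn overround YES prices of mutually exclusive outcomes into probabilities
    summing to 1. For informed-trading share z and booksum B = sum(prices):
        p_i = (sqrt(z^2 + 4(1 - z) * pi_i^2 / B) - z) / (2(1 - z))
    sum(p_i) falls from sqrt(B) at z = 0 toward sum(pi^2)/B < 1 as z -> 1,
    so z is found by bisection."""
    booksum = sum(prices)
    if booksum <= 1.0:
        raise ValueError("Shin's adjustment needs an overround (prices summing above 1)")

    def implied(z):
        return [(math.sqrt(z * z + 4 * (1 - z) * pi * pi / booksum) - z) / (2 * (1 - z))
                for pi in prices]

    lo, hi = 0.0, 1.0 - 1e-9
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if sum(implied(mid)) > 1.0:
            lo = mid  # probabilities still too high: need more informed trading
        else:
            hi = mid
        if hi - lo < tol:
            break
    z = (lo + hi) / 2
    return z, implied(z)

z, probs = shin_probabilities([0.50, 0.40, 0.20])  # booksum 1.10, a 10% overround
print(round(z, 4), [round(p, 4) for p in probs])
```

Because the ratio p_i / π_i grows with π_i, the 20-cent longshot is marked down proportionally more than the 50-cent favorite. This is the uneven overround allocation described above.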

6
Walk-Forward Validation

We don't just test once. We use "walk-forward" testing with 5 expanding windows: train on months 1–2, test on month 3. Then train on months 1–3, test on month 4. And so on. Expanding-window temporal splits prevent data leakage by ensuring the model never sees future data during training. This simulates real trading conditions where you only know the past.
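The expanding-window schedule can be sketched as a generator over ordered time periods. Monthly periods and seven months of data (which gives five folds) are illustrative assumptions.

```python
def expanding_window_splits(periods, initial_train=2):
    """Yield (train_periods, test_period): train on everything before the test
    period, then grow the training window by one period per fold."""
    for i in range(initial_train, len(periods)):
        yield periods[:i], periods[i]

months = [1, 2, 3, 4, 5, 6, 7]
for train, test in expanding_window_splits(months):
    print(f"train on months {train[0]}-{train[-1]}, test on month {test}")
```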

7
OOS Trading Gate

The system will not trade unless three conditions are met: (1) OOS ΔR² > 0, meaning the model beats market prices on unseen data; (2) OOS Brier improvement > 0, meaning predictions are more accurate; (3) at least 20 test observations, so the results are statistically meaningful. All three must pass.
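The gate reduces to a conjunction of three checks. This sketch assumes Brier improvement is reported as market Brier minus model Brier, so a positive value means the model is better. The dataclass and function names are hypothetical, and so is the 0.002 Brier improvement in the example.

```python
from dataclasses import dataclass

@dataclass
class OOSResult:
    delta_r2: float           # out-of-sample McFadden ΔR² (model minus market)
    brier_improvement: float  # market Brier minus model Brier on the test folds
    n_test: int               # number of out-of-sample test observations

def oos_gate_passes(result: OOSResult, min_test: int = 20) -> bool:
    """All three conditions must hold before the system may trade."""
    return (
        result.delta_r2 > 0
        and result.brier_improvement > 0
        and result.n_test >= min_test
    )

print(oos_gate_passes(OOSResult(delta_r2=0.011, brier_improvement=0.002, n_test=3620)))  # True
```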

8
Edge Detection & Sizing

For each live market, we compare our model's probability to the market price. The difference is the "edge." We then use Kelly Criterion (at 15% strength) to size positions proportionally to the edge, and apply risk limits to prevent overconcentration.
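The sketch below turns an edge into a stake. It assumes the edge is capped at 20% (the Edge Cap described under Risk Controls) before the Kelly formula for a $1-payout contract is applied. The document does not specify how PolyEdge actually applies the cap, so `size_position` and its parameters are hypothetical. Exposure limits are omitted.

```python
def size_position(p_model, price, bankroll, kelly_fraction, edge_cap=0.20):
    """Return (side, stake in dollars) for a $1-payout binary market.

    edge = p_model - price; the side with positive edge is bought, the edge is
    capped at edge_cap, and the capped edge is sized with fractional Kelly."""
    edge = p_model - price
    if edge == 0:
        return None, 0.0
    side = "YES" if edge > 0 else "NO"
    cost = price if side == "YES" else 1.0 - price  # price paid per share of that side
    capped_edge = min(abs(edge), edge_cap)
    full_kelly = capped_edge / (1.0 - cost)          # (p_side - cost) / (1 - cost)
    return side, bankroll * kelly_fraction * full_kelly

print(size_position(0.60, 0.50, bankroll=1000, kelly_fraction=0.15))  # YES, about $30
print(size_position(0.95, 0.60, bankroll=1000, kelly_fraction=0.15))  # edge 0.35 capped to 0.20
```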

The full cycle runs automatically every hour: ingest data, retrain the model, validate OOS, detect biases, scan for edges, size positions, and run Claude's strategic review.
📈
Market Biases
Systematic crowd errors that create trading opportunities

Markets are mostly efficient, but crowds make systematic errors. These errors are our opportunity.

Favorite-Longshot Bias (FLB)
People overpay for longshots and underpay for favorites.
Just like lottery tickets, people are drawn to low-probability, high-payout bets. A contract trading at $0.05 (5% chance) might really have only a 2% chance. Meanwhile, contracts at $0.95 (95% chance) might actually be worth $0.97. This creates a systematic tilt we can exploit. Its strength is measured by b1, the slope coefficient of the bias model; when b1 > 1.5, the bias is strong.
If a market says "Event X" has a 5% chance and the true probability is 2%, that's a 3-cent overvaluation on every contract — the FLB at work.
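One common way to quantify this tilt is to regress outcomes on the log-odds of the market price: logit(p_true) = b0 + b1 · logit(price). The document names b0 and b1 but this section does not spell out the regression, so that form is an assumption here. Under it, b1 > 1 shrinks longshots and lifts favorites, as the sketch shows. The function name is hypothetical.

```python
import math

def implied_true_prob(price, b1, b0=0.0):
    """Assumed bias model: logit(true) = b0 + b1 * logit(price).
    b1 > 1 pushes probabilities away from 50%: longshots shrink, favorites grow."""
    x = b0 + b1 * math.log(price / (1 - price))
    return 1 / (1 + math.exp(-x))

for price in (0.05, 0.50, 0.95):
    print(price, round(implied_true_prob(price, b1=1.5), 3))
```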
Optimistic Bias
People tend to think good things are more likely than they really are.
When b0 (the intercept) is significantly different from 0, it means the crowd systematically overestimates or underestimates probabilities across the board, regardless of whether the event is likely or unlikely. A positive b0 means general overconfidence. We detect this per event type, so political markets might show different optimism than crypto markets.
Event-Specific Detection
Different types of events have different biases — we track each one separately.
Crypto markets behave differently from political markets. Sports bettors have different biases than economics watchers. PolyEdge classifies every market into a category (political, crypto, sports, entertainment, economics, geopolitical, science, other) and estimates biases for each type independently.
Bayesian Shrinkage
When we don't have enough data for a category, we blend its estimate with the overall average to stay safe.
Say we only have 15 resolved sports markets — too few to be confident. Bayesian shrinkage automatically blends the sports-specific bias estimate with the overall (all-market) estimate. The fewer observations we have, the more we lean on the overall average. With 1,000+ observations, we trust the category-specific estimate almost entirely. This prevents us from overreacting to small, noisy samples.
Sports bias with only 15 samples: 95% overall average + 5% sports-specific. Sports bias with 500 samples: 4% overall average + 96% sports-specific.
🧠
Behavioral Features
Psychology-driven signals that predict mispricing

Beyond the classic biases, we track 5 behavioral signals rooted in psychology research that create predictable mispricing.

Volume Momentum (Anchoring Bias)
When prices move away from where most of the trading happened, people anchored to the old price create an opportunity.
We compute a Volume-Weighted Average Price (VWAP) for each market, which represents the "consensus price" weighted by how much was traded at each level. When the current price diverges from VWAP, it often signals that the market has moved but participants are still anchored to old levels.
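Here is a minimal VWAP-divergence sketch. Representing trade history as (price, volume) pairs is an assumption, and `vwap_divergence` is a hypothetical name.

```python
def vwap_divergence(trades, current_price):
    """trades: iterable of (price, volume). Positive result: the price sits above the
    level where most volume changed hands; negative: below it."""
    trades = list(trades)
    total_volume = sum(v for _, v in trades)
    if total_volume <= 0:
        raise ValueError("no traded volume")
    vwap = sum(p * v for p, v in trades) / total_volume
    return current_price - vwap

# VWAP = (0.40*100 + 0.50*300) / 400 = 0.475
print(vwap_divergence([(0.40, 100), (0.50, 300)], current_price=0.55))
```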
Deadline Effect (Certainty Illusion)
As events approach resolution, uncertainty increases but people act as if they're more certain.
This combines price uncertainty (how volatile the price is) with time pressure (how close to expiry). Markets near their resolution date show a specific kind of mispricing where participants overreact to late-breaking information or freeze up and stop updating. We capture this by multiplying uncertainty by an exponential time-pressure function.
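The sketch below computes the uncertainty × time-pressure product. Two choices in it are hypothetical rather than PolyEdge's documented values: using the standard deviation of recent prices as "uncertainty", and the 48-hour decay constant `tau_hours`.

```python
import math
import statistics

def deadline_effect(recent_prices, hours_to_close, tau_hours=48.0):
    """Price uncertainty times exponential time pressure.
    time_pressure = exp(-hours_to_close / tau_hours) rises toward 1 as expiry nears."""
    uncertainty = statistics.pstdev(recent_prices)  # assumption: volatility as std dev
    time_pressure = math.exp(-max(hours_to_close, 0.0) / tau_hours)
    return uncertainty * time_pressure

prices = [0.42, 0.50, 0.47, 0.55]
print(deadline_effect(prices, hours_to_close=6), deadline_effect(prices, hours_to_close=240))
```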
Cluster Divergence (Narrow Framing)
People treat each market in isolation instead of comparing it to similar events.
If 10 crypto markets all predict different prices for similar events, the ones that diverge most from the group average may be mispriced. Narrow framing means people evaluate each market individually without considering how similar markets are priced. We measure how far each market's price deviates from its event-type average.
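As described, cluster divergence is a market's price minus the average price of its event type. The sketch below assumes a hypothetical input shape of (market id, event type, price):

```python
from collections import defaultdict

def cluster_divergence(markets):
    """markets: list of (market_id, event_type, price).
    Returns {market_id: price minus the average price of its event type}."""
    by_type = defaultdict(list)
    for _, event_type, price in markets:
        by_type[event_type].append(price)
    means = {t: sum(ps) / len(ps) for t, ps in by_type.items()}
    return {mid: price - means[t] for mid, t, price in markets}

print(cluster_divergence([("a", "crypto", 0.2), ("b", "crypto", 0.4), ("c", "sports", 0.7)]))
```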
Nonlinear FLB (Extreme Price Interaction)
The Favorite-Longshot bias gets dramatically worse at extreme prices.
Snowberg & Wolfers (2010) showed that FLB isn't uniform across prices. At moderate prices (30–70 cents), the bias is mild. But at extreme prices (below 10 cents or above 90 cents), it amplifies nonlinearly. A 5-cent contract doesn't just have proportionally more FLB than a 30-cent contract — it has dramatically more. Similarly, heavy favorites above 90 cents are systematically underpriced. We capture this with price-tier spline features: longshot_deep (below 10c), longshot_zone (below 25c), favorite_zone (above 75c), favorite_deep (above 90c), plus logit_squared for curvature and flb_asymmetric (p²(1-p)) for the asymmetry between longshot and favorite mispricing.
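The tier thresholds and the p²(1 − p) term come from the description above. The exact spline basis isn't given, so encoding each tier as a hinge function (the distance past its threshold, zero otherwise) is an assumption. The function name is hypothetical.

```python
import math

def flb_features(p):
    """Price-tier features for a YES price p in (0, 1). Thresholds follow the text;
    the hinge encoding of each tier is an assumption."""
    logit = math.log(p / (1 - p))
    return {
        "longshot_deep": max(0.0, 0.10 - p),
        "longshot_zone": max(0.0, 0.25 - p),
        "favorite_zone": max(0.0, p - 0.75),
        "favorite_deep": max(0.0, p - 0.90),
        "logit_squared": logit ** 2,
        "flb_asymmetric": p ** 2 * (1 - p),
    }

print(flb_features(0.05))
```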
Informed Trading Pressure (Adverse Selection)
Detecting when traders with inside knowledge are pushing the price.
When the order book is imbalanced (more bids than asks, or vice versa) and the bid-ask spread is wide, it often signals that informed traders are positioning. Wide spreads mean market makers are nervous about being picked off by someone who knows more. We combine order book imbalance with spread width to detect these situations and infer which direction the informed money is flowing.
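The sketch below computes the combined signal: signed order-book imbalance scaled by the bid-ask spread, so a lopsided book with a wide spread scores highest. The multiplicative combination, the depth inputs, and the function name are all assumptions.

```python
def informed_pressure(bid_depth, ask_depth, best_bid, best_ask):
    """Imbalance in [-1, 1] (positive = bid-heavy) times spread width.
    The sign suggests which way informed money leans; magnitude grows with spread."""
    total = bid_depth + ask_depth
    if total <= 0 or best_ask <= best_bid:
        return 0.0  # empty or crossed book: no usable signal
    imbalance = (bid_depth - ask_depth) / total
    spread = best_ask - best_bid
    return imbalance * spread

print(informed_pressure(300, 100, best_bid=0.40, best_ask=0.50))
```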
🛡
Risk Controls
Multiple layers of protection against catastrophic losses
Dry-Run Mode (Default)
The system simulates trades without risking real money until you're ready.
By default, all trades are paper trades. The system records what it would have done, tracks simulated P&L, and lets you evaluate performance before going live. You need to provide a private key and explicitly disable dry-run mode to trade for real.
Fractional Kelly (25%)
We bet 25% of what the math says is optimal, trading growth for safety.
Full Kelly sizing maximizes long-run growth but creates stomach-churning swings. At 25% Kelly (kelly_fraction=0.25, configurable via POLYEDGE_KELLY_FRACTION), we capture significant expected growth with dramatically lower risk of drawdowns. This conservative approach is appropriate given model uncertainty in prediction markets, where edges are typically small and noisy.
Exposure Limits
Hard caps on how much is at risk at any time, overall and per category.
The system enforces maximum total exposure (% of bankroll at risk) and per-event-type limits. Even if the model finds 50 great opportunities in crypto, it won't load up all in one category. This prevents a single sector shock from devastating the portfolio.
Edge Cap (20% Maximum)
If the model claims an edge above 20%, we cap it — edges that big are usually errors.
In prediction markets, a 20% edge is enormous. If the model says "the true probability is 95% but the market says 60%," something is probably wrong with the model, not the market. Capping edges prevents overconfident predictions from creating outsized positions.
Overfitting Detection
If the model fits the training data too perfectly, we reject it — it's memorizing, not learning.
Two automatic triggers: R² > 0.95 (model explains 95%+ of variation, which is suspicious for noisy financial data) or Brier score < 0.001 (predictions are almost perfect, which shouldn't happen in uncertain markets). Either one causes the Stage 2 corrections to be thrown out for that cycle.
OOS Trading Gate
The model must prove it works on unseen data before any trades are allowed.
Three conditions must all be met: ΔR² > 0 (model beats market on new data), Brier improvement > 0 (more accurate predictions), and at least 20 test observations (enough data to be meaningful). This is the strictest safeguard — the system will sit idle rather than trade on an unvalidated model.
Isotonic Calibration
Fixes probability accuracy after Stage 2 bias correction, ensuring position sizes are properly calibrated.
Stage 2 improves discrimination (R²) but can worsen calibration (Brier score), meaning predicted probabilities drift from true frequencies. Since Kelly sizing depends on probability accuracy, not ranking, this matters for capital allocation. Isotonic regression fits a monotonic mapping from S2 predictions to observed outcomes, restoring calibration without destroying the ranking improvements. The system automatically checks whether isotonic calibration actually improves Brier score over Stage 1 — if not, it falls back to Stage 1 probabilities for sizing while still using Stage 2 for edge identification.
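The sketch below implements isotonic regression with the pool-adjacent-violators algorithm in plain Python. It then applies the fallback rule: use calibrated Stage 2 probabilities for sizing only if they beat Stage 1 on held-out Brier score. The function names and the train/validation split are hypothetical. scikit-learn's `IsotonicRegression` is a production alternative.

```python
from bisect import bisect_right

def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators: returns (distinct sorted scores, non-decreasing fitted rates)."""
    agg = {}  # pool tied scores so each distinct score gets a single fitted value
    for x, y in zip(scores, outcomes):
        s, c = agg.get(x, (0.0, 0))
        agg[x] = (s + y, c + 1)
    xs = sorted(agg)
    blocks = []  # [sum of outcomes, count, number of distinct scores in block]
    for x in xs:
        s, c = agg[x]
        blocks.append([s, c, 1])
        # Merge backwards while an earlier block's mean exceeds a later one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += k2
    fitted = [s / c for s, c, k in blocks for _ in range(k)]
    return xs, fitted

def predict_isotonic(xs, fitted, x):
    """Step-function lookup; scores outside the fitted range are clipped to the ends."""
    i = bisect_right(xs, x) - 1
    return fitted[max(i, 0)]

def brier(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

def sizing_probs(s2_train, y_train, s1_val, s2_val, y_val, s1_live, s2_live):
    """Use isotonic-calibrated Stage 2 for sizing only if it beats Stage 1 on held-out Brier."""
    xs, fitted = fit_isotonic(s2_train, y_train)
    calibrated_val = [predict_isotonic(xs, fitted, s) for s in s2_val]
    if brier(calibrated_val, y_val) < brier(s1_val, y_val):
        return [predict_isotonic(xs, fitted, s) for s in s2_live]
    return list(s1_live)  # fallback: Stage 1 probabilities for sizing

print(fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1]))
```

The mapping is monotonic, so a market the Stage 2 model ranks higher never receives a lower calibrated probability. This preserves the ranking improvements while fixing the probability levels.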
Trade Management
Lifecycle, exit decisions, urgency levels, and execution mechanics

Once a position is open, the system continuously monitors it and decides whether to hold or fully exit. Apart from the stale-position check described below, there are no fixed hold-time rules; every decision is driven by math.

Trade Lifecycle

Every trade follows this primary status progression (dry-run trades use parallel simulated statuses):

1
Submitted

A buy order has been placed on the Polymarket CLOB (Central Limit Order Book). A corresponding limit order record tracks the on-exchange order. The trade is waiting to be filled.

2
Filled

The buy order has been matched. The position is now active and being monitored by the position manager every cycle. Mark-to-market P&L is tracked continuously.

3
Exit Pending

The system has decided to exit and placed a sell limit order on the CLOB. The trade waits for the sell to fill. If the sell order expires without filling, the trade returns to "filled" status and the system retries with increased urgency.

4
Exited

The sell order has filled. Realized P&L is computed, edge accuracy metrics are updated, and the trade enters the feedback loop for model improvement. Alternatively, a trade can reach "resolved" if the market closes and settles at $0 or $1.
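The primary progression can be expressed as a small state machine. The names in this sketch are hypothetical. Dry-run statuses and any cancellation paths are omitted. The document does not say whether an exit-pending trade can move straight to resolved, so that transition is left out.

```python
from enum import Enum

class TradeStatus(Enum):
    SUBMITTED = "submitted"
    FILLED = "filled"
    EXIT_PENDING = "exit_pending"
    EXITED = "exited"
    RESOLVED = "resolved"

ALLOWED = {
    TradeStatus.SUBMITTED: {TradeStatus.FILLED},
    TradeStatus.FILLED: {TradeStatus.EXIT_PENDING, TradeStatus.RESOLVED},
    # Back to FILLED when the sell order expires without filling.
    TradeStatus.EXIT_PENDING: {TradeStatus.EXITED, TradeStatus.FILLED},
    TradeStatus.EXITED: set(),
    TradeStatus.RESOLVED: set(),
}

def transition(current: TradeStatus, new: TradeStatus) -> TradeStatus:
    """Return the new status, or raise if the move is not part of the lifecycle."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new

print(transition(TradeStatus.EXIT_PENDING, TradeStatus.FILLED).value)  # filled
```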

Position Review Cycle

Every few minutes, the scanner runs review_positions(), which does the following for each open position:

Mark-to-Market
Fetches the current price for every held token from the CLOB, computing unrealized P&L for each position.
Re-evaluate Model Probability
The model re-runs its probability estimate using the latest market data. If the model's conviction has changed since entry, this feeds into the confidence factor used by the EV exit framework.
Fetch Orderbook Depth
Pulls the current bid-side orderbook from the CLOB to assess exit liquidity. This determines how large an exit can be without excessive slippage.
Inject Opportunity Cost
The best annualized growth rate found during the latest market scan is passed to each position as the "opportunity cost of capital." If better opportunities exist elsewhere, the bar for holding a position gets higher. Falls back to 5% APY risk-free rate when no new opportunities are available.
After building context, the scanner hands everything to the Position Manager, which runs each trade through the exit decision hierarchy.

Exit Decision Hierarchy

Each position is evaluated through a strict priority chain. The first rule that triggers wins — no further checks are needed.

1
Stop-Loss — Hard loss protection (highest priority)
2
EV Comparison — Is the remaining thesis worth the locked capital?

Stop-Loss

Three independent stop-loss triggers protect against different failure modes:

Hard Stop-Loss (25%)
Exit immediately if unrealized loss exceeds 25% of entry price.
If the current price drops more than 25% below the entry price, the position is exited at CRITICAL urgency. This is a worst-case capital preservation rule that fires regardless of model conviction.
Thesis Invalidation
Exit when the model's edge turns negative — the original reason for the trade no longer exists.
If the model's current predicted probability implies a negative edge (below the −0.02 threshold), the thesis behind the trade is broken. The model now thinks the held side is overpriced, meaning the market is mispriced in the opposite direction. Exit at CRITICAL urgency.
Stale Position
Exit flat positions that have consumed 168+ hours of capital with no meaningful movement.
If a position has been open for 168 hours (7 days) or more and hasn't produced meaningful profit or loss, the capital is better deployed elsewhere. This is an anti-opportunity-cost mechanism that prevents dead capital from sitting idle.
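The three triggers can be sketched as one check that returns the `exit_reason` strings used in the audit trail. The document doesn't specify what counts as "no meaningful movement", so the 2% `flat_band` is a hypothetical placeholder. The function name is also hypothetical.

```python
def stop_loss_reason(entry_price, current_price, current_edge, hours_held,
                     hard_stop=0.25, thesis_floor=-0.02, stale_hours=168.0,
                     flat_band=0.02):
    """Return the exit_reason for the first stop-loss trigger that fires, else None."""
    loss_frac = (entry_price - current_price) / entry_price  # > 0 means a loss
    if loss_frac > hard_stop:
        return "hard_stop_loss"
    if current_edge < thesis_floor:
        return "thesis_invalidated"
    if hours_held >= stale_hours and abs(loss_frac) < flat_band:  # flat_band is hypothetical
        return "stale_position"
    return None

print(stop_loss_reason(0.40, 0.29, current_edge=0.05, hours_held=10))  # hard_stop_loss
```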

EV Exit Framework

The core exit decision. If stop-loss hasn't triggered, the system asks: "Is the remaining edge in this position worth more than what this capital could earn elsewhere?"

EV Hold
Expected value of keeping the position open until resolution.
ev_hold = (residual_edge × confidence_factor − exit_cost_at_resolution) × capital_locked

Residual Edge: The current model edge — how much the model still thinks the market is mispriced right now.
Confidence Factor: Current edge divided by the original entry edge, clamped between 0.1 and 1.0. If the model's conviction has dropped from a 10% edge to a 3% edge, the confidence factor is 0.3 — the system trusts the remaining edge less.
Exit Cost at Resolution: An illiquidity risk premium that grows as the market approaches resolution. Near expiry, orderbooks thin out and exit costs spike, making it harder to get out if the thesis breaks down late.
EV Exit
Expected value of exiting the position now and redeploying capital.
ev_exit = −exit_cost_now × capital_locked + opportunity_cost_annualized × capital_locked × time_remaining_years

Exit Cost Now: The immediate cost of selling, primarily the bid-ask spread and estimated market impact. Because it enters ev_exit with a minus sign, it always reduces the value of exiting (you pay to get out).
Opportunity Cost: The best annualized growth rate currently available from the latest market scan. If there's a new opportunity offering 50% annualized growth, holding a position with 5% annualized edge becomes expensive. Falls back to 5% APY risk-free rate when no opportunities are available.
The decision: When ev_exit > ev_hold, the system exits. The remaining thesis isn't worth the capital. The capital is better off earning the opportunity cost elsewhere. This is a pure economic comparison — no hold-time gates, no arbitrary thresholds.
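The sketch below implements the two formulas and the decision; the parameter names are hypothetical. It assumes edges and costs are expressed as fractions of capital, and that a missing opportunity-cost value falls back to the 5% risk-free rate. The guard for a non-positive entry edge is an added assumption.

```python
def confidence_factor(residual_edge: float, entry_edge: float) -> float:
    """Current edge divided by entry edge, clamped to [0.1, 1.0]."""
    if entry_edge <= 0:
        return 0.1  # assumption: no meaningful ratio without a positive entry edge
    return min(1.0, max(0.1, residual_edge / entry_edge))

def ev_decision(residual_edge, entry_edge, exit_cost_at_resolution, exit_cost_now,
                capital_locked, opportunity_cost_annualized, time_remaining_years,
                risk_free_rate=0.05):
    """Compare ev_hold and ev_exit; exit when ev_exit > ev_hold."""
    if opportunity_cost_annualized is None:  # no fresh opportunities this scan
        opportunity_cost_annualized = risk_free_rate
    cf = confidence_factor(residual_edge, entry_edge)
    ev_hold = (residual_edge * cf - exit_cost_at_resolution) * capital_locked
    ev_exit = (-exit_cost_now * capital_locked
               + opportunity_cost_annualized * capital_locked * time_remaining_years)
    return {"ev_hold": ev_hold, "ev_exit": ev_exit, "confidence_factor": cf,
            "opportunity_cost_annualized": opportunity_cost_annualized,
            "exit": ev_exit > ev_hold}

# Edge decayed from 10% to 3%; $100 locked for ~5 more weeks; a 50% APY alternative exists.
print(ev_decision(0.03, 0.10, 0.005, 0.01, 100.0, 0.50, 0.1)["exit"])  # True
```

With the 5% fallback rate instead of 50%, ev_exit in the same example drops to −0.5 while ev_hold stays at about 0.4, so the position is held.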
Edge Captured %
Computed as an output diagnostic only — it shows what percentage of the original edge has been realized by the price moving toward the model's prediction. This metric is never used as an input signal for the exit decision. It appears on the Positions tab for informational purposes.

Urgency System

Once an exit decision is made, the system assigns an urgency level that controls how aggressively it executes:

CRITICAL
Triggers: Hard stop-loss, thesis invalidation, 3+ failed exit attempts
Execution: Market exit — sell at best bid minus $0.02 with 2-minute expiry. If unfilled, retry at best bid minus $0.05. Designed to get out immediately at any reasonable price.
HIGH
Triggers: Stale position, 2 failed exit attempts
Execution: Aggressive limit order at best bid minus $0.01 with 5-minute expiry.
NORMAL
Triggers: High EV advantage (ev_exit significantly exceeds ev_hold)
Execution: Limit order at best bid with 15-minute expiry.
PATIENT
Triggers: Low EV advantage (ev_exit barely exceeds ev_hold)
Execution: Passive limit order at best bid plus $0.01 with 30-minute expiry. Waits for a slightly better fill.
Auto-Escalation: If an exit limit order expires without filling, the trade returns to "filled" status and is re-evaluated next cycle with escalated urgency. The second attempt goes to HIGH, the third and beyond go to CRITICAL. This guarantees exits eventually complete.
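The execution parameters listed above map naturally to a lookup table. This sketch assumes prices are rounded to whole cents, although the exchange tick size may be finer. It also assumes the $0.01 floor doubles as a minimum sell price. Names are hypothetical, and the attempt-based escalation is left out.

```python
URGENCY_PARAMS = {
    # urgency: (offset from best bid in $, order expiry in minutes)
    "CRITICAL": (-0.02, 2),
    "HIGH": (-0.01, 5),
    "NORMAL": (0.00, 15),
    "PATIENT": (+0.01, 30),
}
CRITICAL_RETRY_OFFSET = -0.05
PRICE_FLOOR = 0.01

def exit_order(best_bid, urgency, retry=False):
    """Return (limit_price, expiry_minutes) for a sell order at this urgency."""
    offset, expiry = URGENCY_PARAMS[urgency]
    if urgency == "CRITICAL" and retry:
        offset = CRITICAL_RETRY_OFFSET  # second CRITICAL attempt undercuts harder
    bid = best_bid if best_bid is not None else PRICE_FLOOR  # bid unavailable
    price = max(PRICE_FLOOR, round(bid + offset, 2))  # assumption: whole-cent prices
    return price, expiry

print(exit_order(0.50, "HIGH"))                  # (0.49, 5)
print(exit_order(0.50, "CRITICAL", retry=True))  # (0.45, 2)
```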

Exit Execution

The urgency level determines which execution method is used:

execute_exit()
The standard exit path. Fetches the current best bid from the CLOB, applies a price offset based on urgency (e.g., −$0.01 for HIGH), and places a sell limit order with a time-limited expiry. Creates a LimitOrder record with purpose=exit to track the order lifecycle. The trade status moves to "exit_pending" and P&L is deferred until fill is confirmed.
execute_market_exit()
Used for CRITICAL urgency only. Wraps execute_exit() with aggressive parameters: uses the CLOB best bid (with $0.01 as the absolute floor if the bid is unavailable), applies a −$0.02 price offset, and sets a 2-minute expiry. This effectively creates a market-order-like fill by undercutting the best available bids. If the first attempt fails, retries with an even more aggressive −$0.05 offset.
reprice_exit_order()
Handles stale exit orders. Cancels the existing sell order on the CLOB, verifies the cancellation succeeded (to prevent duplicate live orders), then places a new sell order with updated pricing. This is used when the market has moved and the original exit price is no longer competitive.
Exit Order Lifecycle
After an exit limit order is placed, the system monitors it each cycle:

Fill confirmed: Realized P&L is computed as (exit price − entry price) × shares. The trade status moves to "exited." Edge accuracy metrics (realized_edge, edge_error) are calculated and fed back into the model improvement loop.
Order cancelled/expired: The trade status is restored to "filled" so the position manager can re-evaluate it next cycle. The exit attempt counter increments, automatically escalating urgency on the next try.
Partial fill: Trade size is reduced by the filled amount, partial P&L is recorded, and the remainder stays open for continued management.
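The sketch below applies the fill-handling rules to a hypothetical `Position` record. The status for a partially filled exit is not specified, so it assumes such an order keeps its current status while the remainder is managed.

```python
from dataclasses import dataclass

@dataclass
class Position:
    entry_price: float
    shares: float
    status: str = "exit_pending"
    realized_pnl: float = 0.0
    exit_attempts: int = 0

def on_exit_fill(pos: Position, fill_price: float, filled_shares: float) -> Position:
    """Record a full or partial sell fill: realize (exit - entry) * shares on the filled part."""
    filled = min(filled_shares, pos.shares)
    pos.realized_pnl += (fill_price - pos.entry_price) * filled
    pos.shares -= filled
    if pos.shares <= 1e-9:  # fully filled
        pos.shares = 0.0
        pos.status = "exited"
    # Otherwise the remainder stays open (status unchanged) for continued management.
    return pos

def on_exit_expired(pos: Position) -> Position:
    """Cancelled or expired sell: restore 'filled' and count the failed attempt."""
    pos.status = "filled"
    pos.exit_attempts += 1
    return pos

p = on_exit_fill(Position(entry_price=0.40, shares=100), fill_price=0.50, filled_shares=40)
print(p.shares, p.status)  # 60 exit_pending
```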

Market Impact Protection

Exit Impact Analysis
Before executing, the system checks if the exit size will move the market against you.
The market impact model analyzes the sell-side (bid) orderbook depth to determine the maximum exit size that won't cause excessive slippage. If the intended exit is larger than available liquidity, the system caps the close percentage and exits in smaller chunks across multiple cycles rather than forcing a large order through a thin book.
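One way to compute a depth-limited exit is to walk the bid side from the best price down. The walk stops once the volume-weighted fill price would fall more than a set tolerance below the best bid. The $0.02 tolerance and the function name are hypothetical, since the system's actual slippage rule isn't stated here.

```python
def max_exit_shares(bids, max_slippage=0.02):
    """bids: (price, size) levels sorted best (highest) first.
    Returns the most shares that can be sold while the volume-weighted average
    fill price stays at or above best_bid - max_slippage."""
    if not bids:
        return 0.0
    floor = bids[0][0] - max_slippage
    shares = notional = 0.0
    for price, size in bids:
        if price >= floor:
            take = size  # this whole level keeps the average above the floor
        else:
            # Solve (notional + price * x) / (shares + x) = floor for x.
            take = min(size, (notional - floor * shares) / (floor - price))
        if take <= 0:
            break
        shares += take
        notional += price * take
        if take < size:
            break  # partially consumed a level: the limit is reached
    return shares

print(max_exit_shares([(0.50, 100), (0.49, 100), (0.40, 1000)]))  # about 237.5
```

Dividing the result by the position size, capped at 100%, gives the close percentage for this cycle. Any remainder waits for later cycles.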

Exit Audit Trail

Exit Details
Every exit decision is recorded with full parameters for post-analysis.
When an exit is initiated, the trade record stores three fields:

exit_reason: Why the exit was triggered (e.g., "hard_stop_loss", "thesis_invalidated", "stale_position", "ev_exit_favorable").
exit_price: The price at which the sell order was placed.
exit_details: A JSON object containing the full decision parameters — ev_hold, ev_exit, confidence_factor, edge_residual, capital_locked, opportunity_cost_annualized, and more. This enables post-hoc analysis of whether exits were optimal and feeds into the AI trade analysis.
📊
Model Performance
Walk-forward OOS results and validation methodology
OOS McFadden ΔR² is +1.10% across 5 walk-forward folds with 3,620 test samples. This indicates the model adds value beyond market prices on data it never saw during training.
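McFadden's pseudo-R² is 1 − LL(model) / LL(null), where LL is the Bernoulli log-likelihood of the outcomes. The sketch below assumes the null model predicts the observed base rate and that ΔR² is the model's McFadden R² minus the market price's on the same test outcomes. Both are assumptions about how this dashboard defines the metric.

```python
import math


def log_likelihood(probs, outcomes, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def mcfadden_r2(probs, outcomes):
    """1 - LL(model) / LL(null), where the null predicts the observed base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate in (0.0, 1.0):
        raise ValueError("need both outcomes present for a non-degenerate null model")
    ll_null = log_likelihood([base_rate] * len(outcomes), outcomes)
    return 1.0 - log_likelihood(probs, outcomes) / ll_null


def mcfadden_delta_r2(model_probs, market_probs, outcomes):
    """Assumed definition: model's McFadden R^2 minus the market price's."""
    return mcfadden_r2(model_probs, outcomes) - mcfadden_r2(market_probs, outcomes)
```

A positive ΔR² means the model's probabilities assign higher likelihood to what actually happened than the market prices did.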
Walk-Forward OOS Results
Model tested on unseen data across 5 expanding time windows, providing evidence of genuine edge.
The walk-forward validation uses expanding-window temporal splits to prevent data leakage. Results by fold:
Fold 1: Near zero ΔR² (small training set, expected)
Fold 2: Near zero ΔR² (still building training data)
Fold 3: +1.35% ΔR² (model begins to show edge)
Fold 4: +2.52% ΔR² (strongest fold, large training set)
Fold 5: +1.91% ΔR² (consistent positive performance)

Average across all 5 folds: +1.10% with 3,620 total test observations.
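Expanding-window splits are straightforward to generate once rows are sorted by time. The sketch below is one possible scheme. The minimum training size and equal-sized test blocks are assumptions, since the document does not state how fold boundaries were chosen.

```python
def expanding_window_folds(n_samples, n_folds, min_train):
    """Split time-ordered indices into expanding-window (train, test) folds.

    Assumes rows are sorted by time. Each fold trains on everything before
    its test block, so the model is never scored on data older than its training set.
    """
    if not 0 < min_train < n_samples:
        raise ValueError("min_train must be between 0 and n_samples")
    fold_size = (n_samples - min_train) // n_folds
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    folds = []
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any remainder so every later sample is tested once.
        test_end = n_samples if k == n_folds - 1 else train_end + fold_size
        folds.append((range(0, train_end), range(train_end, test_end)))
    return folds
```

The training set grows with each fold and never shrinks. This matches the pattern above, where early folds with little training data show near-zero ΔR².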
L2-Regularized Stage 2
Stage 2 uses penalized logistic regression to prevent overfitting on noisy market data.
The switch from unregularized statsmodels to sklearn's LogisticRegression with C=0.1 dramatically improved OOS performance. C is the inverse of regularization strength, so C=0.1 applies an L2 penalty ten times stronger than sklearn's default of 1.0. The regularization shrinks coefficients toward zero, preventing the model from fitting noise in the training data. Combined with reducing the feature set from 13 to 6 core features (stage1_logit, price_stage1_diff, depth_imbalance, price_uncertainty, log_time, flb_correction), this reduces multicollinearity and produces more stable predictions.
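The shrinkage effect of C can be shown without any library. The sketch below is a plain-Python gradient descent, not the sklearn solver the system uses. It minimizes mean log-loss plus ||w||² / (2·C·n), which is sklearn's documented L2 objective (0.5·||w||² + C · summed log-loss) divided by C·n, so C keeps the same meaning. The intercept is left unpenalized. The toy data are made up.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logistic(X, y, C=0.1, n_iter=3000):
    """Batch gradient descent on mean log-loss + ||w||^2 / (2 * C * n).

    Smaller C means stronger shrinkage of w toward zero; b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    # Step size from an upper bound on the curvature keeps descent stable for any C.
    max_sq = max(1.0 + sum(v * v for v in row) for row in X)
    lr = 1.0 / (0.25 * max_sq + 1.0 / (C * n))
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * (gj / n + wj / (C * n)) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w_weak, _ = fit_l2_logistic(X, y, C=100.0)   # almost unregularized
w_strong, _ = fit_l2_logistic(X, y, C=0.1)   # strong L2 penalty
```

On this data both fits find a positive slope, but the C=0.1 slope is noticeably smaller. That is the trade the text describes: a little in-sample fit is given up for coefficients that move less from one training window to the next.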
Prediction Shrinkage
Final predictions blend 60% model + 40% market price for conservative, stable estimates.
With shrinkage_factor=0.6, the final probability is: p_final = 0.6 * p_model + 0.4 * p_market. This acknowledges that market prices contain valuable information and prevents the model from making extreme predictions. Shrinkage is a standard technique in statistical forecasting that trades a small amount of in-sample fit for significantly better out-of-sample stability.
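The blend is a one-line formula. A useful consequence follows from the algebra: p_final − p_market = shrinkage_factor × (p_model − p_market), so shrinkage scales every edge by 0.6 rather than just nudging the probability. The function name below is illustrative.

```python
def shrink_toward_market(p_model, p_market, shrinkage_factor=0.6):
    """p_final = shrinkage_factor * p_model + (1 - shrinkage_factor) * p_market."""
    if not 0.0 <= shrinkage_factor <= 1.0:
        raise ValueError("shrinkage_factor must be in [0, 1]")
    return shrinkage_factor * p_model + (1.0 - shrinkage_factor) * p_market


p_final = shrink_toward_market(0.70, 0.50)   # 0.6 * 0.70 + 0.4 * 0.50
```

Here a raw 20-point edge (0.70 vs 0.50) becomes a 12-point edge after shrinkage (p_final = 0.62).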
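Folds 1-2 showing near-zero ΔR² is expected and healthy: it is consistent with the model needing sufficient training data before it can beat the market. The positive ΔR² in every one of Folds 3-5 (+1.35% to +2.52%) is evidence of genuine learning rather than overfitting, because each fold is scored only on data that came after its training window.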
💻
Reading the Dashboard
Guide to each tab, column, and metric on the interface

Overview Tab

Your command center. The four cards at the top show unrealized P&L, capital deployed, model status (ΔR²), and capital efficiency. Below that you'll find top positions, OOS validation status, bias detection by event type, and current opportunities. If you're an admin, action buttons for manual cycles also appear here.

Current Opportunities

Shown on the Overview tab, this table lists markets where the model detects mispricing of at least the configured minimum edge threshold (currently 7%). The "Edge" column is the percentage difference between our model's probability and the market price. "Kelly" shows the optimal bet fraction. "Size" shows the dollar amount. Opportunities are refreshed automatically every 5 minutes by the tactical scheduler.
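The Edge, Kelly and Size columns can be related with a short sketch. It assumes edge is measured in absolute probability points, the NO price is 1 − YES price (no spread), and the "15% fractional" Kelly from the glossary is a 0.15 multiplier on full Kelly. For a contract paying $1 bought at cost c with win probability q, full Kelly is (q − c) / (1 − c). The function name is hypothetical.

```python
def size_opportunity(p_model, price, bankroll, kelly_fraction=0.15, min_edge=0.07):
    """Return (side, edge, bet_fraction, dollar_size) for a binary market.

    price is the YES price in (0, 1); the NO price is assumed to be 1 - price.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if p_model >= price:
        side, q, cost = "YES", p_model, price
    else:
        side, q, cost = "NO", 1.0 - p_model, 1.0 - price
    edge = q - cost                      # probability points, e.g. 0.10 = 10%
    if edge < min_edge:
        return side, edge, 0.0, 0.0      # below threshold: not an opportunity
    full_kelly = edge / (1.0 - cost)     # Kelly fraction for a contract paying $1
    fraction = kelly_fraction * full_kelly
    return side, edge, fraction, fraction * bankroll
```

With a $1,000 bankroll, a model probability of 0.60 against a 0.50 price gives a 10-point edge, full Kelly of 20%, a 3% fractional bet, and a $30 size.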

Ann. Edge — The edge scaled to a yearly rate: edge × (8760 / hours_to_close), capped at 365x turnover. A 2% edge on a market closing in 24 hours is worth 730% annualized, while the same edge on a 30-day market is only 24%. This is why short-duration trades often rank higher.
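The annualization rule translates directly into code. Reading "capped at 365x turnover" as a cap on the 8760 / hours multiplier is an interpretation, consistent with the 24-hour example hitting exactly 365x.

```python
HOURS_PER_YEAR = 8760
MAX_TURNOVER = 365


def annualized_edge(edge, hours_to_close):
    """edge * (8760 / hours_to_close), with the turnover multiplier capped at 365."""
    if hours_to_close <= 0:
        raise ValueError("hours_to_close must be positive")
    turnover = min(HOURS_PER_YEAR / hours_to_close, MAX_TURNOVER)
    return edge * turnover
```

A 2% edge annualizes to 7.30 (730%) at 24 hours and about 0.243 (24%) at 720 hours. Anything shorter than 24 hours is held at the 365x cap.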

Opp Cost — The opportunity cost of locking capital in this trade instead of earning the 5% APY risk-free rate. For short-duration trades this is negligible; for multi-week holds it can eat into real returns.
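A minimal version of the calculation, assuming simple (non-compounded) interest at the 5% APY pro-rated by hours held. Whether the system compounds is not stated, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760


def opportunity_cost(capital, hours_locked, risk_free_apy=0.05):
    """Dollars forgone by holding capital in a position instead of at the risk-free rate."""
    return capital * risk_free_apy * hours_locked / HOURS_PER_YEAR
```

On $1,000, a one-day hold forgoes about $0.14, while a 30-day hold forgoes about $4.11. A small edge on a long-dated market has to clear that hurdle first.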

Time — Hours or days until the market closes. Combined with Ann. Edge, this tells you how efficiently your capital would be deployed.

Calibration Tab

Model performance and calibration diagnostics combined. Stage 1/Stage 2 R² and ΔR² show the model improvement from bias correction. OOS (out-of-sample) metrics validate on unseen data. ECE should be under 0.05. The reliability diagram shows calibration visually. Edge decay tracks whether open trades' edges are holding up or deteriorating over time.
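ECE is commonly computed by bucketing predictions into equal-width probability bins and taking the sample-weighted average gap between mean predicted probability and observed frequency in each bin. The sketch below uses 10 bins, which is an assumption. Running it on market prices instead of model probabilities gives the "Market ECE" comparison.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean |avg predicted prob - observed frequency| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_p = sum(p for p, _ in bucket) / len(bucket)
        freq = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / n * abs(avg_p - freq)
    return ece
```

Each bin's gap is one point's distance from the diagonal on the reliability diagram. ECE collapses those distances into a single number, with the 0.05 threshold as the target.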

Trade History Tab

Every trade the system has placed (or simulated in dry-run). Check the "Model Prob" vs "Price" columns to see the predicted edge. Resolved trades show realized P&L. Invalidated trades (dimmed rows) had edges so extreme they were likely model errors.

Positions Tab

Realized performance stats (P&L, win rate, ROI) at the top, followed by open limit orders waiting to fill, and active positions with hold/exit recommendations and EV metrics. The position manager evaluates each trade and suggests whether to hold or exit based on current edge, EV comparison, and risk parameters.

Ann. Edge and Opp Cost columns show each position's annualized edge and the opportunity cost of capital locked in it. Time Rem shows how long until the market closes. Click the magnifying glass button on any position to get a live AI analysis covering position health, feature drivers, capital efficiency, and a hold/exit recommendation with reasoning.

Admin-Only Tabs

The System Logs and Claude Supervisor tabs are visible only to admin users. System Logs shows edge accuracy metrics, trade execution diagnostics ("Why No Trades?"), cycle history, alerts, data backfill controls, and a live application log stream. The Claude Supervisor tab provides deep AI-powered strategic reviews with a confidence score and interactive Q&A.

📖
Glossary
Quick reference for all technical terms
AIC: Model quality score (lower = better fit with fewer variables)
Bankroll: Total simulated capital available for trading
b0 (intercept): Overall optimism/pessimism bias level
b1 (slope): Favorite-Longshot bias strength (>1.5 = strong FLB)
Brier Score: Prediction accuracy (0 = perfect, 0.25 = coin flip)
CLOB: Central Limit Order Book, Polymarket's trading engine
Converging: Edge is getting smaller over time (market agreeing with you)
Cross-Fitting: Training and predicting on different data splits to avoid overfitting
Diverging: Edge is growing over time (market moving against you)
ΔR²: Improvement from Stage 1 to Stage 2 (>0 = model adds value)
Dry Run: Simulated trading without real money
ECE: Expected Calibration Error (<0.05 = well-calibrated)
Edge: Difference between model probability and market price
Annualized Edge: Edge × (8760 / hours_to_close), capped at 365x turnover. Short-duration edges are more valuable per unit of capital.
Opportunity Cost: At a 5% APY risk-free rate, every dollar locked in a position has an alternative use. Long-duration trades with small edges may not beat the risk-free rate.
Annualized Growth Rate: Combines the Kelly-optimal growth rate with how many times per year you could repeat the trade. This is how opportunities are ranked.
Expanding Window: Cross-validation where the training set grows over time, never shrinks
Exposure: Total capital at risk across all open positions
FLB: Favorite-Longshot Bias (longshots overpriced)
Kelly %: Optimal bet size as fraction of bankroll (15% fractional)
L2 Regularization: A penalty that shrinks model coefficients toward zero to prevent overfitting
LR Test: Likelihood Ratio test, which checks whether Stage 2 is a real improvement
Mark-to-Market: Updating position values to current market prices
McFadden Pseudo-R²: Measures how well a logit model explains outcomes compared to a null model
OOS: Out-of-Sample, i.e. tested on data the model hasn't seen
P&L: Profit and Loss (realized = closed, unrealized = open)
Prediction Shrinkage: Blending model predictions with market prices (60/40) for stability
Shrinkage Factor: How much we trust category-specific vs overall estimate (0-1)
VWAP: Volume-Weighted Average Price (consensus trading level)
Walk-Forward Validation: Testing method using expanding time windows to simulate real trading
Win Rate: Percentage of resolved trades that were profitable
Circuit Breaker: Auto-halts trading when daily losses exceed a threshold
Edge Decay: How quickly an open position's edge shrinks over time
Limit Order: An order to buy/sell at a specific price, waiting for a fill
EV Hold: Expected value of keeping a position open until resolution
EV Exit: Expected value of closing a position now and redeploying capital
Confidence Factor: Current edge divided by original edge, measuring how much conviction remains
Isotonic Calibration: Post-hoc calibration layer on Stage 2 outputs that preserves ranking ability while restoring probability accuracy for Kelly sizing
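The Annualized Growth Rate entry combines two ideas: Kelly log-growth per trade and repeats per year. A minimal sketch follows, assuming a YES purchase at `price` with net odds (1 − price) / price, full Kelly as the default stake, and the same 365x turnover cap as Annualized Edge. The exact combination the system uses is not documented, so the multiplication here is an assumption. For NO trades, apply the same function to 1 − p_model and 1 − price.

```python
import math

HOURS_PER_YEAR = 8760
MAX_TURNOVER = 365


def kelly_log_growth(q, price, f):
    """Expected log-growth per trade from staking fraction f on YES at `price`.

    A $1 stake buys 1/price contracts paying $1 each, so net odds are (1 - price) / price.
    """
    b = (1.0 - price) / price
    return q * math.log(1.0 + f * b) + (1.0 - q) * math.log(1.0 - f)


def annualized_growth_rate(q, price, hours_to_close, kelly_fraction=1.0):
    """Log-growth at (fractional) Kelly times the capped number of repeats per year."""
    full_kelly = max((q - price) / (1.0 - price), 0.0)
    f = kelly_fraction * full_kelly
    turnover = min(HOURS_PER_YEAR / hours_to_close, MAX_TURNOVER)
    return kelly_log_growth(q, price, f) * turnover
```

With q = 0.60 at a 0.50 price, full Kelly is 20% and yields the highest per-trade growth of any stake. The same trade scores far higher when it resolves in 24 hours than in 30 days, which is why short-duration opportunities rank first.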
