diff --git a/ROADMAP.md b/ROADMAP.md
index d759044..5097934 100755
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -72,6 +72,19 @@ Depends on: Phase 2 complete
 - Founder code locks rate
 - Status: NOT STARTED
 
+## PHASE 4 — INTELLIGENCE (Backlog)
+
+### Feature 4.1 — Model Learning Loop (depends: 1.3 + 1.5)
+- Settled bets feed back into grading weight analysis
+- Track grade accuracy (A/B/C/D hit rates) per stat type
+- Track signal accuracy (which deltas actually predict outcomes)
+- Track kill condition effectiveness (do they prevent bad bets?)
+- Auto-adjust grading weights with conservative learning rate
+- Weight changes capped at 20% per cycle, min 50 picks per signal
+- GET /api/model/accuracy (Desk tier) — current model stats
+- GET /api/model/insights (Desk tier) — human-readable learnings
+- Status: SPEC COMPLETE — ready to build
+
 ## DEPENDENCY MAP
 ```
 1.1 (Odds API) ──┐
diff --git a/specs/feature-4-1-model-learning-loop.md b/specs/feature-4-1-model-learning-loop.md
new file mode 100644
index 0000000..a4b97c1
--- /dev/null
+++ b/specs/feature-4-1-model-learning-loop.md
@@ -0,0 +1,330 @@
+# Feature 4.1 — Model Learning Loop
+
+## Overview
+Closed-loop intelligence system. Every settled bet feeds back into the grading engine. Track which signals, kill conditions, and composite weights actually predict outcomes. Over time, the model self-calibrates — grades get sharper, kill conditions get validated or deprecated, and weight distribution shifts toward what works.
+
+## Dependencies
+- Feature 1.3 — Prop Analysis Engine (grader.js weights to tune)
+- Feature 1.5 — Bet Submission (settled bets with outcomes)
+- Feature 1.4 — Database Schema (outcomes, picks, performance tables)
+
+## The Loop
+
+```
+User scans parlay → grades assigned with current weights
+        ↓
+User places bet → logs in tracker
+        ↓
+Game plays out → user settles bet (won/lost/push)
+        ↓
+System records: grade predicted X, actual result was Y
+        ↓
+Accumulate enough data (50+ settled picks per signal)
+        ↓
+Recalculate signal accuracy → adjust grading weights
+        ↓
+Next scan uses improved weights
+```
+
+## New Database Tables
+
+### grade_accuracy
+Tracks accuracy of each grade level over time.
+
+```sql
+CREATE TABLE public.grade_accuracy (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  grade TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D')),
+  stat_type TEXT NOT NULL,
+  total_picks INT NOT NULL DEFAULT 0,
+  hits INT NOT NULL DEFAULT 0,
+  misses INT NOT NULL DEFAULT 0,
+  pushes INT NOT NULL DEFAULT 0,
+  hit_rate NUMERIC(5,2),
+  expected_hit_rate NUMERIC(5,2),
+  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+CREATE UNIQUE INDEX idx_grade_accuracy_unique ON public.grade_accuracy(grade, stat_type);
+```
+
+### signal_accuracy
+Tracks how predictive each individual signal is.
+
+```sql
+CREATE TABLE public.signal_accuracy (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  signal_name TEXT NOT NULL,
+  signal_value TEXT NOT NULL,
+  stat_type TEXT NOT NULL,
+  total_picks INT NOT NULL DEFAULT 0,
+  hits INT NOT NULL DEFAULT 0,
+  hit_rate NUMERIC(5,2),
+  avg_edge_when_hit NUMERIC(5,2),
+  avg_edge_when_miss NUMERIC(5,2),
+  predictive_score NUMERIC(5,2),
+  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+CREATE UNIQUE INDEX idx_signal_accuracy_unique ON public.signal_accuracy(signal_name, signal_value, stat_type);
+```
+
+### kill_condition_accuracy
+Tracks whether kill conditions actually prevent bad bets.
+
+```sql
+CREATE TABLE public.kill_condition_accuracy (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  kill_condition TEXT NOT NULL,
+  total_triggered INT NOT NULL DEFAULT 0,
+  picks_with_condition INT NOT NULL DEFAULT 0,
+  hits_with_condition INT NOT NULL DEFAULT 0,
+  hit_rate_with NUMERIC(5,2),
+  hit_rate_without NUMERIC(5,2),
+  effectiveness NUMERIC(5,2),
+  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+CREATE UNIQUE INDEX idx_kill_accuracy_unique ON public.kill_condition_accuracy(kill_condition);
+```
+
+### weight_history
+Audit trail of weight changes over time.
+
+```sql
+CREATE TABLE public.weight_history (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  weight_set JSONB NOT NULL,
+  reason TEXT NOT NULL,
+  sample_size INT NOT NULL,
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+```
+
+## Signal Tracking
+
+On every pick created (via parlay scan), store the individual signal values alongside the pick. This is already partially captured in `picks.reasoning`, but we need structured data for aggregation.
+
+### New column on picks table (migration 003):
+```sql
+ALTER TABLE public.picks ADD COLUMN signal_snapshot JSONB;
+```
+
+`signal_snapshot` stores:
+```json
+{
+  "season_delta": 1.8,
+  "recent_delta": 2.3,
+  "situational_delta": 1.5,
+  "line_edge": 0.5,
+  "home_away_signal": "bullish",
+  "rest_signal": "neutral",
+  "vs_opponent_signal": "strong_bullish",
+  "kill_conditions": ["blowout_risk"],
+  "composite": 2.1,
+  "weights_version": 1
+}
+```
+
+## Accuracy Calculation Pipeline
+
+Triggered periodically (on every 10th bet settlement, or daily cron).
+
+### Step 1: Grade Accuracy
+```
+For each grade (A, B, C, D) × stat_type:
+  - Count settled picks with that grade + stat
+  - Count hits (outcome = 'hit')
+  - Calculate hit_rate = hits / total * 100
+  - Compare to expected:
+    - A should hit ~70-80%
+    - B should hit ~55-65%
+    - C should hit ~45-55%
+    - D should hit ~30-40%
+  - If actual diverges from expected by >10%: flag for weight adjustment
+```
+
+### Step 2: Signal Accuracy
+```
+For each signal (season_delta, recent_delta, etc.):
+  - Group picks by signal_value bucket:
+    - "strong_bullish" (delta >= 4)
+    - "bullish" (2-4)
+    - "lean" (0.5-2)
+    - "neutral" (< 0.5)
+    - bearish equivalents
+  - For each bucket:
+    - hit_rate = hits / total
+    - avg_edge_when_hit vs avg_edge_when_miss
+    - predictive_score = (hit_rate - 0.5) * log(total)
+      (rewards accuracy AND sample size)
+```
+
+### Step 3: Kill Condition Effectiveness
+```
+For each kill condition:
+  - Picks where this condition triggered: hit_rate_with
+  - Picks where this condition did NOT trigger: hit_rate_without
+  - effectiveness = hit_rate_without - hit_rate_with
+    (positive = the kill condition correctly identifies bad bets)
+  - If effectiveness < 5%: kill condition may not be useful
+  - If effectiveness > 20%: kill condition is highly predictive
+```
+
+### Step 4: Weight Adjustment
+```
+Current weights: season=1.0, recent=1.5, situational=1.2, lineEdge=0.8
+
+If a signal's predictive_score is higher than its current weight influence:
+  → Increase that weight
+If a signal's predictive_score is lower:
+  → Decrease that weight
+
+Adjustment formula:
+  new_weight = current_weight * (1 + (predictive_score - baseline) * learning_rate)
+  learning_rate = 0.1 (conservative — small steps)
+
+Constraints:
+  - No weight can drop below 0.3 or exceed 3.0
+  - Total weight sum stays within 3.5-5.5 range
+  - Changes capped at 20% per adjustment cycle
+  - Minimum 50 picks per signal before adjusting
+
+Store new weights in weight_history.
+Apply new weights to grader.js (load from DB on startup, fall back to defaults).
+```
+
+## Endpoints
+
+### GET /api/model/accuracy (auth required, Desk tier only)
+Returns current model accuracy stats.
+
+**Response (200):**
+```json
+{
+  "grade_accuracy": [
+    { "grade": "A", "stat_type": "points", "total": 120, "hit_rate": 72.5, "expected": 75.0 },
+    { "grade": "B", "stat_type": "points", "total": 200, "hit_rate": 58.0, "expected": 60.0 }
+  ],
+  "signal_accuracy": [
+    { "signal": "recent_delta", "value": "bullish", "stat_type": "points", "hit_rate": 68.0, "predictive_score": 4.2 }
+  ],
+  "kill_condition_effectiveness": [
+    { "condition": "blowout_risk", "effectiveness": 22.5, "triggered": 45 },
+    { "condition": "low_minutes", "effectiveness": 18.0, "triggered": 30 }
+  ],
+  "current_weights": {
+    "season": 1.0, "recent": 1.5, "situational": 1.2, "lineEdge": 0.8,
+    "version": 3, "last_updated": "2026-04-15T00:00:00Z"
+  },
+  "total_settled_picks": 850,
+  "model_confidence": "high"
+}
+```
+
+### GET /api/model/insights (auth required, Desk tier only)
+Returns human-readable insights from the learning loop.
+
+**Response (200):**
+```json
+{
+  "insights": [
+    {
+      "type": "signal_outperforming",
+      "message": "Recent form (last 10 games) is the strongest predictor for points props. It outperforms season average by 12%.",
+      "action": "Recent form weight increased from 1.5 to 1.65."
+    },
+    {
+      "type": "kill_condition_validated",
+      "message": "blowout_risk is your most effective kill condition. Props in blowout games hit 15% less often.",
+      "action": "No change needed — working as designed."
+    },
+    {
+      "type": "grade_calibration",
+      "message": "Grade A picks on rebounds are hitting at 68% instead of expected 75%. Sample is small (40 picks) — monitoring.",
+      "action": "No weight change yet. Need 50+ picks to adjust."
+    }
+  ],
+  "next_recalculation_at": "2026-04-20T00:00:00Z"
+}
+```
+
+## Service Architecture
+
+```
+src/
+├── services/
+│   ├── modelLearningService.js    # Orchestrator: triggers accuracy calc + weight adjustment
+│   ├── accuracyCalculator.js      # Grade, signal, kill condition accuracy from settled data
+│   └── weightAdjuster.js          # Computes new weights, stores history, applies to grader
+├── routes/
+│   └── model.js                   # GET /api/model/accuracy, GET /api/model/insights
+```
+
+## Integration Points
+
+### On bet settlement (betService.js):
+```
+After settling a bet:
+  1. Check the global count of settled picks (all users)
+  2. Every 10th settlement: trigger modelLearningService.recalculate()
+  3. This is global (not per-user) — all users' data feeds the model
+```
+
+### On pick creation (parlayScanService.js):
+```
+When creating a pick:
+  1. Attach signal_snapshot JSONB with all signal values + current weights version
+  2. This enables retrospective analysis of which weights were active when the pick was made
+```
+
+### On grader startup (grader.js):
+```
+On first call:
+  1. Load latest weight_set from weight_history table
+  2. If no weights in DB: use hardcoded defaults
+  3. Cache weights in memory, refresh every hour
+```
+
+## Acceptance Criteria
+
+1. Every settled pick updates grade_accuracy, signal_accuracy, and kill_condition_accuracy tables
+2. Grade accuracy tracks hit rate per grade per stat type
+3. Signal accuracy tracks predictive score per signal per stat type
+4. Kill condition effectiveness measures hit_rate_with vs hit_rate_without
+5. Weight adjustment runs after every 10th settlement (global)
+6. Weight changes are capped at 20% per cycle, bounded 0.3-3.0
+7. Weight history is stored with reason and sample size
+8. `GET /api/model/accuracy` returns current stats (Desk tier only)
+9. `GET /api/model/insights` returns human-readable insights
+10. signal_snapshot JSONB attached to every new pick
+11. Grader loads weights from DB on startup, falls back to defaults
+12. Minimum 50 picks per signal before weight adjustment triggers
+
+## Test Plan
+
+### Unit Tests (accuracyCalculator.js)
+- Correctly computes hit rate from settled picks
+- Groups by grade + stat_type
+- Groups by signal + value + stat_type
+- Kill condition effectiveness: difference between with/without
+- Handles zero settled picks gracefully
+
+### Unit Tests (weightAdjuster.js)
+- Increases weight when predictive_score exceeds baseline
+- Decreases weight when predictive_score below baseline
+- Caps changes at 20% per cycle
+- Enforces min/max bounds (0.3-3.0)
+- Stores weight history with correct reason
+- Does not adjust with < 50 picks per signal
+
+### Integration Tests
+- Full loop: create pick with signal_snapshot → settle → accuracy updated → weights adjusted
+- GET /api/model/accuracy returns correct stats
+- GET /api/model/insights generates relevant insights
+- Desk tier only: free/analyst get 403
+- Weight changes reflected in next grading call
+
+## Open Questions
+- **Global vs per-user model:** This spec uses a global model (all users' data combined). Per-user models would require significantly more data. Global is correct for MVP — the model learns from collective intelligence. Per-user customization can layer on top later.
+- **Cold start:** With < 50 settled picks, no adjustments fire. The hardcoded defaults carry the system until enough data accumulates. This is intentional — bad adjustments on small samples would be worse than no adjustments.
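
The Step 4 adjustment and the cold-start guard in the spec above could be sketched roughly as follows. This is a minimal illustration, not the contents of `weightAdjuster.js`. The function name `adjustWeights`, the shape of the `signals` argument, and the `baseline` parameter are assumptions: the spec uses `baseline` in the formula without defining it. The spec also does not say what happens when the weight sum leaves the 3.5-5.5 range, so this sketch simply keeps the current weights for that cycle.

```javascript
// Constants taken from Step 4 of the spec.
const LEARNING_RATE = 0.1;
const MAX_CHANGE = 0.2;      // changes capped at 20% per adjustment cycle
const MIN_WEIGHT = 0.3;
const MAX_WEIGHT = 3.0;
const MIN_PICKS = 50;        // cold-start guard: no adjustment below this sample size
const SUM_RANGE = [3.5, 5.5];

const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

/**
 * current:  { season, recent, situational, lineEdge }
 * signals:  { [name]: { predictiveScore, totalPicks } }  (hypothetical shape)
 * baseline: not defined by the spec; passed in here as an assumption.
 */
function adjustWeights(current, signals, baseline = 0) {
  const next = {};
  for (const [name, weight] of Object.entries(current)) {
    const stats = signals[name];
    // Too little data for this signal: keep its weight unchanged.
    if (!stats || stats.totalPicks < MIN_PICKS) {
      next[name] = weight;
      continue;
    }
    // new_weight = current_weight * (1 + (predictive_score - baseline) * learning_rate)
    const rawFactor = 1 + (stats.predictiveScore - baseline) * LEARNING_RATE;
    // Cap the per-cycle change, then enforce the absolute bounds.
    const factor = clamp(rawFactor, 1 - MAX_CHANGE, 1 + MAX_CHANGE);
    next[name] = clamp(weight * factor, MIN_WEIGHT, MAX_WEIGHT);
  }
  // Assumption: if the total leaves the allowed range, skip this cycle.
  const sum = Object.values(next).reduce((a, b) => a + b, 0);
  if (sum < SUM_RANGE[0] || sum > SUM_RANGE[1]) return { ...current };
  return next;
}

// Usage with the default weights from the spec and made-up signal stats.
const defaults = { season: 1.0, recent: 1.5, situational: 1.2, lineEdge: 0.8 };
const adjusted = adjustWeights(defaults, {
  recent: { predictiveScore: 1.0, totalPicks: 120 },     // moves by +10%
  situational: { predictiveScore: 9.0, totalPicks: 40 }, // ignored: under 50 picks
});
console.log(adjusted);
```

Applying the 20% cap to the multiplicative factor before the 0.3-3.0 clamp keeps both constraints from the spec independent of each other; a real implementation would also write the result and its reason to `weight_history`.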