spec: Feature 4.1 — Model Learning Loop (Phase 4 backlog)
Closed-loop intelligence: settled bets feed back into grading weights.

- Grade accuracy tracking per stat type (A/B/C/D hit rates)
- Signal accuracy tracking (which deltas predict outcomes)
- Kill condition effectiveness (hit_rate_with vs without)
- Conservative weight adjustment (20% cap, 50-pick minimum)
- 4 new DB tables: grade_accuracy, signal_accuracy, kill_condition_accuracy, weight_history
- Desk-tier endpoints: /api/model/accuracy, /api/model/insights

Spec complete, ready to build when Phase 3 deployment is stable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -72,6 +72,19 @@ Depends on: Phase 2 complete
- Founder code locks rate
- Status: NOT STARTED

## PHASE 4 — INTELLIGENCE (Backlog)

### Feature 4.1 — Model Learning Loop (depends: 1.3 + 1.5)
- Settled bets feed back into grading weight analysis
- Track grade accuracy (A/B/C/D hit rates) per stat type
- Track signal accuracy (which deltas actually predict outcomes)
- Track kill condition effectiveness (do they prevent bad bets?)
- Auto-adjust grading weights with conservative learning rate
- Weight changes capped at 20% per cycle, min 50 picks per signal
- GET /api/model/accuracy (Desk tier) — current model stats
- GET /api/model/insights (Desk tier) — human-readable learnings
- Status: SPEC COMPLETE — ready to build

## DEPENDENCY MAP
```
1.1 (Odds API) ──┐
```
@@ -0,0 +1,330 @@
# Feature 4.1 — Model Learning Loop

## Overview
Closed-loop intelligence system. Every settled bet feeds back into the grading engine, which tracks which signals, kill conditions, and composite weights actually predict outcomes. Over time, the model self-calibrates: grades get sharper, kill conditions get validated or deprecated, and the weight distribution shifts toward what works.

## Dependencies
- Feature 1.3 — Prop Analysis Engine (grader.js weights to tune)
- Feature 1.5 — Bet Submission (settled bets with outcomes)
- Feature 1.4 — Database Schema (outcomes, picks, performance tables)

## The Loop

```
User scans parlay → grades assigned with current weights
    ↓
User places bet → logs in tracker
    ↓
Game plays out → user settles bet (won/lost/push)
    ↓
System records: grade predicted X, actual result was Y
    ↓
Accumulate enough data (50+ settled picks per signal)
    ↓
Recalculate signal accuracy → adjust grading weights
    ↓
Next scan uses improved weights
```

## New Database Tables

### grade_accuracy
Tracks accuracy of each grade level over time.

```sql
CREATE TABLE public.grade_accuracy (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  grade TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D')),
  stat_type TEXT NOT NULL,
  total_picks INT NOT NULL DEFAULT 0,
  hits INT NOT NULL DEFAULT 0,
  misses INT NOT NULL DEFAULT 0,
  pushes INT NOT NULL DEFAULT 0,
  hit_rate NUMERIC(5,2),
  expected_hit_rate NUMERIC(5,2),
  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_grade_accuracy_unique ON public.grade_accuracy(grade, stat_type);
```

### signal_accuracy
Tracks how predictive each individual signal is.

```sql
CREATE TABLE public.signal_accuracy (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  signal_name TEXT NOT NULL,
  signal_value TEXT NOT NULL,
  stat_type TEXT NOT NULL,
  total_picks INT NOT NULL DEFAULT 0,
  hits INT NOT NULL DEFAULT 0,
  hit_rate NUMERIC(5,2),
  avg_edge_when_hit NUMERIC(5,2),
  avg_edge_when_miss NUMERIC(5,2),
  predictive_score NUMERIC(5,2),
  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_signal_accuracy_unique ON public.signal_accuracy(signal_name, signal_value, stat_type);
```

### kill_condition_accuracy
Tracks whether kill conditions actually prevent bad bets.

```sql
CREATE TABLE public.kill_condition_accuracy (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  kill_condition TEXT NOT NULL,
  total_triggered INT NOT NULL DEFAULT 0,
  picks_with_condition INT NOT NULL DEFAULT 0,
  hits_with_condition INT NOT NULL DEFAULT 0,
  hit_rate_with NUMERIC(5,2),
  hit_rate_without NUMERIC(5,2),
  effectiveness NUMERIC(5,2),
  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_kill_accuracy_unique ON public.kill_condition_accuracy(kill_condition);
```

### weight_history
Audit trail of weight changes over time.

```sql
CREATE TABLE public.weight_history (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  weight_set JSONB NOT NULL,
  reason TEXT NOT NULL,
  sample_size INT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

## Signal Tracking

Whenever a pick is created (via parlay scan), store its individual signal values alongside the pick. `picks.reasoning` already captures some of this, but aggregation needs structured data.

### New column on picks table (migration 003):
```sql
ALTER TABLE public.picks ADD COLUMN signal_snapshot JSONB;
```

`signal_snapshot` stores:
```json
{
  "season_delta": 1.8,
  "recent_delta": 2.3,
  "situational_delta": 1.5,
  "line_edge": 0.5,
  "home_away_signal": "bullish",
  "rest_signal": "neutral",
  "vs_opponent_signal": "strong_bullish",
  "kill_conditions": ["blowout_risk"],
  "composite": 2.1,
  "weights_version": 1
}
```
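
One way the snapshot could be assembled at pick creation is sketched below. `buildSignalSnapshot` and its argument shapes are hypothetical (nothing in this spec names them); the output keys mirror the example above. Defaulting a missing categorical signal to `"neutral"` is an assumption, not something the spec requires.

```javascript
// Hypothetical helper; field names mirror the signal_snapshot example above.
const NUMERIC_SIGNALS = ["season_delta", "recent_delta", "situational_delta", "line_edge"];
const LABEL_SIGNALS = ["home_away_signal", "rest_signal", "vs_opponent_signal"];

function buildSignalSnapshot(signals, killConditions, composite, weightsVersion) {
  const snapshot = {};
  for (const key of NUMERIC_SIGNALS) {
    // Reject missing or non-numeric deltas so aggregation never sees NaN.
    if (!Number.isFinite(signals[key])) {
      throw new TypeError(`signal "${key}" must be a finite number`);
    }
    snapshot[key] = signals[key];
  }
  for (const key of LABEL_SIGNALS) {
    // Assumption: a missing categorical signal is recorded as "neutral".
    snapshot[key] = signals[key] ?? "neutral";
  }
  snapshot.kill_conditions = [...killConditions]; // copy, so later mutation cannot leak in
  snapshot.composite = composite;
  snapshot.weights_version = weightsVersion;
  return snapshot;
}
```

Validating the numeric deltas at write time keeps bad rows out of the later accuracy aggregation, which is cheaper than filtering them on every recalculation.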

## Accuracy Calculation Pipeline

Runs periodically: on every 10th bet settlement, or from a daily cron job.

### Step 1: Grade Accuracy
```
For each grade (A, B, C, D) × stat_type:
  - Count settled picks with that grade + stat
  - Count hits (outcome = 'hit')
  - Calculate hit_rate = hits / total * 100
  - Compare to expected:
    - A should hit ~70-80%
    - B should hit ~55-65%
    - C should hit ~45-55%
    - D should hit ~30-40%
  - If actual diverges from expected by more than 10 percentage points: flag for weight adjustment
```
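
A minimal sketch of Step 1, under stated assumptions: the expected rates are taken as the midpoints of the ranges above, outcomes are the strings `"hit"`, `"miss"`, and `"push"` (only `'hit'` appears in this spec), and pushes are left out of the hit-rate denominator. `computeGradeAccuracy` is a hypothetical name.

```javascript
// Assumption: expected rates are the midpoints of the ranges listed in Step 1.
const EXPECTED_HIT_RATE = { A: 75, B: 60, C: 50, D: 35 };

const round2 = (x) => Math.round(x * 100) / 100;

function computeGradeAccuracy(picks, divergenceThreshold = 10) {
  const groups = new Map();
  for (const pick of picks) {
    const key = `${pick.grade}|${pick.stat_type}`;
    if (!groups.has(key)) {
      groups.set(key, { grade: pick.grade, stat_type: pick.stat_type, hits: 0, misses: 0, pushes: 0 });
    }
    const row = groups.get(key);
    if (pick.outcome === "hit") row.hits += 1;
    else if (pick.outcome === "miss") row.misses += 1;
    else if (pick.outcome === "push") row.pushes += 1;
  }
  return [...groups.values()].map((row) => {
    // Assumption: pushes count toward total_picks but not toward the hit rate.
    const decided = row.hits + row.misses;
    const hitRate = decided === 0 ? null : round2((row.hits / decided) * 100);
    const expected = EXPECTED_HIT_RATE[row.grade];
    return {
      ...row,
      total_picks: decided + row.pushes,
      hit_rate: hitRate,
      expected_hit_rate: expected,
      // Flag when actual and expected differ by more than the threshold (percentage points).
      flagged: hitRate !== null && Math.abs(hitRate - expected) > divergenceThreshold,
    };
  });
}
```

Each returned row lines up with the columns of `grade_accuracy`, plus a `flagged` field that is not part of the table.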

### Step 2: Signal Accuracy
```
For each signal (season_delta, recent_delta, etc.):
  - Group picks by signal_value bucket:
    - "strong_bullish" (delta >= 4)
    - "bullish" (2 <= delta < 4)
    - "lean" (0.5 <= delta < 2)
    - "neutral" (|delta| < 0.5)
    - bearish equivalents for negative deltas
  - For each bucket:
    - hit_rate = hits / total (a fraction in this formula)
    - avg_edge_when_hit vs avg_edge_when_miss
    - predictive_score = (hit_rate - 0.5) * log(total)
      (rewards accuracy AND sample size)
```
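
The bucketing and scoring in Step 2 can be sketched as follows. The bearish bucket names (`strong_bearish`, `bearish`, `lean_bearish`) are hypothetical, the bucket boundaries are treated as lower-inclusive, and `log` is assumed to be the natural logarithm; the spec does not pin down any of these.

```javascript
// Map a numeric delta to a signal_value bucket.
// Assumption: boundaries are lower-inclusive, and bearish names mirror the bullish ones.
function bucketDelta(delta) {
  const magnitude = Math.abs(delta);
  if (magnitude < 0.5) return "neutral";
  const bullish = delta > 0;
  if (magnitude >= 4) return bullish ? "strong_bullish" : "strong_bearish";
  if (magnitude >= 2) return bullish ? "bullish" : "bearish";
  return bullish ? "lean" : "lean_bearish";
}

// predictive_score = (hit_rate - 0.5) * log(total), with hit_rate as a fraction.
// Assumption: log is the natural logarithm; an empty bucket scores 0.
function predictiveScore(hits, total) {
  if (total <= 0) return 0;
  return (hits / total - 0.5) * Math.log(total);
}
```

Because `log(total)` grows slowly, a bucket at a 50% hit rate scores zero regardless of size, while the same edge over a coin flip scores higher as the sample grows.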

### Step 3: Kill Condition Effectiveness
```
For each kill condition:
  - Picks where this condition triggered: hit_rate_with
  - Picks where this condition did NOT trigger: hit_rate_without
  - effectiveness = hit_rate_without - hit_rate_with
    (positive = the kill condition correctly identifies bad bets)
  - If effectiveness < 5 percentage points: kill condition may not be useful
  - If effectiveness > 20 percentage points: kill condition is highly predictive
```
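
As an illustration of Step 3, the sketch below computes both hit rates from settled picks that carry a `kill_conditions` array (as in `signal_snapshot`). The function name, the verdict labels, and the choice to skip pushes are assumptions.

```javascript
function killConditionEffectiveness(picks, condition) {
  let withHits = 0, withTotal = 0, withoutHits = 0, withoutTotal = 0;
  for (const pick of picks) {
    // Assumption: only decided picks (hit or miss) are counted; pushes are skipped.
    if (pick.outcome !== "hit" && pick.outcome !== "miss") continue;
    const triggered = (pick.kill_conditions || []).includes(condition);
    const isHit = pick.outcome === "hit";
    if (triggered) {
      withTotal += 1;
      if (isHit) withHits += 1;
    } else {
      withoutTotal += 1;
      if (isHit) withoutHits += 1;
    }
  }
  // Without picks on both sides there is nothing to compare.
  if (withTotal === 0 || withoutTotal === 0) return null;
  const hitRateWith = (withHits / withTotal) * 100;
  const hitRateWithout = (withoutHits / withoutTotal) * 100;
  const effectiveness = hitRateWithout - hitRateWith;
  // Hypothetical labels for the 5 and 20 point thresholds in Step 3.
  let verdict = "moderate";
  if (effectiveness < 5) verdict = "may_not_be_useful";
  else if (effectiveness > 20) verdict = "highly_predictive";
  return { hitRateWith, hitRateWithout, effectiveness, verdict };
}
```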

### Step 4: Weight Adjustment
```
Current weights: season=1.0, recent=1.5, situational=1.2, lineEdge=0.8

If a signal's predictive_score is higher than its current weight influence:
  → Increase that weight
If a signal's predictive_score is lower than its current weight influence:
  → Decrease that weight

Adjustment formula:
  new_weight = current_weight * (1 + (predictive_score - baseline) * learning_rate)
  learning_rate = 0.1 (conservative: small steps)

Constraints:
  - No weight can drop below 0.3 or exceed 3.0
  - Total weight sum stays within 3.5-5.5 range
  - Changes capped at 20% per adjustment cycle
  - Minimum 50 picks per signal before adjusting

Store new weights in weight_history.
Apply new weights to grader.js (load from DB on startup, fallback to defaults).
```
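
The formula and constraints above might be combined as in this sketch. The names are hypothetical, `baseline` is passed in because the spec does not define its value, and rejecting a whole cycle when the weight sum leaves the 3.5-5.5 range is one possible reading of that constraint, not the specified behavior.

```javascript
const LIMITS = { min: 0.3, max: 3.0, maxStep: 0.2, minPicks: 50, learningRate: 0.1, sumMin: 3.5, sumMax: 5.5 };

const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

function adjustWeight(current, predictiveScore, baseline, sampleSize) {
  // Below the minimum sample size the weight is left untouched.
  if (sampleSize < LIMITS.minPicks) return current;
  const rawFactor = 1 + (predictiveScore - baseline) * LIMITS.learningRate;
  // Cap the change at 20% per cycle, then enforce the absolute bounds.
  const factor = clamp(rawFactor, 1 - LIMITS.maxStep, 1 + LIMITS.maxStep);
  return clamp(current * factor, LIMITS.min, LIMITS.max);
}

function adjustWeights(weights, signalStats, baseline = 0) {
  const next = {};
  for (const [name, current] of Object.entries(weights)) {
    const stats = signalStats[name];
    next[name] = stats
      ? adjustWeight(current, stats.predictiveScore, baseline, stats.sampleSize)
      : current;
  }
  const sum = Object.values(next).reduce((a, b) => a + b, 0);
  // Assumption: if the total leaves the allowed range, keep the previous weights.
  return sum >= LIMITS.sumMin && sum <= LIMITS.sumMax ? next : { ...weights };
}
```

For example, with a baseline of 0, a weight of 1.5 and a predictive score of 1.0 over 100 picks gives a factor of 1.1, so the weight moves to about 1.65; a score of 10 would ask for a factor of 2.0 and be capped at 1.2.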

## Endpoints

### GET /api/model/accuracy (auth required, Desk tier only)
Returns current model accuracy stats.

**Response (200):**
```json
{
  "grade_accuracy": [
    { "grade": "A", "stat_type": "points", "total": 120, "hit_rate": 72.5, "expected": 75.0 },
    { "grade": "B", "stat_type": "points", "total": 200, "hit_rate": 58.0, "expected": 60.0 }
  ],
  "signal_accuracy": [
    { "signal": "recent_delta", "value": "bullish", "stat_type": "points", "hit_rate": 68.0, "predictive_score": 4.2 }
  ],
  "kill_condition_effectiveness": [
    { "condition": "blowout_risk", "effectiveness": 22.5, "triggered": 45 },
    { "condition": "low_minutes", "effectiveness": 18.0, "triggered": 30 }
  ],
  "current_weights": {
    "season": 1.0, "recent": 1.5, "situational": 1.2, "lineEdge": 0.8,
    "version": 3, "last_updated": "2026-04-15T00:00:00Z"
  },
  "total_settled_picks": 850,
  "model_confidence": "high"
}
```
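
The "auth required, Desk tier only" rule could be enforced by a small framework-independent check like the one below. The function name, the `user.tier` field, and the `"desk"` tier string are assumptions; the 403 for non-Desk tiers follows the test plan, while the 401 for unauthenticated requests is assumed.

```javascript
// Hypothetical access check for the /api/model/* routes.
function authorizeModelAccess(user) {
  if (!user) {
    return { status: 401, error: "authentication required" }; // assumed status for missing auth
  }
  if (user.tier !== "desk") {
    return { status: 403, error: "Desk tier required" };
  }
  return { status: 200 };
}
```

A route handler would call this first and return early on any non-200 status.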

### GET /api/model/insights (auth required, Desk tier only)
Returns human-readable insights from the learning loop.

**Response (200):**
```json
{
  "insights": [
    {
      "type": "signal_outperforming",
      "message": "Recent form (last 10 games) is the strongest predictor for points props. It outperforms season average by 12%.",
      "action": "Recent form weight increased from 1.5 to 1.65."
    },
    {
      "type": "kill_condition_validated",
      "message": "blowout_risk is your most effective kill condition. Props in blowout games hit 15% less often.",
      "action": "No change needed — working as designed."
    },
    {
      "type": "grade_calibration",
      "message": "Grade A picks on rebounds are hitting at 68% instead of expected 75%. Sample is small (40 picks) — monitoring.",
      "action": "No weight change yet. Need 50+ picks to adjust."
    }
  ],
  "next_recalculation_at": "2026-04-20T00:00:00Z"
}
```
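
A `grade_calibration` insight like the third one above might be produced from a grade accuracy row as sketched here. The function name and the two large-sample action strings are hypothetical; the input shape follows the `grade_accuracy` entries in the accuracy response.

```javascript
// Hypothetical generator for "grade_calibration" insights.
function gradeCalibrationInsight(row, minPicks = 50, threshold = 10) {
  const base = `Grade ${row.grade} picks on ${row.stat_type} are hitting at ` +
    `${row.hit_rate}% instead of expected ${row.expected}%.`;
  if (row.total < minPicks) {
    return {
      type: "grade_calibration",
      message: `${base} Sample is small (${row.total} picks); monitoring.`,
      action: `No weight change yet. Need ${minPicks}+ picks to adjust.`,
    };
  }
  const diverges = Math.abs(row.hit_rate - row.expected) > threshold;
  return {
    type: "grade_calibration",
    message: base,
    // Assumed wording for the two large-sample outcomes.
    action: diverges ? "Flagged for weight adjustment." : "Within tolerance. No change needed.",
  };
}
```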

## Service Architecture

```
src/
├── services/
│   ├── modelLearningService.js   # Orchestrator: triggers accuracy calc + weight adjustment
│   ├── accuracyCalculator.js     # Grade, signal, kill condition accuracy from settled data
│   └── weightAdjuster.js         # Computes new weights, stores history, applies to grader
├── routes/
│   └── model.js                  # GET /api/model/accuracy, GET /api/model/insights
```

## Integration Points

### On bet settlement (betService.js):
```
After settling a bet:
1. Check the global count of settled picks (all users combined)
2. Every 10th settlement: trigger modelLearningService.recalculate()
3. The trigger is global (not per-user): all users' data feeds the model
```
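
The settlement hook described above can be sketched as a pure trigger check plus a thin async wrapper. `shouldRecalculate` and `onBetSettled` are hypothetical names; `recalculate` stands in for `modelLearningService.recalculate()` and is injected so the sketch stays self-contained.

```javascript
// True on every 10th global settlement (10, 20, 30, ...).
function shouldRecalculate(globalSettledCount, interval = 10) {
  return globalSettledCount > 0 && globalSettledCount % interval === 0;
}

// Call after a bet is settled; `globalSettledCount` is the count across all users.
async function onBetSettled(globalSettledCount, recalculate) {
  if (!shouldRecalculate(globalSettledCount)) return false;
  await recalculate();
  return true;
}
```

Keeping the modulo check separate from the async call makes the "every 10th settlement" rule easy to unit test without a database.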

### On pick creation (parlayScanService.js):
```
When creating a pick:
1. Attach signal_snapshot JSONB with all signal values + current weights version
2. This enables retrospective analysis of which weights were active when the pick was made
```

### On grader startup (grader.js):
```
On first call:
1. Load latest weight_set from weight_history table
2. If no weights in DB: use hardcoded defaults
3. Cache weights in memory, refresh every hour
```
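
A minimal sketch of the load, fallback, and hourly refresh behavior, assuming the latest `weight_history` row is fetched by an injected async function (`fetchLatestRow`, hypothetical). Merging a stored `weight_set` over the defaults and keeping the last good weights when the fetch fails are assumptions beyond what the steps above state.

```javascript
const DEFAULT_WEIGHTS = Object.freeze({ season: 1.0, recent: 1.5, situational: 1.2, lineEdge: 0.8 });
const REFRESH_MS = 60 * 60 * 1000; // refresh every hour

// Use the stored weight_set if present, otherwise the hardcoded defaults.
function resolveWeights(latestRow) {
  const stored = latestRow && latestRow.weight_set;
  return stored ? { ...DEFAULT_WEIGHTS, ...stored } : { ...DEFAULT_WEIGHTS };
}

function isStale(loadedAt, now) {
  return now - loadedAt >= REFRESH_MS;
}

// fetchLatestRow: async () => latest weight_history row, or null if the table is empty.
function createWeightCache(fetchLatestRow, clock = Date.now) {
  let cached = null;
  let loadedAt = 0;
  return async function getWeights() {
    if (cached !== null && !isStale(loadedAt, clock())) return cached;
    let row = null;
    try {
      row = await fetchLatestRow();
    } catch (err) {
      row = null; // assumption: a failed load is treated like "no row found"
    }
    // Keep the previous weights if nothing was loaded this time; otherwise resolve afresh.
    cached = row === null && cached !== null ? cached : resolveWeights(row);
    loadedAt = clock();
    return cached;
  };
}
```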

## Acceptance Criteria

1. Every settled pick is counted in the grade_accuracy, signal_accuracy, and kill_condition_accuracy tables at the next recalculation
2. Grade accuracy tracks hit rate per grade per stat type
3. Signal accuracy tracks predictive score per signal per stat type
4. Kill condition effectiveness measures hit_rate_with vs hit_rate_without
5. Weight adjustment runs after every 10th settlement (global)
6. Weight changes are capped at 20% per cycle, bounded 0.3-3.0
7. Weight history is stored with reason and sample size
8. `GET /api/model/accuracy` returns current stats (Desk tier only)
9. `GET /api/model/insights` returns human-readable insights (Desk tier only)
10. signal_snapshot JSONB attached to every new pick
11. Grader loads weights from DB on startup, falls back to defaults
12. Minimum 50 picks per signal before weight adjustment triggers

## Test Plan

### Unit Tests (accuracyCalculator.js)
- Correctly computes hit rate from settled picks
- Groups by grade + stat_type
- Groups by signal + value + stat_type
- Kill condition effectiveness: difference between with/without
- Handles zero settled picks gracefully

### Unit Tests (weightAdjuster.js)
- Increases weight when predictive_score exceeds baseline
- Decreases weight when predictive_score below baseline
- Caps changes at 20% per cycle
- Enforces min/max bounds (0.3-3.0)
- Stores weight history with correct reason
- Does not adjust with < 50 picks per signal

### Integration Tests
- Full loop: create pick with signal_snapshot → settle → accuracy updated → weights adjusted
- GET /api/model/accuracy returns correct stats
- GET /api/model/insights generates relevant insights
- Desk tier only: free/analyst get 403
- Weight changes reflected in next grading call

## Open Questions
- **Global vs per-user model:** This spec uses a global model (all users' data combined). Per-user models would require significantly more data. Global is the right choice for MVP because the model learns from collective intelligence. Per-user customization can be layered on top later.
- **Cold start:** With fewer than 50 settled picks per signal, no adjustments fire. The hardcoded defaults carry the system until enough data accumulates. This is intentional: bad adjustments on small samples would be worse than no adjustments.