vyndr/specs/feature-4-1-model-learning-loop.md

# Feature 4.1 — Model Learning Loop

## Overview
A closed-loop calibration system: every settled bet feeds back into the grading engine. The system tracks which signals, kill conditions, and composite weights actually predict outcomes. As settled picks accumulate, the model recalibrates itself: grade hit rates converge on their expected bands, kill conditions are validated or deprecated, and weight shifts toward the signals that predict best.

## Dependencies
- Feature 1.3 — Prop Analysis Engine (grader.js weights to tune)
- Feature 1.5 — Bet Submission (settled bets with outcomes)
- Feature 1.4 — Database Schema (outcomes, picks, performance tables)

## The Loop

```
User scans parlay → grades assigned with current weights
  ↓
User places bet → logs in tracker
  ↓
Game plays out → user settles bet (won/lost/push)
  ↓
System records: grade predicted X, actual result was Y
  ↓
Accumulate enough data (50+ settled picks per signal)
  ↓
Recalculate signal accuracy → adjust grading weights
  ↓
Next scan uses improved weights
```

## New Database Tables

### grade_accuracy
Tracks accuracy of each grade level over time.

```sql
CREATE TABLE public.grade_accuracy (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  grade TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D')),
  stat_type TEXT NOT NULL,
  total_picks INT NOT NULL DEFAULT 0,
  hits INT NOT NULL DEFAULT 0,
  misses INT NOT NULL DEFAULT 0,
  pushes INT NOT NULL DEFAULT 0,
  hit_rate NUMERIC(5,2),
  expected_hit_rate NUMERIC(5,2),
  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_grade_accuracy_unique ON public.grade_accuracy(grade, stat_type);
```

### signal_accuracy
Tracks how predictive each individual signal is.

```sql
CREATE TABLE public.signal_accuracy (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  signal_name TEXT NOT NULL,
  signal_value TEXT NOT NULL,
  stat_type TEXT NOT NULL,
  total_picks INT NOT NULL DEFAULT 0,
  hits INT NOT NULL DEFAULT 0,
  hit_rate NUMERIC(5,2),
  avg_edge_when_hit NUMERIC(5,2),
  avg_edge_when_miss NUMERIC(5,2),
  predictive_score NUMERIC(5,2),
  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_signal_accuracy_unique ON public.signal_accuracy(signal_name, signal_value, stat_type);
```

### kill_condition_accuracy
Tracks whether kill conditions actually prevent bad bets.

```sql
CREATE TABLE public.kill_condition_accuracy (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  kill_condition TEXT NOT NULL,
  total_triggered INT NOT NULL DEFAULT 0,
  picks_with_condition INT NOT NULL DEFAULT 0,
  hits_with_condition INT NOT NULL DEFAULT 0,
  hit_rate_with NUMERIC(5,2),
  hit_rate_without NUMERIC(5,2),
  effectiveness NUMERIC(5,2),
  calculated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_kill_accuracy_unique ON public.kill_condition_accuracy(kill_condition);
```

### weight_history
Audit trail of weight changes over time.

```sql
CREATE TABLE public.weight_history (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  weight_set JSONB NOT NULL,
  reason TEXT NOT NULL,
  sample_size INT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

## Signal Tracking

On every pick created (via parlay scan), store the individual signal values alongside the pick. This is already partially captured in `picks.reasoning`, but we need structured data for aggregation.

### New column on picks table (migration 003):
```sql
ALTER TABLE public.picks ADD COLUMN signal_snapshot JSONB;
```

`signal_snapshot` stores:
```json
{
  "season_delta": 1.8,
  "recent_delta": 2.3,
  "situational_delta": 1.5,
  "line_edge": 0.5,
  "home_away_signal": "bullish",
  "rest_signal": "neutral",
  "vs_opponent_signal": "strong_bullish",
  "kill_conditions": ["blowout_risk"],
  "composite": 2.1,
  "weights_version": 1
}
```

## Accuracy Calculation Pipeline

Triggered periodically (on every 10th bet settlement, or daily cron).

### Step 1: Grade Accuracy
```
For each grade (A, B, C, D) × stat_type:
  - Count settled picks with that grade + stat
  - Count hits (outcome = 'hit')
  - Calculate hit_rate = hits / total * 100
  - Compare to expected:
    - A should hit ~70-80%
    - B should hit ~55-65%
    - C should hit ~45-55%
    - D should hit ~30-40%
  - If actual diverges from expected by more than 10 percentage points: flag for weight adjustment
```
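A minimal sketch of Step 1 as a pure function. The expected values are assumed to be the midpoints of the bands above (75, 60, 50, 35), the divergence threshold is read as percentage points, and pushes are assumed to be excluded from the denominator. The function name and the pick shape (`grade`, `statType`, `outcome`) are hypothetical.

```javascript
// Expected hit rates in percent. Assumption: midpoints of the Step 1 bands.
const EXPECTED_HIT_RATE = { A: 75, B: 60, C: 50, D: 35 };
const DIVERGENCE_THRESHOLD = 10; // percentage points

// picks: [{ grade, statType, outcome }], outcome is 'hit' | 'miss' | 'push'.
function computeGradeAccuracy(picks) {
  const rows = new Map();
  for (const { grade, statType, outcome } of picks) {
    const key = `${grade}|${statType}`;
    if (!rows.has(key)) {
      rows.set(key, { grade, statType, totalPicks: 0, hits: 0, misses: 0, pushes: 0 });
    }
    const row = rows.get(key);
    row.totalPicks += 1;
    if (outcome === 'hit') row.hits += 1;
    else if (outcome === 'miss') row.misses += 1;
    else row.pushes += 1;
  }

  return [...rows.values()].map((row) => {
    // Assumption: pushes are left out of the hit-rate denominator.
    const decided = row.hits + row.misses;
    const hitRate = decided === 0 ? null : (row.hits * 100) / decided;
    const expectedHitRate = EXPECTED_HIT_RATE[row.grade];
    const flagged =
      hitRate !== null && Math.abs(hitRate - expectedHitRate) > DIVERGENCE_THRESHOLD;
    return { ...row, hitRate, expectedHitRate, flagged };
  });
}
```

Each returned row maps onto one `grade_accuracy` record. `flagged` is an in-memory marker only, since the table has no such column.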

### Step 2: Signal Accuracy
```
For each signal (season_delta, recent_delta, etc.):
  - Group picks by signal_value bucket:
    - "strong_bullish" (delta >= 4)
    - "bullish" (2 <= delta < 4)
    - "lean" (0.5 <= delta < 2)
    - "neutral" (|delta| < 0.5)
    - bearish equivalents (mirrored for negative deltas)
  - For each bucket:
    - hit_rate = hits / total (a fraction from 0 to 1 in this formula; reported as a percentage elsewhere)
    - avg_edge_when_hit vs avg_edge_when_miss
    - predictive_score = (hit_rate - 0.5) * log(total)
      (rewards accuracy AND sample size)
```
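The bucketing and scoring in Step 2 can be sketched as follows. This assumes half-open bucket boundaries, a natural logarithm for `log(total)`, and hypothetical names for the bearish buckets (`strong_bearish`, `bearish`, `lean_bearish`), none of which the spec fixes. The function names are illustrative.

```javascript
// Bucket a numeric delta using half-open boundaries.
// Assumption: bearish bucket names are hypothetical mirrors of the bullish ones.
function bucketForDelta(delta) {
  const magnitude = Math.abs(delta);
  if (magnitude < 0.5) return 'neutral';
  const bullish = delta > 0;
  if (magnitude >= 4) return bullish ? 'strong_bullish' : 'strong_bearish';
  if (magnitude >= 2) return bullish ? 'bullish' : 'bearish';
  return bullish ? 'lean' : 'lean_bearish';
}

// predictive_score = (hit_rate - 0.5) * log(total), with hit_rate as a fraction.
// Assumption: log is the natural logarithm.
function predictiveScore(hits, total) {
  if (total === 0) return 0;
  return (hits / total - 0.5) * Math.log(total);
}

// picks: [{ delta, outcome }], outcome is 'hit' | 'miss'
// (pushes are assumed to be filtered out by the caller).
function summarizeSignal(picks) {
  const buckets = {};
  for (const { delta, outcome } of picks) {
    const name = bucketForDelta(delta);
    if (!buckets[name]) buckets[name] = { total: 0, hits: 0 };
    buckets[name].total += 1;
    if (outcome === 'hit') buckets[name].hits += 1;
  }
  for (const bucket of Object.values(buckets)) {
    bucket.hitRate = bucket.hits / bucket.total;
    bucket.predictiveScore = predictiveScore(bucket.hits, bucket.total);
  }
  return buckets;
}
```

A bucket at exactly 50% scores 0 regardless of sample size, and a bucket below 50% scores increasingly negative as its sample grows.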

### Step 3: Kill Condition Effectiveness
```
For each kill condition:
  - Picks where this condition triggered: hit_rate_with
  - Picks where this condition did NOT trigger: hit_rate_without
  - effectiveness = hit_rate_without - hit_rate_with
    (positive = the kill condition correctly identifies bad bets)
  - If effectiveness < 5 percentage points: kill condition may not be useful
  - If effectiveness > 20 percentage points: kill condition is highly predictive
```
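A sketch of the Step 3 comparison, assuming each pick carries a `killConditions` array (as in `signal_snapshot.kill_conditions`) and that pushes are excluded from hit rates. The verdict labels are hypothetical and only illustrate the 5 and 20 point thresholds.

```javascript
// Hit rate in percent over decided picks. Assumption: pushes are excluded.
function hitRatePercent(picks) {
  const decided = picks.filter((p) => p.outcome === 'hit' || p.outcome === 'miss');
  if (decided.length === 0) return null;
  const hits = decided.filter((p) => p.outcome === 'hit').length;
  return (hits * 100) / decided.length;
}

// picks: [{ killConditions: string[], outcome: 'hit' | 'miss' | 'push' }]
function killConditionEffectiveness(picks, condition) {
  const withCondition = picks.filter((p) => p.killConditions.includes(condition));
  const withoutCondition = picks.filter((p) => !p.killConditions.includes(condition));

  const hitRateWith = hitRatePercent(withCondition);
  const hitRateWithout = hitRatePercent(withoutCondition);

  // Positive effectiveness: picks with the condition hit less often.
  const effectiveness =
    hitRateWith === null || hitRateWithout === null ? null : hitRateWithout - hitRateWith;

  // Hypothetical labels for the thresholds in Step 3.
  let verdict = 'inconclusive';
  if (effectiveness !== null) {
    if (effectiveness > 20) verdict = 'highly_predictive';
    else if (effectiveness < 5) verdict = 'possibly_not_useful';
    else verdict = 'useful';
  }

  return {
    killCondition: condition,
    picksWithCondition: withCondition.length,
    hitRateWith,
    hitRateWithout,
    effectiveness,
    verdict,
  };
}
```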

### Step 4: Weight Adjustment
```
Current weights: season=1.0, recent=1.5, situational=1.2, lineEdge=0.8

If a signal's predictive_score is above baseline:
  → Increase that weight
If a signal's predictive_score is below baseline:
  → Decrease that weight

Adjustment formula:
  new_weight = current_weight * (1 + (predictive_score - baseline) * learning_rate)
  learning_rate = 0.1 (conservative — small steps)

Constraints:
  - No weight can drop below 0.3 or exceed 3.0
  - Total weight sum stays within 3.5-5.5 range
  - Changes capped at 20% per adjustment cycle
  - Minimum 50 picks per signal before adjusting

Store new weights in weight_history.
Apply new weights to grader.js (load from DB on startup, fallback to defaults).
```
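The adjustment formula and its constraints can be sketched as below. Several points are assumptions, because the spec leaves them open: `baseline` is passed in by the caller, signal stats are keyed by the same names as the weights, and a cycle whose resulting sum leaves the 3.5-5.5 range is rejected outright rather than rescaled.

```javascript
const LEARNING_RATE = 0.1;
const MIN_WEIGHT = 0.3;
const MAX_WEIGHT = 3.0;
const MAX_CHANGE = 0.2; // 20% cap per adjustment cycle
const MIN_SAMPLE = 50; // picks per signal before adjusting
const MIN_SUM = 3.5;
const MAX_SUM = 5.5;

const clamp = (value, lo, hi) => Math.min(hi, Math.max(lo, value));

// current: { season, recent, situational, lineEdge }
// stats:   { [weightName]: { predictiveScore, totalPicks } } (hypothetical shape)
function adjustWeights(current, stats, baseline) {
  const next = {};
  for (const [name, weight] of Object.entries(current)) {
    const signal = stats[name];
    if (!signal || signal.totalPicks < MIN_SAMPLE) {
      next[name] = weight; // not enough data: leave unchanged
      continue;
    }
    const rawFactor = 1 + (signal.predictiveScore - baseline) * LEARNING_RATE;
    const factor = clamp(rawFactor, 1 - MAX_CHANGE, 1 + MAX_CHANGE);
    next[name] = clamp(weight * factor, MIN_WEIGHT, MAX_WEIGHT);
  }

  const sum = Object.values(next).reduce((acc, w) => acc + w, 0);
  // Assumption: an out-of-range sum rejects the whole cycle.
  if (sum < MIN_SUM || sum > MAX_SUM) {
    return { weights: { ...current }, applied: false };
  }
  return { weights: next, applied: true };
}
```

With the default weights, a `recent` score one point above baseline gives a factor of 1.1, which moves the weight from 1.5 to about 1.65.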

## Endpoints

### GET /api/model/accuracy (auth required, Desk tier only)
Returns current model accuracy stats.

**Response (200):**
```json
{
  "grade_accuracy": [
    { "grade": "A", "stat_type": "points", "total": 120, "hit_rate": 72.5, "expected": 75.0 },
    { "grade": "B", "stat_type": "points", "total": 200, "hit_rate": 58.0, "expected": 60.0 }
  ],
  "signal_accuracy": [
    { "signal": "recent_delta", "value": "bullish", "stat_type": "points", "hit_rate": 68.0, "predictive_score": 4.2 }
  ],
  "kill_condition_effectiveness": [
    { "condition": "blowout_risk", "effectiveness": 22.5, "triggered": 45 },
    { "condition": "low_minutes", "effectiveness": 18.0, "triggered": 30 }
  ],
  "current_weights": {
    "season": 1.0, "recent": 1.5, "situational": 1.2, "lineEdge": 0.8,
    "version": 3, "last_updated": "2026-04-15T00:00:00Z"
  },
  "total_settled_picks": 850,
  "model_confidence": "high"
}
```

### GET /api/model/insights (auth required, Desk tier only)
Returns human-readable insights from the learning loop.

**Response (200):**
```json
{
  "insights": [
    {
      "type": "signal_outperforming",
      "message": "Recent form (last 10 games) is the strongest predictor for points props. It outperforms season average by 12%.",
      "action": "Recent form weight increased from 1.5 to 1.65."
    },
    {
      "type": "kill_condition_validated",
      "message": "blowout_risk is your most effective kill condition. Props in blowout games hit 15% less often.",
      "action": "No change needed — working as designed."
    },
    {
      "type": "grade_calibration",
      "message": "Grade A picks on rebounds are hitting at 68% instead of expected 75%. Sample is small (40 picks) — monitoring.",
      "action": "No weight change yet. Need 50+ picks to adjust."
    }
  ],
  "next_recalculation_at": "2026-04-20T00:00:00Z"
}
```

## Service Architecture

```
src/
├── services/
│   ├── modelLearningService.js    # Orchestrator: triggers accuracy calc + weight adjustment
│   ├── accuracyCalculator.js      # Grade, signal, kill condition accuracy from settled data
│   └── weightAdjuster.js          # Computes new weights, stores history, applies to grader
├── routes/
│   └── model.js                   # GET /api/model/accuracy, GET /api/model/insights
```

## Integration Points

### On bet settlement (betService.js):
```
After settling a bet:
  1. Increment the global settlement count (all users, not just this user)
  2. Every 10th settlement: trigger modelLearningService.recalculate()
  3. Recalculation is global (not per-user): all users' data feeds the model
```
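A sketch of the settlement hook, assuming the counter is global as stated above. `incrementGlobalSettledCount` and `recalculate` are hypothetical injected dependencies standing in for a database counter and `modelLearningService.recalculate()`.

```javascript
const RECALC_EVERY = 10;

// True on the 10th, 20th, 30th, ... global settlement.
function shouldRecalculate(globalSettledCount) {
  return globalSettledCount > 0 && globalSettledCount % RECALC_EVERY === 0;
}

// Hypothetical hook called by betService.js after a bet is settled.
// incrementGlobalSettledCount: async () => number (new global count)
// recalculate: async () => void
async function afterBetSettled({ incrementGlobalSettledCount, recalculate }) {
  const count = await incrementGlobalSettledCount();
  if (!shouldRecalculate(count)) return false;
  await recalculate();
  return true;
}
```

Injecting both dependencies keeps the trigger rule testable without a database.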

### On pick creation (parlayScanService.js):
```
When creating a pick:
  1. Attach signal_snapshot JSONB with all signal values + current weights version
  2. This enables retrospective analysis of which weights were active when the pick was made
```

### On grader startup (grader.js):
```
On first call:
  1. Load latest weight_set from weight_history table
  2. If no weights in DB: use hardcoded defaults
  3. Cache weights in memory, refresh every hour
```
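The load, fallback, and hourly refresh steps above might look like this. `loadLatestWeights` is a hypothetical async function that returns the newest `weight_set` from `weight_history` (or `null` if the table is empty). Keeping the last known weights when the database call fails is an assumption, not something the spec states.

```javascript
const DEFAULT_WEIGHTS = Object.freeze({
  season: 1.0, recent: 1.5, situational: 1.2, lineEdge: 0.8,
});
const REFRESH_MS = 60 * 60 * 1000; // refresh every hour

// loadLatestWeights: async () => weights object or null (hypothetical).
// now: injectable clock, defaults to Date.now.
function createWeightCache(loadLatestWeights, now = Date.now) {
  let cached = null;
  let loadedAt = 0;

  return async function getWeights() {
    if (cached && now() - loadedAt < REFRESH_MS) return cached;
    try {
      const latest = await loadLatestWeights();
      cached = latest ?? DEFAULT_WEIGHTS; // empty table: hardcoded defaults
    } catch (err) {
      // Assumption: on a DB error, keep the last known weights (or defaults).
      cached = cached ?? DEFAULT_WEIGHTS;
    }
    loadedAt = now();
    return cached;
  };
}
```

The grader would call `getWeights()` at the start of each grading run. Within the hour it returns the in-memory copy without touching the database.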

## Acceptance Criteria

1. Each recalculation refreshes the grade_accuracy, signal_accuracy, and kill_condition_accuracy tables from all settled picks
2. Grade accuracy tracks hit rate per grade per stat type
3. Signal accuracy tracks predictive score per signal per stat type
4. Kill condition effectiveness measures hit_rate_with vs hit_rate_without
5. Weight adjustment runs after every 10th settlement (global)
6. Weight changes are capped at 20% per cycle, bounded 0.3-3.0
7. Weight history is stored with reason and sample size
8. `GET /api/model/accuracy` returns current stats (Desk tier only)
9. `GET /api/model/insights` returns human-readable insights (Desk tier only)
10. signal_snapshot JSONB attached to every new pick
11. Grader loads weights from DB on startup, falls back to defaults
12. Minimum 50 picks per signal before weight adjustment triggers

## Test Plan

### Unit Tests (accuracyCalculator.js)
- Correctly computes hit rate from settled picks
- Groups by grade + stat_type
- Groups by signal + value + stat_type
- Kill condition effectiveness: difference between with/without
- Handles zero settled picks gracefully

### Unit Tests (weightAdjuster.js)
- Increases weight when predictive_score exceeds baseline
- Decreases weight when predictive_score below baseline
- Caps changes at 20% per cycle
- Enforces min/max bounds (0.3-3.0)
- Stores weight history with correct reason
- Does not adjust with < 50 picks per signal

### Integration Tests
- Full loop: create pick with signal_snapshot → settle → accuracy updated → weights adjusted
- GET /api/model/accuracy returns correct stats
- GET /api/model/insights generates relevant insights
- Desk tier only: free/analyst get 403
- Weight changes reflected in next grading call

## Open Questions
- **Global vs per-user model:** This spec uses a global model (all users' data combined). Per-user models would require significantly more data. Global is correct for MVP — the model learns from collective intelligence. Per-user customization can layer on top later.
- **Cold start:** Until a signal has 50+ settled picks, no adjustment fires for it. The hardcoded defaults carry the system until enough data accumulates. This is intentional: bad adjustments on small samples would be worse than no adjustments.