vyndr/specs/feature-1-2-nba-api.md
builtbykev 3da1b4242c feat: Feature 1.2 (NBA stats FastAPI service) + Feature 1.4 (database schema)
Feature 1.2: Python FastAPI microservice wrapping nba_api
- GET /stats/season-avg, /stats/last-n, /stats/splits, /players/search
- Redis caching (24hr/1hr/6hr/7day), 0.6s rate limiting, PRA derived stat
- 27 Python tests passing

Feature 1.4: Complete Supabase database schema
- 6 tables: users, picks, scan_sessions, bets, outcomes, performance
- RLS enabled on all tables with auth.uid() policies
- 3 triggers: auto-create user, updated_at, scan count reset
- 37 schema validation tests passing
- Migration SQL ready, pending manual apply (WSL2 DNS blocker)

Total: 92 tests (65 Node.js + 27 Python), all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:58:58 -04:00

Feature 1.2 — NBA_API Stats Wrapper (FastAPI Microservice)

Overview

A Python FastAPI microservice wrapping the nba_api library. Provides player stats (season averages, last N games, situational splits) via internal HTTP endpoints. Caches aggressively in Redis. Called by the Node.js backend — not exposed to the public internet.

Dependencies

  • None (can be built in parallel with Feature 1.1)
  • Downstream consumers: Feature 1.3 (Prop Analysis Engine) needs season averages + recent game data to grade props

Tech

  • Language: Python 3.11+
  • Framework: FastAPI + uvicorn
  • Data source: nba_api (free, no key required)
  • Cache: Redis (shared instance with Node backend)
  • Port: 8000 (internal only)

Directory Structure

nba-service/
├── app/
│   ├── main.py            # FastAPI app, routes
│   ├── services/
│   │   └── stats.py       # nba_api wrapper functions
│   ├── utils/
│   │   ├── cache.py       # Redis caching helpers
│   │   └── player_map.py  # Player name -> nba_api player_id resolution
│   └── config.py          # Settings (Redis URL, cache TTLs)
├── tests/
│   ├── test_stats.py      # Unit tests for stats service
│   └── test_routes.py     # Integration tests for endpoints
├── requirements.txt
└── README.md

Endpoints

GET /stats/season-avg

Returns a player's season averages. Defaults to the current season unless the season param is given.

Query params:

| Param | Type | Required | Description |
|---|---|---|---|
| player | string | yes | Player full name (e.g., "Nikola Jokic") |
| stat_type | string | no | Filter to specific stat (default: all) |
| season | string | no | NBA season (default: current, e.g., "2025-26") |

Response (200):

{
  "player": "Nikola Jokic",
  "player_id": 203999,
  "team": "DEN",
  "season": "2025-26",
  "source": "cache | live",
  "stats": {
    "points": 26.3,
    "rebounds": 12.4,
    "assists": 9.1,
    "threes": 1.1,
    "blocks": 0.7,
    "steals": 1.4,
    "pra": 47.8,
    "turnovers": 3.2,
    "games_played": 65,
    "minutes": 34.2
  }
}

GET /stats/last-n

Returns a player's averages over their last N games.

Query params:

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| player | string | yes | | Player full name |
| n | int | no | 10 | Number of recent games (max: 30) |
| stat_type | string | no | all | Filter to specific stat |

Response (200):

{
  "player": "Nikola Jokic",
  "player_id": 203999,
  "team": "DEN",
  "last_n": 10,
  "source": "cache | live",
  "stats": {
    "points": 28.1,
    "rebounds": 13.0,
    "assists": 10.2,
    "threes": 1.3,
    "blocks": 0.8,
    "steals": 1.5,
    "pra": 51.3,
    "turnovers": 2.9,
    "games_played": 10,
    "minutes": 35.1
  }
}
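
The averaging behind this endpoint can be sketched as follows. This is a minimal illustration, not the service's actual code: it assumes a hypothetical list of per-game dicts, newest first, that have already been translated to internal stat names (the real nba_api game-log columns are still to be confirmed, per Open Questions).

```python
from statistics import mean

MAX_N = 30  # upper bound on n from the query-param table


def last_n_averages(games: list[dict], n: int = 10) -> dict:
    """Average points/rebounds/assists over the n most recent games.

    `games` is a hypothetical list of per-game dicts, newest first,
    keyed by internal stat names.
    """
    if not 1 <= n <= MAX_N:
        raise ValueError(f"n must be between 1 and {MAX_N}")
    recent = games[:n]
    if not recent:
        # Player with no games: nothing to average.
        return {"games_played": 0}
    stats = {
        key: round(mean(game[key] for game in recent), 1)
        for key in ("points", "rebounds", "assists")
    }
    # Sum the rounded averages so the displayed numbers add up to pra.
    stats["pra"] = round(stats["points"] + stats["rebounds"] + stats["assists"], 1)
    stats["games_played"] = len(recent)
    return stats
```

Computing pra from the rounded averages keeps the response internally consistent (in the sample above, 28.1 + 13.0 + 10.2 = 51.3); averaging per-game PRA instead would be equally defensible but can differ by a tenth.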

GET /stats/splits

Returns situational splits for a player.

Query params:

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| player | string | yes | | Player full name |
| stat_type | string | yes | | Stat to split (points, rebounds, etc.) |
| split_type | string | yes | | One of: home_away, rest_days, vs_team |
| opponent | string | no | | Required when split_type=vs_team (e.g., "LAL") |

Response (200) — home_away split:

{
  "player": "Nikola Jokic",
  "player_id": 203999,
  "team": "DEN",
  "stat_type": "points",
  "split_type": "home_away",
  "source": "cache | live",
  "splits": {
    "home": { "avg": 27.8, "games": 33 },
    "away": { "avg": 24.9, "games": 32 }
  }
}
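
One way the home/away grouping might work is shown below. It is a sketch under an assumption that still needs checking against a real nba_api call: each game row carries a matchup string such as "DEN vs. LAL" for home games and "DEN @ LAL" for road games. The `matchup` key and the row shape are hypothetical.

```python
def home_away_split(games: list[dict], stat: str = "points") -> dict:
    """Group one stat by venue, assuming "@" in the matchup marks a road game."""
    buckets: dict[str, list[float]] = {"home": [], "away": []}
    for game in games:
        venue = "away" if "@" in game["matchup"] else "home"
        buckets[venue].append(game[stat])
    return {
        venue: {
            # avg is None when the player has no games in that bucket
            "avg": round(sum(values) / len(values), 1) if values else None,
            "games": len(values),
        }
        for venue, values in buckets.items()
    }
```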

Response (200) — rest_days split (back-to-back detection):

{
  "player": "Nikola Jokic",
  "stat_type": "points",
  "split_type": "rest_days",
  "source": "cache | live",
  "splits": {
    "b2b": { "avg": 23.1, "games": 8 },
    "1_day_rest": { "avg": 26.5, "games": 40 },
    "2_plus_days_rest": { "avg": 28.2, "games": 17 }
  }
}
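
Back-to-back detection reduces to comparing consecutive game dates. The sketch below assumes hypothetical game rows with a `date` field (`datetime.date`) and, as a further assumption not stated in the spec, counts the first game of the log as "2_plus_days_rest" so that the bucket sizes add up to the total games played.

```python
from datetime import date
from typing import Optional


def rest_bucket(previous: Optional[date], current: date) -> str:
    """Classify a game by days since the previous game."""
    if previous is None:
        return "2_plus_days_rest"  # assumption: the opener counts as rested
    gap = (current - previous).days
    if gap <= 1:
        return "b2b"  # played on consecutive calendar days
    if gap == 2:
        return "1_day_rest"
    return "2_plus_days_rest"


def rest_days_split(games: list[dict], stat: str = "points") -> dict:
    """Average one stat per rest bucket; input order does not matter."""
    buckets = {"b2b": [], "1_day_rest": [], "2_plus_days_rest": []}
    previous = None
    for game in sorted(games, key=lambda g: g["date"]):
        buckets[rest_bucket(previous, game["date"])].append(game[stat])
        previous = game["date"]
    return {
        name: {
            "avg": round(sum(values) / len(values), 1) if values else None,
            "games": len(values),
        }
        for name, values in buckets.items()
    }
```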

Response (200) — vs_team split:

{
  "player": "Nikola Jokic",
  "stat_type": "points",
  "split_type": "vs_team",
  "opponent": "LAL",
  "source": "cache | live",
  "splits": {
    "vs_opponent": { "avg": 30.5, "games": 3 },
    "vs_all_others": { "avg": 25.8, "games": 62 }
  }
}

GET /players/search

Resolves a player name to its nba_api player ID. Intended for internal use.

Query params:

| Param | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Player name (partial match OK) |

Response (200):

{
  "results": [
    { "player_id": 203999, "full_name": "Nikola Jokic", "team": "DEN", "is_active": true }
  ]
}

GET /health

Health check for the microservice.

Response (200):

{ "status": "ok", "cache": "connected" }

Error Responses

| Status | When | Body |
|---|---|---|
| 400 | Missing required param or invalid value | { "error": "player is required" } |
| 404 | Player not found in nba_api | { "error": "Player not found: Xyz" } |
| 503 | nba_api unreachable or rate limited | { "error": "NBA stats service unavailable" } |
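
FastAPI's built-in parameter validation responds with 422 and a `detail` body by default, so producing the 400 and `{ "error": ... }` shape above likely needs either explicit checks or a custom exception handler. The framework-free sketch below shows the explicit-check approach for /stats/splits; the function name and the exact error messages other than "player is required" are hypothetical.

```python
from typing import Optional

VALID_SPLIT_TYPES = {"home_away", "rest_days", "vs_team"}


def validate_splits_params(
    player: Optional[str],
    stat_type: Optional[str],
    split_type: Optional[str],
    opponent: Optional[str] = None,
) -> Optional[tuple[int, dict]]:
    """Return (status, body) for the first validation failure, else None."""
    required = (("player", player), ("stat_type", stat_type), ("split_type", split_type))
    for name, value in required:
        if not value:
            return 400, {"error": f"{name} is required"}
    if split_type not in VALID_SPLIT_TYPES:
        return 400, {"error": f"invalid split_type: {split_type}"}
    # opponent is optional in general but mandatory for vs_team
    if split_type == "vs_team" and not opponent:
        return 400, {"error": "opponent is required when split_type=vs_team"}
    return None
```

A route handler could call this first and, when it returns a tuple, send that status and body unchanged.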

Data Shape (Internal)

from dataclasses import dataclass

@dataclass
class PlayerStats:
    player: str
    player_id: int
    team: str                 # 3-letter abbreviation, e.g. "DEN"
    season: str               # e.g., "2025-26"
    stats: dict[str, float]   # { "points": 26.3, "rebounds": 12.4, ... }
    games_played: int
    minutes: float

Stat Mapping

Map nba_api column names to our internal stat names:

| nba_api Column | Internal stat_type |
|---|---|
| PTS | points |
| REB | rebounds |
| AST | assists |
| FG3M | threes |
| BLK | blocks |
| STL | steals |
| TOV | turnovers |
| PTS + REB + AST | pra (computed) |
| MIN | minutes |
| GP | games_played |
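
The mapping above translates directly into a lookup table plus one derived field. The sketch below assumes a raw row is a plain dict keyed by nba_api column name; the helper name `to_internal` is hypothetical.

```python
STAT_MAP = {
    "PTS": "points",
    "REB": "rebounds",
    "AST": "assists",
    "FG3M": "threes",
    "BLK": "blocks",
    "STL": "steals",
    "TOV": "turnovers",
    "MIN": "minutes",
    "GP": "games_played",
}


def to_internal(row: dict) -> dict:
    """Rename nba_api columns to internal stat names and derive pra."""
    # Columns absent from the row are skipped rather than filled with defaults.
    stats = {internal: row[column] for column, internal in STAT_MAP.items() if column in row}
    if all(key in stats for key in ("points", "rebounds", "assists")):
        stats["pra"] = round(stats["points"] + stats["rebounds"] + stats["assists"], 1)
    return stats
```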

Caching Strategy

  • Store: Redis (same instance as Node backend)
  • Key patterns:
    • Season avg: nba:season:{player_id}:{season} (TTL: 24 hours)
    • Last N: nba:last:{player_id}:{n} (TTL: 1 hour)
    • Splits: nba:splits:{player_id}:{stat_type}:{split_type}, with :{opponent} appended when split_type=vs_team so that different opponents do not share a key (TTL: 6 hours)
    • Player search: nba:player:{name_normalized} (TTL: 7 days)
  • On cache hit: Return with "source": "cache"
  • On cache miss: Fetch from nba_api, store, return with "source": "live"
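  • On nba_api failure: Return stale cache if available, 503 if not. Serving stale data requires entries to outlive their freshness TTL (for example, a longer hard expiry in Redis plus a fetched-at timestamp in the payload), since a key Redis has already expired cannot be returned.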
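
The hit, miss, and stale-fallback flow can be sketched without Redis. In the example below a plain dict stands in for the cache, and freshness is tracked with a `fetched_at` timestamp inside each entry rather than by key expiry; that storage layout, the `UpstreamError` class, and the `get_or_fetch` name are all assumptions made for illustration.

```python
import time
from typing import Callable

# Freshness windows per data type, in seconds, taken from the key patterns above.
TTL_SECONDS = {
    "season": 24 * 3600,
    "last": 3600,
    "splits": 6 * 3600,
    "player": 7 * 24 * 3600,
}


class UpstreamError(Exception):
    """Hypothetical error raised when nba_api cannot be reached."""


def get_or_fetch(
    store: dict,
    key: str,
    kind: str,
    fetch: Callable[[], dict],
    now: Callable[[], float] = time.time,
) -> tuple[dict, str]:
    """Return (data, source) where source is "cache" or "live"."""
    entry = store.get(key)
    if entry is not None and now() - entry["fetched_at"] < TTL_SECONDS[kind]:
        return entry["data"], "cache"  # fresh hit
    try:
        data = fetch()
    except UpstreamError:
        if entry is not None:
            return entry["data"], "cache"  # stale, but better than failing
        raise  # nothing cached: the route maps this to a 503
    store[key] = {"data": data, "fetched_at": now()}
    return data, "live"
```

Passing the clock in as `now` keeps the function easy to unit-test for the TTL cases listed in the test plan.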

Player Name Resolution

  • nba_api requires player_id for most endpoints
  • Use nba_api.stats.static.players.find_players_by_full_name() for lookup
  • Cache player_id mappings for 7 days (player IDs are stable; only the cached team field can go stale, e.g. after a trade)
  • Support partial matching: "Jokic" should resolve to "Nikola Jokic"
  • If multiple matches, return all and let caller disambiguate
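
The `{name_normalized}` cache key and the partial-match rule can be illustrated with the standard library alone. This sketch searches a hypothetical in-memory roster instead of calling nba_api, and the normalization rules (lowercase, diacritics stripped, whitespace collapsed, spaces replaced by underscores in the key) are assumptions, not something the spec fixes.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.lower().split())


def player_cache_key(name: str) -> str:
    """Build the player-search cache key for a raw query string."""
    return "nba:player:" + normalize_name(name).replace(" ", "_")


def search_players(query: str, roster: list[dict]) -> list[dict]:
    """Return every roster entry whose full name contains the query."""
    needle = normalize_name(query)
    if not needle:
        return []
    # All matches are returned so the caller can disambiguate.
    return [p for p in roster if needle in normalize_name(p["full_name"])]
```

With this normalization, "Jokic", "jokic", and "Jokić" all produce the same key and match the same player.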

nba_api Rate Limiting

  • nba_api calls NBA.com endpoints, which are rate-limited; the limits are not officially documented
  • Add a 0.6s delay between nba_api calls to avoid getting blocked
  • If a request fails with a connection error, retry once after 2s
  • If retry fails, serve from cache or return 503
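
The spacing and retry rules above could be combined roughly as follows. This is a single-threaded sketch with injectable clock and sleep functions for testing; the built-in `ConnectionError` is a stand-in, since the exception type actually raised by nba_api's HTTP layer would need to be confirmed, and the `Throttle` and `call_with_retry` names are hypothetical.

```python
import time
from typing import Callable, Optional

MIN_INTERVAL = 0.6  # seconds between nba_api calls
RETRY_DELAY = 2.0   # seconds to wait before the single retry


class Throttle:
    """Enforce a minimum gap between consecutive calls."""

    def __init__(self, min_interval: float = MIN_INTERVAL,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self._min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def call_with_retry(fn: Callable[[], object], throttle: Throttle,
                    sleep: Callable[[float], None] = time.sleep) -> object:
    """Call fn once; on a connection error, wait and retry exactly once."""
    throttle.wait()
    try:
        return fn()
    except ConnectionError:
        sleep(RETRY_DELAY)
        throttle.wait()
        return fn()  # a second failure propagates to the cache/503 fallback
```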

Acceptance Criteria

  1. GET /stats/season-avg?player=Nikola Jokic returns correct season averages
  2. GET /stats/last-n?player=Nikola Jokic&n=10 returns last 10 game averages
  3. GET /stats/splits?player=Nikola Jokic&stat_type=points&split_type=home_away returns home/away split
  4. GET /stats/splits?player=Nikola Jokic&stat_type=points&split_type=rest_days returns B2B/rest splits
  5. GET /stats/splits?player=Nikola Jokic&stat_type=points&split_type=vs_team&opponent=LAL returns vs-team split
  6. GET /players/search?name=Jokic resolves to correct player with ID and team
  7. Season averages are cached for 24 hours; subsequent calls return from cache
  8. Last N averages are cached for 1 hour
  9. Player search results are cached for 7 days
  10. If nba_api is unreachable, stale cache is returned; if no cache, 503
  11. All timestamps and dates use UTC
  12. GET /health returns status and cache connectivity

Test Plan

Unit Tests (stats.py)

  • Maps nba_api raw stats to internal stat names correctly
  • Computes PRA from PTS + REB + AST
  • Handles missing stats gracefully (player with 0 games returns empty)
  • Player search returns correct matches for full and partial names
  • Back-to-back detection: identifies games on consecutive days

Unit Tests (cache.py)

  • Cache hit returns stored data
  • Cache miss returns None
  • TTLs are set correctly per data type

Integration Tests (routes)

  • Full request cycle: GET /stats/season-avg with mocked nba_api -> verify response shape
  • GET /stats/last-n with n=5, n=10, n=30 -> correct game counts
  • GET /stats/splits for each split_type -> correct response shapes
  • Player search with partial name -> returns matches
  • Error: invalid player name -> 404
  • Error: missing required param -> 400
  • Error: nba_api down with warm cache -> returns stale data
  • Error: nba_api down with cold cache -> returns 503
  • Health check returns ok when Redis is connected

Open Questions

  • nba_api game log structure: Need to verify exact column names for game logs (used in last-n and splits). Will confirm during implementation with a test call.
  • Current season string: nba_api uses format "2025-26" — need to compute this dynamically based on current date (season starts in October).
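
For the season-string question, one possible computation is shown below. It assumes the rollover happens at the start of October, as noted above; the exact cutover date and the function name `season_for` are assumptions to revisit during implementation.

```python
from datetime import date


def season_for(day: date, start_month: int = 10) -> str:
    """Return the season string (e.g. "2025-26") that contains `day`."""
    # Dates before the start month belong to the season that began last year.
    start_year = day.year if day.month >= start_month else day.year - 1
    return f"{start_year}-{(start_year + 1) % 100:02d}"
```

For example, `season_for(date(2026, 3, 21))` gives "2025-26", and a date in September 2025 falls back to "2024-25".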