---
# nuzlocke-tracker-ya2a
title: Refactor seeding to use PokeAPI JSON data via git submodule
status: draft
type: task
priority: normal
created_at: 2026-02-05T18:01:09Z
updated_at: 2026-02-05T18:06:04Z
---
## Summary
Replace the current seeding approach (which uses the `pokebase` Python library to hit the PokeAPI REST API, then writes intermediate JSON files) with reading static JSON data from the [PokeAPI/api-data](https://github.com/PokeAPI/api-data) repository, pulled in as a git submodule.
The `api-data` repo contains a static copy of the full PokeAPI output as JSON files at `data/api/v2/{endpoint}/{id}/index.json`, mirroring the REST API structure exactly.
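The path layout above maps directly onto a small loader. The following is a minimal sketch, not the final implementation: `API_DATA_ROOT`, `resource_path`, and `load_resource` are hypothetical names, and the submodule location `data/pokeapi` is only one of the candidate paths this task mentions.

```python
import json
from pathlib import Path

# Assumed submodule location; the task leaves the final path open.
API_DATA_ROOT = Path("data/pokeapi")


def resource_path(endpoint: str, resource_id: int, root: Path = API_DATA_ROOT) -> Path:
    """Return the index.json path for one resource, e.g. pokemon/25."""
    return root / "data" / "api" / "v2" / endpoint / str(resource_id) / "index.json"


def load_resource(endpoint: str, resource_id: int, root: Path = API_DATA_ROOT) -> dict:
    """Read and parse one resource from the local api-data checkout."""
    with resource_path(endpoint, resource_id, root).open(encoding="utf-8") as fh:
        return json.load(fh)
```

With the submodule checked out, `load_resource("pokemon", 25)` would stand in for the equivalent `pokebase` lookup.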
## Motivation
- **Eliminates network dependency**: No more hitting the PokeAPI REST API (or running a local instance) during seed generation
- **Faster**: Reading local JSON files replaces hundreds of HTTP requests with local disk reads, which is quicker even than `pokebase` with caching
- **Minimal code change**: The JSON structure matches the API responses, so parsing logic stays similar to the current `fetch_pokeapi.py`
- **More data available**: The full dataset is available locally, not just what we query for
- **Version-pinnable**: The git submodule can be pinned to a specific commit for reproducible builds
- **Removes `pokebase` dependency**: One less runtime/dev dependency to maintain
## Current Approach
1. `fetch_pokeapi.py` uses the `pokebase` library to query the PokeAPI REST API
2. It processes responses and writes intermediate JSON files (`games.json`, `pokemon.json`, `firered.json`, etc.) to `seeds/data/`
3. `run.py` reads these JSON files and calls `loader.py` to upsert into the database
4. Evolution data is also fetched from the API with an override mechanism (`evolution_overrides.json`)
## Proposed Approach
1. **Add git submodule**: Add `https://github.com/PokeAPI/api-data` as a git submodule with `--depth 1` (e.g., at `data/pokeapi/` or `backend/pokeapi-data/`)
2. **Rewrite `fetch_pokeapi.py`**: Replace API calls with local JSON file reads from the submodule. The data lives at `data/api/v2/{endpoint}/{id}/index.json`. Key endpoints:
   - `pokemon/{id}/` and `pokemon-species/{id}/`: Pokemon data & names
   - `type/{id}/`: Type data
   - `region/{id}/`: Region data with location refs
   - `location/{id}/`: Locations with area refs
   - `location-area/{id}/`: Location areas with encounter data
   - `version/{id}/` and `version-group/{id}/`: Game/version data
   - `evolution-chain/{id}/`: Evolution chain data
3. **Keep the same output format**: The rewritten script should still produce the same intermediate JSON files (`games.json`, `pokemon.json`, `firered.json`, etc.) so `run.py` and `loader.py` remain unchanged.
4. **Keep the override mechanism**: `evolution_overrides.json` should still work for manual corrections.
5. **Remove `pokebase` dependency**: Remove from `pyproject.toml` / `requirements.txt`.
6. **Update documentation**: Update any setup/dev docs and the seed run command instructions.
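Step 2 needs two operations that the REST client used to hide: enumerating every resource of an endpoint, and following a reference (for example, region to location to location-area) to the next file. The sketch below shows one way to do both. `ref_to_path` and `iter_resources` are hypothetical helper names, and the code assumes each reference carries a `url` whose path has the form `/api/v2/{endpoint}/{id}/`, whether or not a host is prefixed.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def ref_to_path(url: str, root: Path) -> Path:
    """Map a resource URL such as /api/v2/location/1/ to its index.json."""
    # urlparse keeps only the path, so absolute and relative URLs both work.
    parts = [p for p in urlparse(url).path.split("/") if p]
    return root / "data" / Path(*parts) / "index.json"


def iter_resources(endpoint: str, root: Path):
    """Yield every parsed resource under one endpoint, ordered by numeric ID."""
    base = root / "data" / "api" / "v2" / endpoint
    # Only numeric directory names are treated as resource IDs (assumption).
    ids = sorted(int(d.name) for d in base.iterdir() if d.is_dir() and d.name.isdigit())
    for resource_id in ids:
        with (base / str(resource_id) / "index.json").open(encoding="utf-8") as fh:
            yield json.load(fh)
```

Sorting by numeric ID keeps the iteration order stable across platforms, which should help when comparing the regenerated seed files against the current ones.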
## Checklist
- [ ] Add `PokeAPI/api-data` repo as a git submodule (shallow clone)
- [ ] Rewrite `fetch_pokeapi.py` to read local JSON files from the submodule instead of calling the API
- [ ] Verify output JSON files match the current format (so `run.py`/`loader.py` stay unchanged)
- [ ] Preserve evolution override mechanism
- [ ] Remove `pokebase` dependency
- [ ] Test that seeding produces equivalent results
- [ ] Update dev setup docs / seed run instructions