diff --git a/.beans/nuzlocke-tracker-ya2a--refactor-seeding-to-use-pokeapi-csv-data-via-git-s.md b/.beans/nuzlocke-tracker-ya2a--refactor-seeding-to-use-pokeapi-csv-data-via-git-s.md
new file mode 100644
index 0000000..798845c
--- /dev/null
+++ b/.beans/nuzlocke-tracker-ya2a--refactor-seeding-to-use-pokeapi-csv-data-via-git-s.md
@@ -0,0 +1,56 @@
+---
+# nuzlocke-tracker-ya2a
+title: Refactor seeding to use PokeAPI CSV data via git submodule
+status: draft
+type: task
+created_at: 2026-02-05T18:01:09Z
+updated_at: 2026-02-05T18:01:09Z
+---
+
+## Summary
+
+Replace the current seeding approach (which uses the `pokebase` Python library to hit the PokeAPI REST API, then writes intermediate JSON files) with direct CSV parsing from the [PokeAPI/pokeapi](https://github.com/PokeAPI/pokeapi) repository's `data/v2/csv/` directory, pulled in as a git submodule.
+
+## Motivation
+
+- **Eliminates network dependency**: No more hitting the PokeAPI REST API (or running a local instance) during seed generation
+- **Faster**: Reading local CSVs is instant vs. hundreds of HTTP requests (even with pokebase caching)
+- **More data available**: The CSVs contain the complete dataset, not just what we query for
+- **Version-pinnable**: The git submodule can be pinned to a specific commit for reproducible builds
+- **Removes `pokebase` dependency**: One less runtime/dev dependency to maintain
+
+## Current Approach
+
+1. `fetch_pokeapi.py` uses the `pokebase` library to query the PokeAPI REST API
+2. It processes responses and writes intermediate JSON files (`games.json`, `pokemon.json`, `firered.json`, etc.) to `seeds/data/`
+3. `run.py` reads these JSON files and calls `loader.py` to upsert into the database
+4. Evolution data is also fetched from the API with an override mechanism (`evolution_overrides.json`)
+
+## Proposed Approach
+
+1. **Add git submodule**: Add `https://github.com/PokeAPI/pokeapi` as a git submodule (e.g., at `backend/pokeapi-data/` or a top-level `data/pokeapi/` directory)
+2. **Write CSV parser**: Create a new module that reads the relevant CSVs directly. Key CSV files include:
+   - `pokemon.csv`, `pokemon_species.csv`, `pokemon_types.csv` — Pokemon data
+   - `locations.csv`, `location_areas.csv`, `location_names.csv` — Location/route data
+   - `encounters.csv`, `encounter_slots.csv`, `encounter_methods.csv` — Encounter data
+   - `versions.csv`, `version_groups.csv`, `version_names.csv` — Game/version data
+   - `pokemon_evolution.csv`, `evolution_chains.csv`, `evolution_triggers.csv` — Evolution data
+   - `types.csv`, `type_names.csv` — Type data
+   - `regions.csv` — Region data
+3. **Replace `fetch_pokeapi.py`**: The new CSV parser replaces the API-fetching script. It should produce the same (or equivalent) output that `loader.py` expects, or `loader.py` should be updated to accept the new data format.
+4. **Keep the override mechanism**: `evolution_overrides.json` should still work for manual corrections.
+5. **Remove intermediate JSON files**: The generated JSON files in `seeds/data/` can be removed from version control since data now comes from the submodule.
+6. **Remove `pokebase` dependency**: Remove from `pyproject.toml` / `requirements.txt`.
+7. **Update documentation**: Update any setup/dev docs and the seed run command instructions.
+
+## Checklist
+
+- [ ] Add PokeAPI repo as a git submodule
+- [ ] Identify and document all needed CSV files from the PokeAPI data
+- [ ] Write CSV parsing module to replace `fetch_pokeapi.py`
+- [ ] Update `run.py` and/or `loader.py` to work with the new data source
+- [ ] Preserve evolution override mechanism
+- [ ] Remove intermediate JSON seed data files from version control
+- [ ] Remove `pokebase` dependency
+- [ ] Test that seeding produces equivalent results
+- [ ] Update dev setup docs / seed run instructions
\ No newline at end of file
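
The "Write CSV parser" step in the proposed approach could start from something like the sketch below. It is only an illustration, not the planned module: the function names `read_rows` and `load_pokemon` are hypothetical, the output shape is not necessarily what `loader.py` expects, and the column names (`id`, `identifier`, `pokemon_id`, `type_id`, `slot`) are assumptions about the PokeAPI CSV headers that should be checked against the pinned submodule commit.

```python
import csv
from pathlib import Path


def read_rows(csv_dir: Path, name: str) -> list[dict[str, str]]:
    """Read one CSV file into a list of dicts keyed by its header row."""
    with open(csv_dir / name, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_pokemon(csv_dir: Path) -> list[dict]:
    """Join pokemon.csv with pokemon_types.csv and types.csv (assumed columns)."""
    # Map type id -> type identifier, e.g. "12" -> "grass".
    type_names = {
        row["id"]: row["identifier"] for row in read_rows(csv_dir, "types.csv")
    }

    # Collect (slot, type name) pairs per Pokemon so slot order can be restored.
    types_by_pokemon: dict[str, list[tuple[int, str]]] = {}
    for row in read_rows(csv_dir, "pokemon_types.csv"):
        types_by_pokemon.setdefault(row["pokemon_id"], []).append(
            (int(row["slot"]), type_names[row["type_id"]])
        )

    result = []
    for row in read_rows(csv_dir, "pokemon.csv"):
        slots = sorted(types_by_pokemon.get(row["id"], []))
        result.append(
            {
                "id": int(row["id"]),
                "name": row["identifier"],
                "types": [type_name for _, type_name in slots],
            }
        )
    return result
```

With the submodule checked out at, say, the top-level `data/pokeapi/` location mentioned above, a call might look like `load_pokemon(Path("data/pokeapi/data/v2/csv"))`. Using `csv.DictReader` keeps the parser independent of column order, so a submodule bump that appends columns would not break it.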