| title | status | type | created_at | updated_at |
|---|---|---|---|---|
| Refactor seeding to use PokeAPI CSV data via git submodule | draft | task | 2026-02-05T18:01:09Z | 2026-02-05T18:01:09Z |
Summary
Replace the current seeding approach (which uses the pokebase Python library to hit the PokeAPI REST API, then writes intermediate JSON files) with direct CSV parsing from the PokeAPI/pokeapi repository's data/v2/csv/ directory, pulled in as a git submodule.
Motivation
- Eliminates network dependency: No more hitting the PokeAPI REST API (or running a local instance) during seed generation
- Faster: Reading local CSVs is near-instant compared with making hundreds of HTTP requests (even with pokebase caching)
- More data available: The CSVs contain the complete dataset, not just what we query for
- Version-pinnable: The git submodule can be pinned to a specific commit for reproducible builds
- Removes pokebase dependency: One less runtime/dev dependency to maintain
Current Approach
- fetch_pokeapi.py uses the pokebase library to query the PokeAPI REST API
- It processes responses and writes intermediate JSON files (games.json, pokemon.json, firered.json, etc.) to seeds/data/
- run.py reads these JSON files and calls loader.py to upsert into the database
- Evolution data is also fetched from the API with an override mechanism (evolution_overrides.json)
Proposed Approach
- Add git submodule: Add https://github.com/PokeAPI/pokeapi as a git submodule (e.g., at backend/pokeapi-data/ or a top-level data/pokeapi/ directory)
- Write CSV parser: Create a new module that reads the relevant CSVs directly. Key CSV files include:
  - pokemon.csv, pokemon_species.csv, pokemon_types.csv: Pokemon data
  - locations.csv, location_areas.csv, location_names.csv: Location/route data
  - encounters.csv, encounter_slots.csv, encounter_methods.csv: Encounter data
  - versions.csv, version_groups.csv, version_names.csv: Game/version data
  - pokemon_evolution.csv, evolution_chains.csv, evolution_triggers.csv: Evolution data
  - types.csv, type_names.csv: Type data
  - regions.csv: Region data
- Replace fetch_pokeapi.py: The new CSV parser replaces the API-fetching script. It should produce the same (or equivalent) output that loader.py expects, or loader.py should be updated to accept the new data format.
- Keep the override mechanism: evolution_overrides.json should still work for manual corrections.
- Remove intermediate JSON files: The generated JSON files in seeds/data/ can be removed from version control since data now comes from the submodule.
- Remove pokebase dependency: Remove from pyproject.toml / requirements.txt.
- Update documentation: Update any setup/dev docs and the seed run command instructions.
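As a rough illustration of the CSV parser step, the sketch below joins pokemon.csv, pokemon_types.csv and types.csv into one record per Pokemon using only the standard library. The function names (read_csv, load_pokemon) and the output shape are hypothetical, and the column names (id, identifier, pokemon_id, type_id, slot) are assumed from the PokeAPI CSV layout and should be checked against the pinned submodule commit. The real output would need to match whatever loader.py expects.

```python
import csv
from collections import defaultdict
from pathlib import Path


def read_csv(csv_dir: Path, name: str) -> list[dict[str, str]]:
    """Read one CSV file into a list of row dicts keyed by column header."""
    with open(csv_dir / name, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_pokemon(csv_dir: Path) -> list[dict]:
    """Join pokemon.csv, pokemon_types.csv and types.csv (hypothetical output shape)."""
    # Assumed columns: types.csv has "id" and "identifier".
    type_names = {row["id"]: row["identifier"] for row in read_csv(csv_dir, "types.csv")}

    # Collect (slot, type name) pairs per Pokemon so types can be ordered by slot.
    slots = defaultdict(list)
    for row in read_csv(csv_dir, "pokemon_types.csv"):
        slots[row["pokemon_id"]].append((int(row["slot"]), type_names[row["type_id"]]))

    return [
        {
            "id": int(row["id"]),
            "name": row["identifier"],
            "types": [name for _, name in sorted(slots[row["id"]])],
        }
        for row in read_csv(csv_dir, "pokemon.csv")
    ]
```

If the submodule were added at the top-level data/pokeapi/ location, the call would look something like load_pokemon(Path("data/pokeapi/data/v2/csv")). Values come back from csv.DictReader as strings, so numeric columns are converted explicitly.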
Checklist
- Add PokeAPI repo as a git submodule
- Identify and document all needed CSV files from the PokeAPI data
- Write CSV parsing module to replace fetch_pokeapi.py
- Update run.py and/or loader.py to work with the new data source
- Preserve evolution override mechanism
- Remove intermediate JSON seed data files from version control
- Remove pokebase dependency
- Test that seeding produces equivalent results
- Update dev setup docs / seed run instructions
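One possible way to approach the "equivalent results" check is to diff the records produced by the new CSV parser against a previously generated JSON seed file before those files are removed. The sketch below assumes each JSON file is a flat list of objects with an "id" field, which may not match the actual files in seeds/data/. The helper names diff_records and diff_against_json are hypothetical.

```python
import json
from pathlib import Path


def diff_records(old: list[dict], new: list[dict], key: str = "id") -> dict[str, list]:
    """Compare two record lists by key and report missing, extra and changed keys."""
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}
    return {
        # Present in the old output but absent from the new one.
        "missing": sorted(old_by_key.keys() - new_by_key.keys()),
        # Present only in the new output.
        "extra": sorted(new_by_key.keys() - old_by_key.keys()),
        # Present in both, but with differing field values.
        "changed": sorted(
            k for k in old_by_key.keys() & new_by_key.keys() if old_by_key[k] != new_by_key[k]
        ),
    }


def diff_against_json(json_path: Path, new: list[dict], key: str = "id") -> dict[str, list]:
    """Load an old JSON seed file (assumed to be a list of objects) and diff it."""
    old = json.loads(json_path.read_text(encoding="utf-8"))
    return diff_records(old, new, key)
```

Since the CSVs contain more data than the API queries fetched, entries under "extra" are not necessarily errors. The "missing" and "changed" lists are the ones to review first.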