nuzlocke-tracker/.beans/nuzlocke-tracker-ya2a--refactor-seeding-to-use-pokeapi-csv-data-via-git-s.md

title: Refactor seeding to use PokeAPI CSV data via git submodule
status: draft
type: task
created_at: 2026-02-05T18:01:09Z
updated_at: 2026-02-05T18:01:09Z

Summary

Replace the current seeding approach (which uses the pokebase Python library to hit the PokeAPI REST API, then writes intermediate JSON files) with direct CSV parsing from the PokeAPI/pokeapi repository's data/v2/csv/ directory, pulled in as a git submodule.
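
A minimal sketch of the core read path, in Python like the existing seed scripts. It assumes the submodule lives at backend/pokeapi-data/ (one of the two locations proposed below) and that every CSV starts with a header row. Empty cells, such as evolves_from_species_id for a base-form species, become None. The helper name read_pokeapi_csv is hypothetical.

```python
import csv
from pathlib import Path
from typing import Optional

# Assumed submodule location; data/pokeapi/ is the other option in this issue.
DEFAULT_SUBMODULE = Path("backend/pokeapi-data")


def read_pokeapi_csv(
    name: str, submodule_root: Path = DEFAULT_SUBMODULE
) -> list[dict[str, Optional[str]]]:
    """Read data/v2/csv/<name>.csv from the PokeAPI submodule into a list of dicts.

    Empty cells become None; all other values stay as strings so that each
    caller decides how to convert them.
    """
    path = submodule_root / "data" / "v2" / "csv" / f"{name}.csv"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; is the submodule initialised "
            "(git submodule update --init)?"
        )
    # The csv module expects files to be opened with newline="".
    with path.open(newline="", encoding="utf-8") as fh:
        return [
            {key: (value if value != "" else None) for key, value in row.items()}
            for row in csv.DictReader(fh)
        ]
```

For example, read_pokeapi_csv("pokemon_species") would return one dict per species row, keyed by the CSV's header names.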

Motivation

  • Eliminates network dependency: No more hitting the PokeAPI REST API (or running a local instance) during seed generation
  • Faster: Reading local CSV files takes far less time than making hundreds of HTTP requests (even with pokebase caching)
  • More data available: The CSVs contain the complete dataset, not just what we query for
  • Version-pinnable: The git submodule can be pinned to a specific commit for reproducible builds
  • Removes pokebase dependency: One fewer runtime/dev dependency to maintain

Current Approach

  1. fetch_pokeapi.py uses the pokebase library to query the PokeAPI REST API
  2. It processes responses and writes intermediate JSON files (games.json, pokemon.json, firered.json, etc.) to seeds/data/
  3. run.py reads these JSON files and calls loader.py to upsert into the database
  4. Evolution data is also fetched from the API, with manual corrections applied on top from an override file (evolution_overrides.json)

Proposed Approach

  1. Add git submodule: Add https://github.com/PokeAPI/pokeapi as a git submodule (e.g., at backend/pokeapi-data/ or a top-level data/pokeapi/ directory)
  2. Write CSV parser: Create a new module that reads the relevant CSVs directly. Key CSV files include:
    • pokemon.csv, pokemon_species.csv, pokemon_types.csv — Pokemon data
    • locations.csv, location_areas.csv, location_names.csv — Location/route data
    • encounters.csv, encounter_slots.csv, encounter_methods.csv — Encounter data
    • versions.csv, version_groups.csv, version_names.csv — Game/version data
    • pokemon_evolution.csv, evolution_chains.csv, evolution_triggers.csv — Evolution data
    • types.csv, type_names.csv — Type data
    • regions.csv — Region data
  3. Replace fetch_pokeapi.py: The new CSV parser replaces the API-fetching script. It should produce the same (or equivalent) output that loader.py expects, or loader.py should be updated to accept the new data format.
  4. Keep the override mechanism: evolution_overrides.json should still work for manual corrections.
  5. Remove intermediate JSON files: The generated JSON files in seeds/data/ can be removed from version control since data now comes from the submodule.
  6. Remove pokebase dependency: Remove from pyproject.toml / requirements.txt.
  7. Update documentation: Update any setup/dev docs and the seed run command instructions.
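
For step 2, the Pokemon part of the parser could join three CSVs. This sketch takes rows as produced by csv.DictReader (string values) and assumes these headers: pokemon.csv (id, identifier, species_id, is_default), pokemon_types.csv (pokemon_id, type_id, slot) and types.csv (id, identifier). The output record shape is hypothetical; the real target is whatever loader.py expects. Alternate forms (is_default other than 1) are skipped by default.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional

Row = Mapping[str, Optional[str]]


def build_pokemon_records(
    pokemon: Iterable[Row],
    pokemon_types: Iterable[Row],
    types: Iterable[Row],
    default_forms_only: bool = True,
) -> list[dict]:
    """Join pokemon.csv, pokemon_types.csv and types.csv into one record per Pokemon.

    Types are ordered by slot (1 = primary, 2 = secondary).
    """
    type_names = {row["id"]: row["identifier"] for row in types}

    # Collect (slot, type identifier) pairs per pokemon_id; sorted by slot later.
    slots: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for row in pokemon_types:
        slots[row["pokemon_id"]].append((int(row["slot"]), type_names[row["type_id"]]))

    records = []
    for row in pokemon:
        if default_forms_only and row["is_default"] != "1":
            continue  # skip alternate forms such as megas and regional variants
        records.append(
            {
                "pokemon_id": int(row["id"]),
                "species_id": int(row["species_id"]),
                "name": row["identifier"],
                "types": [name for _, name in sorted(slots[row["id"]])],
            }
        )
    return sorted(records, key=lambda r: r["pokemon_id"])
```

Keeping the join as a pure function over already-parsed rows means it can be tested without the submodule checked out.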
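
Per-game encounter files such as firered.json need a longer join: each encounters.csv row refers to a version, a location area and an encounter slot; the slot carries the encounter method, and the area belongs to a location. A sketch, assuming the column names used in the code and a hypothetical output grouped by location identifier. It merges repeated (location, Pokemon, method) rows into a single level range and ignores encounter conditions such as time of day.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional

Row = Mapping[str, Optional[str]]


def encounters_for_version(
    version_identifier: str,
    versions: Iterable[Row],
    encounters: Iterable[Row],
    encounter_slots: Iterable[Row],
    encounter_methods: Iterable[Row],
    location_areas: Iterable[Row],
    locations: Iterable[Row],
) -> dict[str, list[dict]]:
    """Group one version's wild encounters by location identifier.

    Rows for the same (location, pokemon, method) are merged into one entry
    spanning the lowest min_level to the highest max_level.
    """
    version_id = next(
        (r["id"] for r in versions if r["identifier"] == version_identifier), None
    )
    if version_id is None:
        raise ValueError(f"unknown version identifier: {version_identifier!r}")

    # Lookups: slot -> method id, method id -> name, area -> location, location -> name.
    slot_method = {r["id"]: r["encounter_method_id"] for r in encounter_slots}
    method_names = {r["id"]: r["identifier"] for r in encounter_methods}
    area_location = {r["id"]: r["location_id"] for r in location_areas}
    location_names = {r["id"]: r["identifier"] for r in locations}

    levels: dict[tuple[str, int, str], tuple[int, int]] = {}
    for r in encounters:
        if r["version_id"] != version_id:
            continue
        location = location_names[area_location[r["location_area_id"]]]
        method = method_names[slot_method[r["encounter_slot_id"]]]
        key = (location, int(r["pokemon_id"]), method)
        lo, hi = int(r["min_level"]), int(r["max_level"])
        if key in levels:
            old_lo, old_hi = levels[key]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        levels[key] = (lo, hi)

    by_location: dict[str, list[dict]] = defaultdict(list)
    for (location, pokemon_id, method), (lo, hi) in sorted(levels.items()):
        by_location[location].append(
            {"pokemon_id": pokemon_id, "method": method, "min_level": lo, "max_level": hi}
        )
    return dict(by_location)
```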
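
For step 4, one way to keep evolution_overrides.json working is to derive evolution edges from the CSVs first and apply overrides last, so manual corrections always win. This sketch assumes pokemon_species.csv supplies id and evolves_from_species_id, pokemon_evolution.csv supplies evolved_species_id, evolution_trigger_id and minimum_level, and evolution_triggers.csv supplies id and identifier. The real structure of the override file isn't described in this issue, so the shape used here (objects keyed by from_species_id/to_species_id, with an optional remove flag) is purely hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional

Row = Mapping[str, Optional[str]]


def _int_or_none(value: Optional[str]) -> Optional[int]:
    # Empty CSV cells arrive as "" (or None); treat both as missing.
    return int(value) if value not in (None, "") else None


def build_evolutions(
    species: Iterable[Row],
    pokemon_evolution: Iterable[Row],
    evolution_triggers: Iterable[Row],
    overrides: Iterable[dict] = (),
) -> list[dict]:
    """Derive species-to-species evolution edges, then apply manual overrides."""
    trigger_names = {r["id"]: r["identifier"] for r in evolution_triggers}

    # An evolved species can have several rows, one per evolution method.
    methods: dict[str, list[Row]] = defaultdict(list)
    for r in pokemon_evolution:
        methods[r["evolved_species_id"]].append(r)

    edges: dict[tuple[int, int], list[dict]] = {}
    for s in species:
        parent = _int_or_none(s["evolves_from_species_id"])
        if parent is None:
            continue  # base form: nothing evolves into it
        child = int(s["id"])
        # Fall back to one edge with unknown details if no method row exists.
        edges[(parent, child)] = [
            {
                "from_species_id": parent,
                "to_species_id": child,
                "trigger": trigger_names.get(m.get("evolution_trigger_id")),
                "min_level": _int_or_none(m.get("minimum_level")),
            }
            for m in (methods[s["id"]] or [{}])
        ]

    # Overrides replace (or remove) every edge for their (from, to) pair.
    for o in overrides:
        key = (o["from_species_id"], o["to_species_id"])
        if o.get("remove"):
            edges.pop(key, None)
        else:
            edges[key] = [{k: v for k, v in o.items() if k != "remove"}]

    return [edge for key in sorted(edges) for edge in edges[key]]
```

The overrides argument would come from json.load on evolution_overrides.json, converted to whatever shape the final implementation settles on.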

Checklist

  • Add PokeAPI repo as a git submodule
  • Identify and document all needed CSV files from the PokeAPI data
  • Write CSV parsing module to replace fetch_pokeapi.py
  • Update run.py and/or loader.py to work with the new data source
  • Preserve evolution override mechanism
  • Remove intermediate JSON seed data files from version control
  • Remove pokebase dependency
  • Test that seeding produces equivalent results
  • Update dev setup docs / seed run instructions
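
The "Test that seeding produces equivalent results" item could start with a structural diff between the committed JSON files (before they are deleted) and the CSV-derived output. A minimal sketch, assuming both sides can be reduced to lists of dicts sharing a stable key such as a species ID; diff_records and its parameters are hypothetical.

```python
from typing import Any, Iterable, Mapping


def diff_records(
    old: Iterable[Mapping[str, Any]],
    new: Iterable[Mapping[str, Any]],
    key: str,
) -> dict[str, list]:
    """Compare two record lists keyed by `key`.

    Reports keys only in the old data, keys only in the new data, and keys
    whose records differ together with the differing field names.
    """
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}

    changed = []
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        a, b = old_by_key[k], new_by_key[k]
        fields = sorted(f for f in a.keys() | b.keys() if a.get(f) != b.get(f))
        if fields:
            changed.append((k, fields))

    return {
        "missing": sorted(old_by_key.keys() - new_by_key.keys()),
        "added": sorted(new_by_key.keys() - old_by_key.keys()),
        "changed": changed,
    }
```

Fields that only the CSVs provide will show up under "changed", so they may need to be dropped from the new records before comparing.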