Add Python tool scaffold for PokeDB data import

Set up tools/import-pokedb/ with CLI, JSON loader, and output models.
Replaces the Go/PokeAPI approach with local PokeDB.org JSON processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Julian Tabel
2026-02-11 09:49:51 +01:00
parent 5151be785b
commit 1aa67665ff
11 changed files with 522 additions and 23 deletions


@@ -1,17 +1,22 @@
---
# nuzlocke-tracker-bs05
title: Build PokeDB.org data import tool
status: draft
type: task
status: in-progress
type: feature
priority: normal
created_at: 2026-02-10T14:04:11Z
updated_at: 2026-02-10T14:31:08Z
updated_at: 2026-02-11T08:44:03Z
parent: nuzlocke-tracker-rzu4
blocking:
- nuzlocke-tracker-spx3
---
Build a Go tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).
Build a standalone Python tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).
Python was chosen over Go because:
- The backend is already Python, so the team is familiar with it
- We're processing local JSON files — no need for Go's concurrency
- Remains a standalone tool in `tools/import-pokedb/`, not part of the backend
## Data source
@@ -64,26 +69,15 @@ Each encounter record has:
- `visible` — overworld vs hidden encounter
- Max Raid and Tera Raid fields for special encounters
## Implementation approach
## Subtasks
### Checklist
- [ ] Set up project structure in `tools/import-pokedb/`
- [ ] Download and cache PokeDB JSON export files
- [ ] Parse PokeDB encounters, locations, location_areas, versions, pokemon_forms
- [ ] Build lookup maps: pokemon_form_identifier → pokeapi_id (using existing `pokemon.json`)
- [ ] Build lookup maps: location_area_identifier → location name + region
- [ ] Filter encounters by target game version
- [ ] Map PokeDB encounter methods to our seed format methods (73 → simplified set)
- [ ] Parse level strings ("2 - 4" → min_level: 2, max_level: 4)
- [ ] Handle rate variants per game generation:
- For now, flatten time/weather/season rates into `encounter_rate` (use the max or average)
- Preserve raw variant data for future use (see nuzlocke-tracker-oqfo)
- [ ] Group encounters by location area → route output
- [ ] Apply route ordering (use existing route_order.json or generate from location data)
- [ ] Output in existing `{game}.json` seed format
- [ ] Generate seed data for ALL games, replacing PokeAPI as the single source of truth
- [ ] Compare output against existing PokeAPI-sourced data to validate accuracy
- [ ] Run for all games and verify output
Work is broken into child task beans:
- [ ] **Set up Python tool scaffold** — project structure, CLI entry point, PokeDB JSON file loading
- [ ] **Build reference data mappings** — pokemon_form → pokeapi_id, location_area → name/region, encounter method mapping
- [ ] **Core encounter processing** — filter by game version, parse levels, handle rate variants, group by location area
- [ ] **Output seed JSON** — produce per-game JSON in existing format, integrate route ordering + special encounters
- [ ] **Validation & full generation** — compare against existing data, run for all games, fix discrepancies
## Encounter method mapping (draft)


@@ -0,0 +1,30 @@
---
# nuzlocke-tracker-dqyb
title: Set up Python tool scaffold
status: in-progress
type: task
priority: normal
created_at: 2026-02-11T08:42:58Z
updated_at: 2026-02-11T08:44:03Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-zno2
---
Set up the standalone Python tool project in `tools/import-pokedb/`.
## Checklist
- [x] Create `tools/import-pokedb/` directory structure
- [x] Set up `pyproject.toml` with dependencies (the standard library should suffice for JSON processing; `click` is optional for the CLI)
- [x] Create CLI entry point (`__main__.py` or similar) that accepts:
- Path to directory containing PokeDB JSON export files
- Target output directory (default: `backend/src/app/seeds/data/`)
- Optional: specific game version to generate (default: all)
- [x] Load and parse all PokeDB JSON files: `encounters.json`, `locations.json`, `location_areas.json`, `encounter_methods.json`, `versions.json`, `pokemon_forms.json`
- [x] Basic validation that all expected files are present and parseable
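The checklist above could be sketched roughly as follows. This is a minimal illustration using only the standard library (`argparse` instead of `click`), not the scaffold that was actually committed. The names `build_parser`, `load_export`, and the `--output` / `--game` flags are assumptions made for the example; the file names and the default output path come from the checklist.

```python
import argparse
import json
from pathlib import Path

# File names taken from the checklist above.
EXPECTED_FILES = [
    "encounters.json",
    "locations.json",
    "location_areas.json",
    "encounter_methods.json",
    "versions.json",
    "pokemon_forms.json",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: export directory, output directory, optional game."""
    parser = argparse.ArgumentParser(prog="import-pokedb")
    parser.add_argument("data_dir", type=Path,
                        help="directory containing the PokeDB JSON export")
    parser.add_argument("--output", type=Path,
                        default=Path("backend/src/app/seeds/data/"),
                        help="target output directory")
    parser.add_argument("--game", default=None,
                        help="game version to generate (default: all)")
    return parser


def load_export(data_dir: Path) -> dict:
    """Load every expected file and report all problems in one error."""
    problems = []
    loaded = {}
    for name in EXPECTED_FILES:
        path = data_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        try:
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"unparseable: {name} ({exc})")
    if problems:
        raise ValueError("; ".join(problems))
    return loaded
```

Collecting all problems before raising means a single run lists every missing or broken file, which is convenient when the export is downloaded by hand.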
## Notes
- Keep it as a standalone tool, not part of the backend
- The PokeDB JSON files are downloaded manually from https://pokedb.org/data-export — no need to automate the download
- Model the CLI similarly to how `tools/fetch-pokeapi/` works (cd into dir, run the tool)


@@ -0,0 +1,31 @@
---
# nuzlocke-tracker-gkcy
title: Output seed JSON
status: todo
type: task
priority: normal
created_at: 2026-02-11T08:43:21Z
updated_at: 2026-02-11T08:43:33Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-vdks
---
Generate the final per-game JSON files in the existing seed format.
## Checklist
- [ ] **Apply route ordering**: Use the existing `backend/src/app/seeds/route_order.json` to assign `order` values to routes. Handle aliases (e.g. "red-blue" → "firered-leafgreen"). Log warnings for routes not in the order file.
- [ ] **Merge special encounters**: Integrate starters, gifts, fossils, and trades from `backend/src/app/seeds/special_encounters.json` into the appropriate routes.
- [ ] **Output per-game JSON**: Write `{game-slug}.json` files matching the existing format:
```json
[{"name": "Route 1", "order": 3, "encounters": [...], "children": []}]
```
- [ ] **Output games.json**: Generate the global games list from `version_groups.json` (this may already be handled by existing config; verify before implementing).
- [ ] **Output pokemon.json**: Generate the global pokemon list including all pokemon referenced in any encounter. Include pokeapi_id, national_dex, name, types, sprite_url.
- [ ] **Handle version exclusives**: Ensure encounters specific to one version in a version group only appear in that game's JSON file (e.g. FireRed exclusives vs LeafGreen exclusives).
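The route-ordering step could look something like the sketch below. The layout of `route_order.json` is not shown here, so the example assumes it maps an order key to a list of route names, and that aliases are a plain dict (e.g. "red-blue" to "firered-leafgreen"). The function name `apply_route_order` and the choice to append unknown routes at the end are assumptions for illustration.

```python
import logging

logger = logging.getLogger("import_pokedb")


def apply_route_order(routes, game, route_order, aliases):
    """Return copies of `routes` with 1-based `order` values, sorted by order.

    Routes missing from the order list get a warning and are placed after
    all known routes, in their original relative order.
    """
    order_key = aliases.get(game, game)
    ordered_names = route_order.get(order_key, [])
    position = {name: i + 1 for i, name in enumerate(ordered_names)}
    next_order = len(ordered_names) + 1

    result = []
    for route in routes:
        order = position.get(route["name"])
        if order is None:
            logger.warning("route %r not in order file for %s",
                           route["name"], order_key)
            order = next_order
            next_order += 1
        result.append({**route, "order": order})
    return sorted(result, key=lambda r: r["order"])
```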
## Notes
- The output must be a drop-in replacement for the existing files in `backend/src/app/seeds/data/`
- Boss data (`{game}-bosses.json`) is NOT generated by this tool — it's manually curated
- Evolutions data is also separate (currently from PokeAPI) — out of scope for this task


@@ -0,0 +1,34 @@
---
# nuzlocke-tracker-rfg0
title: Core encounter processing
status: todo
type: task
priority: normal
created_at: 2026-02-11T08:43:12Z
updated_at: 2026-02-11T08:43:33Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-gkcy
---
Implement the core logic that transforms raw PokeDB encounter records into our internal format.
## Checklist
- [ ] **Filter by game version**: Given a target game slug, select only encounters where `version_identifiers` includes that game
- [ ] **Parse level strings**: Convert "2 - 4" → min_level=2, max_level=4; "67" → min_level=67, max_level=67
- [ ] **Handle rate variants per generation**:
- Gen 1/3/6: use `rate_overall` directly as `encounter_rate`
- Gen 2/4: `rate_morning`, `rate_day`, `rate_night` — flatten to max or average for `encounter_rate`
- Gen 5: `rate_spring` through `rate_winter` — flatten similarly
- Gen 8 Sw/Sh: `weather_*_rate` fields — flatten to max
- Gen 8 Legends Arceus: `during_*` / `while_*` booleans — convert to a presence-based rate
- Gen 9 Sc/Vi: `probability_*` fields (spawn weights, not percentages) — normalize to percentages
- Preserve raw variant data in a way that nuzlocke-tracker-oqfo can use later
- [ ] **Aggregate encounters**: Group by (pokemon, method, location_area) and merge level ranges / rates where appropriate (same logic as the Go tool's aggregation)
- [ ] **Group by location area**: Collect all encounters for a location area into a route structure
- [ ] **Handle parent/child routes**: Multi-area locations (e.g. Safari Zone) should produce parent routes with children, matching the existing hierarchical format
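A minimal sketch of the first three checklist items, under stated assumptions: `version_identifiers` is taken to be a list of version slugs, rates may arrive as numbers or as strings such as "40%", and variants are flattened with `max` (the checklist leaves max versus average open). The helper names are hypothetical.

```python
from typing import Optional


def filter_by_version(encounters: list, game: str) -> list:
    """Keep encounters whose `version_identifiers` include the target game."""
    return [e for e in encounters if game in e.get("version_identifiers", [])]


def parse_level(text: str) -> tuple:
    """'2 - 4' -> (2, 4); '67' -> (67, 67)."""
    parts = [part.strip() for part in text.split("-")]
    return int(parts[0]), int(parts[-1])


def parse_rate(value) -> Optional[float]:
    """Accept bare numbers and percentage strings; None or '' means no rate."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip().rstrip("%").strip()
    return float(text) if text else None


def flatten_rates(record: dict, fields: list) -> Optional[float]:
    """Collapse several variant rate fields into one value using the maximum."""
    rates = [parse_rate(record.get(field)) for field in fields]
    present = [rate for rate in rates if rate is not None]
    return max(present) if present else None
```

For a Gen 2/4 record this would be called as `flatten_rates(record, ["rate_morning", "rate_day", "rate_night"])`; the raw fields stay on the record, so they remain available for nuzlocke-tracker-oqfo.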
## Notes
- Rate parsing needs to handle percentage strings like "40%" as well as bare numbers
- The Go tool aggregates encounters with the same pokemon+method at a location into a single entry with merged level ranges — replicate this
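The aggregation described in these notes might be sketched as below. The Go tool's exact merge rules are not reproduced here, so this is an assumption: level ranges are merged with min/max, and rates are summed and capped at 100. The record keys (`pokemon`, `method`, `location_area`, `min_level`, `max_level`, `encounter_rate`) are hypothetical names for the already-parsed internal format.

```python
def aggregate(encounters):
    """Merge encounters sharing (pokemon, method, location_area).

    Assumed merge rules (check against the Go tool before relying on them):
    widen the level range, and sum the rates up to a ceiling of 100.
    """
    merged = {}
    for enc in encounters:
        key = (enc["pokemon"], enc["method"], enc["location_area"])
        current = merged.get(key)
        if current is None:
            # Copy so the caller's records are not mutated by later merges.
            merged[key] = dict(enc)
            continue
        current["min_level"] = min(current["min_level"], enc["min_level"])
        current["max_level"] = max(current["max_level"], enc["max_level"])
        current["encounter_rate"] = min(
            100, current["encounter_rate"] + enc["encounter_rate"]
        )
    # Dicts preserve insertion order, so output follows first appearance.
    return list(merged.values())
```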


@@ -0,0 +1,29 @@
---
# nuzlocke-tracker-vdks
title: Validation and full generation
status: todo
type: task
created_at: 2026-02-11T08:43:29Z
updated_at: 2026-02-11T08:43:29Z
parent: nuzlocke-tracker-bs05
---
Validate the new tool's output against existing data and generate seed data for all games.
## Checklist
- [ ] **Diff against existing data**: For games we already have PokeAPI-sourced data for, compare the PokeDB output. Identify and investigate discrepancies:
- Missing routes or encounters
- Different encounter rates
- Different level ranges
- Missing or extra pokemon
- [ ] **Fix discrepancies**: Adjust mappings, parsing, or aggregation logic to resolve legitimate differences. Document cases where PokeDB provides better/different data than PokeAPI.
- [ ] **Generate for all games**: Run the tool for every game version in `version_groups.json`. Verify output is valid JSON and structurally correct.
- [ ] **New game coverage**: For games not previously supported (or with incomplete PokeAPI data), verify the output looks reasonable by spot-checking a few routes.
- [ ] **Update route_order.json**: Add route orderings for any new games that didn't have entries. This may require manual curation.
- [ ] **Update special_encounters.json**: Add special encounters for any new games. This may require manual curation.
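The diff step could start from something like the following sketch. It assumes each encounter in the seed format can be identified by `pokeapi_id` and `method` within a route, and that the compared fields are `min_level`, `max_level`, and `encounter_rate`; those key names and the function names are assumptions, not confirmed against the seed files.

```python
# Assumed names of the per-encounter fields worth comparing.
COMPARED_FIELDS = ("min_level", "max_level", "encounter_rate")


def index_encounters(routes):
    """Flatten routes, including children, into
    {(route name, pokeapi_id, method): encounter}."""
    index = {}
    for route in routes:
        for enc in route.get("encounters", []):
            index[(route["name"], enc["pokeapi_id"], enc["method"])] = enc
        index.update(index_encounters(route.get("children", [])))
    return index


def diff_seed(old_routes, new_routes):
    """Report encounters that are missing, extra, or changed in the new data."""
    old = index_encounters(old_routes)
    new = index_encounters(new_routes)
    changed = sorted(
        key for key in old.keys() & new.keys()
        if any(old[key].get(f) != new[key].get(f) for f in COMPARED_FIELDS)
    )
    return {
        "missing": sorted(old.keys() - new.keys()),
        "extra": sorted(new.keys() - old.keys()),
        "changed": changed,
    }
```

Missing routes show up indirectly here, as a block of missing encounters sharing one route name; a separate comparison of route names would be needed to catch routes that have no encounters at all.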
## Notes
- This is the final validation step before we can replace PokeAPI as the data source
- Some discrepancies are expected — PokeDB may have more complete data than PokeAPI
- Route ordering for new games will likely need manual work


@@ -0,0 +1,26 @@
---
# nuzlocke-tracker-zno2
title: Build reference data mappings
status: todo
type: task
priority: normal
created_at: 2026-02-11T08:43:02Z
updated_at: 2026-02-11T08:43:33Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-rfg0
---
Build the lookup maps needed to translate PokeDB identifiers into our seed format.
## Checklist
- [ ] **Pokemon form mapping**: Map `pokemon_form_identifier` (e.g. "pidgey-default", "mr-mime-default") to `pokeapi_id` using the existing `backend/src/app/seeds/data/pokemon.json` as reference. Handle naming convention differences between PokeDB and PokeAPI (may need fuzzy matching or a manual override table).
- [ ] **Location area mapping**: Map `location_area_identifier` to human-readable location names and regions using `locations.json` and `location_areas.json`. Produce names matching our existing format (e.g. "Route 1", "Viridian Forest").
- [ ] **Encounter method mapping**: Map PokeDB's 73 encounter methods to our simplified set. See the draft mapping in the parent bean. Implement as a dictionary/config that's easy to extend.
- [ ] **Version mapping**: Map PokeDB `version_identifiers` to our game slugs (should mostly be 1:1 but verify).
## Notes
- The pokemon form mapping is the trickiest part — PokeDB uses identifiers like "mr-mime-default" while our pokemon.json uses names like "Mr. Mime" and pokeapi IDs
- Log warnings for any unmapped identifiers so we can add overrides
- The `pokemon_forms.json` from PokeDB may help bridge the gap
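One possible shape for the pokemon form mapping, as a rough sketch: normalize both sides to lowercase alphanumerics, strip a trailing `-default`, and consult a manual override table first. It assumes `pokemon.json` entries carry `name` and `pokeapi_id`; the function names and the override table are hypothetical, and non-default forms would still need overrides or a smarter rule.

```python
import logging
import re

logger = logging.getLogger("import_pokedb")


def normalize(name: str) -> str:
    """'Mr. Mime' and 'mr-mime' both become 'mrmime'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_form_lookup(pokemon, overrides=None):
    """Return a function mapping a PokeDB form identifier to a pokeapi_id.

    `overrides` is a manual table of identifier -> pokeapi_id that wins over
    name matching. Unmapped identifiers are logged and yield None.
    """
    by_name = {normalize(p["name"]): p["pokeapi_id"] for p in pokemon}
    overrides = overrides or {}

    def lookup(form_identifier: str):
        if form_identifier in overrides:
            return overrides[form_identifier]
        base = form_identifier.removesuffix("-default")  # Python 3.9+
        pokeapi_id = by_name.get(normalize(base))
        if pokeapi_id is None:
            logger.warning("unmapped pokemon form: %s", form_identifier)
        return pokeapi_id

    return lookup
```

The same pattern (lookup table, override table, warning on a miss) would also fit the encounter method and version mappings.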