nuzlocke-tracker/.beans/nuzlocke-tracker-bs05--build-pokedborg-encounter-data-scraper.md
2026-02-10 15:31:36 +01:00

title: Build PokeDB.org data import tool
status: draft
type: task
priority: normal
created_at: 2026-02-10T14:04:11Z
updated_at: 2026-02-10T14:31:08Z
parent: nuzlocke-tracker-rzu4
blocking: nuzlocke-tracker-spx3

Build a Go tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).

Data source

PokeDB.org provides a full data export at https://pokedb.org/data-export with JSON downloads:

  • encounters.json (69MB, 37,724 records) — all encounter data across all games
  • locations.json — 839 locations
  • location_areas.json — 2,672 location areas
  • encounter_methods.json — 73 encounter methods
  • versions.json — 82 game versions
  • pokemon_forms.json — Pokemon forms with identifiers

No scraping required. Just download the JSON files and process them locally.

Terms of use: "Data is provided for educational, research, and non-commercial purposes." Attribution to PokeDB requested.

Encounter data coverage

Encounter counts by version:

  • Sword: 10,160 / Shield: 10,144
  • Scarlet: 4,135 / Violet: 4,101
  • SoulSilver: 2,492 / HeartGold: 2,475
  • Shining Pearl: 2,021 / Brilliant Diamond: 2,013
  • Legends Arceus: 1,756
  • Black 2: 1,418 / White 2: 1,418
  • Crystal: 1,375 / Alpha Sapphire: 1,338 / Platinum: 1,337
  • Diamond: 1,292 / Pearl: 1,289 / Silver: 1,284 / Gold: 1,282
  • LeafGreen: 987 / FireRed: 985 / White: 981 / Black: 947
  • Ultra Moon: 886 / Ultra Sun: 885 / X: 880 / Y: 879
  • Emerald: 763 / Let's Go Eevee: 710 / Sun: 709 / Moon: 707
  • Sapphire: 707 / Ruby: 707 / Let's Go Pikachu: 690
  • Blue: 528 / Red: 526 / Yellow: 496

Data format details

Each encounter record has:

  • pokemon_form_identifier — e.g. "pidgey-default", "mr-mime-default"
  • version_identifiers — array of game version IDs (e.g. ["sword", "shield"])
  • location_area_identifier — e.g. "route-01-kanto", "axews-eye"
  • encounter_method_identifier — e.g. "walking-tall-grass", "surfing", "npc-trade"
  • levels — string like "2 - 4" or "67"
  • Rate fields vary by game generation:
    • Gen 1/3/6: rate_overall (single percentage)
    • Gen 2/4: rate_morning, rate_day, rate_night (time-of-day percentages)
    • Gen 5: rate_spring, rate_summer, rate_autumn, rate_winter (seasonal)
    • Gen 8 Sw/Sh: weather_*_rate fields (per-weather percentages, e.g. "40%")
    • Gen 8 Legends Arceus: during_* and while_* booleans (time+weather conditions)
    • Gen 9 Sc/Vi: probability_* fields (overworld probability weights)
  • trade_for — Pokemon form identifier for NPC trades
  • alpha_levels — for Legends Arceus alpha encounters
  • visible — overworld vs hidden encounter
  • Max Raid and Tera Raid fields for special encounters

Implementation approach

Checklist

  • Set up project structure in tools/import-pokedb/
  • Download and cache PokeDB JSON export files
  • Parse PokeDB encounters, locations, location_areas, versions, pokemon_forms
  • Build lookup maps: pokemon_form_identifier → pokeapi_id (using existing pokemon.json)
  • Build lookup maps: location_area_identifier → location name + region
  • Filter encounters by target game version
  • Map PokeDB encounter methods to our seed format methods (73 → simplified set)
  • Parse level strings ("2 - 4" → min_level: 2, max_level: 4)
  • Handle rate variants per game generation:
    • For now, flatten time/weather/season rates into encounter_rate (use the max or average)
    • Preserve raw variant data for future use (see nuzlocke-tracker-oqfo)
  • Group encounters by location area → route output
  • Apply route ordering (use existing route_order.json or generate from location data)
  • Output in existing {game}.json seed format
  • Generate seed data for ALL games, replacing PokeAPI as the single source of truth
  • Compare output against existing PokeAPI-sourced data to validate accuracy
  • Run for all games and verify output
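Two of the checklist steps, level parsing and rate flattening, can be sketched as follows. This assumes levels always look like "2 - 4" or "67" and that variant rates arrive as percentage strings such as "40%". The function names are hypothetical, and the sketch picks the "max" option from the checklist; averaging would be a small change.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseLevels turns "2 - 4" into (2, 4) and a single level "67" into (67, 67).
func ParseLevels(s string) (lo, hi int, err error) {
	parts := strings.SplitN(s, "-", 2)
	lo, err = strconv.Atoi(strings.TrimSpace(parts[0]))
	if err != nil {
		return 0, 0, fmt.Errorf("levels %q: %w", s, err)
	}
	if len(parts) == 1 {
		return lo, lo, nil
	}
	hi, err = strconv.Atoi(strings.TrimSpace(parts[1]))
	if err != nil {
		return 0, 0, fmt.Errorf("levels %q: %w", s, err)
	}
	return lo, hi, nil
}

// FlattenRates collapses time/season/weather variants into one encounter_rate
// by taking the largest percentage. Empty values are skipped.
func FlattenRates(variants map[string]string) (float64, error) {
	best := 0.0
	for name, raw := range variants {
		trimmed := strings.TrimSuffix(strings.TrimSpace(raw), "%")
		if trimmed == "" {
			continue
		}
		v, err := strconv.ParseFloat(trimmed, 64)
		if err != nil {
			return 0, fmt.Errorf("rate %s=%q: %w", name, raw, err)
		}
		if v > best {
			best = v
		}
	}
	return best, nil
}

func main() {
	lo, hi, err := ParseLevels("2 - 4")
	if err != nil {
		panic(err)
	}
	fmt.Println(lo, hi) // prints "2 4"

	// Illustrative values only, not taken from the export.
	rate, err := FlattenRates(map[string]string{
		"rate_morning": "10%",
		"rate_day":     "10%",
		"rate_night":   "30%",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(rate) // prints "30"
}
```

Taking the max means an encounter that only appears at night still shows a non-zero rate in the seed data. The raw variants should still be kept alongside for nuzlocke-tracker-oqfo.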

Encounter method mapping (draft)

PokeDB method → Our seed method:

  • walking-tall-grass, walking-* → "walk"
  • surfing, surfing-* → "surf"
  • fishing-old-rod → "old-rod"
  • fishing-good-rod → "good-rod"
  • fishing-super-rod → "super-rod"
  • fishing → "fishing"
  • rock-smash → "rock-smash"
  • headbutt-* → "headbutt"
  • npc-gift, egg, revive → "gift"
  • npc-trade → "trade"
  • symbol-encounter → "walk" (overworld, Gen 8+)
  • wanderer → "walk" (overworld visible)
  • fixed-encounter, static-encounter → "static"
  • swarm → "swarm"
  • poke-radar → "pokeradar"
  • dual-slot-mode → "dual-slot"
  • Others: TBD based on relevance
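The draft mapping above could be expressed as an exact-match table plus a short prefix list for the wildcard rows. This is only a sketch: the names exactMethods, prefixMethods and MapMethod are hypothetical, and any method not in the draft returns false so it can be reviewed instead of silently guessed.

```go
package main

import (
	"fmt"
	"strings"
)

// exactMethods covers the rows of the draft mapping without wildcards.
var exactMethods = map[string]string{
	"surfing":           "surf",
	"fishing-old-rod":   "old-rod",
	"fishing-good-rod":  "good-rod",
	"fishing-super-rod": "super-rod",
	"fishing":           "fishing",
	"rock-smash":        "rock-smash",
	"npc-gift":          "gift",
	"egg":               "gift",
	"revive":            "gift",
	"npc-trade":         "trade",
	"symbol-encounter":  "walk",
	"wanderer":          "walk",
	"fixed-encounter":   "static",
	"static-encounter":  "static",
	"swarm":             "swarm",
	"poke-radar":        "pokeradar",
	"dual-slot-mode":    "dual-slot",
}

// prefixMethods covers the wildcard rows (walking-*, surfing-*, headbutt-*).
var prefixMethods = []struct{ prefix, seed string }{
	{"walking-", "walk"},
	{"surfing-", "surf"},
	{"headbutt-", "headbutt"},
}

// MapMethod returns the seed-format method for a PokeDB method identifier.
// The second result is false for methods the draft mapping leaves as TBD.
func MapMethod(pokedb string) (string, bool) {
	if seed, ok := exactMethods[pokedb]; ok {
		return seed, true
	}
	for _, p := range prefixMethods {
		if strings.HasPrefix(pokedb, p.prefix) {
			return p.seed, true
		}
	}
	return "", false
}

func main() {
	for _, m := range []string{"walking-tall-grass", "fishing-good-rod", "some-unmapped-method"} {
		seed, ok := MapMethod(m)
		fmt.Printf("%s -> %q (mapped: %v)\n", m, seed, ok)
	}
}
```

Checking exact matches first keeps specific entries such as fishing-old-rod from being swallowed by a broader prefix if one is added later.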

Notes

  • This tool replaces tools/fetch-pokeapi/, making PokeDB the primary data source for all games
  • Pokemon form identifiers need mapping to pokeapi IDs — may need a fuzzy match since naming conventions differ
  • The existing pokemon.json has names and pokeapi IDs we can use as a lookup
  • Sc/Vi probability_* values are relative spawn weights, not percentages
  • Legends Arceus uses boolean conditions (during_night + while_clear) rather than rates
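As a starting point for the form identifier mapping, a normalization lookup may get most default forms matched before any real fuzzy matching is needed. The sketch below assumes pokemon.json entries expose a name and a pokeapi_id field, and that names slugify by lowercasing, dropping periods and apostrophes, and replacing spaces with hyphens. Those are assumptions about our own file and about naming, not confirmed structure, and all names in the code are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Pokemon mirrors the assumed shape of an entry in pokemon.json.
type Pokemon struct {
	Name      string `json:"name"`
	PokeAPIID int    `json:"pokeapi_id"`
}

// slugReplacer drops periods and apostrophes and turns spaces into hyphens.
var slugReplacer = strings.NewReplacer(".", "", "'", "", " ", "-")

// Slug normalizes a display name, e.g. "Mr. Mime" becomes "mr-mime".
func Slug(name string) string {
	return slugReplacer.Replace(strings.ToLower(name))
}

// BuildIndex maps each slugified name to its PokeAPI ID.
func BuildIndex(list []Pokemon) map[string]int {
	index := make(map[string]int, len(list))
	for _, p := range list {
		index[Slug(p.Name)] = p.PokeAPIID
	}
	return index
}

// LookupForm tries the identifier as-is, then with a trailing "-default" removed.
// Anything still unmatched (regional forms and similar) is reported as not found.
func LookupForm(index map[string]int, formID string) (int, bool) {
	if id, ok := index[formID]; ok {
		return id, true
	}
	id, ok := index[strings.TrimSuffix(formID, "-default")]
	return id, ok
}

func main() {
	index := BuildIndex([]Pokemon{
		{Name: "Pidgey", PokeAPIID: 16},
		{Name: "Mr. Mime", PokeAPIID: 122},
	})
	for _, form := range []string{"pidgey-default", "mr-mime-default", "unknown-form"} {
		id, ok := LookupForm(index, form)
		fmt.Println(form, id, ok)
	}
}
```

Collecting the identifiers that return false gives a concrete list of non-default forms to handle by hand or with a fuzzier match.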