# CHANGES.md

This file documents every editorial decision applied to Michael Paine's
original GEMFIND 1997 data when producing the JSON dataset in `data/`.
The original spreadsheets are preserved unmodified under `1_source/`.

The aggregation policy follows the project plan §6.4: corrections are
grouped by *category* and *action* rather than logged per-row, with a few
representative examples and a row count for each.
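
A minimal sketch of that grouping, assuming a hypothetical per-row log of `(category, action, example)` tuples. The log format, the `aggregate` helper, and the cap of five examples are illustrative assumptions, not the project's actual tooling.

```python
from collections import defaultdict

# Hypothetical per-row correction log: (category, action, example).
corrections = [
    ("typo", "repair town-name typos", "COORAM → COORAN"),
    ("typo", "repair town-name typos", "MUNDIMINDI → MUNDIWINDI"),
    ("dedupe", "remove duplicate findings", "GARNET × BARRABA appeared twice"),
]

def aggregate(rows, max_examples=5):
    """Group corrections by (category, action); keep a row count and a few examples."""
    groups = defaultdict(lambda: {"count": 0, "examples": []})
    for category, action, example in rows:
        entry = groups[(category, action)]
        entry["count"] += 1
        if len(entry["examples"]) < max_examples:  # assumed cap on examples shown
            entry["examples"].append(example)
    return dict(groups)

summary = aggregate(corrections)
```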

## dedupe
- **remove duplicate (gemstone, town) findings** — 1 row(s)
  - examples: `GARNET × BARRABA appeared twice`
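
One way to express this dedupe, as a sketch only: keep the first occurrence of each (gemstone, town) pair. The lowercase field names and the `dedupe_findings` helper are assumptions, and the TOPAZ row is illustrative rather than taken from the source data.

```python
def dedupe_findings(findings):
    """Keep the first occurrence of each (gemstone, town) pair, preserving order."""
    seen = set()
    unique = []
    for row in findings:
        key = (row["gemstone"], row["town"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

findings = [
    {"gemstone": "GARNET", "town": "BARRABA"},
    {"gemstone": "GARNET", "town": "BARRABA"},  # the duplicate noted above
    {"gemstone": "TOPAZ", "town": "BARRABA"},   # illustrative row, not from the source
]
deduped = dedupe_findings(findings)
removed = len(findings) - len(deduped)  # number of rows dropped
```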

## disambiguation
- **move trailing state suffix in town display name into a bracketed qualifier** — 4 row(s)
  - examples: `BEACONSFIELD TAS → BEACONSFIELD [TAS]`, `DUNDAS TAS → DUNDAS [TAS]`, `GLADSTONE TAS → GLADSTONE [TAS]`, `ROCKY CK QLD → ROCKY CK [QLD]`
  - note: The internal join key (`town_key`) keeps the original full-uppercase form, so links from findings.json are unchanged.
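
The relationship between the join key and the display name could look like the sketch below. The `display_name` helper and the explicit `AMBIGUOUS` set are hypothetical; only the four keys and their bracketed forms come from the examples above.

```python
# The four town keys listed in the examples above.
AMBIGUOUS = {"BEACONSFIELD TAS", "DUNDAS TAS", "GLADSTONE TAS", "ROCKY CK QLD"}

def display_name(town_key):
    """Return the display form; town_key itself is never modified."""
    if town_key not in AMBIGUOUS:
        return town_key
    name, state = town_key.rsplit(" ", 1)  # split off the trailing state code
    return f"{name} [{state}]"

town_key = "ROCKY CK QLD"
shown = display_name(town_key)  # "ROCKY CK [QLD]"; town_key still joins as before
```

Restricting the change to an explicit set avoids touching other towns whose names happen to end in a state-like token.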

## geocode
- **match towns against GeoNames AU dump (CC-BY)** — 316 row(s)
  - examples: `auto match rate: 287/316`
  - note: auto=287 (91%), needs_review=18, miss=11 (flagged 'location approximate' in v1).
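
The tallies can be cross-checked with a few lines. This is only an arithmetic sketch over the counts reported above; the dictionary keys mirror the labels in the note.

```python
# Counts as reported in the note above.
counts = {"auto": 287, "needs_review": 18, "miss": 11}

total = sum(counts.values())        # should equal the 316 towns processed
auto_rate = counts["auto"] / total  # fraction matched automatically (about 0.908)
```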

## partition
- **split GEMSTONE='+' rows from locations.xls into separate anchors table** — 10 row(s)
  - examples: `Darwin`, `Brisbane`, `Adelaide`, `Hobart`, `Melbourne`
  - note: These are reference cities (Darwin, Brisbane, …) used for map orientation in the original program, not fossicking sites.
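
A sketch of the split, assuming each row of locations.xls has been read into a dict with `GEMSTONE` and `TOWN` keys. The `TOWN` column name and the `partition_anchors` helper are assumptions, and the GARNET row is illustrative.

```python
def partition_anchors(rows):
    """Separate '+' reference rows (anchors) from real fossicking sites."""
    anchors = [r for r in rows if r["GEMSTONE"] == "+"]
    sites = [r for r in rows if r["GEMSTONE"] != "+"]
    return sites, anchors

rows = [
    {"GEMSTONE": "+", "TOWN": "Darwin"},
    {"GEMSTONE": "GARNET", "TOWN": "BARRABA"},  # illustrative site row
    {"GEMSTONE": "+", "TOWN": "Hobart"},
]
sites, anchors = partition_anchors(rows)
```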

## row-drop
- **drop trailing blank rows from descriptions.xls** — 2252 row(s)
- **drop '+' sentinel row from descriptions.xls (reference-only marker)** — 1 row(s)
  - examples: `GEMSTONE='+', DESCRIP='Reference town only'`
- **drop trailing blank rows from more_locations.xls** — 1976 row(s)
- **drop footer junk from more_locations.xls** — 3 row(s)
  - examples: `Totals:`, `X_GRID      83388`, `Y_GRID      221439`
- **drop trailing blank rows from locations.xls** — 1647 row(s)
- **drop footer junk from locations.xls** — 2 row(s)
  - examples: `X_LOC       275615`, `Y_LOC       441909`
- **drop findings with no TOWN** — 1 row(s)
  - examples: `Totals:`
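
Dropping trailing blank rows can be sketched as below, assuming each sheet has been read into a list of rows (lists of cells). The helper names are hypothetical; blank rows in the middle of a sheet are deliberately left alone.

```python
def is_blank(row):
    """A row is blank when every cell is None or whitespace-only."""
    return all(cell is None or str(cell).strip() == "" for cell in row)

def drop_trailing_blank_rows(rows):
    """Return (kept_rows, dropped_count), trimming blank rows from the end only."""
    end = len(rows)
    while end > 0 and is_blank(rows[end - 1]):
        end -= 1
    return rows[:end], len(rows) - end

sheet = [["GARNET", "BARRABA"], ["", None], ["  ", ""]]  # illustrative cells
kept, dropped = drop_trailing_blank_rows(sheet)
```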

## spelling
- **correct misspellings 'CORRUNDUM' → 'corundum' and 'CHALCENDONY' → 'chalcedony' inside text columns** — 41 row(s)
  - examples: `CORRUNDUM→corundum (FAMILY)`, `CHALCENDONY→chalcedony (DESCRIP, RELATED, gemstone name)`
- **rename gemstone in GEMSTONE column** — 1 row(s)
  - examples: `CHALCENDONY → CHALCEDONY`
- **tidy gemstone display names** — 1 row(s)
  - examples: `Opal - Common → Common Opal`
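
A sketch of the two misspelling fixes using a single regular expression. Preserving the case of the matched word is an assumption made here for illustration, and the `fix_spelling` helper is hypothetical.

```python
import re

FIXES = {"corrundum": "corundum", "chalcendony": "chalcedony"}
PATTERN = re.compile("|".join(FIXES), re.IGNORECASE)

def fix_spelling(text):
    """Replace the known misspellings, keeping the case style of the match (assumed)."""
    def repl(match):
        word = match.group(0)
        fixed = FIXES[word.lower()]
        if word.isupper():
            return fixed.upper()
        if word[0].isupper():
            return fixed.capitalize()
        return fixed
    return PATTERN.sub(repl, text)

renamed = fix_spelling("CHALCENDONY")  # the GEMSTONE-column rename above
```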

## spillover
- **stitch DESCRIP text that overflowed into the 'Unnamed: 2' column back into DESCRIP** — 7 row(s)
  - examples: `CORUNDUM`, `DIAMOND`, `GARNET`, `GEODES`, `TOPAZ`
  - note: affected gemstones: CORUNDUM, DIAMOND, GARNET, GEODES, TOPAZ, TOURMALINE, ZIRCON
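
The stitching step might look like the following sketch, which appends the overflow cell to `DESCRIP` and drops the extra column. Joining with a single space is an assumption, the `stitch_overflow` helper is hypothetical, and the sample description text is invented for illustration.

```python
def stitch_overflow(row, main="DESCRIP", overflow="Unnamed: 2"):
    """Merge text from the overflow column into the main column; drop the overflow key."""
    merged = dict(row)
    extra = merged.pop(overflow, None)
    if extra is not None and str(extra).strip():
        base = str(row.get(main) or "").rstrip()
        merged[main] = f"{base} {str(extra).strip()}".strip()  # assumed single-space join
    return merged

row = {"GEMSTONE": "TOPAZ", "DESCRIP": "Hard silicate", "Unnamed: 2": "mineral."}
stitched = stitch_overflow(row)
```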

## typo
- **repair town-name typos in locations.xls so they join to towns table** — 2 row(s)
  - examples: `COORAM → COORAN`, `MUNDIMINDI → MUNDIWINDI`
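
A sketch of why the repair matters for the join: before the fix, the misspelled names find no match in the towns table. The `TOWN_FIXES` mapping holds the two corrections above; the `repair_town` helper and the in-memory `towns` set are illustrative assumptions.

```python
# The two corrections listed above.
TOWN_FIXES = {"COORAM": "COORAN", "MUNDIMINDI": "MUNDIWINDI"}

def repair_town(name):
    """Return the corrected town name, or the name unchanged if no fix applies."""
    return TOWN_FIXES.get(name, name)

towns = {"COORAN", "MUNDIWINDI"}      # stand-in for the towns table
locations = ["COORAM", "MUNDIMINDI"]  # names as they appear in locations.xls

unmatched_before = [t for t in locations if t not in towns]
unmatched_after = [t for t in map(repair_town, locations) if t not in towns]
```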
