6. Configuration via a single Config dataclass (env + defaults)

Status

Accepted

Context

Slice 1 (ingestion) is the first code that needs runtime configuration — the site code, where raw volumes land, and where the SQLite index lives. CLAUDE.md requires that these come from config and are never hardcoded, and points at "one config file / env." Later slices (notably the collect service) will need the same values plus more (poll interval, failover list), so the mechanism has to scale without a rewrite of every call site.

Decision

A single frozen Config dataclass (src/backscatter/config.py) is the one source of truth. Every module takes a Config; no module reads the environment on its own.

load_config() resolves each field with precedence CLI argument > environment variable > built-in default:

  • site: CLI positional (e.g. backscatter pull KFTG) > BACKSCATTER_SITE > KFTG.
  • data_dir: BACKSCATTER_DATA_DIR > ./data.
  • db_path: BACKSCATTER_DB_PATH > <data_dir>/backscatter.db.

No config-file parsing yet. The loader is the only place that knows where values come from, so a TOML (or similar) file loader slots in there later as one more precedence layer (file sitting between env and defaults) without touching callers.
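The precedence rules above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of src/backscatter/config.py: the `env` parameter (an injectable mapping that stands in for `os.environ`) and the exact signature of `load_config()` are assumptions made so the example is testable.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    site: str
    data_dir: Path
    db_path: Path


def load_config(
    site: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Config:
    """Resolve each field as CLI argument > environment variable > default."""
    # Assumption: env is injectable so callers and tests can bypass os.environ.
    env = os.environ if env is None else env

    resolved_site = site or env.get("BACKSCATTER_SITE") or "KFTG"
    data_dir = Path(env.get("BACKSCATTER_DATA_DIR") or "./data")
    # db_path defaults to a file inside whichever data_dir was resolved.
    db_path = Path(env.get("BACKSCATTER_DB_PATH") or data_dir / "backscatter.db")
    return Config(site=resolved_site, data_dir=data_dir, db_path=db_path)
```

A future file layer would add one more `or` term between the environment lookup and the literal default, which is why callers would not need to change.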

Consequences

  • Tests and ad-hoc runs configure everything via env vars or by constructing a Config directly — no global state, easy to isolate in a tmp_path.
  • Adding a setting = one dataclass field + one line in load_config().
  • The defaults make backscatter pull work out of the box for the operator's KFTG default while staying fully overridable for any CONUS site.
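To illustrate the first consequence, here is a small sketch of building a Config directly against a throwaway directory, with no environment involved. The `Config` below is a stand-in with only the three Slice 1 fields, and `make_test_config` is a hypothetical helper, not something the project is known to provide; in a pytest suite the `tmp_path` fixture would take the place of `tempfile`.

```python
import tempfile
from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:  # stand-in with the three Slice 1 fields
    site: str
    data_dir: Path
    db_path: Path


def make_test_config(root: Path, site: str = "KFTG") -> Config:
    """Hypothetical helper: a Config whose paths all live under root."""
    return Config(site=site, data_dir=root, db_path=root / "backscatter.db")


with tempfile.TemporaryDirectory() as tmp:
    cfg = make_test_config(Path(tmp))
    inside_tmp = cfg.db_path.parent == Path(tmp)
    try:
        cfg.site = "KTLX"  # frozen dataclass: assignment is rejected
        mutated = True
    except FrozenInstanceError:
        mutated = False
```

Because the instance is frozen and nothing reads global state, two tests can hold different Config objects side by side without interfering.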

Alternatives considered

  • A config file (TOML) now. Rejected for this slice as premature: more machinery than ingestion needs, and the dataclass-as-source-of-truth shape means we can add it later with no churn. Revisit when collect needs richer config.
  • Scattered os.environ reads at each call site. Rejected: no single source of truth, hard to test, and exactly what CLAUDE.md warns against.

Update — Slice 2 (location-based site resolution)

The primary location input is now a lat/lon, not a site code. Config gains lat and lon; the active site is resolved at load as the nearest radar to the lat/lon against the bundled NEXRAD table (ADR-0005), via sites.select.nearest_site.

Resolution and precedence:

  • lat / lon: arg > BACKSCATTER_LAT / BACKSCATTER_LON > default (Elizabeth, CO: 39.3603, -104.5969, which resolves to KFTG).
  • site: an explicit override (site arg, e.g. backscatter pull KTLX, or BACKSCATTER_SITE) still wins, upper-cased. Absent that, site is the resolved nearest radar.

There is no longer a hardcoded default site.

The single-source-of-truth and "env reads live only in config.py" invariants are unchanged. pull is untouched — it still reads config.site, which is now resolved rather than hardcoded.
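The resolution order described in this update might look something like the sketch below. It assumes great-circle (haversine) distance as the "nearest" metric, which the ADR does not specify, and it uses a two-entry stand-in for the bundled NEXRAD table with approximate coordinates. `resolve_site` is a hypothetical name; only `nearest_site` is named in the text, and its real signature may differ.

```python
import math
from typing import Dict, Optional, Tuple

# Tiny stand-in for the bundled NEXRAD table; coordinates are approximate.
SITES: Dict[str, Tuple[float, float]] = {
    "KFTG": (39.79, -104.55),  # Denver, CO
    "KTLX": (35.33, -97.28),   # Oklahoma City, OK
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_site(lat: float, lon: float, sites: Dict[str, Tuple[float, float]] = SITES) -> str:
    """Return the code of the radar closest to (lat, lon)."""
    return min(sites, key=lambda code: haversine_km(lat, lon, *sites[code]))


def resolve_site(lat: float, lon: float, override: Optional[str] = None) -> str:
    """Explicit override wins (upper-cased); otherwise use the nearest radar."""
    return override.upper() if override else nearest_site(lat, lon)
```

With this shape, `pull` keeps reading `config.site` and never needs to know whether the value came from an override or from the lookup.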

Update — Slice 8 (multiple locations)

Config generalizes from a single lat/lon to a list of named locations, exactly one of which is flagged as the default ("Home"). Config.locations has type tuple[Location, ...]; each Location carries name, lat, lon, site, is_default, and site_override. The former flat fields (lat/lon/site/site_override) are now read-only properties that delegate to the default location, so every existing single-location consumer keeps working.
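A minimal sketch of that shape, assuming the field types shown here (in particular that site_override is an optional string) and a hypothetical `default_location` helper property. Other Config fields such as data_dir are omitted for brevity, and the second location is purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Location:
    name: str
    lat: float
    lon: float
    site: str                             # resolved radar for this location
    is_default: bool = False
    site_override: Optional[str] = None   # assumed type: pinned site, if any


@dataclass(frozen=True)
class Config:
    locations: Tuple[Location, ...]

    @property
    def default_location(self) -> Location:  # hypothetical helper name
        return next(loc for loc in self.locations if loc.is_default)

    # The former flat fields, now read-only views of the default location.
    @property
    def lat(self) -> float:
        return self.default_location.lat

    @property
    def lon(self) -> float:
        return self.default_location.lon

    @property
    def site(self) -> str:
        return self.default_location.site

    @property
    def site_override(self) -> Optional[str]:
        return self.default_location.site_override


cfg = Config(locations=(
    Location("Home", 39.3603, -104.5969, site="KFTG", is_default=True),
    Location("Norman", 35.22, -97.44, site="KTLX"),
))
```

Code written against the single-location Config, such as anything reading `cfg.site`, sees the default location's values without modification.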

  • Source: BACKSCATTER_LOCATIONS — a JSON list of {name, lat, lon, default?}. Absent it, the single BACKSCATTER_LAT/BACKSCATTER_LON form is treated as a one-entry list named "Home" (back-compat). TOML can replace the JSON env later without touching call sites.
  • Validation (at load, raises ValueError): ≥1 location, exactly one default, unique names (case-insensitive).
  • Site override applies to the default location only. BACKSCATTER_SITE (or the pull positional) pins Home's site; every other location resolves its own nearest radar. Documented because it's the one non-obvious interaction.
  • Frames are per-radar, not per-location. The volumes index is unchanged, keyed on (site, scan_time); a location maps to frames via its resolved site. Two locations that resolve to the same radar share its frames: collect stores each volume once (dedupe on (site, scan_time)), never per-location. The API resolves a location param to its site.
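The source and validation rules above could be sketched as follows. `RawLocation` and `parse_locations` are hypothetical names, the `env` mapping is injected for testability, and site resolution is left out; the sketch also assumes that an entry without a `default` key counts as non-default, which the ADR does not state explicitly.

```python
import json
from typing import Mapping, NamedTuple, Tuple


class RawLocation(NamedTuple):  # hypothetical record, before site resolution
    name: str
    lat: float
    lon: float
    default: bool


def parse_locations(env: Mapping[str, str]) -> Tuple[RawLocation, ...]:
    """Read locations from env and enforce the load-time validation rules."""
    raw = env.get("BACKSCATTER_LOCATIONS")
    if raw:
        entries = json.loads(raw)
        if not isinstance(entries, list):
            raise ValueError("BACKSCATTER_LOCATIONS must be a JSON list")
    else:
        # Back-compat: the single lat/lon form becomes a one-entry list.
        entries = [{
            "name": "Home",
            "lat": env.get("BACKSCATTER_LAT", 39.3603),
            "lon": env.get("BACKSCATTER_LON", -104.5969),
            "default": True,
        }]

    locs = tuple(
        RawLocation(str(e["name"]), float(e["lat"]), float(e["lon"]),
                    bool(e.get("default", False)))
        for e in entries
    )
    if not locs:
        raise ValueError("at least one location is required")
    if sum(loc.default for loc in locs) != 1:
        raise ValueError("exactly one location must be the default")
    names = [loc.name.casefold() for loc in locs]
    if len(set(names)) != len(names):
        raise ValueError("location names must be unique (case-insensitive)")
    return locs
```

Failing at load keeps malformed input out of every downstream consumer, which matches the rule that only the loader knows where values come from.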