ADR 0014: Unified Scraper Runtime Pipeline
Status
Accepted (2026-02-08)
Context
Scraper execution previously split persistence concerns across two paths:
- Scraper internals (`BaseScraper.run`) for anomaly checks and heartbeat events when a DB handle exists.
- CLI orchestration (`backend/src/waittime/cli/scraper.py`) for hospital upserts, measurement inserts, and heartbeat writes.
This duplication increased drift risk and made behavior harder to reason about under failure.
Decision
Adopt a single authoritative runtime path by routing persistence through BaseScraper.run(...):
- CLI now instantiates scrapers with `db=DatabaseService()`.
- CLI invokes `scraper.run(save_to_db=True, before_save=...)`.
- `BaseScraper.run` retains anomaly checks, DB insert, and success/failure heartbeat recording.
- A new optional `before_save(measurements)` hook in `BaseScraper.run` is used for hospital prerequisite upserts.
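The control flow described in this decision can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only `run`, `save_to_db`, `before_save`, and the `db` constructor argument come from this ADR. `Measurement`, `InMemoryDB` (a stand-in for `DatabaseService`), `scrape`, `check_anomalies`, `insert_measurements`, and `record_heartbeat` are hypothetical names, and the anomaly rule is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Measurement:
    # Hypothetical measurement shape, for illustration only.
    hospital_id: str
    wait_minutes: int


class InMemoryDB:
    """Hypothetical stand-in for DatabaseService; records calls in order."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def insert_measurements(self, measurements: list[Measurement]) -> None:
        self.events.append(f"insert:{len(measurements)}")

    def record_heartbeat(self, source: str, ok: bool) -> None:
        self.events.append(f"heartbeat:{source}:{'ok' if ok else 'fail'}")


class BaseScraper:
    source = "example"

    def __init__(self, db: Optional[InMemoryDB] = None) -> None:
        self.db = db

    def scrape(self) -> list[Measurement]:
        raise NotImplementedError

    def check_anomalies(self, measurements: list[Measurement]) -> None:
        # Placeholder rule (assumption): a negative wait time is anomalous.
        for m in measurements:
            if m.wait_minutes < 0:
                raise ValueError(f"negative wait time for {m.hospital_id}")

    def run(
        self,
        save_to_db: bool = False,
        before_save: Optional[Callable[[list[Measurement]], None]] = None,
    ) -> list[Measurement]:
        try:
            measurements = self.scrape()
            self.check_anomalies(measurements)
            if save_to_db and self.db is not None:
                if before_save is not None:
                    # Prerequisite writes (e.g. hospital upserts) run first.
                    before_save(measurements)
                self.db.insert_measurements(measurements)
        except Exception:
            # Failure heartbeat is recorded on the same path, then re-raised.
            if self.db is not None:
                self.db.record_heartbeat(self.source, ok=False)
            raise
        if self.db is not None:
            self.db.record_heartbeat(self.source, ok=True)
        return measurements


class DemoScraper(BaseScraper):
    source = "demo"

    def scrape(self) -> list[Measurement]:
        return [Measurement("h1", 35), Measurement("h2", 50)]


# CLI-side usage: the caller supplies the hook and delegates everything else.
db = InMemoryDB()
upserted: list[str] = []


def upsert_hospitals(measurements: list[Measurement]) -> None:
    upserted.extend(sorted({m.hospital_id for m in measurements}))


result = DemoScraper(db=db).run(save_to_db=True, before_save=upsert_hospitals)
```

In this sketch the CLI no longer writes measurements or heartbeats itself; it only supplies the hook, so ordering and failure handling live in one place.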
Consequences
Positive
- One execution path governs anomaly detection, inserts, and heartbeat semantics.
- Reduced duplicate error handling in CLI.
- Preserves prerequisite hospital upsert behavior without reintroducing split writes.
Tradeoff
- `BaseScraper.run` surface area is slightly larger due to hook support.
- Hook callers must remain side-effect-safe and idempotent.
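One way to satisfy the idempotency requirement is to express the hook as a keyed upsert, so a retried run leaves the same rows behind. The sketch below uses the standard-library `sqlite3` module purely as a stand-in for the real database. The `hospitals` table, its columns, the dict-shaped measurements, and `make_hospital_upsert_hook` are all assumptions made for illustration.

```python
import sqlite3


def make_hospital_upsert_hook(conn: sqlite3.Connection):
    """Build a before_save hook that upserts hospitals keyed by id."""

    def before_save(measurements):
        # Deduplicate within the batch; the input list is not mutated.
        rows = sorted({(m["hospital_id"], m["hospital_name"]) for m in measurements})
        conn.executemany(
            "INSERT INTO hospitals (id, name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            rows,
        )
        conn.commit()

    return before_save


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hospitals (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

hook = make_hospital_upsert_hook(conn)
batch = [
    {"hospital_id": "h1", "hospital_name": "General", "wait_minutes": 35},
    {"hospital_id": "h1", "hospital_name": "General", "wait_minutes": 40},
    {"hospital_id": "h2", "hospital_name": "St. Mary", "wait_minutes": 50},
]

hook(batch)
hook(batch)  # simulated retry: the second call must not add rows

hospital_count = conn.execute("SELECT COUNT(*) FROM hospitals").fetchone()[0]
```

Because the write is keyed on the hospital id, calling the hook twice with the same batch produces the same table contents as calling it once.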
Follow-Up
- Keep heartbeat monitoring source discovery dynamic (`sources` table), not hardcoded.
- Add integration coverage for end-to-end scraper run with DB fixtures when available.
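As a rough illustration of the first follow-up item, source discovery could read the monitored set from the `sources` table at check time instead of from a constant in code. The sketch again uses `sqlite3` as a stand-in. The `name` column and the `discover_sources` helper are hypothetical; the ADR only names the table.

```python
import sqlite3


def discover_sources(conn: sqlite3.Connection) -> list:
    """Return the monitored source names as currently stored in the DB."""
    rows = conn.execute("SELECT name FROM sources ORDER BY name").fetchall()
    return [row[0] for row in rows]


monitor_conn = sqlite3.connect(":memory:")
monitor_conn.execute("CREATE TABLE sources (name TEXT PRIMARY KEY)")
monitor_conn.executemany(
    "INSERT INTO sources (name) VALUES (?)", [("alpha",), ("beta",)]
)

before = discover_sources(monitor_conn)

# Registering a new source needs no code change in the monitor.
monitor_conn.execute("INSERT INTO sources (name) VALUES (?)", ("gamma",))
after = discover_sources(monitor_conn)
```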