0008. Two-Tier Aggregation Pipeline

Date: 2026-02-06

Status: Accepted

Deciders: Jeremy Dawson

Update (2026-03-13): the aggregate architecture in this ADR remains active, but the original assumption of automatic 30-day raw-data deletion is no longer the default operating policy. Raw measurements are now preserved by default, and aggregates remain the efficient long-range analytics layer.

Context and Problem Statement

How do we enable longitudinal analysis (90-day, 6-month, 1-year trends) without forcing every analytics query to scan the full raw measurement table?

Decision Drivers

  • Need efficient long-range analytics even when raw measurements are preserved
  • Need long-range trend analysis (Scholar narrative)
  • Research-grade data export requires extended date ranges
  • Must preserve ontology metadata for comparability analysis

Considered Options

  • Keep all raw data indefinitely
  • Pre-computed materialized views
  • Two-tier architecture: raw measurements + permanent aggregates

Decision Outcome

Chosen option: "Two-tier architecture", because it balances analytical depth with query efficiency. Raw measurements remain available for high-granularity analysis, while permanent aggregates enable longitudinal research and cheaper read paths.

Positive Consequences

  • Enables 90d/6m/1y trend visualization
  • Data export supports extended date ranges with granularity selector
  • Ontology tags denormalized into aggregates preserve historical methodology context
  • Backfill CLI allows recomputing aggregates from existing raw data

Negative Consequences

  • Aggregates lose individual measurement granularity
  • Two-step query pattern (check aggregates first, fall back to raw)
  • Requires scheduled aggregation maintenance even when no purge is planned
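
The two-step query pattern listed above could look roughly like the sketch below. The function name `fetch_trend` and the two reader callables are hypothetical stand-ins for whatever data-access layer the project uses; the only behaviour taken from this ADR is "check aggregates first, fall back to raw".

```python
from datetime import date
from typing import Callable, Sequence, Tuple

Row = dict
# Hypothetical reader signature: (start, end) -> rows for that date range.
Reader = Callable[[date, date], Sequence[Row]]


def fetch_trend(
    start: date,
    end: date,
    read_aggregates: Reader,
    read_raw: Reader,
) -> Tuple[str, Sequence[Row]]:
    """Return (source, rows), preferring the aggregate tier."""
    rows = read_aggregates(start, end)
    if rows:
        return "aggregates", rows
    # No aggregates cover this range, so fall back to the raw tier.
    return "raw", read_raw(start, end)
```

Returning the source alongside the rows lets a caller label a chart or export as aggregate-based or raw-based, which matters because aggregates lose individual measurement granularity.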

Implementation Details

Table: measurement_aggregates with period types: hourly, daily, weekly, monthly.
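
As an illustration of how a measurement timestamp might map onto the four period types, here is a minimal sketch. The helper `period_start` is hypothetical, and the choice of Monday as the start of a week is an assumption; this ADR does not state the week boundary or the time zone handling.

```python
from datetime import datetime, timedelta


def period_start(ts: datetime, period_type: str) -> datetime:
    """Floor a timestamp to the start of its aggregation period."""
    if period_type == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if period_type == "daily":
        return midnight
    if period_type == "weekly":
        # Assumption: weeks start on Monday (weekday() == 0).
        return midnight - timedelta(days=midnight.weekday())
    if period_type == "monthly":
        return midnight.replace(day=1)
    raise ValueError(f"unknown period type: {period_type!r}")
```

Measurements that share the same `(period_type, period_start)` pair would then fall into the same aggregate row.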

Statistics per aggregate: mean, median, p90, min, max, std_dev, sample_count.
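
The seven statistics can be computed from one bucket of raw values with the standard library alone. The sketch below is illustrative rather than the project's implementation: `AggregateStats` and `summarize` are hypothetical names, p90 uses the nearest-rank method, and `std_dev` is the population standard deviation so that single-sample buckets are still defined. The ADR does not specify either convention.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AggregateStats:
    mean: float
    median: float
    p90: float
    min: float
    max: float
    std_dev: float
    sample_count: int


def summarize(values: Sequence[float]) -> AggregateStats:
    """Compute the per-aggregate statistics for one bucket of raw values."""
    if not values:
        raise ValueError("cannot aggregate an empty bucket")
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank p90 (assumption): rank = ceil(0.9 * n), in integer math.
    rank = (9 * n + 9) // 10
    return AggregateStats(
        mean=statistics.fmean(ordered),
        median=statistics.median(ordered),
        p90=ordered[rank - 1],
        min=ordered[0],
        max=ordered[-1],
        # Population standard deviation (assumption); defined for n == 1.
        std_dev=statistics.pstdev(ordered),
        sample_count=n,
    )
```

Storing `sample_count` next to the other statistics allows means from adjacent periods to be combined with correct weighting; median and p90 cannot be recombined this way and need the raw values.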

CLI: python -m waittime.cli.aggregate with --backfill, --incremental, and --dry-run flags.
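
A command-line surface with these three flags could be declared with `argparse` as sketched below. Only the flag names and the module path come from this ADR; the help texts, the `build_parser` helper, and the decision to make `--backfill` and `--incremental` mutually exclusive are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three flags named in this ADR."""
    parser = argparse.ArgumentParser(prog="python -m waittime.cli.aggregate")
    # Assumption: a single run is either a backfill or an incremental pass.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument(
        "--backfill",
        action="store_true",
        help="recompute aggregates from existing raw data (assumed meaning)",
    )
    mode.add_argument(
        "--incremental",
        action="store_true",
        help="aggregate only data not yet aggregated (assumed meaning)",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="report what would be written without writing (assumed meaning)",
    )
    return parser
```

Under these assumptions, `build_parser().parse_args(["--backfill", "--dry-run"])` yields a namespace with `backfill` and `dry_run` set to `True` and `incremental` set to `False`.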

Frontend integration: TrendChart extended with 90d/6m/1y buttons; DataExport extended with granularity selector.

Links

  • [Related to] ADR-0002 (ontology tags denormalized into aggregates)
  • Implementation plan: docs/planning/archive/milestone-13-aggregation.md