Data Dictionary

This document serves as the canonical reference for the Wait Time Canada database schema.

Core Schema

sources

Provincial data source metadata and provenance tracking.

| Column | Type | Description |
|---|---|---|
| id | TEXT (PK) | Unique identifier (e.g., quebec-msss). |
| name | TEXT | Display name of the source. |
| province | CHAR(2) | Two-letter province code (ON, QC, etc.). |
| url | TEXT | Official data portal URL. |
| telehealth_name | TEXT | Local telehealth service name (e.g., "Health Link 811"). |
| default_metric_family | ENUM | Default MetricFamily for this source. |

hospitals

Healthcare facilities that report wait times.

| Column | Type | Description |
|---|---|---|
| id | TEXT (PK) | Unique identifier (format: ca-{province}-{slug}). |
| name | TEXT | Official facility name. |
| source_id | TEXT (FK) | Link to sources.id. |
| is_verified | BOOLEAN | Safety gate: must be TRUE for the facility to be visible. |
| is_visible | BOOLEAN | Whether to show the facility on the public map. |
| latitude | DOUBLE | Geographic coordinate. |
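
The `id` format and the `is_verified` safety gate can be sketched as follows. The helper names `make_hospital_id` and `is_publicly_visible`, the lowercase province code, and the slug rule are assumptions for illustration; the schema itself only states the `ca-{province}-{slug}` pattern and that `is_verified` must be TRUE for a facility to be visible.

```python
import re
import unicodedata


def make_hospital_id(province: str, name: str) -> str:
    """Build an id in the documented ca-{province}-{slug} shape.

    Assumption: the slug is the ASCII-folded, lowercased facility name with
    runs of non-alphanumeric characters collapsed to single hyphens. The real
    pipeline may slug names differently.
    """
    # Fold accents (e.g. "ô" -> "o") so French facility names stay readable.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return f"ca-{province.lower()}-{slug}"


def is_publicly_visible(is_verified: bool, is_visible: bool) -> bool:
    """Assumed reading of the safety gate: both flags must be true."""
    return is_verified and is_visible
```

Under this reading, an unverified facility stays hidden even if `is_visible` was set by mistake.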

measurements

Individual audit-log rows of scraped data. This is a high-volume table.

| Column | Type | Description |
|---|---|---|
| id | BIGSERIAL | Auto-incrementing primary key. |
| hospital_id | TEXT (FK) | Link to hospitals.id. |
| timestamp_utc | TIMESTAMPTZ | When the measurement was recorded. |
| value | DOUBLE | Wait time value (usually minutes). |
| metric_family | ENUM | Ontology tag: TIME_TO_PROVIDER, TOTAL_LOS, etc. |
| start_event | ENUM | Ontology tag: TRIAGE, REGISTRATION, etc. |
| end_event | ENUM | Ontology tag: PHYSICIAN, DISCHARGE, etc. |
| statistic_type | ENUM | Ontology tag: P90, ROLLING_AVG, etc. |
| raw_payload_hash | CHAR(64) | SHA-256 hash of the source HTML (Storage Safety). |
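
A SHA-256 digest written in hexadecimal is exactly 64 characters, which is why `raw_payload_hash` fits in CHAR(64). The sketch below shows one way to produce such a value with Python's standard library. The function name `raw_payload_hash` and the choice to hash the UTF-8 encoding of the HTML are assumptions; the actual scraper may hash the raw response bytes instead.

```python
import hashlib


def raw_payload_hash(html: str) -> str:
    """Return the SHA-256 hex digest (64 hex characters) of the source HTML.

    Assumption: the HTML is hashed as UTF-8 bytes.
    """
    return hashlib.sha256(html.encode("utf-8")).hexdigest()
```

Storing only the digest lets two scrapes be compared for identical source pages without keeping the HTML itself.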

scraper_status

Heartbeat monitor for scraper health.

| Column | Type | Description |
|---|---|---|
| source_id | TEXT (PK) | Link to sources.id. |
| last_run | TIMESTAMPTZ | Time of last scraper attempt (success or failure). |
| status | ENUM | healthy, error, or stale. |
| error_message | TEXT | Latest error message when status is error. |
| measurements_count | INTEGER | Measurements persisted in the most recent run. |
| last_success_run | TIMESTAMPTZ | Timestamp of the last successful run (last-known-good). |
| last_success_measurements_count | INTEGER | Measurement count from the last successful run. |
| last_error_run | TIMESTAMPTZ | Timestamp of the most recent failed run. |
| last_error_category | TEXT | Structured failure class (upstream_unavailable, parser_breakage, infra_runtime, persistence_failure, unknown). |
| last_error_stage | TEXT | Failure stage (fetch, parse, before_save, persist, heartbeat, orchestration). |
| consecutive_failures | INTEGER | Number of consecutive failed runs since last success. |
| last_run_duration_ms | INTEGER | Last run duration in milliseconds. |
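
The way these columns move together across runs can be sketched with an in-memory stand-in for one row. This is only an illustration: the class `ScraperStatus` and the functions `record_success` and `record_failure` are hypothetical, and two behaviours are assumptions rather than documented rules (clearing `error_message` on success, and setting `measurements_count` to 0 on failure).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ScraperStatus:
    """Hypothetical in-memory stand-in for one scraper_status row."""
    source_id: str
    last_run: Optional[datetime] = None
    status: str = "stale"
    error_message: Optional[str] = None
    measurements_count: int = 0
    last_success_run: Optional[datetime] = None
    last_success_measurements_count: Optional[int] = None
    last_error_run: Optional[datetime] = None
    last_error_category: Optional[str] = None
    last_error_stage: Optional[str] = None
    consecutive_failures: int = 0
    last_run_duration_ms: Optional[int] = None


def record_success(row: ScraperStatus, now: datetime, count: int, duration_ms: int) -> None:
    """Update the heartbeat after a successful run."""
    row.last_run = now
    row.status = "healthy"
    row.error_message = None  # assumption: cleared once the scraper recovers
    row.measurements_count = count
    row.last_success_run = now
    row.last_success_measurements_count = count
    row.consecutive_failures = 0  # the streak resets on any success
    row.last_run_duration_ms = duration_ms


def record_failure(row: ScraperStatus, now: datetime, message: str,
                   category: str, stage: str, duration_ms: int) -> None:
    """Update the heartbeat after a failed run, keeping last-known-good fields."""
    row.last_run = now
    row.status = "error"
    row.error_message = message
    row.measurements_count = 0  # assumption: nothing persisted on failure
    row.last_error_run = now
    row.last_error_category = category
    row.last_error_stage = stage
    row.consecutive_failures += 1
    row.last_run_duration_ms = duration_ms
```

Note that `record_failure` never touches `last_success_run` or `last_success_measurements_count`, which is what makes them usable as last-known-good values.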

scraper_alert_state

Persistent alert deduplication state for heartbeat incidents.

| Column | Type | Description |
|---|---|---|
| source_id | TEXT (PK/FK) | Link to sources.id. |
| active_incident_kind | TEXT | Current active incident kind: stale or error. |
| active_incident_fingerprint | TEXT | Stable fingerprint for the active incident. |
| opened_at | TIMESTAMPTZ | When the current active incident began. |
| last_notified_at | TIMESTAMPTZ | When the active incident last generated a notification attempt. |
| last_resolved_at | TIMESTAMPTZ | When the most recent incident for this source was resolved. |
| updated_at | TIMESTAMPTZ | Row update timestamp. |
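
Fingerprint-based deduplication of the kind this table supports can be sketched as below. Everything here is a hypothetical illustration: the names `AlertState`, `incident_fingerprint`, `observe_incident`, and `resolve_incident`, the fingerprint recipe (a SHA-256 over source, kind, and a detail string), and the rule that a repeat of the same fingerprint never re-notifies. A real implementation might also send reminders after some interval, which is not modelled.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlertState:
    """Hypothetical in-memory stand-in for one scraper_alert_state row."""
    source_id: str
    active_incident_kind: Optional[str] = None
    active_incident_fingerprint: Optional[str] = None
    opened_at: Optional[datetime] = None
    last_notified_at: Optional[datetime] = None
    last_resolved_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None


def incident_fingerprint(source_id: str, kind: str, detail: str) -> str:
    """Assumed recipe: hash the fields that define 'the same incident'."""
    return hashlib.sha256(f"{source_id}|{kind}|{detail}".encode("utf-8")).hexdigest()


def observe_incident(state: AlertState, kind: str, fingerprint: str, now: datetime) -> bool:
    """Record an observed incident; return True only if a notification is due."""
    state.updated_at = now
    if state.active_incident_fingerprint == fingerprint:
        return False  # same incident still active: suppress the duplicate
    state.active_incident_kind = kind
    state.active_incident_fingerprint = fingerprint
    state.opened_at = now
    state.last_notified_at = now
    return True


def resolve_incident(state: AlertState, now: datetime) -> None:
    """Clear the active incident so the next failure opens a fresh one."""
    state.active_incident_kind = None
    state.active_incident_fingerprint = None
    state.opened_at = None
    state.last_resolved_at = now
    state.updated_at = now
```

Persisting this state per source is what allows a scheduler that runs every few minutes to alert once per incident rather than once per run.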

Public Health Hub Schema

public_data_sources

Public-health-hub source catalog and sync metadata.

| Column | Type | Description |
|---|---|---|
| source_id | TEXT (PK) | Stable identifier for the public-health-hub source record. |
| domain | TEXT | Source domain such as provider_facility, aed, safety_alert, environmental_overlay, or system_context. |
| source_name | TEXT | Public display name used in provenance UI. |
| connector_type | TEXT | Access posture such as api, feed, open_data_portal, or crowdsourced_registry. |
| access_route | TEXT | Human-readable technical access path used by the source catalog UI and ops runbooks. |
| license_reuse_status | TEXT | Hard implementation gate: approved, approved_with_conditions, or blocked. |
| attribution_requirement | TEXT | Required attribution or provenance posture for shipped UI/API use. |
| update_cadence | TEXT | Source refresh rhythm such as annual, ongoing, or real-time. |
| recommended_usage_mode | TEXT | Whether the source is used via live_ui, scheduled_ingest, or a non-runtime mode. |
| provenance_url | TEXT | Canonical upstream source URL shown in public provenance surfaces. |
| last_verified_at | DATE | Last manual review date for source access/reuse posture. |
| public_methodology_note | TEXT | Short user-facing caveat explaining how the source should and should not be interpreted. |
| last_refreshed_at | TIMESTAMPTZ | Last successful in-product refresh timestamp for freshness rules. |

resource_locations

Normalized location resources for the public-health-hub module.

| Column | Type | Description |
|---|---|---|
| id | TEXT (PK) | Stable internal identifier for a facility or AED record. |
| source_id | TEXT (FK) | Link to public_data_sources.source_id. |
| kind | TEXT | facility or aed. |
| source_record_id | TEXT | Optional upstream identifier for deduplication and re-ingest. |
| name | TEXT | Public resource name. |
| province | CHAR(2) | Two-letter province code. |
| latitude / longitude | DOUBLE PRECISION | Map coordinates for distance and display. |
| reference_status | TEXT | Directory posture, currently directory_only for facility baseline data. |
| crowdsourced | BOOLEAN | Marks crowdsourced fallback records such as OSM-backed AEDs. |
| completeness_status | TEXT | Current completeness caveat, such as incomplete. |
| provenance_url | TEXT | Upstream provenance URL for the record. |
| last_refreshed_at | TIMESTAMPTZ | Last successful resource refresh timestamp used for show/warn/suppress logic. |
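
The `latitude` / `longitude` columns are described as supporting distance calculations. One common way to turn two coordinate pairs into a distance is the haversine great-circle formula, sketched below. Using haversine here is an assumption; the schema does not say which distance method the application uses, and `haversine_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

For nearest-resource lookups over many rows, a database-side spatial index would usually be preferable to computing this per row in application code.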

public_health_alerts

Normalized public recall and safety alert records.

| Column | Type | Description |
|---|---|---|
| id | TEXT (PK) | Stable alert identifier. |
| source_id | TEXT (FK) | Link to public_data_sources.source_id. |
| title | TEXT | Public alert title. |
| summary | TEXT | Short alert summary preserved from the official source. |
| alert_type | TEXT | Feed-specific alert category. |
| published_at | TIMESTAMPTZ | Official publication timestamp. |
| source_updated_at | TIMESTAMPTZ | Upstream update timestamp when available. |
| affected_products | JSONB | Structured affected-product list for optional enrichment/rendering. |
| provenance_url | TEXT | Canonical alert URL. |
| last_refreshed_at | TIMESTAMPTZ | Last successful ingest timestamp for freshness rules. |

public_health_system_metrics

Normalized Ontario EMS system-context records for analytics-only /resources cards.

| Column | Type | Description |
|---|---|---|
| id | TEXT (PK) | Stable metric identifier derived from source, series, geography, year, and optional dimension label. |
| source_id | TEXT (FK) | Link to public_data_sources.source_id. |
| series_key | TEXT | Bounded metric family, currently cacc_average_response_times or paramedic_service_response_performance. |
| province | CHAR(2) | Two-letter province code, currently ON. |
| geography_type | TEXT | Geography semantics such as dispatch_centre or ambulance_service_coverage_area. |
| geography_name | TEXT | Public geography label shown in the system-context UI. |
| reporting_year | INTEGER | Official reporting year for the record. |
| dimension_label | TEXT | Optional row dimension such as patient-severity category. |
| metrics | JSONB | Structured numeric payload for route-specific rendering (for example response minutes, planned response rate, performance rate, call volume). |
| provenance_url | TEXT | Canonical Ontario resource page for the specific row family. |
| last_refreshed_at | TIMESTAMPTZ | Last successful ingest timestamp used for freshness and degradation rules. |
| created_at / updated_at | TIMESTAMPTZ | Row lifecycle timestamps. |
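
The `id` column is described as derived from source, series, geography, year, and an optional dimension label. A deterministic derivation along those lines might look like the sketch below. The function name `system_metric_id`, the normalization of labels, the hash-based suffix, and the overall `series:year:digest` layout are all assumptions; the schema does not document the real id format. The source id and geography name in the usage lines are placeholders as well.

```python
import hashlib
from typing import Optional


def system_metric_id(source_id: str, series_key: str, geography_name: str,
                     reporting_year: int, dimension_label: Optional[str] = None) -> str:
    """Derive a stable id from the fields that identify one metric row.

    Assumption: labels are trimmed and lowercased so cosmetic differences in
    the upstream file do not create duplicate rows on re-ingest.
    """
    parts = [source_id, series_key, geography_name.strip().lower(), str(reporting_year)]
    if dimension_label:
        parts.append(dimension_label.strip().lower())
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f"{series_key}:{reporting_year}:{digest[:16]}"


# Placeholder inputs: "example-source" and "Example Dispatch Centre" are made up.
base_id = system_metric_id(
    "example-source", "cacc_average_response_times", "Example Dispatch Centre", 2023
)
same_id = system_metric_id(
    "example-source", "cacc_average_response_times", "  example dispatch centre ", 2023
)
```

Because the id depends only on the identifying fields, re-ingesting the same upstream row updates the existing record instead of inserting a new one.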

public_health_source_alert_state

Persistent alert deduplication state for hard-fail public-health-hub sources.

| Column | Type | Description |
|---|---|---|
| source_id | TEXT (PK/FK) | Link to public_data_sources.source_id. |
| active_incident_kind | TEXT | Current active incident kind, currently degraded. |
| active_incident_fingerprint | TEXT | Stable fingerprint for the active incident reasons. |
| opened_at | TIMESTAMPTZ | When the current public-health ingest incident began. |
| last_notified_at | TIMESTAMPTZ | When the active incident last generated a notification attempt. |
| last_resolved_at | TIMESTAMPTZ | When the most recent incident for this source was resolved. |
| updated_at | TIMESTAMPTZ | Row update timestamp. |

Analytics & Aggregation

measurement_aggregates

Permanent statistical summaries (hourly/daily/weekly/monthly).

| Column | Type | Description |
|---|---|---|
| period_type | TEXT | One of hourly, daily, weekly, or monthly. |
| mean_value | DOUBLE | Average wait time for this period. |
| p90_value | DOUBLE | 90th percentile wait time (if sufficient samples). |
| metric_family | TEXT | Denormalized ontology snapshot. |
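
How `mean_value` and `p90_value` relate to the underlying measurements can be sketched as follows. The minimum sample count (`MIN_SAMPLES_FOR_P90 = 10`), the nearest-rank percentile method, and the helper names are assumptions; the schema only says that the 90th percentile is populated when there are sufficient samples.

```python
from statistics import fmean
from typing import Dict, Optional, Sequence

MIN_SAMPLES_FOR_P90 = 10  # assumption: the real threshold is not documented


def p90_nearest_rank(values: Sequence[float]) -> float:
    """90th percentile by the nearest-rank method: the ceil(0.9 * n)-th smallest value."""
    ordered = sorted(values)
    # Integer arithmetic for ceil(9n / 10) avoids floating-point rounding.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def summarize_period(values: Sequence[float]) -> Dict[str, Optional[float]]:
    """Return mean_value and p90_value for one aggregation period."""
    if not values:
        raise ValueError("cannot aggregate an empty period")
    p90 = p90_nearest_rank(values) if len(values) >= MIN_SAMPLES_FOR_P90 else None
    return {"mean_value": fmean(values), "p90_value": p90}
```

Leaving `p90_value` empty for sparse periods avoids reporting a tail statistic that is really just the maximum of a handful of samples.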

regions

Province region metadata for analytics segmentation.

| Column | Type | Description |
|---|---|---|
| id | TEXT (PK) | Unique identifier. |
| province | TEXT | Two-letter province code. |
| name | TEXT | Region name (e.g., "Vancouver Coastal"). |

hospital_regions

Many-to-many mapping between hospitals and regions.

| Column | Type | Description |
|---|---|---|
| region_id | TEXT (FK) | Link to regions.id. |
| hospital_id | TEXT (FK) | Link to hospitals.id. |
| is_primary | BOOLEAN | Whether this is the hospital's primary region. |

data_quality_snapshots

Daily scraper reliability metrics.

| Column | Type | Description |
|---|---|---|
| snapshot_date | DATE | The date being analyzed. |
| success_rate | DOUBLE | Percentage of expected scrapes that succeeded. |
| longest_gap_minutes | INTEGER | Maximum downtime duration in minutes. |
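
Both reliability metrics can be derived from a day's list of successful scrape times. The sketch below assumes that `success_rate` is stored on a 0 to 100 scale and that a "gap" is the time between consecutive successful scrapes, with the start and end of the day counted as boundaries. Neither detail is spelled out in the schema, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Sequence


def success_rate(expected: int, succeeded: int) -> float:
    """Percentage (0 to 100) of expected scrapes that succeeded."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * succeeded / expected


def longest_gap_minutes(window_start: datetime, window_end: datetime,
                        success_times: Sequence[datetime]) -> int:
    """Longest stretch without a successful scrape, in whole minutes.

    Assumes every timestamp in success_times lies inside the window.
    """
    points = [window_start, *sorted(success_times), window_end]
    longest = max(later - earlier for earlier, later in zip(points, points[1:]))
    return int(longest.total_seconds() // 60)
```

With no successful scrapes at all, the gap equals the whole window, which for a daily snapshot is 1440 minutes.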

methodology_change_events

Detected shifts in reporting methodology.

| Column | Type | Description |
|---|---|---|
| detected_at | TIMESTAMPTZ | When the system flagged the shift. |
| shift_percent | DOUBLE | Magnitude of the statistical shift. |
| explanation | TEXT | Auto-generated hypothesis for the change. |
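
One plausible reading of `shift_percent` is the relative change in mean wait time between a baseline window and a recent window. The sketch below illustrates that reading only. The detection method actually used is not documented here, so the formula and the function name `shift_percent` are assumptions.

```python
from statistics import fmean
from typing import Sequence


def shift_percent(baseline: Sequence[float], recent: Sequence[float]) -> float:
    """Relative change of the recent mean versus the baseline mean, in percent.

    Positive values mean reported waits went up; negative values mean they went down.
    """
    if not baseline or not recent:
        raise ValueError("both windows need at least one value")
    baseline_mean = fmean(baseline)
    if baseline_mean == 0:
        raise ValueError("baseline mean is zero; relative shift is undefined")
    return (fmean(recent) - baseline_mean) / baseline_mean * 100.0
```

A sudden, sustained jump of this kind with no matching change at neighbouring hospitals is the sort of pattern that would suggest a reporting change rather than a real change in wait times.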