2. Implement Strict Metric Ontology System
Date: 2024-12-26
Status: Accepted
Deciders: Project Team
Technical Story: Core architectural decision for data model
Context and Problem Statement

Canadian provinces report emergency room wait times using different methodologies:

- Alberta: Triage-to-Physician (90th percentile)
- Quebec: Registration-to-Physician (rolling average)
- Manitoba: Algorithmic estimate (proprietary calculation)

Problem: Users will naturally assume these values are directly comparable and make inappropriate comparisons between provinces, leading to misinterpretation and potentially poor healthcare decisions.
How do we store heterogeneous wait time data in a way that enables transparency about comparability without attempting to normalize incomparable metrics?
Decision Drivers
- Data Integrity - Never misrepresent or artificially normalize provincial data
- Scientific Rigor - Enable researchers to understand exact methodology
- User Safety - Prevent misinterpretation that could lead to poor healthcare decisions
- Portfolio Value - Demonstrate understanding of research methodology ("Scholar" narrative)
- Maintainability - Clear schema that prevents future drift
Considered Options
- Strict Metric Ontology - Tag every measurement with 4-field ontology (metric_family, start_event, end_event, statistic_type)
- Normalization Approach - Convert all measurements to standardized metric
- Free-Text Methodology - Store methodology as unstructured text
- Province-Specific Tables - Separate tables per province
Decision Outcome

Chosen option: "Strict Metric Ontology", because it:

- Is the only option that maintains data integrity while enabling comparability analysis
- Provides the scientific rigor required for research use
- Enables automatic generation of "divergence briefs" when data is incomparable
- Demonstrates deep understanding of research methodology (key portfolio value)
- Scales to new provinces without schema changes
Positive Consequences
- Automatic Comparability Detection - Can programmatically determine if two measurements are comparable
- Divergence Brief Generation - System automatically explains why data is incomparable
- Database-Level Enforcement - PostgreSQL enums prevent invalid ontology values
- Research-Grade Data - Researchers can query by exact methodology
- Future-Proof - Adding new provinces/metrics doesn't require schema changes
- Portfolio Differentiation - Showcases "Physician-Innovator" competency
Negative Consequences
- Initial Complexity - Requires understanding of each province's methodology
- Cannot Average Across Provinces - No "national average" wait time possible
- Scraper Complexity - Each scraper must correctly tag ontology fields
- UI Complexity - Must show methodology warnings prominently
Pros and Cons of the Options
Strict Metric Ontology

Example schema (using PostgreSQL enums for database-level enforcement, per the decision above):

```sql
-- Enum types reject invalid ontology values at the database level
CREATE TYPE metric_family_t  AS ENUM ('TIME_TO_PROVIDER', 'TOTAL_LOS', 'STRETCHER_OCCUPANCY');
CREATE TYPE start_event_t    AS ENUM ('TRIAGE', 'REGISTRATION', 'DOOR', 'UNKNOWN');
CREATE TYPE end_event_t      AS ENUM ('PHYSICIAN', 'PROVIDER', 'DISCHARGE', 'FIRST_ASSESSMENT');
CREATE TYPE statistic_type_t AS ENUM ('P90', 'MEDIAN', 'MEAN', 'ROLLING_AVG', 'ALGORITHMIC', 'POINT_ESTIMATE');

CREATE TABLE measurements (
    metric_family  metric_family_t  NOT NULL,
    start_event    start_event_t    NOT NULL,
    end_event      end_event_t      NOT NULL,
    statistic_type statistic_type_t NOT NULL
);
```
Comparability logic:

```python
comparable = (
    A.metric_family == B.metric_family and
    A.start_event == B.start_event and
    A.end_event == B.end_event and
    A.statistic_type == B.statistic_type
)
```
- Good, because maintains exact provincial methodology
- Good, because enables automatic comparability detection
- Good, because database-enforced via enums (no drift)
- Good, because demonstrates research methodology understanding
- Bad, because cannot provide "average Canadian wait time"
- Bad, because each scraper must tag correctly
Normalization Approach

Convert all measurements to a standard metric (e.g., "Triage-to-Physician P90")
- Good, because enables direct cross-province comparison
- Good, because simpler UI (just show wait time)
- Good, because can calculate national averages
- Bad, because scientifically invalid (cannot convert Registration→Triage accurately)
- Bad, because loses original methodology information
- Bad, because creates false precision
- Bad, because misrepresents provincial data
Free-Text Methodology

Store methodology as an unstructured text field
- Good, because flexible (any methodology)
- Good, because preserves original description
- Bad, because cannot programmatically determine comparability
- Bad, because no enforcement (can drift over time)
- Bad, because difficult to query/filter
- Bad, because requires manual interpretation
Province-Specific Tables
Separate tables: measurements_ab, measurements_qc, etc.
- Good, because each province can have custom schema
- Good, because no cross-province confusion
- Bad, because cannot query across provinces
- Bad, because schema changes required for each new province
- Bad, because violates DRY principle
- Bad, because cross-province features become complex to build
Links
- [Related to] Architecture Database Guide - PostgreSQL enums chosen for ontology enforcement
- [Refines] Strategic plan "Comparability Boolean" section
Additional Information
Ontology Field Definitions

metric_family - What is being measured?

- TIME_TO_PROVIDER: Wait to see doctor/nurse
- TOTAL_LOS: Total length of stay in ED
- STRETCHER_OCCUPANCY: Current ED occupancy rate

start_event - When does the clock start?

- TRIAGE: After initial triage assessment
- REGISTRATION: After administrative check-in
- DOOR: Upon physical arrival at ED
- UNKNOWN: Source doesn't specify

end_event - When does the clock stop?

- PHYSICIAN: Physician initial assessment
- PROVIDER: Any provider (doctor, NP, PA)
- DISCHARGE: Patient leaves ED
- FIRST_ASSESSMENT: First clinical contact (any role)

statistic_type - How is the value calculated?

- P90: 90th percentile (CIHI standard)
- MEDIAN: 50th percentile
- MEAN: Average
- ROLLING_AVG: Moving average (window unspecified)
- ALGORITHMIC: Proprietary calculation
- POINT_ESTIMATE: Current real-time value
Implementation Notes

- Use PostgreSQL enums for strict enforcement
- Create CHECK constraints as a fallback
- Pydantic models validate on insert
- Frontend shows methodology warnings when comparing incomparable data
- Auto-researcher generates "divergence briefs" for incomparable pairs
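The database enums can be mirrored in application code so a scraper's payload is rejected before it ever reaches an INSERT. The project uses Pydantic for this, but the core idea can be sketched with the standard library alone — one enum is shown here as an example; the other three ontology fields get the same treatment (values taken from the field definitions above):

```python
from enum import Enum

class StatisticType(str, Enum):
    """Valid statistic_type values, mirroring the PostgreSQL enum."""
    P90 = "P90"
    MEDIAN = "MEDIAN"
    MEAN = "MEAN"
    ROLLING_AVG = "ROLLING_AVG"
    ALGORITHMIC = "ALGORITHMIC"
    POINT_ESTIMATE = "POINT_ESTIMATE"

def parse_statistic_type(raw: str) -> StatisticType:
    """Reject any scraper value that is not a known statistic_type."""
    return StatisticType(raw)  # raises ValueError on unknown values
```

In Pydantic, annotating a model field with such an Enum type gives the same rejection behavior automatically at model construction time.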
Real-World Example

Alberta (AHS):

```json
{
  "metric_family": "TIME_TO_PROVIDER",
  "start_event": "TRIAGE",
  "end_event": "PHYSICIAN",
  "statistic_type": "P90"
}
```
Quebec (MSSS):

```json
{
  "metric_family": "TIME_TO_PROVIDER",
  "start_event": "REGISTRATION",
  "end_event": "PHYSICIAN",
  "statistic_type": "MEAN"
}
```
Comparability Result: False (different start_event AND statistic_type)
Divergence Brief:
⚠️ Methodology Divergence: Alberta reports 90th percentile Triage-to-Physician time, meaning 90% of patients are seen within the reported time. Quebec reports average Registration-to-Physician time, which starts the clock later and uses a different statistical measure. Direct comparison is scientifically invalid.
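The comparability result and the brief's content fall directly out of a field-by-field comparison of the two payloads; a quick sketch:

```python
alberta = {
    "metric_family": "TIME_TO_PROVIDER",
    "start_event": "TRIAGE",
    "end_event": "PHYSICIAN",
    "statistic_type": "P90",
}
quebec = {
    "metric_family": "TIME_TO_PROVIDER",
    "start_event": "REGISTRATION",
    "end_event": "PHYSICIAN",
    "statistic_type": "MEAN",
}

# Fields where the two provinces' methodologies disagree
diverging = [k for k in alberta if alberta[k] != quebec[k]]
# diverging == ["start_event", "statistic_type"] → not comparable
```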
References
- CIHI Emergency Department Performance Measures: https://www.cihi.ca/en/indicators/
- Alberta AHS Methodology: https://www.albertahealthservices.ca/waittimes/waittimes.aspx
- Quebec MSSS Methodology: https://www.quebec.ca/sante/urgences