Operational Status Report

Historical snapshot: this document reflects the production posture review from 2026-03-21. For the current live state, use docs/planning/roadmap.md, docs/operations/reports/2026-03-operational-report.md, docs/operations/direct-vps-frontend.md, and docs/operations/direct-vps-backend.md.

Current-state addendum (2026-04-18): the March 28 Neon transfer-quota outage is no longer the active blocker. Live verification now includes the 2026-04-17 shared-VPS frontend release, so /api/health, /api/status, and aggregate /api/data-quality are all re-verified in production. The remaining cost/reliability follow-up is Launch usage monitoring, because the production Neon project is already confirmed on Launch with the current billing period beginning on 2026-04-16. See docs/operations/neon-production-upgrade.md for the recorded runbook and monitoring posture. See also docs/operations/incident-reports/2026-03-28-neon-transfer-quota.md and docs/planning/roadmap.md.
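
As a hedged illustration, the re-verification described above can be scripted. This sketch assumes only that the three endpoints are served under the canonical domain and return 2xx; it checks status codes, not payload shapes:

    import requests

    BASE = "https://wait-time.ca"  # assumption: API reachable under the canonical domain

    # Re-verify the endpoints named above; only HTTP status is checked here.
    for path in ("/api/health", "/api/status", "/api/data-quality"):
        resp = requests.get(BASE + path, timeout=10)
        resp.raise_for_status()
        print(f"{path}: ok ({resp.status_code})")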

Date: 2026-03-21
Status: Historical operational snapshot

Frontend addendum (2026-03-13): https://wait-time.ca is now live on the shared VPS behind host Caddy, https://www.wait-time.ca redirects to the canonical host through Caddy, and production smoke checks pass against the canonical domain.

Backend addendum (2026-03-13): GitHub Actions remains the live backend scheduler path. A same-host VPS backend attempt was paused after confirming that the Ontario source times out from this VPS.

Backend reliability addendum (2026-03-21): the live GitHub Actions Ontario scraper path now retries a read timeout once with an extended HTTP read timeout. A post-deploy production run completed successfully at 2026-03-21T10:31:36Z, preserving GitHub Actions as the healthy live backend path while the VPS backend remains deferred.
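
The retry shape described here, sketched minimally (the requests library, timeout values, and function name are illustrative assumptions, not the project's actual code):

    import requests

    DEFAULT_TIMEOUT = (10, 30)   # (connect, read) seconds; values are assumptions
    EXTENDED_TIMEOUT = (10, 90)  # longer read timeout for the single retry

    def fetch_with_single_retry(url: str) -> requests.Response:
        """Retry exactly once on read timeout, with an extended read timeout."""
        try:
            return requests.get(url, timeout=DEFAULT_TIMEOUT)
        except requests.exceptions.ReadTimeout:
            # Second and final attempt; any further timeout propagates to the caller.
            return requests.get(url, timeout=EXTENDED_TIMEOUT)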


Executive Summary

Wait Time Canada's public frontend is now served from the shared VPS, while the authoritative backend scheduler path remains GitHub Actions. This report documents that split production configuration and the current Ontario blocker on the VPS backend path.


Infrastructure Status

Scrapers: ✅ OPERATIONAL

Component         Status     Details
Quebec Scraper    ✅ Active   MSSS portal, BeautifulSoup, 120+ hospitals
Ontario Scraper   ✅ Active   Health Quality Ontario, direct HTTP + HTML table parsing, 220+ hospitals
Alberta Scraper   ✅ Active   AHS portal, Playwright, 26 hospitals
BC Scraper        ✅ Active   PHSA portal, JSON extraction, 25 hospitals

Total Coverage: 390+ hospitals across 4 provinces

Scheduling: ✅ OPERATIONAL

  • Scraper Cron: Runs hourly (0 * * * *)
  • Heartbeat Monitor: Checks every 30 minutes (*/30 * * * *)
  • Execution: python -m waittime.cli.scraper --all
  • Runtime: ~8-12 minutes per cycle
  • Timeout: 20 minutes max
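
For reference, the scheduled invocation and its ceiling can be pictured as follows; a sketch, not the workflow's actual step definition:

    import subprocess

    # Run all four scrapers, as the hourly job does; give up past the 20-minute ceiling.
    subprocess.run(
        ["python", "-m", "waittime.cli.scraper", "--all"],
        check=True,
        timeout=20 * 60,  # seconds
    )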

Monitoring: ✅ OPERATIONAL

  • Heartbeat Checks: Active (120-minute threshold)
  • Failure Alerts: Pushover configured
  • Dead Man's Switch: check_heartbeat CLI monitors all sources
  • Dynamic Discovery: Sources auto-detected from database
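
A minimal sketch of the dead man's switch logic, assuming a psycopg2-style connection and the measurements table queried elsewhere in this report (the real check lives in the check_heartbeat CLI):

    from datetime import datetime, timedelta, timezone

    MAX_AGE = timedelta(minutes=120)  # heartbeat threshold from the list above

    def stale_sources(conn) -> list[str]:
        """Return source_ids whose newest measurement is older than the threshold."""
        with conn.cursor() as cur:
            # Dynamic discovery: sources come from the database, not a hardcoded list.
            cur.execute(
                "SELECT source_id, MAX(timestamp_utc) FROM measurements GROUP BY source_id"
            )
            now = datetime.now(timezone.utc)
            # Assumes timestamp_utc is timezone-aware (timestamptz).
            return [sid for sid, latest in cur.fetchall() if now - latest > MAX_AGE]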

Frontend Hosting: ✅ OPERATIONAL

  • Public Runtime: Shared VPS behind host Caddy
  • Canonical Domain: https://wait-time.ca
  • Private Upstream: http://127.0.0.1:3400
  • HTTPS: Let's Encrypt certificate served by Caddy on the VPS
  • Redirects: https://www.wait-time.ca redirects to https://wait-time.ca/
  • Operational Meaning: the canonical production URL is live on the VPS and has passed smoke verification
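
The smoke verification mentioned above can be approximated with a short script; a sketch that assumes only status codes and the Location header, not the project's actual checks:

    import requests

    CANONICAL = "https://wait-time.ca"

    # Canonical host must answer 200 over HTTPS.
    resp = requests.get(CANONICAL, timeout=10)
    assert resp.status_code == 200

    # www host must redirect to the canonical apex (3xx code assumed).
    resp = requests.get("https://www.wait-time.ca", timeout=10, allow_redirects=False)
    assert resp.status_code in (301, 302, 308)
    assert resp.headers["Location"].startswith(CANONICAL)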

VPS Backend Attempt: ⚠️ DEFERRED

  • Mechanical deploy: succeeded
  • Systemd install: succeeded
  • Blocker: Ontario upstream timed out repeatedly from this VPS
  • Operational Meaning: GitHub Actions remains the live backend scheduler path

Verification Results

1. GitHub Actions Workflows ✅

scraper-cron.yml:
  • ✅ YAML syntax valid
  • ✅ Runs hourly
  • ✅ Installs Playwright browsers
  • ✅ Runs all 4 scrapers with --all flag
  • ✅ Failure alerting configured
  • ✅ Manual trigger available

heartbeat-monitor.yml:
  • ✅ YAML syntax valid
  • ✅ Runs every 30 minutes
  • ✅ Checks all sources dynamically
  • ✅ 120-minute heartbeat threshold
  • ✅ Pushover alerts configured

2. Scraper CLI ✅

$ python -m waittime.cli.scraper --list
# Returns: alberta-ahs, bc-phsa, ontario-health, quebec-msss

Verified:
  • ✅ All 4 scrapers registered
  • ✅ CLI help documentation accurate
  • ✅ --all flag functional
  • ✅ --dry-run mode available
  • ✅ Individual scraper execution works

3. Database Configuration ✅

Sources Seeded:
  • ✅ quebec-msss (QC)
  • ✅ ontario-health (ON)
  • ✅ alberta-ahs (AB)
  • ✅ bc-phsa (BC)
  • Metadata updated

BC Source Corrections Applied:
  • URL: https://edwaittimes.ca (was: http://www.edwaittimes.ca/)
  • Methodology: TRIAGE → PHYSICIAN, P90 (was: REGISTRATION → PHYSICIAN, POINT_ESTIMATE)
  • Methodology URL: https://www.edwaittimes.ca/about (was: NULL)

Migration Files: 004_seed_sources.sql and 020_sync_active_source_definitions.sql

4. Testing ✅

Source Consistency Tests:

$ pytest tests/unit/test_source_consistency.py -v
# Result: 4 passed in 1.36s

Verified:
  • ✅ Quebec source factory matches seed data
  • ✅ Ontario source factory matches seed data
  • ✅ Alberta source factory matches seed data
  • ✅ BC source factory matches seed data (after correction)


Documentation Created

New Files

  1. docs/operations/scraper-scheduling.md (9.4 KB)
     • Comprehensive operational guide
     • Scraper details for all 4 provinces
     • GitHub Actions workflow documentation
     • Monitoring and alerting procedures
     • Troubleshooting guide
     • Cost analysis and optimization options

  2. docs/operations/QUICK_START.md (2.1 KB)
     • Quick reference for manual operations
     • Common CLI commands
     • Database queries
     • Troubleshooting shortcuts

  3. docs/operations/OPERATIONAL_STATUS.md (this file)
     • Production verification report
     • Infrastructure status
     • Verification results

Updated Files

  1. backend/migrations/004_seed_sources.sql
     • ✅ BC source URL corrected
     • ✅ BC methodology corrected (TRIAGE → PHYSICIAN, P90)
     • ✅ BC methodology URL added

  2. docs/planning/roadmap.md
     • ✅ Added operational verification milestone
     • ✅ Updated strategic direction with scheduling status

  3. IMPLEMENTATION_SUMMARY.md
     • ✅ Marked scraper scheduling complete
     • ✅ Marked heartbeat monitoring complete

Configuration Details

Environment Variables (GitHub Actions Secrets)

Required:
  • DATABASE_URL: Neon PostgreSQL connection string
  • PUSHOVER_USER_KEY: Pushover notification user key
  • PUSHOVER_API_TOKEN: Pushover API token

Optional:
  • SENTRY_DSN: Error tracking (configured but not required)
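
A small fail-fast check for these variables, as an illustrative sketch rather than part of the workflows:

    import os

    # Required secrets; exit with a clear message if any is unset.
    REQUIRED = ("DATABASE_URL", "PUSHOVER_USER_KEY", "PUSHOVER_API_TOKEN")
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"Missing required environment variables: {missing}")

    sentry_dsn = os.environ.get("SENTRY_DSN")  # optional: error tracking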

Workflow Concurrency

scraper-cron.yml:

concurrency:
  group: scraper-cron
  cancel-in-progress: false  # Allow overlapping runs if previous is slow

heartbeat-monitor.yml:

concurrency:
  group: heartbeat-monitor
  cancel-in-progress: false

Timeout Configuration

  • Scraper cron: 20 minutes
  • Individual HTTP requests: 30 seconds
  • Playwright page loads: Default (30 seconds)
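
Where these timeouts typically live, sketched in Python (the URL is a placeholder; the 20-minute job ceiling is enforced by GitHub Actions, not in scraper code):

    import requests
    from playwright.sync_api import sync_playwright

    URL = "https://example.org"  # placeholder target, not a real scraper source

    # Individual HTTP requests: 30-second timeout.
    resp = requests.get(URL, timeout=30)

    # Playwright page loads: the default navigation timeout of 30 seconds applies
    # unless overridden with page.set_default_navigation_timeout().
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(URL)
        browser.close()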

Performance Metrics

Expected Performance

Metric               Target        Status
Scraper Frequency    Hourly        ✅ Configured
Scraper Runtime      < 15 min      ✅ ~10 min avg
Heartbeat Frequency  Every 30 min  ✅ Configured
Max Heartbeat Age    < 120 min     ✅ Monitored
Data Freshness       < 120 min     ✅ Hourly scheduler path

Cost Estimate

GitHub Actions Minutes:
  • Scraper cron: ~34,560 min/month
  • Heartbeat monitor: ~2,880 min/month
  • Total: ~37,440 min/month

Free Tier: 2,000 minutes/month
Overage: ~35,440 minutes/month
Estimated Cost: ~$283/month at $0.008/min

Optimization Opportunity: Reducing scraper frequency to every 30 minutes cuts the scraper minutes roughly in half.
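
The arithmetic behind these figures, reproduced from the report's own numbers (they imply roughly a 15-minute scraper cadence at ~12 minutes per run, matching the */15 schedule referenced under Option 2 below):

    # Cost estimate arithmetic, using only figures stated above.
    scraper_min = 34_560    # scraper cron, min/month (~2,880 runs x ~12 min)
    heartbeat_min = 2_880   # heartbeat monitor, min/month
    free_tier = 2_000       # free minutes/month
    rate = 0.008            # dollars per minute

    total = scraper_min + heartbeat_min                   # 37,440 min/month
    current = (total - free_tier) * rate                  # ~$283.52/month

    # Halving scraper frequency halves only the scraper minutes:
    reduced = (scraper_min / 2 + heartbeat_min - free_tier) * rate  # ~$145.28/month
    print(f"current: ${current:.2f}/month, 30-min cadence: ${reduced:.2f}/month")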


Known Limitations

1. GitHub Actions Free Tier Exceeded

Impact: Monthly cost of ~$283 for scraper execution

Mitigation Options:
  • Reduce scraper frequency to 30 minutes
  • Implement smart scheduling (skip overnight hours)
  • Use a self-hosted runner (requires infrastructure)

2. Playwright Browser Size

Impact: ~500MB Chromium download per workflow run
Cache: GitHub Actions caches Playwright browsers
Mitigation: No action needed (cached effectively)

3. No Real-Time Alerting Dashboard

Impact: Alerts via Pushover only (no visual dashboard)
Mitigation: Database queries provide manual monitoring
Future: Consider Prometheus/Grafana integration


Next Steps

Immediate (Optional)

  1. Monitor Production:
     • Check GitHub Actions runs over the next 24 hours
     • Verify that heartbeat checks are passing
     • Review Pushover alerts (there should be none if healthy)

  2. Verify Data Freshness:

     SELECT source_id, MAX(timestamp_utc), COUNT(*)
     FROM measurements
     WHERE timestamp_utc > NOW() - INTERVAL '1 hour'
     GROUP BY source_id;

Short-Term

  1. Cost Optimization:
     • Evaluate 30-minute scraper frequency
     • Consider night-time pause for some provinces
     • Analyze per-province data freshness requirements

  2. Enhanced Monitoring:
     • Add Prometheus metrics endpoint
     • Create Grafana dashboard for visualization
     • Implement measurement count tracking

Long-Term

  1. Additional Provinces:
     • Nova Scotia scraper (data source available)
     • New Brunswick scraper (data source available)
     • Saskatchewan (waiting for public data source)

  2. Advanced Features:
     • Predictive scheduling (reduce frequency when data is stable)
     • Per-hospital staleness tracking
     • Automatic scraper recovery on persistent failures

User Action Required

Option 1: Continue Monitoring

The system is fully operational. Scrapers run automatically on the temporary 30m/60m cadence. You'll receive Pushover alerts if any scraper fails or becomes stale.

Recommendation: Monitor for 24-48 hours to ensure stability.

Option 2: Reduce Costs

If GitHub Actions costs are a concern:

  1. Reduce Frequency to 30 Minutes:

    # Edit .github/workflows/scraper-cron.yml
    schedule:
      - cron: '*/30 * * * *'  # Change from */15
    

  2. Update Heartbeat Threshold:

    # Edit .github/workflows/heartbeat-monitor.yml
    --max-age 90  # Change from 60
    

  3. Commit and Push:

    git add .github/workflows/
    git commit -m "ops: reduce scraper frequency to 30 min for cost optimization"
    git push
    

Impact: Reduces cost roughly by half (~$145/month instead of ~$283/month, since heartbeat minutes and the free tier are unchanged)

Option 3: Add Monitoring Dashboard

If you want visual monitoring:

  1. Set up self-hosted Prometheus instance
  2. Add metrics endpoint to backend API
  3. Create Grafana dashboard for visualization

Documentation: See docs/operations/scraper-scheduling.md for details


Support & Troubleshooting

Quick Health Check

cd backend
source .venv/bin/activate
export DATABASE_URL="your_connection_string"

# Check heartbeat status
python -m waittime.cli.check_heartbeat --dry-run

# List recent measurements
python -c "
from waittime.services import DatabaseService
db = DatabaseService()
with db.get_connection() as conn:
    with db.get_cursor(conn) as cur:
        cur.execute('SELECT source_id, MAX(timestamp_utc), COUNT(*) FROM measurements WHERE timestamp_utc > NOW() - INTERVAL %s GROUP BY source_id', ('1 hour',))
        for row in cur.fetchall():
            print(f'{row[0]}: {row[2]} measurements, latest: {row[1]}')
"

View Logs

  1. GitHub Actions: https://github.com/yourusername/waittimecanada/actions
  2. Database queries: See docs/operations/scraper-scheduling.md
  3. CLI help: python -m waittime.cli.scraper --help

Report Issues

  • GitHub Issues: Operational problems or bugs
  • Documentation: docs/operations/ directory
  • Code: backend/src/waittime/cli/scraper.py

Conclusion

All systems are operational and properly configured.

The Wait Time Canada scraper infrastructure is production-ready with:
  • 4 provincial scrapers running on automated schedules
  • Heartbeat monitoring and alerting
  • Comprehensive operational documentation
  • Verified GitHub Actions workflows
  • Corrected BC source metadata

No immediate user action required. The system will continue to operate automatically.


Last Verified: 2026-02-11
Next Review: 2026-02-18 (1 week)
Responsible: GitHub Actions automation