Roadmaps
Current backlog
- Future roadmap (what is not implemented yet): roadmap.md
Implementation plans (active)
Implementation plans live directly under docs/planning/ while they are active. When complete, move them to docs/planning/implemented/ and date them.
Active plans:
- Admissions strengthening (OMSAS ABS + CanMEDS, ~12 weeks): 2026-02-admissions-strengthening-plan.md
- Hot-path staleness root-cause investigation: 2026-02-06-hotpath-staleness-root-cause-investigation.md
- Public search and CIHR follow-through: 2026-05-05-public-search-and-cihr-followthrough.md
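The move-when-complete convention for plans could be sketched as a small helper. This is a minimal illustration, not tooling that exists in the repo: the function names `archived_plan_path` and `archive_plan` are hypothetical, and the rule that "dating" a plan means ensuring a `YYYY-MM-DD-` filename prefix is an assumption inferred from the filenames under implemented/.

```python
import re
import shutil
from datetime import date
from pathlib import Path

# Assumption: an archived plan is "dated" when its filename starts with YYYY-MM-DD-.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def archived_plan_path(plan: Path, completed: date) -> Path:
    """Return the destination under implemented/ for a completed plan.

    Hypothetical helper: keeps an existing full date prefix, otherwise
    prepends the completion date.
    """
    name = plan.name
    if not DATE_PREFIX.match(name):
        name = f"{completed.isoformat()}-{name}"
    return plan.parent / "implemented" / name


def archive_plan(plan: Path, completed: date) -> Path:
    """Move a completed plan into the implemented/ directory next to it."""
    dest = archived_plan_path(plan, completed)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(plan), str(dest))
    return dest
```

In a git checkout, `git mv` would probably be preferable to `shutil.move` so the rename is staged in the same step. Links in this index would still need to be updated by hand.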
Current priority sequence
Treat the following as the current "what's next" order across roadmap docs:
- Close the post-CIHR public surface and recurrence-prevention follow-through.
  - All 2026 annual jobs are indexed, search-ready, and research-ready as of 2026-05-05.
  - Public search latency fixes, verifier fallback, raw/replay verification, and WARC-complete/ZIM-finalization recurrence prevention are deployed and live-verified.
  - CIHR failed URL review is complete: 25 page/route URLs already had exact job 8 snapshot coverage, and the lone uncovered render-asset image was accepted as a non-page gap.
  - Remaining work: optionally revisit broad q=...&view=pages DB/index-plan tuning if repeated warm-cache samples exceed the desired response target.
  - Canonical plan: 2026-05-05-public-search-and-cihr-followthrough.md
  - Canonical tracker: ../operations/healtharchive-ops-roadmap.md
- After the crawl/ops path is stabilized, the main project emphasis is the active admissions-strengthening plan.
  - That plan is the canonical home for the next external-validation, methods-paper, and dataset-release work.
  - Canonical plan: 2026-02-admissions-strengthening-plan.md
- Any docs-site migration planning should use the archived prep inventory as the starting point, but the actual generator swap remains a separate change series.
  - Canonical reference: implemented/2026-04-15-zensical-migration-prep.md
Operator Follow-Through (Maintenance Window)
Some plans are "implemented in repo" but still require a short, operator-run maintenance step on the VPS.
Current known items:
- CIHR scope/content-cost follow-through:
  - Current state: job 8 completed WARC capture, was manually accepted after a ZIM finalization failure, and indexed successfully with 557972 pages. The CIHR annual edition is research-ready, and the public verifier reaches search, snapshot metadata, raw HTML, replay, and frontend checks.
  - Failed URL review is complete; no targeted follow-up capture is needed for this incident.
  - Historical plan: implemented/2026-03-23-annual-crawl-content-cost-and-scope-diagnosis.md
- PHAC annual-crawl policy follow-up after the 2026-03-23 canada.ca incident:
  - Current state: job 7 indexed successfully on 2026-04-29 through the labeled playwright_warc fallback path, and the PHAC annual edition report is research-ready.
  - Policy decision: keep PHAC Browsertrix-first with labeled playwright_warc fallback for now, and keep the temporary high-churn exclusions until a separate live verification proves those paths are stable.
  - No targeted PHAC recrawl is needed for the 2026 annual edition.
  - Status tracking + next-step guidance: ../operations/healtharchive-ops-roadmap.md
- Annual output-dir mount topology conversion (direct sshfs mounts → bind mounts):
  - Current state: completed on 2026-05-06 for annual jobs 6, 7, and 8.
  - Verification: one Storage Box sshfs process, hot/cold directory identity matched, annual status remained indexed=3, and replay smoke returned 200 for HC, PHAC, and CIHR.
  - Status tracking: ../operations/healtharchive-ops-roadmap.md
- Preserved VPS branch follow-up:
  - Current state: prod-pre-a3e0dece was reviewed against deployed ca25d8ea.
  - Decision: do not merge or cherry-pick it; the branch is older than current deployed state and would remove newer annual edition, replay, search, and incident documentation work.
  - Next action: delete the branch from the VPS after the roadmap update is deployed.
Implemented plans (history)
- Implemented plans archive: implemented/README.md
- Annual edition recovery handoff docs: implemented/2026-04-29-annual-edition-recovery-handoff-docs.md
- Repo audit truth maintenance: implemented/2026-04-24-repo-audit-truth-maintenance.md
- Frontend + backend monorepo consolidation: implemented/2026-04-14-healtharchive-monorepo-consolidation-plan.md
- Monorepo phase 0 inventory and execution checklist: implemented/2026-04-14-healtharchive-monorepo-phase0-inventory.md
- Annual crawl content-cost and scope diagnosis: implemented/2026-03-23-annual-crawl-content-cost-and-scope-diagnosis.md
- Crawl operability (locks, writability, retry controls): implemented/2026-02-06-crawl-operability-locks-and-retry-controls.md
- Crawl health remediation (scope regex, circuit breaker, dep separation, alerts): implemented/2026-02-25-crawl-health-remediation.md
- Automation-first crawl alerting and dashboarding: implemented/2026-02-23-automation-first-crawl-alerting-and-dashboarding.md
- Alerting noise reduction + routing tuning: implemented/2026-02-19-alerting-noise-reduction-and-routing-tuning.md
- Operational resilience improvements: implemented/2026-02-01-operational-resilience-improvements.md
- Deploy workflow hardening (single VPS): implemented/2026-02-07-deploy-workflow-hardening.md
- CI schema + governance guardrails: implemented/2026-02-06-ci-schema-and-governance-guardrails.md
- Storage watchdog observability hardening: implemented/2026-02-06-storage-watchdog-observability-hardening.md
- Disk usage investigation (48GB discrepancy): implemented/2026-02-01-disk-usage-investigation.md
- WARC discovery consistency improvements (partial): implemented/2026-01-29-warc-discovery-consistency.md
- WARC manifest verification: implemented/2026-01-29-warc-manifest-verification.md
- Patch-job-config CLI + integration tests: implemented/2026-01-28-patch-job-config-and-integration-tests.md
- archive_tool hardening + ops improvements: implemented/2026-01-27-archive-tool-hardening-and-ops-improvements.md
- Annual crawl throughput and WARC-first artifacts: implemented/2026-01-23-annual-crawl-throughput-and-artifacts.md
- Infra-error retry storms + Storage Box hot-path resilience: implemented/2026-01-24-infra-error-and-storage-hotpath-hardening.md
- SLA and service commitments (v1): implemented/2026-01-17-sla-and-service-commitments.md
- Test coverage: critical business logic: implemented/2026-01-17-test-coverage-critical-business-logic.md
- Disaster recovery and escalation procedures: implemented/2026-01-17-disaster-recovery-and-escalation-procedures.md
- Operational hardening: tiering alerting + incident follow-ups: implemented/2026-01-17-ops-tiering-alerting-and-incident-followups.md
- Search ranking + snippet quality iteration (v3): implemented/2026-01-03-search-ranking-and-snippets-v3.md
- Storage Box / sshfs stale mount recovery + integrity: implemented/2026-01-08-storagebox-sshfs-stale-mount-recovery-and-integrity.md
Historical context
- HealthArchive 6-Phase Upgrade Roadmap (2025; archived): implemented/2025-12-24-6-phase-upgrade-roadmap-2025.md