Deploy + verify playbook (production VPS)
Goal: deploy a known-good main and verify production matches policy.
Canonical references:
- Production runbook: ../../../deployment/production-single-vps.md
- Monitoring/CI gate: ../../monitoring-and-ci-checklist.md
- Baseline drift: ../../baseline-drift.md
Preconditions
- CI is green on the commit you intend to deploy.
- You are on the production VPS and can sudo.
- If an incident fix depends on repository changes, make sure those changes are already committed and pushed before you start the VPS procedure.
Procedure
- Identify the exact commit to deploy.
  - Prefer a pinned deploy ref for incident work so the VPS change is unambiguous.
  - Verify the commit is already present on origin/main before running the VPS deploy.
- Run the deploy gate (recommended one command):
  cd /opt/healtharchive && ./scripts/vps-deploy.sh --apply --baseline-mode live
  - Or pinned:
  cd /opt/healtharchive && ./scripts/vps-deploy.sh --apply --baseline-mode live --ref <GIT_SHA>
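Below is a minimal sketch of the "verify the commit is already present on origin/main" check. It assumes the remote is named origin and that your user can run git in /opt/healtharchive. The function name commit_on_origin_main is hypothetical. It relies only on git fetch, git rev-parse, and git merge-base --is-ancestor, which exits 0 when the first commit is an ancestor of (or equal to) the second.

```bash
# Hypothetical helper: succeed only if the commit is already reachable from
# origin/main. Assumes the remote is named "origin" and that your user can run
# git in the checkout (default: /opt/healtharchive).
commit_on_origin_main() {
  local sha="$1"
  local repo="${2:-/opt/healtharchive}"

  # Refresh origin/main so the check reflects what has actually been pushed.
  # fetch only updates remote-tracking refs; it does not touch the work tree.
  git -C "$repo" fetch --quiet origin main || return 1

  # Fail early if the SHA does not name a commit known to this clone.
  if ! git -C "$repo" rev-parse --verify --quiet "${sha}^{commit}" >/dev/null; then
    echo "unknown commit: $sha" >&2
    return 1
  fi

  # Exit status 0 means $sha is an ancestor of (or equal to) origin/main.
  if git -C "$repo" merge-base --is-ancestor "$sha" origin/main; then
    echo "ok: $sha is on origin/main"
  else
    echo "NOT on origin/main: $sha" >&2
    return 1
  fi
}
```

For example, from /opt/healtharchive, you can make the pinned deploy conditional on the check. Replace <GIT_SHA> with the commit you identified:
commit_on_origin_main <GIT_SHA> && ./scripts/vps-deploy.sh --apply --baseline-mode live --ref <GIT_SHA>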
Recommended wrapper (routine use):
cd /opt/healtharchive && ./scripts/vps-hetzdeploy.sh
This includes:
- DB migrations
- service restarts (API always; worker may be skipped during active crawls)
- baseline drift verification
- public surface verification
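After the deploy, you can double-check that the intended ref is active by comparing the checked-out commit with the one you meant to ship. This sketch assumes the deploy helper leaves the deployed commit checked out in /opt/healtharchive. The helper name deployed_ref_matches is hypothetical, and the deploy output remains the primary confirmation.

```bash
# Hypothetical helper: compare the commit checked out in the deploy directory
# with the one you intended to ship. Assumes the deploy helper leaves the
# deployed commit checked out in /opt/healtharchive.
deployed_ref_matches() {
  local want="$1"
  local repo="${2:-/opt/healtharchive}"
  local have want_full

  have="$(git -C "$repo" rev-parse HEAD)" || return 1
  # Expand a short SHA (or a ref such as origin/main) to a full commit ID.
  want_full="$(git -C "$repo" rev-parse --verify --quiet "${want}^{commit}")" || {
    echo "cannot resolve: $want" >&2
    return 1
  }

  if [ "$have" = "$want_full" ]; then
    echo "ok: $repo is at $have"
  else
    echo "MISMATCH: $repo is at $have, expected $want_full" >&2
    return 1
  fi
}
```

Usage after a pinned deploy:
deployed_ref_matches <GIT_SHA>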
If your change updates systemd unit templates or Prometheus alert rules, you can apply those as part of the deploy:
./scripts/vps-deploy.sh --apply --baseline-mode live --install-systemd-units --apply-alerting
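Before adding --apply-alerting, a quick pre-check can confirm that the webhook secret file this playbook names (/etc/healtharchive/observability/alertmanager_webhook_url) exists and is non-empty. This is a sketch and the function name is hypothetical. Run it as a user that can see that directory, because an unreadable directory also makes the check fail.

```bash
# Hypothetical pre-check before passing --apply-alerting: the webhook secret
# file named in this playbook must exist and be non-empty ([ -s ]).
alerting_configured() {
  local secret="${1:-/etc/healtharchive/observability/alertmanager_webhook_url}"
  if [ -s "$secret" ]; then
    echo "alerting configured: $secret present"
  else
    echo "alerting NOT configured: $secret missing, empty, or unreadable; omit --apply-alerting" >&2
    return 1
  fi
}
```

Usage:
alerting_configured && ./scripts/vps-deploy.sh --apply --baseline-mode live --install-systemd-units --apply-alerting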
If the public frontend is externally unavailable, you can deploy backend-only:
./scripts/vps-hetzdeploy.sh --mode backend-only
Optional: install the wrapper outside the repo so it never dirties /opt/healtharchive:
sudo ./scripts/vps-install-hetzdeploy.sh --apply
- Then run:
  hetzdeploy
  or
  hetzdeploy --mode backend-only
Notes:
- Do not run git pull separately as a substitute for the deploy helper. Use the deploy helper to fetch, install dependencies, run migrations, and restart the right services coherently.
- If the operator is following assistant-provided incident commands, do not continue to repo-dependent recovery steps until the deploy output confirms the intended ref is active.
- Prefer a real command over an alias. An alias runs in your current shell, so any set -euo pipefail it contains stays in effect for the rest of your interactive session.
- If hetzdeploy --mode backend-only fails with "syntax error near unexpected token", you probably still have an alias named hetzdeploy.
  - Check: type hetzdeploy
  - Remove: unalias hetzdeploy 2>/dev/null || true, then delete the alias line from your shell startup files.
- --apply-alerting requires alerting to be configured on the VPS (the webhook secret must be present at /etc/healtharchive/observability/alertmanager_webhook_url).
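To find the alias line you need to delete, a small helper can search common bash startup files. This is a sketch. The function name and the default file list (~/.bashrc, ~/.bash_profile, ~/.profile, ~/.bash_aliases) are assumptions, so pass any other files you source.

```bash
# Hypothetical helper: print file:line:text for every line that defines a
# `hetzdeploy` alias, so you know exactly which line to delete.
# The default file list is an assumption; pass other files you source.
find_hetzdeploy_alias_lines() {
  local files=("$@")
  if [ "${#files[@]}" -eq 0 ]; then
    files=("$HOME/.bashrc" "$HOME/.bash_profile" "$HOME/.profile" "$HOME/.bash_aliases")
  fi

  local f
  for f in "${files[@]}"; do
    # Skip files that do not exist; -H prints the file name, -n the line number.
    # grep exits 1 when nothing matches, which is not an error here.
    if [ -f "$f" ]; then
      grep -Hn -E '^[[:space:]]*alias[[:space:]]+hetzdeploy=' "$f" || true
    fi
  done
  return 0
}
```

Each match prints as file:line:text. Remove those lines, then open a new shell before running hetzdeploy again.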
If you are updating the replay banner/template or replay service config on a single-VPS deployment, include replay restart + banner install:
./scripts/vps-deploy.sh --apply --baseline-mode live --restart-replay
Crawl safety:
- If any jobs have status=running, the deploy helper skips restarting healtharchive-worker by default so it does not send SIGTERM to an active crawl.
- To force a worker restart (only when it is safe to interrupt the crawl):
  ./scripts/vps-deploy.sh --apply --baseline-mode live --force-worker-restart
- To keep the worker untouched regardless of job status:
  ./scripts/vps-deploy.sh --apply --baseline-mode live --skip-worker-restart
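After a deploy that may have skipped the worker, you can check whether healtharchive-worker actually restarted. The approach is to compare systemd's ActiveEnterTimestamp with the time you started the deploy. This sketch assumes GNU date and that the worker name used in this playbook is also its systemd unit name. The function name is hypothetical. It returns 0 if the unit restarted, 1 if it did not, and 2 if the timestamp could not be read.

```bash
# Hypothetical helper: report whether a systemd unit (default: the
# healtharchive-worker unit named in this playbook) entered the active state at
# or after a given Unix time. Requires GNU date to parse systemd's timestamp.
# Returns 0 = restarted, 1 = not restarted, 2 = could not tell.
worker_restarted_since() {
  local since_epoch="$1"
  local unit="${2:-healtharchive-worker}"
  local ts ts_epoch

  ts="$(systemctl show "$unit" --property=ActiveEnterTimestamp --value)" || return 2
  if [ -z "$ts" ]; then
    echo "$unit: no ActiveEnterTimestamp (inactive or unknown unit?)" >&2
    return 2
  fi

  # systemd prints e.g. "Tue 2024-01-02 03:04:05 UTC"; convert to epoch seconds.
  ts_epoch="$(date -d "$ts" +%s)" || return 2

  if [ "$ts_epoch" -ge "$since_epoch" ]; then
    echo "$unit restarted at $ts"
  else
    echo "$unit not restarted (active since $ts)"
    return 1
  fi
}
```

Usage:
deploy_start=$(date +%s)
./scripts/vps-deploy.sh --apply --baseline-mode live
worker_restarted_since "$deploy_start"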
If the deploy gate fails:
- Do not retry blindly.
- Read the failure output:
  - drift report artifacts under /srv/healtharchive/ops/baseline/
  - verifier output from verify_public_surface.py
- Fix the underlying mismatch (production state vs policy) or intentionally update policy.
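To find the drift report artifacts a failing run just wrote, listing the newest files under the baseline directory is usually enough. This is a sketch assuming GNU find. The function name and the default count are hypothetical, and the helper makes no assumption about artifact file names.

```bash
# Hypothetical helper: list the newest files under the baseline artifact
# directory named in this playbook, newest first. Requires GNU find (-printf).
latest_drift_artifacts() {
  local dir="${1:-/srv/healtharchive/ops/baseline}"
  local count="${2:-5}"
  # %T@ = modification time as epoch seconds; sort numerically, newest first,
  # then strip the timestamp column (cut keeps paths that contain spaces).
  find "$dir" -type f -printf '%T@ %p\n' \
    | LC_ALL=C sort -rn \
    | head -n "$count" \
    | cut -d ' ' -f 2-
}
```

Run latest_drift_artifacts and open the files it prints before deciding whether production or policy is wrong.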
Quick follow-ups (optional)
- Confirm timers/sentinels posture (if you operate automation):
./scripts/verify_ops_automation.sh
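Alongside verify_ops_automation.sh, systemctl list-timers shows the last and next run of each timer. This sketch assumes the automation timers share a name prefix. "healtharchive-" is a guess, so pass the real prefix if it differs. The function name is hypothetical.

```bash
# Hypothetical helper: show every systemd timer (active or not) whose name
# starts with a prefix. The default "healtharchive-" prefix is an assumption.
show_ops_timers() {
  local prefix="${1:-healtharchive-}"
  # The quoted glob reaches systemctl as a unit-name pattern; the shell does not expand it.
  systemctl list-timers --all --no-pager "${prefix}*"
}
```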