Skip to content

Automation maintenance playbook (systemd timers)

Goal: keep automation boring, observable, and explicitly controlled.

Canonical references:

  • systemd unit templates + enable/rollback: ../../../deployment/systemd/README.md
  • Verification ritual: ../../automation-verification-rituals.md

Install/update templates (after repo updates)

On the VPS:

  • cd /opt/healtharchive
  • sudo ./scripts/vps-install-systemd-units.sh --apply --restart-worker

Bootstrap ops directories (one-time)

If /srv/healtharchive/ops/ is not prepared:

  • sudo ./scripts/vps-bootstrap-ops-dirs.sh

Enablement controls (sentinel files)

Automation is intentionally gated by sentinel files under /etc/healtharchive/.

Follow the enable/rollback steps in ../../../deployment/systemd/README.md.

Verify posture

  • ./scripts/verify_ops_automation.sh
  • Spot-check logs:
  • journalctl -u <service> -n 200

Storage watchdog cadence (monthly)

For stale-mount watchdog reliability, include this in the periodic automation review:

  1. Re-run a safe dry-run watchdog drill:
  2. ../storage/storagebox-sshfs-stale-mount-drills.md (Section 1)
  3. Re-run the safe persistent failed-apply alert-condition drill:
  4. ../storage/storagebox-sshfs-stale-mount-drills.md (Section 2)
  5. Review watchdog state + key metrics:
  6. /srv/healtharchive/ops/watchdog/storage-hotpath-auto-recover.json
  7. healtharchive_storage_hotpath_auto_recover_last_apply_ok
  8. healtharchive_storage_hotpath_auto_recover_apply_total
  9. If HealthArchiveStorageHotpathApplyFailedPersistent fired recently, follow:
  10. ../storage/storagebox-sshfs-stale-mount-recovery.md

Burn-in helper command (safe, read-only summary):

  • python3 scripts/vps-storage-watchdog-burnin-report.py --window-hours 168 --json