CI/CD Troubleshooting Guide

This guide helps debug GitHub Actions workflow failures in the dotfiles repository.

Quick Reference

| Workflow | Job | Common Issues | Fix |
| --- | --- | --- | --- |
| ci.yml | lint | ShellCheck errors in mise-tasks | Run mise run lint locally |
| ci.yml | docs-validate | Broken markdown links | Fix links, run lychee locally |
| ci.yml | docs-build | MkDocs build errors | Check Python deps, run mise run docs-build |
| ci.yml | build | Fast test failures | Run mise run test locally |
| ci.yml | test-server-deployment | Chezmoi apply fails | Test with chezmoi apply --dry-run |
| test-full.yml | test-full | Full test suite failures | Download artifact, run mise run test locally |
| docs.yml | deploy | Docs build/deploy fails | Check mkdocs.yml syntax |
| devcontainer-smoke.yml | test | Devcontainer setup fails | Check .devcontainer/setup.sh |

Understanding CI Workflows

Workflow: ci.yml (Main CI Pipeline)

Triggers: Push to main, pull requests
Duration: ~2-3 minutes
Philosophy: Fast essential tests only (see ADR-005)

Jobs:

  1. lint (~7s)
     • Runs ShellCheck on mise-tasks/* scripts
     • Common failures: SC2086, SC2034, SC2155 shellcheck errors
     • Fix: Run shellcheck mise-tasks/<script> locally

  2. docs-validate (~50s)
     • Validates markdown links with lychee
     • Common failures: 404 links, network timeouts
     • Fix: Run mise run docs-validate or check .lycheeignore

  3. docs-build (~25s)
     • Builds MkDocs documentation site
     • Uses Python 3.13 (pinned in mise.toml)
     • Common failures: Missing pages, invalid mkdocs.yml syntax
     • Fix: Run mise run docs-build locally

  4. build (~45s)
     • Main test job with fast test suite
     • Installs mise, runs linters, executes tests, runs doctor
     • Tests run: static, smoke, basic, env, doctor (NOT full suite)
     • Common failures: Cache misses, tool installation issues
     • Fix: Clear cache, run mise install && mise run test

  5. test-server-deployment (~30s)
     • Validates dotfiles-only mode (chezmoi without mise)
     • Common failures: Template errors, missing dependencies
     • Fix: Test chezmoi apply --dry-run in clean environment

  6. test-windows (~1m)
     • Runs Pester tests for PowerShell provisioning
     • Common failures: PowerShell version issues, UAC restrictions
     • Fix: Run Invoke-Pester tests/windows/

Workflow: docs.yml (Documentation Deployment)

Triggers: Push to main (docs changes), manual trigger
Duration: ~1-2 minutes

Jobs:

  1. deploy
     • Link validation + MkDocs gh-deploy
     • Common failures: Same as docs-build + deploy permissions
     • Fix: Check GitHub Pages settings, verify write permissions

Workflow: test-full.yml (Full Test Suite)

Triggers: Weekly (Monday 6:00 UTC), manual (workflow_dispatch)
Duration: ~10-15 minutes
Matrix: Ubuntu 22.04, 24.04

Jobs:

  1. test-full (per OS version)
     • Installs mise, runs full BATS test suite (all 450+ tests)
     • Uploads test result artifacts
     • Common failures: Platform-specific tests, timing issues
     • Fix: Check test-results artifact, run mise run test locally

Workflow: devcontainer-smoke.yml (Devcontainer Testing)

Triggers: Manual (workflow_dispatch)
Duration: ~10-15 minutes
Timeout: 45 minutes

Jobs:

  1. test
     • Builds devcontainer, runs setup.sh
     • Common failures: Timeout, mise installation issues
     • Fix: Test locally with VS Code devcontainers

Common CI Failures & Solutions

1. ShellCheck Lint Failures

Symptom:

Error: ShellCheck found issues in mise-tasks/doctor
SC2086: Double quote to prevent globbing

Cause: ShellCheck violations in mise task scripts

Fix:

# Run locally
shellcheck mise-tasks/*

# Check a single script
shellcheck mise-tasks/doctor

Prevention: Run mise run lint before committing
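
For readers unfamiliar with SC2086, the snippet below is a small illustration of why ShellCheck asks for double quotes. It is not taken from the repository's scripts; count_args and the file name are hypothetical and exist only to make the word splitting visible.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

file="my notes.txt"

# Unquoted expansion is split on whitespace (this is what SC2086 flags).
unquoted=$(count_args $file)

# Quoted expansion is passed as a single argument.
quoted=$(count_args "$file")

echo "unquoted: $unquoted, quoted: $quoted"
# prints "unquoted: 2, quoted: 1"
```

In most cases the fix for an SC2086 finding is simply to wrap the flagged expansion in double quotes.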


2. Link Validation Failures

Symptom:

Error: Broken links found
✗ [404] https://example.com/page

Cause: Dead links in markdown docs

Fix:

# Run locally (requires lychee)
mise run docs-validate

# Or run lychee directly on the docs
lychee docs/*.md

Exclusions: Add to .lycheeignore if link is intentionally external/example
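
A possible way to add an exclusion without creating duplicate entries is sketched below. It assumes .lycheeignore holds one regular expression per line; add_lychee_ignore is a hypothetical helper, not part of this repository or of lychee.

```shell
# Hypothetical helper: append a pattern to .lycheeignore only if it is
# not already present. Assumes one regular expression per line.
add_lychee_ignore() {
  pattern=$1
  file=${2:-.lycheeignore}
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a literal string.
  grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}
```

Example: add_lychee_ignore '^https://example\.com/' ignores every link under that host.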


3. MkDocs Build Failures

Symptom:

Error: Config file 'mkdocs.yml' is invalid
Error: Page 'missing.md' not found

Cause: Invalid mkdocs.yml syntax or missing pages

Fix:

# Test build locally
mise run docs-build

# Validate YAML
yamllint mkdocs.yml

# Check for missing files referenced in nav

Common issues:
  • Typo in page path (paths are case-sensitive)
  • File exists but the page is missing from nav
  • Invalid YAML indentation
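
The guide does not give a command for checking files referenced in nav, so the following is only a rough sketch. It assumes the default docs directory (docs/) and treats every .md path found in mkdocs.yml as a nav entry, which can produce false positives if .md paths appear elsewhere in the config. check_nav_pages is a hypothetical helper.

```shell
# Hypothetical helper: report .md paths mentioned in mkdocs.yml that do
# not exist under the docs directory. Paths with spaces are not handled.
check_nav_pages() {
  config=${1:-mkdocs.yml}
  docs_dir=${2:-docs}
  status=0
  for page in $(grep -oE '[A-Za-z0-9_./-]+\.md' "$config" | sort -u); do
    if [ ! -f "$docs_dir/$page" ]; then
      echo "missing: $docs_dir/$page"
      status=1
    fi
  done
  return "$status"
}
```

Run check_nav_pages from the repository root; a non-zero exit status means at least one referenced page was not found.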


4. Fast Test Suite Failures

Symptom:

✗ test shell loads successfully

Cause: Core functionality broken (tests: static, smoke, basic, env, doctor)

Fix:

# Run full test suite locally
mise run test

# Run specific test file
bats tests/smoke.bats

# Debug with verbose output
bats --verbose-run tests/smoke.bats

Important: CI runs ONLY the fast tests, not the full 450+ test suite


5. Server Deployment Failures

Symptom:

Error: chezmoi apply failed
Template error: undefined variable

Cause: Chezmoi template syntax error or missing data

Fix:

# Validate templates
chezmoi apply --dry-run --verbose

# Check template syntax
chezmoi execute-template < dot_zshrc.tmpl

Prevention: Pre-commit hook validates templates (if configured)
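
The hook itself is not shown in this guide. As a minimal sketch of what such a check might look like, the function below renders each template with chezmoi execute-template and reports the ones that fail. It assumes chezmoi is on PATH and only looks at *.tmpl files at the top level of the given directory; validate_templates is a hypothetical name.

```shell
# Hypothetical helper: try to render every top-level *.tmpl file and
# report the ones chezmoi cannot execute.
validate_templates() {
  dir=${1:-.}
  status=0
  for f in "$dir"/*.tmpl; do
    [ -e "$f" ] || continue   # no templates matched the glob
    if ! chezmoi execute-template < "$f" > /dev/null; then
      echo "template error: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling validate_templates from a pre-commit hook and exiting with its status would block commits that contain a broken template.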


6. Windows Provisioning Failures

Symptom:

Pester test failed: Chocolatey not installed

Cause: Windows-specific dependency issues

Fix:
  • Test on Windows machine or VM
  • Run: Invoke-Pester tests/windows/
  • Check PowerShell version: $PSVersionTable.PSVersion


7. Cache Issues

Symptom:

Cache miss for key: mise-tools-...
Installing mise tools (slow)

Cause: Cache key changed or cache expired

Fix:
  • Normal: First run after dependency update
  • Issue: Check cache key in workflow (based on mise.toml hash)
  • Clear cache: Go to Actions → Caches → Delete specific cache

Cache locations:
  • ~/.local/share/mise (Linux/macOS)
  • ~/.cache/mise (build cache)


Local Reproduction

Run CI Jobs Locally

Most CI failures can be reproduced locally using mise tasks:

# Lint (ShellCheck)
mise run lint

# Documentation validation
mise run docs-validate

# Documentation build
mise run docs-build

# Fast test suite (same as CI)
bats tests/static.bats tests/smoke.bats tests/basic.bats tests/env.bats tests/doctor.bats

# Full test suite (local only, NOT in CI)
mise run test

# Health check
mise run doctor
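
To run several of these commands in sequence and stop at the first failure, a small wrapper along the following lines may help. This is a sketch, not a script from the repository; run_steps is a hypothetical helper, and each step runs in a fresh sh, so shell functions and aliases are not visible to it.

```shell
# Hypothetical helper: run each command string in order and stop at the
# first one that fails, naming the failing step.
run_steps() {
  for step in "$@"; do
    printf '==> %s\n' "$step"
    if ! sh -c "$step"; then
      printf 'FAILED: %s\n' "$step" >&2
      return 1
    fi
  done
  printf 'all steps passed\n'
}
```

Example: run_steps "mise run lint" "mise run docs-validate" "mise run docs-build".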

Using act (GitHub Actions Locally)

Install act to run workflows locally:

# Run the main build job
act -j build

# Run a different job, e.g. lint
act -j lint

# With secrets
act -j build --secret-file .secrets

Limitations:
  • Large Docker images
  • Some GitHub-specific features unavailable
  • Cache behavior differs


CI vs Local Differences

Environment Differences

| Aspect | CI (GitHub Actions) | Local |
| --- | --- | --- |
| OS | Ubuntu 22.04 | Varies (WSL2, Arch, macOS) |
| Shell | bash (non-interactive) | zsh (interactive) |
| User | runner | Your username |
| HOME | /home/runner | /home/jer |
| PATH | Limited | Full user PATH |
| Test suite | Fast (5 files) | Full (450+ tests) |

Test Suite Differences

CI runs ONLY:
  • tests/static.bats
  • tests/smoke.bats
  • tests/basic.bats
  • tests/env.bats
  • tests/doctor.bats

Local runs ALL:
  • All 43 test files (450+ tests)
  • Platform-specific tests
  • Integration tests

Why? See TESTING.md and ADR-005: CI Pragmatism
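
To approximate the non-interactive bash, throwaway HOME, and limited PATH that CI uses, one option is to run a command in a scrubbed environment. The sketch below is an approximation only: real runners set many more variables, the PATH value is an assumption, and tools installed under your own home directory (possibly including mise) will not be found unless you extend it. run_like_ci and CI_HOME are hypothetical names.

```shell
# Hypothetical helper: run a command string in a non-interactive bash
# with an empty environment, a temporary HOME, and a minimal PATH.
run_like_ci() {
  env -i \
    HOME="${CI_HOME:-$(mktemp -d)}" \
    PATH="/usr/local/bin:/usr/bin:/bin" \
    bash --noprofile --norc -c "$1"
}
```

Example: run_like_ci 'bats tests/smoke.bats' can surface tests that silently depend on variables from your interactive shell.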


Debugging CI Failures

Step 1: Read the Logs

  1. Go to GitHub Actions tab
  2. Click failed workflow run
  3. Click failed job
  4. Expand failing step
  5. Look for error messages (usually near bottom)
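
If the GitHub CLI is available, the same logs can be pulled from a terminal. This is an optional alternative to the web UI, not something the repository requires; it assumes gh is installed and authenticated, and show_latest_failure is a hypothetical helper.

```shell
# Hypothetical helper: print the failed-step logs of the most recent
# failed run of a workflow (defaults to ci.yml).
show_latest_failure() {
  workflow=${1:-ci.yml}
  run_id=$(gh run list --workflow "$workflow" --status failure \
    --limit 1 --json databaseId --jq '.[0].databaseId')
  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "no failed runs found for $workflow"
    return 0
  fi
  gh run view "$run_id" --log-failed
}
```

Example: show_latest_failure test-full.yml shows the failing steps from the latest failed weekly run.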

Step 2: Reproduce Locally

# Check out the exact commit
git checkout <commit-hash>

# Install dependencies
mise install

# Run the failing command
mise run <task>

Step 3: Check Environment

Compare CI environment to local:
  • OS version (uname -a)
  • Shell version ($SHELL --version)
  • Tool versions (mise ls)
  • Python version (python --version)
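
To make the comparison easier, these details can be collected into a single report and diffed against the versions printed in the CI logs. The sketch below is one possible approach; ci_env_report is a hypothetical helper and the tool list is an assumption based on the tools named in this guide.

```shell
# Hypothetical helper: print OS, shell, and tool versions, one per line.
ci_env_report() {
  echo "os: $(uname -a)"
  echo "shell: ${SHELL:-unknown}"
  for tool in mise python bats shellcheck; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Keep only the first line of each tool's version output.
      echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
    else
      echo "$tool: not installed"
    fi
  done
}
```

Example: ci_env_report > local-env.txt, then compare the file with the versions shown in the failing job's setup steps.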

Step 4: Inspect Workflow

Read the workflow file to understand what CI is doing:
  • .github/workflows/ci.yml
  • .github/workflows/docs.yml
  • .github/workflows/devcontainer-smoke.yml


CI Performance

Expected Timings

| Job | Expected Time | Acceptable Range |
| --- | --- | --- |
| lint | ~7s | 5-15s |
| docs-validate | ~50s | 30s-1m |
| docs-build | ~25s | 15-45s |
| build | ~45s | 30s-1m |
| test-server-deployment | ~30s | 20-50s |
| test-windows | ~1m | 45s-2m |

Total CI time: ~2-3 minutes

Performance Regressions

If CI times increase significantly:

  1. Check for new dependencies (slow installs)
  2. Verify cache is working (actions/cache logs)
  3. Look for new tests added to fast suite
  4. Check network issues (GitHub status page)


GitHub Actions Permissions

Required Permissions

ci.yml:
  • contents: read - Checkout code
  • pull-requests: write - Comment on PRs (step summary)

docs.yml:
  • contents: write - Push to gh-pages branch

devcontainer-smoke.yml:
  • contents: read - Checkout code

Secrets

Currently used:
  • GITHUB_TOKEN - Automatic, provided by GitHub
  • (No additional secrets required)

For age encryption (if needed):
  • Add AGE_SECRET_KEY to repository secrets
  • Reference in workflow: ${{ secrets.AGE_SECRET_KEY }}


Getting Help

CI is failing but works locally

  1. Check environment differences (above)
  2. Run exact CI command locally
  3. Check for timing/race conditions
  4. Review recent dependency updates (Renovate PRs)

CI passes but local fails

  1. Ensure dependencies are up to date: mise install
  2. Check for uncommitted changes: git status
  3. Verify tool versions match: mise ls vs CI logs
  4. Clear local caches: rm -rf ~/.cache/mise

Persistent failures

  1. File an issue with:
     • Link to failed run
     • Error logs (paste or screenshot)
     • Local reproduction steps (if possible)
     • Environment details (mise ls, uname -a)

  2. Check ROADMAP.md for known issues



Workflow Files Reference

ci.yml

  • Path: .github/workflows/ci.yml
  • Purpose: Main CI pipeline for PR validation
  • Key jobs: lint, docs-validate, docs-build, build, test-server-deployment, test-windows

docs.yml

  • Path: .github/workflows/docs.yml
  • Purpose: Documentation deployment to GitHub Pages
  • Triggers: Push to main (docs changes), manual

test-full.yml

  • Path: .github/workflows/test-full.yml
  • Purpose: Weekly comprehensive test run across Ubuntu versions
  • Triggers: Weekly (Monday), manual
  • Matrix: Ubuntu 22.04, 24.04

devcontainer-smoke.yml

  • Path: .github/workflows/devcontainer-smoke.yml
  • Purpose: Validate devcontainer setup
  • Triggers: Manual only