CI/CD Troubleshooting Guide
This guide helps debug GitHub Actions workflow failures in the dotfiles repository.
Quick Reference
| Workflow | Job | Common Issues | Fix |
|---|---|---|---|
| `ci.yml` | lint | ShellCheck errors in `mise-tasks` | Run `mise run lint` locally |
| `ci.yml` | docs-validate | Broken markdown links | Fix links, run lychee locally |
| `ci.yml` | docs-build | MkDocs build errors | Check Python deps, run `mise run docs-build` |
| `ci.yml` | build | Fast test failures | Run `mise run test` locally |
| `ci.yml` | test-server-deployment | Chezmoi apply fails | Test with `chezmoi apply --dry-run` |
| `ci.yml` | test-windows | Pester test failures | Run `Invoke-Pester tests/windows/` |
| `test-full.yml` | test-full | Full test suite failures | Download artifact, run `mise run test` locally |
| `docs.yml` | deploy | Docs build/deploy fails | Check `mkdocs.yml` syntax |
| `devcontainer-smoke.yml` | test | Devcontainer setup fails | Check `.devcontainer/setup.sh` |
Understanding CI Workflows
Workflow: ci.yml (Main CI Pipeline)
- Triggers: Push to `main`, pull requests
- Duration: ~2-3 minutes
- Philosophy: Fast essential tests only (see ADR-005)
Jobs:
- lint (~7s)
  - Runs ShellCheck on `mise-tasks/*` scripts
  - Common failures: ShellCheck warnings SC2086, SC2034, SC2155
  - Fix: Run `shellcheck mise-tasks/<script>` locally
- docs-validate (~50s)
  - Validates markdown links with lychee
  - Common failures: 404 links, network timeouts
  - Fix: Run `mise run docs-validate` or check `.lycheeignore`
- docs-build (~25s)
  - Builds the MkDocs documentation site
  - Uses Python 3.13 (pinned in `mise.toml`)
  - Common failures: Missing pages, invalid `mkdocs.yml` syntax
  - Fix: Run `mise run docs-build` locally
- build (~45s)
  - Main test job, running the fast test suite
  - Installs mise, runs linters, executes tests, runs doctor
  - Tests run: static, smoke, basic, env, doctor (NOT the full suite)
  - Common failures: Cache misses, tool installation issues
  - Fix: Clear the cache, then run `mise install && mise run test`
- test-server-deployment (~30s)
  - Validates dotfiles-only mode (chezmoi without mise)
  - Common failures: Template errors, missing dependencies
  - Fix: Test `chezmoi apply --dry-run` in a clean environment
- test-windows (~1m)
  - Runs Pester tests for PowerShell provisioning
  - Common failures: PowerShell version issues, UAC restrictions
  - Fix: Run `Invoke-Pester tests/windows/`
Workflow: docs.yml (Documentation Deployment)
- Triggers: Push to `main` (docs changes), manual trigger
- Duration: ~1-2 minutes
Jobs:
- deploy
  - Link validation + `mkdocs gh-deploy`
  - Common failures: Same as docs-build, plus deploy permission errors
  - Fix: Check GitHub Pages settings, verify write permissions
Workflow: test-full.yml (Full Test Suite)
- Triggers: Weekly (Monday 6:00 UTC), manual (`workflow_dispatch`)
- Duration: ~10-15 minutes
- Matrix: Ubuntu 22.04, 24.04
Jobs:
- test-full (per OS version)
  - Installs mise, runs the full BATS test suite (all 450+ tests)
  - Uploads test result artifacts
  - Common failures: Platform-specific tests, timing issues
  - Fix: Check the test-results artifact, run `mise run test` locally
Workflow: devcontainer-smoke.yml (Devcontainer Testing)
- Triggers: Manual (`workflow_dispatch`)
- Duration: ~10-15 minutes
- Timeout: 45 minutes
Jobs:
- test
  - Builds the devcontainer, runs `setup.sh`
  - Common failures: Timeout, mise installation issues
  - Fix: Test locally with VS Code Dev Containers
Common CI Failures & Solutions
1. ShellCheck Lint Failures
Symptom:
Cause: ShellCheck violations in mise task scripts
Fix:
Prevention: Run `mise run lint` before committing
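The lint job's usual ShellCheck codes (SC2086, SC2034, SC2155) are easier to fix once their runtime effect is visible. Below is a minimal bash sketch that puts each flagged pattern next to its usual fix. The demo functions are hypothetical and do not come from `mise-tasks/`.

```bash
#!/usr/bin/env bash
# Hypothetical demo functions; not part of mise-tasks/.

# Prints how many arguments it received.
count_args() {
  echo "$#"
}

# SC2086: an unquoted expansion is word-split; quoting keeps one argument.
quoting_demo() {
  local file="my file.txt"
  # Left unquoted on purpose to show the word splitting.
  # shellcheck disable=SC2086
  count_args $file    # prints 2
  count_args "$file"  # prints 1
}

# SC2155: `local var=$(cmd)` returns the status of `local` (0),
# hiding a failure of cmd.
masked_status() {
  # Intentionally the flagged pattern.
  # shellcheck disable=SC2155,SC2034
  local out=$(false)
  echo "$?"           # prints 0: the failure of `false` is lost
}

# Fix for SC2155: declare first, assign separately.
real_status() {
  local out
  # shellcheck disable=SC2034
  out=$(false)
  echo "$?"           # prints 1: the assignment reports the command's status
}

# SC2034: a variable is assigned but never read. Remove it, use it, or
# export it when a child process is the intended reader.
export_demo() {
  export TASK_MODE="ci"
  printenv TASK_MODE  # prints ci: the child process sees the variable
}
```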
2. Link Validation Failures
Symptom:
Cause: Dead links in markdown docs
Fix:
Exclusions: Add a pattern to `.lycheeignore` if the link is an intentional example or an external URL that should not be checked
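Before you add an exclusion, check whether an existing pattern already covers the URL. lychee treats each line of `.lycheeignore` as a regular expression. The sketch below approximates that with `grep -E`, which uses POSIX extended regex rather than lychee's Rust regex syntax, so treat the result as a quick check, not an exact replica. The helper name `lychee_ignored` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if URL matches any pattern in an ignore file.
# Approximation only: grep -E (POSIX ERE) stands in for lychee's Rust regex.
lychee_ignored() {
  local url="$1" ignore_file="${2:-.lycheeignore}"
  [[ -f "$ignore_file" ]] || return 1
  # Drop blank lines first: an empty pattern would match every URL.
  grep -Eq -f <(grep -v '^[[:space:]]*$' "$ignore_file") <<<"$url"
}
```

Usage: `lychee_ignored "https://example.com/page" && echo "already excluded"`.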
3. MkDocs Build Failures
Symptom:
Cause: Invalid `mkdocs.yml` syntax or missing pages
Fix:
```bash
# Test build locally
mise run docs-build
# Validate YAML
yamllint mkdocs.yml
# Check for missing files referenced in nav
```
Common issues:
- Typo in page path (case-sensitive)
- File exists but is missing from nav
- Invalid YAML indentation
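For the "missing files referenced in nav" check, you can approximate it without a YAML parser. Pull each `*.md` path from the nav entries in `mkdocs.yml`, then test whether it exists under the docs directory. This sketch assumes the default `docs_dir` of `docs` and simple `- Title: page.md` or `- page.md` entries. The helper name `check_nav_files` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report nav entries in mkdocs.yml whose page is missing.
# Assumes simple "- Title: page.md" or "- page.md" lines and does not parse
# YAML, so titles containing colons or other nav forms may be skipped.
check_nav_files() {
  local config="${1:-mkdocs.yml}" docs_dir="${2:-docs}"
  local page missing=0
  while IFS= read -r page; do
    # -f is case-sensitive on Linux (as on the CI runner), usually not on macOS.
    if [[ ! -f "$docs_dir/$page" ]]; then
      echo "missing: $docs_dir/$page"
      missing=1
    fi
  done < <(sed -nE 's/^[[:space:]]*-[[:space:]]+([^:]*:[[:space:]]*)?([^[:space:]]+\.md)[[:space:]]*$/\2/p' "$config")
  return "$missing"
}
```

Run it from the repository root as `check_nav_files`. A non-zero exit status means at least one nav page is missing.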
4. Fast Test Suite Failures
Symptom:
Cause: Core functionality broken (tests: static, smoke, basic, env, doctor)
Fix:
```bash
# Run full test suite locally
mise run test
# Run specific test file
bats tests/smoke.bats
# Debug with verbose output
bats --verbose-run tests/smoke.bats
```
Important: CI runs ONLY the fast tests, not the full 450+ test suite
5. Server Deployment Failures
Symptom:
Cause: Chezmoi template syntax error or missing template data
Fix:
```bash
# Validate templates
chezmoi apply --dry-run --verbose
# Check template syntax (run from the directory containing the template)
chezmoi execute-template < dot_zshrc.tmpl
```
Prevention: A pre-commit hook validates templates (if configured)
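To check every template rather than just `dot_zshrc.tmpl`, you can run the same `execute-template` call in a loop over the source directory. The sketch below uses a hypothetical `check_templates` helper. Templates that read secrets (password managers, age-encrypted data) or machine-specific data may still fail on a machine that is not fully configured.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render every *.tmpl file in a chezmoi source directory
# with `chezmoi execute-template` and list the ones that fail.
# Skips .chezmoi.* config templates (they may use init-only functions such as
# promptStringOnce) and partials under .chezmoitemplates/ (meant to be included).
check_templates() {
  local source_dir="${1:-$(chezmoi source-path)}"
  local tmpl err failed=0
  while IFS= read -r -d '' tmpl; do
    # Capture stderr for the report; discard the rendered output.
    if ! err=$(chezmoi execute-template <"$tmpl" 2>&1 >/dev/null); then
      echo "template error: $tmpl"
      sed 's/^/  /' <<<"$err"
      failed=1
    fi
  done < <(find "$source_dir" -name '*.tmpl' ! -name '.chezmoi.*' \
             ! -path '*/.chezmoitemplates/*' -print0)
  return "$failed"
}
```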
6. Windows Provisioning Failures
Symptom:
Cause: Windows-specific dependency issues
Fix:
- Test on a Windows machine or VM
- Run: `Invoke-Pester tests/windows/`
- Check PowerShell version: `$PSVersionTable.PSVersion`
7. Cache Issues
Symptom:
Cause: Cache key changed or cache expired (GitHub evicts caches that have not been accessed in over 7 days)
Fix:
- Normal: A cache miss is expected on the first run after a dependency update
- Issue: Otherwise, check the cache key in the workflow (based on the `mise.toml` hash)
- Clear cache: Go to Actions → Caches → delete the specific cache
Cache locations:
- `~/.local/share/mise` (Linux/macOS)
- `~/.cache/mise` (build cache)
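Because the cache key is derived from a hash of `mise.toml`, expect a cache miss whenever that file changed between the last cached run and the current commit. You can check this locally with plain git. The sketch assumes `mise.toml` is the only input to the key; if the workflow's `key:` expression hashes other files too, include them in the check. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if mise.toml differs between two
# commits, i.e. a key based on its hash would have changed.
cache_key_changed() {
  local base="$1" head="${2:-HEAD}" rc=0
  # `git diff --quiet` exits 0 if identical, 1 if different, >1 on errors.
  git diff --quiet "$base" "$head" -- mise.toml || rc=$?
  case "$rc" in
    0) return 1 ;;  # unchanged: same key, a cache hit is expected
    1) return 0 ;;  # changed: new key, expect one cache miss
    *) return 2 ;;  # git error, e.g. an unknown revision
  esac
}
```

Usage: `cache_key_changed origin/main HEAD && echo "expect a cache miss on this run"`.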
Local Reproduction
Run CI Jobs Locally
Most CI failures can be reproduced locally using mise tasks:
```bash
# Lint (ShellCheck)
mise run lint
# Documentation validation
mise run docs-validate
# Documentation build
mise run docs-build
# Fast test suite (same as CI)
bats tests/static.bats tests/smoke.bats tests/basic.bats tests/env.bats tests/doctor.bats
# Full test suite (local only, NOT in CI)
mise run test
# Health check
mise run doctor
```
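Using act (GitHub Actions Locally)
Install act to run workflows locally:
```bash
# List the jobs act can see
act -l
# Run the whole ci.yml workflow for a pull_request event
act pull_request -W .github/workflows/ci.yml
# Run a specific job
act -j build
act -j lint
# With secrets (dotenv-style file; keep it out of git)
act -j build --secret-file .secrets
```
Limitations:
- Large Docker images
- Some GitHub-specific features unavailable
- Cache behavior differs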
CI vs Local Differences
Environment Differences
| Aspect | CI (GitHub Actions) | Local |
|---|---|---|
| OS | Ubuntu 22.04 | Varies (WSL2, Arch, macOS) |
| Shell | bash (non-interactive) | zsh (interactive) |
| User | runner | Your username |
| HOME | /home/runner | /home/jer |
| PATH | Limited | Full user PATH |
| Test suite | Fast (5 files) | Full (450+ tests) |
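To approximate the CI column of the table locally, run a command in non-interactive bash with a scrubbed environment. This is a minimal sketch using a hypothetical `ci_like` helper. Its default `PATH` is an assumption, and you will likely need to extend it, for example with the directory where `mise` is installed. GitHub's runners set `CI=true`, which the sketch mirrors, but their real `PATH` and `HOME` differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command string roughly the way a CI step does:
# non-interactive bash, no profile/rc files, and a minimal environment.
ci_like() {
  # CI_PATH is an assumed override; extend it (e.g. ~/.local/bin) for mise.
  local ci_path="${CI_PATH:-/usr/local/bin:/usr/bin:/bin}"
  env -i HOME="$HOME" PATH="$ci_path" CI=true \
    bash --noprofile --norc -c "$1"
}
```

Usage: `CI_PATH="$HOME/.local/bin:/usr/bin:/bin" ci_like 'mise run lint'`. If this fails while your normal shell succeeds, something in your zsh setup or full `PATH` is masking the problem.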
Test Suite Differences
CI runs ONLY:
- `tests/static.bats`
- `tests/smoke.bats`
- `tests/basic.bats`
- `tests/env.bats`
- `tests/doctor.bats`

Local runs ALL:
- All 43 test files (450+ tests)
- Platform-specific tests
- Integration tests
Why? See TESTING.md and ADR-005: CI Pragmatism
Debugging CI Failures
Step 1: Read the Logs
- Go to GitHub Actions tab
- Click failed workflow run
- Click failed job
- Expand failing step
- Look for error messages (usually near bottom)
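You can also read the logs from a terminal with the GitHub CLI: `gh run view <run-id> --log-failed` prints only the logs of failed steps. The sketch below uses a hypothetical helper that keeps only the last lines that look like errors, since the relevant message is usually near the bottom. The grep pattern is a guess and may need tuning for this repository's output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the last error-looking lines from a failed run.
# Requires the GitHub CLI (gh), authenticated for this repository.
failed_errors() {
  local run_id="$1" lines="${2:-20}"
  # "not ok" matches bats TAP output; the other words match common messages.
  gh run view "$run_id" --log-failed \
    | grep -iE 'error|fail|fatal|not ok' \
    | tail -n "$lines"
}
```

Find the run ID with `gh run list --workflow ci.yml --limit 5`, then call `failed_errors <run-id>`.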
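Step 2: Reproduce Locally
```bash
# Check out the exact commit CI ran against
# (for pull requests, CI tests the PR merge commit shown in the checkout step)
git checkout <commit-hash>
# Install dependencies
mise install
# Run the failing command
mise run <task>
```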
Step 3: Check Environment
Compare the CI environment to your local one:
- OS version (`uname -a`)
- Shell version (`$SHELL --version`)
- Tool versions (`mise ls`)
- Python version (`python --version`)
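You can gather these checks into one report to compare with the CI logs or paste into an issue. This is a minimal sketch. The name `env_report` is hypothetical, and it falls back to `python3` when no `python` is on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the environment details from Step 3 in one go.
env_report() {
  local py
  echo "os: $(uname -a)"
  # Shells without --version (e.g. dash) just leave this line empty.
  echo "shell: $("${SHELL:-/bin/sh}" --version 2>/dev/null </dev/null | head -n 1)"
  py=$(command -v python || command -v python3) || py=""
  if [[ -n "$py" ]]; then
    echo "python: $("$py" --version 2>&1)"
  else
    echo "python: not found"
  fi
  if command -v mise >/dev/null 2>&1; then
    echo "mise tools:"
    mise ls
  else
    echo "mise: not found"
  fi
}
```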
Step 4: Inspect Workflow
Read the workflow file to understand what CI is doing:
- `.github/workflows/ci.yml`
- `.github/workflows/docs.yml`
- `.github/workflows/test-full.yml`
- `.github/workflows/devcontainer-smoke.yml`
CI Performance
Expected Timings
| Job | Expected Time | Acceptable Range |
|---|---|---|
| lint | ~7s | 5-15s |
| docs-validate | ~50s | 30s-1m |
| docs-build | ~25s | 15-45s |
| build | ~45s | 30s-1m |
| test-server-deployment | ~30s | 20-50s |
| test-windows | ~1m | 45s-2m |
Total CI time: ~2-3 minutes
Performance Regressions
If CI times increase significantly:
1. Check for new dependencies (slow installs)
2. Verify the cache is working (`actions/cache` logs)
3. Look for new tests added to the fast suite
4. Check for network issues (GitHub status page)
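To decide whether a slow run is a regression worth chasing, compare the job's duration with the Acceptable Range column of the Expected Timings table. The sketch below uses a hypothetical `check_timing` helper. Its ranges are copied from that table and will drift if the table is updated.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a job duration (in seconds) against the
# "Acceptable Range" column of the Expected Timings table.
check_timing() {
  local job="$1" secs="$2" min max
  case "$job" in
    lint)                   min=5;  max=15 ;;
    docs-validate)          min=30; max=60 ;;
    docs-build)             min=15; max=45 ;;
    build)                  min=30; max=60 ;;
    test-server-deployment) min=20; max=50 ;;
    test-windows)           min=45; max=120 ;;
    *) echo "unknown job: $job" >&2; return 2 ;;
  esac
  if (( secs > max )); then
    echo "slow"   # possible regression: work through the checklist
  elif (( secs < min )); then
    echo "fast"   # unusually quick: steps may have been skipped
  else
    echo "ok"
  fi
}
```

Usage: `check_timing build 75` prints `slow`.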
GitHub Actions Permissions
Required Permissions
ci.yml:
- `contents: read` (checkout code)
- `pull-requests: write` (comment on PRs, step summary)

docs.yml:
- `contents: write` (push to the `gh-pages` branch)

devcontainer-smoke.yml:
- `contents: read` (checkout code)
Secrets
Currently used:
- `GITHUB_TOKEN`: automatic, provided by GitHub
- No additional secrets required

For age encryption (if needed):
- Add `AGE_SECRET_KEY` to the repository secrets
- Reference it in the workflow: `${{ secrets.AGE_SECRET_KEY }}`
Getting Help
CI is failing but works locally
- Check environment differences (above)
- Run exact CI command locally
- Check for timing/race conditions
- Review recent dependency updates (Renovate PRs)
CI passes but local fails
- Ensure dependencies are up to date: `mise install`
- Check for uncommitted changes: `git status`
- Verify tool versions match: compare `mise ls` with the CI logs
- Clear local caches: `rm -rf ~/.cache/mise`
Persistent failures
- File an issue with:
  - Link to the failed run
  - Error logs (paste or screenshot)
  - Local reproduction steps (if possible)
  - Environment details (`mise ls`, `uname -a`)
- Check ROADMAP.md for known issues
Related Documentation
- TESTING.md - Test strategy and philosophy
- CONTRIBUTING.md - Development workflow
- ADR-005 - CI pragmatism decision
- TROUBLESHOOTING.md - User-facing issues
- .github/workflows/ - Workflow definitions
Workflow Files Reference
ci.yml
- Path: `.github/workflows/ci.yml`
- Purpose: Main CI pipeline for PR validation
- Key jobs: lint, docs-validate, docs-build, build, test-server-deployment, test-windows
docs.yml
- Path: `.github/workflows/docs.yml`
- Purpose: Documentation deployment to GitHub Pages
- Triggers: Push to `main` (docs changes), manual
test-full.yml
- Path: `.github/workflows/test-full.yml`
- Purpose: Weekly comprehensive test run across Ubuntu versions
- Triggers: Weekly (Monday), manual
- Matrix: Ubuntu 22.04, 24.04
devcontainer-smoke.yml
- Path: `.github/workflows/devcontainer-smoke.yml`
- Purpose: Validate devcontainer setup
- Triggers: Manual only