Load Testing Guide
This guide covers load testing for CareConnect using k6.
[!IMPORTANT] CareConnect does not vendor the
k6executable in git. Installk6locally and ensure it is available on your shellPATHbefore running anynpm run test:load*command.
Overview
Load testing validates that the application performs well under expected and peak traffic conditions. We use k6 for load testing, which provides:
- JavaScript-based test scripts
- Realistic load simulation
- Detailed performance metrics
- CI/CD integration
Prerequisites
Install k6
macOS:
Linux:
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
Windows:
Verification:
Start the Application
Test Types
1. Smoke Test
Purpose: Verify basic functionality with minimal load.
Characteristics:
- 1 virtual user (VU)
- 30 second duration
- Tests basic connectivity
Run:
Expected Results:
- 100% success rate
- p95 latency < 1000ms
- All health checks pass
2. Search API Load Test
Purpose: Realistic load testing with gradual ramp-up.
Characteristics:
- Ramps from 10 to 50 VUs over 5 minutes
- Sustained load at 50 VUs for 3 minutes
- Tests keyword search, filters, location-based queries
- Validates crisis detection
Run:
Expected Results:
- p95 latency < 800ms
- p99 latency < 1500ms
- Error rate < 5%
- 95% of checks pass
Thresholds:
{
http_req_duration: ['p(95)<800', 'p(99)<1500'],
http_req_failed: ['rate<0.05'],
checks: ['rate>0.95'],
}
3. Sustained Load Test
Purpose: Long-running test to detect memory leaks and performance degradation.
Characteristics:
- Constant 20 VUs
- 30 minute duration (adjustable)
- Monitors for stability issues
Run:
Expected Results:
- Stable performance across duration
- p95 latency < 1000ms
- p99 latency < 2000ms
- No performance degradation over time
Warning Signs:
- ⚠️ p99 latency increasing over time (memory leak)
- ⚠️ Error rate increasing (resource exhaustion)
- ⚠️ p99 > 3000ms (critical threshold)
4. Spike Test
Purpose: Test system resilience under sudden traffic spikes.
Characteristics:
- Spike from 0 to 100 VUs in 10 seconds
- Hold at 100 VUs for 1 minute
- Drop back to 0
Run:
Expected Results:
- p95 latency < 2000ms
- p99 latency < 5000ms
- Error rate < 15% (relaxed during spike)
- Rate limiting activates (protecting system)
- Circuit breaker may activate (expected)
Success Criteria:
- System degrades gracefully (doesn't crash)
- Rate limiting prevents cascading failures
- System recovers after spike
- Circuit breaker activates if needed
Understanding Results
Key Metrics
Response Time Percentiles:
- p50 (median): 50% of requests complete faster than this
- p95: 95% of requests complete faster than this (target SLA)
- p99: 99% of requests complete faster than this (worst case)
- max: Slowest request
Error Rates:
- http_req_failed: Percentage of HTTP errors (non-2xx)
- search_errors: Custom metric for search-specific errors
Custom Metrics:
- search_duration: Average search operation time
- rate_limit_hits: Number of rate limit activations
- circuit_breaker_activations: Circuit breaker open events
Example Output
=== Search API Load Test Summary ===
Total Requests: 15234
Failed Requests: 2.3%
Response Times:
p50: 245ms
p95: 678ms
p99: 1234ms
max: 2456ms
Custom Metrics:
Search Error Rate: 2.1%
Avg Search Duration: 287ms
Interpreting Results
Good Performance:
- ✅ p95 < 800ms
- ✅ p99 < 1500ms
- ✅ Error rate < 5%
- ✅ Stable performance across test duration
Degraded Performance:
- ⚠️ p95 between 800-1200ms
- ⚠️ p99 between 1500-2500ms
- ⚠️ Error rate between 5-10%
- Action: Investigate slow queries, optimize code
Critical Issues:
- ❌ p95 > 1200ms
- ❌ p99 > 3000ms
- ❌ Error rate > 10%
- Action: Stop deployment, investigate immediately
Baseline Performance Metrics
Current Baseline (v17.5)
Established on: 2026-01-25
Environment: Local development server (not representative of production)
| Test | VUs | Duration | p50 | p95 | p99 | Error Rate |
|---|---|---|---|---|---|---|
| Smoke | 1 | 30s | TBD | TBD | TBD | TBD |
| Search API | 10-50 | 10m | TBD | TBD | TBD | TBD |
| Sustained | 20 | 30m | TBD | TBD | TBD | TBD |
| Spike | 0-100 | 1m | TBD | TBD | TBD | TBD |
Note: Run tests and document actual baseline metrics before first production deployment.
Running Tests Against Different Environments
Local Development
Staging/Preview
Production (Caution!)
# Only run smoke tests against production
# NEVER run sustained or spike tests against production
BASE_URL=https://careconnect.ing npm run test:load:smoke
CI/CD Integration
GitHub Actions (Optional)
Create .github/workflows/load-test.yml:
name: Load Test
on:
workflow_dispatch: # Manual trigger only
schedule:
- cron: "0 2 * * 0" # Weekly on Sunday at 2 AM
jobs:
load-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: "22"
- name: Install k6
run: |
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
- name: Install dependencies
run: npm ci
- name: Build application
run: npm run build
- name: Start application
run: npm start &
- name: Wait for application
run: sleep 10
- name: Run smoke test
run: npm run test:load:smoke
- name: Run load test
run: npm run test:load
- name: Upload results
uses: actions/upload-artifact@v7
if: always()
with:
name: load-test-results
path: "*.json"
Important: Mark as non-blocking initially. Only fail CI if performance degrades significantly.
Troubleshooting
High Latency
Symptoms: p95 > 1000ms, p99 > 2000ms
Possible Causes:
- Database queries not optimized
- Missing indexes
- N+1 query problems
- Network latency to Supabase
Actions:
- Enable performance tracking:
NEXT_PUBLIC_ENABLE_SEARCH_PERF_TRACKING=true - Check logs for slow operations
- Review database query plans
- Check
/api/v1/healthfor DB latency
High Error Rate
Symptoms: Error rate > 5%
Possible Causes:
- Rate limiting too strict
- Database connection pool exhausted
- Circuit breaker opening
- Timeout issues
Actions:
- Check
/api/v1/healthfor circuit breaker state - Review rate limit configuration
- Check database connection pool size
- Increase timeout values if needed
Memory Leaks
Symptoms: p99 latency increasing over time in sustained test
Possible Causes:
- In-memory cache not pruning
- Event listeners not cleaned up
- Database connections not released
Actions:
- Monitor Node.js heap usage
- Review in-memory caching logic (
lib/performance/metrics.ts) - Check database connection cleanup
- Use
node --inspectand Chrome DevTools memory profiler
Circuit Breaker Activating
Symptoms: 503 errors, circuit_breaker_activations > 0
Status: This is expected behavior during spikes!
Actions:
- Check
/api/v1/healthfor current state - Verify fallback to JSON working
- If persistent, investigate upstream Supabase issues
- Adjust circuit breaker thresholds if needed:
CIRCUIT_BREAKER_FAILURE_THRESHOLDCIRCUIT_BREAKER_TIMEOUT
Best Practices
DO:
- ✅ Run smoke tests before every deployment
- ✅ Run load tests weekly on staging
- ✅ Document baseline metrics
- ✅ Track performance trends over time
- ✅ Test realistic user scenarios
- ✅ Monitor during tests (health endpoint, logs)
DON'T:
- ❌ Run sustained/spike tests against production
- ❌ Run load tests during business hours (staging)
- ❌ Ignore gradual performance degradation
- ❌ Test without monitoring enabled
- ❌ Make changes without re-establishing baseline
Further Reading
- k6 Documentation
- k6 Documentation
- Performance Testing Best Practices
- CareConnect:
docs/adr/014-database-index-optimization.md - CareConnect:
docs/adr/013-unified-rls-policy.md
Next Steps
- Establish Baseline: Run all tests and document metrics in this file
- Set Alerts: Configure monitoring to alert on performance degradation
- Automate: Set up weekly load tests in CI/CD
- Optimize: Use insights to guide performance improvements
- Re-test: Verify optimizations with load tests