Load Testing Guide

This guide covers load testing for CareConnect using k6.

[!IMPORTANT] CareConnect does not vendor the k6 executable in git. Install k6 locally and ensure it is available on your shell PATH before running any npm run test:load* command.

Overview

Load testing validates that the application performs well under expected and peak traffic conditions. We use k6 for load testing, which provides:

JavaScript-based test scripts
Realistic load simulation
Detailed performance metrics
CI/CD integration

Prerequisites

Install k6

macOS:

brew install k6

Linux:

sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

Windows:

choco install k6

Verification:

k6 version

Start the Application

npm run dev
# Or for production build:
npm run build && npm start

Test Types

1. Smoke Test

Purpose: Verify basic functionality with minimal load.

Characteristics:

1 virtual user (VU)
30 second duration
Tests basic connectivity

Run:

npm run test:load:smoke

Expected Results:

100% success rate
p95 latency < 1000ms
All health checks pass
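For reference, the smoke test's characteristics and expected results translate into k6 options roughly as follows. This is a sketch, not the repository's actual script. The threshold expressions are our reading of the expected results above. A real k6 script would declare the object as export const options rather than a plain const.

```javascript
// Hypothetical k6 options for the smoke test (1 VU for 30 seconds).
// In a k6 script this would be `export const options = { ... }`.
const smokeOptions = {
  vus: 1,
  duration: '30s',
  thresholds: {
    // 100% success rate: no failed HTTP requests
    http_req_failed: ['rate<=0'],
    // p95 latency under 1000 ms
    http_req_duration: ['p(95)<1000'],
    // every check (e.g. health checks) must pass
    checks: ['rate>=1'],
  },
};
```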

2. Search API Load Test

Purpose: Realistic load testing with gradual ramp-up.

Characteristics:

Ramps from 10 to 50 VUs over 5 minutes
Sustained load at 50 VUs for 3 minutes
Tests keyword search, filters, location-based queries
Validates crisis detection

Run:

npm run test:load

Expected Results:

p95 latency < 800ms
p99 latency < 1500ms
Error rate < 5%
95% of checks pass

Thresholds:

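export const options = {
  // Search API load test thresholds; if any is crossed, k6 reports the run as failed (non-zero exit code)
  thresholds: {
    http_req_duration: ['p(95)<800', 'p(99)<1500'],
    http_req_failed: ['rate<0.05'],
    checks: ['rate>0.95'],
  },
};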

3. Sustained Load Test

Purpose: Long-running test to detect memory leaks and performance degradation.

Characteristics:

Constant 20 VUs
30 minute duration (adjustable)
Monitors for stability issues

Run:

npm run test:load:sustained

Expected Results:

Stable performance across duration
p95 latency < 1000ms
p99 latency < 2000ms
No performance degradation over time

Warning Signs:

⚠️ p99 latency increasing over time (possible memory leak)
⚠️ Error rate increasing (resource exhaustion)
⚠️ p99 > 3000ms (critical threshold)
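One way to turn "p99 latency increasing over time" into a yes/no signal is to fit a least-squares line through p99 values sampled at equal intervals during the sustained run. This sketch assumes you have already collected those per-interval p99 values, for example from your monitoring. The 10 ms-per-interval limit is an arbitrary placeholder, not a project setting.

```javascript
// Least-squares slope of p99 samples (ms) taken at equal intervals.
// A clearly positive slope suggests latency is creeping up over the run.
function p99Slope(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2; // x values are 0, 1, ..., n-1
  const meanY = samples.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  samples.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den; // ms of p99 change per interval
}

// Hypothetical limit: flag the run if p99 grows more than 10 ms per interval.
const MAX_SLOPE_MS_PER_INTERVAL = 10;

function looksDegrading(samples) {
  return p99Slope(samples) > MAX_SLOPE_MS_PER_INTERVAL;
}
```

A slope is less sensitive to a single noisy interval than comparing the first and last samples.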

4. Spike Test

Purpose: Test system resilience under sudden traffic spikes.

Characteristics:

Spike from 0 to 100 VUs in 10 seconds
Hold at 100 VUs for 1 minute
Drop back to 0
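The spike profile can be sketched as a ramping-vus scenario. This guide does not give a ramp-down duration, so the 10s used for the last stage is a placeholder. The scenario name spike is hypothetical.

```javascript
// Hypothetical k6 scenario for the spike test:
// 0 -> 100 VUs in 10 seconds, hold 100 VUs for 1 minute, then back to 0.
const spikeScenarios = {
  spike: {
    executor: 'ramping-vus',
    startVUs: 0,
    stages: [
      { duration: '10s', target: 100 }, // sudden spike
      { duration: '1m', target: 100 }, // hold at peak
      { duration: '10s', target: 0 }, // drop back to 0 (placeholder duration)
    ],
  },
};
```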

Run:

npm run test:load:spike

Expected Results:

p95 latency < 2000ms
p99 latency < 5000ms
Error rate < 15% (relaxed during spike)
Rate limiting activates (protecting the system)
Circuit breaker may activate (expected)

Success Criteria:

System degrades gracefully (doesn't crash)
Rate limiting prevents cascading failures
System recovers after spike
Circuit breaker activates if needed

Understanding Results

Key Metrics

Response Time Percentiles:

p50 (median): 50% of requests complete faster than this
p95: 95% of requests complete faster than this (target SLA)
p99: 99% of requests complete faster than this (tail latency)
max: Slowest request
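To make the percentile definitions concrete, the sketch below computes them with the nearest-rank method over raw response times. k6 computes its own percentiles internally and may interpolate, so its figures can differ slightly. Treat this as an illustration of the definitions only. The 1 to 100 ms data is synthetic.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Synthetic response times in ms: 1, 2, ..., 100.
const durations = Array.from({ length: 100 }, (_, i) => i + 1);
console.log({
  p50: percentile(durations, 50),
  p95: percentile(durations, 95),
  p99: percentile(durations, 99),
  max: Math.max(...durations),
});
// → { p50: 50, p95: 95, p99: 99, max: 100 }
```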

Error Rates:

http_req_failed: Rate of failed HTTP requests (by default, k6 counts any response with a status outside 200-399 as failed)
search_errors: Custom metric for search-specific errors

Custom Metrics:

search_duration: Average search operation time
rate_limit_hits: Number of rate limit activations
circuit_breaker_activations: Circuit breaker open events
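k6's http_req_failed is a rate. By default, a response counts as failed when its status falls outside 200-399. This can be changed with http.setResponseCallback. The sketch below reproduces that default rule over a list of status codes. It also counts 429 responses as rate-limit hits. Treating 429 as the source of rate_limit_hits is our assumption about how that custom metric is fed, not something this guide confirms.

```javascript
// Mirrors k6's default expected-status rule: 200-399 counts as success.
function isFailed(status) {
  return status < 200 || status > 399;
}

// Summarise a batch of HTTP status codes.
// Assumption: rate-limited responses use HTTP 429 Too Many Requests.
function summarize(statuses) {
  const failed = statuses.filter(isFailed).length;
  return {
    failedRate: statuses.length ? failed / statuses.length : 0,
    rateLimitHits: statuses.filter((s) => s === 429).length,
  };
}
```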

Example Output

=== Search API Load Test Summary ===

Total Requests: 15234
Failed Requests: 2.3%

Response Times:
  p50: 245ms
  p95: 678ms
  p99: 1234ms
  max: 2456ms

Custom Metrics:
  Search Error Rate: 2.1%
  Avg Search Duration: 287ms

Interpreting Results

Good Performance:

✅ p95 < 800ms
✅ p99 < 1500ms
✅ Error rate < 5%
✅ Stable performance across test duration

Degraded Performance:

⚠️ p95 between 800-1200ms
⚠️ p99 between 1500-3000ms
⚠️ Error rate between 5-10%
Action: Investigate slow queries, optimize code

Critical Issues:

❌ p95 > 1200ms
❌ p99 > 3000ms
❌ Error rate > 10%
Action: Stop deployment, investigate immediately
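The three bands can be applied mechanically to a run's summary numbers. In this minimal sketch, classifyRun is a hypothetical helper, and anything that is neither good nor critical is treated as degraded. It does not cover "stable performance across test duration", which needs data from several points in the run.

```javascript
// Classify a run using the bands in "Interpreting Results".
// p95 and p99 in ms; errorRate as a fraction (0.05 = 5%).
function classifyRun({ p95, p99, errorRate }) {
  if (p95 > 1200 || p99 > 3000 || errorRate > 0.1) {
    return 'critical'; // stop deployment, investigate immediately
  }
  if (p95 >= 800 || p99 >= 1500 || errorRate >= 0.05) {
    return 'degraded'; // investigate slow queries, optimize code
  }
  return 'good';
}
```

Applied to the example output above (p95 678ms, p99 1234ms, 2.3% failed), this returns 'good'.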

Baseline Performance Metrics

Current Baseline (v17.5)

Established on: 2026-01-25

Environment: Local development server (not representative of production)

Test	VUs	Duration	p50	p95	p99	Error Rate
Smoke	1	30s	TBD	TBD	TBD	TBD
Search API	10-50	10m	TBD	TBD	TBD	TBD
Sustained	20	30m	TBD	TBD	TBD	TBD
Spike	0-100	1m	TBD	TBD	TBD	TBD

Note: Run the tests and document actual baseline metrics before the first production deployment.
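To fill in a row of the table from a real run, k6 passes end-of-test metrics to a handleSummary(data) function. This sketch assumes data.metrics.<name>.values holds trend stats keyed as med, p(95) and p(99), and a rate field for rate metrics, as in recent k6 versions. Verify this against your k6 version. p(99) only appears if summaryTrendStats requests it. baselineRow is a hypothetical helper.

```javascript
// Build one tab-separated baseline row from k6 end-of-test summary data.
// Assumes summaryTrendStats includes 'med', 'p(95)' and 'p(99)'.
function baselineRow(name, vus, duration, data) {
  const d = data.metrics.http_req_duration.values;
  const failed = data.metrics.http_req_failed.values.rate;
  const ms = (v) => `${Math.round(v)}ms`;
  return [
    name, vus, duration,
    ms(d.med), ms(d['p(95)']), ms(d['p(99)']),
    `${(failed * 100).toFixed(1)}%`,
  ].join('\t');
}

// In a k6 script (not runnable under Node):
// export const options = { summaryTrendStats: ['med', 'p(95)', 'p(99)', 'max'] };
// export function handleSummary(data) {
//   return { stdout: baselineRow('Smoke', 1, '30s', data) + '\n' };
// }
```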

Running Tests Against Different Environments

Local Development

npm run test:load:smoke

Staging/Preview

BASE_URL=https://preview.careconnect.ing npm run test:load:smoke

Production (Caution!)

# Only run smoke tests against production
# NEVER run sustained or spike tests against production
BASE_URL=https://careconnect.ing npm run test:load:smoke
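A script-level guard can enforce the production rule above, so a stray BASE_URL cannot aim a sustained or spike run at production. In this sketch, assertSafeTarget is hypothetical. It assumes the production origin is exactly https://careconnect.ing and that the local default is http://localhost:3000. It uses plain string checks so it runs in both Node and k6.

```javascript
const PRODUCTION_ORIGIN = 'https://careconnect.ing';
const PRODUCTION_SAFE_TESTS = new Set(['smoke']);

// Throws if any test type other than smoke is aimed at production.
function assertSafeTarget(baseUrl, testType) {
  const normalized = baseUrl.trim().toLowerCase().replace(/\/+$/, '');
  const isProduction =
    normalized === PRODUCTION_ORIGIN || normalized.startsWith(PRODUCTION_ORIGIN + '/');
  if (isProduction && !PRODUCTION_SAFE_TESTS.has(testType)) {
    throw new Error(`Refusing to run the ${testType} test against production (${baseUrl})`);
  }
  return baseUrl;
}

// Hypothetical k6 init-context usage:
// const BASE_URL = assertSafeTarget(__ENV.BASE_URL || 'http://localhost:3000', 'spike');
```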

CI/CD Integration

GitHub Actions (Optional)

Create .github/workflows/load-test.yml:

name: Load Test

on:
  workflow_dispatch: # Manual trigger only
  schedule:
    - cron: "0 2 * * 0" # Weekly on Sunday at 2 AM

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          node-version: "22"

      - name: Install k6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6

      - name: Install dependencies
        run: npm ci

      - name: Build application
        run: npm run build

      - name: Start application
        run: npm start &

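      - name: Wait for application
        # Poll until the server accepts connections (assumes the default port 3000)
        run: timeout 60 bash -c 'until curl -s -o /dev/null http://localhost:3000; do sleep 2; done'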

      - name: Run smoke test
        run: npm run test:load:smoke

      - name: Run load test
        run: npm run test:load

      - name: Upload results
        uses: actions/upload-artifact@v7
        if: always()
        with:
          name: load-test-results
          path: "*.json"

Important: Mark the load test steps as non-blocking initially (for example, with continue-on-error: true). Only fail CI if performance degrades significantly.
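One way to make "degrades significantly" concrete is to compare a run against the documented baseline and flag only what exceeds a tolerance. The 20% tolerance and the significantRegressions helper are hypothetical placeholders. Error rate is checked against the guide's absolute 5% target instead, because a zero-error baseline would otherwise turn any single error into a regression.

```javascript
// Returns human-readable regressions; an empty array means "do not fail CI".
// baseline/current: p95 and p99 in ms; current.errorRate as a fraction.
function significantRegressions(baseline, current, tolerance = 0.2) {
  const problems = [];
  for (const key of ['p95', 'p99']) {
    const limit = baseline[key] * (1 + tolerance);
    if (current[key] > limit) {
      problems.push(`${key} ${current[key]}ms exceeds ${Math.round(limit)}ms`);
    }
  }
  if (current.errorRate >= 0.05) {
    problems.push(`error rate ${current.errorRate} is at or above 0.05`);
  }
  return problems;
}
```

A CI step could run this against the exported summary and exit non-zero only when the list is non-empty.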

Troubleshooting

High Latency

Symptoms: p95 > 1000ms, p99 > 2000ms

Possible Causes:

Database queries not optimized
Missing indexes
N+1 query problems
Network latency to Supabase

Actions:

Enable performance tracking: NEXT_PUBLIC_ENABLE_SEARCH_PERF_TRACKING=true
Check logs for slow operations
Review database query plans
Check /api/v1/health for DB latency

High Error Rate

Symptoms: Error rate > 5%

Possible Causes:

Rate limiting too strict
Database connection pool exhausted
Circuit breaker opening
Timeout issues

Actions:

Check /api/v1/health for circuit breaker state
Review rate limit configuration
Check database connection pool size
Increase timeout values if needed

Memory Leaks

Symptoms: p99 latency increasing over time in sustained test

Possible Causes:

In-memory cache not pruning
Event listeners not cleaned up
Database connections not released

Actions:

Monitor Node.js heap usage
Review in-memory caching logic (lib/performance/metrics.ts)
Check database connection cleanup
Use node --inspect and Chrome DevTools memory profiler
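For the heap monitoring step, a lightweight option is to log process.memoryUsage() periodically from the server process while the sustained test runs. The pattern to look for is heapUsed that keeps rising and never falls back after garbage collection. startHeapLogger is a hypothetical helper. Where to call it during CareConnect's startup is left open.

```javascript
// Log heap usage every intervalMs; returns a function that stops logging.
function startHeapLogger(intervalMs = 30000, log = console.log) {
  const toMb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  const timer = setInterval(() => {
    const { heapUsed, heapTotal, rss } = process.memoryUsage();
    log(`[heap] used=${toMb(heapUsed)}MB total=${toMb(heapTotal)}MB rss=${toMb(rss)}MB`);
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for logging
  return () => clearInterval(timer);
}
```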

Circuit Breaker Activating

Symptoms: 503 errors, circuit_breaker_activations > 0

Status: This is expected behavior during spikes!

Actions:

Check /api/v1/health for current state
Verify that the JSON fallback is working
If persistent, investigate upstream Supabase issues
Adjust circuit breaker thresholds if needed:
CIRCUIT_BREAKER_FAILURE_THRESHOLD
CIRCUIT_BREAKER_TIMEOUT
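To reason about what these two settings trade off, here is the textbook circuit breaker pattern (closed, open, half-open). This is not CareConnect's implementation. The code assumes the failure threshold counts consecutive failures and the timeout is how long the circuit stays open, in milliseconds. A lower threshold trips sooner under a spike. A longer timeout keeps traffic on the fallback longer.

```javascript
// Generic circuit breaker for illustration only.
class CircuitBreaker {
  constructor({ failureThreshold = 5, timeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.timeoutMs = timeoutMs; // time to stay open before allowing trial requests
    this.now = now; // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;
  }

  get state() {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.timeoutMs ? 'half-open' : 'open';
  }

  // Check before calling the upstream (e.g. Supabase); if false, use the fallback.
  canRequest() {
    return this.state !== 'open';
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null; // close the circuit
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial in half-open, or too many failures, (re)opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```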

Best Practices

DO:

✅ Run smoke tests before every deployment
✅ Run load tests weekly on staging
✅ Document baseline metrics
✅ Track performance trends over time
✅ Test realistic user scenarios
✅ Monitor during tests (health endpoint, logs)

DON'T:

❌ Run sustained/spike tests against production
❌ Run load tests against staging during business hours
❌ Ignore gradual performance degradation
❌ Test without monitoring enabled
❌ Make changes without re-establishing the baseline

Next Steps

Establish Baseline: Run all tests and document metrics in this file
Set Alerts: Configure monitoring to alert on performance degradation
Automate: Set up weekly load tests in CI/CD
Optimize: Use insights to guide performance improvements
Re-test: Verify optimizations with load tests
