E2E Test Auditor Agent
Purpose
Automatically analyze Playwright test results to catch silent failures that DOM-only assertions miss. Uses LLM to detect when features appear to work (elements visible) but are actually broken (console errors, API failures, wrong data).
When to Use
AUTO-TRIGGER after every E2E test run:
npm run test:e2ecompletes- CI/CD pipeline runs E2E tests
- Developer runs single test file
What It Does
- Captures comprehensive test artifacts:
- Console logs (all levels: error, warning, info, debug)
- Network requests/responses (XHR, Fetch)
- Failed API calls (4xx, 5xx status codes)
- Page screenshots on failure
- Browser storage state (localStorage, cookies)
- LLM analysis:
- Reads console logs for patterns indicating failure
- Checks for “Failed to fetch”, “404”, “500”, “undefined is not a function”
- Verifies API responses match expected schema
- Detects “silent failures” (test passes but feature broken)
- Generates failure report:
- Severity: P0 (blocks release), P1 (important), P2 (minor)
- Root cause hypothesis
- Links to relevant code files
- Suggested fixes
Example: Catching the Demo Mode Bug
Without LLM auditor:
✅ Test: Demo mode shows scheduler
✓ Page loaded
✓ "Staff Availability" heading visible
✓ Test PASSED
With LLM auditor:
✅ Test: Demo mode shows scheduler (DOM assertions PASSED)
❌ Silent Failure Detected (P0 - Blocks Release)
Console Errors:
[ERROR] Failed to fetch employees: column employees.contracted_hours does not exist
[ERROR] Failed to fetch shifts: ...
Root Cause:
TypeScript build errors blocking Vercel deployment.
Production running 6-hour-old code with schema mismatch.
Suggested Fix:
1. Fix TypeScript errors in useShiftsByStaff.ts
2. Verify build succeeds locally: npm run build
3. Push and wait for Vercel deployment
4. Re-test production
Affected Files:
- src/features/scheduler/hooks/useShiftsByStaff.ts (TypeScript errors)
- src/features/scheduler/hooks/useStaffAvailability.ts (outdated query)
GitHub Issue: Created #306 with full details
Benefits
- Catch silent failures immediately (would have found #306 in CI before push)
- No more “looks good but broken” false positives
- Automatic root cause analysis (saves 2+ hours debugging)
- Links to relevant code (faster fixes)
- Severity triage (P0 vs P2 classification)
Implementation Phases
Phase 1: Add console capture to all tests Phase 2: LLM analysis on failures Phase 3: Auto-create GitHub issues for P0 failures
See full implementation in docs/specs/e2e_test_auditor.md
Created by: Claude Sonnet 4.5
Based on: Real bug caught during #306 debugging