Comprehensive verification using parallel test agents for unit tests, integration tests, E2E validation, security scanning, and type checking. Runs coverage analysis, detects regressions, and validates against project conventions. Reports pass/fail with detailed findings and coverage deltas. Use when verifying implementations, validating changes after /ork:implement, or running pre-merge quality gates.
SCOPE = "$ARGUMENTS"  # Full argument string, e.g., "authentication flow"
SCOPE_TOKEN = "$ARGUMENTS[0]"  # First token for flag detection (e.g., "--scope=backend")
# $ARGUMENTS[0], $ARGUMENTS[1] etc. for indexed access (CC 2.1.59)

# Model override detection (CC 2.1.72)
MODEL_OVERRIDE = None
for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]  # "opus", "sonnet", "haiku"
        SCOPE = SCOPE.replace(token, "").strip()
Pass MODEL_OVERRIDE to all Agent() calls via model=MODEL_OVERRIDE when set. Accepts symbolic names (opus, sonnet, haiku) or full IDs (claude-opus-4-6) per CC 2.1.74.
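The argument handling above can be sketched as a plain Python function. This is a minimal illustration rather than the skill's actual implementation: `parse_verify_args` is a hypothetical helper, and it assumes whitespace-separated tokens with at most one `--model=` flag.

```python
from typing import Optional, Tuple


def parse_verify_args(arguments: str) -> Tuple[str, Optional[str]]:
    """Split a raw argument string into (scope, model_override).

    Hypothetical helper: assumes whitespace-separated tokens and a single
    optional --model=<name> flag.
    """
    model_override: Optional[str] = None
    scope_tokens = []
    for token in arguments.split():
        if token.startswith("--model="):
            # Split on the first "=" only, so full model IDs survive intact.
            model_override = token.split("=", 1)[1] or None
        else:
            scope_tokens.append(token)
    return " ".join(scope_tokens), model_override


scope, model = parse_verify_args("authentication flow --model=opus")
print(scope, model)  # authentication flow opus
```

Collecting the non-flag tokens, instead of deleting the flag from the original string, keeps the scope free of stray double spaces when the flag appears mid-string.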
Opus 4.7: Agents use native adaptive thinking (no MCP sequential-thinking needed). Extended 128K output supports comprehensive verification reports.
Load details: Read("${CLAUDE_SKILL_DIR}/references/orchestration-mode.md") for env var check logic, Agent Teams vs Task Tool comparison, and mode selection rules.
Choose Agent Teams (mesh -- verifiers share findings) or Task tool (star -- all report to lead) based on the orchestration mode reference.
# Guard: Skip cron in headless/CI (CLAUDE_CODE_DISABLE_CRON)
# if env CLAUDE_CODE_DISABLE_CRON is set, run a single check instead
CronCreate(
    schedule="0 8 * * *",
    prompt="Daily regression check: npm test. If 7 consecutive passes → CronDelete. If failures → alert with details."
)
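The guard and the stop condition in the cron prompt can be expressed as two small functions. This is a sketch under assumptions: `cron_enabled` and `next_cron_action` are hypothetical names, any non-empty value of `CLAUDE_CODE_DISABLE_CRON` is treated as "set", and the run history is assumed to be a list of pass/fail booleans, oldest first.

```python
import os
from typing import List, Mapping


def cron_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Cron is allowed unless CLAUDE_CODE_DISABLE_CRON has a non-empty value."""
    return not env.get("CLAUDE_CODE_DISABLE_CRON")


def next_cron_action(history: List[bool], required_passes: int = 7) -> str:
    """Decide what the daily check should do after the latest run."""
    if history and not history[-1]:
        return "alert"  # Latest run failed: report details.
    if len(history) >= required_passes and all(history[-required_passes:]):
        return "delete"  # Enough consecutive passes: remove the cron job.
    return "keep"


print(next_cron_action([True] * 7))    # delete
print(next_cron_action([True, False]))  # alert
```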
Load details: Read("${CLAUDE_SKILL_DIR}/references/verification-phases.md") for complete phase details, agent spawn definitions, Agent Teams alternative, and team teardown.
Output each agent's score as soon as it completes — don't wait for all 6-7 agents.
Focus mode (CC 2.1.101): In focus mode, include the full composite score, all dimension scores, and the verdict in your final message — the user didn't see the incremental outputs.
Security: 8.2/10 — No critical vulnerabilities found
Code Quality: 7.5/10 — 3 complexity hotspots identified
[...remaining agents still running...]
This gives users real-time visibility into multi-agent verification. If any dimension scores below the security_minimum threshold (default 5.0), flag it as a blocker immediately — the user can terminate early without waiting for remaining agents.
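The immediate blocker check described above might look like the following sketch. `find_blockers` is a hypothetical helper; the 5.0 default mirrors the `security_minimum` default mentioned in the text, and the dimension names are illustrative.

```python
from typing import Dict, List


def find_blockers(scores: Dict[str, float], minimum: float = 5.0) -> List[str]:
    """Return the dimensions whose score is below the minimum, sorted by name."""
    return sorted(name for name, score in scores.items() if score < minimum)


# Scores reported so far; the remaining agents are still running.
completed = {"security": 4.2, "code_quality": 7.5}
for dimension in find_blockers(completed):
    print(f"BLOCKER: {dimension} scored {completed[dimension]}/10")
```

Because the check only needs the scores that have already arrived, it can run after every agent completion without waiting for the rest.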
Use Monitor for streaming test execution output from background scripts:
# Stream test output in real-time instead of waiting for completion
Bash(command="npm test 2>&1", run_in_background=true)
Monitor(pid=test_task_id)  # Each line → notification
Full pattern reference (when to use vs. TaskOutput, until-condition gates, anti-patterns): Read("/Users/yonatangross/coding/yonatangross/orchestkit/plugins/ork/skills/chain-patterns/references/monitor-patterns.md").
Partial results (CC 2.1.98): If a verification agent fails mid-analysis, synthesize partial scores rather than re-spawning:
for agent_result in verification_results:
    if "[PARTIAL RESULT]" in agent_result.output:
        # Extract whatever scores the agent produced before crashing
        partial_score = parse_score(agent_result.output)  # May be incomplete
        scores[agent_result.dimension] = {
            "score": partial_score,
            "partial": True,
            "note": "Agent crashed — score based on partial analysis"
        }
# A 4-dimension score is better than no score. Do NOT re-spawn.
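A self-contained version of the partial-result handling is sketched below. The source does not define `parse_score`, so the one here is an assumption: it takes the last `N/10` value found in the agent output. The result dictionaries and `collect_scores` are likewise hypothetical.

```python
import re
from typing import Dict, List, Optional

PARTIAL_MARKER = "[PARTIAL RESULT]"
SCORE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*10")


def parse_score(output: str) -> Optional[float]:
    """Return the last 'N/10' score in the output, or None (assumed format)."""
    matches = SCORE_PATTERN.findall(output)
    return float(matches[-1]) if matches else None


def collect_scores(results: List[Dict[str, str]]) -> Dict[str, Dict[str, object]]:
    """Keep every score that could be parsed and mark the partial ones."""
    scores: Dict[str, Dict[str, object]] = {}
    for result in results:
        score = parse_score(result["output"])
        if score is None:
            continue  # Nothing usable was produced; leave the dimension out.
        scores[result["dimension"]] = {
            "score": score,
            "partial": PARTIAL_MARKER in result["output"],
        }
    return scores


results = [
    {"dimension": "security", "output": "Security: 8.2/10"},
    {"dimension": "quality", "output": "[PARTIAL RESULT] Code Quality: 7.5/10"},
    {"dimension": "coverage", "output": "[PARTIAL RESULT] crashed before scoring"},
]
print(collect_scores(results))
```

Dimensions with no recoverable score are omitted rather than re-spawned, which matches the rule that a partial composite is preferable to none.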
Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") for auto-detection, route discovery, screenshot capture, and AI vision evaluation.
Summary: Auto-detects project framework, starts dev server, discovers routes, uses agent-browser to screenshot each route, evaluates with Claude vision, generates self-contained gallery.html with base64-embedded images.
Output: verification-output/{timestamp}/gallery.html. Open it in a browser to see all screenshots with AI evaluations, scores, and annotation diffs.
Graceful degradation: If no frontend detected or server won't start, skips visual capture with a warning — never blocks verification.
Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") (Phase 8.5 section) for the agentation loop workflow.
Trigger: Only when agentation MCP is configured. Offers the user the choice to annotate the live UI. The ui-feedback agent processes the annotations, and re-screenshots show the before/after state.
Load details: Read("${CLAUDE_SKILL_DIR}/rules/evidence-collection.md") for git commands, test execution patterns, metrics tracking, and post-verification feedback.
Push notifications (CC 2.1.110+): Verify runs longer than 5 minutes are common on complex changes. When the final verdict is ready, call PushNotification to alert the user, who has likely walked away from the terminal. Requires Remote Control with the "Push when Claude decides" config; fails silently for users without it.
Load Read("${CLAUDE_PLUGIN_ROOT}/skills/shared/rules/verification-gate.md"): the minimum 5-step gate that applies to ALL completion claims across all skills. This is non-negotiable: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
Load Read("${CLAUDE_PLUGIN_ROOT}/skills/shared/rules/anti-sycophancy.md"): all verification agents report findings directly without performative agreement. "Should be fine" is not evidence. "Tests pass (exit 0, 47/47)" is.
All verification agents MUST report using the standardized protocol: Read("${CLAUDE_PLUGIN_ROOT}/agents/shared/status-protocol.md"). Never report DONE if concerns exist. Never silently produce work you're unsure about.
When a security agent finds a critical issue, share it with other verification agents:
SendMessage(to="test-generator", message="Security: SQL injection in user_service.py:88 — add parameterized query test")
SendMessage(to="code-quality-reviewer", message="Security finding at user_service.py:88 — flag in review")
Session recovery (CC 2.1.108+): After idle periods or interruptions, use /recap to restore conversational context alongside checkpoint-resume state. Enabled by default since CC 2.1.110 (even with telemetry disabled).
git diff main --stat
git log main..HEAD --oneline
git diff main --name-only | sort -u
Incorrect:
# Sequential — wastes time, no coverage data
cd backend && pytest tests/
cd frontend && npm test
Correct:
# Parallel with coverage — run both in ONE message
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage
Run in parallel with Phase 2 agents. Auto-detects frontend framework and captures screenshots.
Incorrect:
# Manual screenshots with no structure
open http://localhost:3000
# Take manual screenshot...
Correct:
# Automated visual capture with AI evaluation
Agent(
    subagent_type="general-purpose",
    prompt="Visual capture: detect framework, start server, screenshot routes via agent-browser, evaluate with Claude vision, generate gallery.html",
    run_in_background=True
)
Output structure:
verification-output/{timestamp}/
├── screenshots/       (PNGs per route, base64 in gallery)
├── ai-evaluations/    (JSON per screenshot with score + issues)
├── annotations/       (before/after if agentation used)
│   ├── before/
│   └── after/
└── gallery.html       (self-contained, open in browser)
Verification can be blocked by policy-as-code rules. See Policy-as-Code for configuration of composite minimums, dimension minimums, and blocking rules.
# Agent Teams is GA since CC 2.1.33
import os

force_task_tool = os.environ.get("ORCHESTKIT_FORCE_TASK_TOOL") == "1"
if force_task_tool:
    mode = "task_tool"
else:
    # Teams available by default — use for full multi-dimensional work
    mode = "agent_teams" if scope == "full" else "task_tool"
If Agent Teams encounters issues mid-execution, fall back to Task tool for remaining work. This is safe because both modes produce the same output format (dimensional scores 0-10).
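Mode selection and the mid-execution fallback can be combined in one small sketch. The function names are hypothetical, and how a Teams failure is detected is left as a boolean input because the source does not specify it.

```python
from typing import Mapping


def select_mode(env: Mapping[str, str], scope: str) -> str:
    """Pick the orchestration mode from the environment and the scope."""
    if env.get("ORCHESTKIT_FORCE_TASK_TOOL") == "1":
        return "task_tool"
    return "agent_teams" if scope == "full" else "task_tool"


def fallback_mode(current_mode: str, teams_failed: bool) -> str:
    """Drop to the Task tool for remaining work if Agent Teams hit a problem."""
    if current_mode == "agent_teams" and teams_failed:
        return "task_tool"
    return current_mode


mode = select_mode({}, "full")
print(mode)                                     # agent_teams
print(fallback_mode(mode, teams_failed=True))   # task_tool
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the selection rule easy to test.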
For full codebase work (>20 files), use the 1M context window to avoid agent context exhaustion. On 200K context, scope discovery should limit files to prevent overflow.
Launch ALL agents in ONE message with run_in_background=True and max_turns=25. Pass model=MODEL_OVERRIDE when user specifies --model=opus (CC 2.1.72).
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |
Use python-performance-engineer for backend-focused verification or frontend-performance-engineer for frontend-focused verification. See Quality Model for Performance (0.11) and Scalability (0.09) weights.
In Agent Teams mode, form a verification team where agents share findings and coordinate scoring:
TeamCreate(team_name="verify-{feature}", description="Verify {feature}")

Agent(subagent_type="code-quality-reviewer", name="quality-verifier",
      team_name="verify-{feature}", model=MODEL_OVERRIDE,
      prompt="""# Cache-optimized: stable content first (CC 2.1.72)
      Verify code quality. Score 0-10.
      When you find patterns that affect security, message security-verifier.
      When you find untested code paths, message test-verifier.
      Share your quality score with all teammates for composite calculation.
      Feature: {feature}.""")

Agent(subagent_type="security-auditor", name="security-verifier",
      team_name="verify-{feature}", model=MODEL_OVERRIDE,
      prompt="""# Cache-optimized: stable content first (CC 2.1.72)
      Security verification. Score 0-10.
      When quality-verifier flags security-relevant patterns, investigate deeper.
      When you find vulnerabilities in API endpoints, message api-verifier.
      Share severity findings with test-verifier for test gap analysis.
      Feature: {feature}.""")

Agent(subagent_type="test-generator", name="test-verifier",
      team_name="verify-{feature}", model=MODEL_OVERRIDE,
      prompt="""# Cache-optimized: stable content first (CC 2.1.72)
      Verify test coverage. Score 0-10.
      When quality-verifier or security-verifier flag untested paths, quantify the gap.
      Run existing tests and report coverage metrics.
      Message the lead with coverage data for composite scoring.
      Feature: {feature}.""")

Agent(subagent_type="backend-system-architect", name="api-verifier",
      team_name="verify-{feature}", model=MODEL_OVERRIDE,
      prompt="""# Cache-optimized: stable content first (CC 2.1.72)
      Verify API design and backend patterns. Score 0-10.
      When security-verifier flags endpoint issues, validate and score.
      Share API compliance findings with ui-verifier for consistency check.
      Feature: {feature}.""")

Agent(subagent_type="frontend-ui-developer", name="ui-verifier",
      team_name="verify-{feature}", model=MODEL_OVERRIDE,
      prompt="""# Cache-optimized: stable content first (CC 2.1.72)
      Verify frontend implementation. Score 0-10.
      When api-verifier shares API patterns, verify frontend matches.
      Check React 19 patterns, accessibility, and loading states.
      Share findings with quality-verifier for overall assessment.
      Feature: {feature}.""")

# Conditional 6th agent — use python-performance-engineer for backend,
# frontend-performance-engineer for frontend
Agent(subagent_type="python-performance-engineer", name="perf-verifier",
      team_name="verify-{feature}", model=MODEL_OVERRIDE,
      prompt="""# Cache-optimized: stable content first (CC 2.1.72)
      Verify performance and scalability. Score 0-10.
      Assess latency, resource usage, caching, and scaling patterns.
      When security-verifier flags resource-intensive endpoints, profile them.
      Share performance findings with api-verifier and quality-verifier.
      Feature: {feature}.""")
Runs as a 7th parallel agent alongside the 6 verification agents. See Visual Capture for full details.
# Launch IN THE SAME MESSAGE as Phase 2 agents
Agent(
    subagent_type="general-purpose",
    description="Visual capture and AI evaluation",
    prompt="""Visual verification capture for: {feature}
    1. Detect project type from package.json
    2. Start dev server (auto-detect framework)
    3. Discover routes (framework-aware scan)
    4. Use agent-browser to screenshot each route (max 20)
    5. Read each screenshot PNG for AI vision evaluation
    6. Score layout, accessibility, content completeness (0-10 per route)
    7. Read gallery template from ${CLAUDE_SKILL_DIR}/assets/gallery-template.html
    8. Generate gallery.html with base64-embedded screenshots
    9. Write to verification-output/{timestamp}/gallery.html
    10. Kill dev server
    If no frontend detected, write skip notice and exit.
    If server fails to start, write warning and exit.
    Never block — graceful degradation only.""",
    run_in_background=True,
    max_turns=30
)
Output: verification-output/{timestamp}/ folder with screenshots, AI evaluations (JSON), and gallery.html.
Bash(
    command=f"{start_command} &",
    description="Start dev server for visual capture",
    run_in_background=True
)
Wait for server readiness:
Bash(
    command=f"for i in $(seq 1 30); do curl -s http://localhost:{port} > /dev/null && exit 0; sleep 1; done; exit 1",
    description="Wait for dev server to be ready (max 30s)"
)
If server fails to start: Skip visual capture with a warning in the report. Do NOT block verification.
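The same readiness loop can be written in Python when a shell one-liner is inconvenient. This is a sketch, not part of the skill: `http_probe` and `wait_for_server` are hypothetical helpers, and port 3000 in the usage comment is an assumption. Like `curl -s` without `-f`, the probe treats any HTTP response as "server is up".

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers at all (any HTTP status counts as up)."""
    try:
        urllib.request.urlopen(url, timeout=2)
        return True
    except urllib.error.HTTPError:
        return True  # The server responded, even if with an error status.
    except (urllib.error.URLError, OSError):
        return False  # Connection refused, timeout, DNS failure, etc.


def wait_for_server(probe: Callable[[], bool], attempts: int = 30,
                    delay: float = 1.0) -> bool:
    """Poll until the probe succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


# Usage (port 3000 is an assumption; use the detected dev server port):
# ready = wait_for_server(lambda: http_probe("http://localhost:3000"))
# if not ready:
#     print("WARNING: dev server did not start; skipping visual capture")
```

Injecting the probe as a callable keeps the polling logic testable without a running server.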
Then evaluate using this prompt template (include it in the visual capture agent's instructions):
Evaluate this screenshot of route "{route_path}" against these 6 criteria.
For EACH criterion, provide a severity (ok/warning/error) and specific observation.
Do NOT use generic "looks good" — cite what you actually see.

1. LAYOUT: Overflow, alignment, spacing, responsive grid. Check: content cut off? Overlapping elements? Scroll needed?
2. NAVIGATION: Is nav present and functional? Sidebar, breadcrumbs, TOC visible? Active state correct?
3. CONTENT: Text readable? Headings hierarchical? Data populated (not placeholder/loading)? Counts/numbers accurate?
4. ACCESSIBILITY: Contrast sufficient? Focus indicators visible? Text size adequate? Color-only information?
5. INTERACTIVITY: Buttons/links styled consistently? Hover/focus states? Forms labeled? CTAs discoverable?
6. BRANDING: Consistent with site theme? Dark/light mode correct? Typography matches design system?

Output as JSON array — exactly 6 items, one per criterion:
[{"severity": "ok|warning|error", "message": "CRITERION: specific observation with evidence"}]

Score 0-10 based on: 0 errors=9+, 1-2 warnings=7-8, errors=5-6, multiple errors=<5.
Per-route evaluation output (6 items, one per criterion, never a single line):
{
  "route": "/dashboard",
  "score": 7.5,
  "evaluation": [
    {"severity": "ok", "message": "LAYOUT: Content within viewport, no horizontal overflow, grid columns align properly"},
    {"severity": "ok", "message": "NAVIGATION: Sidebar present with 8 sections, 'Dashboard' correctly highlighted as active"},
    {"severity": "warning", "message": "CONTENT: Stats show '79 skills' but should be '89 skills' — stale count detected"},
    {"severity": "ok", "message": "ACCESSIBILITY: Body text ~16px on dark bg (#e6edf3 on #0d1117), contrast ratio ~13:1, passes WCAG AAA"},
    {"severity": "warning", "message": "INTERACTIVITY: Code block copy buttons present but no visible hover state change"},
    {"severity": "ok", "message": "BRANDING: Dark theme consistent, green accent (#3fb950) used for active states, monospace for code"}
  ]
}
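Since the template demands a fixed shape, the evaluation array can be checked before it is written into the gallery. The validator below is a hypothetical sketch; it assumes the items arrive in the same order as the six criteria and that each message begins with its criterion name, as in the example output.

```python
import json
from typing import List

CRITERIA = ["LAYOUT", "NAVIGATION", "CONTENT",
            "ACCESSIBILITY", "INTERACTIVITY", "BRANDING"]
SEVERITIES = {"ok", "warning", "error"}


def validate_evaluation(raw: str) -> List[str]:
    """Return a list of problems found in a per-route evaluation JSON array."""
    items = json.loads(raw)
    if not isinstance(items, list) or len(items) != len(CRITERIA):
        return [f"expected a JSON array of {len(CRITERIA)} items"]
    problems: List[str] = []
    for criterion, item in zip(CRITERIA, items):
        if not isinstance(item, dict):
            problems.append(f"{criterion}: item is not an object")
            continue
        if item.get("severity") not in SEVERITIES:
            problems.append(f"{criterion}: invalid severity {item.get('severity')!r}")
        if not str(item.get("message", "")).startswith(f"{criterion}:"):
            problems.append(f"{criterion}: message must start with '{criterion}:'")
    return problems


sample = json.dumps([{"severity": "ok", "message": f"{name}: fine"} for name in CRITERIA])
print(validate_evaluation(sample))  # []
```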
After evaluating all routes, synthesize a summary object for the gallery:
# Build summary from all per-route evaluations
summary = {
    "total_routes": len(routes),
    "avg_score": round(sum(r.score for r in routes) / len(routes), 1),
    "pass_count": len([r for r in routes if r.score >= 7]),
    "warn_count": len([r for r in routes if 5 <= r.score < 7]),
    "fail_count": len([r for r in routes if r.score < 5]),
    "common_issues": [  # Issues appearing on 2+ routes
        {"count": 3, "message": "Stale skill count (79 instead of 89) on 3/5 pages"},
        {"count": 2, "message": "Code block copy buttons lack hover state feedback"}
    ],
    "strengths": [  # Positive patterns across routes
        "Consistent dark theme and typography across all pages",
        "Sidebar navigation present and correctly highlights active page"
    ]
}
Include this summary in GALLERY_JSON alongside routes.
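The numeric part of the summary can be computed directly from the per-route JSON. The sketch below is illustrative only: `build_summary` is a hypothetical helper, routes are assumed to be dictionaries shaped like the per-route output, and "common issues" are detected by exact message match, which is a simplification of whatever grouping the evaluating agent applies. Strengths are left out because they need judgment, not counting.

```python
from collections import Counter
from typing import Dict, List


def build_summary(routes: List[Dict]) -> Dict:
    """Aggregate per-route evaluations into a gallery summary."""
    if not routes:
        return {"total_routes": 0, "avg_score": 0.0, "pass_count": 0,
                "warn_count": 0, "fail_count": 0, "common_issues": []}
    scores = [route["score"] for route in routes]
    # Count each non-ok message once per route, then keep those on 2+ routes.
    issue_counts = Counter(
        message
        for route in routes
        for message in {item["message"] for item in route["evaluation"]
                        if item["severity"] != "ok"}
    )
    return {
        "total_routes": len(routes),
        "avg_score": round(sum(scores) / len(scores), 1),
        "pass_count": sum(1 for s in scores if s >= 7),
        "warn_count": sum(1 for s in scores if 5 <= s < 7),
        "fail_count": sum(1 for s in scores if s < 5),
        "common_issues": [
            {"count": count, "message": message}
            for message, count in issue_counts.most_common()
            if count >= 2
        ],
    }


routes = [
    {"route": "/", "score": 8.0,
     "evaluation": [{"severity": "warning", "message": "CONTENT: stale count"}]},
    {"route": "/docs", "score": 6.0,
     "evaluation": [{"severity": "warning", "message": "CONTENT: stale count"}]},
    {"route": "/api", "score": 4.0,
     "evaluation": [{"severity": "ok", "message": "LAYOUT: fine"}]},
]
summary = build_summary(routes)
print(summary["avg_score"], summary["common_issues"])
```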
Trigger: Only when agentation MCP is configured in .mcp.json.
# Check if agentation is available
ToolSearch(query="select:mcp__agentation__agentation_get_all_pending")
If available, offer the user:
AskUserQuestion(questions=[{
    "question": "Agentation is configured. Want to annotate the UI before finalizing?",
    "header": "Visual Feedback Loop",
    "options": [
        {"label": "Yes, let me annotate", "description": "I'll mark issues on the live UI, then ui-feedback agent fixes them"},
        {"label": "Skip", "description": "Finalize gallery with current screenshots"}
    ]
}])
If yes:
# 1. Watch for annotations
mcp__agentation__agentation_get_all_pending()

# 2. For each annotation:
mcp__agentation__agentation_acknowledge(annotationId=id)

# 3. Dispatch ui-feedback agent
Agent(subagent_type="ork:ui-feedback",
      prompt="Process agentation annotation: {annotation}. Fix the issue, then resolve.",
      run_in_background=True)

# 4. After fixes, re-screenshot affected routes
# 5. Save before/after pairs
# 6. Update gallery with annotation diffs