Ci Sentinel
Hourly autonomous classifier for failing PRs across your repos. Runs /ci-debug headless against every open PR with red required checks, posts the verdict as a collapsed PR comment, and appends to a per-repo .sentinel/ledger.jsonl. v1 is propose-don't-apply — NEVER auto-pushes a fix. Use when you're tired of /status sweeps catching the same 10 CI failure patterns over and over.
/ork:ci-sentinel/ork:ci-sentinel — Hourly autonomous CI classifier
Direct response to the 275-session insights audit (2026-05-16): 14 ci-debugging + 7 fix-ci-failures sessions in one month, most of them re-running the same 10-pattern classification you already encoded in /ci-debug. This skill makes the classifier autonomous.
What it does
⏰ hourly cron (:17)
│
▼
📥 gh pr list → PRs with FAILURE checks (yours, max 10)
│
▼
🤖 for each PR (skipping those already commented at this SHA):
claude -p → run /ci-debug → capture verdict markdown
│
▼
💬 post collapsed PR comment with marker so future runs dedupe
│
▼
📜 append { ts, pr, sha, tokens } to .sentinel/ledger.jsonl
│
▼
💰 if daily token spend > ORK_SENTINEL_DAILY_TOKEN_BUDGET → pauseWhat it does NOT do (v1)
- NEVER pushes a fix. Even for a 100%-confidence lockfile-drift match, v1 only proposes in a PR comment. Auto-push is a v2 question, gated on a quarter of false-positive-free operation.
- Does not page. Novel failures get a
🆕flag in the comment; you find them on your normal status sweep, not via a notification storm. - Does not analyze closed/merged PRs.
- Does not roam outside the repo it's installed in. This is per-repo by design. Org-wide sweep is a different shape — that's what
/statusis for.
Why it's safe to run hourly
| Risk | Mitigation |
|---|---|
| Token cost runaway | ORK_SENTINEL_DAILY_TOKEN_BUDGET=500000 ceiling, enforced by the workflow's first step. Resets daily. |
| Duplicate comments on the same SHA | Marker <!-- ork:ci-sentinel sha=<short> --> on every comment; workflow scans existing comments before posting. |
| Wrong-classification spam | Propose-don't-apply means the worst outcome is a noisy but accurate-looking comment. You can collapse them; you can't unmerge a bad auto-fix. |
| Stuck PR keeps re-classifying | Idempotent on SHA — only re-runs if you push new commits. |
| Sentinel itself breaking CI | Runs on ubuntu-latest, no pull_request trigger, no push trigger. Cannot block any other workflow. |
Install on a new repo
- Copy
.github/workflows/ci-sentinel.ymlfrom the OrchestKit repo into the target repo (this skill ships it). - Add the
ANTHROPIC_API_KEYsecret to the repo (Settings → Secrets and variables → Actions). - (Optional) Adjust
ORK_SENTINEL_DAILY_TOKEN_BUDGETenv in the workflow. - Trigger a manual run with
inputs.dry_run = trueto validate the wiring. - Once a dry-run posts no comments and looks healthy in the job summary, let the hourly cron take over.
Running locally as a background session
If you run the sentinel locally via claude --bg instead of the workflow:
Pin it (CC 2.1.147+): Press
Ctrl+Tinclaude agentsto pin the session. Pinned background sessions stay alive when idle (no silent reaping between hourly runs), restart in place to apply CC updates rather than dying, and under memory pressure are shed only after non-pinned sessions.
Resume it (CC 2.1.144+): Sessions started via
claude --bgnow appear in/resumemarkedbg— recover a crashed sentinel directly through/resumeinstead of the agent view.
Configuration
The workflow is intentionally configured via in-file env vars (not workflow inputs) so a fork stays self-contained:
| Var | Default | Meaning |
|---|---|---|
ORK_SENTINEL_DAILY_TOKEN_BUDGET | 1000000 | Hard daily ceiling. Hour-of-day not enforced; calendar day in UTC. Bumped from 500k after dropping --bare (see "Why no --bare" below). |
ORK_SENTINEL_PER_PR_TIMEOUT_S | 300 | Per-PR wall-clock cap on the claude -p invocation. |
max_prs (workflow_dispatch input) | 10 | Cap on PRs analyzed in one sweep. |
dry_run (workflow_dispatch input) | false | Skip comment posting (for spec validation). |
Why no --bare (2026-05-18 finding)
Originally designed around claude -p --bare (CC 2.1.81+) for minimal plugin/hook load and predictable ~4k tokens/PR. First real dispatch revealed --bare doesn't honor ANTHROPIC_API_KEY env var, --settings.apiKey, or --settings.apiKeyHelper — every call returns "Not logged in · Please run /login". Reproduced locally against multiple settings shapes.
Dropped --bare; cost per PR rises ~4k → ~10k tokens (plugins + hooks load), partially offset by --no-session-persistence (avoids disk writes). Daily budget bumped 500k → 1M to absorb the change while keeping monthly cost in the $10-12 range per repo.
If/when CC fixes --bare auth, the workflow can revert to bare mode by changing one line.
Dispatch envelope (CC 2.1.142+ flags — M146-6 / #1849)
Each claude -p invocation locks the dispatch envelope so cost-per-PR stays predictable regardless of what the runner inherits:
| Flag | Value | Why |
|---|---|---|
--permission-mode | dontAsk | /ci-debug is read-only by design (proposes, never applies). dontAsk silently refuses destructive ops — exactly what we want from an autonomous classifier. Never use bypassPermissions here. |
--max-turns | 4 | Cap on the conversation length. Sweep, classify, report — done. |
--output-format | json | Ledger needs usage.total_tokens for the budget circuit-breaker. |
--no-session-persistence | (flag) | Don't write session state to disk; sentinel runs are ephemeral. |
These are hardcoded in the workflow. If you need to override for a fork (e.g. you want a different permission-mode), edit .github/workflows/ci-sentinel.yml directly — intentionally not exposed as workflow_dispatch inputs to prevent accidental cost spikes from a one-off manual run.
Comment shape
Every verdict comment looks like:
<!-- ork:ci-sentinel sha=abc123def -->
<details><summary>🛰️ <b>CI Sentinel verdict</b> (sha abc123def)</summary>
## CI Debug: <repo> · #<n>
**Failing job:** ...
**Classification:** Pattern #N — <name>
**Reference:** memory <file.md>
**Proposed fix:** ...
</details>Collapsed by default — no inbox noise unless you click. Always carries the SHA so you know whether the verdict is still current.
Mutation journal (deferred to v1.1)
The insights audit's horizon-#1 design called for a CI_PLAYBOOK.md the sentinel mutates after each human-driven novel-failure fix. v1 doesn't write to the playbook — it just appends classification rows to .sentinel/ledger.jsonl. The playbook lives in /ci-debug's SKILL.md and stays human-curated for v1.
When v1.1 lands the journal:
- After a fix-PR merges, the sentinel diff-checks the PR title against existing patterns.
- If novel, it opens a follow-up issue suggesting the pattern be added to
/ci-debugSKILL.md. - Human approves the pattern in a PR; sentinel picks it up next sweep.
Cost model (back-of-envelope)
Per-PR analysis: ~8-12k tokens (full CC load — plugins + hooks — since --bare was dropped, see "Why no --bare" above).
Hourly sweep with avg 3 failing PRs: ~30k tokens/hour = ~720k/day.
Daily budget 1M tokens = safe headroom for spike days.
At ~$15/MTok (Sonnet input/output blended), that's roughly $10-12/month per repo before any auto-fix saves a session. Compared to 21 manual ci-debug sessions/month — each costing 10-20 minutes of your time — payback is immediate (5+ hours reclaimed for ~$12).
Related Skills
- Upstream —
/ci-debugdoes the classification (sentinel is the trigger). - Composes with —
/ork:swarm-migrate(after sentinel diagnoses, you might fix-and-swarm across repos),/status(sentinel doesn't replace org-wide sweeps). - Anti-pattern —
/loop /ci-debugin your own session is the manual analog; ship the sentinel and unplug from that.
When to invoke this skill (not the cron)
/ork:ci-sentinel install— copy the workflow into a new repo + check secrets./ork:ci-sentinel status— read.sentinel/ledger.jsonland summarize the last 24h./ork:ci-sentinel enable/disable— toggle the workflow'son.scheduleblock.
The hourly run itself does not invoke this skill — the workflow calls claude -p --bare against /ci-debug directly. This skill is for the human admin actions around the sentinel.
Why this design wins (one paragraph)
You already have 10 encoded CI failure patterns in /ci-debug and 632 ScheduleWakeup invocations in your history — you trust async, and you trust pattern-matched classifications when they cite the memory entry. The CI sentinel is composition, not invention: a GitHub Actions cron triggers headless /ci-debug against every open red PR, posts a collapsed verdict, and respects propose-don't-apply. Worst case is a noisy comment. Best case is 21 sessions/month of recurring CI archaeology reclaimed. Build effort: hours. Payback: immediate.
Ci Debug
Diagnose a failing CI run against a 10-pattern playbook. Classifies the failure, cites the relevant memory entry, proposes the exact fix command — but NEVER applies without explicit user approval. Use when a specific PR check or GitHub Actions run failed and you want a diagnosis instead of speculation. Don't use for org-wide CI sweeps (that's /status) or for app-level test failures (the playbook is CI-infra-specific).
Code Review Playbook
Use this skill when conducting or improving code reviews. Provides structured review processes, conventional comments patterns, language-specific checklists, and feedback templates. Use when reviewing PRs or standardizing review practices.
Last updated on