Skill Evolution
Tracks skill usage patterns, edit frequency, and success rates to suggest improvements and optimizations. Manages skill versioning with safe rollback capability and confidence scoring for suggestions. Use when reviewing skill performance, applying auto-suggested changes, or rolling back problematic versions.
Auto-activated — this skill loads automatically when Claude detects matching context.
Skill Evolution Manager
Enables skills to automatically improve based on usage patterns, user edits, and success rates. Provides version control with safe rollback capability.
Overview
Use this skill for:
- Reviewing how skills are performing across sessions
- Identifying patterns in user edits to skill outputs
- Applying learned improvements to skill templates
- Rolling back problematic skill changes
- Tracking skill version history and success rates
Quick Reference
| Command | Description |
|---|---|
| `/ork:skill-evolution` | Show evolution report for all skills |
| `/ork:skill-evolution analyze <skill-id>` | Analyze specific skill patterns |
| `/ork:skill-evolution evolve <skill-id>` | Review and apply suggestions |
| `/ork:skill-evolution history <skill-id>` | Show version history |
| `/ork:skill-evolution rollback <skill-id> <version>` | Restore previous version |
How It Works
The skill evolution system operates in three phases:
    COLLECT                   ANALYZE                     ACT
    ───────                   ───────                     ───
┌─────────────┐           ┌─────────────┐           ┌─────────────┐
│ PostTool    │──────────▶│ Evolution   │──────────▶│ /ork:skill- │
│ Edit        │ patterns  │ Analyzer    │ suggest   │ evolution   │
│ Tracker     │           │ Engine      │           │ command     │
└─────────────┘           └─────────────┘           └─────────────┘
       │                         │                         │
       ▼                         ▼                         ▼
┌─────────────┐           ┌─────────────┐           ┌─────────────┐
│ edit-       │           │ evolution-  │           │ versions/   │
│ patterns.   │           │ registry.   │           │ snapshots   │
│ jsonl       │           │ json        │           │             │
└─────────────┘           └─────────────┘           └─────────────┘
Load details: Read("${CLAUDE_SKILL_DIR}/rules/pattern-detection-heuristics.md") for tracked edit patterns and detection regexes.
Load details: Read("${CLAUDE_SKILL_DIR}/rules/confidence-scoring.md") for suggestion thresholds.
Subcommands
Each subcommand is documented with implementation details, shell commands, and sample output. Load details: Read("${CLAUDE_SKILL_DIR}/references/evolution-commands.md")
Report (Default)
/ork:skill-evolution — Shows evolution report for all tracked skills with usage counts, success rates, and pending suggestions.
Analyze
/ork:skill-evolution analyze <skill-id> — Deep-dives into edit patterns for a specific skill, showing frequency, sample counts, and confidence scores.
Evolve
/ork:skill-evolution evolve <skill-id> — Interactive review of improvement suggestions. Uses AskUserQuestion for each suggestion (Apply / Skip / Reject). Creates version snapshot before applying.
History
/ork:skill-evolution history <skill-id> — Shows version history with performance metrics per version.
Rollback
/ork:skill-evolution rollback <skill-id> <version> — Restores a previous version after confirmation. Current version is backed up automatically.
Data Files
| File | Purpose | Format |
|---|---|---|
| `.claude/feedback/edit-patterns.jsonl` | Raw edit pattern events | JSONL (append-only) |
| `.claude/feedback/evolution-registry.json` | Aggregated suggestions | JSON |
| `.claude/feedback/metrics.json` | Skill usage metrics | JSON |
| `skills/<cat>/<name>/versions/` | Version snapshots | Directory |
| `skills/<cat>/<name>/versions/manifest.json` | Version metadata | JSON |
Auto-Evolution Safety
Load details: Read("${CLAUDE_SKILL_DIR}/rules/auto-evolution-triggers.md") for full safety mechanisms, health monitoring, and trigger criteria.
Key safeguards: version snapshots before changes, auto-alert on >20% success rate drop, human review required, rejected suggestions never re-suggested.
References
Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):
| File | Content |
|---|---|
| `evolution-commands.md` | Subcommand implementation, shell commands, and sample output |
| `evolution-analysis.md` | Evolution analysis methodology |
| `version-management.md` | Version management guide |
Rules
Load on demand with Read("${CLAUDE_SKILL_DIR}/rules/<file>"):
| File | Content |
|---|---|
| `pattern-detection-heuristics.md` | Edit pattern categories and regex detection |
| `confidence-scoring.md` | Suggestion thresholds and confidence criteria |
| `auto-evolution-triggers.md` | Safety mechanisms and trigger criteria |
Related Skills
- ork:configure - Configure OrchestKit settings
- ork:doctor - Diagnose OrchestKit issues
- feedback-dashboard - View comprehensive feedback metrics
Rules (3)
Auto-Evolution Triggers — HIGH
Auto-Evolution Safety & Trigger Criteria
Safety Mechanisms
- Version Snapshots: Always created before changes
- Rollback Triggers: Auto-alert if success rate drops >20%
- Human Review: High-confidence suggestions require approval
- Rejection Memory: Rejected suggestions are never re-suggested
Health Monitoring
The system monitors skill health and can trigger warnings:
WARNING: api-design-framework success rate dropped from 94% to 71%
Consider: /ork:skill-evolution rollback api-design-framework 1.1.0
Incorrect:
# Auto-apply pattern after 2 uses, no rollback tracking
confidence: 60%, samples: 2 → APPLY
Correct:
# Require minimum samples and high confidence before suggesting
confidence: 85%, samples: 8 → SUGGEST (requires human approval)
confidence: 60%, samples: 2 → TRACK ONLY (below threshold)
When Auto-Evolution Activates
- Pattern frequency exceeds the Add Threshold (70%)
- At least the Minimum Samples count (5) of uses has been recorded
- No prior rejection for the same pattern on the same skill
- Current skill version success rate is stable (no recent drops)
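The four activation criteria above can be read as a single gate. The following is a minimal sketch under stated assumptions: `PatternStats` and `should_suggest` are hypothetical names, and treating the 70% boundary as inclusive is an assumption, since the text says "exceeds" here but "70%+" elsewhere.

```python
from dataclasses import dataclass

MIN_SAMPLES = 5        # "Minimum Samples" default
ADD_THRESHOLD = 0.70   # "Add Threshold" default


@dataclass(frozen=True)
class PatternStats:
    """Hypothetical per-skill, per-pattern counters (names are illustrative)."""
    pattern_count: int         # uses in which the pattern was observed
    total_uses: int            # recorded uses of the skill
    previously_rejected: bool  # this pattern was already rejected for this skill
    success_rate_stable: bool  # no recent success-rate drop on this version


def should_suggest(stats: PatternStats) -> bool:
    """Return True only when all four activation criteria hold."""
    if stats.total_uses < MIN_SAMPLES:
        return False
    if stats.previously_rejected or not stats.success_rate_stable:
        return False
    frequency = stats.pattern_count / stats.total_uses
    # Inclusive comparison is an assumption about the boundary case.
    return frequency >= ADD_THRESHOLD
```

Checking the sample count first also avoids dividing by zero for a skill that has never been used.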
When Rollback Is Triggered
- Success rate drops more than 20% after an evolution
- Alert is surfaced in the next report or analyze invocation
- User is prompted to roll back via AskUserQuestion
Confidence Scoring — HIGH
Confidence Scoring & Suggestion Thresholds
Thresholds
| Threshold | Default | Description |
|---|---|---|
| Minimum Samples | 5 | Uses before generating suggestions |
| Add Threshold | 70% | Frequency to suggest adding pattern |
| Auto-Apply Confidence | 85% | Confidence for auto-application |
| Rollback Trigger | -20% | Success rate drop to trigger rollback |
Confidence Calculation
Confidence is calculated as the fraction of a skill's uses in which the pattern was applied:
confidence = pattern_frequency / total_uses
- Below 70%: Pattern tracked but no suggestion generated
- 70%-84%: Suggestion generated, requires human approval via the evolve subcommand
- 85%+: Auto-apply eligible (still requires human confirmation via AskUserQuestion)
Incorrect:
# Apply pattern with only 2 data points
pattern_frequency: 2/3 (67%) → auto-apply # Too few samples, unreliable
Correct:
# Wait for minimum samples before generating suggestions
pattern_frequency: 6/8 (75%) → suggest (requires approval)
pattern_frequency: 2/3 (67%) → track only (below 5 minimum samples)
Suggestion States
Suggestions progress through: pending → applied | rejected
- Applied: Pattern added to skill template, version bumped
- Rejected: Marked in registry, never re-suggested for this skill
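The lifecycle above is a small state machine with two terminal states. A minimal sketch follows; the `SuggestionState`, `advance`, and `may_resuggest` names are hypothetical and are not taken from the evolution registry's actual schema.

```python
from enum import Enum


class SuggestionState(Enum):
    PENDING = "pending"
    APPLIED = "applied"
    REJECTED = "rejected"


# Allowed transitions: pending -> applied | rejected; both end states are terminal.
_TRANSITIONS = {
    SuggestionState.PENDING: {SuggestionState.APPLIED, SuggestionState.REJECTED},
    SuggestionState.APPLIED: set(),
    SuggestionState.REJECTED: set(),
}


def advance(current: SuggestionState, target: SuggestionState) -> SuggestionState:
    """Move a suggestion to `target`, refusing transitions the lifecycle does not allow."""
    if target not in _TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def may_resuggest(state: SuggestionState) -> bool:
    """Only pending suggestions are offered again; rejected ones never are."""
    return state is SuggestionState.PENDING
```

Making "rejected" terminal in the transition table is one way to enforce the rule that a rejected suggestion is never raised again for the same skill.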
Pattern Detection Heuristics — HIGH
Edit Pattern Detection Heuristics
The system tracks these common edit patterns users apply after skill output:
| Pattern | Description | Detection Regex |
|---|---|---|
| `add_pagination` | User adds pagination to API responses | `limit.*offset`, `cursor.*pagination` |
| `add_rate_limiting` | User adds rate limiting | `rate.?limit`, `throttl` |
| `add_error_handling` | User adds try/catch blocks | `try.*catch`, `except` |
| `add_types` | User adds TypeScript/Python types | `interface\s`, `Optional` |
| `add_validation` | User adds input validation | `validate`, `Pydantic`, `Zod` |
| `add_logging` | User adds logging/observability | `logger\.`, `console.log` |
| `remove_comments` | User removes generated comments | Pattern removal detection |
| `add_auth_check` | User adds authentication checks | `@auth`, `@require_auth` |
Incorrect:
# Generic pattern — matches too broadly
{"pattern": "add_.*", "regex": ".*"} # Matches everything, useless signal
Correct:
# Specific pattern with focused regex
{"pattern": "add_pagination", "regex": r"limit.*offset|cursor.*pagination"}
How Detection Works
The PostTool Edit Tracker hook monitors file edits after skill invocations. When a user edits skill output, the edit is classified against the patterns above using regex matching. Results are appended to .claude/feedback/edit-patterns.jsonl.
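The classification step described above can be sketched in a few lines. This is an illustration only: it uses two regexes from the table, the function names are hypothetical, and the JSONL field names (`skill_id`, `patterns`) are assumptions, not the tracker's real record format.

```python
import json
import re
from typing import List, Optional

# Two detectors copied from the table; the others would follow the same shape.
PATTERNS = {
    "add_pagination": re.compile(r"limit.*offset|cursor.*pagination"),
    "add_rate_limiting": re.compile(r"rate.?limit|throttl"),
}


def detect_patterns(edit_text: str) -> List[str]:
    """Return the names of all patterns whose regex matches the edited text."""
    return [name for name, regex in PATTERNS.items() if regex.search(edit_text)]


def to_jsonl_record(skill_id: str, edit_text: str) -> Optional[str]:
    """Build one JSONL line for the edit, or None when nothing matched.

    The field names are hypothetical.
    """
    matched = detect_patterns(edit_text)
    if not matched:
        return None
    return json.dumps({"skill_id": skill_id, "patterns": matched})
```

Note that these regexes are case-sensitive and `.` does not cross line breaks by default, so an edit with `limit` and `offset` on separate lines would not match in this sketch.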
References (4)
Evolution Analysis
Evolution Analysis Methodology
Reference guide for understanding how the skill evolution system analyzes patterns and generates suggestions.
Pattern Detection Algorithm
1. Data Collection (PostTool Hook)
When a Write or Edit tool is used after a skill was recently loaded:
IF skill_loaded_within(5_minutes) AND tool IN (Write, Edit):
    content = get_edit_content()
    patterns = detect_patterns(content)
    IF patterns.length > 0:
        log_to_edit_patterns_jsonl(skill_id, patterns)
2. Pattern Matching
The system uses regex patterns to categorize edits:
# Associative array (requires Bash 4+): pattern name -> detection regex
declare -A PATTERN_DETECTORS=(
  ["add_pagination"]="limit.*offset|page.*size|cursor.*pagination|Paginated"
  ["add_rate_limiting"]="rate.?limit|throttl|RateLimiter|requests.?per"
  ["add_caching"]="@cache|cache_key|TTL|redis|memcache|@cached"
  ["add_retry_logic"]="retry|backoff|max_attempts|tenacity|Retry"
  ["add_error_handling"]="try.*catch|except|raise.*Exception|throw.*Error"
  ["add_validation"]="validate|Validator|@validate|Pydantic|Zod|yup"
  ["add_logging"]="logger\.|logging\.|console\.log|winston|pino"
  ["add_types"]=": *(str|int|bool|List|Dict|Optional)|interface\s|type\s.*="
  ["add_auth_check"]="@auth|@require_auth|isAuthenticated|requiresAuth"
  ["add_test_case"]="def test_|it\(|describe\(|expect\(|@pytest"
)
3. Frequency Calculation
For each skill with sufficient usage:
frequency = pattern_count / total_skill_uses
4. Confidence Scoring
Confidence combines frequency with sample size:
confidence = frequency × min(samples / 20, 1.0)
This means:
- 100% frequency with 5 samples = 0.25 confidence (needs more data)
- 100% frequency with 20+ samples = 1.0 confidence (high certainty)
- 70% frequency with 15 samples = 0.53 confidence (moderate)
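The formula and the three cases above translate directly into code. A minimal sketch, where the function name and the `full_weight_at` parameter are illustrative and not part of the evolution engine:

```python
def confidence(frequency: float, samples: int, full_weight_at: int = 20) -> float:
    """Scale frequency by how close the sample count is to `full_weight_at`."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if samples < 0:
        raise ValueError("samples must be non-negative")
    # The sample factor grows linearly and is capped at 1.0 from 20 samples on.
    return frequency * min(samples / full_weight_at, 1.0)
```

For example, `confidence(1.0, 5)` gives 0.25 and `confidence(1.0, 20)` gives 1.0. The third case works out to 0.7 × 0.75 = 0.525, which the list above rounds to 0.53.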
Suggestion Thresholds
| Metric | Threshold | Purpose |
|---|---|---|
| MIN_SAMPLES | 5 | Prevent premature suggestions |
| ADD_THRESHOLD | 0.70 | 70%+ users add = suggest adding |
| REMOVE_THRESHOLD | 0.70 | 70%+ users remove = suggest removing |
| AUTO_APPLY_CONFIDENCE | 0.85 | Auto-apply if very high confidence |
Suggestion Types
Add Suggestions
Generated when users frequently add similar content:
{
  "type": "add",
  "target": "template",
  "pattern": "add_pagination",
  "reason": "85% of users add pagination after using this skill"
}
Remove Suggestions
Generated when users frequently remove generated content:
{
  "type": "remove",
  "target": "template",
  "pattern": "remove_comments",
  "reason": "72% of users remove docstrings from generated code"
}
Analysis Best Practices
- Wait for sufficient data: Don't act on suggestions until MIN_SAMPLES reached
- Review high-confidence first: Focus on suggestions with confidence > 0.80
- Consider context: A pattern may be added for specific use cases only
- Monitor after changes: Track success rate changes after evolution
Interpreting Results
High-Value Improvements
- Frequency > 80%, Confidence > 0.70
- Pattern is universally applicable
- Easy to add to skill template
Conditional Improvements
- Frequency 50-80%
- May be context-dependent
- Consider adding as optional reference
Skip/Investigate
- Frequency < 50%
- Might be edge case or user preference
- Review individual edit patterns for context
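The three buckets above can be expressed as a small triage function. This is a sketch with assumptions: the function name and return labels are hypothetical, and the text does not say where a pattern with frequency above 80% but confidence at or below 0.70 belongs, so it is placed in the conditional bucket here.

```python
def triage(frequency: float, confidence: float) -> str:
    """Map a pattern's frequency and confidence to one of the three buckets."""
    if frequency > 0.80 and confidence > 0.70:
        return "high-value"
    if frequency >= 0.50:
        # Also catches high-frequency, low-confidence patterns (an assumption).
        return "conditional"
    return "skip-or-investigate"
```

A high-frequency pattern with few samples therefore waits in the conditional bucket until more data arrives, which matches the "wait for sufficient data" practice.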
Evolution Commands
Evolution Subcommand Reference
Detailed implementation and sample output for each subcommand.
Subcommand: Report (Default)
Usage: /ork:skill-evolution
Shows evolution report for all tracked skills.
Implementation
# Run the evolution engine report
"${CLAUDE_PROJECT_DIR}/.claude/scripts/evolution-engine.sh" report
Sample Output
Skill Evolution Report
══════════════════════════════════════════════════════════════
Skills Summary:
┌────────────────────────────┬─────────┬─────────┬───────────┬────────────┐
│ Skill │ Uses │ Success │ Avg Edits │ Suggestions│
├────────────────────────────┼─────────┼─────────┼───────────┼────────────┤
│ api-design-framework │ 156 │ 94% │ 1.8 │ 2 │
│ database-schema-designer │ 89 │ 91% │ 2.1 │ 1 │
│ fastapi-patterns │ 67 │ 88% │ 2.4 │ 3 │
└────────────────────────────┴─────────┴─────────┴───────────┴────────────┘
Summary:
Skills tracked: 3
Total uses: 312
Overall success rate: 91%
Top Pending Suggestions:
1. 93% | api-design-framework | add add_pagination
2. 88% | api-design-framework | add add_rate_limiting
3. 85% | fastapi-patterns | add add_error_handling
Subcommand: Analyze
Usage: /ork:skill-evolution analyze <skill-id>
Analyzes edit patterns for a specific skill.
Implementation
# Run analysis for specific skill
"${CLAUDE_PROJECT_DIR}/.claude/scripts/evolution-engine.sh" analyze "$SKILL_ID"
Sample Output
Skill Analysis: api-design-framework
────────────────────────────────────
Uses: 156 | Success: 94% | Avg Edits: 1.8
Edit Patterns Detected:
┌──────────────────────────┬─────────┬──────────┬────────────┐
│ Pattern │ Freq │ Samples │ Confidence │
├──────────────────────────┼─────────┼──────────┼────────────┤
│ add_pagination │ 85% │ 132/156 │ 0.93 │
│ add_rate_limiting │ 72% │ 112/156 │ 0.88 │
│ add_error_handling │ 45% │ 70/156 │ 0.56 │
└──────────────────────────┴─────────┴──────────┴────────────┘
Pending Suggestions:
1. 93% conf: ADD add_pagination to template
2. 88% conf: ADD add_rate_limiting to template
Run `/ork:skill-evolution evolve api-design-framework` to review
Subcommand: Evolve
Usage: /ork:skill-evolution evolve <skill-id>
Interactive review and application of improvement suggestions.
Implementation
- Get Suggestions:
SUGGESTIONS=$("${CLAUDE_PROJECT_DIR}/.claude/scripts/evolution-engine.sh" suggest "$SKILL_ID")
- For Each Suggestion, Present Interactive Options:
Use AskUserQuestion to let the user decide on each suggestion:
{
  "questions": [{
    "question": "Apply suggestion: ADD add_pagination to template? (93% confidence, 132/156 users add this)",
    "header": "Evolution",
    "options": [
      {"label": "Apply", "description": "Add this pattern to the skill template"},
      {"label": "Skip", "description": "Skip for now, ask again later"},
      {"label": "Reject", "description": "Never suggest this again"}
    ],
    "multiSelect": false
  }]
}
- On Apply:
  - Create version snapshot first
  - Apply the suggestion to skill files
  - Update evolution registry
- On Reject:
  - Mark suggestion as rejected in registry
  - Will not be suggested again
Applying Suggestions
When a user accepts a suggestion, the implementation depends on the suggestion type:
For add suggestions to templates:
- Add the pattern to the skill's template files
- Update SKILL.md with new guidance
For add suggestions to references:
- Create a new reference file in the references/ directory
For remove suggestions:
- Archive the content in a version snapshot first
- Remove the identified content
Subcommand: History
Usage: /ork:skill-evolution history <skill-id>
Shows version history with performance metrics.
Implementation
# Run version manager list
"${CLAUDE_PROJECT_DIR}/.claude/scripts/version-manager.sh" list "$SKILL_ID"
Sample Output
Version History: api-design-framework
══════════════════════════════════════════════════════════════
Current Version: 1.2.0
┌─────────┬────────────┬─────────┬───────┬───────────┬────────────────────────────┐
│ Version │ Date │ Success │ Uses │ Avg Edits │ Changelog │
├─────────┼────────────┼─────────┼───────┼───────────┼────────────────────────────┤
│ 1.2.0 │ 2026-01-14 │ 94% │ 156 │ 1.8 │ Added pagination pattern │
│ 1.1.0 │ 2026-01-05 │ 89% │ 80 │ 2.3 │ Added error handling ref │
│ 1.0.0 │ 2025-11-01 │ 78% │ 45 │ 3.2 │ Initial release │
└─────────┴────────────┴─────────┴───────┴───────────┴────────────────────────────┘
Subcommand: Rollback
Usage: /ork:skill-evolution rollback <skill-id> <version>
Restores a skill to a previous version.
Implementation
- Confirm with User:
Use AskUserQuestion for confirmation:
{
  "questions": [{
    "question": "Rollback api-design-framework from 1.2.0 to 1.0.0? Current version will be backed up.",
    "header": "Rollback",
    "options": [
      {"label": "Confirm Rollback", "description": "Restore version 1.0.0"},
      {"label": "Cancel", "description": "Keep current version"}
    ],
    "multiSelect": false
  }]
}
- On Confirm:
"${CLAUDE_PROJECT_DIR}/.claude/scripts/version-manager.sh" restore "$SKILL_ID" "$VERSION"
- Report Result:
Restored api-design-framework to version 1.0.0
Previous version backed up to: versions/.backup-1.2.0-1736867234
Storage Patterns
Storage Patterns: Rolling Logbook vs Index-Per-Entry
When a skill (or a project's .claude/rules/*.md files) needs to accumulate state across sessions — decisions, observations, patterns, knowledge — there are two storage patterns. Pick wrong and Claude Code's 40k-char auto-load threshold ambushes you 3-6 months later.
TL;DR
| Need | Use |
|---|---|
| Append-only, < 30k chars total expected lifetime | Rolling logbook |
| Append-forever, no natural upper bound | Index-per-entry |
| Reads are chronological narrative | Rolling logbook |
| Reads are by-key or by-date lookup | Index-per-entry |
| Each entry independently meaningful | Index-per-entry |
If unsure, default to index-per-entry — bounded by construction.
Pattern 1 — Rolling Logbook
Single Markdown file appended forever.
.claude/rules/recent-decisions.md
# Recent Project Decisions
## 2026-04-15 — Use Postgres not MongoDB
## 2026-04-22 — Brainstorm: dual-write
## ... (one entry every week, no upper bound)
Strengths
- Trivial to write: >> file.md and you're done
- Operator-readable as chronological narrative
- One file means one place to check in / out of cache
- Diffs cleanly in PRs (additions only)
Weaknesses
- Grows unbounded. No mechanism stops you at any size.
- CC auto-loads everything in .claude/rules/*.md into every <system-reminder>. At 40,000 chars CC emits a yellow warning, but by then you've burned that context on every prompt for weeks.
- Stale entries pollute the loaded context (Q1 2024 decisions for a Q4 2026 session — irrelevant but billed).
- Recursive search/replace breaks (one wrong-line edit corrupts a 30k file).
Concrete failure case: yonatan-hq/platform/.claude/rules/recent-decisions.md ballooned to 53.8k chars in seven months. CC flagged it as "Large file will impact performance" — every session paid ~14k extra context tokens before the operator noticed.
Pattern 2 — Index-Per-Entry
One small index file with one bullet per entry; individual entries live in sibling files loaded on-demand via Read.
.claude/rules/                     ← scanned by CC's auto-loader
├── index.md                       ← ≤ 200 lines, only file auto-loaded
│     - [Use Postgres not MongoDB](decisions/2026-04-15-postgres.md) — chose Postgres for jsonb support
│     - [Dual-write analytics](decisions/2026-04-22-dual-write.md) — HTTP sink alongside local JSONL
│     ...
└── decisions/                     ← NOT auto-loaded (subdirectory)
    ├── 2026-04-15-postgres.md     ← loaded only when relevant via Read
    ├── 2026-04-22-dual-write.md
    └── ...
Strengths
- Bounded. Index grows by one line per entry, so even 500 entries is only ~500 short lines.
- On-demand load. Operator (or Claude) reads the specific entry that matters; the other 499 stay on disk.
- CC's auto-loader stops at the subdirectory boundary — by convention CC globs *.md at the rules root, not recursive.
- Each entry is independently meaningful, addressable, and editable.
- Old entries don't pollute current context.
Weaknesses
- Two-step writes: append to index + write new file. Five extra seconds per entry. Bash function or skill can hide this.
- Two-step reads: scan index, then Read the relevant file. Costs one extra tool call.
- More PR-diff noise (one new file per entry vs append to one).
- Filenames must be unique and well-chosen — bad naming kills the on-demand pattern.
Migration Path
When a rolling logbook crosses the 30k-char mark, migrate proactively:
- Create the decisions/ subdirectory: mkdir -p decisions
- Move the rolling file aside: git mv recent-decisions.md decisions/_legacy-rolling.md
- Split entries one-per-file. A small script can split on ## headers: csplit -k decisions/_legacy-rolling.md '/^## /' '{*}'
- Create index.md with one bullet per file.
- Update any skill/hook references to point at the index.
The _legacy-rolling.md stays accessible via Read but won't auto-load (not at .claude/rules/*.md root).
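The split-and-index steps of the migration can also be scripted. The sketch below works on text in memory and leaves writing files to the caller. All function names and the slug-based filename scheme are assumptions for illustration, not an existing OrchestKit tool.

```python
import re
from typing import Dict, List, Tuple


def split_logbook(text: str) -> List[Tuple[str, str]]:
    """Split a rolling logbook into (heading, body) pairs on level-2 headers.

    Anything before the first "## " line (such as the file title) is dropped.
    """
    entries: List[Tuple[str, str]] = []
    heading = None
    body: List[str] = []
    for line in text.splitlines():
        if line.startswith("## "):
            if heading is not None:
                entries.append((heading, "\n".join(body).strip()))
            heading, body = line[3:].strip(), []
        elif heading is not None:
            body.append(line)
    if heading is not None:
        entries.append((heading, "\n".join(body).strip()))
    return entries


def slugify(heading: str) -> str:
    """Turn a heading into a filename stem (hypothetical naming scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def build_index(entries: List[Tuple[str, str]]) -> Tuple[str, Dict[str, str]]:
    """Return the index.md text and a {relative_path: file_content} mapping."""
    files: Dict[str, str] = {}
    bullets: List[str] = []
    for heading, body in entries:
        path = f"decisions/{slugify(heading)}.md"
        files[path] = f"# {heading}\n\n{body}\n"
        bullets.append(f"- [{heading}]({path})")
    return "\n".join(bullets) + "\n", files
```

Two entries with the same heading would collide on one filename in this sketch, so a real migration would need a uniqueness check, in line with the naming weakness noted for the index-per-entry pattern.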
Pattern selection by skill
| Skill | Pattern | Why |
|---|---|---|
| `memory` | index-per-entry | Per-fact files keyed by topic; index in MEMORY.md |
| `remember` | index-per-entry | Same as memory — entries grow forever, lookups are by-key |
| `recent-decisions` (rules-level) | migrating from rolling → index-per-entry | Burned 53k chars before the auto-loader warning fired |
| `goal-history.jsonl` | rolling (JSONL) | Not auto-loaded by CC; consumed by monitor on-demand. Different mechanism. |
| Skill-internal `references/*.md` | one file per concept | Loaded explicitly via Read in SKILL.md, not auto-globbed |
Detection
The lifecycle/rules-size-check hook (#1815) warns at 35k chars (WARN) and 38k chars (CRITICAL) when a .claude/rules/*.md file would auto-load above CC's 40k threshold. Operator-facing stderr signal on every SessionStart — gives you ~5k chars of runway before CC starts complaining.
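The two warning levels can be sketched as a pure function over a file's character count. This is not the hook's actual source: the names are hypothetical, and treating each threshold as inclusive is an assumption.

```python
from typing import Optional

WARN_CHARS = 35_000       # hook's WARN level
CRITICAL_CHARS = 38_000   # hook's CRITICAL level
CC_THRESHOLD = 40_000     # size at which CC itself starts warning


def classify_rules_file(char_count: int) -> Optional[str]:
    """Return "WARN", "CRITICAL", or None for a rules file of the given size."""
    if char_count >= CRITICAL_CHARS:
        return "CRITICAL"
    if char_count >= WARN_CHARS:
        return "WARN"
    return None


def runway(char_count: int) -> int:
    """Characters left before the file reaches CC's own threshold (never negative)."""
    return max(CC_THRESHOLD - char_count, 0)
```

A file that has just hit the WARN level has 5,000 characters of runway, which is the margin described above.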
When the rolling pattern is still right
Don't blanket-reject rolling logbooks. They're correct when:
- The file has a natural upper bound (e.g., "this lists the 12 active milestones — milestones don't accumulate, they close")
- Total size is known to stay under 20k chars even at 5× growth
- Readability as a single narrative is the primary read mode
The 40k-char cliff isn't a hard rule — it's a heuristic for "auto-loaded into every prompt." If your file isn't at .claude/rules/*.md root, it isn't auto-loaded and the trade-off shifts.
Related
- lifecycle/rules-size-check (hook, #1815) — pre-flight warning when a file approaches the cliff
- src/skills/CONTRIBUTING-SKILLS.md#storage-patterns — short pointer to this reference
- src/skills/memory/ — index-per-entry pattern done well (canonical reference implementation)
Version Management
Version Management Guide
Reference guide for managing skill versions with safe rollback capability.
Version Structure
Each skill can have versioned snapshots stored in:
skills/<category>/<skill-name>/
├── SKILL.md              # Current version
├── SKILL.md              # Current metadata
├── references/           # Current references
├── scripts/              # Current templates
└── versions/
    ├── manifest.json     # Version history metadata
    ├── 1.0.0/
    │   ├── SKILL.md
    │   ├── SKILL.md
    │   ├── references/
    │   └── CHANGELOG.md
    └── 1.1.0/
        ├── SKILL.md
        ├── SKILL.md
        ├── references/
        └── CHANGELOG.md
Manifest Schema
The manifest.json tracks version history:
{
  "$schema": "../../../../../../.claude/schemas/skill-evolution.schema.json",
  "skillId": "api-design-framework",
  "currentVersion": "1.2.0",
  "versions": [
    {
      "version": "1.0.0",
      "date": "2025-11-01",
      "successRate": 0.78,
      "uses": 45,
      "avgEdits": 3.2,
      "changelog": "Initial release"
    },
    {
      "version": "1.1.0",
      "date": "2026-01-05",
      "successRate": 0.89,
      "uses": 80,
      "avgEdits": 1.8,
      "changelog": "Added pagination pattern (85% users added manually)"
    }
  ],
  "suggestions": [],
  "editPatterns": {},
  "lastAnalyzed": "2026-01-14T10:30:00Z"
}
Versioning Workflow
Creating a Version
Before making changes, create a version snapshot:
version-manager.sh create <skill-id> "Description of changes"
The system:
- Bumps version number (patch by default)
- Copies current files to versions/<new-version>/
- Records current metrics in manifest
- Creates CHANGELOG.md
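The version bump in the first step follows ordinary semantic-versioning arithmetic. A minimal sketch, where `bump_version` is a hypothetical helper and not the actual logic inside version-manager.sh:

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Return the next semantic version after bumping `part` (patch by default)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Bumping a higher part resets the lower ones to zero, so a minor bump of 1.0.1 yields 1.1.0.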
Comparing Versions
Compare two versions to see what changed:
version-manager.sh diff <skill-id> 1.0.0 1.1.0
Shows:
- File differences (unified diff)
- Metrics comparison (success rate, uses, avg edits)
Restoring a Version
If a change causes problems, roll back:
version-manager.sh restore <skill-id> <version>
The system:
- Backs up current version to .backup-<version>-<timestamp>
- Copies snapshot files to skill root
- Updates manifest with rollback entry
Automatic Safety Checks
Rollback Triggers
The system monitors for:
| Trigger | Threshold | Action |
|---|---|---|
| Success rate drop | -20% | Warning + rollback suggestion |
| Avg edits increase | +50% | Warning (users fighting skill) |
| Consecutive failures | 5+ | Alert to review |
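The three triggers in the table can be evaluated together. The sketch below makes two assumptions that the table does not state: the success-rate drop is measured in absolute points (as in the 94% to 71% example), and the average-edits increase is relative to the baseline. The function name and alert labels are hypothetical.

```python
from typing import List

SUCCESS_DROP = 0.20        # success rate drop, in absolute points (assumption)
AVG_EDITS_INCREASE = 0.50  # relative increase in average edits (assumption)
FAILURE_STREAK = 5         # consecutive failures


def health_alerts(
    baseline_success: float,
    current_success: float,
    baseline_avg_edits: float,
    current_avg_edits: float,
    consecutive_failures: int,
) -> List[str]:
    """Return the labels of every trigger that currently fires."""
    alerts: List[str] = []
    if baseline_success - current_success > SUCCESS_DROP:
        alerts.append("success-rate-drop")
    edits_limit = baseline_avg_edits * (1 + AVG_EDITS_INCREASE)
    if baseline_avg_edits > 0 and current_avg_edits > edits_limit:
        alerts.append("avg-edits-increase")
    if consecutive_failures >= FAILURE_STREAK:
        alerts.append("consecutive-failures")
    return alerts
```

Returning a list instead of a single flag lets a report show every trigger that fired, not only the first one.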
Health Check Integration
The posttool hooks monitor skill health:
# Warn when the recent success rate falls more than 0.20 below the version baseline
check_skill_health() {
  local skill_id="$1"
  local current_rate=$(get_recent_success_rate "$skill_id" 10)
  local baseline_rate=$(get_version_baseline "$skill_id")
  if (( $(echo "$baseline_rate - $current_rate > 0.20" | bc -l) )); then
    echo "WARNING: $skill_id dropped from ${baseline_rate} to ${current_rate}"
  fi
}
Best Practices
When to Create Versions
- Before applying evolution suggestions
- Before major skill modifications
- After validating improvements work well
- At regular intervals (weekly/monthly) for active skills
Version Naming
Use semantic versioning:
- Major (2.0.0): Breaking changes to skill behavior
- Minor (1.1.0): New features/patterns added
- Patch (1.0.1): Bug fixes, minor improvements
Cleanup Policy
- Keep last 5 versions minimum
- Archive versions older than 90 days
- Never delete versions with good metrics (baseline references)
Metrics Interpretation
Success Rate Trends
| Pattern | Interpretation |
|---|---|
| Increasing | Evolution working well |
| Stable | Skill mature and effective |
| Decreasing | Investigate recent changes |
Average Edits Trends
| Pattern | Interpretation |
|---|---|
| Decreasing | Skill producing better output |
| Stable | Consistent quality |
| Increasing | Users modifying more (skill may need updates) |
Recovery Scenarios
Accidental Breaking Change
# 1. Check history
version-manager.sh list <skill-id>
# 2. Find last good version
version-manager.sh metrics <skill-id>
# 3. Restore
version-manager.sh restore <skill-id> 1.1.0
Gradual Degradation
# 1. Compare versions
version-manager.sh diff <skill-id> 1.0.0 1.2.0
# 2. Identify problematic changes
# 3. Create new version fixing issues