evaluate: interactive, step-by-step review of an agent's execution (community skill for Claude Code)

v1.0.0

About this skill

Use case: for AI agents whose user calls /evaluate. Summary: evaluate is part of the Klava personal assistant gateway and provides an interactive, step-by-step review of the agent's execution. The listing states that it supports Claude Code, Cursor, and Windsurf workflows.

Features

User calls /evaluate
User is not satisfied with a result and wants to give precise feedback
After a complex task where the user wants to review the execution path
Do NOT use after every task - only when evaluation is needed.
Reflect on your actions in the current session

Author: VCasecnikovs
Updated: 4/25/2026

! Limitations and requirements

  • Do NOT use after every task - only when evaluation is needed.
  • Be specific - include file paths, tool names, actual values. The user needs to see exactly what happened.

About the source content

The section below is adapted from the upstream repository. Use it as supporting material alongside the fit, use-case, and installation summary on this page.

FAQ and installation steps

? FAQ

Which IDEs or agents does evaluate support?

The listing states compatibility with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer (it counts 19+ IDEs and agents in total). The Killer-Skills CLI installs the skill for any of them with a single command.

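Installation steps

  1. Open a terminal

    Open a terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add VCasecnikovs/klava. The CLI detects your IDE or AI agent automatically and completes the configuration.

  3. Start using the skill

    evaluate is now enabled and can be invoked in the current project right away.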

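! Source note

This page can still serve as an installation and lookup reference. Before relying on it, weigh the use cases and limitations above together with the upstream repository's documentation.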

SKILL.md

Evaluate Skill

Interactive, step-by-step review of the agent's execution. Creates an HTML page where the user sees what the agent did at each step of the Agent Execution Loop and can write targeted feedback.

When to Use

  • User calls /evaluate
  • User is not satisfied with a result and wants to give precise feedback
  • After a complex task where the user wants to review the execution path

Do NOT use after every task - only when evaluation is needed.

Flow

  1. Reflect on your actions in the current session
  2. For each step (MATCH, THINK, ACT, VERIFY, LEARN) write what you did
  3. Generate HTML with embedded data
  4. Open in browser
  5. User writes comments next to each step, clicks "Copy Feedback"
  6. User pastes feedback in chat
  7. Agent parses feedback per step and applies fixes (update skill, retry, etc.)

Step 1: Reflect

Analyze the current session and fill in each step honestly:

  • MATCH: Which skill was chosen? Why? Or why none matched?
  • THINK: What was the expected result? What verification criteria were defined?
  • ACT: Which tools were called? In what order? What was parallel vs sequential?
  • VERIFY: What was checked? Did it pass or fail? Any skill audit issues?
  • LEARN: Was anything updated? If not, why?

Be specific - include file paths, tool names, actual values. The user needs to see exactly what happened.
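As a minimal sketch of what a finished reflection could look like as data, the snippet below keeps one free-text entry per step and reports any step left blank. The object shape, the `STEPS` constant, and the `missingSteps` helper are assumptions made for illustration; the skill does not prescribe a format, and the sample values are invented.

```javascript
// Hypothetical shape for a reflection: one free-text entry per loop step.
const STEPS = ["match", "think", "act", "verify", "learn"];

// Returns the names of steps whose entry is missing or blank.
function missingSteps(reflection) {
  return STEPS.filter((step) => {
    const text = reflection[step];
    return typeof text !== "string" || text.trim() === "";
  });
}

// Invented example values, only to show the level of detail expected.
const reflection = {
  match: "No skill matched; handled as an ad-hoc request.",
  think: "Expected result: config file updated. Criterion: linter passes.",
  act: "Read config.yaml, then Edit config.yaml (sequential).",
  verify: "Ran the linter once; it passed.",
  learn: "",
};

console.log(missingSteps(reflection)); // [ 'learn' ]
```

Checking for blank entries before generating the page is one way to catch a step that was skipped rather than honestly answered with "nothing was updated, because...".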

Step 2: Generate HTML

Write a self-contained HTML file to /tmp/evaluate-{timestamp}.html with the data embedded inline. Use the template below.

Replace __TASK_DESCRIPTION__ and each __STEP_*__ placeholder with actual content from your reflection. Use HTML-safe text (escape <, >, &).

html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Evaluate - Step Review</title>
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    font-family: -apple-system, BlinkMacSystemFont, 'SF Pro Text', 'Helvetica Neue', sans-serif;
    background: #1a1a2e;
    color: #e0e0e0;
    line-height: 1.6;
    min-height: 100vh;
  }

  .header {
    padding: 1.5rem 2rem;
    border-bottom: 1px solid #2a2a4a;
    display: flex;
    align-items: center;
    justify-content: space-between;
  }

  .header h1 {
    font-size: 1.4rem;
    font-weight: 600;
    color: #fff;
  }

  .header .task-label {
    font-size: 0.85rem;
    color: #888;
    margin-top: 0.25rem;
  }

  .copy-btn {
    background: #6c5ce7;
    color: #fff;
    border: none;
    padding: 0.6rem 1.5rem;
    border-radius: 6px;
    font-size: 0.9rem;
    font-weight: 500;
    cursor: pointer;
    transition: all 0.2s;
  }

  .copy-btn:hover { background: #5a4bd1; }
  .copy-btn.copied { background: #27ae60; }

  .container {
    padding: 1.5rem 2rem;
    max-width: 1400px;
    margin: 0 auto;
  }

  .step {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1rem;
    margin-bottom: 1rem;
  }

  .step-card {
    background: #16213e;
    border: 1px solid #2a2a4a;
    border-radius: 8px;
    padding: 1.25rem;
  }

  .step-label {
    display: inline-block;
    font-size: 0.7rem;
    font-weight: 700;
    text-transform: uppercase;
    letter-spacing: 0.08em;
    padding: 0.2rem 0.6rem;
    border-radius: 4px;
    margin-bottom: 0.75rem;
  }

  .step-match .step-label { background: #2d3a8c; color: #8b9cf7; }
  .step-think .step-label { background: #1a5c3a; color: #6bcf8e; }
  .step-act .step-label { background: #8c5a2d; color: #f7c98b; }
  .step-verify .step-label { background: #5c1a5c; color: #cf6bcf; }
  .step-learn .step-label { background: #1a4a5c; color: #6bb8cf; }

  .step-content {
    font-size: 0.9rem;
    color: #c0c0c0;
    white-space: pre-wrap;
  }

  .feedback-card {
    background: #1e1e3a;
    border: 1px solid #3a3a5a;
    border-radius: 8px;
    padding: 1.25rem;
    display: flex;
    flex-direction: column;
  }

  .feedback-card label {
    font-size: 0.75rem;
    color: #888;
    text-transform: uppercase;
    letter-spacing: 0.05em;
    margin-bottom: 0.5rem;
  }

  .feedback-card textarea {
    flex: 1;
    min-height: 80px;
    background: #12122a;
    border: 1px solid #2a2a4a;
    border-radius: 6px;
    color: #e0e0e0;
    font-family: inherit;
    font-size: 0.9rem;
    padding: 0.75rem;
    resize: vertical;
    line-height: 1.5;
  }

  .feedback-card textarea:focus {
    outline: none;
    border-color: #6c5ce7;
  }

  .feedback-card textarea::placeholder {
    color: #555;
  }

  @media (max-width: 900px) {
    .step { grid-template-columns: 1fr; }
    .container { padding: 1rem; }
  }
</style>
</head>
<body>
  <div class="header">
    <div>
      <h1>Evaluate - Step Review</h1>
      <div class="task-label">__TASK_DESCRIPTION__</div>
    </div>
    <button class="copy-btn" onclick="copyFeedback()">Copy Feedback</button>
  </div>

  <div class="container">
    <div class="step step-match">
      <div class="step-card">
        <div class="step-label">1. Match</div>
        <div class="step-content">__STEP_MATCH__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on MATCH</label>
        <textarea id="fb-match" placeholder="Was the right skill chosen? Should a different one be used?"></textarea>
      </div>
    </div>

    <div class="step step-think">
      <div class="step-card">
        <div class="step-label">2. Think</div>
        <div class="step-content">__STEP_THINK__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on THINK</label>
        <textarea id="fb-think" placeholder="Was the expected result correct? Were verification criteria good?"></textarea>
      </div>
    </div>

    <div class="step step-act">
      <div class="step-card">
        <div class="step-label">3. Act</div>
        <div class="step-content">__STEP_ACT__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on ACT</label>
        <textarea id="fb-act" placeholder="Were the right tools used? Correct order? Missing steps?"></textarea>
      </div>
    </div>

    <div class="step step-verify">
      <div class="step-card">
        <div class="step-label">4. Verify</div>
        <div class="step-content">__STEP_VERIFY__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on VERIFY</label>
        <textarea id="fb-verify" placeholder="Was verification thorough? Missed checks?"></textarea>
      </div>
    </div>

    <div class="step step-learn">
      <div class="step-card">
        <div class="step-label">5. Learn</div>
        <div class="step-content">__STEP_LEARN__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on LEARN</label>
        <textarea id="fb-learn" placeholder="Should something be updated? Skill, CLAUDE.md, memory?"></textarea>
      </div>
    </div>
  </div>

<script>
function copyFeedback() {
  const steps = ['match', 'think', 'act', 'verify', 'learn'];
  const labels = { match: 'MATCH', think: 'THINK', act: 'ACT', verify: 'VERIFY', learn: 'LEARN' };

  let output = '## Evaluation Feedback\n';
  let hasAny = false;

  for (const step of steps) {
    const agentEl = document.querySelector(`.step-${step} .step-content`);
    const fbEl = document.getElementById(`fb-${step}`);
    const feedback = fbEl.value.trim();

    if (feedback) {
      hasAny = true;
      output += `\n### ${labels[step]}\n`;
      output += `**Agent:** ${agentEl.textContent.trim()}\n`;
      output += `**Feedback:** ${feedback}\n`;
    }
  }

  if (!hasAny) {
    output += '\nNo feedback provided - all steps OK.\n';
  }

  navigator.clipboard.writeText(output).then(() => {
    const btn = document.querySelector('.copy-btn');
    btn.textContent = 'Copied!';
    btn.classList.add('copied');
    setTimeout(() => {
      btn.textContent = 'Copy Feedback';
      btn.classList.remove('copied');
    }, 2000);
  });
}
</script>
</body>
</html>
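The placeholder substitution and escaping described in Step 2 could be sketched as follows. This is an illustration under assumptions, not the skill's own implementation: `escapeHtml`, `fillTemplate`, and `outputPath` are hypothetical helper names, and using epoch milliseconds for `{timestamp}` is a guess, since the skill does not define the timestamp format.

```javascript
// Escape the three characters the skill calls out: &, <, >.
// "&" is replaced first so the entities produced later are not re-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Replace __TASK_DESCRIPTION__ and each __STEP_*__ placeholder.
// `values` maps placeholder names (without the underscores) to raw text.
// A replacer function is used so "$" in the text is never treated specially.
function fillTemplate(template, values) {
  return template.replace(/__([A-Z_]+?)__/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? escapeHtml(values[name])
      : whole // leave unknown placeholders untouched
  );
}

// Build the output path; epoch milliseconds for {timestamp} is an assumption.
function outputPath(now = Date.now()) {
  return `/tmp/evaluate-${now}.html`;
}

const snippet = '<div class="task-label">__TASK_DESCRIPTION__</div>';
console.log(fillTemplate(snippet, { TASK_DESCRIPTION: "Fix <title> & meta tags" }));
// <div class="task-label">Fix &lt;title&gt; &amp; meta tags</div>
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a forgotten step visible in the rendered page.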

Step 3: Open in Browser

bash
open /tmp/evaluate-{timestamp}.html
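The `open` command above is the macOS launcher. If the skill is run elsewhere, the choice of command might be sketched like this. Only `open` comes from the skill; `xdg-open` and `cmd /c start` are the usual equivalents on Linux desktops and Windows and are assumptions here, as is the hypothetical `openCommand` helper. The platform strings are the values Node reports in `process.platform`.

```javascript
// Pick a launcher command (as an argument list) for a platform string.
// Only "open" is taken from the skill; the other branches are assumptions.
function openCommand(platform, filePath) {
  if (platform === "darwin") return ["open", filePath];
  // The empty string is the window title argument that `start` expects
  // before a path.
  if (platform === "win32") return ["cmd", "/c", "start", "", filePath];
  return ["xdg-open", filePath]; // common on Linux desktops
}

console.log(openCommand("darwin", "/tmp/evaluate-123.html").join(" "));
// open /tmp/evaluate-123.html
```

Returning an argument list rather than a single string keeps the file path as one argument, so a path containing spaces does not need shell quoting when the list is handed to a process launcher.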

Tell the user: "I opened Evaluate in the browser. Write comments on the steps you didn't like, click Copy Feedback, and paste the result here."

Step 4: Parse and Fix Feedback

When the user pastes feedback back (format: ## Evaluation Feedback with ### STEP_NAME sections), parse each step:

  1. Read each **Feedback:** line
  2. Determine what needs fixing:
    • MATCH feedback -> update skill frontmatter description or CLAUDE.md skill matching rules
    • THINK feedback -> update skill instructions (expected result, verification criteria)
    • ACT feedback -> update skill steps, tool usage, order of operations
    • VERIFY feedback -> add/update verification checks in skill
    • LEARN feedback -> update skill Known Issues, add prevention rules
  3. Apply the fix to the appropriate file
  4. Git commit: evaluate: {skill} - {what changed}
  5. If the task needs re-execution, re-run through the Agent Execution Loop
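The parsing in Step 4 could be sketched as below, assuming the pasted text has exactly the shape that `copyFeedback()` in the template produces (`### STEP` headings followed by `**Agent:**` and `**Feedback:**` lines). `parseFeedback` and `FIX_TARGETS` are hypothetical names; the target descriptions paraphrase the list above.

```javascript
// Where each step's feedback should be applied, paraphrased from Step 4.
const FIX_TARGETS = {
  MATCH: "skill frontmatter description or CLAUDE.md skill matching rules",
  THINK: "skill instructions (expected result, verification criteria)",
  ACT: "skill steps, tool usage, order of operations",
  VERIFY: "verification checks in the skill",
  LEARN: "skill Known Issues and prevention rules",
};

// Parse pasted feedback into { STEP: { agent, feedback, target } }.
function parseFeedback(pasted) {
  const result = {};
  let step = null;  // current "### STEP" section
  let field = null; // "agent" or "feedback", for multi-line values

  for (const line of pasted.split(/\r?\n/)) {
    const heading = line.match(/^###\s+([A-Z]+)\s*$/);
    if (heading) {
      step = heading[1];
      field = null;
      result[step] = { agent: "", feedback: "", target: FIX_TARGETS[step] ?? null };
    } else if (step && line.startsWith("**Agent:**")) {
      field = "agent";
      result[step].agent = line.slice("**Agent:**".length).trim();
    } else if (step && line.startsWith("**Feedback:**")) {
      field = "feedback";
      result[step].feedback = line.slice("**Feedback:**".length).trim();
    } else if (step && field && line.trim() !== "") {
      // Non-blank continuation line of a multi-line value (blank lines are dropped).
      result[step][field] += "\n" + line;
    }
  }
  return result;
}

const pasted = [
  "## Evaluation Feedback",
  "",
  "### ACT",
  "**Agent:** Ran tests before editing.",
  "**Feedback:** Edit first, then run tests.",
].join("\n");

console.log(parseFeedback(pasted).ACT.feedback); // Edit first, then run tests.
```

Steps the user left blank never appear in the pasted text, so they are simply absent from the result and need no fix.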

Known Issues

  • The browser Clipboard API requires a secure context (HTTPS or localhost). If copying fails, the user can manually select all text from the output area.
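One way to handle that failure is sketched below: try the Clipboard API and, if it is missing or rejects, hand the text to a fallback that displays it for manual selection. This is an assumption about how a fallback could work, not part of the template. `copyWithFallback` and `showFallback` are hypothetical names, and `clipboard` stands in for `navigator.clipboard` so the function can be exercised outside a browser.

```javascript
// Try the async Clipboard API; on any failure, pass the text to a fallback
// (for example, a function that shows it in a textarea the user can select).
async function copyWithFallback(text, clipboard, showFallback) {
  try {
    if (!clipboard || typeof clipboard.writeText !== "function") {
      throw new Error("Clipboard API unavailable");
    }
    await clipboard.writeText(text);
    return "copied";
  } catch (err) {
    showFallback(text);
    return "fallback";
  }
}
```

In the page, this could be called as `copyWithFallback(output, navigator.clipboard, showFallback)`, where `showFallback` is whatever the page uses to reveal the text.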
