What is evaluate?

Best suited for: AI agents that need to handle a user's /evaluate call. Summary: evaluate is a skill from Klava (a personal assistant gateway) that provides an interactive step-by-step review of the agent's execution. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

How do I install evaluate?

Run the following command: npx killer-skills add VCasecnikovs/klava/evaluate. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What is evaluate used for?

The main use cases are: the user calls /evaluate; the user is not satisfied with a result and wants to give precise feedback; the user wants to review the execution path after a complex task.

Which IDEs are compatible with evaluate?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for a unified installation.

Evaluate Skill

Name: evaluate
Availability: InStock
Author: VCasecnikovs

Interactive step-by-step review of agent's execution. Creates an HTML page where the user sees what the agent did at each step of the Agent Execution Loop and can write targeted feedback.

When to Use

- User calls /evaluate
- User is not satisfied with a result and wants to give precise feedback
- After a complex task where the user wants to review the execution path

Do NOT use after every task - only when evaluation is needed.

Flow

1. Reflect on your actions in the current session
2. For each step (MATCH, THINK, ACT, VERIFY, LEARN) write what you did
3. Generate HTML with embedded data
4. Open in browser
5. User writes comments next to each step, clicks "Copy Feedback"
6. User pastes feedback in chat
7. Agent parses feedback per step and applies fixes (update skill, retry, etc.)

Step 1: Reflect

Analyze the current session and fill in each step honestly:

MATCH: Which skill was chosen? Why? Or why none matched?
THINK: What was the expected result? What verification criteria were defined?
ACT: Which tools were called? In what order? What was parallel vs sequential?
VERIFY: What was checked? Did it pass or fail? Any skill audit issues?
LEARN: Was anything updated? If not, why?

Be specific - include file paths, tool names, actual values. The user needs to see exactly what happened.
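One way to keep the reflection complete is to collect it in a single record before generating the page. The sketch below assumes a plain object keyed by the five step names; the `missingSteps` helper and the sample values are hypothetical and not part of the skill.

```javascript
// Hypothetical helper: the skill only requires that each step be described.
// The object shape and the function name are illustrative assumptions.
const STEPS = ['match', 'think', 'act', 'verify', 'learn'];

// Returns the names of steps whose description is missing or blank.
function missingSteps(reflection) {
  return STEPS.filter((step) => {
    const text = reflection[step];
    return typeof text !== 'string' || text.trim() === '';
  });
}

// Example reflection with made-up values.
const reflection = {
  match: 'No skill matched; handled as an ad-hoc request.',
  think: 'Expected a passing test run; criterion: exit code 0.',
  act: 'Read src/app.js, then Edit, then Bash (npm test), all sequential.',
  verify: 'npm test passed.',
  learn: '',
};

console.log(missingSteps(reflection)); // [ 'learn' ]
```

A non-empty result is a cue to go back and fill in the step (for LEARN, at least the reason nothing was updated) before moving on to Step 2.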

Step 2: Generate HTML

Write a self-contained HTML file to /tmp/evaluate-{timestamp}.html with the data embedded inline. Use the template below.

Replace __TASK_DESCRIPTION__ and each __STEP_*__ placeholder with actual content from your reflection. Use HTML-safe text (escape <, >, &).
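The placeholder substitution and escaping described above could be sketched as follows. The function names `escapeHtml` and `fillTemplate`, and the shape of the `values` object, are assumptions made for illustration; the skill itself only specifies the placeholders and the three characters to escape.

```javascript
// Escape the three characters the skill names (<, >, &).
// '&' is replaced first so the entities produced for < and > are not re-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace __TASK_DESCRIPTION__ and each __STEP_*__ placeholder.
// `values` keys are assumed to be the placeholder names without the
// surrounding underscores, e.g. { TASK_DESCRIPTION: '...', STEP_MATCH: '...' }.
function fillTemplate(template, values) {
  return template.replace(/__([A-Z]+(?:_[A-Z]+)*)__/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? escapeHtml(values[name])
      : placeholder // leave unknown placeholders untouched
  );
}

const page = fillTemplate(
  '<div class="task-label">__TASK_DESCRIPTION__</div>',
  { TASK_DESCRIPTION: 'Fix <Button> & rerun tests' }
);
console.log(page);
// <div class="task-label">Fix &lt;Button&gt; &amp; rerun tests</div>
```

Leaving unknown placeholders in place, rather than blanking them, makes a forgotten step easy to spot when the page is opened.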

html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Evaluate - Step Review</title>
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    font-family: -apple-system, BlinkMacSystemFont, 'SF Pro Text', 'Helvetica Neue', sans-serif;
    background: #1a1a2e;
    color: #e0e0e0;
    line-height: 1.6;
    min-height: 100vh;
  }

  .header {
    padding: 1.5rem 2rem;
    border-bottom: 1px solid #2a2a4a;
    display: flex;
    align-items: center;
    justify-content: space-between;
  }

  .header h1 {
    font-size: 1.4rem;
    font-weight: 600;
    color: #fff;
  }

  .header .task-label {
    font-size: 0.85rem;
    color: #888;
    margin-top: 0.25rem;
  }

  .copy-btn {
    background: #6c5ce7;
    color: #fff;
    border: none;
    padding: 0.6rem 1.5rem;
    border-radius: 6px;
    font-size: 0.9rem;
    font-weight: 500;
    cursor: pointer;
    transition: all 0.2s;
  }

  .copy-btn:hover { background: #5a4bd1; }
  .copy-btn.copied { background: #27ae60; }

  .container {
    padding: 1.5rem 2rem;
    max-width: 1400px;
    margin: 0 auto;
  }

  .step {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1rem;
    margin-bottom: 1rem;
  }

  .step-card {
    background: #16213e;
    border: 1px solid #2a2a4a;
    border-radius: 8px;
    padding: 1.25rem;
  }

  .step-label {
    display: inline-block;
    font-size: 0.7rem;
    font-weight: 700;
    text-transform: uppercase;
    letter-spacing: 0.08em;
    padding: 0.2rem 0.6rem;
    border-radius: 4px;
    margin-bottom: 0.75rem;
  }

  .step-match .step-label  { background: #2d3a8c; color: #8b9cf7; }
  .step-think .step-label  { background: #1a5c3a; color: #6bcf8e; }
  .step-act .step-label    { background: #8c5a2d; color: #f7c98b; }
  .step-verify .step-label { background: #5c1a5c; color: #cf6bcf; }
  .step-learn .step-label  { background: #1a4a5c; color: #6bb8cf; }

  .step-content {
    font-size: 0.9rem;
    color: #c0c0c0;
    white-space: pre-wrap;
  }

  .feedback-card {
    background: #1e1e3a;
    border: 1px solid #3a3a5a;
    border-radius: 8px;
    padding: 1.25rem;
    display: flex;
    flex-direction: column;
  }

  .feedback-card label {
    font-size: 0.75rem;
    color: #888;
    text-transform: uppercase;
    letter-spacing: 0.05em;
    margin-bottom: 0.5rem;
  }

  .feedback-card textarea {
    flex: 1;
    min-height: 80px;
    background: #12122a;
    border: 1px solid #2a2a4a;
    border-radius: 6px;
    color: #e0e0e0;
    font-family: inherit;
    font-size: 0.9rem;
    padding: 0.75rem;
    resize: vertical;
    line-height: 1.5;
  }

  .feedback-card textarea:focus {
    outline: none;
    border-color: #6c5ce7;
  }

  .feedback-card textarea::placeholder {
    color: #555;
  }

  @media (max-width: 900px) {
    .step { grid-template-columns: 1fr; }
    .container { padding: 1rem; }
  }
</style>
</head>
<body>
  <div class="header">
    <div>
      <h1>Evaluate - Step Review</h1>
      <div class="task-label">__TASK_DESCRIPTION__</div>
    </div>
    <button class="copy-btn" onclick="copyFeedback()">Copy Feedback</button>
  </div>

  <div class="container">
    <div class="step step-match">
      <div class="step-card">
        <div class="step-label">1. Match</div>
        <div class="step-content">__STEP_MATCH__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on MATCH</label>
        <textarea id="fb-match" placeholder="Was the right skill chosen? Should a different one be used?"></textarea>
      </div>
    </div>

    <div class="step step-think">
      <div class="step-card">
        <div class="step-label">2. Think</div>
        <div class="step-content">__STEP_THINK__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on THINK</label>
        <textarea id="fb-think" placeholder="Was the expected result correct? Were verification criteria good?"></textarea>
      </div>
    </div>

    <div class="step step-act">
      <div class="step-card">
        <div class="step-label">3. Act</div>
        <div class="step-content">__STEP_ACT__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on ACT</label>
        <textarea id="fb-act" placeholder="Were the right tools used? Correct order? Missing steps?"></textarea>
      </div>
    </div>

    <div class="step step-verify">
      <div class="step-card">
        <div class="step-label">4. Verify</div>
        <div class="step-content">__STEP_VERIFY__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on VERIFY</label>
        <textarea id="fb-verify" placeholder="Was verification thorough? Missed checks?"></textarea>
      </div>
    </div>

    <div class="step step-learn">
      <div class="step-card">
        <div class="step-label">5. Learn</div>
        <div class="step-content">__STEP_LEARN__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on LEARN</label>
        <textarea id="fb-learn" placeholder="Should something be updated? Skill, CLAUDE.md, memory?"></textarea>
      </div>
    </div>
  </div>

<script>
function copyFeedback() {
  const steps = ['match', 'think', 'act', 'verify', 'learn'];
  const labels = { match: 'MATCH', think: 'THINK', act: 'ACT', verify: 'VERIFY', learn: 'LEARN' };

  let output = '## Evaluation Feedback\n';
  let hasAny = false;

  for (const step of steps) {
    const agentEl = document.querySelector(`.step-${step} .step-content`);
    const fbEl = document.getElementById(`fb-${step}`);
    const feedback = fbEl.value.trim();

    if (feedback) {
      hasAny = true;
      output += `\n### ${labels[step]}\n`;
      output += `**Agent:** ${agentEl.textContent.trim()}\n`;
      output += `**Feedback:** ${feedback}\n`;
    }
  }

  if (!hasAny) {
    output += '\nNo feedback provided - all steps OK.\n';
  }

  navigator.clipboard.writeText(output).then(() => {
    const btn = document.querySelector('.copy-btn');
    btn.textContent = 'Copied!';
    btn.classList.add('copied');
    setTimeout(() => {
      btn.textContent = 'Copy Feedback';
      btn.classList.remove('copied');
    }, 2000);
  });
}
</script>
</body>
</html>
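To make the text produced by the "Copy Feedback" button concrete, here is a DOM-free restatement of the template's copyFeedback() logic. The function name `buildFeedback` and the `entries` shape are assumptions for illustration; the strings it emits are taken from the template.

```javascript
// DOM-free restatement of copyFeedback() from the template, for illustration.
// `entries` is an assumed shape: { match: { agent, feedback }, ... }.
function buildFeedback(entries) {
  const steps = ['match', 'think', 'act', 'verify', 'learn'];
  let output = '## Evaluation Feedback\n';
  let hasAny = false;

  for (const step of steps) {
    const entry = entries[step] || {};
    const feedback = (entry.feedback || '').trim();
    if (!feedback) continue; // steps without comments are omitted
    hasAny = true;
    output += `\n### ${step.toUpperCase()}\n`;
    output += `**Agent:** ${(entry.agent || '').trim()}\n`;
    output += `**Feedback:** ${feedback}\n`;
  }

  if (!hasAny) output += '\nNo feedback provided - all steps OK.\n';
  return output;
}

console.log(buildFeedback({
  act: { agent: 'Ran tests before editing.', feedback: 'Edit first, then test.' },
}));
// ## Evaluation Feedback
//
// ### ACT
// **Agent:** Ran tests before editing.
// **Feedback:** Edit first, then test.
```

Only steps with a non-empty comment appear, so the pasted text stays short when the user objects to a single step.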

Step 3: Open in Browser

bash
open /tmp/evaluate-{timestamp}.html

Tell the user: "Opened Evaluate in the browser. Write comments on the steps you didn't like, click Copy Feedback, and paste it here."
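The `open` command shown above is the macOS one. If the agent runs elsewhere, the command could be chosen per platform, as in this sketch. The `openCommand` helper, the Linux and Windows branches, and the example file name are assumptions added for illustration; the skill only documents `open`.

```javascript
// Pick a command that opens a file in the default browser.
// The skill itself only shows macOS `open`; the other branches are assumptions.
function openCommand(platform, file) {
  switch (platform) {
    case 'darwin':
      return ['open', [file]];
    case 'win32':
      // `start` is a cmd.exe builtin; the empty string is the window title.
      return ['cmd', ['/c', 'start', '', file]];
    default:
      return ['xdg-open', [file]];
  }
}

// Hypothetical file name standing in for /tmp/evaluate-{timestamp}.html.
const [cmd, args] = openCommand(process.platform, '/tmp/evaluate-example.html');
console.log(cmd, args.join(' '));

// To actually launch it from Node:
//   require('node:child_process')
//     .spawn(cmd, args, { stdio: 'ignore', detached: true })
//     .unref();
```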

Step 4: Parse and Fix Feedback

When the user pastes feedback back (format: ## Evaluation Feedback with ### STEP_NAME sections), parse each step:

1. Read each **Feedback:** line.
2. Determine what needs fixing:
   - MATCH feedback -> update skill frontmatter description or CLAUDE.md skill matching rules
   - THINK feedback -> update skill instructions (expected result, verification criteria)
   - ACT feedback -> update skill steps, tool usage, order of operations
   - VERIFY feedback -> add/update verification checks in skill
   - LEARN feedback -> update skill Known Issues, add prevention rules
3. Apply the fix to the appropriate file.
4. Commit the change with the message: evaluate: {skill} - {what changed}
5. If the task needs re-execution, re-run it through the Agent Execution Loop.
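The parsing and routing in this step could look roughly like the sketch below. It assumes the pasted text follows the format the template generates ("## Evaluation Feedback", then "### STEP" sections with "**Agent:**" and "**Feedback:**" lines); `parseFeedback` and `FIX_TARGETS` are hypothetical names, and the target descriptions are copied from the list above.

```javascript
// Where each step's feedback is applied, per the list in this step.
const FIX_TARGETS = {
  MATCH: 'skill frontmatter description or CLAUDE.md skill matching rules',
  THINK: 'skill instructions (expected result, verification criteria)',
  ACT: 'skill steps, tool usage, order of operations',
  VERIFY: 'verification checks in skill',
  LEARN: 'skill Known Issues, prevention rules',
};

// Parse pasted feedback into [{ step, feedback, target }].
// Feedback may span several lines, so everything after "**Feedback:**"
// up to the next "### " heading is kept.
function parseFeedback(text) {
  const results = [];
  // Drop the "## Evaluation Feedback" preamble before the first section.
  const sections = text.split(/^### /m).slice(1);
  for (const section of sections) {
    const newline = section.indexOf('\n');
    const step = (newline === -1 ? section : section.slice(0, newline)).trim();
    if (!Object.prototype.hasOwnProperty.call(FIX_TARGETS, step)) continue;
    const marker = '**Feedback:**';
    const at = section.search(/^\*\*Feedback:\*\*/m);
    if (at === -1) continue; // section without a feedback line
    const feedback = section.slice(at + marker.length).trim();
    if (feedback) results.push({ step, feedback, target: FIX_TARGETS[step] });
  }
  return results;
}

const pasted = [
  '## Evaluation Feedback',
  '',
  '### ACT',
  '**Agent:** Ran tests before editing.',
  '**Feedback:** Edit first, then test.',
  '',
].join('\n');

for (const item of parseFeedback(pasted)) {
  console.log(`${item.step}: "${item.feedback}" -> update ${item.target}`);
}
```

An empty result (for example, when the text is "No feedback provided - all steps OK.") means there is nothing to fix and no commit is needed.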

Known Issues

The browser Clipboard API (navigator.clipboard) requires a secure context such as HTTPS or localhost. If copying to the clipboard fails, the user can manually select all text from the output area and copy it.
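A possible way to handle the clipboard failure is to fall back to showing the text so it can be copied by hand. This is a sketch only: `copyWithFallback` and its injected parameters are assumptions and are not part of the template, which calls navigator.clipboard.writeText directly.

```javascript
// Try the async Clipboard API first; if it is missing or rejects
// (for example outside a secure context), hand the text to `showManually`.
// Both dependencies are injected so the logic can also run outside a browser.
async function copyWithFallback(text, clipboard, showManually) {
  try {
    if (!clipboard || typeof clipboard.writeText !== 'function') {
      throw new Error('Clipboard API unavailable');
    }
    await clipboard.writeText(text);
    return 'copied';
  } catch (err) {
    showManually(text);
    return 'manual';
  }
}

// In the page this might be wired up as (assumed, not in the template):
//   copyWithFallback(output, navigator.clipboard,
//     (t) => window.prompt('Copy the feedback manually:', t));
```

Returning a status string lets the caller decide whether to show the "Copied!" state on the button.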
