Evaluate Skill
Interactive step-by-step review of the agent's execution. Creates an HTML page where the user sees what the agent did at each step of the Agent Execution Loop and can write targeted feedback on each one.
When to Use
- User calls /evaluate
- User is not satisfied with a result and wants to give precise feedback
- After a complex task where the user wants to review the execution path
Do NOT use after every task - only when evaluation is needed.
Flow
- Reflect on your actions in the current session
- For each step (MATCH, THINK, ACT, VERIFY, LEARN) write what you did
- Generate HTML with embedded data
- Open in browser
- User writes comments next to each step, clicks "Copy Feedback"
- User pastes feedback in chat
- Agent parses feedback per step and applies fixes (update skill, retry, etc.)
Step 1: Reflect
Analyze the current session and fill in each step honestly:
- MATCH: Which skill was chosen? Why? Or why none matched?
- THINK: What was the expected result? What verification criteria were defined?
- ACT: Which tools were called? In what order? What was parallel vs sequential?
- VERIFY: What was checked? Did it pass or fail? Any skill audit issues?
- LEARN: Was anything updated? If not, why?
Be specific - include file paths, tool names, actual values. The user needs to see exactly what happened.
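As a rough sketch of what a completed reflection could look like, the five steps can be held in one object and checked for gaps before generating the page. The object shape, the `missingSteps` helper, and the bracketed placeholder values are assumptions for illustration, not part of the skill.

```javascript
// Hypothetical shape for the Step 1 reflection: one text entry per loop step.
const STEPS = ['match', 'think', 'act', 'verify', 'learn'];

// Return the steps that are absent or contain only whitespace.
function missingSteps(reflection) {
  return STEPS.filter((step) => {
    const text = reflection[step];
    return typeof text !== 'string' || text.trim() === '';
  });
}

// Placeholder values only; a real reflection names actual files, tools and values.
const reflection = {
  match: 'Skill chosen: <skill name>. Reason: <why it matched>.',
  think: 'Expected result: <...>. Verification criteria: <...>.',
  act: 'Tools called, in order: <tool 1>, <tool 2>.',
  verify: 'Checked: <...>. Result: <pass or fail>.',
  learn: '',
};

console.log(missingSteps(reflection)); // [ 'learn' ]
```

An empty LEARN entry is flagged here because the checklist asks for a reason even when nothing was updated.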
Step 2: Generate HTML
Write a self-contained HTML file to /tmp/evaluate-{timestamp}.html with the data embedded inline. Use the template below.
Replace __TASK_DESCRIPTION__ and each __STEP_*__ placeholder with actual content from your reflection. Use HTML-safe text (escape <, >, &).
html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Evaluate - Step Review</title>
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    font-family: -apple-system, BlinkMacSystemFont, 'SF Pro Text', 'Helvetica Neue', sans-serif;
    background: #1a1a2e;
    color: #e0e0e0;
    line-height: 1.6;
    min-height: 100vh;
  }

  .header {
    padding: 1.5rem 2rem;
    border-bottom: 1px solid #2a2a4a;
    display: flex;
    align-items: center;
    justify-content: space-between;
  }

  .header h1 {
    font-size: 1.4rem;
    font-weight: 600;
    color: #fff;
  }

  .header .task-label {
    font-size: 0.85rem;
    color: #888;
    margin-top: 0.25rem;
  }

  .copy-btn {
    background: #6c5ce7;
    color: #fff;
    border: none;
    padding: 0.6rem 1.5rem;
    border-radius: 6px;
    font-size: 0.9rem;
    font-weight: 500;
    cursor: pointer;
    transition: all 0.2s;
  }

  .copy-btn:hover { background: #5a4bd1; }
  .copy-btn.copied { background: #27ae60; }

  .container {
    padding: 1.5rem 2rem;
    max-width: 1400px;
    margin: 0 auto;
  }

  .step {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1rem;
    margin-bottom: 1rem;
  }

  .step-card {
    background: #16213e;
    border: 1px solid #2a2a4a;
    border-radius: 8px;
    padding: 1.25rem;
  }

  .step-label {
    display: inline-block;
    font-size: 0.7rem;
    font-weight: 700;
    text-transform: uppercase;
    letter-spacing: 0.08em;
    padding: 0.2rem 0.6rem;
    border-radius: 4px;
    margin-bottom: 0.75rem;
  }

  .step-match .step-label { background: #2d3a8c; color: #8b9cf7; }
  .step-think .step-label { background: #1a5c3a; color: #6bcf8e; }
  .step-act .step-label { background: #8c5a2d; color: #f7c98b; }
  .step-verify .step-label { background: #5c1a5c; color: #cf6bcf; }
  .step-learn .step-label { background: #1a4a5c; color: #6bb8cf; }

  .step-content {
    font-size: 0.9rem;
    color: #c0c0c0;
    white-space: pre-wrap;
  }

  .feedback-card {
    background: #1e1e3a;
    border: 1px solid #3a3a5a;
    border-radius: 8px;
    padding: 1.25rem;
    display: flex;
    flex-direction: column;
  }

  .feedback-card label {
    font-size: 0.75rem;
    color: #888;
    text-transform: uppercase;
    letter-spacing: 0.05em;
    margin-bottom: 0.5rem;
  }

  .feedback-card textarea {
    flex: 1;
    min-height: 80px;
    background: #12122a;
    border: 1px solid #2a2a4a;
    border-radius: 6px;
    color: #e0e0e0;
    font-family: inherit;
    font-size: 0.9rem;
    padding: 0.75rem;
    resize: vertical;
    line-height: 1.5;
  }

  .feedback-card textarea:focus {
    outline: none;
    border-color: #6c5ce7;
  }

  .feedback-card textarea::placeholder {
    color: #555;
  }

  @media (max-width: 900px) {
    .step { grid-template-columns: 1fr; }
    .container { padding: 1rem; }
  }
</style>
</head>
<body>
  <div class="header">
    <div>
      <h1>Evaluate - Step Review</h1>
      <div class="task-label">__TASK_DESCRIPTION__</div>
    </div>
    <button class="copy-btn" onclick="copyFeedback()">Copy Feedback</button>
  </div>

  <div class="container">
    <div class="step step-match">
      <div class="step-card">
        <div class="step-label">1. Match</div>
        <div class="step-content">__STEP_MATCH__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on MATCH</label>
        <textarea id="fb-match" placeholder="Was the right skill chosen? Should a different one be used?"></textarea>
      </div>
    </div>

    <div class="step step-think">
      <div class="step-card">
        <div class="step-label">2. Think</div>
        <div class="step-content">__STEP_THINK__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on THINK</label>
        <textarea id="fb-think" placeholder="Was the expected result correct? Were verification criteria good?"></textarea>
      </div>
    </div>

    <div class="step step-act">
      <div class="step-card">
        <div class="step-label">3. Act</div>
        <div class="step-content">__STEP_ACT__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on ACT</label>
        <textarea id="fb-act" placeholder="Were the right tools used? Correct order? Missing steps?"></textarea>
      </div>
    </div>

    <div class="step step-verify">
      <div class="step-card">
        <div class="step-label">4. Verify</div>
        <div class="step-content">__STEP_VERIFY__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on VERIFY</label>
        <textarea id="fb-verify" placeholder="Was verification thorough? Missed checks?"></textarea>
      </div>
    </div>

    <div class="step step-learn">
      <div class="step-card">
        <div class="step-label">5. Learn</div>
        <div class="step-content">__STEP_LEARN__</div>
      </div>
      <div class="feedback-card">
        <label>Your feedback on LEARN</label>
        <textarea id="fb-learn" placeholder="Should something be updated? Skill, CLAUDE.md, memory?"></textarea>
      </div>
    </div>
  </div>

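<script>
// Fallback for when the Clipboard API is unavailable or rejects:
// show the feedback in a read-only output area for manual copying.
function showOutputArea(text) {
  let area = document.getElementById('output-area');
  if (!area) {
    area = document.createElement('textarea');
    area.id = 'output-area';
    area.readOnly = true;
    area.style.cssText = 'width: 100%; min-height: 200px;';
    document.querySelector('.container').appendChild(area);
  }
  area.value = text;
  area.focus();
  area.select();
}

function copyFeedback() {
  const steps = ['match', 'think', 'act', 'verify', 'learn'];
  const labels = { match: 'MATCH', think: 'THINK', act: 'ACT', verify: 'VERIFY', learn: 'LEARN' };

  let output = '## Evaluation Feedback\n';
  let hasAny = false;

  // Emit a section only for steps where the user wrote something.
  for (const step of steps) {
    const agentEl = document.querySelector(`.step-${step} .step-content`);
    const fbEl = document.getElementById(`fb-${step}`);
    const feedback = fbEl.value.trim();

    if (feedback) {
      hasAny = true;
      output += `\n### ${labels[step]}\n`;
      output += `**Agent:** ${agentEl.textContent.trim()}\n`;
      output += `**Feedback:** ${feedback}\n`;
    }
  }

  if (!hasAny) {
    output += '\nNo feedback provided - all steps OK.\n';
  }

  // navigator.clipboard is undefined outside a secure context, so guard before calling it.
  const copied = navigator.clipboard
    ? navigator.clipboard.writeText(output)
    : Promise.reject(new Error('Clipboard API unavailable'));

  copied.then(() => {
    const btn = document.querySelector('.copy-btn');
    btn.textContent = 'Copied!';
    btn.classList.add('copied');
    setTimeout(() => {
      btn.textContent = 'Copy Feedback';
      btn.classList.remove('copied');
    }, 2000);
  }).catch(() => showOutputArea(output));
}
</script>
</body>
</html>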
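The placeholder substitution described at the start of Step 2 could be sketched roughly as follows. `escapeHtml` and `fillTemplate` are hypothetical helper names, and the one-line template string stands in for the full template; the only parts taken from this document are the placeholder names and the three characters to escape.

```javascript
// Escape the three characters named in Step 2. '&' must go first so the
// entities produced for '<' and '>' are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace __TASK_DESCRIPTION__ and __STEP_*__ in a single pass, so text
// inserted for one step is never rescanned for other placeholders.
// Placeholders without a matching value are left untouched.
function fillTemplate(template, values) {
  return template.replace(/__(TASK_DESCRIPTION|STEP_[A-Z]+)__/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? escapeHtml(values[name]) : match
  );
}

// Shortened stand-in for the template, with made-up example values.
const template =
  '<div class="task-label">__TASK_DESCRIPTION__</div>' +
  '<div class="step-content">__STEP_MATCH__</div>';

const html = fillTemplate(template, {
  TASK_DESCRIPTION: 'Fix <title> & header',
  STEP_MATCH: 'No skill matched',
});

console.log(html);
// <div class="task-label">Fix &lt;title&gt; &amp; header</div><div class="step-content">No skill matched</div>
```

Because `.step-content` uses `white-space: pre-wrap`, line breaks in the inserted text are shown as written, so no conversion to `<br>` is needed.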
Step 3: Open in Browser
bash
open /tmp/evaluate-{timestamp}.html
Tell the user: "Opened Evaluate in the browser. Write comments on the steps you didn't like, click Copy Feedback, and paste the result here."
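The `open` command above is the macOS launcher. If the skill is ever run on Linux, a small wrapper along these lines could choose the command by platform. This is a sketch under assumptions: `openCommand` and `openInBrowser` are hypothetical names, `xdg-open` is assumed to be installed on the Linux machine, and other platforms are deliberately left unsupported.

```javascript
const { spawn } = require('node:child_process');

// Pick the launcher for the given platform (defaults to the current one).
function openCommand(filePath, platform = process.platform) {
  switch (platform) {
    case 'darwin':
      return ['open', [filePath]];
    case 'linux':
      return ['xdg-open', [filePath]];
    default:
      throw new Error(`No launcher configured for platform: ${platform}`);
  }
}

// Start the launcher without waiting for the browser to exit.
function openInBrowser(filePath) {
  const [command, args] = openCommand(filePath);
  spawn(command, args, { detached: true, stdio: 'ignore' }).unref();
}

console.log(openCommand('/tmp/evaluate-example.html', 'linux'));
```

The path in the last line is a made-up example; in the skill it would be the file written in Step 2.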
Step 4: Parse and Fix Feedback
When the user pastes feedback back (format: ## Evaluation Feedback with ### STEP_NAME sections), parse each step:
- Read each **Feedback:** line
- Determine what needs fixing:
- MATCH feedback -> update skill frontmatter description or CLAUDE.md skill matching rules
- THINK feedback -> update skill instructions (expected result, verification criteria)
- ACT feedback -> update skill steps, tool usage, order of operations
- VERIFY feedback -> add/update verification checks in skill
- LEARN feedback -> update skill Known Issues, add prevention rules
- Apply the fix to the appropriate file
- Git commit: evaluate: {skill} - {what changed}
- If the task needs re-execution, re-run through the Agent Execution Loop
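The parsing step can be sketched as below, assuming the pasted text has exactly the layout produced by the Copy Feedback button. `parseFeedback`, `FIX_TARGETS`, and `commitMessage` are hypothetical names; the target descriptions and the commit format are copied from the list above.

```javascript
// Where each kind of feedback is applied, per the list in Step 4.
const FIX_TARGETS = {
  MATCH: 'skill frontmatter description or CLAUDE.md skill matching rules',
  THINK: 'skill instructions (expected result, verification criteria)',
  ACT: 'skill steps, tool usage, order of operations',
  VERIFY: 'verification checks in skill',
  LEARN: 'skill Known Issues, prevention rules',
};

// Split the pasted text into "### STEP" sections and pull out the
// **Agent:** and **Feedback:** parts of each one.
function parseFeedback(text) {
  const marker = '**Feedback:**';
  const result = {};
  for (const section of text.split(/^### /m).slice(1)) {
    const newline = section.indexOf('\n');
    const name = (newline === -1 ? section : section.slice(0, newline)).trim();
    if (!(name in FIX_TARGETS)) continue; // ignore unknown headings
    const body = newline === -1 ? '' : section.slice(newline + 1);
    const at = body.indexOf(marker);
    if (at === -1) continue; // section without a feedback line
    result[name] = {
      agent: body.slice(0, at).replace(/^\*\*Agent:\*\*/, '').trim(),
      feedback: body.slice(at + marker.length).trim(),
      target: FIX_TARGETS[name],
    };
  }
  return result;
}

// Commit message in the format given in Step 4.
function commitMessage(skill, whatChanged) {
  return `evaluate: ${skill} - ${whatChanged}`;
}

// Made-up example of pasted feedback.
const pasted = [
  '## Evaluation Feedback',
  '',
  '### ACT',
  '**Agent:** Ran the tests before editing the file.',
  '**Feedback:** Edit first, then run the tests.',
  '',
].join('\n');

console.log(parseFeedback(pasted).ACT.feedback); // Edit first, then run the tests.
```

Two limitations of this sketch: a line starting with `### ` inside the agent text or the feedback is treated as a new section, and an agent description that itself contains `**Feedback:**` is split at the wrong place.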
Known Issues
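- The browser Clipboard API (navigator.clipboard) requires a secure context (HTTPS or localhost), so copying can fail when the page is opened from a local file. If it does, the user can manually select and copy the text from the output area.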