Thoughtbox is an AI intent ledger for evaluating and tracking AI decisions.

Run: npx killer-skills add Kastalien-Research/thoughtbox/eval. Supports 19+ IDEs and agents, including Cursor, Windsurf, VS Code, and Claude Code.

Which IDEs or agents does eval support?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. A single Killer-Skills CLI command installs it for any of them.

AI Agent Skill: Evaluating Thoughtbox Decisions | Claude Code

Name: eval
Author: Kastalien-Research

eval

Install eval, an AI Agent Skill for AI agent workflows and automation. It supports Claude Code, Cursor, and Windsurf, and installs with one command.

SKILL.md


Imported Repository Instructions

The section below is supporting source material from the upstream repository. Use the Killer-Skills review above as the primary decision layer.

Supporting Evidence

Evaluation harness: $ARGUMENTS

Commands

Parse the first word of $ARGUMENTS to determine the command:

`metrics` — Show current session metrics

Collect and display metrics for the current session:

Count commits: git log --oneline --since="today" | wc -l
Count test results: check for recent vitest output or .eval/metrics/ entries
Count beads changes: bd list --status=closed (count only the recently closed beads)
Token usage: check LangSmith state file if available
Pattern usage: check .dgm/fitness.json for patterns used this session
Session duration: check session start time from logs

Display as:

## Current Session Metrics

| Metric | Value | Baseline | Delta |
|--------|-------|----------|-------|
| Commits | 5 | 3.2 avg | +56% |
| Tests passing | 42/42 | 40/42 | +2 |
| Beads closed | 3 | 2.1 avg | +43% |
| Files changed | 12 | 8.5 avg | +41% |
| Patterns used | 7 | 5.3 avg | +32% |
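The Delta column in the sample table appears to be the relative change of each value against its baseline average (the Tests passing row uses an absolute difference instead). A minimal sketch of that calculation follows; `percent_delta` is a hypothetical helper name, not part of the skill.

```python
def percent_delta(value: float, baseline: float) -> str:
    """Format the relative change of value against baseline, e.g. '+56%'."""
    if baseline == 0:
        # Relative change is undefined against a zero baseline.
        return "n/a"
    change = (value - baseline) / baseline * 100
    # The '+' flag forces an explicit sign; '.0f' rounds to a whole percent.
    return f"{change:+.0f}%"


if __name__ == "__main__":
    # Rows taken from the sample table above.
    for name, value, baseline in [
        ("Commits", 5, 3.2),
        ("Beads closed", 3, 2.1),
        ("Files changed", 12, 8.5),
        ("Patterns used", 7, 5.3),
    ]:
        print(f"{name}: {percent_delta(value, baseline)}")
```

Run against the sample rows, this reproduces the +56%, +43%, +41%, and +32% shown in the table.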

`baseline` — Set or update baselines

Read the last N session metric snapshots from .eval/metrics/
Calculate averages for each metric
Write to .eval/baselines.json
Report what changed
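The baseline steps above could be sketched as follows. This assumes each snapshot stores its numbers under a "metrics" key, that sorting snapshot file names orders them chronologically, and that .eval/baselines.json is a flat metric-to-average mapping. The source specifies none of these, and the function names are hypothetical.

```python
import json
from pathlib import Path
from statistics import mean


def compute_baselines(metrics_dir: Path, last_n: int) -> dict[str, float]:
    """Average each metric over the last_n most recent snapshots."""
    if last_n <= 0:
        raise ValueError("last_n must be positive")
    # Assumption: file names embed a sortable timestamp.
    files = sorted(metrics_dir.glob("session-*.json"))[-last_n:]
    values: dict[str, list[float]] = {}
    for path in files:
        snapshot = json.loads(path.read_text())
        for name, value in snapshot.get("metrics", {}).items():
            values.setdefault(name, []).append(value)
    return {name: mean(vals) for name, vals in values.items()}


def write_baselines(baselines: dict[str, float], out_path: Path) -> None:
    """Persist the averages as JSON, creating the directory if needed."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(baselines, indent=2))
```

A typical call would be `write_baselines(compute_baselines(Path(".eval/metrics"), 5), Path(".eval/baselines.json"))`. Reporting what changed then amounts to diffing the new mapping against the previous file.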

`compare` — Compare sessions

Usage: compare --last N or compare --session <id>

Load metric snapshots from .eval/metrics/
Compare against baselines
Highlight regressions (metric dropped >10% below baseline)
Highlight improvements (metric improved >10% above baseline)
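The 10% rule above can be expressed as a small classifier. This is a sketch that assumes higher values are better for every metric. That assumption likely fails for counts such as escalations or spiral_detections, which would need the comparison inverted. The name `classify` and its labels are illustrative.

```python
def classify(value: float, baseline: float, threshold: float = 0.10) -> str:
    """Label a metric relative to its baseline using a relative threshold."""
    if baseline == 0:
        # No meaningful relative change can be computed.
        return "no-baseline"
    change = (value - baseline) / baseline
    if change < -threshold:
        return "regression"
    if change > threshold:
        return "improvement"
    return "stable"


if __name__ == "__main__":
    print(classify(5, 3.2))    # well above baseline
    print(classify(2.8, 3.2))  # more than 10% below baseline
    print(classify(3.0, 3.2))  # within the 10% band
```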

`report` — Generate weekly evaluation report

Load all metrics from the past 7 days
Calculate trends (improving, stable, declining)
Identify top improvements and top regressions
Generate recommendations based on trends
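The source does not say how a trend is calculated. One possible approach, shown here purely as an assumption, compares the average of the later half of the week's values with the average of the earlier half, reusing the 10% tolerance from the compare command.

```python
from statistics import mean


def trend(values: list[float], tolerance: float = 0.10) -> str:
    """Classify a chronological series as improving, stable, or declining."""
    if len(values) < 2:
        return "insufficient-data"
    mid = len(values) // 2
    earlier, later = mean(values[:mid]), mean(values[mid:])
    if earlier == 0:
        # Avoid dividing by zero: any growth from zero counts as improving.
        return "improving" if later > 0 else "stable"
    change = (later - earlier) / earlier
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "declining"
    return "stable"
```

A linear regression slope would be a reasonable alternative. The half-split version is used here only because it needs no extra dependencies and is easy to explain in a report.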

`capture` — Capture current session metrics

Write a metric snapshot to .eval/metrics/session-{timestamp}.json:

{
  "session_id": "<session id>",
  "timestamp": "<ISO 8601>",
  "branch": "<git branch>",
  "metrics": {
    "commits": 0,
    "tests_total": 0,
    "tests_passing": 0,
    "beads_closed": 0,
    "beads_created": 0,
    "files_changed": 0,
    "patterns_referenced": 0,
    "assumptions_verified": 0,
    "escalations": 0,
    "spiral_detections": 0
  },
  "qualitative": {
    "session_focus": "<what the session was about>",
    "memory_usefulness": 0,
    "knowledge_gaps_found": []
  }
}
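Writing a snapshot in this shape might look like the sketch below. The file-name timestamp format (`%Y%m%dT%H%M%SZ`), the `capture` function itself, and the choice to default missing metrics to 0 are all assumptions. The source only fixes the JSON fields and the `session-{timestamp}.json` naming pattern.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

METRIC_KEYS = (
    "commits", "tests_total", "tests_passing", "beads_closed",
    "beads_created", "files_changed", "patterns_referenced",
    "assumptions_verified", "escalations", "spiral_detections",
)


def capture(session_id: str, branch: str, metrics: dict, session_focus: str,
            metrics_dir: Path, now: Optional[datetime] = None) -> Path:
    """Write one metric snapshot and return the path of the new file."""
    now = now or datetime.now(timezone.utc)
    snapshot = {
        "session_id": session_id,
        "timestamp": now.isoformat(),  # ISO 8601
        "branch": branch,
        # Assumption: metrics that were not collected fall back to 0.
        "metrics": {key: metrics.get(key, 0) for key in METRIC_KEYS},
        "qualitative": {
            "session_focus": session_focus,
            "memory_usefulness": 0,
            "knowledge_gaps_found": [],
        },
    }
    metrics_dir.mkdir(parents=True, exist_ok=True)
    path = metrics_dir / f"session-{now.strftime('%Y%m%dT%H%M%SZ')}.json"
    path.write_text(json.dumps(snapshot, indent=2))
    return path
```

Passing `now` explicitly keeps the function deterministic in tests. In normal use it can be omitted.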

Notes

If .eval/baselines.json doesn't exist, skip baseline comparisons and suggest running the `baseline` command
Metric collection should be best-effort — missing data is noted, not an error
Regressions trigger a structured escalation suggestion (not automatic action)
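The first note, treating a missing baselines file as a normal condition rather than an error, could be handled as in this sketch. Both helper names and the wording of the hint are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_baselines(path: Path) -> Optional[dict]:
    """Return the parsed baselines, or None when the file does not exist."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


def baseline_hint(baselines: Optional[dict]) -> str:
    """Produce a suggestion for the user when no baselines are available."""
    if baselines is None:
        return "No baselines found; skipping comparisons. Consider running `baseline` first."
    return ""
```

Callers would check for `None` and render the metrics table without the Baseline and Delta columns instead of failing.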


Related Skills

Looking for an alternative to eval, or a similar community skill to use alongside it? Explore the related open-source skills below.

openclaw-release-maintainer

widget-generator

flags

pr-review
