eval: AI agent skill (Thoughtbox, community) for Claude Code, Cursor, and Windsurf

v1.0.0

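About this skill

Thoughtbox is an AI intent ledger for evaluating and tracking AI decisions.

Features

Evaluate AI decisions
Track Thoughtbox performance metrics
Optimize the Claude Code development workflow
Support collection of multiple metric types
Provide detailed evaluation reports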

# Core topics

Kastalien-Research
[52]
[12]
Updated: 3/29/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 2/11

This page remains useful for operators, but Killer-Skills treats it as reference material instead of a primary organic landing page.

Review Score: 2/11
Quality Score: 47
Canonical Locale: en
Detected Body Locale: en

Suitable agent types

Suitable for operator workflows that need explicit guardrails before installation and execution.

Main capability granted · eval

! Usage limits and requirements

Why this page is reference-only

  • Current locale does not satisfy the locale-governance contract.
  • The page lacks a strong recommendation layer.
  • The page lacks concrete use-case guidance.
  • The page lacks explicit limitations or caution signals.
  • The underlying skill quality score is below the review floor.

Source Boundary

The section below is supporting source material from the upstream repository. Use the Killer-Skills review above as the primary decision layer.

FAQ and installation steps

? FAQ

What is eval?

eval is a skill for Thoughtbox, an AI intent ledger for evaluating and tracking AI decisions.

How do I install eval?

Run: npx killer-skills add Kastalien-Research/thoughtbox/eval. It supports Cursor, Windsurf, VS Code, Claude Code, and other IDEs and agents (19+ in total).

Which IDEs or agents does eval support?

The skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. A single Killer-Skills CLI command installs it for any of them.

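Installation steps

  1. Open a terminal

    Open a terminal or command prompt in your project directory.

  2. Run the install command

    Run: npx killer-skills add Kastalien-Research/thoughtbox/eval. The CLI detects your IDE or AI agent automatically and completes the configuration.

  3. Start using the skill

    eval is now enabled and can be invoked in the current project right away.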

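! Reference page mode

This page can still be used as an installation and lookup reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review above first, then decide whether to continue to the upstream repository instructions.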

Imported Repository Instructions

Supporting Evidence

eval

Install eval, an AI agent skill for AI agent workflows and automation. It supports Claude Code, Cursor, and Windsurf with one-command installation.

SKILL.md

Evaluation harness: $ARGUMENTS

Commands

Parse the first word of $ARGUMENTS to determine the command:

metrics — Show current session metrics

Collect and display metrics for the current session:

  1. Count commits: git log --oneline --since="today" | wc -l
  2. Count test results: check for recent vitest output or .eval/metrics/ entries
  3. Count beads changes: bd list --status=closed, counting only recently closed items
  4. Token usage: check LangSmith state file if available
  5. Pattern usage: check .dgm/fitness.json for patterns used this session
  6. Session duration: check session start time from logs
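
The collection steps above can be sketched as a set of independent collectors, each allowed to fail without aborting the run. This is a minimal sketch, not the skill's actual implementation: the names `best_effort`, `count_commits_today`, and `collect_metrics` are hypothetical, and only the git command is taken from step 1 (with the `wc -l` pipe replaced by counting lines in Python).

```python
import subprocess
from typing import Callable, Dict, Optional


def best_effort(collector: Callable[[], int]) -> Optional[int]:
    """Run one collector; report missing data as None instead of raising."""
    try:
        return collector()
    except (OSError, subprocess.SubprocessError, ValueError):
        # Tool not installed, command failed, or output not parseable.
        return None


def count_commits_today() -> int:
    """Count commits with the git command from step 1 (hypothetical helper)."""
    result = subprocess.run(
        ["git", "log", "--oneline", "--since=today"],
        capture_output=True,
        text=True,
        check=True,
    )
    return len(result.stdout.splitlines())


def collect_metrics(
    collectors: Dict[str, Callable[[], int]]
) -> Dict[str, Optional[int]]:
    """Run every collector independently so one failure does not hide the rest."""
    return {name: best_effort(fn) for name, fn in collectors.items()}
```

Calling `collect_metrics({"commits": count_commits_today})` outside a git repository yields `{"commits": None}` rather than an exception, which matches the best-effort behavior described in the notes.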

Display as:

## Current Session Metrics

| Metric | Value | Baseline | Delta |
|--------|-------|----------|-------|
| Commits | 5 | 3.2 avg | +56% |
| Tests passing | 42/42 | 40/42 | +2 |
| Beads closed | 3 | 2.1 avg | +43% |
| Files changed | 12 | 8.5 avg | +41% |
| Patterns used | 7 | 5.3 avg | +32% |
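
The percentage values in the Delta column are consistent with a relative change against the baseline, rounded to a whole percent. The sketch below assumes that formula; the function name `format_delta` is hypothetical, and the handling of a zero baseline is a choice made here, not something the source specifies.

```python
def format_delta(value: float, baseline: float) -> str:
    """Relative change of value against baseline, as a signed whole percent."""
    if baseline == 0:
        # Assumption: a zero baseline has no meaningful relative change.
        return "n/a"
    percent = (value - baseline) / baseline * 100
    return f"{percent:+.0f}%"


print(format_delta(5, 3.2))   # Commits row
print(format_delta(12, 8.5))  # Files changed row
```

The Tests passing row is the exception: it reports an absolute difference (+2) rather than a percentage.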

baseline — Set or update baselines

  1. Read the last N session metric snapshots from .eval/metrics/
  2. Calculate averages for each metric
  3. Write to .eval/baselines.json
  4. Report what changed
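
A minimal sketch of the baseline steps, assuming each snapshot file follows the `metrics` layout used by capture and that snapshot filenames sort chronologically. The function names are hypothetical, and the flat `{metric: average}` layout of `.eval/baselines.json` is an assumption; the source does not define that file's schema.

```python
import json
from pathlib import Path
from statistics import mean
from typing import Dict, List


def compute_baselines(metrics_dir: Path, last_n: int) -> Dict[str, float]:
    """Average each numeric metric over the last N snapshots."""
    if last_n <= 0:
        raise ValueError("last_n must be positive")
    # Assumption: timestamps in filenames sort lexicographically by time.
    files = sorted(metrics_dir.glob("session-*.json"))[-last_n:]
    values: Dict[str, List[float]] = {}
    for path in files:
        snapshot = json.loads(path.read_text())
        for name, value in snapshot.get("metrics", {}).items():
            if isinstance(value, (int, float)):
                values.setdefault(name, []).append(value)
    return {name: mean(vals) for name, vals in values.items()}


def write_baselines(baselines: Dict[str, float], out_path: Path) -> None:
    """Write the averages as JSON, creating the parent directory if needed."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(baselines, indent=2))
```

Reporting what changed then amounts to comparing the returned dictionary with the previous contents of the baselines file, if one exists.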

compare — Compare sessions

Usage: compare --last N or compare --session <id>

  1. Load metric snapshots from .eval/metrics/
  2. Compare against baselines
  3. Highlight regressions (metric dropped >10% below baseline)
  4. Highlight improvements (metric improved >10% above baseline)
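
The 10% rule in steps 3 and 4 can be sketched as a small classifier. The function name `classify` and the labels it returns are hypothetical. The sketch also assumes that a higher value is better for every metric, which may not hold for counters such as `escalations` or `spiral_detections`.

```python
def classify(value: float, baseline: float, threshold: float = 0.10) -> str:
    """Label a metric relative to its baseline using the 10% rule."""
    if baseline == 0:
        # Assumption: no relative comparison is possible against zero.
        return "no-baseline"
    change = (value - baseline) / baseline
    if change < -threshold:
        return "regression"
    if change > threshold:
        return "improvement"
    return "unchanged"


print(classify(5, 3.2))  # well above baseline
print(classify(2, 3.2))  # well below baseline
```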

report — Generate weekly evaluation report

  1. Load all metrics from the past 7 days
  2. Calculate trends (improving, stable, declining)
  3. Identify top improvements and top regressions
  4. Generate recommendations based on trends
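
The source does not say how a trend is calculated. One possible sketch, labeled here as an assumption, compares the mean of the earlier half of the week's values with the mean of the later half and reuses the 10% tolerance from compare. The name `trend` is hypothetical.

```python
from statistics import mean
from typing import List


def trend(values: List[float], tolerance: float = 0.10) -> str:
    """Classify a chronological series as improving, stable, or declining."""
    if len(values) < 2:
        return "stable"  # not enough data to see a direction
    mid = len(values) // 2
    earlier, later = mean(values[:mid]), mean(values[mid:])
    if earlier == 0:
        return "improving" if later > 0 else "stable"
    change = (later - earlier) / earlier
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "declining"
    return "stable"


print(trend([1, 2, 3, 4]))
```

Like the comparison rule, this treats larger values as better; a metric where lower is better would need the labels swapped.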

capture — Capture current session metrics

Write a metric snapshot to .eval/metrics/session-{timestamp}.json:

json
{
  "session_id": "<session id>",
  "timestamp": "<ISO 8601>",
  "branch": "<git branch>",
  "metrics": {
    "commits": 0,
    "tests_total": 0,
    "tests_passing": 0,
    "beads_closed": 0,
    "beads_created": 0,
    "files_changed": 0,
    "patterns_referenced": 0,
    "assumptions_verified": 0,
    "escalations": 0,
    "spiral_detections": 0
  },
  "qualitative": {
    "session_focus": "<what the session was about>",
    "memory_usefulness": 0,
    "knowledge_gaps_found": []
  }
}
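
Writing the snapshot above could look like the following sketch. The function name `capture` and its parameters are hypothetical, and the compact UTC timestamp used in the filename is an assumption: the source only specifies the pattern `session-{timestamp}.json`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def capture(
    metrics_dir: Path,
    session_id: str,
    branch: str,
    metrics: Dict[str, int],
    qualitative: Dict[str, Any],
    now: Optional[datetime] = None,
) -> Path:
    """Write one session snapshot and return the path of the new file."""
    now = now or datetime.now(timezone.utc)
    snapshot = {
        "session_id": session_id,
        "timestamp": now.isoformat(),  # ISO 8601
        "branch": branch,
        "metrics": metrics,
        "qualitative": qualitative,
    }
    metrics_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: a compact UTC stamp keeps filenames sortable by time.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    path = metrics_dir / f"session-{stamp}.json"
    path.write_text(json.dumps(snapshot, indent=2))
    return path
```

Passing `now` explicitly makes the function easy to test; in normal use it defaults to the current UTC time.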

Notes

  • If .eval/baselines.json doesn't exist, skip baseline comparisons and suggest running baseline
  • Metric collection should be best-effort — missing data is noted, not an error
  • Regressions trigger a structured escalation suggestion (not automatic action)
