eval — a Claude Code skill from the thoughtbox repository (community). Tags: ai-agents, claude-code, ide-skills, model-context-protocol, observability, reasoning

v1.0.0

About this skill

Best suited for AI agents that need an evaluation harness. Thoughtbox is an intention ledger for agents; it covers ai-agents, claude, and claude-code workflows. This skill supports Claude Code, Cursor, and Windsurf.


Repository: Kastalien-Research · Updated: 3/29/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 10/11

This page remains useful for teams, but Killer-Skills treats it as reference material instead of a primary organic landing page.

Review criteria passed:

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and caution
  • Quality floor passed for review

Review Score: 10/11
Quality Score: 57
Canonical Locale: en
Detected Body Locale: en


Why use this skill

eval helps agents run an evaluation harness. Thoughtbox is an intention ledger for agents. This skill supports Claude Code, Cursor, and Windsurf workflows.

Best for

Ideal for AI agents that need an evaluation harness.

Actionable use cases for eval

  • Use case: running the evaluation harness on $ARGUMENTS
  • Use case: parsing the first word of $ARGUMENTS to dispatch a command
  • Use case: showing current session metrics with the metrics command

! Security and Limitations

  • Limitation: Requires repository-specific context from the skill documentation
  • Limitation: Works best when the underlying tools and dependencies are already configured

Why this page is reference-only

  • Current locale does not satisfy the locale-governance contract.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

After The Review

Decide The Next Action Before You Keep Reading Repository Material

Killer-Skills should not stop at opening repository instructions. It should help you decide whether to install this skill, when to cross-check against trusted collections, and when to move into workflow rollout.


FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

? Frequently Asked Questions

What is eval?

eval is ideal for AI agents that need an evaluation harness. Thoughtbox is an intention ledger for agents; it covers ai-agents, claude, and claude-code workflows. This skill supports Claude Code, Cursor, and Windsurf.

How do I install eval?

Run the command: npx killer-skills add Kastalien-Research/thoughtbox/eval. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for eval?

Key use cases include: running the evaluation harness on $ARGUMENTS, parsing the first word of $ARGUMENTS to determine the command, and showing current session metrics with the metrics command.

Which IDEs are compatible with eval?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for eval?

Limitations: the skill requires repository-specific context from the skill documentation, and it works best when the underlying tools and dependencies are already configured.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add Kastalien-Research/thoughtbox/eval. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use eval immediately in the current project.

! Reference-Only Mode

This page remains useful for installation and reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review above before relying on the upstream repository instructions.

Upstream Repository Material


Upstream Source

eval

Thoughtbox is an intention ledger for agents. It covers ai-agents, claude, and claude-code workflows. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

SKILL.md
Supporting Evidence

Evaluation harness: $ARGUMENTS

Commands

Parse the first word of $ARGUMENTS to determine the command:

metrics — Show current session metrics

Collect and display metrics for the current session:

  1. Count commits: git log --oneline --since="today" | wc -l
  2. Count test results: check for recent vitest output or .eval/metrics/ entries
  3. Token usage: check LangSmith state file if available
  4. Pattern usage: check .dgm/fitness.json for patterns used this session
  5. Session duration: check session start time from logs
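Step 1 above shells out to git and counts lines with `wc -l`. A minimal Python sketch of the same idea follows; the helper names are mine, not part of the skill, and the line counting is split out purely so it can be checked without a git repository:

```python
import subprocess

def count_lines(output: str) -> int:
    # Mirrors `wc -l` on git's --oneline output: one line per commit.
    return sum(1 for line in output.splitlines() if line.strip())

def commits_today() -> int:
    # Step 1: count today's commits. Assumes git is on PATH and the
    # working directory is inside a repository.
    result = subprocess.run(
        ["git", "log", "--oneline", "--since=today"],
        capture_output=True, text=True, check=True,
    )
    return count_lines(result.stdout)
```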

Display as:

## Current Session Metrics

| Metric | Value | Baseline | Delta |
|--------|-------|----------|-------|
| Commits | 5 | 3.2 avg | +56% |
| Tests passing | 42/42 | 40/42 | +2 |
| Files changed | 12 | 8.5 avg | +41% |
| Patterns used | 7 | 5.3 avg | +32% |
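Assuming the Delta column is percentage change against the numeric baseline (the `fmt_row` helper name is mine, not the skill's), a row of that table could be rendered as:

```python
def fmt_row(name: str, value: float, baseline: float) -> str:
    # Delta is the percentage change relative to the baseline average.
    delta = (value - baseline) / baseline * 100
    return f"| {name} | {value} | {baseline} avg | {delta:+.0f}% |"
```

For example, `fmt_row("Commits", 5, 3.2)` reproduces the "+56%" row above.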

baseline — Set or update baselines

  1. Read the last N session metric snapshots from .eval/metrics/
  2. Calculate averages for each metric
  3. Write to .eval/baselines.json
  4. Report what changed

compare — Compare sessions

Usage: compare --last N or compare --session <id>

  1. Load metric snapshots from .eval/metrics/
  2. Compare against baselines
  3. Highlight regressions (metric dropped >10% below baseline)
  4. Highlight improvements (metric improved >10% above baseline)
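The ±10% rule in steps 3–4 could be expressed like this (the function name and return labels are mine):

```python
def classify(value: float, baseline: float, threshold: float = 0.10) -> str:
    # >10% below baseline is a regression; >10% above is an improvement.
    if baseline == 0:
        return "no-baseline"  # avoid dividing by zero; flag for review
    change = (value - baseline) / baseline
    if change < -threshold:
        return "regression"
    if change > threshold:
        return "improvement"
    return "stable"
```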

report — Generate weekly evaluation report

  1. Load all metrics from the past 7 days
  2. Calculate trends (improving, stable, declining)
  3. Identify top improvements and top regressions
  4. Generate recommendations based on trends
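The spec does not pin down how a trend is computed; one simple reading, sketched below as an assumption, compares the mean of the later half of the week's series against the earlier half:

```python
def trend(values: list, threshold: float = 0.10) -> str:
    # Step 2 (one possible interpretation): label the direction of change
    # between the earlier and later halves of a metric series.
    if len(values) < 2:
        return "stable"
    mid = len(values) // 2
    earlier = sum(values[:mid]) / mid
    later = sum(values[mid:]) / (len(values) - mid)
    if earlier == 0:
        return "stable"
    change = (later - earlier) / earlier
    if change > threshold:
        return "improving"
    if change < -threshold:
        return "declining"
    return "stable"
```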

capture — Capture current session metrics

Write a metric snapshot to .eval/metrics/session-{timestamp}.json:

```json
{
  "session_id": "<session id>",
  "timestamp": "<ISO 8601>",
  "branch": "<git branch>",
  "metrics": {
    "commits": 0,
    "tests_total": 0,
    "tests_passing": 0,
    "files_changed": 0,
    "patterns_referenced": 0,
    "assumptions_verified": 0,
    "escalations": 0,
    "spiral_detections": 0
  },
  "qualitative": {
    "session_focus": "<what the session was about>",
    "memory_usefulness": 0,
    "knowledge_gaps_found": []
  }
}
```
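Writing the snapshot could look like the sketch below. The exact `{timestamp}` format is not specified upstream, so the compact filename-safe stamp here is an assumption, as is the function name:

```python
import json
import os
import time

def capture_snapshot(session_id: str, branch: str, metrics: dict,
                     qualitative: dict, ts: str = "") -> str:
    # Build the snapshot in the documented shape and write it to
    # .eval/metrics/session-{timestamp}.json, returning the path.
    ts = ts or time.strftime("%Y-%m-%dT%H%M%S")
    snapshot = {
        "session_id": session_id,
        "timestamp": ts,
        "branch": branch,
        "metrics": metrics,
        "qualitative": qualitative,
    }
    os.makedirs(".eval/metrics", exist_ok=True)
    path = f".eval/metrics/session-{ts}.json"
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return path
```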

Notes

  • If .eval/baselines.json doesn't exist, skip baseline comparisons and suggest running baseline
  • Metric collection should be best-effort — missing data is noted, not an error
  • Regressions trigger a structured escalation suggestion (not automatic action)
