gen-eval — for Claude Code

Tags: gen-eval, agentic-coding-tools, community, ide skills, template-only, cli-augmented

v1.0.0

About This Skill

Suitable use: ideal for AI agents that need a generator-evaluator testing workflow driven by optional flags. Summary: a collection of tools to simplify agentic coding. Gen-Eval runs the generator-evaluator testing framework against live or local services. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

Features

$ARGUMENTS — optional flags:

--descriptor <path> — Path to interface descriptor YAML (auto-detected if omitted)
--mode <mode> (default: template-only) — template-only, cli-augmented, or sdk-only
--cli-command <cmd> (default: claude) — CLI tool for cli-augmented mode
--time-budget <minutes> (default: 60) — Time budget for CLI mode

Core Topics

Author: jankneumann
Updated: 4/22/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 10/11

This page remains useful for teams, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and caution
  • Quality floor passed for review

Review Score: 10/11
Quality Score: 69
Canonical Locale: en
Detected Body Locale: en


Why Use This Skill

Recommendation: gen-eval helps agents run the generator-evaluator testing framework against live or local services, generating test scenarios from interface descriptors and evaluating the results.

Best Suited For

Suitable use: ideal for AI agents that need automated, descriptor-driven testing of live or local services.

Actionable Use Cases for gen-eval

Use case: running the full scenario suite in the default template-only mode
Use case: targeting a specific interface descriptor with --descriptor <path> (auto-detected if omitted)
Use case: switching between template-only, cli-augmented, and sdk-only generation with --mode <mode>

! Safety & Limitations

  • Limitation: cli-augmented and sdk-only modes depend on an external CLI tool or an SDK budget; the default is template-only
  • Limitation: Requires repository-specific context from the skill documentation

Why this page is reference-only

  • The current locale does not satisfy the locale-governance contract.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

After The Review

Decide The Next Action Before You Keep Reading Repository Material

Killer-Skills should not stop at opening repository instructions. It should help you decide whether to install this skill, when to cross-check against trusted collections, and when to move into workflow rollout.

Labs Demo

Browser Sandbox Environment

⚡️ Ready to unleash?

Experience this Agent in a zero-setup browser environment powered by WebContainers. No installation required.

Boot Container Sandbox

FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

? Frequently Asked Questions

What is gen-eval?

gen-eval runs the generator-evaluator testing framework against live or local services, generating test scenarios from interface descriptors and evaluating results against expected behavior. It supports Claude Code, Cursor, and Windsurf workflows.

How do I install gen-eval?

Run the command: npx killer-skills add jankneumann/agentic-coding-tools/gen-eval. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for gen-eval?

Key use cases include: running the full scenario suite in the default template-only mode, targeting a specific interface descriptor with --descriptor <path>, and switching between template-only, cli-augmented, and sdk-only generation with --mode <mode>.

Which IDEs are compatible with gen-eval?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for gen-eval?

Limitations: cli-augmented and sdk-only modes depend on an external CLI tool or an SDK budget (the default is template-only), and the skill requires repository-specific context from its documentation.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add jankneumann/agentic-coding-tools/gen-eval. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use gen-eval immediately in the current project.

! Reference-Only Mode

This page remains useful for installation and reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review above before relying on the upstream repository instructions.


Upstream Source

gen-eval

A collection of tools to simplify agentic coding. Gen-Eval runs the generator-evaluator testing framework against live or local services.

SKILL.md
Supporting Evidence

Gen-Eval

Run the generator-evaluator testing framework against live or local services. Generates test scenarios from interface descriptors, executes them, and evaluates results against expected behavior.

Arguments

$ARGUMENTS - Optional flags:

  • --descriptor <path> — Path to interface descriptor YAML (auto-detected if omitted)
  • --mode <mode> (default: template-only) — template-only, cli-augmented, or sdk-only
  • --cli-command <cmd> (default: claude) — CLI tool for cli-augmented mode
  • --time-budget <minutes> (default: 60) — Time budget for CLI mode
  • --sdk-budget <usd> — USD budget cap for SDK mode
  • --max-iterations <n> (default: 1) — Feedback loop iterations
  • --parallel <n> (default: 5) — Concurrent scenario execution
  • --changed-features-ref <git-ref> — Git ref for change detection
  • --categories <cat1> [cat2 ...] — Filter to specific categories
  • --report-format <format> (default: both) — markdown, json, or both
  • --output-dir <path> (default: .) — Report output directory
  • --no-services — Skip service startup/teardown
  • --verbose — Enable verbose output

Steps

1. Auto-Detect Descriptor

If --descriptor is not provided, find the nearest descriptor YAML:

```bash
# Find the nearest gen-eval descriptor YAML
DESCRIPTOR=$(find . -path "*/evaluation/gen_eval/descriptors/*.yaml" -type f 2>/dev/null | head -1)

if [ -z "$DESCRIPTOR" ]; then
  echo "ERROR: No gen-eval descriptor found. Provide --descriptor <path> or create one with /gen-eval-scenario."
  exit 1
fi
echo "Auto-detected descriptor: $DESCRIPTOR"
```

2. Detect Project Root and Activate Venv

```bash
# Find the project root (directory containing the descriptor's evaluation/ parent)
PROJECT_ROOT=$(dirname "$(dirname "$(dirname "$(dirname "$DESCRIPTOR")")")")
echo "Project root: $PROJECT_ROOT"

# Activate the project venv
if [ -f "$PROJECT_ROOT/.venv/bin/python" ]; then
  PYTHON="$PROJECT_ROOT/.venv/bin/python"
else
  PYTHON="python3"
fi
```

3. Parse Mode and Build Command

Parse $ARGUMENTS for mode and flags. Build the CLI command:

```bash
# Defaults
MODE="${MODE:-template-only}"
PARALLEL="${PARALLEL:-5}"
MAX_ITER="${MAX_ITER:-1}"
REPORT_FORMAT="${REPORT_FORMAT:-both}"
OUTPUT_DIR="${OUTPUT_DIR:-.}"

CMD="$PYTHON -m evaluation.gen_eval --descriptor $DESCRIPTOR --mode $MODE --parallel $PARALLEL --max-iterations $MAX_ITER --report-format $REPORT_FORMAT --output-dir $OUTPUT_DIR"

# Append optional flags from arguments
if [ -n "$TIME_BUDGET" ]; then CMD="$CMD --time-budget $TIME_BUDGET"; fi
if [ -n "$SDK_BUDGET" ]; then CMD="$CMD --sdk-budget $SDK_BUDGET"; fi
if [ -n "$CLI_COMMAND" ]; then CMD="$CMD --cli-command $CLI_COMMAND"; fi
if [ -n "$CHANGED_REF" ]; then CMD="$CMD --changed-features-ref $CHANGED_REF"; fi
if [ -n "$CATEGORIES" ]; then CMD="$CMD --categories $CATEGORIES"; fi
if [ "$NO_SERVICES" = "true" ]; then CMD="$CMD --no-services"; fi
if [ "$VERBOSE" = "true" ]; then CMD="$CMD --verbose"; fi
```
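The step says to parse $ARGUMENTS for mode and flags, but the parsing itself is left implicit. One way it could be done is sketched below; the variable names match the command-builder above, while the `while`/`case` parser itself is an assumption, not part of the upstream skill:

```shell
# Hypothetical flag parser: maps the documented flags onto the
# variables consumed by the command-builder. Illustrative only.
parse_gen_eval_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --descriptor)     DESCRIPTOR="$2"; shift 2 ;;
      --mode)           MODE="$2"; shift 2 ;;
      --cli-command)    CLI_COMMAND="$2"; shift 2 ;;
      --time-budget)    TIME_BUDGET="$2"; shift 2 ;;
      --sdk-budget)     SDK_BUDGET="$2"; shift 2 ;;
      --max-iterations) MAX_ITER="$2"; shift 2 ;;
      --parallel)       PARALLEL="$2"; shift 2 ;;
      --no-services)    NO_SERVICES="true"; shift ;;
      --verbose)        VERBOSE="true"; shift ;;
      *)                echo "Unknown flag: $1" >&2; shift ;;
    esac
  done
}

parse_gen_eval_args --mode cli-augmented --time-budget 30 --verbose
echo "MODE=$MODE TIME_BUDGET=$TIME_BUDGET VERBOSE=$VERBOSE"
# → MODE=cli-augmented TIME_BUDGET=30 VERBOSE=true
```

Flags not listed here (e.g. --categories, --report-format) would follow the same pattern.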

4. Run Gen-Eval

Execute from the project root:

```bash
cd "$PROJECT_ROOT"
echo "Running: $CMD"
$CMD
EXIT_CODE=$?
```

5. Report Results

After execution, display a summary:

  • If reports were generated, read and summarize the markdown report
  • Show pass rate, coverage %, and any failures
  • If EXIT_CODE != 0, highlight failing scenarios and suggest /gen-eval-scenario for authoring targeted scenarios
```bash
if [ -f "$OUTPUT_DIR/gen-eval-report.md" ]; then
  echo ""
  echo "=== Gen-Eval Report ==="
  cat "$OUTPUT_DIR/gen-eval-report.md"
fi
```

Quick Start

The simplest invocation — auto-detects the descriptor and runs template-only:

```bash
/gen-eval
```

With CLI-augmented generation (subscription-covered):

```bash
/gen-eval --mode cli-augmented --time-budget 30
```

Against specific categories:

```bash
/gen-eval --categories lock-lifecycle auth-boundary
```

Integration Points

  • /validate-feature: Gen-eval runs as phase 4b (between smoke and e2e). Auto-detected when descriptors exist.
  • /explore-feature: Gen-eval report signals (failing interfaces, coverage gaps) feed into feature opportunity ranking.
  • /gen-eval-scenario: Create new scenario YAML files interactively.
  • make gen-eval: Makefile shorthand for the most common invocation.
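The `make gen-eval` shorthand is only named above, not shown. Assuming it wraps the most common invocation (auto-detect the descriptor, run the default template-only mode), a hypothetical shell equivalent might look like this; the function name and behavior are guesses, not the repository's actual Makefile target:

```shell
# Hypothetical shell equivalent of the `make gen-eval` shorthand:
# locate the nearest descriptor and run template-only mode.
gen_eval_shorthand() {
  local descriptor
  descriptor=$(find . -path "*/evaluation/gen_eval/descriptors/*.yaml" -type f 2>/dev/null | head -1)
  if [ -z "$descriptor" ]; then
    echo "ERROR: no gen-eval descriptor found" >&2
    return 1
  fi
  python3 -m evaluation.gen_eval --descriptor "$descriptor" --mode template-only
}
```

When no descriptor exists in the tree, the function fails early instead of invoking the framework, mirroring the auto-detect step.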

Output

  • gen-eval-report.md — Markdown report with pass/fail summary
  • gen-eval-report.json — Machine-readable results
  • gen-eval-metrics.json — Per-scenario metrics for pipeline integration
  • Exit code 0 if pass rate meets threshold (default 95%), 1 otherwise
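Since the exit code is the only machine-facing contract listed above, a CI step can gate on it without parsing the reports. A minimal sketch, where the gate function is illustrative (the exit code is passed in for demonstration rather than captured from a real run):

```shell
# Illustrative CI gate built only on gen-eval's documented exit-code
# contract: 0 when the pass rate meets the threshold (default 95%),
# non-zero otherwise.
gate_gen_eval() {
  if [ "$1" -eq 0 ]; then
    echo "gen-eval: PASS (pass rate met threshold)"
  else
    echo "gen-eval: FAIL (inspect gen-eval-report.md for failing scenarios)"
    return 1
  fi
}

gate_gen_eval 0   # prints the PASS line
gate_gen_eval 1 || echo "pipeline would stop here"
```

In a real pipeline, `$EXIT_CODE` from step 4 would be passed in place of the literal values.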

Related Skills

Looking for an alternative to gen-eval or another community skill for your workflow? Explore these related open-source skills.

View all

openclaw-release-maintainer

Logo of openclaw
openclaw

🦞 OpenClaw Release Maintainer: use this skill for release and publish-time workflows. It covers ai, assistant, and crustacean topics. Supports Claude Code, Cursor, and Windsurf workflows.

333.8k
0
Artificial Intelligence

widget-generator

Logo of f
f

Widget Generator: generates customizable widget plugins for the prompts.chat feed system. It covers ai, artificial-intelligence, and awesome-list topics. Supports Claude Code.

149.6k
0
Artificial Intelligence

flags

Logo of vercel
vercel

Feature Flags (Next.js, the React framework): use this skill when adding or changing framework feature flags in Next.js internals. It covers blog, browser, and compiler topics. Supports Claude Code, Cursor, and Windsurf workflows.

138.4k
0
Browser

pr-review

Logo of pytorch
pytorch

pr-review (PyTorch): if the user invokes /pr-review with no arguments, it does not perform a review. It covers autograd, deep-learning, and gpu topics. Supports Claude Code, Cursor, and Windsurf workflows.

98.6k
0
Developer