gen-eval — for Claude Code

Tags: gen-eval, agentic-coding-tools, community, ide skills, template-only, cli-augmented

v1.0.0

About This Skill

Suitable use: ideal for AI agents that need a generator-evaluator testing workflow driven by optional flags. Summary: a collection of tools to simplify agentic coding. Gen-Eval runs the generator-evaluator testing framework against live or local services. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

Features

$ARGUMENTS — optional flags:

--descriptor <path> — Path to interface descriptor YAML (auto-detected if omitted)
--mode <mode> (default: template-only) — template-only, cli-augmented, or sdk-only
--cli-command <cmd> (default: claude) — CLI tool for cli-augmented mode
--time-budget <minutes> (default: 60) — Time budget for CLI mode

Core Topics

Author: jankneumann
Updated: 4/22/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 10/11

This page remains useful for teams, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and caution
  • Quality floor passed for review

Review Score: 10/11
Quality Score: 69
Canonical Locale: en
Detected Body Locale: en


Why Use This Skill

Recommendation: gen-eval helps agents run the generator-evaluator testing framework against live or local services, generating test scenarios from interface descriptors and evaluating the results.

Best Suited For

Suitable use: ideal for AI agents that need automated, descriptor-driven testing of live or local services.

Actionable Use Cases for gen-eval

Use case: running the full scenario suite in the default template-only mode
Use case: targeting a specific interface descriptor with --descriptor <path> (auto-detected if omitted)
Use case: switching between template-only, cli-augmented, and sdk-only generation with --mode <mode>

! Safety & Limitations

  • Limitation: cli-augmented and sdk-only modes depend on an external CLI tool or an SDK budget; the default is template-only
  • Limitation: Requires repository-specific context from the skill documentation

Why this page is reference-only

  • The current locale does not satisfy the locale-governance contract.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

After The Review

Decide The Next Action Before You Keep Reading Repository Material

Killer-Skills should not stop at opening repository instructions. It should help you decide whether to install this skill, when to cross-check against trusted collections, and when to move into workflow rollout.

Labs Demo

Browser Sandbox Environment

⚡️ Ready to unleash?

Experience this Agent in a zero-setup browser environment powered by WebContainers. No installation required.

Boot Container Sandbox

FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

? Frequently Asked Questions

What is gen-eval?

gen-eval runs the generator-evaluator testing framework against live or local services, generating test scenarios from interface descriptors and evaluating results against expected behavior. It supports Claude Code, Cursor, and Windsurf workflows.

How do I install gen-eval?

Run the command: npx killer-skills add jankneumann/agentic-coding-tools/gen-eval. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for gen-eval?

Key use cases include: running the full scenario suite in the default template-only mode, targeting a specific interface descriptor with --descriptor <path>, and switching between template-only, cli-augmented, and sdk-only generation with --mode <mode>.

Which IDEs are compatible with gen-eval?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for gen-eval?

Limitations: cli-augmented and sdk-only modes depend on an external CLI tool or an SDK budget (the default is template-only), and the skill requires repository-specific context from its documentation.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add jankneumann/agentic-coding-tools/gen-eval. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use gen-eval immediately in the current project.

! Reference-Only Mode

This page remains useful for installation and reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review above before relying on the upstream repository instructions.


Upstream Source

gen-eval

A collection of tools to simplify agentic coding. Gen-Eval runs the generator-evaluator testing framework against live or local services.

SKILL.md
Supporting Evidence

Gen-Eval

Run the generator-evaluator testing framework against live or local services. Generates test scenarios from interface descriptors, executes them, and evaluates results against expected behavior.

Arguments

$ARGUMENTS - Optional flags:

  • --descriptor <path> — Path to interface descriptor YAML (auto-detected if omitted)
  • --mode <mode> (default: template-only) — template-only, cli-augmented, or sdk-only
  • --cli-command <cmd> (default: claude) — CLI tool for cli-augmented mode
  • --time-budget <minutes> (default: 60) — Time budget for CLI mode
  • --sdk-budget <usd> — USD budget cap for SDK mode
  • --max-iterations <n> (default: 1) — Feedback loop iterations
  • --parallel <n> (default: 5) — Concurrent scenario execution
  • --changed-features-ref <git-ref> — Git ref for change detection
  • --categories <cat1> [cat2 ...] — Filter to specific categories
  • --report-format <format> (default: both) — markdown, json, or both
  • --output-dir <path> (default: .) — Report output directory
  • --no-services — Skip service startup/teardown
  • --verbose — Enable verbose output

Steps

1. Auto-Detect Descriptor

If --descriptor is not provided, find the nearest descriptor YAML:

```bash
# Find the nearest gen-eval descriptor YAML
DESCRIPTOR=$(find . -path "*/evaluation/gen_eval/descriptors/*.yaml" -type f 2>/dev/null | head -1)

if [ -z "$DESCRIPTOR" ]; then
  echo "ERROR: No gen-eval descriptor found. Provide --descriptor <path> or create one with /gen-eval-scenario."
  exit 1
fi
echo "Auto-detected descriptor: $DESCRIPTOR"
```

2. Detect Project Root and Activate Venv

```bash
# Find the project root (directory containing the descriptor's evaluation/ parent)
PROJECT_ROOT=$(dirname "$(dirname "$(dirname "$(dirname "$DESCRIPTOR")")")")
echo "Project root: $PROJECT_ROOT"

# Activate the project venv
if [ -f "$PROJECT_ROOT/.venv/bin/python" ]; then
  PYTHON="$PROJECT_ROOT/.venv/bin/python"
else
  PYTHON="python3"
fi
```

3. Parse Mode and Build Command

Parse $ARGUMENTS for mode and flags. Build the CLI command:

```bash
# Defaults
MODE="${MODE:-template-only}"
PARALLEL="${PARALLEL:-5}"
MAX_ITER="${MAX_ITER:-1}"
REPORT_FORMAT="${REPORT_FORMAT:-both}"
OUTPUT_DIR="${OUTPUT_DIR:-.}"

CMD="$PYTHON -m evaluation.gen_eval --descriptor $DESCRIPTOR --mode $MODE --parallel $PARALLEL --max-iterations $MAX_ITER --report-format $REPORT_FORMAT --output-dir $OUTPUT_DIR"

# Append optional flags from arguments
if [ -n "$TIME_BUDGET" ]; then CMD="$CMD --time-budget $TIME_BUDGET"; fi
if [ -n "$SDK_BUDGET" ]; then CMD="$CMD --sdk-budget $SDK_BUDGET"; fi
if [ -n "$CLI_COMMAND" ]; then CMD="$CMD --cli-command $CLI_COMMAND"; fi
if [ -n "$CHANGED_REF" ]; then CMD="$CMD --changed-features-ref $CHANGED_REF"; fi
if [ -n "$CATEGORIES" ]; then CMD="$CMD --categories $CATEGORIES"; fi
if [ "$NO_SERVICES" = "true" ]; then CMD="$CMD --no-services"; fi
if [ "$VERBOSE" = "true" ]; then CMD="$CMD --verbose"; fi
```
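The step says to parse $ARGUMENTS for mode and flags, but the parsing itself is left implicit. One way it could be done is sketched below; the variable names match the command-builder above, while the `while`/`case` parser itself is an assumption, not part of the upstream skill:

```shell
# Hypothetical flag parser: maps the documented flags onto the
# variables consumed by the command-builder. Illustrative only.
parse_gen_eval_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --descriptor)     DESCRIPTOR="$2"; shift 2 ;;
      --mode)           MODE="$2"; shift 2 ;;
      --cli-command)    CLI_COMMAND="$2"; shift 2 ;;
      --time-budget)    TIME_BUDGET="$2"; shift 2 ;;
      --sdk-budget)     SDK_BUDGET="$2"; shift 2 ;;
      --max-iterations) MAX_ITER="$2"; shift 2 ;;
      --parallel)       PARALLEL="$2"; shift 2 ;;
      --no-services)    NO_SERVICES="true"; shift ;;
      --verbose)        VERBOSE="true"; shift ;;
      *)                echo "Unknown flag: $1" >&2; shift ;;
    esac
  done
}

parse_gen_eval_args --mode cli-augmented --time-budget 30 --verbose
echo "MODE=$MODE TIME_BUDGET=$TIME_BUDGET VERBOSE=$VERBOSE"
# → MODE=cli-augmented TIME_BUDGET=30 VERBOSE=true
```

Flags not listed here (e.g. --categories, --report-format) would follow the same pattern.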

4. Run Gen-Eval

Execute from the project root:

```bash
cd "$PROJECT_ROOT"
echo "Running: $CMD"
$CMD
EXIT_CODE=$?
```

5. Report Results

After execution, display a summary:

  • If reports were generated, read and summarize the markdown report
  • Show pass rate, coverage %, and any failures
  • If EXIT_CODE != 0, highlight failing scenarios and suggest /gen-eval-scenario for authoring targeted scenarios
```bash
if [ -f "$OUTPUT_DIR/gen-eval-report.md" ]; then
  echo ""
  echo "=== Gen-Eval Report ==="
  cat "$OUTPUT_DIR/gen-eval-report.md"
fi
```

Quick Start

The simplest invocation — auto-detects the descriptor and runs template-only:

```bash
/gen-eval
```

With CLI-augmented generation (subscription-covered):

```bash
/gen-eval --mode cli-augmented --time-budget 30
```

Against specific categories:

```bash
/gen-eval --categories lock-lifecycle auth-boundary
```

Integration Points

  • /validate-feature: Gen-eval runs as phase 4b (between smoke and e2e). Auto-detected when descriptors exist.
  • /explore-feature: Gen-eval report signals (failing interfaces, coverage gaps) feed into feature opportunity ranking.
  • /gen-eval-scenario: Create new scenario YAML files interactively.
  • make gen-eval: Makefile shorthand for the most common invocation.
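The `make gen-eval` shorthand is only named above, not shown. Assuming it wraps the most common invocation (auto-detect the descriptor, run the default template-only mode), a hypothetical shell equivalent might look like this; the function name and behavior are guesses, not the repository's actual Makefile target:

```shell
# Hypothetical shell equivalent of the `make gen-eval` shorthand:
# locate the nearest descriptor and run template-only mode.
gen_eval_shorthand() {
  local descriptor
  descriptor=$(find . -path "*/evaluation/gen_eval/descriptors/*.yaml" -type f 2>/dev/null | head -1)
  if [ -z "$descriptor" ]; then
    echo "ERROR: no gen-eval descriptor found" >&2
    return 1
  fi
  python3 -m evaluation.gen_eval --descriptor "$descriptor" --mode template-only
}
```

When no descriptor exists in the tree, the function fails early instead of invoking the framework, mirroring the auto-detect step.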

Output

  • gen-eval-report.md — Markdown report with pass/fail summary
  • gen-eval-report.json — Machine-readable results
  • gen-eval-metrics.json — Per-scenario metrics for pipeline integration
  • Exit code 0 if pass rate meets threshold (default 95%), 1 otherwise
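Since the exit code is the only machine-facing contract listed above, a CI step can gate on it without parsing the reports. A minimal sketch, where the gate function is illustrative (the exit code is passed in for demonstration rather than captured from a real run):

```shell
# Illustrative CI gate built only on gen-eval's documented exit-code
# contract: 0 when the pass rate meets the threshold (default 95%),
# non-zero otherwise.
gate_gen_eval() {
  if [ "$1" -eq 0 ]; then
    echo "gen-eval: PASS (pass rate met threshold)"
  else
    echo "gen-eval: FAIL (inspect gen-eval-report.md for failing scenarios)"
    return 1
  fi
}

gate_gen_eval 0   # prints the PASS line
gate_gen_eval 1 || echo "pipeline would stop here"
```

In a real pipeline, `$EXIT_CODE` from step 4 would be passed in place of the literal values.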

Related Skills

Looking for an alternative to gen-eval or another community skill for your workflow? Explore these related open-source skills.

View all

openclaw-release-maintainer

Logo of openclaw
openclaw

🦞 OpenClaw Release Maintainer: use this skill for release and publish-time workflows. It covers ai, assistant, and crustacean topics. Supports Claude Code, Cursor, and Windsurf workflows.

333.8k
0
Artificial Intelligence

widget-generator

Logo of f
f

Widget Generator: generates customizable widget plugins for the prompts.chat feed system. It covers ai, artificial-intelligence, and awesome-list topics. Supports Claude Code.

149.6k
0
Artificial Intelligence

flags

Logo of vercel
vercel

Feature Flags (Next.js, the React framework): use this skill when adding or changing framework feature flags in Next.js internals. It covers blog, browser, and compiler topics. Supports Claude Code, Cursor, and Windsurf workflows.

138.4k
0
Browser

pr-review

Logo of pytorch
pytorch

pr-review (PyTorch): if the user invokes /pr-review with no arguments, it does not perform a review. It covers autograd, deep-learning, and gpu topics. Supports Claude Code, Cursor, and Windsurf workflows.

98.6k
0
Developer