What is promptfoo?

A skill for LLM analysis agents that need to test and compare model outputs. It is published in bendrucker/route-agent, the repository for an agent that builds cycling routes.

How do I install promptfoo?

Run the command: npx killer-skills add bendrucker/route-agent/promptfoo. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for promptfoo?

Key use cases include automating LLM output testing with config files, comparing outputs from different language models, and debugging prompts by using inline prompts with variable substitution.

Which IDEs are compatible with promptfoo?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for promptfoo?

Requires CLI setup. Limited to specific file formats (.yaml, .json, .js). Dependent on the presence of a 'promptfooconfig.yaml' file for auto-discovery.

promptfoo

Install promptfoo, an AI agent skill for testing and comparing LLM outputs. Review the use cases, limitations, and setup path before rollout.

Upstream Repository Material

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

Supporting Evidence

Promptfoo

Name: promptfoo
Author: bendrucker

Promptfoo is a CLI tool for testing and comparing LLM outputs.

Config File

The CLI auto-discovers promptfooconfig.yaml in the current directory. Use -c path for other locations.

Supported extensions: .yaml, .json, .js
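
Because .js is among the supported extensions, a config can also be written as a JavaScript module. The following is a minimal sketch that mirrors the fields from the Configuration section. It assumes promptfoo reads the object assigned to module.exports; the file name promptfooconfig.js is illustrative and is not taken from the upstream repository.

```javascript
// promptfooconfig.js (hypothetical file name; pass it with -c if it is not auto-discovered)
// Assumption: promptfoo loads the object assigned to module.exports.
const config = {
  description: 'What this eval tests',
  prompts: [
    'file://prompt.txt',
    'Inline prompt with {{variable}} substitution',
  ],
  providers: ['anthropic:messages:claude-sonnet-4-5-20250929'],
  tests: [
    {
      description: 'What this case tests',
      vars: { variable: 'value' },
      assert: [{ type: 'contains', value: 'expected substring' }],
    },
  ],
  outputPath: './results.json',
  evaluateOptions: { maxConcurrency: 4 },
};

module.exports = config;
```

A JavaScript config is mainly useful when tests or variables need to be computed rather than written out by hand; for static cases the YAML form is shorter.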

Configuration

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "What this eval tests"

prompts:
  - file://prompt.txt
  - |
    Inline prompt with {{variable}} substitution

providers:
  - anthropic:messages:claude-sonnet-4-5-20250929

defaultTest:
  options:
    provider:
      config:
        temperature: 0.0
        max_tokens: 4096

tests:
  - description: "What this case tests"
    vars:
      variable: "value"
      from_file: file://data/input.txt
    assert:
      - type: contains
        value: "expected substring"

# Or load tests from a file instead. This replaces the tests block above,
# because a YAML mapping cannot define the same key twice.
# tests: file://cases/all.yaml

outputPath: ./results.json

evaluateOptions:
  maxConcurrency: 4
```

Provider IDs

Model	ID
Opus 4.5	`anthropic:messages:claude-opus-4-5-20251101`
Sonnet 4.5	`anthropic:messages:claude-sonnet-4-5-20250929`
Haiku 4.5	`anthropic:messages:claude-haiku-4-5-20251001`

Provider config: temperature, max_tokens, top_p, top_k, tools, tool_choice

Prompts

file://path.txt — load from file (path relative to config)
Inline string with {{variable}} Nunjucks substitution
Chat format via JSON: [{"role": "system", "content": "..."}, {"role": "user", "content": "{{input}}"}]
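
To make the substitution step concrete, here is a rough sketch of how {{variable}} placeholders get filled from a test's vars, for both an inline string and a chat-format prompt. It is a simplified stand-in written for illustration, not Nunjucks and not promptfoo's code; renderTemplate, renderChat, and the sample message text are hypothetical.

```javascript
// Simplified stand-in for Nunjucks: handles only {{ name }} placeholders,
// not filters, loops, or conditionals. Unknown names are left untouched here,
// which may differ from what Nunjucks does.
function renderTemplate(template, vars) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match
  );
}

// Chat-format prompt: each message's content is rendered separately.
const chatPrompt = [
  { role: 'system', content: 'You are a route planner.' },
  { role: 'user', content: '{{input}}' },
];

function renderChat(messages, vars) {
  return messages.map((m) => ({ ...m, content: renderTemplate(m.content, vars) }));
}

console.log(renderTemplate('Inline prompt with {{variable}} substitution', { variable: 'value' }));
console.log(JSON.stringify(renderChat(chatPrompt, { input: 'Plan a 40 km loop' })));
```

Each entry under a test's vars maps to one placeholder name, so a missing or misspelled var is a common reason a prompt reaches the model with an unfilled placeholder.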

Assertion Types

Type	Use	Value
`contains`	Substring match	`"expected text"`
`icontains`	Case-insensitive substring	`"expected text"`
`equals`	Exact match	`"exact value"`
`regex`	Pattern match	`"\\d{4}-\\d{2}-\\d{2}"`
`is-json`	Valid JSON output	—
`contains-json`	Output contains JSON	—
`starts-with`	Prefix match	`"prefix"`
`cost`	Max cost	`threshold: 0.01`
`latency`	Max response time (ms)	`threshold: 5000`
`javascript`	Custom JS expression	`output.includes('x')`
`python`	Custom Python	`file://check.py:fn_name`
`llm-rubric`	LLM-as-judge	rubric text
`similar`	Semantic similarity	`value: "text"`, `threshold: 0.8`
`model-graded-factuality`	Fact checking	—

Prefix any assertion with not- to negate (e.g., not-contains).
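
The semantics of the simpler string assertions, and of the not- prefix, can be sketched in a few lines. This is an illustrative model of the behavior the table describes, not promptfoo's implementation; the checks map and runAssertion are hypothetical names, and only a handful of types are covered.

```javascript
// Illustrative model of a few deterministic assertion types.
const checks = {
  contains: (output, value) => output.includes(value),
  icontains: (output, value) => output.toLowerCase().includes(value.toLowerCase()),
  equals: (output, value) => output === value,
  regex: (output, value) => new RegExp(value).test(output),
  'starts-with': (output, value) => output.startsWith(value),
  'is-json': (output) => {
    try {
      JSON.parse(output);
      return true;
    } catch {
      return false;
    }
  },
};

// Strip a leading "not-" and invert the result of the underlying check.
function runAssertion({ type, value }, output) {
  const negate = type.startsWith('not-');
  const base = negate ? type.slice('not-'.length) : type;
  const check = checks[base];
  if (!check) {
    throw new Error(`Assertion type not covered by this sketch: ${type}`);
  }
  const pass = check(output, value);
  return negate ? !pass : pass;
}

console.log(runAssertion({ type: 'regex', value: '\\d{4}-\\d{2}-\\d{2}' }, 'Ride on 2025-06-01'));
console.log(runAssertion({ type: 'not-contains', value: 'error' }, 'All good'));
```

Model-graded and metric-based types (llm-rubric, similar, cost, latency) need a provider call or run metadata, so they are left out of this sketch.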

llm-rubric

Uses an LLM to grade output against a rubric:

```yaml
assert:
  - type: llm-rubric
    value: |
      The response should:
      - Mention at least 3 factors
      - Include specific examples
    threshold: 0.7
    provider: anthropic:messages:claude-sonnet-4-5-20250929
```

javascript

Inline expressions or functions. Access output (string) and context (with vars, prompt):

```yaml
assert:
  - type: javascript
    value: output.length > 100 && output.includes('route')
  - type: javascript
    value: |
      const data = JSON.parse(output);
      return data.calories >= 200 && data.calories <= 300;
```
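
When a check outgrows an inline expression, it may be easier to keep it as a named function that can be unit-tested on its own. A minimal sketch follows, assuming the function receives the same output and context arguments described above. The names checkCalories, min_calories, and max_calories, the file name, and the { pass, score, reason } return shape are assumptions made for illustration; verify them against the promptfoo docs before relying on them.

```javascript
// calories-check.js (hypothetical file name)
// Assumed signature: (output: string, context: { vars, prompt }).
function checkCalories(output, context) {
  let data;
  try {
    data = JSON.parse(output);
  } catch (err) {
    // Report malformed output as a failed check instead of throwing.
    return { pass: false, score: 0, reason: `Output is not valid JSON: ${err.message}` };
  }

  // Bounds default to the 200-300 range used in the inline example;
  // min_calories and max_calories are hypothetical vars a test could set.
  const vars = (context && context.vars) || {};
  const min = vars.min_calories ?? 200;
  const max = vars.max_calories ?? 300;

  const pass =
    typeof data.calories === 'number' && data.calories >= min && data.calories <= max;
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass
      ? 'calories within range'
      : `calories ${data.calories} outside ${min}-${max}`,
  };
}

module.exports = checkCalories;
```

Unlike the inline version, this one turns a JSON parse failure into a failed assertion with a readable reason rather than an exception.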

Test Organization

Split cases into separate files and reference them:

```yaml
tests:
  - file://cases/basic.yaml
  - file://cases/edge-cases.yaml
```

Each case file contains a YAML array of test objects.

CLI

```bash
npx promptfoo eval                         # Run with auto-discovered config
npx promptfoo eval -c path/to/config.yaml  # Specific config
npx promptfoo eval --filter-metadata key=v # Filter tests
npx promptfoo view                         # Web UI for results
npx promptfoo cache clear                  # Clear result cache
```

References

Consult the configuration reference and Anthropic provider docs for full details.
