promptfoo

v1.0.0

About this Skill

Suited to LLM agent developers who need to test and compare model outputs. promptfoo is a command-line tool for testing and comparing large language model outputs.

Features

Auto-discovers promptfooconfig.yaml in the current directory
Supports config file extensions: .yaml, .json, .js
Allows inline prompt with variable substitution
Uses -c path for custom config file locations
Supports file://prompt.txt for external prompt files

bendrucker
Updated: 3/6/2026

Installation
> npx killer-skills add bendrucker/route-agent/promptfoo

Agent Capability Analysis

The promptfoo MCP Server by bendrucker is an open-source community integration for Claude and other AI agents that lets them test and compare LLM outputs from the command line.

Core Value

Lets agents test and compare LLM outputs using YAML, JSON, or JS config files, with promptfooconfig.yaml auto-discovered in the current directory.

Capabilities Granted for promptfoo MCP Server

Automating LLM output comparisons
Generating config files for custom evaluations
Debugging model performance with inline prompt substitution

Prerequisites & Limits

  • Requires CLI setup
  • Limited to specific file formats (.yaml, .json, .js)

Promptfoo

Promptfoo is a CLI tool for testing and comparing LLM outputs.

Config File

The CLI auto-discovers promptfooconfig.yaml in the current directory. Use -c path for other locations.

Supported extensions: .yaml, .json, .js

Configuration

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "What this eval tests"

prompts:
  - file://prompt.txt
  - |
    Inline prompt with {{variable}} substitution

providers:
  - anthropic:messages:claude-sonnet-4-5-20250929

defaultTest:
  options:
    provider:
      config:
        temperature: 0.0
        max_tokens: 4096

tests:
  - description: "What this case tests"
    vars:
      variable: "value"
      from_file: file://data/input.txt
    assert:
      - type: contains
        value: "expected substring"

# Or load tests from files (use this instead of the inline list above;
# a YAML document cannot define the tests key twice)
# tests: file://cases/all.yaml

outputPath: ./results.json

evaluateOptions:
  maxConcurrency: 4
```

Provider IDs

| Model | ID |
| --- | --- |
| Opus 4.5 | anthropic:messages:claude-opus-4-5-20251101 |
| Sonnet 4.5 | anthropic:messages:claude-sonnet-4-5-20250929 |
| Haiku 4.5 | anthropic:messages:claude-haiku-4-5-20251001 |

Provider config: temperature, max_tokens, top_p, top_k, tools, tool_choice

Prompts

  • file://path.txt — load from file (path relative to config)
  • Inline string with {{variable}} Nunjucks substitution
  • Chat format via JSON: [{"role": "system", "content": "..."}, {"role": "user", "content": "{{input}}"}]
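To make the substitution step concrete, here is a rough sketch of how a test case's vars fill a chat-format prompt. It is a simplified stand-in, not Nunjucks itself: the hypothetical `renderPrompt` helper handles only plain `{{name}}` placeholders, whereas the real template engine also supports filters, conditionals, and loops.

```javascript
// Simplified stand-in for Nunjucks: replaces only plain {{name}} placeholders.
// Unknown placeholders are left untouched so that they are easy to spot.
function renderPrompt(template, vars) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match
  );
}

// Chat-format prompt from the list above, serialized as JSON text.
const chatPrompt = JSON.stringify([
  { role: "system", content: "..." },
  { role: "user", content: "{{input}}" },
]);

// Render with one test case's vars, then parse back into messages.
const rendered = JSON.parse(renderPrompt(chatPrompt, { input: "Plan a route" }));
console.log(rendered[1].content); // prints "Plan a route"
```

Note that this sketch inserts values verbatim, so a value containing double quotes would break the surrounding JSON; that is a limitation of the sketch, not a statement about promptfoo's behavior.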

Assertion Types

| Type | Use | Value |
| --- | --- | --- |
| contains | Substring match | "expected text" |
| icontains | Case-insensitive substring | "expected text" |
| equals | Exact match | "exact value" |
| regex | Pattern match | "\\d{4}-\\d{2}-\\d{2}" |
| is-json | Valid JSON output | |
| contains-json | Output contains JSON | |
| starts-with | Prefix match | "prefix" |
| cost | Max cost | threshold: 0.01 |
| latency | Max response time (ms) | threshold: 5000 |
| javascript | Custom JS expression | output.includes('x') |
| python | Custom Python | file://check.py:fn_name |
| llm-rubric | LLM-as-judge | rubric text |
| similar | Semantic similarity | value: "text", threshold: 0.8 |
| model-graded-factuality | Fact checking | |

Prefix any assertion with not- to negate (e.g., not-contains).
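The deterministic types and the `not-` prefix can be modeled in a few lines. This is an illustrative sketch of the semantics described in the table, not promptfoo's implementation: the `checks` map and `evaluate` function are hypothetical names, and only a handful of string-based types are covered.

```javascript
// Illustrative model of a few deterministic assertion types; not promptfoo's code.
const checks = {
  contains: (output, value) => output.includes(value),
  icontains: (output, value) => output.toLowerCase().includes(value.toLowerCase()),
  equals: (output, value) => output === value,
  regex: (output, value) => new RegExp(value).test(output),
  "starts-with": (output, value) => output.startsWith(value),
  "is-json": (output) => {
    try {
      JSON.parse(output);
      return true;
    } catch {
      return false;
    }
  },
};

// Evaluate one assertion object, honoring the not- prefix.
function evaluate(assertion, output) {
  const negated = assertion.type.startsWith("not-");
  const type = negated ? assertion.type.slice(4) : assertion.type;
  const check = checks[type];
  if (!check) throw new Error(`Unsupported assertion type in this sketch: ${type}`);
  const passed = check(output, assertion.value);
  return negated ? !passed : passed;
}

const output = "Departure: 2026-03-06";
console.log(evaluate({ type: "regex", value: "\\d{4}-\\d{2}-\\d{2}" }, output)); // true
console.log(evaluate({ type: "not-contains", value: "error" }, output)); // true
```

Treating negation as a wrapper around the base check keeps each type's logic in one place, which is why the sketch strips the prefix before the lookup.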

llm-rubric

Uses an LLM to grade output against a rubric:

```yaml
assert:
  - type: llm-rubric
    value: |
      The response should:
      - Mention at least 3 factors
      - Include specific examples
    threshold: 0.7
    provider: anthropic:messages:claude-sonnet-4-5-20250929
```

javascript

Inline expressions or functions. Access output (string) and context (with vars, prompt):

```yaml
assert:
  - type: javascript
    value: output.length > 100 && output.includes('route')
  - type: javascript
    value: |
      const data = JSON.parse(output);
      return data.calories >= 200 && data.calories <= 300;
```
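Multi-line assertion bodies are easier to debug when the logic can be run on its own first. The sketch below restates the calories check from the example above as a standalone function; `caloriesInRange` is a hypothetical name, and the `try`/`catch` around `JSON.parse` is an addition so that malformed output fails the check rather than throwing.

```javascript
// Hypothetical helper mirroring the multi-line assertion: parse the output as
// JSON and check that `calories` falls within [200, 300].
function caloriesInRange(output, context) {
  let data;
  try {
    data = JSON.parse(output);
  } catch {
    return false; // non-JSON output fails the check instead of throwing
  }
  return (
    data !== null &&
    typeof data === "object" &&
    data.calories >= 200 &&
    data.calories <= 300
  );
}

console.log(caloriesInRange('{"calories": 250}', { vars: {} })); // true
console.log(caloriesInRange("not json", { vars: {} })); // false
```

The `context` parameter is unused here; it is kept only to match the `output` and `context` pair that the assertion body has access to.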

Test Organization

Split cases into separate files and reference them:

```yaml
tests:
  - file://cases/basic.yaml
  - file://cases/edge-cases.yaml
```

Each case file contains a YAML array of test objects.
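As a rough illustration of that shape, the sketch below shows the JavaScript equivalent of a parsed case file plus a small sanity check. The case contents and the `findCaseProblems` helper are hypothetical, and the check covers only the array-of-objects structure, not promptfoo's full schema.

```javascript
// JavaScript equivalent of what a case file such as cases/basic.yaml would hold
// once parsed: a top-level array where each element is one test object.
const basicCases = [
  {
    description: "What this case tests",
    vars: { variable: "value" },
    assert: [{ type: "contains", value: "expected substring" }],
  },
  {
    description: "Negated check",
    vars: { variable: "other value" },
    assert: [{ type: "not-contains", value: "error" }],
  },
];

// Hypothetical sanity check: returns a list of problems found in a parsed case file.
function findCaseProblems(cases) {
  if (!Array.isArray(cases)) return ["case file must contain a top-level array"];
  const problems = [];
  cases.forEach((testCase, index) => {
    if (testCase === null || typeof testCase !== "object" || Array.isArray(testCase)) {
      problems.push(`case ${index}: expected an object`);
    } else if (testCase.assert !== undefined && !Array.isArray(testCase.assert)) {
      problems.push(`case ${index}: assert must be a list`);
    }
  });
  return problems;
}

console.log(findCaseProblems(basicCases)); // []
```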

CLI

```bash
npx promptfoo eval                            # Run with auto-discovered config
npx promptfoo eval -c path/to/config.yaml     # Specific config
npx promptfoo eval --filter-metadata key=v    # Filter tests
npx promptfoo view                            # Web UI for results
npx promptfoo cache clear                     # Clear result cache
```

References

Consult the configuration reference and Anthropic provider docs for full details.
