eval
eval is a Thoughtbox intention ledger for agents, designed to evaluate AI decisions against the ledger's decision-making metrics.
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations.
The api-rules skill provides a Python programming assistant for LLM evaluation tasks, using the OpenAI API and libraries such as pandas and NumPy. It is aimed at developers working with LLMs.
Debug AI traces, find exceptions, analyze sessions, and manage prompts via Langfuse MCP. Use when debugging AI pipelines, investigating errors, analyzing latency, managing prompt versions, or setting