Browse and install thousands of AI Agent skills in the Killer-Skills directory. Supports Claude Code, Windsurf, Cursor, and more.

35 available skills

typescript-sdk

comet-ml

TypeScript SDK patterns for Opik. Use when working in sdks/typescript.

17.8k
0
AI

hugging-face-evaluation

[ Official ]
huggingface

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations.

9.5k
0
AI

budmem

BudEcosystem

Bud AI Foundry: a comprehensive inference stack for compound AI deployment, optimization, and scaling. Bud Stack provides intelligent infrastructure automation, performance optimization, and seamless

10
0
AI

ground-truth-evaluation

oaknational

A collection of tools for working with the Oak Open Curriculum Data, including a published MCP server.

3
0
Developer

agent-evaluation

oimiragieo

LLM-as-judge evaluation framework with a 5-dimension rubric (accuracy, groundedness, coherence, completeness, helpfulness) for scoring the quality of AI-generated content with weighted composite scores and evi

14
0
Developer

evaluation

mshraditya

Evaluation is the process of assessing agent systems with approaches that differ from those used for traditional software.

0
0
Developer

Healthcare AI Evaluation

MFD3000

Guide evaluation of healthcare AI systems with domain-specific safety criteria, clinical accuracy rubrics, and score interpretation. Use when building or reviewing health/medical AI evaluations.

0
0
Developer

debug-stuck-eval

METR

Debug stuck Hawk/Inspect AI evaluations. Use when user mentions stuck eval, eval not progressing, eval hanging, samples not completing, eval set frozen, runner stuck, 500 errors in eval, retry loop, e

21
0
AI

eval-harness

[ Featured ]
affaan-m

eval-harness is a formal evaluation framework implementing eval-driven development principles for Claude Code sessions.

108.5k
0
Developer

skill-stocktake

[ Featured ]
affaan-m

Use when auditing Claude skills and commands for quality. Supports Quick Scan (changed skills only) and Full Stocktake modes with sequential subagent batch evaluation.

108.5k
0
Developer

launch-prep

benchflow-ai

Framework for creating high-fidelity, complex RL environments and evaluation tasks.

203
0
Developer

e2e

langwatch

Generate and verify E2E tests for a feature. Explores live app, creates test plan, generates tests, runs and fixes until passing.

2.8k
0
AI