hugging-face-evaluation
[ Official ] Add and manage evaluation results in Hugging Face model cards. Supports extracting evaluation tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations.
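For context, Hugging Face model cards store evaluation results as `model-index` metadata in the README front matter. Below is a minimal sketch of adding one score programmatically with the `huggingface_hub` helpers; the repo ID, task, dataset, and metric value are illustrative placeholders, not output from this skill:

```python
from huggingface_hub import metadata_eval_result, metadata_update

# Build model-index metadata for a single eval result
# (model name, task, dataset, and score below are placeholders).
metadata = metadata_eval_result(
    model_pretty_name="my-model",
    task_pretty_name="Text Classification",
    task_id="text-classification",
    metrics_pretty_name="Accuracy",
    metrics_id="accuracy",
    metrics_value=0.91,
    dataset_pretty_name="GLUE (SST-2)",
    dataset_id="glue",
)

# Merge the result into the model card on the Hub without
# overwriting existing metadata.
metadata_update("username/my-model", metadata, overwrite=False)
```

`metadata_update` commits the merged metadata to the repo's README.md front matter, which the Hub renders as the model's evaluation table.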
Guide evaluation of healthcare AI systems with domain-specific safety criteria, clinical accuracy rubrics, and score interpretation. Use when building or reviewing health/medical AI evaluations.
Use when auditing Claude skills and commands for quality. Supports Quick Scan (changed skills only) and Full Stocktake modes with sequential subagent batch evaluation.
cpp-testing is a specialized AI agent skill for automating C++ test workflows using GoogleTest and CMake.
requesting-code-review is a skill that enables developers to request code reviews, ensuring work meets requirements through subagent-driven evaluation.
sandbox-sdk is a Cloudflare Workers solution that provides a sandboxed environment for running untrusted code, ideal for developers who need isolated execution.
Get alternative perspectives via Google Gemini on complex design decisions, architecture trade-offs, debugging dead ends, or approach evaluation.
Benchmark Manager is a skill that manages AILANG evaluation benchmarks with features like prompt integration, debugging, and best practices.
eval is a Thoughtbox intention ledger for agents, designed to evaluate AI decisions against its decision-making metrics.
Generate visualizations from completed experiment evaluations using inspect-viz. Use after run-experiment to create interactive HTML plots from inspect-ai evaluation logs.
MixSeek Agent Skills collection for AI coding assistants. Provides workspace management, team configuration, evaluation setup, and debugging tools for MixSeek-Core.
LLM-as-judge evaluation framework with a 5-dimension rubric (accuracy, groundedness, coherence, completeness, helpfulness) for scoring AI-generated content quality with weighted composite scores and supporting evidence.
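To make the weighted composite concrete, here is a small illustrative sketch; the weights and example scores are placeholders, not the framework's actual values:

```python
# Illustrative weighted composite over the five rubric dimensions.
# Weights are placeholders; the framework's actual weights may differ.
WEIGHTS = {
    "accuracy": 0.30,
    "groundedness": 0.25,
    "coherence": 0.15,
    "completeness": 0.15,
    "helpfulness": 0.15,
}

def composite_score(scores: dict[str, float]) -> float:
    """Combine per-dimension judge scores (e.g. 1-5) into one weighted score."""
    assert set(scores) == set(WEIGHTS), "expected exactly the five rubric dimensions"
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

print(composite_score({
    "accuracy": 4.0, "groundedness": 5.0, "coherence": 4.5,
    "completeness": 3.5, "helpfulness": 4.0,
}))  # -> 4.25
```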