typescript-sdk
TypeScript SDK patterns for Opik. Use when working in sdks/typescript.
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations.
carrier-relationship-management is a skill that automates and optimizes freight management processes using AI-powered tools.
Generate and verify E2E tests for a feature. Explores live app, creates test plan, generates tests, runs and fixes until passing.
A collection of tools for working with the Oak Open Curriculum Data, including a published MCP server.
LLM-as-judge evaluation framework with a 5-dimension rubric (accuracy, groundedness, coherence, completeness, helpfulness) for scoring the quality of AI-generated content with weighted composite scores and evi…
Auto-activates during requirements analysis to evaluate the technical stack.
Debug stuck Hawk/Inspect AI evaluations. Use when the user mentions a stuck eval, eval not progressing, eval hanging, samples not completing, eval set frozen, runner stuck, 500 errors in eval, retry loop, e…
Evaluating agent systems requires different approaches from those used to evaluate traditional software.
Guide evaluation of healthcare AI systems with domain-specific safety criteria, clinical accuracy rubrics, and score interpretation. Use when building or reviewing health/medical AI evaluations.
Framework for creating high-fidelity, complex RL environments and evaluation tasks.
Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for…