evaluating-llms
Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for
frontend-design is an AI agent skill for understanding the context and committing to a bold aesthetic direction before coding.
Design and implement observability and SRE controls including SLO/SLI, OpenTelemetry instrumentation, alerting, dashboards, and incident readiness. Use when tasks involve telemetry coverage, operation
expo-modules is an AI agent skill for great module standards.
Router skill for ToolUniverse tasks. First checks if specialized tooluniverse skills (54 skills covering disease/drug/target research, clinical decision support, genomics, transcriptomics, single-cell
Create distinctive, production-grade frontend interfaces with high design quality. Use this skill for UI tasks that build web components, pages, or applications. Generates creative, polished code that
Display a session efficiency report showing token savings, cache performance, and optimization recommendations. Use when the user asks "show my stats", "how efficient am I?", "show session metrics", or wants to s
convex-component-authoring is an AI agent skill for authoring Convex components.
Debug preprocessing pipeline failures. Guides through reading checkpoint files, checking step artifacts, interpreting QC metrics, examining visualization PNGs, and identifying which step failed and wh
bimverdi-design is an AI agent skill for bim verdi design.
System-wide structural audit of the entire PM knowledge system. Checks orphans, broken links, register coverage, schema violations across all decisions, and vault health metrics. Slower and deeper tha