evaluating-llms
Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for production deployment.
Context-aware Swedish translation for the Sport Wizard youth soccer coaching application. Use when the user explicitly requests translation of components, screens, features, or user-facing text to Swedish (e.g., "Translate ConfigurationScreen to Swedish", "Add Swedish translations for the game screen", "Translate the new player rotation UI"). This skill provides domain-specific soccer terminology, natural Swedish phrasing patterns, and an automated translation workflow that updates both JSON translation files and component code.
Systematic testing methodology for Go projects using TDD, coverage-driven gap closure, fixture patterns, and CLI testing. Use when establishing a test strategy from scratch, improving test coverage from 60-75% to 80%+, creating test infrastructure with mocks and fixtures, building CLI test suites, or systematizing ad-hoc testing. Provides 8 documented patterns (table-driven, golden file, fixture, mocking, CLI testing, integration, helper utilities, coverage-driven gap closure) and 3 automation tools (coverage analyzer, 186x speedup; test generator, 200x speedup; methodology guide, 7.5x speedup). Validated across 3 project archetypes with 3.1x average speedup, 5.8% adaptation effort, and 89% transferability to Python/Rust/TypeScript.