evaluating-llms
Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for production deployment.
Top DevOps and cloud skills for deployment, infrastructure, Docker, Kubernetes, and CI/CD workflows.
This directory brings installable AI Agent skills into one place so you can filter by search, category, topic, and official source, then install them directly into Claude Code, Cursor, Windsurf, and other supported environments.
MCP Builder is a standard for connecting artificial intelligence systems to external tools and data sources.
FastMCP is a framework for developing Model Context Protocol (MCP) servers with advanced tools and resources.
Translates Figma designs into production-ready code with 1:1 visual fidelity. Use when implementing UI from Figma files, when the user mentions "implement design", "generate code", "implement component", or "build Figma design", provides Figma URLs, or asks to build components matching Figma specs. Requires a Figma MCP server connection.
High-performance CLI trading bot and MCP server for the DeepDex protocol. Enables spot/perpetual trading, wallet management, subaccount operations, and automated trading strategies. Use when building features, fixing bugs, or extending the trading bot.
This skill enables AI agents to orchestrate the DHTI development workflow: installing elixirs and conches, then using Docker to start a fully functional DHTI server with all components installed.
Creates ./scripts/verify.sh, a standardized verification script that runs tests, linting, formatting, and type checking. Use when setting up a new project, creating CI/CD pipelines, or when ./scripts/verify.sh is missing.
Generate and run unit and integration tests using pytest (Python) or Jest (JavaScript) with fixtures, mocks, and async support. Use for test-driven development, code review validation, coverage verification, and regression testing. Target 80%+ code coverage. Supports pytest markers, Jest snapshots, and CI/CD integration. Triggers on test, TDD, unit test, integration test, test coverage, pytest, jest.
Manage Workers, KV, R2, D1, and Hyperdrive via the Cloudflare MCP, and perform observability, build troubleshooting, audit, and container sandbox operations. Triggers: worker/KV/R2/D1/logs/build/deploy/screenshot/audit/sandbox. Three permission tiers: Diagnose (read-only), Change (writes require confirmation), and Super Admin (isolated environment). Write operations must follow a read-first, user-confirmation, post-execution-verification sequence.
Use when creating new skills, editing existing skills, or verifying that skills work before deployment.
Configure and manage API gateways including Kong, Tyk, AWS API Gateway, and Apigee. Activates when users need help setting up API gateways, rate limiting, authentication, request transformation, or API management.
Generate a complete software specification document for the current project/repo, including architecture, data model, key processes, pseudocode, and Mermaid diagrams (context, container/deployment, module relations, sequence, ER, class, flowchart, state).