agent-evaluation
LLM-as-judge evaluation framework with 5-dimension rubric (accuracy, groundedness, coherence, completeness, helpfulness) for scoring AI-generated content quality with weighted composite scores and evidence citations
Explore and install thousands of skills for AI Agents in the Killer-Skills directory. Compatible with Claude Code, Windsurf, Cursor, and more.
This directory brings installable AI Agent skills into one place so you can filter by search, category, topic, and official source, then install them directly into Claude Code, Cursor, Windsurf, and other supported environments.
Comprehensive mobile app testing strategies for iOS and Android. Covers unit tests, UI tests, integration tests, performance testing, and test automation with Detox, Appium, and XCTest.
Pwning AI Code Interpreters for fun and profit - by Phantom Labs
Generate visual hierarchy diagrams of an agent system, showing levels and delegation. Use for documentation or onboarding.
Best-practice guidance for the SojuStack monorepo (NestJS + Drizzle + Better Auth + TanStack Start). Use when editing files in apps/api or apps/web, designing routes, query/form patterns, auth/transaction flows, or implementing cross-stack features.
How to use the Boros adapter for fixed-rate market data, margin flows, and vault execution in Wayfinder Paths (market discovery, vault screening, quoting, and transaction gotchas).
How to run scenario tests against Gorlami fork RPCs (dry runs) before broadcasting live transactions. Covers config, seeding balances, runner flags, and safe script patterns.
Synchronize DataSpoke specification documents with current implementation state. Use when specs and implementations have drifted and need reconciliation.
Simplified Claude Flow for beginners - AI agent orchestration made easy
Read and extract text from PDF files — documents, reports, contracts, spreadsheets. Use whenever you need to read PDF content, not just when explicitly asked. Handles local files, URLs, and WhatsApp attachments.
Manage the local Kubernetes cluster for development. Use when the user asks to check pods, restart deployments, view logs, apply manifests, create or delete resources, or perform any kubectl/helm operation.
Guidelines for using web search and documentation lookup tools (WebSearch, WebFetch, context7 MCP). Use when agents need to verify technical claims, check library APIs, or research current tool capabilities.