Langfuse Skill
Debug your AI systems through Langfuse observability.
Triggers: langfuse, traces, debug AI, find exceptions, set up langfuse, what went wrong, why is it slow, datasets, evaluation sets
Setup
Step 1: Get credentials from https://cloud.langfuse.com → Settings → API Keys
If self-hosted, use your instance URL for LANGFUSE_HOST and create keys there.
Step 2: Install MCP (pick one):
bash1# Claude Code (project-scoped, shared via .mcp.json) 2claude mcp add \ 3 --scope project \ 4 --env LANGFUSE_PUBLIC_KEY=pk-... \ 5 --env LANGFUSE_SECRET_KEY=sk-... \ 6 --env LANGFUSE_HOST=https://cloud.langfuse.com \ 7 langfuse -- uvx --python 3.11 langfuse-mcp 8 9# Codex CLI (user-scoped, stored in ~/.codex/config.toml) 10codex mcp add langfuse \ 11 --env LANGFUSE_PUBLIC_KEY=pk-... \ 12 --env LANGFUSE_SECRET_KEY=sk-... \ 13 --env LANGFUSE_HOST=https://cloud.langfuse.com \ 14 -- uvx --python 3.11 langfuse-mcp
Step 3: Restart CLI, verify with /mcp (Claude) or codex mcp list (Codex)
Step 4: Test: fetch_traces(age=60)
Read-Only Mode
For safer observability without risk of modifying prompts or datasets, enable read-only mode:
bash1# CLI flag 2langfuse-mcp --read-only 3 4# Or environment variable 5LANGFUSE_MCP_READ_ONLY=true
This disables write tools: create_text_prompt, create_chat_prompt, update_prompt_labels, create_dataset, create_dataset_item, delete_dataset_item.
For manual .mcp.json setup or troubleshooting, see references/setup.md.
Playbooks
"Where are the errors?"
find_exceptions(age=1440, group_by="file")
→ Shows error counts by file. Pick the worst offender.
find_exceptions_in_file(filepath="src/ai/chat.py", age=1440)
→ Lists specific exceptions. Grab a trace_id.
get_exception_details(trace_id="...")
→ Full stacktrace and context.
"What happened in this interaction?"
fetch_traces(age=60, user_id="...")
→ Find the trace. Note the trace_id.
If you don't know the user_id, start with:
fetch_traces(age=60)
fetch_trace(trace_id="...", include_observations=true)
→ See all LLM calls in the trace.
fetch_observation(observation_id="...")
→ Inspect a specific generation's input/output.
"Why is it slow?"
fetch_observations(age=60, type="GENERATION")
→ Find recent LLM calls. Look for high latency.
fetch_observation(observation_id="...")
→ Check token counts, model, timing.
"What's this user experiencing?"
get_user_sessions(user_id="...", age=1440)
→ List their sessions.
get_session_details(session_id="...")
→ See all traces in the session.
"Manage datasets"
list_datasets()
→ See all datasets.
get_dataset(name="evaluation-set-v1")
→ Get dataset details.
list_dataset_items(dataset_name="evaluation-set-v1", page=1, limit=10)
→ Browse items in the dataset.
create_dataset(name="qa-test-cases", description="QA evaluation set")
→ Create a new dataset.
create_dataset_item(
dataset_name="qa-test-cases",
input={"question": "What is 2+2?"},
expected_output={"answer": "4"}
)
→ Add test cases.
create_dataset_item(
dataset_name="qa-test-cases",
item_id="item_123",
input={"question": "What is 3+3?"},
expected_output={"answer": "6"}
)
→ Upsert: updates existing item by id or creates if missing.
"Manage prompts"
list_prompts()
→ See all prompts with labels.
get_prompt(name="...", label="production")
→ Fetch current production version.
create_text_prompt(name="...", prompt="...", labels=["staging"])
→ Create new version in staging.
update_prompt_labels(name="...", version=N, labels=["production"])
→ Promote to production. (Rollback = re-apply label to older version)
Quick Reference
| Task | Tool |
|---|---|
| List traces | fetch_traces(age=N) |
| Get trace details | fetch_trace(trace_id="...", include_observations=true) |
| List LLM calls | fetch_observations(age=N, type="GENERATION") |
| Get observation | fetch_observation(observation_id="...") |
| Error count | get_error_count(age=N) |
| Find exceptions | find_exceptions(age=N, group_by="file") |
| List sessions | fetch_sessions(age=N) |
| User sessions | get_user_sessions(user_id="...", age=N) |
| List prompts | list_prompts() |
| Get prompt | get_prompt(name="...", label="production") |
| List datasets | list_datasets() |
| Get dataset | get_dataset(name="...") |
| List dataset items | list_dataset_items(dataset_name="...", limit=N) |
| Create/update dataset item | create_dataset_item(dataset_name="...", item_id="...") |
age = minutes to look back (max 10080 = 7 days)
References
references/tool-reference.md— Full parameter docs, filter semantics, response schemasreferences/setup.md— Manual setup, troubleshooting, advanced configuration