agent-browser

v1.0.0

About this Skill

agent-browser is a browser automation tool that lets developers and agents navigate web pages, snapshot them, and interact with their elements through CLI commands and element references. It is intended for web automation agents that need to interact with dynamic pages and perform complex browser actions.

Features

Navigates to web pages using 'agent-browser open <url>' command
Captures interactive elements with references like '@e1' and '@e2' using 'agent-browser snapshot -i'
Performs click actions on elements by reference using 'agent-browser click @e1'
Fills input fields by reference using 'agent-browser fill @e2 "text"'
Closes the browser instance using 'agent-browser close' command

Kjdragan
Updated: 3/6/2026

Installation
> npx killer-skills add Kjdragan/universal_agent/agent-browser

Agent Capability Analysis

The agent-browser MCP Server by Kjdragan is an open-source community integration for Claude and other AI agents that adds browser automation to their capabilities.

Core Value

Lets agents automate browser interactions through snapshots: a snapshot captures the page's interactive elements and assigns each one a ref, and actions such as clicking and filling inputs then target those refs.

Capabilities Granted for agent-browser MCP Server

Automating web scraping tasks
Generating interactive web tests
Debugging web application UI issues

Prerequisites & Limits

  • Requires browser instance access
  • Limited to HTTP and HTML interactions

Browser Automation with agent-browser

Quick start

```bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser
```

Core workflow

  1. Navigate: agent-browser open <url>
  2. Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
  3. Interact using refs from the snapshot
  4. Re-snapshot after navigation or significant DOM changes
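
Step 4 is the easiest one to skip. As a minimal sketch, the wrapper below runs one interaction and then immediately takes a fresh interactive snapshot, so later commands work from current refs. The function name `ab_act` and the `AGENT_BROWSER` variable are hypothetical conveniences of this sketch, not part of agent-browser; only the `snapshot -i` command comes from the workflow above.

```bash
# Binary to invoke. Overridable so the sketch can be dry-run against a stub
# (this variable is an assumption of the sketch, not an agent-browser feature).
AGENT_BROWSER=${AGENT_BROWSER:-agent-browser}

# ab_act ACTION [ARGS...]
# Run one interaction, then re-snapshot so the refs printed afterwards
# reflect the current DOM. Stops early if the interaction itself fails.
ab_act() {
  $AGENT_BROWSER "$@" || return 1
  $AGENT_BROWSER snapshot -i
}

# Example (not executed here):
#   ab_act click @e1
```

Re-snapshotting after every action is the conservative choice; on pages that do not change, a plain command without the extra snapshot is cheaper.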

Commands

Navigation

```bash
agent-browser open <url>   # Navigate to URL
agent-browser back         # Go back
agent-browser forward      # Go forward
agent-browser reload       # Reload page
agent-browser close        # Close browser
```

Snapshot (page analysis)

```bash
agent-browser snapshot              # Full accessibility tree
agent-browser snapshot -i           # Interactive elements only (recommended)
agent-browser snapshot -c           # Compact output
agent-browser snapshot -d 3         # Limit depth to 3
agent-browser snapshot -s "#main"   # Scope to CSS selector
```

Interactions (use @refs from snapshot)

```bash
agent-browser click @e1             # Click
agent-browser dblclick @e1          # Double-click
agent-browser focus @e1             # Focus element
agent-browser fill @e2 "text"       # Clear and type
agent-browser type @e2 "text"       # Type without clearing
agent-browser press Enter           # Press key
agent-browser press Control+a       # Key combination
agent-browser keydown Shift         # Hold key down
agent-browser keyup Shift           # Release key
agent-browser hover @e1             # Hover
agent-browser check @e1             # Check checkbox
agent-browser uncheck @e1           # Uncheck checkbox
agent-browser select @e1 "value"    # Select dropdown
agent-browser scroll down 500       # Scroll page
agent-browser scrollintoview @e1    # Scroll element into view
agent-browser drag @e1 @e2          # Drag and drop
agent-browser upload @e1 file.pdf   # Upload files
```

Get information

```bash
agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count ".item"   # Count matching elements
agent-browser get box @e1         # Get bounding box
```

Check state

```bash
agent-browser is visible @e1   # Check if visible
agent-browser is enabled @e1   # Check if enabled
agent-browser is checked @e1   # Check if checked
```

Data Extraction Best Practices

  • Prefer innerText: Dynamic sites often obfuscate class names (e.g., css-1234), so class-based selectors break. Read document.body.innerText to get all visible text and parse it in your LLM context.
  • Use snapshot -i: It is built from the accessibility tree, which is more stable than the HTML structure, especially for interactive elements.
  • Avoid complex selectors: Selectors like div > div:nth-child(3) are brittle. If you must use JS, prefer robust attribute queries such as document.querySelectorAll('a[href*="showtime"]').
  • Fallback: If a JS extraction returns [], immediately fall back to getting the whole page text.
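
The fallback rule can be sketched as a small shell function. This is an illustration only: `extract_or_page_text` and the `AGENT_BROWSER` variable are hypothetical names, and the sketch assumes that `agent-browser eval` prints an empty array literally as `[]`, which the source does not confirm.

```bash
# Binary to invoke. Overridable so the sketch can be dry-run against a stub
# (an assumption of this sketch, not an agent-browser feature).
AGENT_BROWSER=${AGENT_BROWSER:-agent-browser}

# extract_or_page_text JS_EXPR
# Evaluate JS_EXPR in the page. If the command fails, prints nothing, or
# prints "[]" (assumed empty-array output), fall back to the whole page text.
extract_or_page_text() {
  result=$($AGENT_BROWSER eval "$1") || result=""
  case $result in
    "" | "[]") $AGENT_BROWSER eval "document.body.innerText" ;;
    *) printf '%s\n' "$result" ;;
  esac
}

# Example (not executed here):
#   extract_or_page_text "Array.from(document.querySelectorAll('a[href*=\"showtime\"]')).map(a => a.href)"
```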

Screenshots & PDF

```bash
agent-browser screenshot            # Screenshot to stdout
agent-browser screenshot path.png   # Save to file
agent-browser screenshot --full     # Full page
agent-browser pdf output.pdf        # Save as PDF
```

Video recording

```bash
agent-browser record start ./demo.webm      # Start recording (uses current URL + state)
agent-browser click @e1                     # Perform actions
agent-browser record stop                   # Stop and save video
agent-browser record restart ./take2.webm   # Stop current + start new recording
```

Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.

Wait

```bash
agent-browser wait @e1                 # Wait for element
agent-browser wait 2000                # Wait milliseconds
agent-browser wait --text "Success"    # Wait for text
agent-browser wait --url "**/dashboard" # Wait for URL pattern
agent-browser wait --load networkidle  # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
```

Mouse control

```bash
agent-browser mouse move 100 200   # Move mouse
agent-browser mouse down left      # Press button
agent-browser mouse up left        # Release button
agent-browser mouse wheel 100      # Scroll wheel
```

Semantic locators (alternative to refs)

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
```

Browser settings

```bash
agent-browser set viewport 1920 1080        # Set viewport size
agent-browser set device "iPhone 14"        # Emulate device
agent-browser set geo 37.7749 -122.4194     # Set geolocation
agent-browser set offline on                # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}'   # Extra HTTP headers
agent-browser set credentials user pass     # HTTP basic auth
agent-browser set media dark                # Emulate color scheme
```

Cookies & Storage

```bash
agent-browser cookies                  # Get all cookies
agent-browser cookies set name value   # Set cookie
agent-browser cookies clear            # Clear cookies
agent-browser storage local            # Get all localStorage
agent-browser storage local key        # Get specific key
agent-browser storage local set k v    # Set value
agent-browser storage local clear      # Clear all
```

Network

```bash
agent-browser network route <url>               # Intercept requests
agent-browser network route <url> --abort       # Block requests
agent-browser network route <url> --body '{}'   # Mock response
agent-browser network unroute [url]             # Remove routes
agent-browser network requests                  # View tracked requests
agent-browser network requests --filter api     # Filter requests
```

Tabs & Windows

```bash
agent-browser tab             # List tabs
agent-browser tab new [url]   # New tab
agent-browser tab 2           # Switch to tab
agent-browser tab close       # Close tab
agent-browser window new      # New window
```

Frames

```bash
agent-browser frame "#iframe"   # Switch to iframe
agent-browser frame main        # Back to main frame
```

Dialogs

```bash
agent-browser dialog accept [text]   # Accept dialog
agent-browser dialog dismiss         # Dismiss dialog
```

JavaScript

```bash
agent-browser eval "document.title"   # Run JavaScript
```

Example: Form submission

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i   # Check result
```
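
In a script, the ref numbers should not be hard-coded, because they depend on what the snapshot returns. The sketch below looks up a ref by role and name from snapshot text. It assumes each element appears as `role "Name" [ref=eN]`, the form shown in the example comment; the real snapshot output may be laid out differently, and `ref_for` is a hypothetical helper, not an agent-browser command.

```bash
# ref_for ROLE NAME
# Read snapshot text on stdin and print the first matching ref as @eN.
# Prints nothing if no element matches.
# Assumed entry format (from the example comment): role "Name" [ref=eN]
ref_for() {
  role=$1
  name=$2
  # Put comma-separated entries on their own lines, keep the first entry
  # with the requested role and name, then rewrite it to the @ref form.
  tr ',' '\n' |
    grep -F -e "$role \"$name\" [ref=" |
    head -n 1 |
    sed 's/.*\[ref=\([^]]*\)].*/@\1/'
}

# Example (not executed here):
#   submit=$(agent-browser snapshot -i | ref_for button Submit)
#   agent-browser click "$submit"
```

Names that themselves contain a comma would be split by this sketch; the semantic locators (`find role ... --name ...`) avoid the parsing step entirely.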

Example: Authentication with saved state

```bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
```
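
The "login once, reuse later" pattern can be folded into a single function that checks for the state file first. This is a sketch under assumptions: `ensure_login` and the `AGENT_BROWSER` variable are hypothetical, and the URL, refs (@e1 to @e3), and credentials are the placeholders from the example above, so they would need to match the real login page.

```bash
# Binary to invoke. Overridable so the sketch can be dry-run against a stub
# (an assumption of this sketch, not an agent-browser feature).
AGENT_BROWSER=${AGENT_BROWSER:-agent-browser}

# ensure_login STATE_FILE
# Load saved state if STATE_FILE exists; otherwise log in and save it.
ensure_login() {
  state_file=$1
  if [ -f "$state_file" ]; then
    $AGENT_BROWSER state load "$state_file"
    return
  fi
  # Each step runs only if the previous one succeeded.
  $AGENT_BROWSER open https://app.example.com/login &&
    $AGENT_BROWSER snapshot -i >/dev/null &&
    $AGENT_BROWSER fill @e1 "username" &&
    $AGENT_BROWSER fill @e2 "password" &&
    $AGENT_BROWSER click @e3 &&
    $AGENT_BROWSER wait --url "**/dashboard" &&
    $AGENT_BROWSER state save "$state_file"
}

# Example (not executed here):
#   ensure_login auth.json && agent-browser open https://app.example.com/dashboard
```

The sketch only checks that the file exists. A saved session can expire, so a real script would probably also verify the landing URL after loading the state.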

Sessions (parallel browsers)

```bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
```
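
Since each session is a separate browser, two sites can be handled concurrently with ordinary shell job control. The sketch below is illustrative: `snapshot_site`, `snapshot_both`, and the `AGENT_BROWSER` variable are hypothetical names, and it assumes `--session` can precede `snapshot` the same way it precedes `open` in the commands above.

```bash
# Binary to invoke. Overridable so the sketch can be dry-run against a stub
# (an assumption of this sketch, not an agent-browser feature).
AGENT_BROWSER=${AGENT_BROWSER:-agent-browser}

# snapshot_site SESSION URL OUTFILE
# Open URL in its own session and write the interactive snapshot to OUTFILE.
snapshot_site() {
  $AGENT_BROWSER --session "$1" open "$2" >/dev/null &&
    $AGENT_BROWSER --session "$1" snapshot -i >"$3"
}

# snapshot_both OUTDIR
# Run the two sessions in the background and wait for both.
# Returns non-zero if either one failed.
snapshot_both() {
  snapshot_site test1 site-a.com "$1/site-a.txt" &
  pid_a=$!
  snapshot_site test2 site-b.com "$1/site-b.txt" &
  pid_b=$!
  status=0
  wait "$pid_a" || status=1
  wait "$pid_b" || status=1
  return "$status"
}

# Example (not executed here):
#   mkdir -p out && snapshot_both out
```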

JSON output (for parsing)

Add --json for machine-readable output:

```bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
```

Debugging

```bash
agent-browser open example.com --headed   # Show browser window
agent-browser --cdp 9222 snapshot         # Connect via CDP
agent-browser console                     # View console messages
agent-browser console --clear             # Clear console
agent-browser errors                      # View page errors
agent-browser errors --clear              # Clear errors
agent-browser highlight @e1               # Highlight element
agent-browser record start ./debug.webm   # Record from current page
agent-browser record stop                 # Save recording
agent-browser trace start                 # Start recording trace
agent-browser trace stop trace.zip        # Stop and save trace
```
