What is agent-browser?

Suited to test-automation agents that need advanced browser interaction. It automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, tes…

How do I install agent-browser?

Run the following command:

npx killer-skills add Kjdragan/universal_agent/agent-browser

It works with Cursor, Windsurf, VS Code, Claude Code, and the other IDEs listed below.

What are the use cases for agent-browser?

Key use cases include automating tests of interactive elements, capturing screenshots of web pages for visual validation, extracting data from websites using element references, and filling out forms automatically to simulate users.

Which IDEs are compatible with agent-browser?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for agent-browser?

It requires command-line access, is limited to browser interactions, and depends on a stable network connection for HTTP requests.

Browser Automation with agent-browser

Name: agent-browser
Author: Kjdragan

Quick start

```bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser
```

Core workflow

1. Navigate: agent-browser open <url>
2. Snapshot: agent-browser snapshot -i (returns elements with refs such as @e1, @e2)
3. Interact using the refs from the snapshot.
4. Re-snapshot after navigation or significant DOM changes, since refs from an earlier snapshot may no longer point at the right elements.
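Interacting by ref depends on reading refs out of the snapshot. The shell sketch below shows one way to script that lookup. ref_for is a hypothetical helper, not an agent-browser command, and it assumes snapshot lines shaped like textbox "Email" [ref=e1], as in the form-submission example; adjust the pattern if your version formats refs differently.

```bash
# Hypothetical helper (not part of agent-browser): print the ref, e.g. "@e2",
# of the first element matching ROLE ($1) and accessible NAME ($2) in
# snapshot text read from stdin. Assumes lines like: textbox "Email" [ref=e1]
# NAME is used inside a grep regex, so avoid regex metacharacters in it.
ref_for() {
  grep -o "$1 \"$2\" \[ref=e[0-9]*]" |
    head -n 1 |
    sed 's/.*\[ref=\(e[0-9]*\)]$/@\1/'
}

# Usage: look up a ref, then act on it (repeat after navigation).
# email_ref=$(agent-browser snapshot -i | ref_for textbox "Email")
# agent-browser fill "$email_ref" "user@example.com"
```

Matching on role plus accessible name mirrors what the semantic locators do, while still using the ref-based commands.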

Commands

Navigation

```bash
agent-browser open <url>      # Navigate to URL
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser close           # Close browser
```

Snapshot (page analysis)

```bash
agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (recommended)
agent-browser snapshot -c         # Compact output
agent-browser snapshot -d 3       # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
```

Interactions (use @refs from snapshot)

```bash
agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click
agent-browser focus @e1           # Focus element
agent-browser fill @e2 "text"     # Clear and type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter         # Press key
agent-browser press Control+a     # Key combination
agent-browser keydown Shift       # Hold key down
agent-browser keyup Shift         # Release key
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check checkbox
agent-browser uncheck @e1         # Uncheck checkbox
agent-browser select @e1 "value"  # Select dropdown
agent-browser scroll down 500     # Scroll page
agent-browser scrollintoview @e1  # Scroll element into view
agent-browser drag @e1 @e2        # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
```

Get information

```bash
agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count ".item"   # Count matching elements
agent-browser get box @e1         # Get bounding box
```

Check state

```bash
agent-browser is visible @e1      # Check if visible
agent-browser is enabled @e1      # Check if enabled
agent-browser is checked @e1      # Check if checked
```

Data Extraction Best Practices

Prefer innerText: Many dynamic sites use generated or obfuscated class names (e.g., css-1234). Instead, get all of the page's rendered text with document.body.innerText (for example, agent-browser eval "document.body.innerText") and parse it in your LLM context.
Use snapshot -i: It is built from the accessibility tree, which is more stable than the raw HTML structure, especially for interactive elements.
Avoid complex selectors: Selectors like div > div:nth-child(3) are brittle. If you must use JS, prefer robust attribute queries such as document.querySelectorAll('a[href*="showtime"]').
Fallback: If a JS extraction returns [], fall back immediately to getting the whole page text.
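The fallback rule can be scripted. In this hypothetical sketch, extract_or_text (not part of agent-browser) runs a targeted query through agent-browser eval and, if the printed result is empty or exactly [], re-runs eval on document.body.innerText. It assumes eval prints an empty array as []; check what your version prints. AGENT_BROWSER is an illustrative variable that only exists so a stub can be swapped in for testing.

```bash
# Hypothetical helper: targeted JS extraction with a whole-page-text fallback.
# AGENT_BROWSER is an illustrative override; it defaults to the real CLI.
extract_or_text() {
  local ab="${AGENT_BROWSER:-agent-browser}" result
  result=$("$ab" eval "$1")
  # Assumption: an empty match prints as "[]" (or as nothing at all).
  if [ -z "$result" ] || [ "$result" = "[]" ]; then
    result=$("$ab" eval "document.body.innerText")
  fi
  printf '%s\n' "$result"
}

# Usage: links whose href mentions "showtime", else the full page text.
# extract_or_text "[...document.querySelectorAll('a[href*=\"showtime\"]')].map(a => a.href)"
```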

Screenshots & PDF

```bash
agent-browser screenshot          # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full   # Full page
agent-browser pdf output.pdf      # Save as PDF
```

Video recording

```bash
agent-browser record start ./demo.webm    # Start recording (uses current URL + state)
agent-browser click @e1                   # Perform actions
agent-browser record stop                 # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new recording
```

Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.
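Since a recording keeps running until record stop, a script that fails midway can leave it open. The hypothetical record_demo function below is a minimal sketch built only on the record commands above: it starts a recording, runs one command (a shell function works well for multi-step demos), always stops the recording, and returns the command's exit status. AGENT_BROWSER is an illustrative override for testing, not an agent-browser setting.

```bash
# Hypothetical wrapper: record one command (or function), stopping the
# recording even if that command fails. AGENT_BROWSER is a test override.
record_demo() {
  local ab="${AGENT_BROWSER:-agent-browser}" out="$1" rc=0
  shift
  "$ab" record start "$out" || return
  "$@" || rc=$?
  "$ab" record stop
  return "$rc"
}

# Usage: explore first, then record a scripted run of the demo steps.
# demo_steps() { agent-browser click @e1 && agent-browser wait --load networkidle; }
# record_demo ./demo.webm demo_steps
```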

Wait

```bash
agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --url "**/dashboard"    # Wait for URL pattern
agent-browser wait --load networkidle      # Wait for network idle
agent-browser wait --fn "window.ready"     # Wait for JS condition
```

Mouse control

```bash
agent-browser mouse move 100 200      # Move mouse
agent-browser mouse down left         # Press button
agent-browser mouse up left           # Release button
agent-browser mouse wheel 100         # Scroll wheel
```

Semantic locators (alternative to refs)

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
```

Browser settings

```bash
agent-browser set viewport 1920 1080      # Set viewport size
agent-browser set device "iPhone 14"      # Emulate device
agent-browser set geo 37.7749 -122.4194   # Set geolocation
agent-browser set offline on              # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
agent-browser set credentials user pass   # HTTP basic auth
agent-browser set media dark              # Emulate color scheme
```

Cookies & Storage

```bash
agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set cookie
agent-browser cookies clear               # Clear cookies
agent-browser storage local               # Get all localStorage
agent-browser storage local key           # Get specific key
agent-browser storage local set k v       # Set value
agent-browser storage local clear         # Clear all
```

Network

```bash
agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body '{}'  # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests
```
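A common use of these commands is checking how a page renders when an API returns nothing. This hypothetical sketch, snapshot_with_mock, answers requests matching a URL pattern with an empty JSON array, reloads, waits for the network to go idle, takes an interactive snapshot, and then unroutes the pattern whether or not those steps succeeded. The pattern in the usage line is a placeholder, the exact pattern syntax network route accepts is an assumption to verify against your version, and AGENT_BROWSER is an illustrative test override.

```bash
# Hypothetical helper: snapshot the page while requests matching PATTERN are
# answered with "[]", then always remove the mock. AGENT_BROWSER is a test
# override; the accepted PATTERN syntax is an assumption.
snapshot_with_mock() {
  local ab="${AGENT_BROWSER:-agent-browser}" pattern="$1" rc=0
  "$ab" network route "$pattern" --body '[]' || return
  { "$ab" reload && "$ab" wait --load networkidle && "$ab" snapshot -i; } || rc=$?
  "$ab" network unroute "$pattern"
  return "$rc"
}

# Usage (placeholder pattern for an items API):
# snapshot_with_mock "**/api/items"
```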

Tabs & Windows

```bash
agent-browser tab                 # List tabs
agent-browser tab new [url]       # New tab
agent-browser tab 2               # Switch to tab
agent-browser tab close           # Close tab
agent-browser window new          # New window
```

Frames

```bash
agent-browser frame "#iframe"     # Switch to iframe
agent-browser frame main          # Back to main frame
```

Dialogs

```bash
agent-browser dialog accept [text]  # Accept dialog
agent-browser dialog dismiss        # Dismiss dialog
```

JavaScript

```bash
agent-browser eval "document.title"   # Run JavaScript
```

Example: Form submission

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
```

Example: Authentication with saved state

```bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
```
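The two halves of this example can be merged so a script logs in only when no saved state exists. In the hypothetical sketch below, login_or_restore loads the state file if it is present; otherwise it logs in using the semantic locators described earlier and saves the state. The login URL, the field labels "Username" and "Password", the "Sign in" button name, and the APP_USER/APP_PASS environment variables are placeholders for your app; AGENT_BROWSER is an illustrative test override.

```bash
# Hypothetical helper: reuse saved auth state, logging in only when needed.
# URL, labels, button name and APP_USER/APP_PASS are placeholders;
# AGENT_BROWSER is an illustrative override for testing.
login_or_restore() {
  local ab="${AGENT_BROWSER:-agent-browser}" state="$1"
  if [ -f "$state" ]; then
    "$ab" state load "$state"
  else
    "$ab" open https://app.example.com/login &&
      "$ab" find label "Username" fill "$APP_USER" &&
      "$ab" find label "Password" fill "$APP_PASS" &&
      "$ab" find role button click --name "Sign in" &&
      "$ab" wait --url "**/dashboard" &&
      "$ab" state save "$state"
  fi
}

# Usage (credentials come from the environment, not the script):
# login_or_restore auth.json && agent-browser open https://app.example.com/dashboard
```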

Sessions (parallel browsers)

```bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
```

JSON output (for parsing)

Add --json for machine-readable output:

```bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
```

Debugging

```bash
agent-browser open example.com --headed   # Show browser window
agent-browser --cdp 9222 snapshot         # Connect via CDP
agent-browser console                     # View console messages
agent-browser console --clear             # Clear console
agent-browser errors                      # View page errors
agent-browser errors --clear              # Clear errors
agent-browser highlight @e1               # Highlight element
agent-browser record start ./debug.webm   # Record from current page
agent-browser record stop                 # Save recording
agent-browser trace start                 # Start recording trace
agent-browser trace stop trace.zip        # Stop and save trace
```
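When an automated step fails, these diagnostics are most useful if they are captured automatically. The hypothetical debug_run function below starts a trace, runs one command, stops the trace into TRACE_FILE (defaulting to trace.zip), and prints console messages and page errors to stderr only if the command failed. TRACE_FILE and AGENT_BROWSER are illustrative variables, not agent-browser settings.

```bash
# Hypothetical wrapper: trace one command and dump diagnostics on failure.
# TRACE_FILE (default trace.zip) and AGENT_BROWSER are illustrative variables.
debug_run() {
  local ab="${AGENT_BROWSER:-agent-browser}" rc=0
  "$ab" trace start || return
  "$@" || rc=$?
  "$ab" trace stop "${TRACE_FILE:-trace.zip}"
  if [ "$rc" -ne 0 ]; then
    echo "command failed (exit $rc); console and page errors follow:" >&2
    "$ab" console >&2
    "$ab" errors >&2
  fi
  return "$rc"
}

# Usage:
# debug_run agent-browser click @e3
```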
