browser-use (Community, v1.0.0)

About this Skill

Ideal for automation agents that need persistent browser sessions and multi-step workflows. The skill ships in the khushparag/agentic_learning_coach repository, whose own description reads: an intelligent multi-agent system for personalized coding education, with 7 AI agents, adaptive learning, secure code execution, gamification, and social learning; built with FastAPI, React, PostgreSQL, and Docker; 356 tests, 90%+ coverage; spec-driven development with Kiro CLI.

Author: khushparag
Updated: 3/4/2026
Quality score: 60 (Excellent, top 5%), based on code quality and docs

Installation

Universal install (auto-detect), listed for Cursor, Windsurf, and VS Code:

npx killer-skills add khushparag/agentic_learning_coach

Agent Capability Analysis

The browser-use skill by khushparag is an open-source community integration for Claude and other AI agents. It gives an agent a command-line interface for automating browser tasks.

Core Value

Lets agents automate multi-step browser interactions through the browser-use CLI, which keeps the browser session open across commands. Workflows run in Chromium, which must be installed as a browser dependency.

Capabilities Granted by browser-use

  • Automating multi-step browser workflows
  • Debugging browser-based applications with persistent sessions
  • Generating browser automation scripts for Chromium

Prerequisites & Limits

  • Requires installation of browser dependencies (Chromium) via browser-use install
  • Depends on uvx or uv pip for installation and execution

Browser Automation with browser-use CLI

The browser-use command provides fast, persistent browser automation. It maintains browser sessions across commands, enabling complex multi-step workflows.

Installation

bash
# Run without installing (recommended for one-off use)
# Quotes keep shells such as zsh from treating [cli] as a glob pattern
uvx "browser-use[cli]" open https://example.com
# Or install permanently
uv pip install "browser-use[cli]"
# Install browser dependencies (Chromium)
browser-use install

Quick Start

bash
browser-use open https://example.com   # Navigate to URL
browser-use state                      # Get page elements with indices
browser-use click 5                    # Click element by index
browser-use type "Hello World"         # Type text
browser-use screenshot                 # Take screenshot
browser-use close                      # Close browser

Core Workflow

  1. Navigate: browser-use open <url> - Opens URL (starts browser if needed)
  2. Inspect: browser-use state - Returns clickable elements with indices
  3. Interact: Use indices from state to interact (browser-use click 5, browser-use input 3 "text")
  4. Verify: browser-use state or browser-use screenshot to confirm actions
  5. Repeat: Browser stays open between commands
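The loop above can also be driven from a script. The following is a minimal Python sketch, not part of the skill itself: it assumes browser-use is on PATH, and the helper names bu and click_and_verify are hypothetical. The runner is passed in as a parameter so the sequence can be exercised without a real browser.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes the arguments that follow "browser-use" and returns stdout.
Runner = Callable[[Sequence[str]], str]


def bu(args: Sequence[str]) -> str:
    """Run one browser-use command and return its stdout (assumes the CLI is on PATH)."""
    result = subprocess.run(
        ["browser-use", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def click_and_verify(url: str, index: int, run: Runner = bu) -> tuple[str, str]:
    """Navigate, inspect, click one element by index, then inspect again."""
    run(["open", url])            # 1. Navigate
    before = run(["state"])       # 2. Inspect
    run(["click", str(index)])    # 3. Interact
    after = run(["state"])        # 4. Verify
    return before, after
```

Comparing the two state outputs is left to the caller, since what counts as success depends on the page.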

Browser Modes

bash
browser-use --browser chromium open <url>            # Default: headless Chromium
browser-use --browser chromium --headed open <url>   # Visible Chromium window
browser-use --browser real open <url>                # User's Chrome with login sessions
browser-use --browser remote open <url>              # Cloud browser (requires API key)

  • chromium: Fast, isolated, headless by default
  • real: Uses your Chrome with cookies, extensions, logged-in sessions
  • remote: Cloud-hosted browser with proxy support (requires BROWSER_USE_API_KEY)

Commands

Navigation

bash
browser-use open <url>     # Navigate to URL
browser-use back           # Go back in history
browser-use scroll down    # Scroll down
browser-use scroll up      # Scroll up

Page State

bash
browser-use state                        # Get URL, title, and clickable elements
browser-use screenshot                   # Take screenshot (outputs base64)
browser-use screenshot path.png          # Save screenshot to file
browser-use screenshot --full path.png   # Full page screenshot

Interactions (use indices from browser-use state)

bash
browser-use click <index>             # Click element
browser-use type "text"               # Type text into focused element
browser-use input <index> "text"      # Click element, then type text
browser-use keys "Enter"              # Send keyboard keys
browser-use keys "Control+a"          # Send key combination
browser-use select <index> "option"   # Select dropdown option

Tab Management

bash
browser-use switch <tab>      # Switch to tab by index
browser-use close-tab         # Close current tab
browser-use close-tab <tab>   # Close specific tab

JavaScript & Data

bash
browser-use eval "document.title"          # Execute JavaScript, return result
browser-use extract "all product prices"   # Extract data using LLM (requires API key)

Python Execution (Persistent Session)

bash
browser-use python "x = 42"               # Set variable
browser-use python "print(x)"             # Access variable (outputs: 42)
browser-use python "print(browser.url)"   # Access browser object
browser-use python --vars                 # Show defined variables
browser-use python --reset                # Clear Python namespace
browser-use python --file script.py       # Execute Python file

The Python session maintains state across commands. The browser object provides:

  • browser.url - Current page URL
  • browser.title - Page title
  • browser.goto(url) - Navigate
  • browser.click(index) - Click element
  • browser.type(text) - Type text
  • browser.screenshot(path) - Take screenshot
  • browser.scroll() - Scroll page
  • browser.html - Get page HTML
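A script for browser-use python --file could combine these members. The sketch below is illustrative only: the function name scroll_and_capture is hypothetical, it assumes the session exposes browser as a global when a file is executed, and it passes 'down' to browser.scroll as the data-extraction example in this document does (the list above shows the call without arguments).

```python
def scroll_and_capture(browser, scrolls: int, path: str) -> str:
    """Scroll down a number of times, save a screenshot, and return the page URL."""
    for _ in range(scrolls):
        browser.scroll("down")  # direction argument as used in the extraction example
    browser.screenshot(path)
    return browser.url


# Assumption: inside `browser-use python --file script.py` the session provides
# a global named `browser`. Elsewhere the guard keeps this file importable.
if "browser" in globals():
    print(scroll_and_capture(browser, 3, "page.png"))
```

Taking browser as a parameter keeps the function usable with a stub object when no session is running.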

Agent Tasks (Requires API Key)

bash
browser-use run "Fill the contact form with test data"     # Run AI agent
browser-use run "Extract all product prices" --max-steps 50

Agent tasks use an LLM to complete complex browser tasks autonomously. They require BROWSER_USE_API_KEY or a configured LLM API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).

Session Management

bash
browser-use sessions      # List active sessions
browser-use close         # Close current session
browser-use close --all   # Close all sessions

Server Control

bash
browser-use server status   # Check if server is running
browser-use server stop     # Stop server
browser-use server logs     # View server logs

Setup

bash
browser-use install   # Install Chromium and system dependencies

Global Options

| Option | Description |
| --- | --- |
| --session NAME | Use named session (default: "default") |
| --browser MODE | Browser mode: chromium, real, remote |
| --headed | Show browser window (chromium mode) |
| --profile NAME | Chrome profile (real mode only) |
| --json | Output as JSON |
| --api-key KEY | Override API key |

Session behavior: All commands without --session use the same "default" session. The browser stays open and is reused across commands. Use --session NAME to run multiple browsers in parallel.
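When commands are assembled programmatically, a small builder keeps the global options in one place. This is a sketch under assumptions: build_command is a hypothetical helper, and it places global options before the subcommand because that is where every example in this document puts them; whether the CLI accepts them elsewhere is not stated here.

```python
from typing import Optional, Sequence

BROWSER_MODES = {"chromium", "real", "remote"}  # modes listed in this document


def build_command(
    args: Sequence[str],
    session: Optional[str] = None,
    browser: Optional[str] = None,
    headed: bool = False,
    json_output: bool = False,
) -> list[str]:
    """Return an argv list with global options placed before the subcommand."""
    if browser is not None and browser not in BROWSER_MODES:
        raise ValueError(f"unknown browser mode: {browser}")
    cmd = ["browser-use"]
    if session is not None:
        cmd += ["--session", session]
    if browser is not None:
        cmd += ["--browser", browser]
    if headed:
        cmd.append("--headed")
    if json_output:
        cmd.append("--json")
    return cmd + list(args)
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting issues with URLs and typed text.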

API Key Configuration

Some features (run, extract, --browser remote) require an API key. The CLI checks these locations in order:

  1. --api-key command line flag
  2. BROWSER_USE_API_KEY environment variable
  3. ~/.config/browser-use/config.json file

To configure permanently:

bash
mkdir -p ~/.config/browser-use
echo '{"api_key": "your-key-here"}' > ~/.config/browser-use/config.json
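
The lookup order described in this section can be mirrored in a wrapper script, for example to fail early when no key is configured. The sketch below is an illustration of that order, not the CLI's actual implementation; resolve_api_key is a hypothetical name, and the api_key field follows the config file written above.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_PATH = Path.home() / ".config" / "browser-use" / "config.json"


def resolve_api_key(
    flag_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    config_path: Path = CONFIG_PATH,
) -> Optional[str]:
    """Return the first key found: flag, then environment variable, then config file."""
    if flag_value:
        return flag_value
    env_value = env.get("BROWSER_USE_API_KEY")
    if env_value:
        return env_value
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # a missing or unreadable config file counts as "no key"
    return config.get("api_key") if isinstance(config, dict) else None
```

Passing env and config_path as parameters makes each step of the order easy to check in isolation.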

Examples

Form Submission

bash
browser-use open https://example.com/contact
browser-use state
# Shows: [0] input "Name", [1] input "Email", [2] textarea "Message", [3] button "Submit"
browser-use input 0 "John Doe"
browser-use input 1 "john@example.com"
browser-use input 2 "Hello, this is a test message."
browser-use click 3
browser-use state   # Verify success
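
Hard-coded indices break when the page changes, so a script may prefer to look them up by label. The following sketch assumes the state output contains entries shaped like the comment in the example above ([0] input "Name"); the real output format may differ, and index_by_label and form_commands are hypothetical helpers.

```python
import re

# Assumed entry shape, taken from the example comment: [0] input "Name"
ELEMENT_RE = re.compile(r'\[(\d+)\]\s+(\w+)\s+"([^"]*)"')


def index_by_label(state_text: str) -> dict[str, int]:
    """Map each quoted label to its element index."""
    return {label: int(idx) for idx, _tag, label in ELEMENT_RE.findall(state_text)}


def form_commands(state_text: str, values: dict[str, str], submit: str) -> list[list[str]]:
    """Build one `input` command per field, followed by a click on the submit button."""
    indices = index_by_label(state_text)
    commands = [
        ["browser-use", "input", str(indices[name]), text]
        for name, text in values.items()
    ]
    commands.append(["browser-use", "click", str(indices[submit])])
    return commands
```

A missing label raises KeyError, which is usually preferable to typing into the wrong field.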

Multi-Session Workflows

bash
browser-use --session work open https://work.example.com
browser-use --session personal open https://personal.example.com
browser-use --session work state       # Check work session
browser-use --session personal state   # Check personal session
browser-use close --all                # Close both sessions
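
Because each named session is a separate browser, their states can be polled side by side. This is a sketch only: state_of_each is a hypothetical helper, run stands for any callable that executes browser-use with the given arguments and returns its output, and whether the CLI handles concurrent invocations safely is an assumption to verify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A runner takes the arguments that follow "browser-use" and returns stdout.
Runner = Callable[[Sequence[str]], str]


def state_of_each(run: Runner, sessions: Sequence[str]) -> dict[str, str]:
    """Fetch `state` for every named session concurrently; results keep input order."""

    def fetch(name: str) -> str:
        return run(["--session", name, "state"])

    with ThreadPoolExecutor(max_workers=max(1, len(sessions))) as pool:
        return dict(zip(sessions, pool.map(fetch, sessions)))
```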

Data Extraction with Python

bash
browser-use open https://example.com/products
browser-use python "
products = []
for i in range(20):
    browser.scroll('down')
browser.screenshot('products.png')
# Placeholder: nothing is appended to products here, so the count printed below is 0
"
browser-use python "print(f'Captured {len(products)} products')"

Using Real Browser (Logged-In Sessions)

bash
browser-use --browser real open https://gmail.com
# Uses your actual Chrome with existing login sessions
browser-use state   # Already logged in!

Tips

  1. Always run browser-use state first to see available elements and their indices
  2. Use --headed for debugging to see what the browser is doing
  3. Sessions persist - the browser stays open between commands
  4. Use --json for parsing output programmatically
  5. Python variables persist across browser-use python commands within a session
  6. Real browser mode preserves your login sessions and extensions
  7. CLI aliases: bu, browser, and browseruse all work identically to browser-use

Troubleshooting

Browser won't start?

bash
browser-use install               # Install/reinstall Chromium
browser-use server stop           # Stop any stuck server
browser-use --headed open <url>   # Try with visible window

Element not found?

bash
browser-use state         # Check current elements
browser-use scroll down   # Element might be below fold
browser-use state         # Check again
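
The check, scroll, check-again sequence can be wrapped in a bounded retry. The sketch below assumes a caller-supplied run callable that executes browser-use with the given arguments and returns its output; find_after_scrolling and its plain substring match are illustrative choices, not part of the CLI.

```python
from typing import Callable, Sequence

# A runner takes the arguments that follow "browser-use" and returns stdout.
Runner = Callable[[Sequence[str]], str]


def find_after_scrolling(run: Runner, needle: str, max_scrolls: int = 5) -> bool:
    """Return True once `needle` appears in the state output, scrolling down between checks."""
    for attempt in range(max_scrolls + 1):
        if needle in run(["state"]):
            return True
        if attempt < max_scrolls:
            run(["scroll", "down"])  # the element might be below the fold
    return False
```

The scroll limit prevents an endless loop on pages that keep loading content.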

Session issues?

bash
browser-use sessions      # Check active sessions
browser-use close --all   # Clean slate
browser-use open <url>    # Fresh start

Cleanup

Always close the browser when done. Run this after completing browser automation:

bash
browser-use close
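
In a script, a context manager is one way to make sure the close step runs even when an earlier command fails. This is a minimal sketch: browser_session is a hypothetical helper, and run is any callable that executes browser-use with the given arguments.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

# A runner takes the arguments that follow "browser-use" and returns stdout.
Runner = Callable[[Sequence[str]], str]


@contextmanager
def browser_session(run: Runner, url: str) -> Iterator[Runner]:
    """Open `url`, hand control to the caller, and always close the browser afterwards."""
    run(["open", url])
    try:
        yield run
    finally:
        run(["close"])  # runs even if the body of the with-block raises
```

Usage would look like: with browser_session(run, "https://example.com") as bu: bu(["state"]).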
