Automate Browser

Snippbot’s browser tool gives agents full control over a Chromium instance via the Chrome DevTools Protocol (CDP). Agents can navigate, click, type, extract data, take screenshots, and record sessions for replay.

Choose your browser backend in Settings → Browser:

Backend            Description
Managed (default)  Headless Chromium launched and managed by Snippbot
Chrome             Use your already-installed Chrome browser
CDP                Connect to an existing Chrome via remote debugging port
Browserless        Connect to a browserless.io cloud instance
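
For the CDP backend, it can help to know how a remote debugging endpoint is discovered. A Chrome started with --remote-debugging-port serves a small JSON document at /json/version that includes the browser's WebSocket address. The sketch below shows that lookup. The default host, port 9222, and the function names are assumptions for illustration, not Snippbot settings.

```python
import json
from urllib.request import urlopen


def version_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Build the discovery URL served by Chrome's remote debugging port."""
    return f"http://{host}:{port}/json/version"


def websocket_url_from(payload: str) -> str:
    """Extract the browser-level WebSocket endpoint from a /json/version body."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def discover(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Query a running Chrome; requires that the debugging port is reachable."""
    with urlopen(version_url(host, port), timeout=5) as response:
        return websocket_url_from(response.read().decode("utf-8"))
```

Calling discover() only works while a Chrome instance is listening on the given port; the two helper functions can be used without a browser.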

In the chat interface, simply describe what you want:

Example prompt: “Navigate to github.com/trending and extract the top 10 repos as JSON”

Example prompt: “Go to my company’s Jira board and create a ticket: Title: Fix login timeout, Priority: High, Assignee: me”

Example prompt: “Search for ‘python async tutorial’ on YouTube and screenshot the first 5 results”

The agent uses browser tools internally — you see the results in the chat.

The full set of browser capabilities:

  • navigate — Go to a URL
  • go_back / go_forward — Browser history
  • reload — Reload current page
  • click — Click an element
  • double_click — Double-click
  • right_click — Right-click (context menu)
  • type_text — Type into a field
  • press_key — Press keyboard keys (Enter, Tab, Escape, etc.)
  • hover — Hover over an element
  • drag_and_drop — Drag from one element to another
  • select_option — Select from a dropdown
  • check / uncheck — Toggle checkboxes
  • scroll — Scroll by pixels
  • scroll_to_element — Scroll to bring an element into view
  • wait_for_selector — Wait for an element to appear
  • wait_for_navigation — Wait for page navigation to complete
  • wait_for_network_idle — Wait until there’s no network activity
  • screenshot — Screenshot of the viewport
  • screenshot_element — Screenshot of a specific element
  • screenshot_full_page — Full-page screenshot
  • pdf_export — Export page as PDF
  • get_text — Extract visible text
  • get_attribute — Get element attribute value
  • get_bounding_box — Get element position and size
  • is_visible / is_enabled — Check element state
  • element_count — Count matching elements
  • handle_dialog — Accept/dismiss JavaScript dialogs (auto-dismissed after 30s)
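
To make the capability list concrete, here is a rough sketch of how a goal such as the GitHub trending prompt might break down into a sequence of these tools. Only the tool names come from the list above. The step format, the argument names, and the ".repo-row" selector are hypothetical placeholders, not Snippbot's actual call schema.

```python
# Tool names are taken from the capability list; every argument name and
# the ".repo-row" selector are hypothetical placeholders.
CAPABILITIES = {
    "navigate",
    "wait_for_selector",
    "element_count",
    "get_text",
    "screenshot",
}

plan = [
    {"tool": "navigate", "args": {"url": "https://github.com/trending"}},
    {"tool": "wait_for_selector", "args": {"selector": ".repo-row"}},
    {"tool": "element_count", "args": {"selector": ".repo-row"}},
    {"tool": "get_text", "args": {"selector": ".repo-row"}},
    {"tool": "screenshot", "args": {}},
]


def unknown_tools(steps):
    """Return the names of any steps that are not in the capability set."""
    return [step["tool"] for step in steps if step["tool"] not in CAPABILITIES]
```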

Instead of CSS selectors, the agent can work with numbered element references from a DOM snapshot, which makes browser interaction more readable and less fragile.

The snapshot lists each interactive element with a numbered reference, its label, element type, and visibility state. For example, the agent might see a sign-in button as element 1, a username field as element 2, and a password field as element 3.

The agent then issues an instruction such as “type ‘user@example.com’ into element 2”, and the DOM snapshot engine resolves the reference to the correct element.
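
The numbered-reference idea can be sketched as a lookup table built from the snapshot. This is a minimal illustration, assuming a snapshot entry carries a label, element type, visibility flag, and some internal locator. The SnapshotElement class, its selector field, and the "element N" parsing are hypothetical, not the engine's real data model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotElement:
    ref: int        # numbered reference shown to the agent
    label: str
    kind: str       # element type, e.g. "button" or "input"
    visible: bool
    selector: str   # hypothetical internal locator kept by the engine


def build_snapshot(raw):
    """Number (label, kind, visible, selector) tuples starting at 1."""
    return {
        ref: SnapshotElement(ref, label, kind, visible, selector)
        for ref, (label, kind, visible, selector) in enumerate(raw, start=1)
    }


def resolve(instruction, snapshot):
    """Find the 'element N' reference in an instruction and look it up."""
    match = re.search(r"\belement (\d+)\b", instruction)
    if match is None:
        raise ValueError("instruction contains no element reference")
    ref = int(match.group(1))
    if ref not in snapshot:
        raise LookupError(f"no element {ref} in the current snapshot")
    return snapshot[ref]


snapshot = build_snapshot([
    ("Sign in", "button", True, "#sign-in"),
    ("Username", "input", True, "#username"),
    ("Password", "input", True, "#password"),
])
target = resolve("type 'user@example.com' into element 2", snapshot)
```

Because the agent only ever sees the number, the locator behind it can change between snapshots without the instruction text changing.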

Record a browser session and replay it later:

  1. Start recording — open the Browser page (/browser) and click Record in the toolbar

  2. Perform actions manually or let the agent control the browser

  3. Stop recording — the session is saved as a JSON file in the agent workspace

  4. Replay — open the saved session from the Browser page and click Replay in the toolbar

Recordings redact sensitive parameters (passwords, tokens) automatically.
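
One way such redaction could work is to walk the recorded JSON and mask values whose parameter names look sensitive. The sketch below assumes a hypothetical session layout (a "steps" list with "action" and "params") and a name-based rule; the real recording format and redaction logic may differ.

```python
# Hypothetical rule: a parameter is sensitive if its name contains one of
# these markers. The source only states that passwords and tokens are redacted.
SENSITIVE_MARKERS = ("password", "token")
PLACEHOLDER = "[REDACTED]"


def redact(value, key=""):
    """Return a copy of a JSON-like value with sensitive entries masked."""
    if key and any(marker in key.lower() for marker in SENSITIVE_MARKERS):
        return PLACEHOLDER
    if isinstance(value, dict):
        return {name: redact(item, name) for name, item in value.items()}
    if isinstance(value, list):
        # List items have no name of their own, so only nested keys matter.
        return [redact(item) for item in value]
    return value


session = {
    "steps": [
        {"action": "type_text", "params": {"selector": "#pw", "password": "hunter2"}},
        {"action": "navigate", "params": {"url": "https://example.com", "api_token": "abc123"}},
    ]
}
clean = redact(session)
```

The function builds new containers rather than editing in place, so the live session data stays usable while only the saved copy is masked.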

Test responsive designs and mobile behavior:

Example prompt: “Emulate an iPhone 14 Pro Max and take a screenshot of twitter.com”

Supported presets:

  • iPhone 14, iPhone 14 Pro Max, Pixel 7, Galaxy S23
  • iPad Pro 11
  • Desktop 1920x1080, Desktop 1280x720

Or ask the agent for a custom viewport:

Example prompt: “Emulate a 375x812 viewport at 3x scale with a mobile user agent and take a screenshot”
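
At the protocol level, a custom viewport like this typically maps to two CDP commands: Emulation.setDeviceMetricsOverride and Emulation.setUserAgentOverride. The sketch below only builds the JSON messages. How Snippbot issues them internally is not documented here, and the helper names and the user agent string are placeholders.

```python
import json
from itertools import count

_message_ids = count(1)  # CDP messages carry a caller-chosen integer id


def cdp_message(method, params):
    """Wrap a CDP method call in the JSON envelope sent over the WebSocket."""
    return {"id": next(_message_ids), "method": method, "params": params}


def emulate_viewport(width, height, scale, user_agent):
    """Build the messages for a mobile viewport with a custom user agent."""
    return [
        cdp_message(
            "Emulation.setDeviceMetricsOverride",
            {
                "width": width,
                "height": height,
                "deviceScaleFactor": scale,
                "mobile": True,
            },
        ),
        cdp_message("Emulation.setUserAgentOverride", {"userAgent": user_agent}),
    ]


# "ExampleMobile/1.0" is a placeholder, not a real browser user agent.
messages = emulate_viewport(375, 812, 3, "ExampleMobile/1.0")
wire_format = [json.dumps(message) for message in messages]
```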

Capture and mock network requests:

Example prompt: “Capture all API calls made by the page, then mock the /api/users endpoint to return test data”

The NetworkManager supports:

  • Request/response capture (HAR format)
  • Response mocking with custom status and body
  • Header injection
  • Request blocking
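
Response mocking can be pictured as a rule table consulted for each intercepted request. The sketch below matches a request path against a rule and builds parameters in the shape of CDP's Fetch.fulfillRequest, which expects a base64-encoded body. The MockRule class and helper functions are hypothetical; the NetworkManager's real interface is not shown in this page.

```python
import base64
import json
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class MockRule:
    path: str            # exact URL path to intercept, e.g. "/api/users"
    status: int = 200
    body: object = None  # any JSON-serializable value


def match_rule(url, rules):
    """Return the first rule whose path equals the request path, else None."""
    path = urlsplit(url).path
    for rule in rules:
        if rule.path == path:
            return rule
    return None


def fulfill_params(request_id, rule):
    """Build Fetch.fulfillRequest parameters for a matched rule."""
    raw = json.dumps(rule.body).encode("utf-8")
    return {
        "requestId": request_id,
        "responseCode": rule.status,
        "responseHeaders": [{"name": "Content-Type", "value": "application/json"}],
        "body": base64.b64encode(raw).decode("ascii"),
    }


rules = [MockRule("/api/users", 200, [{"id": 1, "name": "Test User"}])]
hit = match_rule("https://example.com/api/users?page=1", rules)
params = fulfill_params("req-1", hit)
```

Requests that match no rule would simply continue to the network unchanged.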

The agent can save and restore login state across sessions:

Example prompt: “Log into GitHub with my credentials, then save the session for reuse”

Auth state (cookies, localStorage, sessionStorage) is encrypted at rest using Fernet. On the next session, the agent restores the state to skip the login step.

The browser tool includes an SSRF guard that blocks:

  • Private and loopback IPv4 ranges (10.x, 192.168.x, 172.16–31.x, 127.x)
  • IPv6 loopback and link-local
  • Cloud metadata endpoints (169.254.169.254 for AWS, GCP, Alibaba)
  • Internal hostnames (.local, .internal)
  • Dangerous URL schemes (file://, ftp://, javascript:, gopher://)

Browser actions are also rate-limited:

  • 60 browser actions per 60 seconds (sliding window)
  • Actions that exceed the limit are queued with exponential backoff
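
The URL checks and the rate limit described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not Snippbot's implementation: the function and class names are hypothetical, the backoff base and cap are made-up values, and a production guard would also need to check the addresses a hostname resolves to.

```python
import ipaddress
from collections import deque
from urllib.parse import urlsplit

BLOCKED_SCHEMES = {"file", "ftp", "javascript", "gopher"}
BLOCKED_SUFFIXES = (".local", ".internal")


def is_blocked_url(url):
    """Apply scheme, hostname-suffix, and literal-IP checks to a URL."""
    parts = urlsplit(url)
    if parts.scheme.lower() in BLOCKED_SCHEMES:
        return True
    host = (parts.hostname or "").lower()
    if not host or host.endswith(BLOCKED_SUFFIXES):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch does not resolve DNS names.
        return False
    return address.is_private or address.is_loopback or address.is_link_local


class SlidingWindowLimiter:
    """Allow at most `limit` actions in any `window`-second span."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self._stamps = deque()

    def allow(self, now):
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff: base * 2**attempt seconds, capped (values assumed)."""
    return min(cap, base * 2 ** attempt)
```

Passing the current time into allow() rather than reading a clock inside it keeps the limiter easy to test; a caller would typically supply time.monotonic().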

Open the Browser page (/browser) in the Snippbot UI to see a live stream of what the agent is doing. The viewport updates at up to 2 frames/second by default.

When Auto-snapshot is enabled, Snippbot takes a screenshot at each step. This is invaluable for:

  • Debugging automation failures
  • Creating audit trails
  • Building replay recordings

Screenshots are saved to the agent’s workspace.

  • Use natural-language goals, not step-by-step instructions. The agent handles the specifics.
  • Enable auto-snapshot for complex workflows so you can see exactly where things went wrong.
  • Use cursor sync in the Browser UI to track the agent’s mouse position.
  • Start with managed Chromium, and switch to CDP only if you need an existing browser profile or extensions.