Browser Use

Overview of the Browser Use book.

The following documentation is for Browser Use v1.13.2 (BDK).

Overview

The Browser Use book enables you to perform web automation using natural language. Use plain English to describe a task, and a remote browser agent will execute it.

The browser agent can perform a range of web automation tasks, including navigating pages, filling out forms, and extracting structured data, all with full traceability. Every action is logged with detailed step plans and unique session IDs.

Prerequisites

1. Required Books

The following Book(s) need to be added to your agent so it can learn and understand the automation procedures defined within them:

Browser Use

How to Add the Book(s)

1. Go to Books → All Books.
2. Search for the name of the book and click on it.
3. Click Install or Add Connection to add the book to your agent. If adding a connection, you'll be prompted for connectivity details.

Create a new playground after adding the book so the agent can recognize its procedures.

Connectivity

This section outlines the available methods for connecting to the Book, along with the required configuration details for each.

The Browser Use book can be used with or without credentials.

Getting Started

Follow these steps to write an automation that performs a web-based task:

Provision a Browser

The first step is to provision a browser, which sets up a browser instance for the automation to control. Every browser automation task requires this step. In the editor, write:

provision a browser

You can also name the browser instance using "as". This allows you to run multiple browser sessions or refer to a specific one. In this example, the browser instance is named x:

provision a browser as x

Prepare to Perform a Task

On the following line, write this procedure to send the task to a remote browser agent:

perform <the task> on the browser and get the visual log, the detailed plan, the result, the browser run id and the browser files

If you assigned a name to the browser instance, use it in place of the browser:

perform <the task> on x and get the visual log, the detailed plan, the result, the browser run id and the browser files

Create a Web Automation Task

Replace <the task> in the previous line with a Browser Automation widget:

In the editor, type / to open the widget menu and select Browser Automation.

Don't see the widget in your menu? Contact our support team to have it enabled in your agent.

Within the widget, select Choose an Existing Task or configure a new task by providing the following details. Then, click Save to finalize it.

Task Name

A short, descriptive name for the task (e.g., Check Website for Updates).

Task Instructions

Describe what you want the browser to do in plain English (e.g., Go to example.com, log in, and check for new messages).

Output Format (Optional)

Define how you want the results formatted. Key-value pairs are recommended for clarity (e.g., {"key": "value"}). If you specify a format, the agent formats the result accordingly.

Credentials (Optional)

If your task requires access to a secure website or account, provide the necessary credentials (e.g., username and password). You can select from existing connections in your agent or add a new connection.

Run Your Automation

After provisioning the browser and defining your task, you’re ready to run the automation. The browser agent will carry out the steps and return the following:

Visual Log

A screen-by-screen view of what happened in the browser.

Detailed Plan

The step-by-step breakdown of actions taken.

Result

The output of the task (e.g., extracted data).

Browser Run ID

A unique identifier for the automation run.

Browser Files

A list of files the agent generated while completing the browser automation task.

Learn how to work with data and files in browser automation tasks.

Best Practices

Follow these guidelines when configuring the Browser Automation widget to write clear and accurate task instructions for the browser agent.

1. Use Sequential Task Instructions

Write task instructions as a step-by-step script, where each line represents a distinct action in the workflow. For example:

1. Go to https://salesportal.com
2. Log in using stored credentials
3. Click the "Reports" tab
4. Select "Quarterly Sales Summary"
5. Click the "Download Report" button
6. Save the downloaded file to the workspace
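
If the steps are assembled programmatically before being pasted into the Task Instructions field, the numbered layout above can be sketched as follows. This is only an illustration in Python: `number_steps` is a hypothetical helper, not part of the Browser Use book, and the step text is taken from the example above.

```python
# Hypothetical helper for illustration; it is not part of the Browser Use book.
def number_steps(steps):
    """Return the steps as a numbered script, one action per line."""
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


steps = [
    "Go to https://salesportal.com",
    "Log in using stored credentials",
    'Click the "Reports" tab',
]

# Each list entry becomes one distinct, numbered action.
instructions = number_steps(steps)
print(instructions)
```

Keeping one action per list entry makes it easy to reorder or remove a step without renumbering by hand.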

2. Define the Output Format

Specify the Output Format so the data is returned in the format you expect. For example:

JSON in the format {"city": "temperature"}
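
A defined format also makes the result easy to check downstream. The sketch below assumes the Result arrives as JSON text mapping each city to a temperature. `parse_city_temperatures` and the sample string are made up for illustration and do not come from a real run or from the book's API.

```python
import json


def parse_city_temperatures(result_text):
    """Parse text expected to look like {"city": "temperature"}.

    Hypothetical helper: raises ValueError if the shape is not a JSON
    object whose values are strings or numbers.
    """
    data = json.loads(result_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of city -> temperature")
    for city, temperature in data.items():
        is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
        if not (isinstance(temperature, str) or is_number):
            raise ValueError(f"unexpected temperature value for {city!r}")
    return data


# Made-up sample standing in for the Result value of a run.
sample_result = '{"Austin": "31 C", "Oslo": "12 C"}'
temperatures = parse_city_temperatures(sample_result)
print(temperatures["Oslo"])
```

Rejecting an unexpected shape early is one way to notice when the task instructions and the requested Output Format have drifted apart.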

3. Be Precise

Refer to webpage elements using their exact visible text or label to prevent ambiguity.

✅ Example

Click the 'Download Report' button.
Type the data into the field labeled 'Customer ID'.

❌ Instead of

Click the button.
Find the table.

4. Use Dynamic Input

Use facts to dynamically pass data into your task, rather than hardcoding values.

✅ Example

Search for the client profile that has the name {the client name}.

❌ Instead of

Search for "John Doe".
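
To make the idea concrete, here is a rough Python sketch of placeholder substitution. How the platform actually resolves facts is not described here, so `fill_placeholders` and the sample fact value are assumptions used only to show why a placeholder keeps the instruction reusable.

```python
import re


def fill_placeholders(template, facts):
    """Replace each {name} placeholder with the matching fact value.

    Hypothetical helper: raises KeyError when a placeholder has no fact.
    """
    def lookup(match):
        name = match.group(1)
        if name not in facts:
            raise KeyError(f"no fact named {name!r}")
        return str(facts[name])

    # Match text between single braces, e.g. {the client name}.
    return re.sub(r"\{([^{}]+)\}", lookup, template)


template = "Search for the client profile that has the name {the client name}."
filled = fill_placeholders(template, {"the client name": "Jane Smith"})
print(filled)
```

The same instruction text can then serve any client, because only the fact value changes between runs.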


