# Browser Use

{% hint style="info" %}
The following documentation is for **Browser Use v1.14.1** *(BDK)*.
{% endhint %}

## Overview

The **Browser Use** book enables you to perform **web automation** using natural language. Use plain English to describe a task, and a remote browser agent will execute it.

The browser agent can perform a range of web automation tasks, including navigating pages, filling out forms, and extracting structured data, all with full traceability. Every action is logged with detailed step plans and unique session IDs. Take a look at this quick example:

{% embed url="<https://app.supademo.com/demo/cmfrasdf107h810k8ggp5fqyk>" %}

## Prerequisites

### 1. Required Books

The following Book(s) need to be added to your agent so it can learn and understand the automation procedures defined within them:

* **Browser Use**

#### How to Add the Book(s)

1. Go to **Books** → **All Books**.
2. Search for the name of the book and click on it.
3. Click on <kbd>**Install**</kbd> or <kbd>**Add Connection**</kbd> to add the book to your agent.
4. If adding a connection, you'll be prompted for [**connectivity**](#connectivity) details.

## Procedures

### to close a browser

Close a browser instance.

**Input Concepts**

| Concept                        | Description                                    | Type               | Required | Default Value |
| ------------------------------ | ---------------------------------------------- | ------------------ | -------- | ------------- |
| [`browser`](#browser-instance) | A reference to the browser instance to close.  | `browser instance` | Yes      | (no default)  |

**Output Concepts**

| Concept  | Description                              | Type      |
| -------- | ---------------------------------------- | --------- |
| `answer` | Whether the browser instance was closed. | `boolean` |

**Examples**

Close a browser instance:

```generic
close the browser
```

### to perform a task on a browser and get the visual log, the detailed plan, the result, the browser run id and the browser files

Execute a browser-based web automation task.

Note: Before using this procedure, you must first "provision a browser" for your automation task.

**Input Concepts**

| Concept                        | Description                                                                           | Type               | Required | Default Value |
| ------------------------------ | ------------------------------------------------------------------------------------- | ------------------ | -------- | ------------- |
| [`browser`](#browser-instance) | The reference to the provisioned browser instance.                                    | `browser instance` | Yes      | (no default)  |
| [`task`](#browser-task)        | An object containing task configuration (created with the Browser Automation widget). | `browser task`     | Yes      | (no default)  |

**Output Concepts**

| Concept          | Description                                                           | Type   |
| ---------------- | --------------------------------------------------------------------- | ------ |
| `visual log`     | A series of screenshots showing each step the browser took.           | `file` |
| `detailed plan`  | A text summary of the steps taken to automate the task.               | `text` |
| `result`         | The data extracted or produced by the agent upon completing the task. | `any`  |
| `browser run id` | The identifier of the browser run.                                    | `text` |
| `browser files`  | A list of files generated by the agent upon completing the task.      | `file` |

**Examples**

Get the Weather in Philadelphia. Task instructions: "Search for the 'current weather in Philadelphia' and extract the temperature"

```generic
provision a browser
perform <the task> on the browser and get the visual log, the detailed plan, the result, the browser run id and the browser files
```

Get Laptop Prices from Amazon. Task instructions: "Go to www.amazon.com and search for 'laptop'. Look at the first 5 product listings on the results page. For each product, copy the product name and its price. Show me the list of product names with their prices."

```generic
provision a browser
perform <the task> on the browser and get the visual log, the detailed plan, the result, the browser run id and the browser files
```

### to provision a browser

Provision a new Kubernetes (K8s) browser instance for a web automation task.

**Output Concepts**

| Concept                       | Description                                      | Type               |
| ----------------------------- | ------------------------------------------------ | ------------------ |
| [`answer`](#browser-instance) | A reference to the provisioned browser instance. | `browser instance` |

**Examples**

Provision a browser and get the reference to the browser instance:

```generic
provision a browser
```

### to provision an aws browser

Provision a new AWS AgentCore browser instance for a web automation task.

**Output Concepts**

| Concept                       | Description                                      | Type               |
| ----------------------------- | ------------------------------------------------ | ------------------ |
| [`answer`](#browser-instance) | A reference to the provisioned browser instance. | `browser instance` |

**Examples**

Provision an AWS browser and get the reference to the browser instance:

```generic
provision an aws browser
```

### to run a playwright script on a browser and get the result, the browser run id and the browser files

Execute a custom Playwright script on the provisioned browser.

Note: Before using this procedure, you must first "provision a browser". The script must:

1. Be stored in S3.
2. Contain an async function: `run(page, context, browser, params, llm=None)`.
3. Be accessible using the browser farm's AWS credentials.

**Input Concepts**

| Concept                        | Description                                                                  | Type               | Required | Default Value |
| ------------------------------ | ---------------------------------------------------------------------------- | ------------------ | -------- | ------------- |
| [`browser`](#browser-instance) | Reference to the provisioned browser instance.                               | `browser instance` | Yes      | (no default)  |
| `playwright script`            | S3 URI of the Playwright script (e.g., "s3://bucket/scripts/my\_script.py"). | `text`             | Yes      | (no default)  |
| `parameter json`               | Optional dictionary of parameters to pass to the script.                     | `json`             | No       | (no default)  |

**Output Concepts**

| Concept          | Description                              | Type   |
| ---------------- | ---------------------------------------- | ------ |
| `result`         | Data returned by the script execution.   | `any`  |
| `browser run id` | Identifier of the browser run.           | `text` |
| `browser files`  | Files generated during script execution. | `file` |

**Examples**

Run a custom data extraction script with parameters:

```generic
provision a browser
the script s3 uri is "s3://my-bucket/scripts/123456789/weather_search.py"
run the script s3 uri on the browser with parameters {"location": "San Francisco, CA"} and get the result, the browser run id and the browser files
```
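
For orientation, here is a minimal sketch of what a script such as `weather_search.py` might contain. Only the `run(page, context, browser, params, llm=None)` signature comes from this page. The search URL, the `location` key, its fallback value, and the shape of the returned dictionary are illustrative assumptions, as is the idea that the return value becomes the `result` output. The `page.goto()` and `page.title()` calls are standard Playwright async `Page` methods.

```python
# Hypothetical contents of a script uploaded to S3 (e.g. weather_search.py).
# Assumption: the browser farm calls run() with live Playwright objects and
# with the dictionary supplied as `parameter json`.
from urllib.parse import quote_plus


async def run(page, context, browser, params, llm=None):
    # Read the parameter passed in from the automation; the key is illustrative.
    location = (params or {}).get("location", "Philadelphia, PA")

    # Build a placeholder search URL (example.com stands in for a real site).
    query = quote_plus(f"current weather in {location}")
    url = "https://www.example.com/search?q=" + query

    # Standard Playwright async Page calls.
    await page.goto(url)
    title = await page.title()

    # Assumption: the returned value is surfaced as the `result` output concept.
    return {"location": location, "page_title": title}
```

The script imports nothing from Playwright because the `page`, `context`, and `browser` objects are passed in by the caller. This also makes the function easy to exercise locally with a stand-in page object before uploading it.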

## Concepts

### Browser instance

Represents attributes of an instance with browser support.

| Field Name     | Description                                                    | Type             |
| -------------- | -------------------------------------------------------------- | ---------------- |
| `browser_name` | The name of the browser instance (pod name or AWS session ID). | `text`           |
| `vnc`          | The Virtual Network Computing (VNC) URL to access the browser. | `text`           |
| `browser_type` | The type of browser backend ("k8s" or "aws").                  | `optional[text]` |

### Browser task

Represents a browser automation task.

| Field Name             | Description                                                                                                                                                                                | Type                     |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------ |
| `instructions`         | A natural language description of the task for the browser agent to perform (e.g., "Log into my account on example.com and check for new messages").                                       | `text`                   |
| `browser_task_id`      | Identifier for the browser task.                                                                                                                                                           | `text`                   |
| `output`               | The desired structure or format of the results. It is recommended to use key-value pairs for clarity (e.g., {"key": "value"}). If specified, the agent will format the result accordingly. | `optional[text]`         |
| `worker_id`            | Identifier for the Kognitos worker executing the task.                                                                                                                                     | `text`                   |
| `line_id`              | Identifier for the specific line/step in the Kognitos automation.                                                                                                                          | `text`                   |
| `browser_task_version` | Version of the browser task.                                                                                                                                                               | `text`                   |
| `task_context`         | Optional additional context or data to assist in completing the task. Contains the resolved reference facts for the task.                                                                  | `optional[json]`         |
| `credential_names`     | The names of the authentication credentials needed for the task.                                                                                                                           | `optional[list of text]` |
| `files`                | The files to be uploaded to the browser.                                                                                                                                                   | `optional[list of text]` |
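
To make the `output` recommendation concrete, the sketch below shows the two user-facing fields of a browser task written as a Python dictionary. This is purely illustrative: in practice the Browser Automation widget builds the task, and the key names inside `output` (`city`, `temperature`) are hypothetical. Because `output` is a text field, the key-value structure is serialized to a string here.

```python
import json

# Illustrative only: the Browser Automation widget normally creates this object.
task = {
    "instructions": (
        "Search for the 'current weather in Philadelphia' "
        "and extract the temperature"
    ),
    # Hypothetical key-value layout describing the desired result structure.
    "output": json.dumps({"city": "text", "temperature": "text"}),
}

print(task["output"])
```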

