# Agent Settings

> Learn how to configure the agent

## Overview

The `Agent` class is the core component of Browsernode that handles browser automation. Here are the main configuration options you can use when initializing an agent.

## Basic Settings

```js theme={null}
import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

// Initialize the model
const llm = new ChatOpenAI({
	model: "gpt-4o",
	apiKey: process.env.OPENAI_API_KEY,
});
// Create the agent with the task and model, then run it
const task = "Your task here";
const agent = new Agent(task, llm);
await agent.run();
```

### Required Parameters

* `task`: The instruction for the agent to execute
* `llm`: A LangChain chat model instance. See <a href="/customize/supported-models">LangChain Models</a> for supported models.

## Agent Behavior

Control how the agent operates:

```js theme={null}
import { ChatOpenAI } from "browsernode/llm";
import { Agent, Controller } from "browsernode";

const customController = new Controller();

const llmClient = new ChatOpenAI({
	model: "gpt-4o",
	apiKey: process.env.OPENAI_API_KEY,
});

const agent = new Agent("your task", llmClient, {
	controller: customController,
	useVision: true,
	saveConversationPath: "logs/conversation",
});
```

### Behavior Parameters

* `controller`: Registry of functions the agent can call. Defaults to the base `Controller`. See <a href="/customize/custom-functions">Custom Functions</a> for details.
* `useVision`: Enable or disable vision capabilities. Defaults to `true`.
  * When enabled, the model processes visual information from web pages
  * Disable to reduce costs or use models without vision support
  * For GPT-4o, image processing costs approximately 800-1000 tokens (\~\$0.002 USD) per image, depending on the configured screen size
* `saveConversationPath`: Path to save the complete conversation history. Useful for debugging.
* `systemPromptClass`: Custom system prompt class. See <a href="/customize/system-prompt">System Prompt</a> for customization options.
* `overrideSystemMessage`: Completely replace the default system prompt with a custom one.
* `extendSystemMessage`: Add additional instructions to the default system prompt.

<Note>
  Vision capabilities are recommended for better web interaction understanding,
  but can be disabled to reduce costs or when using models without vision
  support.
</Note>
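
To get a feel for what vision adds to a run, the per-image figures above can be turned into a rough estimate. This is a back-of-the-envelope sketch, not a Browsernode API: `estimateVisionCost` is a hypothetical helper, and it assumes one screenshot per step, which may not match how the agent actually captures images.

```js
// Hypothetical helper: rough vision overhead for a run, using the
// GPT-4o figures quoted above (800-1000 tokens, ~$0.002 per image).
// Assumption: the agent sends one screenshot per step.
function estimateVisionCost(steps, imagesPerStep = 1) {
	const images = steps * imagesPerStep;
	return {
		minTokens: images * 800,
		maxTokens: images * 1000,
		approxUsd: images * 0.002,
	};
}

// A 100-step run
const estimate = estimateVisionCost(100);
console.log(estimate.minTokens, estimate.maxTokens); // 80000 100000
console.log(estimate.approxUsd.toFixed(2)); // 0.20
```

Actual costs depend on the screen size and on the model's current pricing, so treat the result as an order-of-magnitude guide.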

### Reuse Existing Browser Context

By default, Browsernode launches its own built-in browser using Playwright's Chromium.
You can also connect to a remote browser or pass any of the following
existing Playwright or Browsernode objects to the `Agent`: `page`, `browserContext`, `browser`, `browserSession`, or `browserProfile`.

These are all passed down to create a `BrowserSession` for the `Agent`:

```js theme={null}
// Schematic: `llm`, `browserProfile`, `page`, `browserContext`, and `browser`
// are assumed to exist already. Each `...` stands for a value you supply.
// Assumes the options-object constructor form used elsewhere on this page.
const agent = new Agent({
	task: "book a flight to fiji",
	llm: llm,
	browserProfile: browserProfile, // use this profile to create a BrowserSession
	browserSession: new BrowserSession({ // use an existing BrowserSession
		// Set one of the following options:
		// cdpUrl: ...,         // remote CDP browser to connect to
		// wssUrl: ...,         // remote wss playwright server provider
		// browserPid: ...,     // pid of a locally running browser process to attach to
		// executablePath: ..., // provide a custom chrome binary path
		// channel: ...,        // specify chrome, chromium, ms-edge, etc.
		// page: page,          // use an existing playwright Page object
		// browserContext: browserContext, // use an existing playwright BrowserContext object
		browser: browser, // use an existing playwright Browser object
	}),
});
```

For example, to connect to an existing browser over CDP you could do:

```js theme={null}
const agent = new Agent({
	//...
	browserSession: new BrowserSession({ cdpUrl: "http://localhost:9222" }),
});
```

Similarly, to attach to a locally running Chrome instance by its process ID:

```js theme={null}
const agent = new Agent({
	//...
	browserSession: new BrowserSession({ browserPid: 1234 }),
});
```

See <a href="/customize/real-browser">Connect to your Browser</a> for more info.

<Note>
  You can reuse the same `BrowserSession` after an agent has finished running.
  If you do nothing, the browser is closed automatically when `run()`
  completes, but only if Browsernode launched it.
</Note>

## Running the Agent

The agent is executed using the async `run()` method:

* `maxSteps` (default: `100`)
  Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.
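
The effect of a step cap can be illustrated without a browser. The sketch below is a hypothetical stand-in for an agent loop, not Browsernode's implementation: `runWithStepLimit` and the `step` callback are names invented for this illustration.

```js
// Hypothetical stand-in for an agent loop; not Browsernode's implementation.
// `step(n)` performs step n and returns true once the task is finished.
function runWithStepLimit(step, maxSteps = 100) {
	for (let n = 1; n <= maxSteps; n++) {
		if (step(n)) {
			return { stepsTaken: n, finished: true };
		}
	}
	// The cap was reached before the task finished
	return { stepsTaken: maxSteps, finished: false };
}

// A task that never finishes is cut off at the cap instead of looping forever
console.log(runWithStepLimit(() => false, 5)); // { stepsTaken: 5, finished: false }

// A task that finishes on step 3 stops early
console.log(runWithStepLimit((n) => n === 3)); // { stepsTaken: 3, finished: true }
```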

## Agent History

The `run()` method returns an `AgentHistoryList` object containing the complete execution history. This history is useful for debugging, analysis, and creating reproducible scripts.

```js theme={null}
// Example of accessing history
const history = await agent.run();

// Access (some) useful information
history.urls();             // List of visited URLs
history.screenshots();      // List of screenshot paths
history.actionNames();      // Names of executed actions
history.extractedContent(); // Content extracted during execution
history.errors();           // Any errors that occurred
history.modelActions();     // All actions with their parameters
```

The `AgentHistoryList` provides many helper methods to analyze the execution:

* `finalResult()`: Get the final extracted content
* `isDone()`: Check if the agent completed successfully
* `hasErrors()`: Check if any errors occurred
* `modelThoughts()`: Get the agent's reasoning process
* `actionResults()`: Get results of all actions

<Note>
  For a complete list of helper methods and detailed history analysis
  capabilities, refer to the [AgentHistoryList source code](https://github.com/leoning60/browsernode/blob/main/src/agent/views.ts).
</Note>
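
The helper methods above combine naturally into a short post-run report. The following is a minimal sketch: `summarizeRun` is a hypothetical helper, and it assumes that `urls()` and `errors()` return arrays and that `errors()` may contain empty entries for steps that succeeded. A stub object stands in for a real `AgentHistoryList` so the snippet runs on its own.

```js
// Hypothetical helper built on the AgentHistoryList methods listed above.
// Assumption: urls() and errors() return arrays, and errors() may hold
// empty entries (null) for steps that succeeded.
function summarizeRun(history) {
	const errors = history.errors().filter(Boolean);
	return {
		done: history.isDone(),
		visited: history.urls().length,
		errorCount: errors.length,
		result: history.finalResult(),
	};
}

// Stub with the same method names, standing in for `await agent.run()`
const fakeHistory = {
	isDone: () => true,
	urls: () => ["https://example.com", "https://example.com/about"],
	errors: () => [null, null],
	finalResult: () => "Example Domain",
};

console.log(summarizeRun(fakeHistory));
// { done: true, visited: 2, errorCount: 0, result: 'Example Domain' }
```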

## Run initial actions without LLM

You can run initial actions before the LLM takes over, as shown in [this example](https://github.com/leoning60/browsernode/blob/main/examples/features/initial_actions.ts).
Specify each action as an object whose key is the action name and whose value is the action's parameters. All available actions are defined in the [Controller](https://github.com/leoning60/browsernode/blob/main/src/controller/service.ts) source code.

```js theme={null}
import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

const initialActions = [
	{ openTab: { url: "https://search.brave.com" } },
	{
		openTab: {
			url: "https://www.anthropic.com/engineering/building-effective-agents",
		},
	},
	{ scrollDown: { amount: 5000 } },
];

const llm = new ChatOpenAI({
	model: "gpt-4o",
	temperature: 0.0,
	apiKey: process.env.OPENAI_API_KEY,
});

const task = "What theories are displayed on the page?";
const agent = new Agent(task, llm, { initialActions: initialActions });
console.log("---initial_actions.ts agent run---");
await agent.run();
```
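
Because each initial action is a plain object keyed by its action name, a malformed entry is easy to catch before the agent starts. The check below is a sketch only: `validateInitialActions` is a hypothetical helper, not part of Browsernode, and it does not verify that the action name exists in the Controller.

```js
// Hypothetical pre-flight check; not part of Browsernode.
// Each initial action must be an object with exactly one key (the action
// name) whose value is an object of parameters.
function validateInitialActions(actions) {
	return actions.map((action, index) => {
		const names = Object.keys(action);
		if (names.length !== 1) {
			throw new Error(`Action ${index} must have exactly one action name`);
		}
		const [name] = names;
		const params = action[name];
		if (params === null || typeof params !== "object") {
			throw new Error(`Parameters for "${name}" must be an object`);
		}
		return name;
	});
}

const actionNames = validateInitialActions([
	{ openTab: { url: "https://search.brave.com" } },
	{ scrollDown: { amount: 5000 } },
]);
console.log(actionNames); // [ 'openTab', 'scrollDown' ]
```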

## Run with message context

Use `messageContext` to give the LLM additional information about the task, separate from the task instruction itself.

```js theme={null}
import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

const llm = new ChatOpenAI({
	model: "gpt-4o",
	temperature: 0.0,
	streaming: true,
	apiKey: process.env.OPENAI_API_KEY,
});

const task = "your task here";
const agent = new Agent({
	task: task,
	llm: llm,
	messageContext: "Additional information about the task",
});
await agent.run();
```

## Run with planner model

You can configure the agent to use a separate planner model for high-level task planning:

```js theme={null}
import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

const llm = new ChatOpenAI({
	model: "gpt-4o",
	temperature: 0.0,
	apiKey: process.env.OPENAI_API_KEY,
});

const plannerLLM = new ChatOpenAI({
	model: "o3-mini",
	apiKey: process.env.OPENAI_API_KEY,
});

const task = "your task here";
const agent = new Agent({
	task: task,
	llm: llm,
	plannerLLM: plannerLLM, // separate model for high-level planning
	useVisionForPlanner: false, // the planner works from text only
	plannerInterval: 4, // plan every 4 steps
});
await agent.run();
```

### Planner Parameters

* `plannerLLM`: A LangChain chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
* `useVisionForPlanner`: Enable or disable vision capabilities for the planner model. Defaults to `true`.
* `plannerInterval`: Number of steps between planning phases. Defaults to `1`.

Using a separate planner model can help:

* Reduce costs by using a smaller model for high-level planning
* Improve task decomposition and strategic thinking
* Better handle complex, multi-step tasks

<Note>
  The planner model is optional. If you omit `plannerLLM`, the agent runs without a planner.
</Note>
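
To see how `plannerInterval` affects the number of planner calls, consider the sketch below. It is an illustration under a stated assumption, namely that the planner runs whenever the 1-based step number is a multiple of `plannerInterval`. Browsernode's actual scheduling may differ, and `plannerSteps` is a hypothetical helper.

```js
// Illustration only: assumes the planner runs whenever the 1-based step
// number is a multiple of plannerInterval. Browsernode's actual
// scheduling may differ.
function plannerSteps(totalSteps, plannerInterval = 1) {
	const steps = [];
	for (let step = 1; step <= totalSteps; step++) {
		if (step % plannerInterval === 0) steps.push(step);
	}
	return steps;
}

console.log(plannerSteps(12, 4)); // [ 4, 8, 12 ]
console.log(plannerSteps(100, 4).length); // 25
console.log(plannerSteps(100).length); // 100 (default interval: every step)
```

Under that assumption, raising the interval from `1` to `4` cuts planner calls in a 100-step run from 100 to 25, which is where most of the cost saving would come from.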

### Optional Parameters

* `messageContext`: Additional information about the task to help the LLM understand the task better.
* `initialActions`: List of initial actions to run before the main task.
* `maxActionsPerStep`: Maximum number of actions to run in a step. Defaults to `10`.
* `maxFailures`: Maximum number of failures before giving up. Defaults to `3`.
* `retryDelay`: Time to wait between retries in seconds when rate limited. Defaults to `10`.
* `generateGif`: Enable or disable GIF generation. Defaults to `false`. Set to `true` or a string path to save the GIF.
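
For reference, the documented defaults can be written out as a plain options object and selectively overridden. This is a sketch: the `defaultOptions` name and the GIF path are illustrative, and you do not need to pass defaults explicitly in practice.

```js
// Documented defaults for the optional parameters above
const defaultOptions = {
	maxActionsPerStep: 10,
	maxFailures: 3,
	retryDelay: 10, // seconds
	generateGif: false,
};

// Override only what you need; the rest keeps its documented default
const options = {
	...defaultOptions,
	maxFailures: 5,
	generateGif: "logs/run.gif", // hypothetical output path
};

console.log(options.maxActionsPerStep, options.maxFailures); // 10 5
```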

## Memory

Memory management in Browsernode has been significantly improved. The agent's context handling and state management are now robust enough that the previous memory system (`mem0`) is no longer needed or supported.

The agent maintains its context and task progress through:

* Detailed history tracking of actions and results
* Structured state management
* Clear goal setting and evaluation at each step

The `enableMemory` parameter has been removed as the new system provides better context management by default.

<Note>
  If you're upgrading from an older version that used `enableMemory`, simply remove this parameter. The agent will automatically use the improved context management system.
</Note>
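
If your options are built in one place, the upgrade can be as small as dropping the key before constructing the agent. A minimal sketch, where `stripEnableMemory` is a hypothetical helper name:

```js
// Hypothetical migration helper: drops the removed `enableMemory` option
// and leaves every other setting untouched.
function stripEnableMemory(legacyOptions) {
	const { enableMemory, ...rest } = legacyOptions;
	return rest;
}

const upgraded = stripEnableMemory({ useVision: true, enableMemory: true });
console.log(upgraded); // { useVision: true }
```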
