Overview

The Agent class is the core component of Browsernode that handles browser automation. Here are the main configuration options you can use when initializing an agent.

Basic Settings

import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

// Initialize the model
const llm = new ChatOpenAI({
	model: "gpt-4o",
	apiKey: process.env.OPENAI_API_KEY,
});
// Create agent with the model
const task = "Your task here";
const agent = new Agent(task, llm);
await agent.run();

Required Parameters

  • task: The instruction for the agent to execute
  • llm: A LangChain chat model instance. See LangChain Models for supported models.

Agent Behavior

Control how the agent operates:
import { ChatOpenAI } from "browsernode/llm";
import { Agent, Controller } from "browsernode";

const customController = new Controller();

const llmClient = new ChatOpenAI({
	model: "gpt-4o",
	apiKey: process.env.OPENAI_API_KEY,
});

const agent = new Agent("your task", llmClient, {
	controller: customController,
	useVision: true,
	saveConversationPath: "logs/conversation",
});

Behavior Parameters

  • controller: Registry of functions the agent can call. Defaults to base Controller. See Custom Functions for details.
  • useVision: Enable/disable vision capabilities. Defaults to true.
    • When enabled, the model processes visual information from web pages
    • Disable to reduce costs or use models without vision support
    • For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image (but this depends on the defined screen size)
  • saveConversationPath: Path to save the complete conversation history. Useful for debugging.
  • systemPromptClass: Custom system prompt class. See System Prompt for customization options.
  • overrideSystemMessage: Completely replace the default system prompt with a custom one.
  • extendSystemMessage: Add additional instructions to the default system prompt.
Vision capabilities are recommended for better web interaction understanding, but can be disabled to reduce costs or when using models without vision support.
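For example, a sketch that disables vision to save costs and appends an instruction to the default system prompt via extendSystemMessage (the instruction text is illustrative):
const agent = new Agent("your task here", llm, {
	useVision: false, // skip screenshot processing to reduce token costs
	extendSystemMessage: "Prefer official documentation pages over blog posts when searching.",
});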

Reuse Existing Browser Context

By default, browsernode launches its own built-in browser using Playwright Chromium. You can also connect to a remote browser or pass any of the following existing Playwright objects to the Agent: page, browserContext, browser, browserSession, or browserProfile. These all get passed down to create a BrowserSession for the Agent:
const agent = new Agent({
	task: "book a flight to fiji",
	llm: llm,
	browserProfile: browserProfile,       // use this profile to create a BrowserSession
	browserSession: new BrowserSession({  // use an existing BrowserSession
		cdpUrl: ...,                        // remote CDP browser to connect to
		// or
		wssUrl: ...,                        // remote wss playwright server provider
		// or
		browserPid: ...,                    // pid of a locally running browser process to attach to
		// or
		executablePath: ...,                // provide a custom chrome binary path
		// or
		channel: ...,                       // specify chrome, chromium, ms-edge, etc.
		// or
		page: page,                         // use an existing playwright Page object
		// or
		browserContext: browserContext,     // use an existing playwright BrowserContext object
		// or
		browser: browser,                   // use an existing playwright Browser object
	}),
});
For example, to connect to an existing browser over CDP you could do:
const agent = new Agent({
	// ...
	browserSession: new BrowserSession({ cdpUrl: "http://localhost:9222" }),
});
For example, to connect to a locally running Chrome instance you can do:
const agent = new Agent({
	// ...
	browserSession: new BrowserSession({ browserPid: 1234 }),
});
See Connect to your Browser for more info.
You can reuse the same BrowserSession after an agent has completed running. If you do nothing, the browser is closed automatically when run() completes, but only if browsernode launched it in the first place.
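For example, a sketch that reuses one externally launched browser (connected over CDP, so browsernode will not close it) across two consecutive agents:
// Connect once, then hand the same session to consecutive agents
const session = new BrowserSession({ cdpUrl: "http://localhost:9222" });

const firstAgent = new Agent({ task: "open the pricing page", llm: llm, browserSession: session });
await firstAgent.run();

// The browser stays open because browsernode did not launch it,
// so a second agent can pick up where the first one left off
const secondAgent = new Agent({ task: "summarize the current page", llm: llm, browserSession: session });
await secondAgent.run();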

Running the Agent

The agent is executed using the async run() method:
  • maxSteps (default: 100): Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.
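For example, to cap a run at 50 steps, a minimal sketch (passing the step limit as the first argument to run() is an assumption; check your browsernode version):
// Cap the run at 50 steps instead of the default 100 (argument position is an assumption)
const history = await agent.run(50);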

Agent History

The method returns an AgentHistoryList object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.
// Example of accessing history
const history = await agent.run();

// Access (some) useful information
history.urls();              // List of visited URLs
history.screenshots();       // List of screenshot paths
history.actionNames();       // Names of executed actions
history.extractedContent();  // Content extracted during execution
history.errors();            // Any errors that occurred
history.modelActions();      // All actions with their parameters
The AgentHistoryList provides many helper methods to analyze the execution:
  • finalResult(): Get the final extracted content
  • isDone(): Check if the agent completed successfully
  • hasErrors(): Check if any errors occurred
  • modelThoughts(): Get the agent’s reasoning process
  • actionResults(): Get results of all actions
For a complete list of helper methods and detailed history analysis capabilities, refer to the AgentHistoryList source code.
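For instance, a short sketch that checks the outcome of a run with the history object from the example above:
// Inspect the outcome of the run captured in history
if (history.isDone() && !history.hasErrors()) {
	console.log("Final result:", history.finalResult());
	console.log("Agent reasoning:", history.modelThoughts());
} else {
	console.log("Run failed with errors:", history.errors());
}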

Run initial actions without the LLM

You can run initial actions before the LLM takes over. Specify each action as an object whose key is the action name and whose value is the action's parameters. You can find all available actions in the Controller source code.
import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

const initialActions = [
	{ openTab: { url: "https://search.brave.com" } },
	{
		openTab: {
			url: "https://www.anthropic.com/engineering/building-effective-agents",
		},
	},
	{ scrollDown: { amount: 5000 } },
];

const llm = new ChatOpenAI({
	model: "gpt-4o",
	temperature: 0.0,
	apiKey: process.env.OPENAI_API_KEY,
});

const task = "What theories are displayed on the page?";
const agent = new Agent(task, llm, { initialActions: initialActions });
console.log("---initial_actions.ts agent run---");
await agent.run();

Run with message context

You can configure the agent and provide a separate message to help the LLM understand the task better.
import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

const llm = new ChatOpenAI({
	model: "gpt-4o",
	temperature: 0.0,
	apiKey: process.env.OPENAI_API_KEY,
});

const task = "your task here";
const agent = new Agent({
	task: task,
	llm: llm,
	messageContext: "Additional information about the task",
});
await agent.run();

Run with planner model

You can configure the agent to use a separate planner model for high-level task planning:
import { ChatOpenAI } from "browsernode/llm";
import { Agent } from "browsernode";

const llm = new ChatOpenAI({
	model: "gpt-4o",
	temperature: 0.0,
	apiKey: process.env.OPENAI_API_KEY,
});

const plannerLLM = new ChatOpenAI({
	model: "o3-mini",
	apiKey: process.env.OPENAI_API_KEY,
});

const task = "your task here";
const agent = new Agent({
	task: task,
	llm: llm,
	plannerLLM: plannerLLM,
	useVisionForPlanner: false,
	plannerInterval: 4,
});
await agent.run();

Planner Parameters

  • plannerLLM: A LangChain chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
  • useVisionForPlanner: Enable/disable vision capabilities for the planner model. Defaults to true.
  • plannerInterval: Number of steps between planning phases. Defaults to 1.
Using a separate planner model can help:
  • Reduce costs by using a smaller model for high-level planning
  • Improve task decomposition and strategic thinking
  • Better handle complex, multi-step tasks
The planner model is optional; if omitted, the agent runs without a separate planning step.

Optional Parameters

  • messageContext: Additional information about the task to help the LLM understand the task better.
  • initialActions: List of initial actions to run before the main task.
  • maxActionsPerStep: Maximum number of actions to run in a step. Defaults to 10.
  • maxFailures: Maximum number of failures before giving up. Defaults to 3.
  • retryDelay: Time to wait between retries in seconds when rate limited. Defaults to 10.
  • generateGif: Enable/disable GIF generation of the run. Defaults to false. Set to true, or to a string path, to save the GIF.
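As a sketch, several of these options combined on one agent (the parameter values are illustrative, not recommendations):
const agent = new Agent("your task here", llm, {
	messageContext: "Additional information about the task",
	initialActions: [{ scrollDown: { amount: 500 } }],
	maxActionsPerStep: 5,
	maxFailures: 2,
	retryDelay: 15,
	generateGif: "logs/agent_run.gif",
});
await agent.run();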

Memory

Memory management in browsernode has been significantly improved. The agent’s context handling and state management are now robust enough that the previous memory system (mem0) is no longer needed or supported. The agent maintains its context and task progress through:
  • Detailed history tracking of actions and results
  • Structured state management
  • Clear goal setting and evaluation at each step
The enableMemory parameter has been removed as the new system provides better context management by default.
If you’re upgrading from an older version that used enableMemory, simply remove this parameter. The agent will automatically use the improved context management system.