page.ai()

The page.ai() method executes multi-step browser tasks described in natural language. It runs an agentic loop that observes the page, decides on the next action, and executes it, repeating until the goal is complete or the maxSteps limit is reached.

agent.executeTask() does the same thing. It is a convenience method that creates a page for you. Use whichever fits your workflow.

Basic Usage

Execute a multi-step task on a specific page:

import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://flights.google.com");

const { output, actionCache } = await page.ai(
  "Search for round-trip flights from Miami to LAX, leaving Dec 15 and returning Dec 22"
);

console.log(output);
// The actionCache can be saved and replayed later
console.log(`Completed in ${actionCache.steps.length} steps`);

await agent.closeAgent();

Parameters

task

string

required

Natural language description of what you want to accomplish. Be specific for best results.

options

object

Configuration options for the task execution.

options.maxSteps

number

default: 50

Maximum number of actions the AI can take. Increase for complex tasks.

options.useDomCache

boolean

default: false

Reuse DOM snapshots across steps for faster execution.

options.enableVisualMode

boolean

default: false

Enable screenshots with element overlays for visual understanding.

options.enableDomStreaming

boolean

default: false

Stream DOM updates for more responsive execution.

options.outputSchema

ZodSchema

Define a Zod schema for structured output extraction at the end of the task.

Example with Options

import { z } from "zod";

const { output, actionCache } = await page.ai(
  "Find the cheapest flight from NYC to London next month",
  {
    maxSteps: 30,
    useDomCache: true,
    outputSchema: z.object({
      airline: z.string(),
      price: z.number(),
      departure: z.string(),
      arrival: z.string(),
    }),
  }
);

console.log(output); // Typed as { airline, price, departure, arrival }

agent.executeTask()

executeTask() is a shorthand that creates a page and runs page.ai() for you:

// This:
const result = await agent.executeTask("Go to amazon.com and find the top seller");

// Is equivalent to:
const page = await agent.newPage();
const result = await page.ai("Go to amazon.com and find the top seller");

It accepts the same parameters as page.ai():

With Output Schema

import { z } from "zod";

const result = await agent.executeTask(
  "Navigate to imdb.com, search for 'The Matrix', and extract the movie details",
  {
    outputSchema: z.object({
      director: z.string().describe("The name of the movie director"),
      releaseYear: z.number().describe("The year the movie was released"),
      rating: z.string().describe("The IMDb rating of the movie"),
    }),
  }
);

console.log(result.output);
// { director: "Lana Wachowski, Lilly Wachowski", releaseYear: 1999, rating: "8.7/10" }

Return Value

Both methods return a TaskOutput object:

interface TaskOutput {
  taskId: string;           // Unique identifier for this task
  status: TaskStatus;       // "completed" | "failed" | "cancelled"
  output: string | T;       // Result (or typed if outputSchema provided)
  steps: AgentStep[];       // Array of steps taken
  actionCache: ActionCacheOutput; // Recorded actions for replay
}

Task Status

Status	Description
`completed`	Task finished successfully
`failed`	Task encountered an error
`cancelled`	Task was cancelled before completion
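
The statuses above lend themselves to an exhaustive switch. The following is a minimal sketch, assuming the status is exposed as the three string values listed in the table; describeStatus is a hypothetical helper, not part of the library, and the local TaskStatus type is redeclared only to keep the snippet self-contained.

```typescript
// Assumption: the status is one of the three string values listed in the table.
type TaskStatus = "completed" | "failed" | "cancelled";

// Hypothetical helper: maps a status to a log-friendly message.
function describeStatus(status: TaskStatus): string {
  switch (status) {
    case "completed":
      return "Task finished successfully";
    case "failed":
      return "Task encountered an error";
    case "cancelled":
      return "Task was cancelled before completion";
    default: {
      // Exhaustiveness check: this assignment stops type-checking
      // if a new status is added to TaskStatus and not handled above.
      const unreachable: never = status;
      throw new Error(`Unknown status: ${String(unreachable)}`);
    }
  }
}

console.log(describeStatus("failed"));
```

The never assignment in the default branch is a plain TypeScript idiom: if the union grows, the compiler points at the switch that needs updating.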

Visual Mode

Enable visual mode when the AI needs to understand page layout or when dealing with complex visual elements:

const { output } = await page.ai(
  "Find the product image and describe what's shown",
  {
    enableVisualMode: true,
  }
);

Visual mode uses screenshots, which increases token usage and latency. Enable it only when visual understanding is necessary.

Real-World Examples

Flight Search

const page = await agent.newPage();
await page.goto("https://flights.google.com");

const { output, actionCache } = await page.ai(
  "Search for round-trip flights from Rio de Janeiro to Los Angeles, " +
  "leaving December 11, 2025 and returning December 22, 2025. " +
  "Select the option with the lowest carbon emissions.",
  {
    useDomCache: true,
    enableDomStreaming: true,
  }
);

// Save actionCache for later replay
console.log(JSON.stringify(actionCache, null, 2));

E-commerce Price Comparison

import { z } from "zod";

const result = await agent.executeTask(
  "Go to amazon.com, search for 'mechanical keyboard', and compare the top 3 results",
  {
    outputSchema: z.object({
      products: z.array(z.object({
        name: z.string(),
        price: z.number(),
        rating: z.number(),
        reviewCount: z.number(),
      })),
      recommendation: z.string(),
    }),
  }
);

console.log(result.output.products);
console.log(result.output.recommendation);

Google Form Submission

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://docs.google.com/forms/d/e/1FAIpQLScPkE8wNLpPSkP2d__Ee7xx5Pj7_XDuZ0p16geYWrp73Nutmw/viewform?usp=dialog");

// Fill each field
await page.perform("fill the name field with John Doe");
await page.perform("fill the email field with [email protected]");
await page.perform("fill the feedback text area with This is a test submission");
await page.perform("select 5 rating option");

// Submit
await page.perform("click the submit button");

await agent.closeAgent();

Action Cache Output

Every page.ai() call returns an actionCache that records all actions taken:

{
  "taskId": "abc-123",
  "createdAt": "2025-01-15T10:30:00Z",
  "status": "completed",
  "steps": [
    {
      "stepIndex": 0,
      "actionType": "actElement",
      "instruction": "Click the source location input",
      "method": "click",
      "arguments": [],
      "frameIndex": 0,
      "xpath": "/html/body/div[1]/input[1]",
      "success": true,
      "message": "Successfully clicked element"
    }
    // ... more steps
  ]
}

This cache can be saved and replayed later for deterministic execution without LLM calls.
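
Saving the cache can be as simple as writing it to a JSON file. The sketch below covers only persistence with Node's built-in fs module; it does not show the replay call itself, which is documented under Action Caching. ActionCacheLike, saveActionCache, and loadActionCache are hypothetical names, and the interface lists only a few of the fields from the JSON above.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumption: a minimal local shape based on the JSON above.
// The library's real ActionCacheOutput type may carry more fields.
interface ActionCacheLike {
  taskId: string;
  createdAt: string;
  status: string;
  steps: Array<{ stepIndex: number; actionType: string; instruction: string; success: boolean }>;
}

// Write the cache as pretty-printed JSON.
function saveActionCache(cache: ActionCacheLike, filePath: string): void {
  writeFileSync(filePath, JSON.stringify(cache, null, 2), "utf8");
}

// Read the cache back and do a basic sanity check on its shape.
function loadActionCache(filePath: string): ActionCacheLike {
  const parsed = JSON.parse(readFileSync(filePath, "utf8")) as ActionCacheLike;
  if (!Array.isArray(parsed.steps)) {
    throw new Error(`${filePath} does not look like an action cache (no steps array)`);
  }
  return parsed;
}

// Demo with sample data instead of a real page.ai() result.
const sampleCache: ActionCacheLike = {
  taskId: "abc-123",
  createdAt: "2025-01-15T10:30:00Z",
  status: "completed",
  steps: [
    { stepIndex: 0, actionType: "actElement", instruction: "Click the source location input", success: true },
  ],
};

const cacheFile = join(mkdtempSync(join(tmpdir(), "action-cache-")), "flight-search.json");
saveActionCache(sampleCache, cacheFile);
const restored = loadActionCache(cacheFile);
console.log(`Loaded ${restored.steps.length} step(s) for task ${restored.taskId}`);
```

In real use, pass the actionCache returned by page.ai() to saveActionCache in place of the sample object.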

Error Handling

try {
  const result = await page.ai("Complete the checkout process", {
    maxSteps: 50,
  });
  
  if (result.status === "failed") {
    console.error("Task failed:", result.output);
  }
} catch (error) {
  console.error("Execution error:", error);
}
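
The try/catch above handles two separate failure paths: a result whose status is not "completed", and an exception thrown during execution. One way to fold both into a single value is sketched here. runSafely, Outcome, and TaskResultLike are hypothetical names introduced for illustration; the stand-in functions replace a real page.ai() call so the snippet runs on its own.

```typescript
// Assumption: only the two fields used here, mirroring the TaskOutput shape above.
interface TaskResultLike {
  status: "completed" | "failed" | "cancelled";
  output: unknown;
}

type Outcome =
  | { ok: true; output: unknown }
  | { ok: false; reason: string };

// Hypothetical wrapper: turns both failure paths into one Outcome value.
async function runSafely(run: () => Promise<TaskResultLike>): Promise<Outcome> {
  try {
    const result = await run();
    if (result.status === "completed") {
      return { ok: true, output: result.output };
    }
    return { ok: false, reason: `Task ${result.status}: ${String(result.output)}` };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, reason: `Execution error: ${message}` };
  }
}

// Stand-ins for page.ai(): one returns a failed result, one throws.
const failing = async (): Promise<TaskResultLike> => ({
  status: "failed",
  output: "Could not find the checkout button",
});
const throwing = async (): Promise<TaskResultLike> => {
  throw new Error("Browser disconnected");
};

runSafely(failing).then((outcome) => console.log(outcome));
runSafely(throwing).then((outcome) => console.log(outcome));
```

With a real page, the call would look like runSafely(() => page.ai("Complete the checkout process", { maxSteps: 50 })).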

Best Practices

Write specific, detailed instructions

Instead of “search for flights”, say “search for round-trip flights from Miami to LAX, departing December 15 and returning December 22, 2025”.

Set appropriate maxSteps

Simple tasks: 10-20 steps. Complex multi-page workflows: 50+ steps. Monitor task outputs and adjust as needed.

Use outputSchema for structured data

When you need specific data extracted, define a Zod schema to get typed, validated output.

Save actionCache for replay

Store the returned actionCache to replay the same automation later without LLM calls.
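
The maxSteps guidance above can be turned into a retry that starts with a small budget and raises it only when needed. This is a sketch under two assumptions: that a task which runs out of steps reports a "failed" status, and that the task is safe to run again from the start. runWithEscalatingSteps and the budget values are hypothetical, and fakeRun stands in for a real page.ai() call.

```typescript
// Assumption: only the two fields used here, mirroring the TaskOutput shape.
interface StepResult {
  status: "completed" | "failed" | "cancelled";
  output: unknown;
}

// Hypothetical helper: tries each step budget in order until the task stops failing.
async function runWithEscalatingSteps(
  run: (maxSteps: number) => Promise<StepResult>,
  budgets: number[] = [15, 30, 60],
): Promise<{ result: StepResult; maxSteps: number }> {
  const [first, ...rest] = budgets;
  if (first === undefined) {
    throw new Error("budgets must contain at least one value");
  }
  let maxSteps = first;
  let result = await run(maxSteps);
  for (const next of rest) {
    if (result.status !== "failed") break; // completed or cancelled: stop retrying
    maxSteps = next;
    result = await run(maxSteps);
  }
  return { result, maxSteps };
}

// Stand-in for page.ai(): pretends the task needs at least 30 steps.
const fakeRun = async (maxSteps: number): Promise<StepResult> =>
  maxSteps >= 30
    ? { status: "completed", output: "done" }
    : { status: "failed", output: "ran out of steps" };

runWithEscalatingSteps(fakeRun).then(({ maxSteps }) =>
  console.log(`Finished with maxSteps=${maxSteps}`)
);
```

With a real page, the callback would be (maxSteps) => page.ai(task, { maxSteps }). Each retry restarts the task, so avoid this pattern for tasks with side effects such as form submissions or purchases.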

Next Steps

page.perform()

Fast single-action execution

page.extract()

Extract structured data

Action Caching

Replay automations without LLM calls

Configuration

Configure LLM providers
