The page.ai() method executes multi-step browser tasks using natural language. It runs an agentic loop that observes the page, makes decisions, and executes actions until the goal is complete.
agent.executeTask() does the same thing; it is a convenience method that creates a page for you. Use whichever fits your workflow.
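
The observe, decide, act loop described above can be pictured with a small, self-contained sketch. This is not HyperAgent's implementation: `runLoop`, `Decision`, `LoopResult`, and `reachThree` are hypothetical names, and a plain counter stands in for the browser page. It only illustrates how a step budget such as maxSteps bounds an agentic loop. The sketch reports "failed" when the budget runs out, which is an assumption made for illustration.

```typescript
// Conceptual sketch only: a toy observe -> decide -> act loop with a step budget.
// None of these names come from @hyperbrowser/agent.

type Decision<A> = { done: true } | { done: false; action: A };

interface LoopResult {
  status: "completed" | "failed";
  stepsTaken: number;
}

function runLoop<S, A>(
  initial: S,
  decide: (state: S) => Decision<A>, // stands in for the LLM decision
  act: (state: S, action: A) => S,   // stands in for a browser action
  maxSteps: number
): LoopResult {
  let state = initial;
  let stepsTaken = 0;
  while (true) {
    // Observe the current state and decide what to do next.
    const decision = decide(state);
    if (decision.done) {
      return { status: "completed", stepsTaken };
    }
    // Stop once the step budget is spent (assumed here to count as a failure).
    if (stepsTaken >= maxSteps) {
      return { status: "failed", stepsTaken };
    }
    // Execute the chosen action and loop again.
    state = act(state, decision.action);
    stepsTaken += 1;
  }
}

// Toy "page": a counter whose goal is to reach 3.
const reachThree = (maxSteps: number): LoopResult =>
  runLoop<number, "increment">(
    0,
    (n) => (n >= 3 ? { done: true } : { done: false, action: "increment" }),
    (n) => n + 1,
    maxSteps
  );
```

With a budget of 10 the toy goal is reached after three actions; with a budget of 2 the loop stops early, which is why complex tasks may need a larger maxSteps.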

Basic Usage

Execute a multi-step task on a specific page:
import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://flights.google.com");

const { output, actionCache } = await page.ai(
  "Search for round-trip flights from Miami to LAX, leaving Dec 15 and returning Dec 22"
);

console.log(output);
// The actionCache can be saved and replayed later
console.log(`Completed in ${actionCache.steps.length} steps`);

await agent.closeAgent();

Parameters

task (string, required)
Natural language description of what you want to accomplish. Be specific for best results.

options (object)
Configuration options for the task execution.

options.maxSteps (number, default: 50)
Maximum number of actions the AI can take. Increase for complex tasks.

options.useDomCache (boolean, default: false)
Reuse DOM snapshots across steps for faster execution.

options.enableVisualMode (boolean, default: false)
Enable screenshots with element overlays for visual understanding.

options.enableDomStreaming (boolean, default: false)
Stream DOM updates for more responsive execution.

options.outputSchema (ZodSchema)
Define a Zod schema for structured output extraction at the end of the task.
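
To make the defaults listed above concrete, here is a minimal sketch of how omitted options resolve to them. `AiOptions`, `DOCUMENTED_DEFAULTS`, and `resolveOptions` are hypothetical names written for this illustration, outputSchema is left out for brevity, and the positive-integer check on maxSteps is an assumption. The library applies its own defaults internally, so you do not need a helper like this to call page.ai().

```typescript
// Hypothetical illustration of the documented defaults; not part of @hyperbrowser/agent.
interface AiOptions {
  maxSteps?: number;
  useDomCache?: boolean;
  enableVisualMode?: boolean;
  enableDomStreaming?: boolean;
}

// Default values as listed in the parameter reference.
const DOCUMENTED_DEFAULTS: Required<AiOptions> = {
  maxSteps: 50,
  useDomCache: false,
  enableVisualMode: false,
  enableDomStreaming: false,
};

function resolveOptions(options: AiOptions = {}): Required<AiOptions> {
  const maxSteps = options.maxSteps ?? DOCUMENTED_DEFAULTS.maxSteps;
  // Assumption for this sketch: a step budget only makes sense as a positive integer.
  if (Math.floor(maxSteps) !== maxSteps || maxSteps < 1) {
    throw new RangeError("maxSteps must be a positive integer");
  }
  return {
    maxSteps,
    useDomCache: options.useDomCache ?? DOCUMENTED_DEFAULTS.useDomCache,
    enableVisualMode: options.enableVisualMode ?? DOCUMENTED_DEFAULTS.enableVisualMode,
    enableDomStreaming: options.enableDomStreaming ?? DOCUMENTED_DEFAULTS.enableDomStreaming,
  };
}

// Only the options you pass are overridden; everything else keeps its default.
const resolved = resolveOptions({ maxSteps: 30, useDomCache: true });
```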

Example with Options

import { z } from "zod";

const { output, actionCache } = await page.ai(
  "Find the cheapest flight from NYC to London next month",
  {
    maxSteps: 30,
    useDomCache: true,
    outputSchema: z.object({
      airline: z.string(),
      price: z.number(),
      departure: z.string(),
      arrival: z.string(),
    }),
  }
);

console.log(output); // Typed as { airline, price, departure, arrival }

agent.executeTask()

executeTask() is a shorthand that creates a page and runs page.ai() for you:
// This:
const result = await agent.executeTask("Go to amazon.com and find the top seller");

// Is equivalent to:
const page = await agent.newPage();
const result = await page.ai("Go to amazon.com and find the top seller");
It accepts the same parameters as page.ai(), as the next example shows.

With Output Schema

import { z } from "zod";

const result = await agent.executeTask(
  "Navigate to imdb.com, search for 'The Matrix', and extract the movie details",
  {
    outputSchema: z.object({
      director: z.string().describe("The name of the movie director"),
      releaseYear: z.number().describe("The year the movie was released"),
      rating: z.string().describe("The IMDb rating of the movie"),
    }),
  }
);

console.log(result.output);
// { director: "Lana Wachowski, Lilly Wachowski", releaseYear: 1999, rating: "8.7/10" }

Return Value

Both methods return a TaskOutput object:
interface TaskOutput {
  taskId: string;           // Unique identifier for this task
  status: TaskStatus;       // "completed" | "failed" | "cancelled"
  output: string | T;       // Result string, or T (the type inferred from outputSchema) when a schema is provided
  steps: AgentStep[];       // Array of steps taken
  actionCache: ActionCacheOutput; // Recorded actions for replay
}

Task Status

Status | Description
completed | Task finished successfully
failed | Task encountered an error
cancelled | Task was cancelled before completion
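
One way to handle every status explicitly is an exhaustive switch. The sketch below declares `TaskStatus` locally as a string union that mirrors the values above; the type exported by the library may be defined differently (for example as an enum), and `describeStatus` is a hypothetical helper.

```typescript
// Local stand-in for the status values documented above (an assumption, not the library's type).
type TaskStatus = "completed" | "failed" | "cancelled";

function describeStatus(status: TaskStatus): string {
  switch (status) {
    case "completed":
      return "Task finished successfully";
    case "failed":
      return "Task encountered an error";
    case "cancelled":
      return "Task was cancelled before completion";
    default: {
      // If a new status is added to the union, this assignment stops compiling.
      const unreachable: never = status;
      throw new Error("Unhandled status: " + String(unreachable));
    }
  }
}
```

The `never` assignment makes the compiler flag any status value the switch does not cover.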

Visual Mode

Enable visual mode when the AI needs to understand page layout or when dealing with complex visual elements:
const { output } = await page.ai(
  "Find the product image and describe what's shown",
  {
    enableVisualMode: true,
  }
);
Visual mode relies on screenshots, which increases token usage and latency. Enable it only when visual understanding is necessary.

Real-World Examples

Flight Search

const page = await agent.newPage();
await page.goto("https://flights.google.com");

const { output, actionCache } = await page.ai(
  "Search for round-trip flights from Rio de Janeiro to Los Angeles, " +
  "leaving December 11, 2025 and returning December 22, 2025. " +
  "Select the option with the lowest carbon emissions.",
  {
    useDomCache: true,
    enableDomStreaming: true,
  }
);

// Save actionCache for later replay
console.log(JSON.stringify(actionCache, null, 2));

E-commerce Price Comparison

import { z } from "zod";

const result = await agent.executeTask(
  "Go to amazon.com, search for 'mechanical keyboard', and compare the top 3 results",
  {
    outputSchema: z.object({
      products: z.array(z.object({
        name: z.string(),
        price: z.number(),
        rating: z.number(),
        reviewCount: z.number(),
      })),
      recommendation: z.string(),
    }),
  }
);

console.log(result.output.products);
console.log(result.output.recommendation);

Google Form Submission

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://docs.google.com/forms/d/e/1FAIpQLScPkE8wNLpPSkP2d__Ee7xx5Pj7_XDuZ0p16geYWrp73Nutmw/viewform?usp=dialog");

// Fill each field
await page.perform("fill the name field with John Doe");
await page.perform("fill the email field with john.doe@example.com");
await page.perform("fill the feedback text area with This is a test submission");
await page.perform("select 5 rating option");

// Submit
await page.perform("click the submit button");

await agent.closeAgent();

Action Cache Output

Every page.ai() call returns an actionCache that records all actions taken:
{
  "taskId": "abc-123",
  "createdAt": "2025-01-15T10:30:00Z",
  "status": "completed",
  "steps": [
    {
      "stepIndex": 0,
      "actionType": "actElement",
      "instruction": "Click the source location input",
      "method": "click",
      "arguments": [],
      "frameIndex": 0,
      "xpath": "/html/body/div[1]/input[1]",
      "success": true,
      "message": "Successfully clicked element"
    }
    // ... more steps
  ]
}
This cache can be saved and replayed later for deterministic execution without LLM calls.
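
Before replaying a saved cache, it can help to reload it and check which steps succeeded. The sketch below is a minimal example under stated assumptions: `CachedStep` and `ActionCache` are local types copied from the JSON shape above (the library's own types may differ), `parseActionCache` and `summarizeActionCache` are hypothetical helpers, and the second step in `sampleCache` is made-up data. It does not perform the replay itself.

```typescript
// Local types mirroring the JSON example above; the real library types may differ.
interface CachedStep {
  stepIndex: number;
  actionType: string;
  instruction: string;
  method: string;
  arguments: unknown[];
  frameIndex: number;
  xpath: string;
  success: boolean;
  message: string;
}

interface ActionCache {
  taskId: string;
  createdAt: string;
  status: string;
  steps: CachedStep[];
}

// Parse a saved cache and check the minimum structure before trusting it.
function parseActionCache(json: string): ActionCache {
  const parsed = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null || !Array.isArray(parsed.steps)) {
    throw new Error("Not an action cache: missing steps array");
  }
  return parsed as ActionCache;
}

// Report how many steps were recorded and which ones did not succeed.
function summarizeActionCache(cache: ActionCache): { total: number; failedStepIndexes: number[] } {
  const failedStepIndexes = cache.steps
    .filter((step) => !step.success)
    .map((step) => step.stepIndex);
  return { total: cache.steps.length, failedStepIndexes };
}

// Sample data: the first step is taken from the example above, the second is invented.
const sampleCache: ActionCache = {
  taskId: "abc-123",
  createdAt: "2025-01-15T10:30:00Z",
  status: "completed",
  steps: [
    {
      stepIndex: 0, actionType: "actElement", instruction: "Click the source location input",
      method: "click", arguments: [], frameIndex: 0, xpath: "/html/body/div[1]/input[1]",
      success: true, message: "Successfully clicked element",
    },
    {
      stepIndex: 1, actionType: "actElement", instruction: "Type the origin city",
      method: "fill", arguments: ["Miami"], frameIndex: 0, xpath: "/html/body/div[1]/input[1]",
      success: false, message: "Element not found",
    },
  ],
};

// Round trip: serialize as you would when saving to disk, then load and inspect.
const roundTripped = parseActionCache(JSON.stringify(sampleCache, null, 2));
const summary = summarizeActionCache(roundTripped);
```

A summary like this gives a quick signal on whether a recorded run is worth keeping for replay.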

Error Handling

try {
  const result = await page.ai("Complete the checkout process", {
    maxSteps: 50,
  });
  
  if (result.status === "failed") {
    console.error("Task failed:", result.output);
  }
} catch (error) {
  console.error("Execution error:", error);
}

Best Practices

Write specific tasks. Instead of “search for flights”, say “search for round-trip flights from Miami to LAX, departing December 15 and returning December 22, 2025”.
Size maxSteps to the task. Simple tasks need 10-20 steps; complex multi-page workflows may need 50 or more. Monitor task outputs and adjust as needed.
Use an output schema. When you need specific data extracted, define a Zod schema to get typed, validated output.
Save the action cache. Store the returned actionCache to replay the same automation later without LLM calls.
