# page.ai()

> Execute complex multi-step browser tasks with AI-powered automation

The `page.ai()` method executes multi-step browser tasks described in natural language. It runs an agentic loop that observes the page, decides on the next action, and executes it, repeating until the goal is complete or the `maxSteps` limit is reached.

<Note>
  `agent.executeTask()` does the same thing: it is a convenience method that creates a page for you before running the task. Use whichever fits your workflow.
</Note>
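
To make the observe, decide, act cycle concrete, here is a minimal conceptual sketch of such a loop. It is not HyperAgent's implementation: `LoopHooks`, `Observation`, `Action`, and `runAgentLoop` are hypothetical names, and treating an exhausted step budget as `"failed"` is an assumption made for illustration.

```typescript
// Conceptual sketch only: these types and functions are hypothetical
// stand-ins, not HyperAgent internals.
type Observation = { url: string; html: string };
type Action = { kind: "click" | "type" | "finish"; target?: string };

interface LoopHooks {
  // Snapshot the current page state.
  observe(): Promise<Observation>;
  // Ask the model for the next action given the task and the snapshot.
  decide(task: string, obs: Observation): Promise<Action>;
  // Perform the chosen action in the browser.
  act(action: Action): Promise<void>;
}

async function runAgentLoop(
  task: string,
  hooks: LoopHooks,
  maxSteps = 50
): Promise<{ status: "completed" | "failed"; steps: Action[] }> {
  const steps: Action[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const obs = await hooks.observe();
    const action = await hooks.decide(task, obs);
    steps.push(action);
    if (action.kind === "finish") {
      // The model declared the goal complete.
      return { status: "completed", steps };
    }
    await hooks.act(action);
  }
  // Assumption: running out of steps is reported as a failure.
  return { status: "failed", steps };
}
```

The step budget is what keeps a loop like this from running indefinitely when the model never declares the goal complete, which is why `maxSteps` is worth tuning per task.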

## Basic Usage

Execute a multi-step task on a specific page:

```typescript theme={null}
import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://flights.google.com");

const { output, actionCache } = await page.ai(
  "Search for round-trip flights from Miami to LAX, leaving Dec 15 and returning Dec 22"
);

console.log(output);
// The actionCache can be saved and replayed later
console.log(`Completed in ${actionCache.steps.length} steps`);

await agent.closeAgent();
```

### Parameters

<ParamField path="task" type="string" required>
  Natural language description of what you want to accomplish. Be specific for best results.
</ParamField>

<ParamField path="options" type="object">
  Configuration options for the task execution.
</ParamField>

<ParamField path="options.maxSteps" type="number" default="50">
  Maximum number of actions the AI can take. Increase for complex tasks.
</ParamField>

<ParamField path="options.useDomCache" type="boolean" default="false">
  Reuse DOM snapshots across steps for faster execution.
</ParamField>

<ParamField path="options.enableVisualMode" type="boolean" default="false">
  Enable screenshots with element overlays for visual understanding.
</ParamField>

<ParamField path="options.enableDomStreaming" type="boolean" default="false">
  Stream DOM updates for more responsive execution.
</ParamField>

<ParamField path="options.outputSchema" type="ZodSchema">
  Define a Zod schema for structured output extraction at the end of the task.
</ParamField>

### Example with Options

```typescript theme={null}
import { z } from "zod";

const { output, actionCache } = await page.ai(
  "Find the cheapest flight from NYC to London next month",
  {
    maxSteps: 30,
    useDomCache: true,
    outputSchema: z.object({
      airline: z.string(),
      price: z.number(),
      departure: z.string(),
      arrival: z.string(),
    }),
  }
);

console.log(output); // Typed as { airline, price, departure, arrival }
```

## agent.executeTask()

`executeTask()` is a shorthand that creates a page and runs `page.ai()` for you:

```typescript theme={null}
// This:
const result = await agent.executeTask("Go to amazon.com and find the top seller");

// Is equivalent to (named differently here so both snippets can share a scope):
const page = await agent.newPage();
const sameResult = await page.ai("Go to amazon.com and find the top seller");
```

It accepts the same `task` and `options` parameters as `page.ai()`.

### With Output Schema

```typescript theme={null}
import { z } from "zod";

const result = await agent.executeTask(
  "Navigate to imdb.com, search for 'The Matrix', and extract the movie details",
  {
    outputSchema: z.object({
      director: z.string().describe("The name of the movie director"),
      releaseYear: z.number().describe("The year the movie was released"),
      rating: z.string().describe("The IMDb rating of the movie"),
    }),
  }
);

console.log(result.output);
// { director: "Lana Wachowski, Lilly Wachowski", releaseYear: 1999, rating: "8.7/10" }
```

## Return Value

Both methods return a `TaskOutput` object:

```typescript theme={null}
// T is the type inferred from `outputSchema`, when one is provided
interface TaskOutput<T = string> {
  taskId: string;           // Unique identifier for this task
  status: TaskStatus;       // "completed" | "failed" | "cancelled"
  output: string | T;       // Result (or typed if outputSchema provided)
  steps: AgentStep[];       // Array of steps taken
  actionCache: ActionCacheOutput; // Recorded actions for replay
}
```

### Task Status

| Status      | Description                          |
| ----------- | ------------------------------------ |
| `completed` | Task finished successfully           |
| `failed`    | Task encountered an error            |
| `cancelled` | Task was cancelled before completion |
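
One way to handle all three statuses is an exhaustive `switch`. The sketch below declares its own local types that mirror the table above; the library's exported types may differ, and `TaskResultLike` and `summarizeResult` are hypothetical names used only for illustration.

```typescript
// Local mirror of the documented statuses; the library's own types may differ.
type TaskStatus = "completed" | "failed" | "cancelled";

// Only the fields this sketch reads.
interface TaskResultLike {
  status: TaskStatus;
  output: unknown;
}

// Hypothetical helper: maps a task result to a short log line.
function summarizeResult(result: TaskResultLike): string {
  const taskStatus = result.status;
  switch (taskStatus) {
    case "completed":
      return `done: ${JSON.stringify(result.output)}`;
    case "failed":
      return `failed: ${String(result.output)}`;
    case "cancelled":
      return "cancelled before completion";
    default: {
      // Exhaustiveness check: this stops compiling if a new status is added.
      const unreachable: never = taskStatus;
      return unreachable;
    }
  }
}
```

You could pass the object returned by `page.ai()` or `agent.executeTask()` to a helper like this, assuming its `status` and `output` fields match the interface shown earlier.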

## Visual Mode

Enable visual mode when the AI needs to understand page layout or when dealing with complex visual elements:

```typescript theme={null}
const { output } = await page.ai(
  "Find the product image and describe what's shown",
  {
    enableVisualMode: true,
  }
);
```

<Note>
  Visual mode uses screenshots, which increases token usage and latency. Enable it only when visual understanding is necessary.
</Note>

## Real-World Examples

### Flight Search

```typescript theme={null}
const page = await agent.newPage();
await page.goto("https://flights.google.com");

const { output, actionCache } = await page.ai(
  "Search for round-trip flights from Rio de Janeiro to Los Angeles, " +
  "leaving December 11, 2025 and returning December 22, 2025. " +
  "Select the option with the lowest carbon emissions.",
  {
    useDomCache: true,
    enableDomStreaming: true,
  }
);

// Save actionCache for later replay
console.log(JSON.stringify(actionCache, null, 2));
```

### E-commerce Price Comparison

```typescript theme={null}
import { z } from "zod";

const result = await agent.executeTask(
  "Go to amazon.com, search for 'mechanical keyboard', and compare the top 3 results",
  {
    outputSchema: z.object({
      products: z.array(z.object({
        name: z.string(),
        price: z.number(),
        rating: z.number(),
        reviewCount: z.number(),
      })),
      recommendation: z.string(),
    }),
  }
);

console.log(result.output.products);
console.log(result.output.recommendation);
```

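### Google Form Submission

When the fields are known in advance, a sequence of single-action `page.perform()` calls can fill a form step by step instead of one `page.ai()` task: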

```typescript theme={null}
import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://docs.google.com/forms/d/e/1FAIpQLScPkE8wNLpPSkP2d__Ee7xx5Pj7_XDuZ0p16geYWrp73Nutmw/viewform?usp=dialog");

// Fill each field
await page.perform("fill the name field with John Doe");
await page.perform("fill the email field with john@example.com");
await page.perform("fill the feedback text area with This is a test submission");
await page.perform("select 5 rating option");

// Submit
await page.perform("click the submit button");

await agent.closeAgent();
```

## Action Cache Output

Every `page.ai()` call returns an `actionCache` that records all actions taken:

```json theme={null}
{
  "taskId": "abc-123",
  "createdAt": "2025-01-15T10:30:00Z",
  "status": "completed",
  "steps": [
    {
      "stepIndex": 0,
      "actionType": "actElement",
      "instruction": "Click the source location input",
      "method": "click",
      "arguments": [],
      "frameIndex": 0,
      "xpath": "/html/body/div[1]/input[1]",
      "success": true,
      "message": "Successfully clicked element"
    }
    // ... more steps
  ]
}
```

This cache can be saved and [replayed later](/hyperagent/action-cache) for deterministic execution without LLM calls.
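
Before replaying a cache loaded from disk, it can help to check that it is well formed and that every recorded step succeeded. The following is a minimal sketch of one such policy. The field names are taken from the sample JSON above and should be treated as assumptions about the shape; `CachedStep`, `ActionCacheLike`, and `loadReplayableCache` are hypothetical names, not part of the library.

```typescript
// Field names follow the sample JSON above; treat this shape as an assumption.
interface CachedStep {
  stepIndex: number;
  actionType: string;
  instruction: string;
  success: boolean;
}

interface ActionCacheLike {
  taskId: string;
  status: string;
  steps: CachedStep[];
}

// Hypothetical helper: parse saved JSON and reject caches that did not
// complete cleanly, so only known-good recordings are replayed.
function loadReplayableCache(json: string): ActionCacheLike {
  const parsed: Partial<ActionCacheLike> | null = JSON.parse(json);
  if (
    parsed === null ||
    typeof parsed.taskId !== "string" ||
    !Array.isArray(parsed.steps)
  ) {
    throw new Error("Not an action cache: missing taskId or steps");
  }
  const failedSteps = parsed.steps.filter((step) => !step.success);
  if (parsed.status !== "completed" || failedSteps.length > 0) {
    throw new Error(
      `Cache ${parsed.taskId} has status "${parsed.status}" and ` +
        `${failedSteps.length} failed step(s)`
    );
  }
  return parsed as ActionCacheLike;
}
```

Rejecting anything other than a fully successful recording is a deliberately strict choice; a looser policy might accept a cache and replay only up to the first failed step.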

## Error Handling

```typescript theme={null}
try {
  const result = await page.ai("Complete the checkout process", {
    maxSteps: 50,
  });
  
  if (result.status === "failed") {
    console.error("Task failed:", result.output);
  }
} catch (error) {
  console.error("Execution error:", error);
}
```
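
If a task fails because it ran out of steps, one option is to retry with a larger budget. The sketch below is a hypothetical wrapper, not a library feature: `runWithStepBudgets` and `AttemptResult` are illustrative names, and the `attempt` callback stands in for your own `page.ai()` call. It assumes that retrying is safe for your task, which may not hold for actions with side effects such as submitting an order.

```typescript
// Structural type covering only the fields this sketch reads.
interface AttemptResult {
  status: "completed" | "failed" | "cancelled";
  output: unknown;
}

// Hypothetical helper: `attempt` wraps your own call, for example
// (maxSteps) => page.ai("Complete the checkout process", { maxSteps }).
async function runWithStepBudgets(
  attempt: (maxSteps: number) => Promise<AttemptResult>,
  budgets: number[] = [25, 50, 100]
): Promise<AttemptResult> {
  let last: AttemptResult = { status: "failed", output: "no attempts made" };
  for (const maxSteps of budgets) {
    try {
      last = await attempt(maxSteps);
    } catch (error) {
      // Thrown errors are folded into a failed result so the loop can continue.
      last = { status: "failed", output: String(error) };
    }
    // Stop on "completed" or "cancelled"; only "failed" triggers another try.
    if (last.status !== "failed") return last;
  }
  return last;
}
```

Each retry starts from whatever state the page was left in, so you may want to navigate back to a known URL inside `attempt` before calling `page.ai()` again.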

## Best Practices

<AccordionGroup>
  <Accordion title="Write specific, detailed instructions">
    Instead of "search for flights", say "search for round-trip flights from Miami to LAX, departing December 15 and returning December 22, 2025".
  </Accordion>

  <Accordion title="Set appropriate maxSteps">
    Simple tasks: 10-20 steps. Complex multi-page workflows: 50+ steps. Monitor task outputs and adjust as needed.
  </Accordion>

  <Accordion title="Use outputSchema for structured data">
    When you need specific data extracted, define a Zod schema to get typed, validated output.
  </Accordion>

  <Accordion title="Save actionCache for replay">
    Store the returned `actionCache` to replay the same automation later without LLM calls.
  </Accordion>
</AccordionGroup>

## Next Steps

<CardGroup cols={2}>
  <Card title="page.perform()" icon="bolt" href="/hyperagent/page-perform">
    Fast single-action execution
  </Card>

  <Card title="page.extract()" icon="database" href="/hyperagent/extract">
    Extract structured data
  </Card>

  <Card title="Action Caching" icon="rotate" href="/hyperagent/action-cache">
    Replay automations without LLM calls
  </Card>

  <Card title="Configuration" icon="gear" href="/hyperagent/llm-providers">
    Configure LLM providers
  </Card>
</CardGroup>
