# page.perform()

> Execute fast, single-action browser interactions using natural language

The `page.perform()` method executes single, granular actions on a web page. It's optimized for speed and reliability, using the accessibility tree instead of screenshots.

## Overview

| Characteristic  | Description                                                  |
| --------------- | ------------------------------------------------------------ |
| **Speed**       | ⚡ Fast - Uses accessibility tree (no screenshots)            |
| **Cost**        | 💰 Cheap - Single LLM call per action                        |
| **Reliability** | 🎯 Direct element finding and execution                      |
| **Efficiency**  | 📊 Text-based DOM analysis with automatic ad-frame filtering |

## Basic Usage

```typescript theme={null}
import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://example.com/login");

// Execute single actions
await page.perform("fill email with user@example.com");
await page.perform("fill password with mypassword");
await page.perform("click the login button");

await agent.closeAgent();
```
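
The three calls above run strictly in order, and each depends on the one before it. As a minimal sketch, a list of instructions can be run sequentially and stopped at the first one that does not complete. `performInOrder`, `PerformResult`, and `PerformablePage` are hypothetical names, not part of HyperAgent; the two types mirror only the `status` and `output` fields described under Return Value.

```typescript
// Hypothetical minimal shapes (assumptions for this sketch, not library types).
interface PerformResult {
  status: string; // expected to be "completed" or "failed"
  output: string;
}

interface PerformablePage {
  perform(instruction: string): Promise<PerformResult>;
}

// Runs instructions one at a time, in order, and stops after the first
// result whose status is not "completed". Returns every result collected.
async function performInOrder(
  page: PerformablePage,
  instructions: string[]
): Promise<PerformResult[]> {
  const results: PerformResult[] = [];
  for (const instruction of instructions) {
    const result = await page.perform(instruction);
    results.push(result);
    if (result.status !== "completed") break;
  }
  return results;
}
```

With the page from the example above, this could be called as `await performInOrder(page, ["fill email with user@example.com", "fill password with mypassword", "click the login button"])`.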

## Common Actions

### Click Elements

```typescript theme={null}
await page.perform("click the login button");
await page.perform("click the first search result");
await page.perform("click the 'Add to Cart' button");
await page.perform("click the menu icon in the top right");
```

### Fill or Type Inputs

```typescript theme={null}
await page.perform("fill email with test@example.com");
await page.perform("type 'mechanical keyboard' into the search box");
await page.perform("fill the password field with MySecurePass123");
```

### Form Interactions

```typescript theme={null}
await page.perform("check the 'Remember me' checkbox");
await page.perform("uncheck the newsletter subscription");
await page.perform("select 'United States' from the country dropdown");
```

### Scrolling

```typescript theme={null}
// Scroll to a specific element
await page.perform("scroll to the pricing section");
await page.perform("scroll the reviews section into view");

// Scroll to a position on the page (percentage or page end)
await page.perform("scroll to 50% of the page");
await page.perform("scroll to the bottom of the page");

// Chunk-based scrolling (useful for infinite scroll or long pages)
await page.perform("scroll to the next chunk");
await page.perform("scroll to the previous chunk");
```
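
For long pages, the chunk-based instruction can be repeated in a loop. The sketch below is an assumption-laden illustration rather than a HyperAgent API: `scrollThroughChunks` and `maxChunks` are hypothetical names, and the local types mirror only the `status` and `output` fields described under Return Value. How `page.perform()` reports reaching the end of a page is not covered here, so the loop caps the number of scrolls and stops at the first result that is not `"completed"`.

```typescript
// Hypothetical minimal shapes (assumptions for this sketch, not library types).
interface PerformResult {
  status: string; // expected to be "completed" or "failed"
  output: string;
}

interface PerformablePage {
  perform(instruction: string): Promise<PerformResult>;
}

// Scrolls forward one chunk at a time, at most maxChunks times.
// Returns the number of scroll actions that completed.
async function scrollThroughChunks(
  page: PerformablePage,
  maxChunks: number
): Promise<number> {
  let completed = 0;
  for (let i = 0; i < maxChunks; i++) {
    const result = await page.perform("scroll to the next chunk");
    if (result.status !== "completed") break;
    completed++;
  }
  return completed;
}
```

Capping the loop with `maxChunks` keeps an infinite-scroll page from running indefinitely.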

### Hover

```typescript theme={null}
await page.perform("hover over the user profile menu");
await page.perform("hover over the dropdown to reveal options");
```

### Keyboard Actions

```typescript theme={null}
await page.perform("press Enter");
await page.perform("press Escape to close the modal");
await page.perform("press Tab to move to the next field");
```

## When to Use perform() vs ai()

<CardGroup cols={2}>
  <Card title="Use page.perform()" icon="bolt">
    * Single, specific actions
    * When you know exactly what action is needed
    * Fast, reliable execution
    * Lower token cost
  </Card>

  <Card title="Use page.ai()" icon="brain">
    * Complex multi-step workflows
    * When visual context is needed
    * Tasks requiring decision making
    * When the next action depends on page state
  </Card>
</CardGroup>

### Example: Combining Both

```typescript theme={null}
import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://amazon.com");

// Use perform() for known, simple actions
await page.perform("click the search box");
await page.perform("type 'laptop' into the search box");
await page.perform("click the search button");

// Use ai() when complex decision-making is needed
await page.ai("find the best-rated laptop under $1000 and add it to cart");

// Release the browser when done
await agent.closeAgent();
```

## Return Value

`page.perform()` returns a `TaskOutput` object:

```typescript theme={null}
interface TaskOutput {
  taskId: string;
  status: TaskStatus;  // "completed" | "failed"
  output: string;      // Result message
  steps: AgentStep[];  // Steps taken (usually 1 for perform)
}
```

### Checking Success

```typescript theme={null}
const result = await page.perform("click the submit button");

if (result.status === "completed") {
  console.log("Action successful:", result.output);
} else {
  console.error("Action failed:", result.output);
}
```
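
If you would rather have failures surface as exceptions, the status check can be wrapped in a small helper. This is a sketch, not part of HyperAgent: `performOrThrow`, `PerformResult`, and `PerformablePage` are hypothetical names, and the types mirror only the `status` and `output` fields of `TaskOutput`.

```typescript
// Hypothetical minimal shapes (assumptions for this sketch, not library types).
interface PerformResult {
  status: string; // "completed" | "failed", per the TaskOutput comment
  output: string;
}

interface PerformablePage {
  perform(instruction: string): Promise<PerformResult>;
}

// Runs one instruction and throws if the reported status is not "completed".
async function performOrThrow(
  page: PerformablePage,
  instruction: string
): Promise<PerformResult> {
  const result = await page.perform(instruction);
  if (result.status !== "completed") {
    throw new Error(`perform("${instruction}") failed: ${result.output}`);
  }
  return result;
}
```

A call such as `await performOrThrow(page, "click the submit button")` then either returns the completed result or rejects with the failure message.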

## Error Handling

```typescript theme={null}
try {
  await page.perform("click the non-existent button");
} catch (error) {
  console.error("Failed to perform action:", error);
}
```
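
Transient failures, such as an element that has not rendered yet, can sometimes be handled by retrying. The sketch below makes several assumptions: `performWithRetry`, `maxAttempts`, and `delayMs` are hypothetical names, and because this page does not specify which failures throw and which come back with a `"failed"` status, the helper treats both as a failed attempt.

```typescript
// Hypothetical minimal shapes (assumptions for this sketch, not library types).
interface PerformResult {
  status: string; // expected to be "completed" or "failed"
  output: string;
}

interface PerformablePage {
  perform(instruction: string): Promise<PerformResult>;
}

// Retries one instruction up to maxAttempts times, waiting delayMs between
// attempts. Throws with the last failure message if no attempt completes.
async function performWithRetry(
  page: PerformablePage,
  instruction: string,
  maxAttempts = 3,
  delayMs = 500
): Promise<PerformResult> {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await page.perform(instruction);
      if (result.status === "completed") return result;
      lastError = result.output; // reported failure
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error); // thrown failure
    }
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(
    `"${instruction}" failed after ${maxAttempts} attempts: ${lastError}`
  );
}
```

Each retry is another `page.perform()` call, so keep `maxAttempts` small to limit LLM cost.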

## Tips for Writing Effective Instructions

<AccordionGroup>
  <Accordion title="Be specific about the target element">
    **Good:** "click the blue 'Sign Up' button at the bottom of the form"

    **Bad:** "click the button"
  </Accordion>

  <Accordion title="Include context when elements look similar">
    **Good:** "fill the email input in the login form with user@example.com"

    **Bad:** "fill email"
  </Accordion>

  <Accordion title="Use visible text for identification">
    **Good:** "click the link that says 'Learn More'"

    **Bad:** "click the third link"
  </Accordion>

  <Accordion title="Specify the action clearly">
    **Good:** "type 'search query' into the search box"

    **Bad:** "search for something"
  </Accordion>
</AccordionGroup>
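
When instructions are generated from data rather than written by hand, these tips can be encoded in small builder functions so that every instruction names its target, its context, and its action. The helpers below are a hypothetical sketch (`buildFillInstruction`, `buildClickInstruction`, and `FillParts` are not part of HyperAgent); they only assemble strings in the style of the "Good" examples above.

```typescript
// Hypothetical helper types and functions (assumptions for this sketch).
interface FillParts {
  field: string; // the input to target, e.g. "email input"
  value: string; // the text to enter
  context?: string; // where the input lives, e.g. "login form"
}

// Builds a fill instruction that names the field and, optionally, its context.
function buildFillInstruction({ field, value, context }: FillParts): string {
  const target = context ? `the ${field} in the ${context}` : `the ${field}`;
  return `fill ${target} with ${value}`;
}

// Builds a click instruction that identifies the element by its visible text.
function buildClickInstruction(visibleText: string, kind = "button"): string {
  return `click the ${kind} that says '${visibleText}'`;
}
```

The resulting strings are passed to `page.perform()` unchanged, for example `await page.perform(buildClickInstruction("Learn More", "link"))`.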

## CDP Actions

By default, HyperAgent uses the Chrome DevTools Protocol (CDP) for precise element interactions. This provides:

* Exact coordinate-based clicks
* Deep iframe support
* Auto-filtering of ad frames

To disable CDP and use Playwright locators instead:

```typescript theme={null}
const agent = new HyperAgent({
  cdpActions: false, // use Playwright locators instead of CDP
});
```

## Next Steps

<CardGroup cols={2}>
  <Card title="page.ai()" icon="brain" href="/hyperagent/page-ai">
    Complex multi-step automation
  </Card>

  <Card title="page.extract()" icon="database" href="/hyperagent/extract">
    Extract structured data
  </Card>

  <Card title="Action Caching" icon="rotate" href="/hyperagent/action-cache">
    Record and replay automations
  </Card>
</CardGroup>
