The page.perform() method executes single, granular actions on a web page. It’s optimized for speed and reliability, using the accessibility tree instead of screenshots.

Overview

Characteristic | Description
Speed          | ⚡ Fast - Uses accessibility tree (no screenshots)
Cost           | 💰 Cheap - Single LLM call per action
Reliability    | 🎯 Direct element finding and execution
Efficiency     | 📊 Text-based DOM analysis with automatic ad-frame filtering

Basic Usage

import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://example.com/login");

// Execute single actions
await page.perform("fill email with user@example.com");
await page.perform("fill password with mypassword");
await page.perform("click the login button");

await agent.closeAgent();

Common Actions

Click Elements

await page.perform("click the login button");
await page.perform("click the first search result");
await page.perform("click the 'Add to Cart' button");
await page.perform("click the menu icon in the top right");

Fill or Type Inputs

await page.perform("fill email with user@example.com");
await page.perform("type 'mechanical keyboard' into the search box");
await page.perform("fill the password field with MySecurePass123");

Form Interactions

await page.perform("check the 'Remember me' checkbox");
await page.perform("uncheck the newsletter subscription");
await page.perform("select 'United States' from the country dropdown");

Scrolling

// Scroll to a specific element
await page.perform("scroll to the pricing section");
await page.perform("scroll the reviews section into view");

// Scroll by percentage
await page.perform("scroll to 50% of the page");
await page.perform("scroll to the bottom of the page");

// Chunk-based scrolling (useful for infinite scroll or long pages)
await page.perform("scroll to the next chunk");
await page.perform("scroll to the previous chunk");
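
For long or infinitely scrolling pages, the chunk instruction can be repeated in a loop. The following is a minimal sketch, not part of the HyperAgent API: scrollChunks, PerformPage, and PerformResult are hypothetical names, and the structural types assume only that perform() resolves to an object with the status and output fields described under Return Value.

```typescript
// Hypothetical minimal types for illustration; the real page object has more members.
interface PerformResult {
  status: string; // "completed" | "failed", per the Return Value section
  output: string;
}

interface PerformPage {
  perform(instruction: string): Promise<PerformResult>;
}

// Hypothetical helper: scroll forward up to `maxChunks` chunks and
// stop early as soon as one scroll action does not complete.
// Returns the number of chunks that were scrolled successfully.
async function scrollChunks(page: PerformPage, maxChunks: number): Promise<number> {
  let scrolled = 0;
  for (let i = 0; i < maxChunks; i++) {
    const result = await page.perform("scroll to the next chunk");
    if (result.status !== "completed") {
      break;
    }
    scrolled++;
  }
  return scrolled;
}
```

A call such as await scrollChunks(page, 5) would then advance at most five chunks. Capping the count keeps a genuinely endless feed from looping forever.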

Hover

await page.perform("hover over the user profile menu");
await page.perform("hover over the dropdown to reveal options");

Keyboard Actions

await page.perform("press Enter");
await page.perform("press Escape to close the modal");
await page.perform("press Tab to move to the next field");

When to Use perform() vs ai()

Use page.perform()

  • Single, specific actions
  • When you know exactly what action is needed
  • When fast, reliable execution matters
  • When lower token cost matters

Use page.ai()

  • Complex multi-step workflows
  • When visual context is needed
  • Tasks requiring decision making
  • When next action depends on page state

Example: Combining Both

const page = await agent.newPage();
await page.goto("https://amazon.com");

// Use perform for known, simple actions
await page.perform("click the search box");
await page.perform("type 'laptop' into the search box");
await page.perform("click the search button");

// Use ai() when complex decision-making is needed
await page.ai("find the best-rated laptop under $1000 and add it to cart");

Return Value

page.perform() returns a promise that resolves to a TaskOutput object:
interface TaskOutput {
  taskId: string;
  status: TaskStatus;  // "completed" | "failed"
  output: string;      // Result message
  steps: AgentStep[];  // Steps taken (usually 1 for perform)
}

Checking Success

const result = await page.perform("click the submit button");

if (result.status === "completed") {
  console.log("Action successful:", result.output);
} else {
  console.error("Action failed:", result.output);
}
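
When several actions run back to back, the same status check can gate each step so that later actions do not run against an unexpected page state. This is a sketch under stated assumptions rather than a library feature: performInOrder, SequenceOutcome, PerformPage, and PerformResult are hypothetical names, and the types model only the status and output fields shown above.

```typescript
// Hypothetical minimal types for illustration; the real page object has more members.
interface PerformResult {
  status: string; // "completed" | "failed"
  output: string;
}

interface PerformPage {
  perform(instruction: string): Promise<PerformResult>;
}

// Hypothetical summary of a multi-step run.
interface SequenceOutcome {
  ok: boolean;
  completed: number;          // how many instructions completed
  failedInstruction?: string; // set only when ok is false
  message?: string;           // output of the failed action
}

// Run instructions one at a time and stop at the first one that
// does not report "completed".
async function performInOrder(
  page: PerformPage,
  instructions: string[]
): Promise<SequenceOutcome> {
  let completed = 0;
  for (const instruction of instructions) {
    const result = await page.perform(instruction);
    if (result.status !== "completed") {
      return { ok: false, completed, failedInstruction: instruction, message: result.output };
    }
    completed++;
  }
  return { ok: true, completed };
}
```

Stopping at the first failure is a design choice: a login flow, for example, has little reason to click the submit button if filling the password field failed.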

Error Handling

try {
  await page.perform("click the non-existent button");
} catch (error) {
  console.error("Failed to perform action:", error);
}
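
The two sections above show two failure paths: a result whose status is not "completed", and a thrown error. The source does not say which path applies to which kind of failure, so the sketch below handles both uniformly and retries a limited number of times. performWithRetry, PerformPage, and PerformResult are hypothetical names introduced for illustration; retrying is an assumption about what a caller might want, not documented HyperAgent behavior.

```typescript
// Hypothetical minimal types for illustration; the real page object has more members.
interface PerformResult {
  status: string; // "completed" | "failed"
  output: string;
}

interface PerformPage {
  perform(instruction: string): Promise<PerformResult>;
}

// Hypothetical helper: try an action up to `maxAttempts` times.
// Both a thrown error and a non-"completed" status count as a failed attempt.
// Resolves with the first completed result, or rejects with the last failure.
async function performWithRetry(
  page: PerformPage,
  instruction: string,
  maxAttempts: number = 3
): Promise<PerformResult> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await page.perform(instruction);
      if (result.status === "completed") {
        return result;
      }
      lastError = new Error(`"${instruction}" failed: ${result.output}`);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Because each attempt is a separate LLM call, a small maxAttempts keeps the cost advantage of perform() intact.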

Tips for Writing Effective Instructions

Good: “click the blue ‘Sign Up’ button at the bottom of the form”
Bad: “click the button”

Good: “fill the email input in the login form with user@example.com”
Bad: “fill email”

Good: “click the link that says ‘Learn More’”
Bad: “click the third link”

Good: “type ‘search query’ into the search box”
Bad: “search for something”
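
The "Good" examples share a pattern: an action, a visible label, an element type, and a location. One way to keep generated instructions consistently specific is to build them from those parts. The helper below is purely illustrative: describeAction and TargetDescription are hypothetical names, not part of HyperAgent, and perform() accepts any free-form string.

```typescript
// Hypothetical description of the element an instruction should target.
interface TargetDescription {
  action: string;    // e.g. "click"
  element: string;   // e.g. "button"
  label?: string;    // visible text on the element, e.g. "Sign Up"
  location?: string; // e.g. "at the bottom of the form"
}

// Compose an instruction string from its parts, quoting the visible label.
function describeAction(target: TargetDescription): string {
  const parts: string[] = [target.action, "the"];
  if (target.label) {
    parts.push(`'${target.label}'`);
  }
  parts.push(target.element);
  if (target.location) {
    parts.push(target.location);
  }
  return parts.join(" ");
}

const instruction = describeAction({
  action: "click",
  element: "button",
  label: "Sign Up",
  location: "at the bottom of the form",
});
// instruction is "click the 'Sign Up' button at the bottom of the form"
```

The resulting string can be passed straight to page.perform(instruction).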

CDP Actions

HyperAgent uses Chrome DevTools Protocol (CDP) for precise element interactions by default. This provides:
  • Exact coordinate-based clicks
  • Deep iframe support
  • Auto-filtering of ad frames
To disable CDP and use Playwright locators instead:
const agent = new HyperAgent({
  cdpActions: false,
});
