The page.perform() method executes a single, granular action on a web page. It’s optimized for speed and reliability, using the accessibility tree instead of screenshots.
Overview
| Characteristic | Description |
|---|---|
| Speed | ⚡ Fast - Uses accessibility tree (no screenshots) |
| Cost | 💰 Cheap - Single LLM call per action |
| Reliability | 🎯 Direct element finding and execution |
| Efficiency | 📊 Text-based DOM analysis with automatic ad-frame filtering |
Basic Usage
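A minimal sketch of a single call, assuming `page` is a HyperAgent page object whose `perform()` takes one natural-language instruction and returns a promise. The `PerformPage` interface and the `clickLogin` helper are hypothetical names introduced here so the snippet type-checks without the library; the setup code that creates the page is omitted.

```typescript
// Hypothetical structural type: only the part of the page object this sketch uses.
// Assumption: perform() accepts one instruction string and resolves to a result object.
interface PerformPage {
  perform(instruction: string): Promise<unknown>;
}

// Execute one granular action described in plain language.
async function clickLogin(page: PerformPage): Promise<unknown> {
  return page.perform("click the 'Log in' button in the page header");
}
```

Any object with a compatible `perform()` method can be passed in, which also makes the helper easy to exercise with a stub in tests.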
Common Actions
Click Elements
Fill or Type Inputs
Form Interactions
Scrolling
Hover
Keyboard Actions
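The action categories above can be sketched together as one instruction per category. The instruction wording and the element names are illustrative only, and `PerformPage`, `commonActions`, and `runCommonActions` are hypothetical names, not part of the library.

```typescript
// Hypothetical structural type for the page object (assumption, see lead-in).
interface PerformPage {
  perform(instruction: string): Promise<unknown>;
}

// One illustrative instruction per action category; element names are made up.
const commonActions: Record<string, string> = {
  click: "click the 'Add to cart' button",
  fill: "fill the search box with 'wireless headphones'",
  form: "select 'Canada' from the country dropdown",
  scroll: "scroll down to the reviews section",
  hover: "hover over the 'Products' menu item",
  keyboard: "press Enter",
};

// Run each instruction as its own perform() call, in insertion order,
// and return the categories that were executed.
async function runCommonActions(page: PerformPage): Promise<string[]> {
  const executed: string[] = [];
  for (const [kind, instruction] of Object.entries(commonActions)) {
    await page.perform(instruction);
    executed.push(kind);
  }
  return executed;
}
```

Each call carries exactly one action, which matches the single-action scope of page.perform().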
When to Use perform() vs ai()
Use page.perform()
- Single, specific actions
- When you know exactly what action is needed
- Fast, reliable execution
- Lower token cost
Use page.ai()
- Complex multi-step workflows
- When visual context is needed
- Tasks requiring decision making
- When the next action depends on page state
Example: Combining Both
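A hedged sketch of mixing the two methods: page.perform() for steps that are known in advance, page.ai() for the step that needs a decision based on page state. The `AgentPage` interface, the `searchAndAddToCart` helper, and the shopping scenario are hypothetical; the assumption is only that both methods take a natural-language string and return a promise.

```typescript
// Hypothetical structural type covering the two methods used below.
interface AgentPage {
  ai(task: string): Promise<unknown>;
  perform(instruction: string): Promise<unknown>;
}

async function searchAndAddToCart(page: AgentPage): Promise<void> {
  // Known, single actions: use perform().
  await page.perform("type 'mechanical keyboard' into the search box");
  await page.perform("press Enter");

  // Multi-step and dependent on what the results page shows: use ai().
  await page.ai(
    "find the cheapest result with at least a 4-star rating and open its product page"
  );

  // Back to a single, known action.
  await page.perform("click the 'Add to cart' button");
}
```

Keeping the deterministic steps in perform() limits the more expensive ai() call to the one step that actually requires judgment.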
Return Value
page.perform() returns a TaskOutput object:
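The exact fields of TaskOutput are not listed here, so the sketch below works against an assumed shape. The `status` and `output` field names, the `TaskOutputLike` interface, and the `describeResult` helper are assumptions for illustration; check the library's type definitions for the real structure.

```typescript
// Assumed shape, for illustration only; the real TaskOutput may differ.
interface TaskOutputLike {
  status?: string; // assumed: a status string such as "completed" or "failed"
  output?: string; // assumed: a text summary of what the action did
}

// Turn a result into a one-line summary, tolerating missing fields.
function describeResult(result: TaskOutputLike): string {
  const status = result.status ?? "unknown";
  const output = result.output ?? "(no output)";
  return `status=${status}; output=${output}`;
}
```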
Checking Success
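One way to gate later steps on the result, sketched under the assumption that a successful action reports a `status` of "completed". That value, the `TaskOutputLike` shape, and the `isSuccess` and `performChecked` helpers are all hypothetical.

```typescript
// Assumed result shape; verify against the library's TaskOutput type.
interface TaskOutputLike {
  status?: string;
  output?: string;
}

interface PerformPage {
  perform(instruction: string): Promise<TaskOutputLike>;
}

// Assumption: "completed" is the status reported on success.
function isSuccess(result: TaskOutputLike): boolean {
  return result.status === "completed";
}

// Run one action and report whether it completed, so the caller can
// decide whether to continue with dependent steps.
async function performChecked(
  page: PerformPage,
  instruction: string
): Promise<boolean> {
  const result = await page.perform(instruction);
  return isSuccess(result);
}
```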
Error Handling
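A minimal error-handling sketch, assuming page.perform() rejects its promise (throws) when an action cannot be carried out, for example when no matching element is found. The `performWithRetry` helper, the `PerformOutcome` type, and the retry count are hypothetical choices, not library behavior.

```typescript
// Hypothetical structural type for the page object.
interface PerformPage {
  perform(instruction: string): Promise<unknown>;
}

// Outcome of a guarded call: either the result or the last error message.
type PerformOutcome =
  | { ok: true; result: unknown }
  | { ok: false; error: string };

// Try the action up to maxAttempts times, catching thrown errors
// instead of letting them propagate to the caller.
async function performWithRetry(
  page: PerformPage,
  instruction: string,
  maxAttempts: number = 2
): Promise<PerformOutcome> {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await page.perform(instruction);
      return { ok: true, result };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { ok: false, error: lastError };
}
```

Returning a tagged outcome rather than rethrowing lets the caller fall back to another strategy, such as page.ai(), when the single action keeps failing.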
Tips for Writing Effective Instructions
Be specific about the target element
Good: “click the blue ‘Sign Up’ button at the bottom of the form”
Bad: “click the button”
Include context when elements look similar
Good: “fill the email input in the login form with [email protected]”
Bad: “fill email”
Use visible text for identification
Good: “click the link that says ‘Learn More’”
Bad: “click the third link”
Specify the action clearly
Good: “type ‘search query’ into the search box”
Bad: “search for something”
CDP Actions
By default, HyperAgent uses the Chrome DevTools Protocol (CDP) for precise element interactions. This provides:
- Exact coordinate-based clicks
- Deep iframe support
- Auto-filtering of ad frames