Action Caching

Action Caching records every action taken during a page.ai() call. Replay these recordings later for deterministic, LLM-free automation—dramatically reducing costs and improving execution speed.

Why Use Action Caching?

Without Caching	With Caching
LLM call every run	LLM call once, replay free
Variable behavior	Deterministic execution
Higher latency	Near-instant replay
Higher cost	Pay once

Recording Actions

Every page.ai() call automatically returns an actionCache:

import { HyperAgent } from "@hyperbrowser/agent";
import fs from "fs";

const agent = new HyperAgent();
const page = await agent.newPage();

await page.goto("https://flights.google.com");

// Execute task - actionCache is automatically generated
const { output, actionCache } = await page.ai(
  "Search for flights from Miami to LAX on December 15"
);


console.log(`Recorded ${actionCache.steps.length} steps`);
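Because the cache is plain JSON, one way to keep a recording for later replay is to write it to disk. The following is a minimal sketch that assumes the cache exposes a `steps` array as described in this document; the `SavedActionCache` type, the helper names, and the file path are illustrative and not part of the HyperAgent API.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Illustrative shape only: the real type exported by the library may differ.
interface SavedActionCache {
  taskId: string;
  steps: Array<{ stepIndex: number; instruction: string; xpath: string }>;
}

// Write the cache as pretty-printed JSON so diffs between recordings stay readable.
function saveActionCache(filePath: string, cache: SavedActionCache): void {
  fs.writeFileSync(filePath, JSON.stringify(cache, null, 2), "utf-8");
}

// Read a cache back and fail early if it has no steps array.
function loadActionCache(filePath: string): SavedActionCache {
  const parsed = JSON.parse(fs.readFileSync(filePath, "utf-8")) as SavedActionCache;
  if (!Array.isArray(parsed.steps)) {
    throw new Error(`No steps array found in ${filePath}`);
  }
  return parsed;
}

// Round trip with a hand-written cache standing in for a page.ai() result.
const demoCache: SavedActionCache = {
  taskId: "abc-123",
  steps: [
    {
      stepIndex: 0,
      instruction: "Click the departure city input",
      xpath: "/html/body/div[2]/div[4]/input[1]",
    },
  ],
};
const demoPath = path.join(os.tmpdir(), "flight-search-cache.json");
saveActionCache(demoPath, demoCache);
const reloaded = loadActionCache(demoPath);
console.log(`Reloaded ${reloaded.steps.length} step(s) from ${demoPath}`);
```

In a real run, the object passed to `saveActionCache` would be the `actionCache` returned by `page.ai()`.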

Replaying Actions

Use agent.runFromActionCache() to replay recorded actions:

import { HyperAgent, ActionCacheOutput } from "@hyperbrowser/agent";
import fs from "fs";

const agent = new HyperAgent();


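// Load a previously saved recording (the file name here is only an example)
const actionCache: ActionCacheOutput = JSON.parse(
  fs.readFileSync("action-cache.json", "utf-8")
);

// Replay without LLM calls
const result = await agent.runFromActionCache(actionCache.steps, {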
  maxXPathRetries: 3,
});

console.log("Replay status:", result.status);
await agent.closeAgent();

Generate Script from Action Cache

Instead of replaying actions programmatically, you can generate a standalone TypeScript script from recorded actions using agent.createScriptFromActionCache():

import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "anthropic", model: "claude-sonnet-4-0" },
});

const page = await agent.newPage();

// Record the automation
const { actionCache } = await page.ai(
  "Go to https://demo.automationtesting.in/Frames.html, " +
  "select the iframe within iframe tab, " +
  "and fill in the text box in the nested iframe"
);

// Generate a reusable script
const script = agent.createScriptFromActionCache(actionCache.steps);
console.log(script);

await agent.closeAgent();

This outputs a standalone script you can save and run directly—no LLM calls needed:

// Generated script
import { HyperAgent } from "@hyperbrowser/agent";

async function main() {
  const agent = new HyperAgent();
  const page = await agent.newPage();

  await page.goto("https://demo.automationtesting.in/Frames.html");
  await page.performClick("/html/body/section/div/div/div/ul/li[2]/a", {
    performInstruction: "Click the iframe within iframe tab"
  });
  await page.performFill("/html/body/section/div/div/div/input", "Hello", {
    performInstruction: "Fill in the text box",
    frameIndex: 2
  });

  await agent.closeAgent();
}

main();

The generated script calls the perform helpers (such as performClick and performFill) with the cached XPaths, so it runs without LLM calls as long as those XPaths still match the page.
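To make the relationship between cached steps and generated code concrete, here is a rough sketch of how a step could be rendered as a line of script source. It is not the implementation behind createScriptFromActionCache(). The `CachedStep` type and the rule that a helper is named "perform" plus the capitalized method are assumptions inferred from the examples in this document.

```typescript
// Illustrative step shape based on the cache format shown in this document.
interface CachedStep {
  instruction: string;
  method: string; // e.g. "click" or "fill"
  arguments: string[];
  frameIndex: number;
  xpath: string;
}

// Assumption: helper names follow "perform" + capitalized method name.
function helperName(method: string): string {
  return "perform" + method.charAt(0).toUpperCase() + method.slice(1);
}

// Render one cached step as a single line of script source.
function renderStep(step: CachedStep): string {
  // The XPath comes first, followed by any recorded arguments, all as string literals.
  const args = [step.xpath, ...step.arguments].map((a) => JSON.stringify(a));
  const options = JSON.stringify({
    performInstruction: step.instruction,
    frameIndex: step.frameIndex,
  });
  return `await page.${helperName(step.method)}(${args.join(", ")}, ${options});`;
}

const sampleSteps: CachedStep[] = [
  {
    instruction: "Click the submit button",
    method: "click",
    arguments: [],
    frameIndex: 0,
    xpath: "/html/body/button[1]",
  },
  {
    instruction: "Fill in the text box",
    method: "fill",
    arguments: ["Hello"],
    frameIndex: 2,
    xpath: "/html/body/input[1]",
  },
];

const scriptBody = sampleSteps.map(renderStep).join("\n");
console.log(scriptBody);
```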

How Replay Works

XPath First: Attempts to find elements using cached XPaths
Retry on Failure: Retries up to maxXPathRetries times
LLM Fallback: If XPath fails, falls back to AI using the cached instruction
Continue or Stop: Stops on first failure by default

const result = await page.runFromActionCache(cache, {
  maxXPathRetries: 3,  // Retry XPath 3 times before LLM fallback
  debug: true,         // Log execution details
});

// Check what happened
for (const step of result.steps) {
  console.log(`Step ${step.stepIndex}:`, {
    usedXPath: step.usedXPath,
    fallbackUsed: step.fallbackUsed,
    success: step.success,
  });
}
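The four replay stages listed above can be sketched as plain control flow. This is an illustration under stated assumptions, not HyperAgent's implementation: `tryXPath` and `llmFallback` are hypothetical callbacks standing in for browser and LLM calls, and `maxXPathRetries` is treated here as the total number of XPath attempts.

```typescript
// Outcome of replaying one step, mirroring the fields logged above.
interface StepOutcome {
  stepIndex: number;
  usedXPath: boolean;
  fallbackUsed: boolean;
  success: boolean;
}

interface ReplayStep {
  stepIndex: number;
  xpath: string;
  instruction: string;
}

// Hypothetical callbacks: each returns true when the action succeeded.
type XPathAttempt = (xpath: string) => boolean;
type LlmFallback = (instruction: string) => boolean;

function replaySteps(
  steps: ReplayStep[],
  tryXPath: XPathAttempt,
  llmFallback: LlmFallback,
  maxXPathRetries = 3
): StepOutcome[] {
  const outcomes: StepOutcome[] = [];
  for (const step of steps) {
    // Stages 1 and 2: try the cached XPath, up to maxXPathRetries attempts.
    let xpathWorked = false;
    for (let attempt = 0; attempt < maxXPathRetries && !xpathWorked; attempt++) {
      xpathWorked = tryXPath(step.xpath);
    }
    // Stage 3: call the fallback only when every XPath attempt failed.
    const fallbackUsed = !xpathWorked;
    const success = xpathWorked || llmFallback(step.instruction);
    outcomes.push({
      stepIndex: step.stepIndex,
      usedXPath: xpathWorked,
      fallbackUsed,
      success,
    });
    // Stage 4: stop on the first failed step.
    if (!success) break;
  }
  return outcomes;
}

// Demo with stubbed callbacks: only "/ok" matches, and the fallback
// can only resolve the "Pick a date" instruction.
let xpathAttempts = 0;
const outcomes = replaySteps(
  [
    { stepIndex: 0, xpath: "/ok", instruction: "Click search" },
    { stepIndex: 1, xpath: "/moved", instruction: "Pick a date" },
    { stepIndex: 2, xpath: "/gone", instruction: "Unknown element" },
    { stepIndex: 3, xpath: "/ok", instruction: "Never reached" },
  ],
  (xpath) => {
    xpathAttempts++;
    return xpath === "/ok";
  },
  (instruction) => instruction === "Pick a date"
);
console.log(outcomes);
```

In the demo, the third step fails both the XPath attempts and the fallback, so the fourth step is never executed.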

Action Cache Format

The cache is a JSON structure containing all recorded steps:

{
  "taskId": "abc-123",
  "createdAt": "2025-01-15T10:30:00Z",
  "status": "completed",
  "steps": [
    {
      "stepIndex": 0,
      "actionType": "actElement",
      "instruction": "Click the departure city input",
      "method": "click",
      "arguments": [],
      "frameIndex": 0,
      "xpath": "/html/body/div[2]/div[4]/input[1]",
      "success": true
    },
    {
      "stepIndex": 1,
      "actionType": "actElement",
      "instruction": "Type 'Miami' into the input",
      "method": "fill",
      "arguments": ["Miami"],
      "frameIndex": 0,
      "xpath": "/html/body/div[2]/div[4]/input[1]",
      "success": true
    }
  ]
}
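When a cache is loaded from disk or another untrusted source, it can help to check its shape before replaying it. The sketch below models the JSON above as TypeScript types and adds a small runtime check. The type and function names are illustrative, and the library may export its own types that differ from these.

```typescript
// Field names follow the JSON example above; the type names are illustrative.
interface CachedStep {
  stepIndex: number;
  actionType: string;
  instruction: string;
  method: string;
  arguments: string[];
  frameIndex: number;
  xpath: string;
  success: boolean;
}

interface ActionCacheFile {
  taskId: string;
  createdAt: string;
  status: string;
  steps: CachedStep[];
}

// Runtime check for the fields that replay depends on.
function isCachedStep(value: unknown): value is CachedStep {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["stepIndex"] === "number" &&
    typeof v["instruction"] === "string" &&
    typeof v["method"] === "string" &&
    Array.isArray(v["arguments"]) &&
    typeof v["frameIndex"] === "number" &&
    typeof v["xpath"] === "string"
  );
}

// Parse a JSON string and reject it if any step is malformed.
function parseActionCache(json: string): ActionCacheFile {
  const data = JSON.parse(json) as ActionCacheFile;
  if (!Array.isArray(data.steps) || !data.steps.every(isCachedStep)) {
    throw new Error("Malformed action cache: a step is missing a required field");
  }
  return data;
}

const sampleJson = JSON.stringify({
  taskId: "abc-123",
  createdAt: "2025-01-15T10:30:00Z",
  status: "completed",
  steps: [
    {
      stepIndex: 0,
      actionType: "actElement",
      instruction: "Click the departure city input",
      method: "click",
      arguments: [],
      frameIndex: 0,
      xpath: "/html/body/div[2]/div[4]/input[1]",
      success: true,
    },
  ],
});
const parsedCache = parseActionCache(sampleJson);
console.log(`Parsed ${parsedCache.steps.length} valid step(s)`);
```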

Direct XPath Execution

For maximum control, use the perform helpers to execute actions directly:

const page = await agent.newPage();
await page.goto("https://example.com");

// Execute by XPath with LLM fallback
await page.performClick(
  "/html/body/button[1]",
  { performInstruction: "Click the submit button" }
);

await page.performFill(
  "/html/body/input[1]",
  "user@example.com",
  { performInstruction: "Fill the email field" }
);

Available Perform Actions

Helper	Description
`performClick(xpath)`	Click an element
`performFill(xpath, text)`	Clear and fill an input
`performType(xpath, text)`	Type into an element
`performPress(xpath, key)`	Press a keyboard key
`performSelectOption(xpath, option)`	Select from dropdown
`performCheck(xpath)`	Check a checkbox
`performUncheck(xpath)`	Uncheck a checkbox
`performHover(xpath)`	Hover over an element
`performScrollToElement(xpath)`	Scroll element into view

Each helper accepts an options object:

await page.performClick(xpath, {
  performInstruction: "Click the login button", // Fallback instruction
  frameIndex: 0,  // Target iframe (0 = main frame)
  maxSteps: 3,    // Retries before fallback
});

When to Use Action Caching

Scenario	Recommendation
Repetitive tasks (daily scraping, scheduled jobs)	✅ Record once, replay indefinitely
E2E testing	✅ Fast, deterministic test runs
High-volume automation	✅ Eliminate per-run LLM costs
Stable page structures	✅ XPaths remain valid longer
Dynamic pages with frequent layout changes	⚠️ May require frequent re-recording
One-time tasks	❌ Just use `page.ai()` directly

Monitoring Fallback Rates

When a cached XPath no longer matches the page, HyperAgent falls back to the LLM to find the element, provided a performInstruction is available. The logs then look like this:

⚠️ [runCachedStep] Cached action failed. Falling back to LLM...
   Instruction: "Select the LATAM/Delta flight with the lowest carbon emissions"
   ❌ Cached XPath Failed: "/html[1]/body[1]/c-wiz[2]/div[1]/.../li[5]/div[1]/div[1]"
   ✅ LLM Resolved New XPath: "/html[1]/body[1]/c-wiz[2]/div[1]/.../li[4]/div[1]/div[1]"

What this means:

The cached XPath pointed to li[5] but the element moved to li[4]
The LLM successfully found the correct element using the instruction
The action completed, but with added latency and cost

When to re-record:

If you see fallback warnings frequently, the page structure has changed
Re-run the original page.ai() task to capture fresh XPaths
Save the new actionCache to replace your stale recording

Enable debug: true on your agent to see more detailed logging.
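One way to turn "frequently" into a concrete signal is to compute a fallback rate from the per-step replay results. The sketch below assumes each step result carries the `fallbackUsed` and `success` fields shown earlier in this document; the function names and the 25% threshold are hypothetical choices, not library defaults.

```typescript
// Per-step replay result, using the field names shown earlier in this document.
interface ReplayStepResult {
  stepIndex: number;
  usedXPath: boolean;
  fallbackUsed: boolean;
  success: boolean;
}

// Fraction of steps that needed the LLM fallback (0 when there are no steps).
function fallbackRate(steps: ReplayStepResult[]): number {
  if (steps.length === 0) return 0;
  return steps.filter((s) => s.fallbackUsed).length / steps.length;
}

// Hypothetical policy: re-record when any step failed outright,
// or when the fallback rate exceeds the given threshold.
function shouldReRecord(steps: ReplayStepResult[], threshold = 0.25): boolean {
  return steps.some((s) => !s.success) || fallbackRate(steps) > threshold;
}

// A replay where one of four steps fell back to the LLM.
const mostlyCached: ReplayStepResult[] = [
  { stepIndex: 0, usedXPath: true, fallbackUsed: false, success: true },
  { stepIndex: 1, usedXPath: true, fallbackUsed: false, success: true },
  { stepIndex: 2, usedXPath: false, fallbackUsed: true, success: true },
  { stepIndex: 3, usedXPath: true, fallbackUsed: false, success: true },
];

// A replay where half of the steps fell back to the LLM.
const driftingPage: ReplayStepResult[] = mostlyCached.map((s) =>
  s.stepIndex === 1 ? { ...s, usedXPath: false, fallbackUsed: true } : s
);

console.log("Fallback rate:", fallbackRate(mostlyCached));
console.log("Re-record (mostly cached)?", shouldReRecord(mostlyCached));
console.log("Re-record (drifting page)?", shouldReRecord(driftingPage));
```

Tracking this rate across scheduled runs gives an early signal that a recording is going stale, before replays start failing outright.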
