page.extract()

The page.extract() method pulls structured data from web pages. Define what you want using natural language and optionally enforce a schema with Zod for type-safe results.

Basic Usage

import { HyperAgent } from "@hyperbrowser/agent";
import { z } from "zod";

const agent = new HyperAgent();
const page = await agent.newPage();
await page.goto("https://news.ycombinator.com");

// Simple extraction (returns string)
const topStory = await page.extract("what is the title of the top story?");
console.log(topStory); // "Show HN: I built a..."

// Structured extraction (returns typed object)
const stories = await page.extract(
  "get the top 5 stories",
  z.object({
    stories: z.array(z.object({
      title: z.string(),
      points: z.number(),
      author: z.string(),
    }))
  })
);

console.log(stories.stories[0].title);

Parameters

instruction (string, required)

Natural language description of what data to extract.

schema (ZodSchema, optional)

Zod schema for structured, typed output. Without a schema, the method returns a string.

Why Use Schemas?

Schemas provide three key benefits:

Type Safety: Get full TypeScript autocompletion and type checking
Validation: Checks that the AI returns data in the expected format
Documentation: Use .describe() to guide the AI on what each field means

const productSchema = z.object({
  name: z.string().describe("The product name"),
  price: z.number().describe("Price in USD, numbers only"),
  inStock: z.boolean().describe("Whether the item is available"),
  reviews: z.array(z.object({
    rating: z.number().min(1).max(5),
    text: z.string(),
  })).describe("Customer reviews"),
});

const product = await page.extract("get this product's details", productSchema);
// product is fully typed: { name: string, price: number, inStock: boolean, reviews: [...] }
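To make the validation benefit concrete, here is a minimal sketch that uses only Zod, with no browser involved. The two payloads are hypothetical stand-ins for what a model might return; how page.extract() applies the schema internally is not shown here and may differ.

```typescript
import { z } from "zod";

const productSchema = z.object({
  name: z.string(),
  price: z.number(),
  inStock: z.boolean(),
});

// Static type derived from the schema: { name: string; price: number; inStock: boolean }
type Product = z.infer<typeof productSchema>;

// Hypothetical payloads standing in for model output (assumption, not real results).
const wellFormed: unknown = { name: "Desk lamp", price: 24.5, inStock: true };
const malformed: unknown = { name: "Desk lamp", price: "$24.50", inStock: true };

// safeParse reports success or failure instead of throwing.
const ok = productSchema.safeParse(wellFormed);
const bad = productSchema.safeParse(malformed);

if (ok.success) {
  const product: Product = ok.data;
  console.log(product.price.toFixed(2));
}
if (!bad.success) {
  // Each issue records the path of the offending field, here "price".
  console.log(bad.error.issues.map((issue) => issue.path.join(".")));
}
```

The second payload fails because price arrives as a string, which is the kind of mismatch a .describe() hint such as "numbers only" is meant to prevent.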

Common Extraction Patterns

Product Information

const product = await page.extract(
  "extract the product details",
  z.object({
    name: z.string(),
    price: z.number(),
    originalPrice: z.number().optional(),
    rating: z.number(),
    reviewCount: z.number(),
    availability: z.enum(["in_stock", "out_of_stock", "limited"]),
  })
);

Table Data

const tableData = await page.extract(
  "extract all rows from the pricing table",
  z.object({
    rows: z.array(z.object({
      plan: z.string(),
      price: z.string(),
      features: z.array(z.string()),
    }))
  })
);
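Because the table schema above keeps price as a string, values such as "$29/mo" come back as displayed. If you need numbers later, a small post-processing step can convert them. The helper below is a hypothetical sketch, not part of the library, and assumes US-style separators (comma for thousands, dot for decimals).

```typescript
// Hypothetical helper: pull the first number out of a price string.
// Returns null when the text contains no digits (for example "Free").
function parsePrice(text: string): number | null {
  const match = text.match(/\d[\d,]*(?:\.\d+)?/);
  if (match === null) return null;
  // Drop thousands separators before converting.
  return Number.parseFloat(match[0].replace(/,/g, ""));
}

console.log(parsePrice("$29/mo"));    // 29
console.log(parsePrice("$1,299.50")); // 1299.5
console.log(parsePrice("Free"));      // null
```

Keeping the raw string in the schema and converting afterwards preserves labels like "Free" or "Contact us" that a z.number() field would reject.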

Article Content

const article = await page.extract(
  "extract the article content",
  z.object({
    title: z.string(),
    author: z.string(),
    publishDate: z.string(),
    content: z.string(),
    tags: z.array(z.string()),
  })
);
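The article schema above returns publishDate as a plain string. If downstream code needs a Date, one option is to convert it after extraction and treat unparseable values as missing. This is a sketch under that assumption; parsePublishDate is a hypothetical name, and JavaScript only guarantees consistent parsing for ISO 8601 strings.

```typescript
// Hypothetical helper: convert an extracted date string to a Date,
// or null when the string cannot be parsed.
function parsePublishDate(text: string): Date | null {
  const parsed = new Date(text);
  // An unparseable string yields an "Invalid Date" whose timestamp is NaN.
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}

const published = parsePublishDate("2024-03-15T09:30:00Z");
console.log(published?.toISOString()); // "2024-03-15T09:30:00.000Z"
console.log(parsePublishDate("not a date")); // null
```

Adding a hint such as .describe("ISO 8601 date") to the publishDate field may make the returned strings easier to parse.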

Lists and Rankings

const rankings = await page.extract(
  "get the top 10 items from this list",
  z.object({
    items: z.array(z.object({
      rank: z.number(),
      name: z.string(),
      score: z.number().optional(),
    }))
  })
);

Error Handling

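// `schema` is any Zod schema defined earlier
try {
  const data = await page.extract("get the user profile", schema);
  console.log(data);
} catch (error) {
  // Caught values are `unknown` in strict TypeScript, so narrow before reading the message
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("validation")) {
    console.error("Data didn't match schema:", error);
  } else {
    console.error("Extraction failed:", error);
  }
}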
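Since model output can vary between calls, a failed extraction will sometimes succeed on a second attempt. The sketch below is a generic retry wrapper, not a library feature: withRetry, maxAttempts, and delayMs are hypothetical names, and whether retrying helps in your case is an assumption to verify.

```typescript
// Generic retry helper. `run` is any async operation,
// for example () => page.extract("get the user profile", schema).
async function withRetry<T>(
  run: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 500,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await run();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait a little longer after each failed attempt.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // Every attempt failed: rethrow the most recent error.
  throw lastError;
}

// Usage (inside an async function):
// const profile = await withRetry(() => page.extract("get the user profile", schema));
```

The wrapper rethrows the last error unchanged, so the try/catch pattern above still applies around the outer call.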

Best Practices

Be specific in your instructions

Good: “extract the price shown next to the ‘Buy Now’ button”

Bad: “get the price”

Use descriptive schema fields

// ✅ Good: Clear field names and descriptions
z.object({
  priceUsd: z.number().describe("Price in US dollars"),
  stockCount: z.number().describe("Number of items in stock"),
})

// ❌ Bad: Ambiguous fields
z.object({
  p: z.number(),
  n: z.number(),
})

Match schema complexity to page content

Don’t create overly complex schemas for simple data. If you only need a single value:

const price = await page.extract("what is the price?", z.number());

Use enums for known categories

z.object({
  status: z.enum(["pending", "shipped", "delivered"]),
  category: z.enum(["electronics", "clothing", "home"]),
})
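A practical payoff of enums is that TypeScript can check that downstream code handles every category. The sketch below is plain TypeScript using the status values from the schema above; ORDER_STATUSES and statusLabel are hypothetical names, and the labels are made up for illustration.

```typescript
// The same tuple can be passed to Zod as z.enum(ORDER_STATUSES).
const ORDER_STATUSES = ["pending", "shipped", "delivered"] as const;
type OrderStatus = (typeof ORDER_STATUSES)[number];

function statusLabel(status: OrderStatus): string {
  switch (status) {
    case "pending":
      return "Awaiting shipment";
    case "shipped":
      return "On its way";
    case "delivered":
      return "Delivered";
    default: {
      // If a new status is added to the tuple but not handled above,
      // this assignment stops compiling.
      const unhandled: never = status;
      return unhandled;
    }
  }
}

console.log(statusLabel("shipped")); // "On its way"
```

With a plain z.string() field, a typo or an unexpected value would only surface at runtime, if at all.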

Next Steps

page.ai()

page.perform()
