The page.extract() method pulls structured data from web pages. Define what you want using natural language and optionally enforce a schema with Zod for type-safe results.

Basic Usage

import { HyperAgent } from "@hyperbrowser/agent";
import { z } from "zod";

const agent = new HyperAgent();
const page = await agent.newPage();
await page.goto("https://news.ycombinator.com");

// Simple extraction (returns string)
const topStory = await page.extract("what is the title of the top story?");
console.log(topStory); // "Show HN: I built a..."

// Structured extraction (returns typed object)
const stories = await page.extract(
  "get the top 5 stories",
  z.object({
    stories: z.array(z.object({
      title: z.string(),
      points: z.number(),
      author: z.string(),
    }))
  })
);

console.log(stories.stories[0].title);

Parameters

instruction (string, required)
Natural language description of what data to extract.

schema (ZodSchema, optional)
Zod schema for structured, typed output. Without a schema, the method returns a string.
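
Because a schema-less call returns a string, any numeric handling is left to the caller. The sketch below shows the kind of post-processing a `z.number()` field would otherwise spare you. `parsePrice` is a hypothetical helper written for this illustration, not part of HyperAgent, and it assumes prices use `,` as a thousands separator and `.` as the decimal point.

```typescript
// Hypothetical helper (not part of HyperAgent): pull a price out of free text
// such as the string returned by a schema-less page.extract() call.
// Assumes "," is a thousands separator and "." is the decimal point.
function parsePrice(raw: string): number | null {
  // First run of digits, allowing thousands separators and a decimal part.
  const match = raw.match(/\d[\d,]*(?:\.\d+)?/);
  if (match === null) return null;
  const value = Number(match[0].replace(/,/g, ""));
  return Number.isFinite(value) ? value : null;
}

console.log(parsePrice("The price is $1,299.00")); // 1299
console.log(parsePrice("Price unavailable")); // null
```

With a schema, the same value would arrive as a number and this step would not be needed.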

Why Use Schemas?

Schemas provide three key benefits:
  1. Type Safety: Get full TypeScript autocompletion and type checking on the result
  2. Validation: Checks that the data the AI returns matches the expected format
  3. Documentation: Use .describe() to tell the AI what each field means

const productSchema = z.object({
  name: z.string().describe("The product name"),
  price: z.number().describe("Price in USD, numbers only"),
  inStock: z.boolean().describe("Whether the item is available"),
  reviews: z.array(z.object({
    rating: z.number().min(1).max(5),
    text: z.string(),
  })).describe("Customer reviews"),
});

const product = await page.extract("get this product's details", productSchema);
// product is fully typed: { name: string, price: number, inStock: boolean, reviews: [...] }
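
The `[...]` in the comment above abbreviates the review array. Written out by hand, the shape TypeScript infers for `productSchema` should be equivalent to the interface below. The name `Product` and the sample values are illustrative only and do not come from the library.

```typescript
// Hand-written equivalent of the type inferred from productSchema.
// `Product` is an illustrative name, not something the library exports.
interface Product {
  name: string;
  price: number;
  inStock: boolean;
  reviews: { rating: number; text: string }[];
}

// Made-up sample value showing what a result of this shape looks like.
const sampleProduct: Product = {
  name: "Example Widget",
  price: 19.99,
  inStock: true,
  reviews: [{ rating: 5, text: "Works as described." }],
};

// Fields are typed, so arithmetic and property access need no casts.
const averageRating =
  sampleProduct.reviews.reduce((sum, r) => sum + r.rating, 0) /
  Math.max(sampleProduct.reviews.length, 1);
console.log(averageRating); // 5
```

Note that the `.min(1).max(5)` bounds on `rating` are checked when the data is validated; they do not appear in the static type, which is plain `number`.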

Common Extraction Patterns

Product Information

const product = await page.extract(
  "extract the product details",
  z.object({
    name: z.string(),
    price: z.number(),
    originalPrice: z.number().optional(),
    rating: z.number(),
    reviewCount: z.number(),
    availability: z.enum(["in_stock", "out_of_stock", "limited"]),
  })
);

Table Data

const tableData = await page.extract(
  "extract all rows from the pricing table",
  z.object({
    rows: z.array(z.object({
      plan: z.string(),
      price: z.string(),
      features: z.array(z.string()),
    }))
  })
);

Article Content

const article = await page.extract(
  "extract the article content",
  z.object({
    title: z.string(),
    author: z.string(),
    publishDate: z.string(),
    content: z.string(),
    tags: z.array(z.string()),
  })
);
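
Since `publishDate` is declared as `z.string()`, the result is text and converting it to a `Date` is up to the caller. A minimal sketch, assuming the instruction or a `.describe()` hint asks for an ISO 8601 date (other formats are parsed inconsistently across JavaScript engines). `parsePublishDate` is a hypothetical helper, not part of HyperAgent.

```typescript
// Hypothetical helper: convert an extracted publishDate string into a Date.
// Assumes the string is ISO 8601, e.g. "2024-03-15T00:00:00Z".
function parsePublishDate(raw: string): Date | null {
  const parsed = new Date(raw);
  // An unparseable string yields an Invalid Date whose time value is NaN.
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}

const published = parsePublishDate("2024-03-15T00:00:00Z");
console.log(published?.getUTCFullYear()); // 2024
console.log(parsePublishDate("not a date")); // null
```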

Lists and Rankings

const rankings = await page.extract(
  "get the top 10 items from this list",
  z.object({
    items: z.array(z.object({
      rank: z.number(),
      name: z.string(),
      score: z.number().optional(),
    }))
  })
);

Error Handling

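// `schema` stands for any Zod schema you have defined
try {
  const data = await page.extract("get the user profile", schema);
  console.log(data);
} catch (error) {
  // In strict TypeScript the caught value is `unknown`, so narrow it first
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("validation")) {
    console.error("Data didn't match schema:", error);
  } else {
    console.error("Extraction failed:", error);
  }
}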
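
If several call sites need the same handling, the check can be factored into a small wrapper. This is a sketch under the same assumption the snippet above makes, namely that schema failures carry the word "validation" in the error message. `classifyExtractError` and `tryExtract` are hypothetical names, not HyperAgent APIs.

```typescript
type ExtractFailure = "validation" | "other";

// Narrow the unknown caught value before reading its message.
// Assumption: schema failures mention "validation" in the message.
function classifyExtractError(error: unknown): ExtractFailure {
  const message = error instanceof Error ? error.message : String(error);
  return message.toLowerCase().includes("validation") ? "validation" : "other";
}

// Run any extraction callback; log the failure and return null instead of throwing.
async function tryExtract<T>(run: () => Promise<T>): Promise<T | null> {
  try {
    return await run();
  } catch (error) {
    const kind = classifyExtractError(error);
    const label =
      kind === "validation" ? "Data didn't match schema:" : "Extraction failed:";
    console.error(label, error);
    return null;
  }
}
```

A call would then look like `const profile = await tryExtract(() => page.extract("get the user profile", schema));`, with `profile` being `null` on failure.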

Best Practices

Good: “extract the price shown next to the ‘Buy Now’ button”
Bad: “get the price”

// ✅ Good: Clear field names and descriptions
z.object({
  priceUsd: z.number().describe("Price in US dollars"),
  stockCount: z.number().describe("Number of items in stock"),
})

// ❌ Bad: Ambiguous fields
z.object({
  p: z.number(),
  n: z.number(),
})

Don’t create overly complex schemas for simple data. If you only need a single value, pass a primitive schema:
const price = await page.extract("what is the price?", z.number());

Use enums when a field can only take one of a known set of values:
z.object({
  status: z.enum(["pending", "shipped", "delivered"]),
  category: z.enum(["electronics", "clothing", "home"]),
})
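
One payoff of enum fields is that the result type becomes a union of string literals, so TypeScript can check that every value is handled. The sketch below assumes `OrderStatus` mirrors the `status` enum above. `statusLabel` and its display strings are hypothetical and only serve the illustration.

```typescript
// Assumed result type for a field declared as
// z.enum(["pending", "shipped", "delivered"]).
type OrderStatus = "pending" | "shipped" | "delivered";

// Hypothetical helper mapping each status to display text.
function statusLabel(status: OrderStatus): string {
  switch (status) {
    case "pending":
      return "Awaiting shipment";
    case "shipped":
      return "On its way";
    case "delivered":
      return "Delivered";
    default: {
      // Exhaustiveness check: this assignment stops type-checking
      // if a new status is added without a matching case.
      const unreachable: never = status;
      throw new Error(`Unhandled status: ${String(unreachable)}`);
    }
  }
}

console.log(statusLabel("shipped")); // "On its way"
```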

Next Steps