Extract structured data from web pages using natural language and Zod schemas
The page.extract() method pulls structured data from web pages. Define what you want using natural language and optionally enforce a schema with Zod for type-safe results.
import { HyperAgent } from "@hyperbrowser/agent";
import { z } from "zod";

const agent = new HyperAgent();
const page = await agent.newPage();
await page.goto("https://news.ycombinator.com");

// Simple extraction (returns string)
const topStory = await page.extract("what is the title of the top story?");
console.log(topStory); // "Show HN: I built a..."

// Structured extraction (returns typed object)
const stories = await page.extract(
  "get the top 5 stories",
  z.object({
    stories: z.array(
      z.object({
        title: z.string(),
        points: z.number(),
        author: z.string(),
      })
    ),
  })
);
console.log(stories.stories[0].title);
Good: “extract the price shown next to the ‘Buy Now’ button”
Bad: “get the price”
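One way to keep instructions specific is to always name a visible landmark next to the value you want. The sketch below is only an illustration: `buildInstruction` and `InstructionParts` are hypothetical names, not part of HyperAgent, and the wording they produce is an assumption about what reads as a clear instruction.

```typescript
// Hypothetical helper (not part of HyperAgent): composes an extraction
// instruction from the target value and an optional visible landmark.
interface InstructionParts {
  target: string; // what to extract, e.g. "the price"
  landmark?: string; // nearby visible element that disambiguates the target
}

function buildInstruction(parts: InstructionParts): string {
  const pieces = [`extract ${parts.target}`];
  if (parts.landmark) {
    pieces.push(`shown next to ${parts.landmark}`);
  }
  return pieces.join(" ");
}

// Without a landmark the instruction stays vague.
const vague = buildInstruction({ target: "the price" });

// Naming the landmark pins down which price is meant.
const specific = buildInstruction({
  target: "the price",
  landmark: "the 'Buy Now' button",
});

console.log(vague);
console.log(specific);
```

The resulting string would then be passed as the first argument to page.extract(), in place of a hand-written instruction.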
Use descriptive schema fields
// ✅ Good: Clear field names and descriptions
z.object({
  priceUsd: z.number().describe("Price in US dollars"),
  stockCount: z.number().describe("Number of items in stock"),
})

// ❌ Bad: Ambiguous fields
z.object({
  p: z.number(),
  n: z.number(),
})
Match schema complexity to page content
Don’t create overly complex schemas for simple data. If you only need a single value:
const price = await page.extract("what is the price?", z.number());