Building a Smart Shopping Assistant with Hyperbrowser and GPT-4o

In this cookbook, we'll build an intelligent shopping assistant that can search for products, extract pricing information, and provide personalized recommendations based on user preferences. Our assistant will:

  1. Search for products on Google Shopping
  2. Extract detailed product information including prices, brands, and categories
  3. Filter results based on user preferences (price range, gender, size, etc.)
  4. Provide tailored shopping recommendations

We'll use these tools to build our assistant:

  • Hyperbrowser for web scraping and data extraction from shopping sites
  • OpenAI's GPT-4o for intelligent product analysis and personalized recommendations

By the end of this cookbook, you'll have a versatile shopping assistant that can help you find the best products matching your specific requirements!

Prerequisites

To follow along, you'll need the following:

  1. A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one; it's free)
  2. An OpenAI API key (sign up at openai.com if you don't have one)

Both API keys should be stored in a .env file in the same directory as this notebook with the following format:

HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
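
If you'd like a quick sanity check before running the rest of the notebook, the short sketch below (an optional addition, assuming the .env file sits next to this notebook) verifies that both keys are actually visible to Python:

import os

from dotenv import load_dotenv

load_dotenv()

# Fail fast with a clear message if either key is missing from the environment
for key in ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set; add it to your .env file")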

Step 1: Set up imports and load environment variables

First, we'll import all the necessary libraries and initialize our environment. This includes:

  • Hyperbrowser for web scraping and data extraction
  • OpenAI for AI-powered analysis
  • Pydantic for data validation and modeling
  • Other utility libraries for handling async operations and formatting

import asyncio
import json
import os
from typing import List, Literal

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.models.session import CreateSessionParams
from hyperbrowser.models.scrape import (
    StartScrapeJobParams,
    ScrapeOptions,
)
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletionMessageParam
from pydantic import BaseModel

load_dotenv()

Step 2: Initialize API clients

Next, we'll create instances of the Hyperbrowser and OpenAI clients using our API keys. These clients will be responsible for web data extraction and AI-powered product analysis respectively.

hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()

Step 3: Define data models and scraping functionality

Now we'll define our data models and functions to scrape shopping results. The main components are:

  1. PriceExtractSchema - Models a single product with details like price, name, brand, and category
  2. PriceExtractSchemaList - A container for multiple product listings
  3. scrape_shopping_results() - Scrapes Google Shopping for a given search query
  4. extract_product_data() - Uses GPT-4o-mini to extract structured product data from the raw scraped content

The scraping function uses Hyperbrowser's advanced features like tag filtering to target only the relevant shopping elements on the page.

from typing import Optional
import urllib.parse


class PriceExtractSchema(BaseModel):
    current_price: float
    actual_price: Optional[float]
    product_name: str
    product_category: Literal["women", "men", "children"]
    product_brand: str
    shop: str
    size: Optional[str]
    source: Literal["google", "bing"]


class PriceExtractSchemaList(BaseModel):
    products: List[PriceExtractSchema]


async def scrape_shopping_results(query: str) -> str:
    # Configure scrape parameters for the Google Shopping results page
    scrape_params = StartScrapeJobParams(
        url=f"https://www.google.com/search?q={urllib.parse.quote_plus(query)}&tbm=shop",
        session_options=CreateSessionParams(),
        scrape_options=ScrapeOptions(
            formats=["markdown"],
            # Keep only the shopping-card elements and drop images
            exclude_tags=["img"],
            include_tags=["div[jsname='Nhy0ad']"],
            only_main_content=True,
        ),
    )
    scrape_results = await hb.scrape.start_and_wait(scrape_params)
    if scrape_results.error:
        raise Exception(scrape_results.error)
    elif scrape_results.data is None or scrape_results.data.markdown is None:
        raise Exception("No data found")
    return scrape_results.data.markdown


async def extract_product_data(markdown_content: str) -> PriceExtractSchemaList:
    messages: List[ChatCompletionMessageParam] = [
        {
            "role": "system",
            "content": """You are a helpful assistant that can search for products on Google Shopping and return the results in a structured format. You will be provided with the markdown content of the page, and have to extract structured data from it regarding the product.""",
        },
        {
            "role": "user",
            "content": f"""Here is the markdown content of the page:
{markdown_content}
""",
        },
    ]
    structured_extraction = await llm.beta.chat.completions.parse(
        messages=messages,
        model="gpt-4o-mini",
        response_format=PriceExtractSchemaList,
        max_tokens=10000,
    )
    if structured_extraction.choices[0].message.parsed is None:
        raise Exception("No structured data found")
    return structured_extraction.choices[0].message.parsed

Step 4: Implement product filtering and recommendation

Once we have the raw product data, we need to filter it according to user preferences and provide personalized recommendations. The analyze_shopping_results() function:

  1. Takes the list of extracted products and user parameters (price range, gender, size, etc.)
  2. Uses GPT-4o-mini with a specialized system prompt to analyze the products
  3. Filters the results to match user preferences
  4. Returns a structured list of recommended products

This approach combines structured data filtering with AI-powered analysis to provide tailored recommendations.

async def analyze_shopping_results(results: PriceExtractSchemaList, **kwargs):
    # Build a readable "key should be value" list from the user's parameters
    user_parameters = "\n".join(f"{key} should be {value}" for key, value in kwargs.items())
    messages: List[ChatCompletionMessageParam] = [
        {
            "role": "system",
            "content": """You are a helpful shopping assistant that analyzes product listings and provides insights about pricing and options.
The user will also provide parameters like price range, product category, brand, size, etc. Filter the results based on the parameters and provide the best options.
Please analyze the listings and provide insights about:
- Price ranges and best deals
- Product categories and brands represented
- Size availability where applicable""",
        },
        {
            "role": "user",
            "content": f"""Here are some shopping results.
User Parameters:
{user_parameters}
Results:
{results.model_dump_json(indent=2)}
""",
        },
    ]
    response = await llm.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0.7,
        response_format=PriceExtractSchemaList,
    )
    return response.choices[0].message.parsed

Step 5: Define search parameters

Now we'll set up our search query and user parameters. This includes:

  1. The product we're searching for ("New Balance 574" in this example)
  2. User preferences like minimum price, gender, and size
  3. A maximum length limit for the scraped content, to keep the extraction prompt within a manageable token budget

These parameters will guide our shopping assistant in finding the most relevant products.

query = "New Balance 574"
parameters = {"Min price": 50, "Gender": "male", "Size": "10 or close to it"}
MAX_MARKDOWN_LENGTH = 10000

Step 6: Scrape shopping results

Let's execute our scraping function to get product data from Google Shopping. This step:

  1. Sends a search query to Google Shopping
  2. Uses Hyperbrowser to scrape the search results page
  3. Returns the raw markdown content containing product listings

This is the data collection phase of our shopping assistant.

markdown_content = await scrape_shopping_results(query)

Step 7: Extract and process product information

Now we'll process the raw markdown content to extract structured product information. This step:

  1. Truncates the content to MAX_MARKDOWN_LENGTH if necessary, keeping the extraction prompt within a manageable token budget
  2. Uses our extraction function with GPT-4o-mini to parse the content
  3. Returns a structured list of products with detailed information

The result will be a comprehensive set of product listings in a structured format ready for analysis and filtering.

if len(markdown_content) > MAX_MARKDOWN_LENGTH:
    markdown_content = markdown_content[:MAX_MARKDOWN_LENGTH]

shopping_results = await extract_product_data(markdown_content)

Step 8: Generate personalized recommendations

Finally, we'll analyze the product data and generate personalized recommendations based on the user's parameters. This step:

  1. Takes the extracted product data and user parameters
  2. Uses our analyze_shopping_results() function to filter and analyze the products
  3. Returns a filtered list of recommended products that match the user's preferences

The result will be a tailored set of recommendations that consider factors like gender, price range, and size preferences.

sorted_results = await analyze_shopping_results(shopping_results, **parameters)

if sorted_results:
    print("\nRecommended Products:")
    print("-" * 50)
    for i, product in enumerate(sorted_results.products, 1):
        print(f"\n{i}. {product.product_name}")
        print(f"   Discounted Price: ${product.current_price:.2f}")
        if product.actual_price:
            print(f"   Actual Price: ${product.actual_price:.2f}")
        print(f"   Brand: {product.product_brand}")
        print(f"   Category: {product.product_category}")
        if product.size:
            print(f"   Size: {product.size}")
        if product.shop:
            print(f"   Shop: {product.shop}")
        print("-" * 50)
else:
    print("No products found matching the parameters.")

Recommended Products:
--------------------------------------------------

1. New Balance Men's 574
   Discounted Price: $89.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

2. New Balance Numeric Men's 574 Vulc
   Discounted Price: $89.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

3. New Balance Men's Golf 574 Greens v2 Shoes
   Discounted Price: $99.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

4. New Balance Men's 1906A
   Discounted Price: $169.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

5. Men's New Balance 574
   Discounted Price: $90.00
   Brand: New Balance
   Category: men
   Shop: Foot Locker & more
--------------------------------------------------

6. New Balance 574 Men's Shoes
   Discounted Price: $65.00
   Actual Price: $90.00
   Brand: New Balance
   Category: men
   Shop: Finish Line & more
--------------------------------------------------

Conclusion

In this cookbook, we built a powerful shopping assistant using Hyperbrowser and OpenAI's GPT-4o. Our assistant can:

  1. Search for products on Google Shopping using specific queries
  2. Extract detailed product information including prices, brands, and categories
  3. Filter results based on user preferences like price range, gender, and size
  4. Provide personalized product recommendations

This approach combines web scraping, structured data extraction, and AI-powered analysis to create a versatile shopping assistant that can help users find the best products matching their specific requirements.
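
If you want to reuse these cells outside of a notebook (where top-level await isn't available), a minimal sketch along these lines can chain the steps together with asyncio.run. It assumes the clients, models, and functions defined above live in the same script; shopping_assistant is a name introduced here purely for illustration:

import asyncio

async def shopping_assistant(query: str, **preferences):
    # Scrape Google Shopping, truncate the markdown, extract structured data, then filter
    markdown_content = await scrape_shopping_results(query)
    markdown_content = markdown_content[:MAX_MARKDOWN_LENGTH]
    shopping_results = await extract_product_data(markdown_content)
    return await analyze_shopping_results(shopping_results, **preferences)

if __name__ == "__main__":
    parameters = {"Min price": 50, "Gender": "male", "Size": "10 or close to it"}
    recommendations = asyncio.run(shopping_assistant("New Balance 574", **parameters))
    if recommendations:
        for product in recommendations.products:
            print(f"{product.product_name}: ${product.current_price:.2f}")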

Next Steps

To take this further, you might consider:

  • Adding support for more shopping platforms (Amazon, Walmart, etc.); a small sketch of one approach follows this list
  • Implementing price tracking and deal alerts
  • Creating a web interface for easier interaction
  • Adding product review analysis
  • Integrating with shopping APIs for more reliable data
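
As a concrete starting point for the first idea, here is a small hypothetical sketch of how the search URL could be parameterized per platform. The non-Google URL pattern and platform names are illustrative placeholders you would need to verify, and each site would still require its own selector tuning in ScrapeOptions:

import urllib.parse

# Hypothetical URL templates; verify each pattern against the target site before use
SEARCH_URL_TEMPLATES = {
    "google": "https://www.google.com/search?q={query}&tbm=shop",
    "walmart": "https://www.walmart.com/search?q={query}",  # placeholder pattern
}

def build_search_url(platform: str, query: str) -> str:
    # Look up the template for the platform and substitute the URL-encoded query
    template = SEARCH_URL_TEMPLATES[platform]
    return template.format(query=urllib.parse.quote_plus(query))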

Happy shopping!