Building a Smart Shopping Assistant with Hyperbrowser and GPT-4o
In this cookbook, we'll build an intelligent shopping assistant that can search for products, extract pricing information, and provide personalized recommendations based on user preferences. Our assistant will:
- Search for products on Google Shopping
- Extract detailed product information including prices, brands, and categories
- Filter results based on user preferences (price range, gender, size, etc.)
- Provide tailored shopping recommendations
We'll use these tools to build our assistant:
- Hyperbrowser for web scraping and data extraction from shopping sites
- OpenAI's GPT-4o model family (the code below calls gpt-4o-mini) for intelligent product analysis and personalized recommendations
By the end of this cookbook, you'll have a versatile shopping assistant that can help you find the best products matching your specific requirements!
Prerequisites
To follow along you'll need the following:
- A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one, it's free)
- An OpenAI API key (sign up at openai.com if you don't have one; creating an account is free, but API usage is billed)
Both API keys should be stored in a .env file in the same directory as this notebook with the following format:
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
Step 1: Set up imports and load environment variables
First, we'll import all the necessary libraries and initialize our environment. This includes:
- Hyperbrowser for web scraping and data extraction
- OpenAI for AI-powered analysis
- Pydantic for data validation and modeling
- Other utility libraries for handling async operations and formatting
import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.models.session import CreateSessionParams
from hyperbrowser.models.scrape import (
    StartScrapeJobParams,
    ScrapeOptions,
)
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
)
from typing import List, Literal
from pydantic import BaseModel

load_dotenv()
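Once load_dotenv() has run, it can help to fail fast when a key is missing rather than hitting an authentication error later. A minimal sketch, assuming the two key names from the Prerequisites section (missing_keys and REQUIRED_KEYS are hypothetical names, not part of the notebook):

```python
import os
from typing import List, Mapping

# Key names taken from the .env format shown in Prerequisites
REQUIRED_KEYS = ["HYPERBROWSER_API_KEY", "OPENAI_API_KEY"]


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required key names that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# In the notebook you would pass os.environ after load_dotenv() has run
missing = missing_keys(os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing the environment in as an argument keeps the check easy to test with a plain dictionary.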
Step 2: Initialize API clients
Next, we'll create instances of the Hyperbrowser and OpenAI clients using our API keys. These clients will be responsible for web data extraction and AI-powered product analysis respectively.
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()
Step 3: Define data models and scraping functionality
Now we'll define our data models and functions to scrape shopping results. The main components are:
- PriceExtractSchema: Models a single product with details like price, name, brand, and category
- PriceExtractSchemaList: A container for multiple product listings
- scrape_shopping_results(): Scrapes Google Shopping for a given search query
- extract_product_data(): Uses GPT-4o-mini to extract structured product data from raw scraped content
The scraping function uses Hyperbrowser's advanced features like tag filtering to target only the relevant shopping elements on the page.
from typing import Optional
import urllib.parse


class PriceExtractSchema(BaseModel):
    current_price: float
    actual_price: Optional[float]
    product_name: str
    product_category: Literal["women", "men", "children"]
    product_brand: str
    shop: str
    size: Optional[str]
    source: Literal["google", "bing"]


class PriceExtractSchemaList(BaseModel):
    products: List[PriceExtractSchema]


async def scrape_shopping_results(query: str):
    # Configure extract parameters
    scrape_params = StartScrapeJobParams(
        url=f"https://www.google.com/search?q={urllib.parse.quote_plus(query)}&tbm=shop",
        session_options=CreateSessionParams(),
        scrape_options=ScrapeOptions(
            formats=["markdown"],
            # Filter out the shopping card elements only
            exclude_tags=["img"],
            include_tags=["div[jsname='Nhy0ad']"],
            only_main_content=True,
        ),
    )
    scrape_results = await hb.scrape.start_and_wait(scrape_params)
    if scrape_results.error:
        raise Exception(scrape_results.error)
    elif scrape_results.data is None or scrape_results.data.markdown is None:
        raise Exception("No data found")
    return scrape_results.data.markdown


async def extract_product_data(markdown_content: str) -> PriceExtractSchemaList:
    messages: List[ChatCompletionMessageParam] = [
        {
            "role": "system",
            "content": """You are a helpful assistant that can search for products on Google Shopping and return the results in a structured format.
You will be provided with the markdown content of the page, and have to extract structured data from it regarding the product.""",
        },
        {
            "role": "user",
            "content": f"""Here is the markdown content of the page:
{markdown_content}""",
        },
    ]
    structured_extraction = await llm.beta.chat.completions.parse(
        messages=messages,
        model="gpt-4o-mini",
        response_format=PriceExtractSchemaList,
        max_tokens=10000,
    )
    if structured_extraction.choices[0].message.parsed is None:
        raise Exception("No structured data found")
    return structured_extraction.choices[0].message.parsed
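The search URL in scrape_shopping_results() is built by URL-encoding the query and appending tbm=shop. The same construction in isolation, as a small sketch (build_shopping_url is a hypothetical helper name, not part of the notebook):

```python
import urllib.parse


def build_shopping_url(query: str) -> str:
    """Build the Google Shopping search URL used by the scraper."""
    # quote_plus encodes spaces as "+" and escapes characters such as "&"
    encoded = urllib.parse.quote_plus(query)
    return f"https://www.google.com/search?q={encoded}&tbm=shop"


url = build_shopping_url("New Balance 574")
print(url)
```

Encoding matters for queries that contain reserved characters: an unescaped "&" in the query would otherwise be read as the start of a new URL parameter.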
Step 4: Implement product filtering and recommendation
Once we have the raw product data, we need to filter it according to user preferences and provide personalized recommendations. The analyze_shopping_results() function:
- Takes the list of extracted products and user parameters (price range, gender, size, etc.)
- Uses GPT-4o-mini with a specialized system prompt to analyze the products
- Filters the results to match user preferences
- Returns a structured list of recommended products
This approach combines structured data filtering with AI-powered analysis to provide tailored recommendations.
async def analyze_shopping_results(results: PriceExtractSchemaList, **kwargs):
    # Build the parameter lines outside the f-string: a backslash inside an
    # f-string expression is a syntax error before Python 3.12
    parameter_lines = "\n".join(
        f"{key} should be {value}" for key, value in kwargs.items()
    )
    messages: List[ChatCompletionMessageParam] = [
        {
            "role": "system",
            "content": """You are a helpful shopping assistant that analyzes product listings and provides insights about pricing and options.
Please analyze them and provide insights about:
- Price ranges and best deals
- Product categories and brands represented
- Size availability where applicable
The user will also provide parameters like price range, product category, brand, size, etc. Filter the results based on the parameters and provide the best options.""",
        },
        {
            "role": "user",
            "content": f"""Here are some shopping results.
User Parameters:
{parameter_lines}
Results:
{results.model_dump_json(indent=2)}""",
        },
    ]
    response = await llm.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0.7,
        response_format=PriceExtractSchemaList,
    )
    return response.choices[0].message.parsed
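In analyze_shopping_results() the filtering itself is delegated to the model, so a constraint like a minimum price is followed only as well as the model follows the prompt. If you want hard constraints enforced deterministically, one option is to pre-filter the products before the LLM call. A minimal sketch using a simplified stand-in dataclass (Product and prefilter are hypothetical names, and the sample listings are illustrative, not scraped data):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    # Simplified, hypothetical stand-in for PriceExtractSchema
    product_name: str
    current_price: float
    product_category: str


def prefilter(
    products: List[Product],
    min_price: Optional[float] = None,
    category: Optional[str] = None,
) -> List[Product]:
    """Keep only products that satisfy every constraint that was supplied."""
    kept = []
    for product in products:
        if min_price is not None and product.current_price < min_price:
            continue
        if category is not None and product.product_category != category:
            continue
        kept.append(product)
    return kept


# Illustrative sample data
products = [
    Product("New Balance 574 Men's Shoes", 65.00, "men"),
    Product("Kids' 574", 45.00, "children"),
    Product("New Balance Women's 574", 89.99, "women"),
]
matches = prefilter(products, min_price=50, category="men")
print([p.product_name for p in matches])
```

Softer preferences such as "size 10 or close to it" are harder to express as exact rules, which is where the model-based step remains useful.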
Step 5: Define search parameters
Now we'll set up our search query and user parameters. This includes:
- The product we're searching for ("New Balance 574" in this example)
- User preferences like minimum price, gender, and size
- A maximum length (in characters) for the scraped content, so the extraction prompt stays within a bounded size
These parameters will guide our shopping assistant in finding the most relevant products.
query = "New Balance 574"
parameters = {"Min price": 50, "Gender": "male", "Size": "10 or close to it"}
MAX_MARKDOWN_LENGTH = 10000
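Each entry in the parameters dictionary reaches the model as one "key should be value" line in the user prompt. A standalone sketch of that rendering, which can be handy for checking what the model will actually see (describe_parameters is a hypothetical helper name):

```python
from typing import Any, Dict


def describe_parameters(parameters: Dict[str, Any]) -> str:
    """Render user parameters as one 'key should be value' line each."""
    return "\n".join(
        f"{key} should be {value}" for key, value in parameters.items()
    )


parameters = {"Min price": 50, "Gender": "male", "Size": "10 or close to it"}
rendered = describe_parameters(parameters)
print(rendered)
```

Because the keys are inserted verbatim, descriptive names such as "Min price" read better in the prompt than terse ones.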
Step 6: Scrape shopping results
Let's execute our scraping function to get product data from Google Shopping. This step:
- Sends a search query to Google Shopping
- Uses Hyperbrowser to scrape the search results page
- Returns the raw markdown content containing product listings
This is the data collection phase of our shopping assistant.
markdown_content = await scrape_shopping_results(query)
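The top-level await above works in a notebook because Jupyter already runs an event loop. In a plain Python script the call would need to live inside a coroutine started with asyncio.run(). A minimal sketch, with fake_scrape as a hypothetical stand-in for scrape_shopping_results() so the example runs without API keys:

```python
import asyncio


async def fake_scrape(query: str) -> str:
    # Hypothetical stand-in for scrape_shopping_results(); no network access
    await asyncio.sleep(0)
    return f"results for {query}"


async def main() -> str:
    # Inside a coroutine, await works the same way as in the notebook cell
    markdown_content = await fake_scrape("New Balance 574")
    return markdown_content


# asyncio.run() creates an event loop, runs main() to completion, and closes the loop
result = asyncio.run(main())
print(result)
```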
Step 7: Extract and process product information
Now we'll process the raw markdown content to extract structured product information. This step:
- Truncates the content to MAX_MARKDOWN_LENGTH characters if it is longer, keeping the extraction prompt to a bounded size
- Uses our extraction function with GPT-4o-mini to parse the content
- Returns a structured list of products with detailed information
The result will be a comprehensive set of product listings in a structured format ready for analysis and filtering.
if len(markdown_content) > MAX_MARKDOWN_LENGTH:
    markdown_content = markdown_content[:MAX_MARKDOWN_LENGTH]

shopping_results = await extract_product_data(markdown_content)
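Slicing at a fixed character count can cut a product listing in half, leaving the model with a partial entry at the end. One possible refinement, not part of the original notebook, is to cut at the last newline before the limit (truncate_at_line is a hypothetical helper name):

```python
def truncate_at_line(text: str, limit: int) -> str:
    """Truncate text to at most `limit` characters, preferring a line boundary."""
    if len(text) <= limit:
        return text
    # Position of the last newline that falls before the limit
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when there is no usable newline before the limit
    return text[:cut] if cut > 0 else text[:limit]


sample = "item one\nitem two\nitem three"
shortened = truncate_at_line(sample, 20)
print(repr(shortened))
```

This assumes that listings in the scraped markdown are separated by newlines; if a single listing spans several lines, a boundary based on the card structure would be more reliable.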
Step 8: Generate personalized recommendations
Finally, we'll analyze the product data and generate personalized recommendations based on the user's parameters. This step:
- Takes the extracted product data and user parameters
- Uses our analyze_shopping_results() function to filter and analyze the products
- Returns a filtered list of recommended products that match the user's preferences
The result will be a tailored set of recommendations that consider factors like gender, price range, and size preferences.
sorted_results = await analyze_shopping_results(shopping_results, **parameters)
if sorted_results:
    print("\nRecommended Products:")
    print("-" * 50)
    for i, product in enumerate(sorted_results.products, 1):
        print(f"\n{i}. {product.product_name}")
        print(f"   Discounted Price: ${product.current_price:.2f}")
        if product.actual_price:
            print(f"   Actual Price: ${product.actual_price:.2f}")
        print(f"   Brand: {product.product_brand}")
        print(f"   Category: {product.product_category}")
        if product.size:
            print(f"   Size: {product.size}")
        if product.shop:
            print(f"   Shop: {product.shop}")
        print("-" * 50)
else:
    print("No products found matching the parameters.")
Recommended Products:
--------------------------------------------------

1. New Balance Men's 574
   Discounted Price: $89.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

2. New Balance Numeric Men's 574 Vulc
   Discounted Price: $89.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

3. New Balance Men's Golf 574 Greens v2 Shoes
   Discounted Price: $99.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

4. New Balance Men's 1906A
   Discounted Price: $169.99
   Brand: New Balance
   Category: men
   Shop: New Balance & more
--------------------------------------------------

5. Men's New Balance 574
   Discounted Price: $90.00
   Brand: New Balance
   Category: men
   Shop: Foot Locker & more
--------------------------------------------------

6. New Balance 574 Men's Shoes
   Discounted Price: $65.00
   Actual Price: $90.00
   Brand: New Balance
   Category: men
   Shop: Finish Line & more
--------------------------------------------------
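Since each product carries both current_price and an optional actual_price, the size of a discount can be computed directly from the structured output. A small sketch (discount_percent is a hypothetical helper name), using the prices of the Finish Line listing from the output above:

```python
from typing import Optional


def discount_percent(
    current_price: float, actual_price: Optional[float]
) -> Optional[float]:
    """Return the discount as a percentage of the actual price, or None if unknown."""
    if actual_price is None or actual_price <= 0:
        return None
    return (actual_price - current_price) / actual_price * 100


# Prices from the Finish Line listing: $65.00 current, $90.00 actual
finish_line = discount_percent(65.00, 90.00)
print(f"{finish_line:.1f}% off")
```

Listings without an actual_price return None, so they can be skipped or sorted last when ranking deals.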
Conclusion
In this cookbook, we built a powerful shopping assistant using Hyperbrowser and OpenAI's GPT-4o. Our assistant can:
- Search for products on Google Shopping using specific queries
- Extract detailed product information including prices, brands, and categories
- Filter results based on user preferences like price range, gender, and size
- Provide personalized product recommendations
This approach combines web scraping, structured data extraction, and AI-powered analysis to create a versatile shopping assistant that can help users find the best products matching their specific requirements.
Next Steps
To take this further, you might consider:
- Adding support for more shopping platforms (Amazon, Walmart, etc.)
- Implementing price tracking and deal alerts
- Creating a web interface for easier interaction
- Adding product review analysis
- Integrating with shopping APIs for more reliable data
Happy shopping!