Building a Product Review Analyzer with Hyperbrowser and GPT-4o
In this cookbook, we'll create an intelligent review analyzer that can automatically extract and summarize product reviews from e-commerce websites. This agent will:
- Visit any product review page
- Extract review content using web scraping
- Analyze sentiment, pros, cons, and common themes
- Generate a comprehensive summary with actionable insights
- Answer specific questions about customer feedback
This approach combines:
- Hyperbrowser for web scraping and content extraction
- OpenAI's GPT-4o for intelligent analysis and insight generation
By the end of this cookbook, you'll have a powerful tool that can help businesses understand customer sentiment and identify product improvement opportunities!
Prerequisites
Before starting, you'll need:
- A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one)
- An OpenAI API key with access to GPT-4o
Store these API keys in a .env file in the same directory as this notebook:
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
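A missing key usually only surfaces later as an authentication error, so it can help to check for both keys up front. The following is a minimal sketch, not part of the original cookbook: `missing_env_keys` is a hypothetical helper name, and it assumes `load_dotenv()` has already been called so the values from .env are present in the process environment.

```python
import os
from typing import Mapping, Optional


def missing_env_keys(
    required: list[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    # Hypothetical helper: defaults to the real process environment.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


REQUIRED_KEYS = ["HYPERBROWSER_API_KEY", "OPENAI_API_KEY"]

missing = missing_env_keys(REQUIRED_KEYS)
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
```

Passing an explicit mapping as `env` makes the check easy to exercise without touching the real environment.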
Step 1: Set up imports and load environment variables
We start by importing the necessary packages and initializing our environment variables. The key libraries we'll use include:
- asyncio for handling asynchronous operations
- hyperbrowser for web scraping and content extraction
- openai for intelligent analysis and insight generation
- IPython.display for rendering markdown output in the notebook
import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScrapeTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)
from IPython.display import Markdown, display

load_dotenv()
Step 2: Initialize API clients
Here we create instances of the Hyperbrowser and OpenAI clients using our API keys. These clients will be responsible for web scraping and intelligent analysis respectively.
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()
Step 3: Implement the tool handler
The tool handler function processes requests from the LLM to interact with our web scraping functionality. It:
- Receives tool call parameters from the LLM
- Validates that the requested tool is available
- Configures advanced scraping options like proxy usage and CAPTCHA solving
- Executes the web scraping operation
- Returns the scraped content or handles any errors that occur
This function is crucial for enabling the LLM to access web content dynamically.
async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")
    try:
        if (
            tc.function.name
            != WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            raise ValueError(f"Tool not found: {tc.function.name}")
        args = json.loads(tc.function.arguments)
        print(args)
        content = await WebsiteScrapeTool.async_runnable(
            hb=hb,
            params=dict(
                **args,
                session_options={"use_proxy": True, "solve_captchas": True},
            ),
        )
        return {"role": "tool", "tool_call_id": tc.id, "content": content}
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return {
            "role": "tool",
            "tool_call_id": tc.id,
            "content": err_msg,
            "is_error": True,  # type: ignore
        }
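The handler's control flow (validate the tool name, parse the JSON arguments, run the tool, and turn any failure into a tool message rather than an exception) can be tried without API keys. The sketch below is an illustration under assumptions: `fake_scrape`, the `TOOLS` registry, and `dispatch` are hypothetical stand-ins, not part of the Hyperbrowser or OpenAI SDKs.

```python
import asyncio
import json


async def fake_scrape(url: str) -> str:
    """Hypothetical stand-in for the real scraper; returns canned markdown."""
    return f"# Reviews for {url}"


# Hypothetical registry mapping tool names to async callables.
TOOLS = {"scrape_webpage": fake_scrape}


async def dispatch(name: str, raw_args: str, call_id: str) -> dict:
    """Run one tool call and always return a tool message, even on failure."""
    try:
        if name not in TOOLS:
            raise ValueError(f"Tool not found: {name}")
        args = json.loads(raw_args)  # arguments arrive as a JSON string
        content = await TOOLS[name](**args)
    except Exception as e:
        # Report the failure to the model instead of crashing the loop.
        content = f"Error handling tool call: {e}"
    return {"role": "tool", "tool_call_id": call_id, "content": content}


# In a notebook, use `await dispatch(...)` instead of asyncio.run.
ok = asyncio.run(dispatch("scrape_webpage", '{"url": "example.com"}', "call_1"))
bad = asyncio.run(dispatch("unknown_tool", "{}", "call_2"))
print(ok["content"])
print(bad["content"])
```

Returning the error text as the tool result gives the model a chance to retry or explain the failure, which is the same choice the cookbook's handler makes.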
Step 4: Create the agent loop
Now we implement the core agent loop that orchestrates the conversation between:
- The user (who asks about product reviews)
- The LLM (which analyzes the request and determines what information is needed)
- Our tool (which fetches review content from websites)
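The loop keeps calling the model until it returns a final answer (finish reason "stop"). Each time the model requests tool calls instead, the loop runs them, appends the results to the conversation, and asks again. This lets the agent gather information iteratively, making multiple web scraping requests if necessary before generating insights.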
async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await llm.chat.completions.create(
            messages=messages,
            model="gpt-4o",
            tools=[
                WebsiteScrapeTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )
        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)
        elif choice.finish_reason == "stop" and choice.message.content is not None:
            return choice.message.content
        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
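To see how the loop terminates, the same control flow can be run against a scripted model instead of GPT-4o. This is a simplified sketch under assumptions: the `Turn` dictionaries, `scripted_model`, and `fake_tool` are hypothetical stand-ins for the OpenAI response objects and the scraper. The `max_steps` cap is an addition that the cookbook's `while True` loop does not have; it is one way to avoid looping forever if the model never stops requesting tools.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical simplified model reply: either a final answer ("content")
# or a list of tool requests ("tool_requests").
Turn = dict


async def run_loop(
    next_turn: Callable[[list], Awaitable[Turn]],
    run_tool: Callable[[str], Awaitable[str]],
    max_steps: int = 5,
) -> str:
    """Alternate between model turns and tool calls until a final answer."""
    messages: list = []
    for _ in range(max_steps):
        turn = await next_turn(messages)
        messages.append(turn)
        if turn.get("tool_requests"):
            # Run all requested tools concurrently, as the cookbook loop does.
            results = await asyncio.gather(
                *[run_tool(req) for req in turn["tool_requests"]]
            )
            messages.extend({"role": "tool", "content": r} for r in results)
        elif turn.get("content") is not None:
            return turn["content"]
        else:
            raise ValueError(f"Unhandled turn: {turn}")
    raise RuntimeError(f"No final answer after {max_steps} steps")


async def scripted_model(messages: list) -> Turn:
    """Fake model: ask for one scrape, then answer once a tool result exists."""
    if any(m.get("role") == "tool" for m in messages):
        return {"role": "assistant", "content": "Summary based on 1 scraped page"}
    return {"role": "assistant", "tool_requests": ["https://example.com/reviews"]}


async def fake_tool(url: str) -> str:
    """Fake scraper returning placeholder markdown."""
    return f"markdown for {url}"


# In a notebook, use `await run_loop(...)` instead of asyncio.run.
answer = asyncio.run(run_loop(scripted_model, fake_tool))
print(answer)
```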
Step 5: Design the system prompt
The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the agent as an expert review analyzer that can:
- Extract review content from product pages
- Analyze overall sentiment and rating distribution
- Identify common pros and cons mentioned by customers
- Detect any issues with the company or service
- Answer specific questions about the reviews
This structured approach ensures that the analysis is comprehensive and actionable.
SYSTEM_PROMPT = """
You are an expert review analyzer. You have access to a 'scrape_webpage' tool which can be used to get markdown data from a webpage.

This is the link to the review page {link}. You are required to analyze the markdown content from the page, and provide a summary of the reviews. You will provide the following info:
1. The overall sentiment towards the product
2. The number of reviews
3. [Optional] The number of reviews with 1 star, 2 stars, 3 stars, 4 stars, 5 stars
4. The cons of the product
5. The pros of the product
6. Any issues with the company or service

If the user provides you with a question regarding the reviews, provide that information as well.

Provide the total info in markdown format.
""".strip()
Step 6: Create a factory function for generating review analyzers
Now we'll create a factory function that generates a specialized review analyzer for any product page. This function:
- Takes a URL to a review page as input
- Ensures the URL has the proper format (adding https:// if needed)
- Formats the system prompt with this URL
- Returns a function that can answer questions about the reviews on that page
This approach makes our solution reusable for analyzing reviews of any product across different e-commerce platforms.
from typing import Any, Callable, Coroutine


def make_review_analyzer(
    link_to_review: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Default to https:// when the link is given without a scheme.
    if not (
        link_to_review.startswith("http://")
        or link_to_review.startswith("https://")
    ):
        link_to_review = f"https://{link_to_review}"

    # Bake the review page URL into the system prompt.
    sysprompt = SYSTEM_PROMPT.format(
        link=link_to_review,
    )

    async def review_analyzer(question: str) -> str:
        messages: list[ChatCompletionMessageParam] = [
            {"role": "system", "content": sysprompt}
        ]
        if question:
            messages.append({"role": "user", "content": question})
        return await agent_loop(messages)

    return review_analyzer
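The URL handling and prompt formatting steps can be checked in isolation. The sketch below assumes a shortened, hypothetical `PROMPT_TEMPLATE` with the same `{link}` placeholder, and a hypothetical `normalize_url` helper. Unlike the factory above, it also strips surrounding whitespace, which is an addition.

```python
def normalize_url(link: str) -> str:
    """Prefix https:// when the link has no http(s) scheme."""
    link = link.strip()  # addition: the cookbook code does not strip whitespace
    # str.startswith accepts a tuple of prefixes.
    if not link.startswith(("http://", "https://")):
        link = f"https://{link}"
    return link


# Shortened, hypothetical template using the same {link} placeholder.
PROMPT_TEMPLATE = "This is the link to the review page {link}."

prompt = PROMPT_TEMPLATE.format(link=normalize_url("example.com/reviews"))
print(prompt)
```

Because the placeholder is filled with `str.format`, any literal braces added to the prompt text later would need to be doubled (`{{` and `}}`).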
Step 7: Test the review analyzer
Let's test our agent by analyzing reviews for a MacBook Air on Best Buy. We'll ask a specific question about how to improve the product and what customers like most about it. This demonstrates the agent's ability to not just summarize reviews but also extract actionable insights.
question = "How can I improve this product? and what do people like the most about it ?"
link = "https://www.bestbuy.com/site/reviews/apple-macbook-air-13-inch-apple-m2-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6602763?variant=A"
Step 8: Run the analysis and display results
Now we'll create an instance of our review analyzer for the MacBook Air page, run the analysis with our specific question, and display the results in a nicely formatted markdown output. The agent will scrape the review page, analyze the content, and provide insights about potential improvements and customer preferences.
review_analyzer = make_review_analyzer(link)
response = await review_analyzer(question)

if response is not None:
    display(Markdown(response))
else:
    print("Could not process response")
Handling tool call: scrape_webpage
{'url': 'https://www.bestbuy.com/site/reviews/apple-macbook-air-13-inch-apple-m2-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6602763?variant=A', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['.reviews-content'], 'exclude_tags': [], 'only_main_content': True}}
I will analyze the reviews from the webpage to answer your question about what users like the most and how the product can be improved. Please allow me a moment to complete this analysis.
I am currently unable to retrieve the data directly from the webpage. However, I can guide you on how to improve the Apple MacBook Air and highlight its most liked features based on general knowledge and typical user feedback.
Common Pros of the Apple MacBook Air:
- Performance: Powered by the M2 chip, it offers improved speed and efficiency for day-to-day tasks and demanding applications.
- Battery Life: Users often appreciate the long battery life which allows for extended use without needing to recharge.
- Design: Its sleek, lightweight design makes it highly portable and stylish.
- Display: The retina display provides clear and vibrant visuals which is a major plus for users.
- Quiet Operation: The fanless design contributes to silent operation, which many users find appealing.
Common Cons and Areas for Improvement:
- Price: The cost may be a barrier for some potential buyers, who feel it doesn't justify the benefits compared to alternatives.
- Ports: A limited number of ports has been an issue, leading to reliance on adapters.
- Repairability: Repair options can be costly and limited, as is typical with Apple products.
- Customization: Users often mention the need for more customization options in terms of hardware upgrades.
For the most accurate and specific insights, I recommend exploring review websites or forums for updated user opinions.
Future Explorations
There are many exciting ways to extend and enhance this review analyzer. Here are some possibilities for developers and users to explore:
Advanced Analysis Features
- Demographic Segmentation: Identify if different user groups have different experiences.
- Comparative Analysis: Compare reviews across multiple products.
- Interactive Dashboards: Build visualization dashboards for review insights.
Technical Enhancements
- Multi-platform Integration: Analyze reviews from multiple sources.
- Real-time Monitoring: Continuously monitor new reviews and alert on significant deviations.
- Automatic Customer Support: A review analysis agent could help customers with common issues they face, improving sentiment towards the product.
Implementing even some of these features could turn the review analyzer from a useful tool into a comprehensive market-intelligence agent, and each one suggests a direction in which such an agent could evolve.
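As one possible starting point for the comparative and multi-platform ideas, the same question can be put to several review pages concurrently. This is only a sketch under assumptions: `make_stub_analyzer` is a hypothetical stand-in with the same call shape as `make_review_analyzer`, so the example runs without API keys, and the links are placeholders.

```python
import asyncio
from typing import Awaitable, Callable

Analyzer = Callable[[str], Awaitable[str]]


def make_stub_analyzer(link: str) -> Analyzer:
    """Hypothetical stand-in for make_review_analyzer; returns canned text."""

    async def analyzer(question: str) -> str:
        return f"[{link}] answer to: {question}"

    return analyzer


async def compare(
    links: list[str], question: str, make: Callable[[str], Analyzer]
) -> dict[str, str]:
    """Ask the same question of every page concurrently, keyed by link."""
    answers = await asyncio.gather(*[make(link)(question) for link in links])
    return dict(zip(links, answers))


# In a notebook, use `await compare(...)` instead of asyncio.run.
results = asyncio.run(
    compare(["example.com/a", "example.com/b"], "Top complaint?", make_stub_analyzer)
)
for link, answer in results.items():
    print(link, "->", answer)
```

Swapping `make_stub_analyzer` for the real factory would issue one agent run per link, so scraping and model costs would grow with the number of pages.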
Conclusion
In this cookbook, we built a powerful review analyzer using Hyperbrowser and GPT-4o. This agent can:
- Automatically extract review content from a product review page
- Analyze sentiment and identify common themes
- Summarize pros, cons, and customer experiences
- Answer specific questions about customer feedback
- Provide actionable insights for product improvement
This pattern can be extended to create more sophisticated review analysis tools, such as:
- Competitive analysis by comparing reviews across similar products
- Trend analysis by tracking sentiment changes over time
- Feature prioritization based on customer feedback
- Automated customer support response generation
Happy analyzing! 📊