Building a Product Review Analyzer with Hyperbrowser and GPT-4o

In this cookbook, we'll create an intelligent review analyzer that can automatically extract and summarize product reviews from e-commerce websites. This agent will:

Visit any product review page
Extract review content using web scraping
Analyze sentiment, pros, cons, and common themes
Generate a comprehensive summary with actionable insights
Answer specific questions about customer feedback

This approach combines:

Hyperbrowser for web scraping and content extraction
OpenAI's GPT-4o for intelligent analysis and insight generation

By the end of this cookbook, you'll have a tool that helps businesses gauge customer sentiment and spot opportunities to improve their products.

Prerequisites

Before starting, you'll need:

A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one)
An OpenAI API key with access to GPT-4o

Store these API keys in a .env file in the same directory as this notebook:

HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here

Step 1: Set up imports and load environment variables

We start by importing the necessary packages and initializing our environment variables. The key libraries we'll use include:

asyncio for handling asynchronous operations
hyperbrowser for web scraping and content extraction
openai for intelligent analysis and insight generation
IPython.display for rendering markdown output in the notebook

import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScrapeTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)
from IPython.display import Markdown, display

load_dotenv()
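A missing key usually only surfaces later as an authentication error, so it can help to check for both keys right after loading the environment. The following is a minimal sketch; the helper name missing_keys is an assumption for illustration and is not part of either SDK.

```python
import os
from typing import Mapping, Optional

# The two variable names come from the .env file described in Prerequisites.
REQUIRED_KEYS = ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required keys that are unset or empty in `env`.

    Hypothetical helper: defaults to the process environment when
    no mapping is passed in.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling missing_keys() after load_dotenv() and stopping early when the returned list is non-empty gives a clearer message than a failed API call further down the notebook.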

Step 2: Initialize API clients

Here we create instances of the Hyperbrowser and OpenAI clients using our API keys. These clients will be responsible for web scraping and intelligent analysis respectively.

hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()

Step 3: Implement the tool handler

The tool handler function processes requests from the LLM to interact with our web scraping functionality. It:

Receives tool call parameters from the LLM
Validates that the requested tool is available
Configures advanced scraping options like proxy usage and CAPTCHA solving
Executes the web scraping operation
Returns the scraped content or handles any errors that occur

This function is crucial for enabling the LLM to access web content dynamically.

async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")

    try:
        if (
            tc.function.name
            != WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            raise ValueError(f"Tool not found: {tc.function.name}")

        args = json.loads(tc.function.arguments)
        print(args)

        content = await WebsiteScrapeTool.async_runnable(
            hb=hb,
            params=dict(
                **args,
                session_options={"use_proxy": True, "solve_captchas": True},
            ),
        )

        return {"role": "tool", "tool_call_id": tc.id, "content": content}

    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return {
            "role": "tool",
            "tool_call_id": tc.id,
            "content": err_msg,
            "is_error": True,  # type: ignore
        }
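To see the handler's control flow without network access, the sketch below mirrors the same steps (check the tool name, parse the JSON arguments, run the scraper, report errors as tool output) against stand-ins. FakeFunction, FakeToolCall, fake_scrape and handle_stub are all hypothetical names for illustration; only the tool name scrape_webpage and the error prefix are taken from this cookbook.

```python
import asyncio
import json
from dataclasses import dataclass

TOOL_NAME = "scrape_webpage"  # tool name as it appears in the system prompt


@dataclass
class FakeFunction:
    name: str
    arguments: str  # JSON-encoded string, as the LLM sends it


@dataclass
class FakeToolCall:
    id: str
    function: FakeFunction


async def fake_scrape(url: str) -> str:
    # Hypothetical stand-in for the real scraper; makes no network request.
    return f"# Reviews for {url}"


async def handle_stub(tc: FakeToolCall) -> dict:
    try:
        if tc.function.name != TOOL_NAME:
            raise ValueError(f"Tool not found: {tc.function.name}")
        args = json.loads(tc.function.arguments)
        content = await fake_scrape(args["url"])
    except Exception as e:
        # Failures are returned to the model as tool output, not raised.
        content = f"Error handling tool call: {e}"
    return {"role": "tool", "tool_call_id": tc.id, "content": content}
```

In a notebook cell, await handle_stub(...) directly; in a plain script, wrap the call in asyncio.run(...).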

Step 4: Create the agent loop

Now we implement the core agent loop that orchestrates the conversation between:

The user (who asks about product reviews)
The LLM (which analyzes the request and determines what information is needed)
Our tool (which fetches review content from websites)

This looping pattern allows for sophisticated interactions where the agent can gather information iteratively, making multiple web scraping requests if necessary to fully understand the reviews before generating insights.

async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await llm.chat.completions.create(
            messages=messages,
            model="gpt-4o",
            tools=[
                WebsiteScrapeTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )

        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)

        elif choice.finish_reason == "stop" and choice.message.content is not None:
            return choice.message.content

        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
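Because the loop above runs until the model stops requesting tools, a model that keeps asking for scrapes would keep it running indefinitely. One possible variation, sketched here under assumptions, caps the number of turns. The call_model and run_tool callables and the Turn tuple are hypothetical simplifications of the OpenAI response and the tool handler, so the sketch can run without any API access.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical shape of one model turn: (finish_reason, content, tool_calls)
Turn = tuple[str, Optional[str], list[str]]


async def bounded_loop(
    call_model: Callable[[list[dict]], Awaitable[Turn]],
    run_tool: Callable[[str], Awaitable[str]],
    messages: list[dict],
    max_turns: int = 5,
) -> str:
    for _ in range(max_turns):
        finish_reason, content, tool_calls = await call_model(messages)
        messages.append({"role": "assistant", "content": content})

        if finish_reason == "tool_calls" and tool_calls:
            # Run all requested tool calls concurrently, as in agent_loop.
            results = await asyncio.gather(*(run_tool(tc) for tc in tool_calls))
            messages.extend({"role": "tool", "content": r} for r in results)
        elif finish_reason == "stop" and content is not None:
            return content
        else:
            raise ValueError(f"Unhandled finish reason: {finish_reason}")

    # Reached only when the model never produced a final answer.
    raise RuntimeError(f"No final answer after {max_turns} turns")
```

The default of five turns is arbitrary; a review page that needs pagination may warrant a higher limit.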

Step 5: Design the system prompt

The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the agent as an expert review analyzer that can:

Extract review content from product pages
Analyze overall sentiment and rating distribution
Identify common pros and cons mentioned by customers
Detect any issues with the company or service
Answer specific questions about the reviews

This structured approach ensures that the analysis is comprehensive and actionable.

SYSTEM_PROMPT = """
You are an expert review analyzer. You have access to a 'scrape_webpage' tool which can be used to get markdown data from a webpage. 

This is the link to the review page {link}. You are required to analyze the markdown content from the page, and provide a summary of the reviews. You will provide the following info:
1. The overall sentiment towards the product
2. The number of reviews
3. [Optional] The number of reviews with 1 star, 2 stars, 3 stars, 4 stars, 5 stars 
4. The cons of the product
5. The pros of the product
6. Any issues with the company or service

If the user provides you with a question regarding the reviews, provide that information as well.

Provide the total info in markdown format. 
""".strip()

Step 6: Create a factory function for generating review analyzers

Now we'll create a factory function that generates a specialized review analyzer for any product page. This function:

Takes a URL to a review page as input
Ensures the URL has the proper format (adding https:// if needed)
Formats the system prompt with this URL
Returns a function that can answer questions about the reviews on that page

This approach makes our solution reusable for analyzing reviews of any product across different e-commerce platforms.

from typing import Coroutine, Any, Callable


def make_review_analyzer(
    link_to_review: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Prepend https:// when the link is given without a scheme.
    if not (
        link_to_review.startswith("http://") or link_to_review.startswith("https://")
    ):
        link_to_review = f"https://{link_to_review}"

    sysprompt = SYSTEM_PROMPT.format(
        link=link_to_review,
    )

    async def review_analyzer(question: str) -> str:
        messages: list[ChatCompletionMessageParam] = [
            {"role": "system", "content": sysprompt}
        ]

        if question:
            messages.append({"role": "user", "content": question})

        return await agent_loop(messages)

    return review_analyzer
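The URL check inside the factory can also be pulled out into a small helper so it can be tested on its own. This is a minimal sketch; normalize_url is a hypothetical name, and stripping surrounding whitespace is an addition that the factory above does not perform.

```python
def normalize_url(link: str) -> str:
    """Return `link` with an https:// scheme added when none is present."""
    link = link.strip()  # assumption: tolerate stray whitespace from user input
    if not link.startswith(("http://", "https://")):
        link = f"https://{link}"
    return link
```

Passing a tuple to str.startswith checks both prefixes in one call, which behaves the same as the two chained checks in make_review_analyzer.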

Step 7: Test the review analyzer

Let's test our agent by analyzing reviews for a MacBook Air on Best Buy. We'll ask a specific question about how to improve the product and what customers like most about it. This demonstrates the agent's ability to not just summarize reviews but also extract actionable insights.

question = "How can I improve this product, and what do people like the most about it?"
link = "https://www.bestbuy.com/site/reviews/apple-macbook-air-13-inch-apple-m2-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6602763?variant=A"

Step 8: Run the analysis and display results

Now we'll create an instance of our review analyzer for the MacBook Air page, run the analysis with our specific question, and display the results in a nicely formatted markdown output. The agent will scrape the review page, analyze the content, and provide insights about potential improvements and customer preferences.

review_analyzer = make_review_analyzer(link)
response = await review_analyzer(question)

if response is not None:
    display(Markdown(response))
else:
    print("Could not process response")

Handling tool call: scrape_webpage

{'url': 'https://www.bestbuy.com/site/reviews/apple-macbook-air-13-inch-apple-m2-chip-built-for-apple-intelligence-16gb-memory-256gb-ssd-midnight/6602763?variant=A', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['.reviews-content'], 'exclude_tags': [], 'only_main_content': True}}

I will analyze the reviews from the webpage to answer your question about what users like the most and how the product can be improved. Please allow me a moment to complete this analysis.

I am currently unable to retrieve the data directly from the webpage. However, I can guide you on how to improve the Apple MacBook Air and highlight its most liked features based on general knowledge and typical user feedback.

Common Pros of the Apple MacBook Air:

Performance: Powered by the M2 chip, it offers improved speed and efficiency for day-to-day tasks and demanding applications.
Battery Life: Users often appreciate the long battery life which allows for extended use without needing to recharge.
Design: Its sleek, lightweight design makes it highly portable and stylish.
Display: The retina display provides clear and vibrant visuals which is a major plus for users.
Quiet Operation: The fanless design contributes to silent operation, which many users find appealing.

Common Cons and Areas for Improvement:

Price: The cost may be a barrier for some potential buyers, who feel it doesn't justify the benefits compared to alternatives.
Ports: A limited number of ports has been an issue, leading to reliance on adapters.
Repairability: Repair options can be costly and limited, as is typical with Apple products.
Customization: Users often mention the need for more customization options in terms of hardware upgrades.

For the most accurate and specific insights, I recommend exploring review websites or forums for updated user opinions.
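In the sample run above, the model reports that it could not retrieve the page data and answers from general knowledge instead. One way to catch this earlier is to inspect each tool result before it goes back to the model. The sketch below is an assumption-laden heuristic: scrape_failed and the 200-character threshold are hypothetical, and only the error prefix comes from the handle_tool_call function in Step 3.

```python
ERROR_PREFIX = "Error handling tool call:"  # prefix used by handle_tool_call


def scrape_failed(tool_message: dict, min_chars: int = 200) -> bool:
    """Heuristically flag a tool result that is an error or nearly empty.

    `min_chars` is an arbitrary threshold chosen for illustration; a real
    review page would normally yield far more markdown than this.
    """
    content = (tool_message.get("content") or "").strip()
    return content.startswith(ERROR_PREFIX) or len(content) < min_chars
```

A caller could log a warning or retry with different scrape options when this returns True, rather than letting the model fall back silently.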

Future Explorations

There are many exciting ways to extend and enhance this review analyzer. Here are some possibilities for developers and users to explore:

Advanced Analysis Features

Demographic Segmentation: Identify if different user groups have different experiences.
Comparative Analysis: Compare reviews across multiple products.
Interactive Dashboards: Build visualization dashboards for review insights.

Technical Enhancements

Multi-platform Integration: Analyze reviews from multiple sources.
Real-time Monitoring: Continuously monitor new reviews and alert on significant deviations.
Automatic Customer Support: A review analysis agent could help customers resolve common issues, improving sentiment toward the product.

Adding even a few of these features would move the review analyzer from a useful tool toward a comprehensive customer-intelligence agent.
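As a starting point for the comparative analysis idea, several analyzers can be queried concurrently with the same question. This is a rough sketch under assumptions: compare_products is a hypothetical helper, and each analyzer is taken to be an async callable with the same shape as the function returned by make_review_analyzer.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed shape: an async function taking a question and returning markdown.
Analyzer = Callable[[str], Awaitable[str]]


async def compare_products(
    analyzers: dict[str, Analyzer], question: str
) -> dict[str, str]:
    """Ask every analyzer the same question and collect answers by name."""
    names = list(analyzers)
    # gather preserves input order, so answers line up with names.
    answers = await asyncio.gather(*(analyzers[n](question) for n in names))
    return dict(zip(names, answers))
```

The resulting dictionary could then be passed to a further LLM call that contrasts the per-product summaries.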

Conclusion

In this cookbook, we built a powerful review analyzer using Hyperbrowser and GPT-4o. This agent can:

Automatically extract review content from any product page
Analyze sentiment and identify common themes
Summarize pros, cons, and customer experiences
Answer specific questions about customer feedback
Provide actionable insights for product improvement

This pattern can be extended to create more sophisticated review analysis tools, such as:

Competitive analysis by comparing reviews across similar products
Trend analysis by tracking sentiment changes over time
Feature prioritization based on customer feedback
Automated customer support response generation

Happy analyzing! 📊