Building an AI Movie Review Researcher with Hyperbrowser and OpenAI

In this cookbook, we'll build a powerful Movie Review Researcher that can analyze critical and audience reception for any film. This agent will:

Search the web for professional reviews and audience opinions
Extract and analyze content from the most relevant review sites
Synthesize findings into a comprehensive report that highlights praise, criticism, and overall reception
Track sentiment changes over time when reviews span multiple years

By combining Hyperbrowser's web scraping capabilities with OpenAI's language models, we'll create a tool that can extract nuanced insights from critical discourse - saving hours of manual research for film enthusiasts, critics, and industry professionals.

Prerequisites

Before starting, you'll need:

A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one)
An OpenAI API key for accessing GPT-4o-mini
Python 3.9+ with asyncio support

Store your API keys in a .env file in the same directory as this notebook:

HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
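
If one of these keys is missing, the failure tends to surface later as an authentication error from one of the clients. A minimal sketch of an early check, assuming only the two variable names above; `missing_keys` and `REQUIRED_KEYS` are hypothetical names used for illustration, not part of either SDK:

```python
import os
from typing import List, Mapping, Sequence

# The two variable names used in the .env file above.
REQUIRED_KEYS = ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY")


def missing_keys(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# Call this after load_dotenv() so values from .env are visible in os.environ.
missing = missing_keys(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dict.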

Step 1: Set up imports and load environment variables

import os
import asyncio
import json

from urllib.parse import urlencode
from typing import Optional

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.models.extract import StartExtractJobParams
from hyperbrowser.models.session import CreateSessionParams
from hyperbrowser.tools import WebsiteScrapeTool
from openai.types.chat import (
    ChatCompletionToolParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
    ChatCompletionContentPartTextParam,
    ChatCompletionMessageParam,
)
from openai import AsyncOpenAI
from pydantic import BaseModel

from typing_extensions import TypeVar
from IPython.display import display, Markdown

load_dotenv()

Step 2: Initialize clients

Now we'll create instances of the APIs we'll be using: OpenAI for the language model and Hyperbrowser for web searches and content extraction. The AsyncHyperbrowser client allows us to perform multiple operations concurrently.

oai = AsyncOpenAI()
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))

Step 3: Define the search functionality

Our first core component is the search function that finds review content across the web. This function:

Constructs a search query for the movie reviews
Uses Bing's search engine for comprehensive results
Extracts structured data from the search results page
Implements pagination support for thorough research

We'll use Pydantic models to enforce type safety and ensure our data is properly structured.

class SearchResult(BaseModel):
    """A search result from Bing"""

    title: str
    url: str
    content: str

    def __str__(self):
        return f"Title: {self.title}\nURL: {self.url}\nContent: {self.content}"


class SearchResultList(BaseModel):
    """A list of search results from Bing"""

    results: list[SearchResult]
    total_results_extracted: Optional[int]

    def __str__(self):
        return f"\n\n{'-' * 10}\n\n".join(str(result) for result in self.results)


async def search_tool(movie_name: str, page: int = 1) -> Optional[SearchResultList]:
    if page > 3:
        raise ValueError("Cannot extract results for page number greater than 3.")

    params = urlencode(
        {
            "q": f"{movie_name} reviews",
            "first": max(page - 1, 0) * 10,
            "qs": "HS",
            "FORM": "QBLH",
            "sp": 1,
            "ghc": 1,
            "lq": 0,
        }
    )
    url = f"https://www.bing.com/search?{params}"

    print(movie_name, page, url)

    result = await hb.extract.start_and_wait(
        StartExtractJobParams(
            urls=[url],
            prompt="Extract the title, url, and content of the top 10 search results on this page.",
            schema=SearchResultList,
            session_options=CreateSessionParams(
                use_proxy=True,
                adblock=True,
                trackers=True,
                annoyances=True,
                solve_captchas=True,
            ),
        )
    )

    if not (result.status == "completed" and result.data):
        print(result)
        raise Exception("Failed to extract search results")

    return SearchResultList.model_validate({**result.data})
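
The pagination logic can be checked without calling Hyperbrowser at all. The sketch below isolates the URL construction from search_tool using the same query parameters; `build_search_url` and `RESULTS_PER_PAGE` are hypothetical names introduced here for illustration:

```python
from urllib.parse import urlencode

# Assumption: each results page is treated as 10 results wide,
# matching the multiplier used in search_tool.
RESULTS_PER_PAGE = 10


def build_search_url(movie_name: str, page: int = 1) -> str:
    """Build the Bing search URL for a 1-based page of review results."""
    if page > 3:
        raise ValueError("Cannot extract results for page number greater than 3.")
    params = urlencode(
        {
            "q": f"{movie_name} reviews",
            # Page 1 maps to offset 0, page 2 to offset 10, and so on.
            "first": max(page - 1, 0) * RESULTS_PER_PAGE,
            "qs": "HS",
            "FORM": "QBLH",
            "sp": 1,
            "ghc": 1,
            "lq": 0,
        }
    )
    return f"https://www.bing.com/search?{params}"


for page_number in (1, 2, 3):
    print(page_number, build_search_url("Inception", page_number))
```

Keeping the URL builder separate from the extraction call makes the offset arithmetic easy to test on its own.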

Step 4: Implement content scraping with rate limiting

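Now we'll lay the groundwork for content scraping with built-in rate limiting. The scraping itself is done by Hyperbrowser's WebsiteScrapeTool, which we wire up in Step 6; the code in this step defines the counter that tracks how much scraping has happened. This approach: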

Tracks how many pages have been scraped to prevent overwhelming servers
Implements advanced scraping options for clean content extraction
Formats the extracted content for easy processing by the language model

Note that LLMs are often unreliable at keeping count, so a prompt instruction alone is a weak limit. An explicit counter updated inside the tool call itself is a more dependable way to cap resource usage.

This controlled approach ensures we can gather substantial data while respecting website resources.

total_pages_scraped = 0

# Create a lock for safely updating the counter in async functions
pages_scraped_lock = asyncio.Lock()


async def increment_pages_scraped(amount: int = 1):
    """Safely increment the total_pages_scraped counter in async context"""
    global total_pages_scraped
    async with pages_scraped_lock:
        total_pages_scraped += amount
        return total_pages_scraped


async def get_pages_scraped():
    global total_pages_scraped
    async with pages_scraped_lock:
        return total_pages_scraped
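
The counter above only records usage; something still has to refuse work once the limit is hit. One way this could look is sketched below, assuming a cap of 10 pages to match the system prompt used later. `ScrapeBudget` and `guarded_scrape` are hypothetical helpers, and the scrape itself is simulated rather than calling Hyperbrowser:

```python
import asyncio
from typing import List


class ScrapeBudget:
    """Hypothetical helper: a hard cap on scrapes, enforced in code."""

    def __init__(self, max_pages: int = 10) -> None:
        self.max_pages = max_pages
        self.used = 0
        self._lock = asyncio.Lock()

    async def try_acquire(self) -> bool:
        """Reserve one scrape; return False once the budget is spent."""
        async with self._lock:
            if self.used >= self.max_pages:
                return False
            self.used += 1
            return True


async def guarded_scrape(budget: ScrapeBudget, url: str) -> str:
    """Scrape only while budget remains; otherwise report the refusal."""
    if not await budget.try_acquire():
        return f"Scrape limit of {budget.max_pages} pages reached; skipped {url}"
    await asyncio.sleep(0)  # placeholder for the real scrape call
    return f"scraped {url}"


async def demo() -> List[str]:
    # The budget is created inside the running event loop.
    budget = ScrapeBudget(max_pages=10)
    urls = [f"https://example.com/review-{i}" for i in range(12)]
    return await asyncio.gather(*(guarded_scrape(budget, u) for u in urls))


if __name__ == "__main__":
    # In a notebook, use `await demo()` instead of asyncio.run().
    for line in asyncio.run(demo()):
        print(line)
```

Returning the refusal as ordinary text, rather than raising, lets the message flow back to the model as a tool result so it can stop requesting scrapes.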

Step 5: Define tool interfaces for the language model

For our agent to work, we need to describe its tools in a format that OpenAI's function calling system understands. The agent has two tools:

search_reviews: Finds movie reviews using Bing search (defined below as MOVIE_SEARCH_TOOL)
scrape_webpage: Extracts detailed content from review websites (its definition ships with Hyperbrowser as WebsiteScrapeTool.openai_tool_definition)

These tool definitions include parameter schemas that ensure proper validation of inputs when the AI decides to use them.

MOVIE_SEARCH_TOOL: ChatCompletionToolParam = {
    "type": "function",
    "function": {
        "name": "search_reviews",
        "description": "Search for information about a movie's reviews using Bing.",
        "parameters": {
            "type": "object",
            "properties": {
                "movie_name": {
                    "type": "string",
                    "description": "The name of the movie to search for",
                },
                "page": {
                    "type": "integer",
                    "description": "The page number of search results to retrieve",
                },
            },
            "required": ["movie_name", "page"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}
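
The schema above constrains the shape of the arguments, but as written it does not encode the three-page limit that search_tool enforces. A defensive check before dispatch might look like the following sketch; `parse_search_args` and `MAX_SEARCH_PAGE` are hypothetical names, not part of the OpenAI or Hyperbrowser SDKs:

```python
import json

# Mirrors the limit enforced inside search_tool.
MAX_SEARCH_PAGE = 3


def parse_search_args(raw_arguments: str) -> dict:
    """Parse and sanity-check the JSON arguments for search_reviews."""
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("Tool arguments must be a JSON object")

    unexpected = set(args) - {"movie_name", "page"}
    if unexpected:
        raise ValueError(f"Unexpected arguments: {sorted(unexpected)}")

    movie_name = args.get("movie_name")
    page = args.get("page")
    if not isinstance(movie_name, str) or not movie_name.strip():
        raise ValueError("movie_name must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(page, bool) or not isinstance(page, int):
        raise ValueError("page must be an integer")
    if not 1 <= page <= MAX_SEARCH_PAGE:
        raise ValueError(f"page must be between 1 and {MAX_SEARCH_PAGE}")

    return {"movie_name": movie_name.strip(), "page": page}


print(parse_search_args('{"movie_name": "Inception", "page": 2}'))
```

Raising ValueError with a specific message means the text can be passed back to the model as the tool result, giving it a chance to correct the call.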

Step 6: Create a tool handler for function calling

The tool handler processes function calls from the language model and returns the results. This component:

Routes each tool call to the appropriate function
Handles error conditions gracefully
Formats the output according to OpenAI's tool message specifications
Manages different content types for search results vs. scraped content

This pattern bridges the gap between the language model's reasoning capabilities and our concrete web interaction functions.

async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")

    try:
        if (
            tc.function.name
            == WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            args = json.loads(tc.function.arguments)
            print(args)
            content_raw = await WebsiteScrapeTool.async_runnable(hb, args)
            content = f"<website>\n<url>{args['url']}</url>\n<content>\n{content_raw}\n</content>\n</website>"

            return ChatCompletionToolMessageParam(
                {"role": "tool", "tool_call_id": tc.id, "content": content}
            )
        elif tc.function.name == MOVIE_SEARCH_TOOL["function"]["name"]:
            args = json.loads(tc.function.arguments)
            print(args)
            content = await search_tool(**args)
            if content is None or content.results is None:
                raise ValueError("Response from search is none")
            else:
                return ChatCompletionToolMessageParam(
                    {
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": [
                            ChatCompletionContentPartTextParam(
                                text=str(search_result), type="text"
                            )
                            for search_result in content.results
                        ],
                    }
                )
        else:
            raise Exception(f"No tool call handler for {tc.function.name}")
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return ChatCompletionToolMessageParam(
            {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": err_msg,
                "is_error": True,  # type: ignore
            }
        )
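
As more tools are added, the if/elif chain above can be swapped for a lookup table. The sketch below shows that alternative with stand-in handlers and a stand-in tool call object, so it runs without either SDK; `FakeToolCall`, `fake_search`, `fake_scrape`, `HANDLERS`, and `dispatch` are all hypothetical names:

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class FakeToolCall:
    """Stand-in carrying the id, name, and raw JSON arguments of a tool call."""

    id: str
    name: str
    arguments: str


async def fake_search(args: dict) -> str:
    return f"results for {args['movie_name']} page {args['page']}"


async def fake_scrape(args: dict) -> str:
    return f"<website>\n<url>{args['url']}</url>\n</website>"


# Tool name -> coroutine that turns parsed arguments into message content.
HANDLERS: Dict[str, Callable[[dict], Awaitable[str]]] = {
    "search_reviews": fake_search,
    "scrape_webpage": fake_scrape,
}


async def dispatch(tc: FakeToolCall) -> dict:
    """Route one tool call; always return a tool message, even on failure."""
    try:
        handler = HANDLERS.get(tc.name)
        if handler is None:
            raise ValueError(f"No tool call handler for {tc.name}")
        content = await handler(json.loads(tc.arguments))
    except Exception as e:
        content = f"Error handling tool call: {e}"
    return {"role": "tool", "tool_call_id": tc.id, "content": content}


if __name__ == "__main__":
    call = FakeToolCall("call_1", "search_reviews", '{"movie_name": "Inception", "page": 1}')
    print(asyncio.run(dispatch(call)))
```

Either style works; the table version keeps error handling in one place and makes the set of supported tools visible at a glance.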

Step 7: Implement the agent loop

The agent loop orchestrates the interaction between the language model and our tools. This function:

Sends the current conversation and tools to the OpenAI API
Processes tool calls when the model decides to use them
Handles structured output using Pydantic models
Continues the loop until the model provides a final answer

This iterative approach allows for complex multi-step reasoning, where the model can search for general reviews, identify trusted sources, and then dive deeper into specific analyses.

ResponseFormatT = TypeVar(
    "ResponseFormatT",
    # if it isn't given then we don't do any parsing
    default=None,
)


async def agent_loop(
    messages: list[ChatCompletionMessageParam], response_format: type[ResponseFormatT]
):
    while True:
        response = await oai.beta.chat.completions.parse(
            messages=messages,
            model="gpt-4o-mini",
            tools=[MOVIE_SEARCH_TOOL, WebsiteScrapeTool.openai_tool_definition],
            tool_choice="auto",
            response_format=response_format,
        )

        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)

        elif choice.finish_reason == "stop" and choice.message.parsed is not None:
            return choice.message.parsed

        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
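
The loop above runs until the model stops asking for tools. To see the control flow in isolation, or to guard against a model that never stops, a bounded variant can be exercised against a scripted stand-in. Everything in this sketch (`scripted_model`, `fake_tool`, `bounded_agent_loop`, and the `MAX_TURNS` value) is hypothetical and makes no API calls:

```python
import asyncio
from typing import Awaitable, Callable, List

# Assumption: an arbitrary ceiling on model turns; tune it to the task.
MAX_TURNS = 8

ModelFn = Callable[[List[dict]], Awaitable[dict]]


async def scripted_model(messages: List[dict]) -> dict:
    """Stand-in for the chat completion call: tools first, then an answer."""
    tool_outputs = [m for m in messages if m["role"] == "tool"]
    if not tool_outputs:
        return {
            "finish_reason": "tool_calls",
            "tool_calls": ["search_reviews", "scrape_webpage"],
        }
    return {
        "finish_reason": "stop",
        "content": f"final answer built from {len(tool_outputs)} tool results",
    }


async def fake_tool(name: str) -> dict:
    await asyncio.sleep(0)  # placeholder for real I/O
    return {"role": "tool", "content": f"output of {name}"}


async def bounded_agent_loop(
    messages: List[dict], model: ModelFn = scripted_model, max_turns: int = MAX_TURNS
) -> str:
    """Same shape as the agent loop above, but it gives up after max_turns."""
    for _ in range(max_turns):
        reply = await model(messages)
        messages.append({"role": "assistant", **reply})
        if reply["finish_reason"] == "tool_calls":
            # Run all requested tools concurrently, then feed results back.
            results = await asyncio.gather(*(fake_tool(n) for n in reply["tool_calls"]))
            messages.extend(results)
        elif reply["finish_reason"] == "stop":
            return reply["content"]
        else:
            raise ValueError(f"Unhandled finish reason: {reply['finish_reason']}")
    raise RuntimeError(f"No final answer after {max_turns} turns")


if __name__ == "__main__":
    print(asyncio.run(bounded_agent_loop([{"role": "user", "content": "Inception"}])))
```

A turn limit is a cheap safeguard alongside the scrape limit, since each extra turn costs both tokens and tool calls.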

Step 8: Define data model and system prompt

Now we'll create the structures for our final output and craft the system prompt that guides the AI's analysis. The system prompt is crucial - it instructs the model on:

The depth and breadth of analysis expected
How to identify and analyze criticisms and praise
When and how to use each available tool
The format for presenting findings

The ResearchAnalysis model enforces a clear separation between the model's reasoning process and the final polished report.

class ResearchAnalysis(BaseModel):
    chain_of_thought: str
    report: str

    def __str__(self):
        return f"Chain of Thought:\n{self.chain_of_thought}\n\nReport:\n{self.report}"


FINAL_REPORT_SYSTEM_PROMPT = """
You are an expert movie assistant. You are working for a movie production company and your job is to compile information about a certain movie. For this, you will be provided with the movie name. In addition, you will be provided with a certain set of tools to accomplish your job. These tools are

 - A web search tool
   - The tool takes in the movie name and the search page number.  
   - The tool searches bing and returns to you the url, title, and some basic content about the individual search result itself
 - A scrape tool
   - The tool takes in the url of a single page
   - The tool returns the url, and the markdown content from the website. 

Do not request more than 3 pages of search results. Also deduplicate the search results once you get them. Make sure that when you're scraping results, you don't go over 10 pages scraped.

Your job is to compile any criticisms, and any praise that the reviews had for the movie. Make sure that the reviews you are getting are for that particular movie and not for a sequel. Also compile the overall opinion people had about the movie. Make clear note of the date of the review. Use this to also inform me if the sentiment about the movie has changed over time. 

Additionally, you should also maintain a scratchpad for your own notes and thoughts about the movie as you draft the report.

You must respond with both your chain of thought and the final report.""".strip()


async def research_movie(movie_name: str) -> Optional[ResearchAnalysis]:
    analysis = await agent_loop(
        [
            {
                "role": "system",
                "content": FINAL_REPORT_SYSTEM_PROMPT,
            },
            {
                "role": "user",
                "content": (f"Get me the consensus for the movie {movie_name}"),
            },
        ],
        response_format=ResearchAnalysis,
    )

    return analysis
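
The prompt asks the model to deduplicate search results, which it may or may not do reliably. The same step can be done deterministically before results ever reach the model. A minimal sketch, assuming results are dicts with a url key and that two URLs differing only in case of the host, query string, fragment, or trailing slash point at the same review; that assumption will not hold for sites that identify pages through query parameters. `normalize_url` and `dedupe_by_url` are hypothetical helpers:

```python
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host; drop query, fragment, and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe_by_url(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


sample = [
    {"title": "Inception", "url": "https://letterboxd.com/film/inception/"},
    {"title": "Inception (2010)", "url": "https://Letterboxd.com/film/inception"},
    {"title": "Inception Review", "url": "https://www.ign.com/articles/2010/07/06/inception-review"},
]
for item in dedupe_by_url(sample):
    print(item["url"])
```

Deduplicating in code also stops repeated URLs from consuming the scrape budget.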

Step 9: Test the researcher with a real movie

Finally, let's put our researcher to work by analyzing reviews for Christopher Nolan's "Inception." This test will show the full workflow:

The agent searches for Inception reviews across multiple pages
It identifies and scrapes the most relevant review sites
It analyzes the content, extracting key praise and criticisms
It synthesizes a comprehensive report that captures critical consensus

You'll see real-time logging of the search and scraping operations as they happen.

analysis = await research_movie("Inception")

Handling tool call: search_reviews

{'movie_name': 'Inception', 'page': 1}

Inception 1 https://www.bing.com/search?q=Inception+reviews&first=0&qs=HS&FORM=QBLH&sp=1&ghc=1&lq=0

Handling tool call: search_reviews

{'movie_name': 'Inception', 'page': 2}

Inception 2 https://www.bing.com/search?q=Inception+reviews&first=10&qs=HS&FORM=QBLH&sp=1&ghc=1&lq=0

Handling tool call: scrape_webpage

{'url': 'https://www.example.com/inception-review-1', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://www.example.com/inception-review-2', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://www.example.com/inception-review-3', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://www.example.com/inception-review-4', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://www.example.com/inception-review-5', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://www.ign.com/articles/2010/07/06/inception-review', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://www.deepfocusreview.com/reviews/inception', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://www.theartsdesk.com/film/inception', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: search_reviews

{'movie_name': 'Inception', 'page': 3}

Inception 3 https://www.bing.com/search?q=Inception+reviews&first=20&qs=HS&FORM=QBLH&sp=1&ghc=1&lq=0

Handling tool call: scrape_webpage

{'url': 'https://www.metacritic.com/movie/inception/critic-reviews/', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Handling tool call: scrape_webpage

{'url': 'https://letterboxd.com/film/inception/', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}

Step 10: Display results

Finally, we'll display the results in a clean, readable format using Markdown. Our output separates the AI's reasoning process (chain of thought) from the polished final report, giving insight into both the analysis methodology and the final conclusions.

if analysis is not None:
    if analysis.chain_of_thought is not None:
        display(Markdown(analysis.chain_of_thought))
    else:
        print("**Analysis Chain of Thought is none**")
    if analysis.report is not None:
        display(Markdown(analysis.report))
    else:
        print("**Analysis Report is none**")

else:
    print("**Analysis is none**")

I have gathered a variety of reviews from multiple trusted sources regarding "Inception," directed by Christopher Nolan. The film was released in 2010 and it examines the themes of dreams and reality through a complex narrative. The critics generally praise the film for its innovative storytelling and visual effects, while a few express some concerns regarding its pacing and complexity. Here are the key points extracted from the reviews:

Praise:
- Innovation: Multiple reviewers highlight it as a groundbreaking film that redefined the sci-fi genre, praising its original concept and intricate plot twists.
- Visuals: The visual effects, including mind-bending dream sequences, are frequently lauded for their creativity and execution.
- Soundtrack: Hans Zimmer's score is mentioned repeatedly as elevating the film's emotional and dramatic impact.
- Performances: The cast, particularly Leonardo DiCaprio, is highlighted for delivering powerful performances.
- Thought-Provoking: Many reviews emphasize that the film requires multiple viewings to fully grasp its nuances and stimulates intellectual conversation about its themes.

Criticisms:
- Complex Narrative: Some critics argue that the film's complicated plot may alienate viewers, making it difficult for casual audiences to follow.
- Pacing Issues: There are mentions of the film dragging in certain segments, particularly in the latter half as the intricate plot unfolds.
- Emotional Disconnect: A few reviewers felt that despite its beauty and complexity, the film did not sufficiently connect on an emotional level for everyone.

Overall Sentiment:

The consensus reflects overwhelming positive reception, with a Rotten Tomatoes rating of 87% and audience approval at 91%. Over the years, as more audiences have rewatched it, many reviews indicate a deeper appreciation for its complexity and brilliance. However, there is still some debate about its emotional resonance versus its intellectual offerings. The film remains a cultural touchstone and is often cited in discussions about modern cinema.

Inception Movie Review Consensus

Overview:

Inception is a 2010 sci-fi film directed by Christopher Nolan, exploring the boundaries of dreams and reality through a complex narrative involving dream manipulation. It has garnered critical acclaim and has sparked extensive discussions on its themes and storytelling technique.

Praise:

Innovation:
- Groundbreaking Concept: Many reviewers commend Inception for its original concept that redefined the sci-fi genre.
- Intricacy: Critics emphasize the film's ability to challenge viewers' perceptions, requiring multiple viewings to appreciate fully.

Visuals:
- Stunning Effects: The visual effects are frequently praised for their creativity, with particular scenes noted for their astonishing quality.
- Cinematography and Production Design: Recognized for blending reality and dreams fluidly.

Soundtrack:
- Hans Zimmer's Score: Noted for enhancing the film's emotional depth and tension.

Performance:
- Cast Contributions: Leonardo DiCaprio and supporting actors are highlighted for their strong performances, contributing greatly to the film's atmosphere.

Criticisms:

Complex Narrative:
- Some critics argue that the convoluted plot may be difficult for general audiences to digest, leading to a sense of alienation for viewers unaccustomed to such intricate storytelling.

Pacing Issues:
- Complaints regarding the film's pacing, with the latter part feeling drawn out or convoluted, impacting viewer engagement.

Emotional Disconnect:
- Despite its intellectual appeal, some reviewers felt that Inception lacked a deep emotional connection, leaving certain audiences feeling detached from the characters.

Overall Sentiment:

Inception maintains a strong legacy, evident in its 87% score on Rotten Tomatoes and 91% audience score. Over the years, the sentiment towards the film has shifted to a deeper appreciation for its complexity, although debates about its emotional engagement persist. It is often regarded as a modern masterpiece in cinema, with discussions surrounding it contributing to its enduring popularity.

Conclusion

In this cookbook, we've built a sophisticated Movie Review Researcher that combines the power of web search, content extraction, and AI analysis to generate comprehensive movie review reports. Our agent can:

Search for reviews across multiple sources using strategic pagination
Intelligently select the most relevant review websites to analyze
Extract detailed content from those sources with advanced scraping techniques
Analyze text to identify key themes in both praise and criticism
Synthesize findings into a structured report with clear sections and insights

This approach demonstrates how AI agents can transform raw web content into actionable intelligence - saving hours of manual research while delivering deeper insights than a simple review aggregator would provide.

Next Steps and Extensions

To take this movie review researcher further, consider:

Sentiment Tracking: Enhance the analysis with quantitative sentiment tracking over time
Comparison Features: Extend the system to compare reception across multiple films
Financial Correlation: Correlate critical reception with box office performance
Genre Analysis: Add genre-specific evaluation criteria for different types of films
User Interface: Build a simple web app that lets users query for any movie by name
Result Caching: Implement a database to store previous analyses for faster retrieval

The architecture we've built is highly extensible and can be adapted for various media analysis tasks beyond movies - including TV shows, video games, music albums, or literary works.
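
As a starting point for the result caching idea, previous analyses could be stored in SQLite from the standard library. This is a sketch under assumptions: the table layout, the `AnalysisCache` class, and the title normalization (lowercase, collapsed whitespace) are all hypothetical choices, and the cache never expires entries:

```python
import sqlite3
from typing import Dict, Optional


class AnalysisCache:
    """Hypothetical cache of finished analyses, keyed by normalized title."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps the demo self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS analyses ("
            "movie_key TEXT PRIMARY KEY, "
            "chain_of_thought TEXT NOT NULL, "
            "report TEXT NOT NULL)"
        )

    @staticmethod
    def _key(movie_name: str) -> str:
        """Lowercase and collapse whitespace so close variants share a row."""
        return " ".join(movie_name.lower().split())

    def get(self, movie_name: str) -> Optional[Dict[str, str]]:
        row = self.conn.execute(
            "SELECT chain_of_thought, report FROM analyses WHERE movie_key = ?",
            (self._key(movie_name),),
        ).fetchone()
        if row is None:
            return None
        return {"chain_of_thought": row[0], "report": row[1]}

    def put(self, movie_name: str, chain_of_thought: str, report: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO analyses "
                "(movie_key, chain_of_thought, report) VALUES (?, ?, ?)",
                (self._key(movie_name), chain_of_thought, report),
            )


cache = AnalysisCache()
cache.put("Inception", "notes gathered while researching", "final report text")
print(cache.get("  inception "))
```

A caller would check `cache.get(movie_name)` before running research_movie and call `cache.put(...)` with the fields of the returned ResearchAnalysis afterwards. Titles shared by different films (remakes, for example) would need a richer key such as title plus year.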