Building an AI Movie Review Researcher with Hyperbrowser and OpenAI
In this cookbook, we'll build a powerful Movie Review Researcher that can analyze critical and audience reception for any film. This agent will:
- Search the web for professional reviews and audience opinions
- Extract and analyze content from the most relevant review sites
- Synthesize findings into a comprehensive report that highlights praise, criticism, and overall reception
- Track sentiment changes over time when reviews span multiple years
By combining Hyperbrowser's web scraping capabilities with OpenAI's language models, we'll create a tool that can extract nuanced insights from critical discourse - saving hours of manual research for film enthusiasts, critics, and industry professionals.
Prerequisites
Before starting, you'll need:
- A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one)
- An OpenAI API key for accessing GPT-4o-mini
- Python 3.9+ with asyncio support
Store your API keys in a .env file in the same directory as this notebook:
```text
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
```
Step 1: Set up imports and load environment variables
```python
import os
import asyncio
import json
from urllib.parse import urlencode
from typing import Optional

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.models.extract import StartExtractJobParams
from hyperbrowser.models.session import CreateSessionParams
from hyperbrowser.tools import WebsiteScrapeTool
from openai.types.chat import (
    ChatCompletionToolParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
    ChatCompletionContentPartTextParam,
    ChatCompletionMessageParam,
)
from openai import AsyncOpenAI
from pydantic import BaseModel
from typing_extensions import TypeVar
from IPython.display import display, Markdown

load_dotenv()
```
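If a key is missing from .env, nothing fails until the first API call deep inside the agent loop. An optional early check can surface the problem right away; missing_env_vars and require_env are helper names introduced here for illustration, not part of either SDK.

```python
import os


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names of environment variables that are unset or empty."""
    return [name for name in names if not os.getenv(name)]


def require_env(names: list[str]) -> None:
    """Raise early with a readable message if any required key is missing."""
    missing = missing_env_vars(names)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Calling require_env(["HYPERBROWSER_API_KEY", "OPENAI_API_KEY"]) right after load_dotenv() stops the notebook before any client is created.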
Step 2: Initialize clients
Now we'll create clients for the two APIs we'll be using: OpenAI for the language model and Hyperbrowser for web searches and content extraction. The async clients (AsyncOpenAI and AsyncHyperbrowser) let us run multiple operations concurrently.
```python
# AsyncOpenAI reads OPENAI_API_KEY from the environment automatically
oai = AsyncOpenAI()
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
```
Step 3: Define the search functionality
Our first core component is the search function that finds review content across the web. This function:
- Constructs a search query for the movie reviews
- Uses Bing's search engine for comprehensive results
- Extracts structured data from the search results page
- Supports pagination (up to three result pages) for thorough research
We'll use Pydantic models to enforce type safety and ensure our data is properly structured.
```python
class SearchResult(BaseModel):
    """A search result from Bing"""

    title: str
    url: str
    content: str

    def __str__(self):
        return f"Title: {self.title}\nURL: {self.url}\nContent: {self.content}"


class SearchResultList(BaseModel):
    """A list of search results from Bing"""

    results: list[SearchResult]
    total_results_extracted: Optional[int]

    def __str__(self):
        return f"\n\n{'-' * 10}\n\n".join(str(result) for result in self.results)


# Optional[...] instead of "X | None" keeps this valid on Python 3.9
async def search_tool(movie_name: str, page: int = 1) -> Optional[SearchResultList]:
    if page > 3:
        raise ValueError("Cannot extract results for page number greater than 3.")

    # Bing paginates with a zero-based result offset: page 1 -> 0, page 2 -> 10, ...
    params = urlencode(
        {
            "q": f"{movie_name} reviews",
            "first": max(page - 1, 0) * 10,
            "qs": "HS",
            "FORM": "QBLH",
            "sp": 1,
            "ghc": 1,
            "lq": 0,
        }
    )
    url = f"https://www.bing.com/search?{params}"
    print(movie_name, page, url)

    # Load the results page in a Hyperbrowser session and extract it into our schema
    result = await hb.extract.start_and_wait(
        StartExtractJobParams(
            urls=[url],
            prompt="Extract the title, url, and content of the top 10 search results on this page.",
            schema=SearchResultList,
            session_options=CreateSessionParams(
                use_proxy=True,
                adblock=True,
                trackers=True,
                annoyances=True,
                solve_captchas=True,
            ),
        )
    )

    if not (result.status == "completed" and result.data):
        print(result)
        raise Exception("Failed to extract search results")

    return SearchResultList.model_validate({**result.data})
```
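The only arithmetic in search_tool is the mapping from a 1-based page number to Bing's zero-based first offset. Pulled out as a pure function (bing_search_url is a name introduced here for illustration; the query parameters are copied from search_tool), it can be checked without any network call:

```python
from urllib.parse import urlencode

MAX_PAGE = 3  # search_tool rejects pages beyond 3


def bing_search_url(movie_name: str, page: int = 1) -> str:
    """Build the same Bing results URL that search_tool requests."""
    if page > MAX_PAGE:
        raise ValueError(f"Cannot extract results for page number greater than {MAX_PAGE}.")
    params = urlencode(
        {
            "q": f"{movie_name} reviews",
            # 10 results per page: page 1 -> 0, page 2 -> 10, page 3 -> 20
            "first": max(page - 1, 0) * 10,
            "qs": "HS",
            "FORM": "QBLH",
            "sp": 1,
            "ghc": 1,
            "lq": 0,
        }
    )
    return f"https://www.bing.com/search?{params}"


print(bing_search_url("Inception", 2))
```

This prints the same page-2 URL that appears in the run log in Step 9. Keeping URL construction separate from the extraction call makes it easy to test the pagination logic on its own.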
Step 4: Implement content scraping with rate limiting
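Next, we'll set up rate limiting for content scraping. The scraping itself is done by Hyperbrowser's WebsiteScrapeTool, which is wired into the agent in Steps 5 and 6; here we define a page counter guarded by an asyncio lock so that concurrent tool calls can update it safely. This approach: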
- Tracks how many pages have been scraped to prevent overwhelming servers
- Implements advanced scraping options for clean content extraction
- Formats the extracted content for easy processing by the language model
Note that LLMs are unreliable at keeping count on their own. Rather than trusting the model to stop after a set number of pages, it is safer to keep an explicit counter inside the tool-call code itself to cap resource usage.
This controlled approach ensures we can gather substantial data while respecting website resources.
```python
total_pages_scraped = 0

# Create a lock for safely updating the counter in async functions
pages_scraped_lock = asyncio.Lock()


async def increment_pages_scraped(amount: int = 1):
    """Safely increment the total_pages_scraped counter in async context"""
    global total_pages_scraped
    async with pages_scraped_lock:
        total_pages_scraped += amount
        return total_pages_scraped


async def get_pages_scraped():
    global total_pages_scraped
    async with pages_scraped_lock:
        return total_pages_scraped
```
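The helpers above only count pages; they do not refuse a scrape once a budget is spent. One way to enforce a cap in code is to check and increment under the same lock, so two concurrent tool calls cannot both slip past the limit. This is a sketch under stated assumptions: PageBudget, scrape_with_budget, and fake_scrape are hypothetical names, the limit of 10 mirrors the system prompt in Step 8, and fake_scrape stands in for the real WebsiteScrapeTool call.

```python
import asyncio


class PageBudget:
    """Caps how many pages the agent may scrape, safe under concurrent tool calls."""

    def __init__(self, limit: int = 10):
        self.limit = limit
        self.used = 0
        self._lock = asyncio.Lock()

    async def try_reserve(self) -> bool:
        """Atomically claim one page; return False once the budget is exhausted."""
        async with self._lock:
            if self.used >= self.limit:
                return False
            self.used += 1
            return True


async def fake_scrape(url: str) -> str:
    """Hypothetical stand-in for the real scrape call."""
    await asyncio.sleep(0)
    return f"content of {url}"


async def scrape_with_budget(budget: PageBudget, url: str) -> str:
    if not await budget.try_reserve():
        # Returned to the model as the tool result so it knows to stop scraping
        return f"Scrape budget of {budget.limit} pages exhausted; skipped {url}."
    return await fake_scrape(url)


async def demo() -> list[str]:
    budget = PageBudget(limit=10)
    urls = [f"https://example.com/review-{i}" for i in range(12)]
    # Twelve concurrent requests against a budget of ten
    return await asyncio.gather(*(scrape_with_budget(budget, u) for u in urls))
```

In the notebook, results = await demo() should yield ten scraped pages and two budget messages. Holding the counter on an object rather than in a module global also makes it easy to start each research_movie run with a fresh budget.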
Step 5: Define tool interfaces for the language model
For our agent to work, we need to define the tools it can use in a format that OpenAI's function calling system understands. The agent gets two tools:
- search_reviews: Finds movie reviews using Bing search (defined below)
- scrape_webpage: Extracts detailed content from review websites (provided ready-made by Hyperbrowser as WebsiteScrapeTool.openai_tool_definition)
These tool definitions include parameter schemas, and because the definition sets "strict": True, OpenAI constrains the arguments the model generates to match the schema.
```python
MOVIE_SEARCH_TOOL: ChatCompletionToolParam = {
    "type": "function",
    "function": {
        "name": "search_reviews",
        "description": "Search for information about a movie's reviews using Bing.",
        "parameters": {
            "type": "object",
            "properties": {
                "movie_name": {
                    "type": "string",
                    "description": "The name of the movie to search for",
                },
                "page": {
                    "type": "integer",
                    "description": "The page number of search results to retrieve (1 to 3)",
                },
            },
            "required": ["movie_name", "page"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}
```
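Strict mode only accepts schemas in which every object sets additionalProperties to false and lists all of its properties under required, which is why page must appear in required even though search_tool gives it a default. When adding or editing tools, a small recursive check like the one below can catch those two mistakes before the API rejects the request. It is our own helper, not part of the OpenAI SDK, and it deliberately covers only these two rules (no anyOf or $defs handling).

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list[str]:
    """List violations of two strict-mode rules: closed objects and all properties required."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        # Recurse into nested property schemas
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems
```

Running strict_schema_problems(MOVIE_SEARCH_TOOL["function"]["parameters"]) should return an empty list.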
Step 6: Create a tool handler for function calling
The tool handler processes function calls from the language model and returns the results. This component:
- Routes each tool call to the appropriate function
- Handles error conditions gracefully
- Formats the output according to OpenAI's tool message specifications
- Manages different content types for search results vs. scraped content
This pattern bridges the gap between the language model's reasoning capabilities and our concrete web interaction functions.
```python
async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")
    try:
        if (
            tc.function.name
            == WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            # Scrape a page with Hyperbrowser's prebuilt tool and wrap the result in tags
            args = json.loads(tc.function.arguments)
            print(args)
            content_raw = await WebsiteScrapeTool.async_runnable(hb, args)
            # Single quotes inside the f-string keep this valid before Python 3.12
            content = f"<website>\n<url>{args['url']}</url>\n<content>\n{content_raw}\n</content>\n</website>"
            return ChatCompletionToolMessageParam(
                {"role": "tool", "tool_call_id": tc.id, "content": content}
            )
        elif tc.function.name == MOVIE_SEARCH_TOOL["function"]["name"]:
            # Run our Bing search and return one text part per result
            args = json.loads(tc.function.arguments)
            print(args)
            content = await search_tool(**args)
            if content is None or content.results is None:
                raise ValueError("Response from search is none")
            else:
                return ChatCompletionToolMessageParam(
                    {
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": [
                            ChatCompletionContentPartTextParam(
                                text=str(search_result), type="text"
                            )
                            for search_result in content.results
                        ],
                    }
                )
        else:
            raise Exception(f"No tool call handler for {tc.function.name}")
    except Exception as e:
        # Report the failure back to the model instead of crashing the agent loop
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return ChatCompletionToolMessageParam(
            {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": err_msg,
                "is_error": True,  # type: ignore
            }
        )
```
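The if/elif chain in handle_tool_call grows with every new tool. A common alternative is a dict that maps tool names to async handlers. The sketch below keeps the same two behaviors (JSON-decoded arguments, and errors returned to the model as tool messages rather than raised); stub_search, stub_scrape, TOOL_HANDLERS, and dispatch are hypothetical placeholders, not the real search_tool or WebsiteScrapeTool, and the returned dicts only mirror the shape of OpenAI tool messages.

```python
import json
from typing import Awaitable, Callable

ToolFn = Callable[..., Awaitable[str]]


async def stub_search(movie_name: str, page: int) -> str:
    # Placeholder for search_tool(...)
    return f"results for {movie_name!r}, page {page}"


async def stub_scrape(url: str, **_ignored) -> str:
    # Placeholder for WebsiteScrapeTool.async_runnable(...)
    return f"markdown of {url}"


TOOL_HANDLERS: dict[str, ToolFn] = {
    "search_reviews": stub_search,
    "scrape_webpage": stub_scrape,
}


async def dispatch(call_id: str, name: str, arguments: str) -> dict:
    """Route one tool call by name and always return a tool message, even on failure."""
    try:
        handler = TOOL_HANDLERS.get(name)
        if handler is None:
            raise ValueError(f"No tool call handler for {name}")
        content = await handler(**json.loads(arguments))
    except Exception as e:
        # Report the failure to the model instead of crashing the loop
        content = f"Error handling tool call: {e}"
    return {"role": "tool", "tool_call_id": call_id, "content": content}
```

Adding a tool then means registering one entry instead of editing the routing logic; each handler stays responsible for formatting its own content, as handle_tool_call does with the website wrapper.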
Step 7: Implement the agent loop
The agent loop orchestrates the interaction between the language model and our tools. This function:
- Sends the current conversation and tools to the OpenAI API
- Processes tool calls when the model decides to use them
- Handles structured output using Pydantic models
- Continues the loop until the model provides a final answer
This iterative loop allows for complex multi-step reasoning, where the model can search for general reviews, identify trusted sources, and then dive deeper into specific analyses.
```python
ResponseFormatT = TypeVar(
    "ResponseFormatT",
    # if it isn't given then we don't do any parsing
    default=None,
)


async def agent_loop(
    messages: list[ChatCompletionMessageParam], response_format: type[ResponseFormatT]
):
    while True:
        response = await oai.beta.chat.completions.parse(
            messages=messages,
            model="gpt-4o-mini",
            tools=[MOVIE_SEARCH_TOOL, WebsiteScrapeTool.openai_tool_definition],
            tool_choice="auto",
            response_format=response_format,
        )
        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            # Run all requested tool calls concurrently
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)
        elif choice.finish_reason == "stop" and choice.message.parsed is not None:
            # Final answer, already parsed into response_format
            return choice.message.parsed
        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
```
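Because agent_loop runs while True, a model that keeps requesting tools would never stop on its own. A turn limit is a simple safeguard. The sketch below isolates that control flow with a scripted fake model so it runs offline; bounded_agent_loop, FakeTurn, max_turns, and the two callable parameters are hypothetical names, and in the notebook call_model would wrap the oai.beta.chat.completions.parse request above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class FakeTurn:
    """Minimal stand-in for one model response."""
    finish_reason: str
    tool_calls: list[str] = field(default_factory=list)
    parsed: Optional[str] = None


async def bounded_agent_loop(
    call_model: Callable[[list], Awaitable[FakeTurn]],
    run_tool: Callable[[str], Awaitable[str]],
    messages: list,
    max_turns: int = 15,
) -> str:
    for _ in range(max_turns):
        turn = await call_model(messages)
        messages.append(turn)
        if turn.finish_reason == "tool_calls" and turn.tool_calls:
            # Same pattern as agent_loop: run every requested tool concurrently
            messages.extend(await asyncio.gather(*(run_tool(tc) for tc in turn.tool_calls)))
        elif turn.finish_reason == "stop" and turn.parsed is not None:
            return turn.parsed
        else:
            raise ValueError(f"Unhandled finish reason: {turn.finish_reason}")
    raise RuntimeError(f"No final answer after {max_turns} model calls")
```

Raising when the limit is hit lets the caller decide what to do next; another option (not shown) is one last request with tool_choice="none" so the model must answer from what it has gathered so far.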
Step 8: Define data model and system prompt
Now we'll create the structures for our final output and craft the system prompt that guides the AI's analysis. The system prompt is crucial - it instructs the model on:
- The depth and breadth of analysis expected
- How to identify and analyze criticisms and praise
- When and how to use each available tool
- The format for presenting findings
The ResearchAnalysis model enforces a clear separation between the model's reasoning process and the final polished report.
```python
class ResearchAnalysis(BaseModel):
    chain_of_thought: str
    report: str

    def __str__(self):
        return f"Chain of Thought:\n{self.chain_of_thought}\n\nReport:\n{self.report}"


FINAL_REPORT_SYSTEM_PROMPT = """
You are an expert movie assistant. You are working for a movie production company and your job is to compile information about a certain movie.

For this, you will be provided with the movie name. In addition, you will be provided with a certain set of tools to accomplish your job. These tools are:
- A web search tool
    - The tool takes in the movie name and the search results page number (1 to 3).
    - The tool searches Bing and returns to you the url, title, and some basic content about each individual search result.
- A web scrape tool
    - The tool takes in a single url.
    - The tool returns the url and the markdown content from the website.

Do not request more than 3 pages of search results. Also deduplicate the search results once you get them. Make sure that when you're scraping results, you don't go over 10 pages scraped.

Your job is to compile any criticisms and any praise that the reviews had for the movie. Make sure that the reviews you are getting are for that particular movie and not for a sequel. Also compile the overall opinion people had about the movie. Make clear note of the date of the review. Use this to also inform me if the sentiment about the movie has changed over time.

Additionally, you should also maintain a scratchpad for your own notes and thoughts about the movie as you draft the report.

You must respond with both your chain of thought and the final report.
""".strip()


async def research_movie(movie_name: str) -> Optional[ResearchAnalysis]:
    analysis = await agent_loop(
        [
            {
                "role": "system",
                "content": FINAL_REPORT_SYSTEM_PROMPT,
            },
            {
                "role": "user",
                "content": f"Get me the consensus for the movie {movie_name}",
            },
        ],
        response_format=ResearchAnalysis,
    )
    return analysis
```
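The system prompt asks the model to deduplicate search results. Like the page limit discussed in Step 4, this can also be enforced in code before results reach the model. A sketch that keys results on a normalized URL; the normalization rules (lowercased host, leading "www." dropped, scheme, fragment, and trailing slash ignored) are our own choice, and it works on plain dicts with a url key, such as those from SearchResult.model_dump().

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path + query so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    # Scheme and fragment are intentionally left out of the key
    return f"{host}{path}{query}"


def dedupe_by_url(results: list[dict]) -> list[dict]:
    """Keep the first result for each normalized URL, preserving search order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

Ignoring the scheme means http and https copies of a page count as one result, which is usually what you want for review sites.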
Step 9: Test the researcher with a real movie
Now let's put our researcher to work by analyzing reviews for Christopher Nolan's "Inception." This test will show the full workflow:
- The agent searches for Inception reviews across multiple pages
- It identifies and scrapes the most relevant review sites
- It analyzes the content, extracting key praise and criticisms
- It synthesizes a comprehensive report that captures critical consensus
You'll see real-time logging of the search and scraping operations as they happen.
```python
# Top-level await works in Jupyter/IPython notebooks
analysis = await research_movie("Inception")
```
```text
Handling tool call: search_reviews
{'movie_name': 'Inception', 'page': 1}
Inception 1 https://www.bing.com/search?q=Inception+reviews&first=0&qs=HS&FORM=QBLH&sp=1&ghc=1&lq=0
Handling tool call: search_reviews
{'movie_name': 'Inception', 'page': 2}
Inception 2 https://www.bing.com/search?q=Inception+reviews&first=10&qs=HS&FORM=QBLH&sp=1&ghc=1&lq=0
Handling tool call: scrape_webpage
{'url': 'https://www.example.com/inception-review-1', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.example.com/inception-review-2', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.example.com/inception-review-3', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.example.com/inception-review-4', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.example.com/inception-review-5', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.ign.com/articles/2010/07/06/inception-review', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.deepfocusreview.com/reviews/inception', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://www.theartsdesk.com/film/inception', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: search_reviews
{'movie_name': 'Inception', 'page': 3}
Inception 3 https://www.bing.com/search?q=Inception+reviews&first=20&qs=HS&FORM=QBLH&sp=1&ghc=1&lq=0
Handling tool call: scrape_webpage
{'url': 'https://www.metacritic.com/movie/inception/critic-reviews/', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://letterboxd.com/film/inception/', 'scrape_options': {'formats': ['markdown'], 'include_tags': ['h1', 'h2', 'p'], 'exclude_tags': [], 'only_main_content': True}}
```
Step 10: Display results
Finally, we'll display the results in a clean, readable format using Markdown. Our output separates the AI's reasoning process (chain of thought) from the polished final report, giving insight into both the analysis methodology and the final conclusions.
```python
if analysis is not None:
    if analysis.chain_of_thought is not None:
        display(Markdown(analysis.chain_of_thought))
    else:
        print("**Analysis Chain of Thought is none**")
    if analysis.report is not None:
        display(Markdown(analysis.report))
    else:
        print("**Analysis Report is none**")
else:
    print("**Analysis is none**")
```
I have gathered a variety of reviews from multiple trusted sources regarding "Inception," directed by Christopher Nolan. The film was released in 2010 and it examines the themes of dreams and reality through a complex narrative. The critics generally praise the film for its innovative storytelling and visual effects, while a few express some concerns regarding its pacing and complexity. Here are the key points extracted from the reviews:
Praise:
- Innovation: Multiple reviewers highlight it as a groundbreaking film that redefined the sci-fi genre, praising its original concept and intricate plot twists.
- Visuals: The visual effects, including mind-bending dream sequences, are frequently lauded for their creativity and execution.
- Soundtrack: Hans Zimmer's score is mentioned repeatedly as elevating the film's emotional and dramatic impact.
- Performances: The cast, particularly Leonardo DiCaprio, is highlighted for delivering powerful performances.
- Thought-Provoking: Many reviews emphasize that the film requires multiple viewings to fully grasp its nuances and stimulates intellectual conversation about its themes.
Criticisms:
- Complex Narrative: Some critics argue that the film's complicated plot may alienate viewers, making it difficult for casual audiences to follow.
- Pacing Issues: There are mentions of the film dragging in certain segments, particularly in the latter half as the intricate plot unfolds.
- Emotional Disconnect: A few reviewers felt that despite its beauty and complexity, the film did not sufficiently connect on an emotional level for everyone.
Overall Sentiment:
The consensus reflects overwhelming positive reception, with a Rotten Tomatoes rating of 87% and audience approval at 91%. Over the years, as more audiences have rewatched it, many reviews indicate a deeper appreciation for its complexity and brilliance. However, there is still some debate about its emotional resonance versus its intellectual offerings. The film remains a cultural touchstone and is often cited in discussions about modern cinema.
Inception Movie Review Consensus
Overview:
Inception is a 2010 sci-fi film directed by Christopher Nolan, exploring the boundaries of dreams and reality through a complex narrative involving dream manipulation. It has garnered critical acclaim and has sparked extensive discussions on its themes and storytelling technique.
Praise:
- Innovation:
  - Groundbreaking Concept: Many reviewers commend Inception for its original concept that redefined the sci-fi genre.
  - Intricacy: Critics emphasize the film's ability to challenge viewers' perceptions, requiring multiple viewings to appreciate fully.
- Visuals:
  - Stunning Effects: The visual effects are frequently praised for their creativity, with particular scenes noted for their astonishing quality.
  - Cinematography and Production Design: Recognized for blending reality and dreams fluidly.
- Soundtrack:
  - Hans Zimmer's Score: Noted for enhancing the film's emotional depth and tension.
- Performance:
  - Cast Contributions: Leonardo DiCaprio and supporting actors are highlighted for their strong performances, contributing greatly to the film's atmosphere.
Criticisms:
- Complex Narrative:
  - Some critics argue that the convoluted plot may be difficult for general audiences to digest, leading to a sense of alienation for viewers unaccustomed to such intricate storytelling.
- Pacing Issues:
  - Complaints regarding the film's pacing, with the latter part feeling drawn out or convoluted, impacting viewer engagement.
- Emotional Disconnect:
  - Despite its intellectual appeal, some reviewers felt that Inception lacked a deep emotional connection, leaving certain audiences feeling detached from the characters.
Overall Sentiment:
- Inception maintains a strong legacy, evident in its 87% score on Rotten Tomatoes and 91% audience score. Over the years, the sentiment towards the film has shifted to a deeper appreciation for its complexity, although debates about its emotional engagement persist. It is often regarded as a modern masterpiece in cinema, with discussions surrounding it contributing to its enduring popularity.
Conclusion
In this cookbook, we've built a sophisticated Movie Review Researcher that combines the power of web search, content extraction, and AI analysis to generate comprehensive movie review reports. Our agent can:
- Search for reviews across multiple sources using strategic pagination
- Intelligently select the most relevant review websites to analyze
- Extract detailed content from those sources with advanced scraping techniques
- Analyze text to identify key themes in both praise and criticism
- Synthesize findings into a structured report with clear sections and insights
This approach demonstrates how AI agents can transform raw web content into actionable intelligence - saving hours of manual research while delivering deeper insights than a simple review aggregator would provide.
Next Steps and Extensions
To take this movie review researcher further, consider:
- Sentiment Tracking: Enhance the analysis with quantitative sentiment tracking over time
- Comparison Features: Extend the system to compare reception across multiple films
- Financial Correlation: Correlate critical reception with box office performance
- Genre Analysis: Add genre-specific evaluation criteria for different types of films
- User Interface: Build a simple web app that lets users query for any movie by name
- Result Caching: Implement a database to store previous analyses for faster retrieval
The architecture we've built is highly extensible and can be adapted for various media analysis tasks beyond movies - including TV shows, video games, music albums, or literary works.
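For the Result Caching idea, a database is the long-term option, but a JSON file per movie is enough to skip repeat runs while experimenting. The sketch below assumes each analysis is stored as a plain dict (for example from ResearchAnalysis.model_dump()); CACHE_DIR, cache_path, load_cached, and save_cached are hypothetical names.

```python
import json
import re
from pathlib import Path
from typing import Optional

CACHE_DIR = Path("analysis_cache")  # hypothetical location


def cache_path(movie_name: str, cache_dir: Path = CACHE_DIR) -> Path:
    # Turn "The Dark Knight" into "the-dark-knight.json"
    slug = re.sub(r"[^a-z0-9]+", "-", movie_name.lower()).strip("-")
    return cache_dir / f"{slug}.json"


def load_cached(movie_name: str, cache_dir: Path = CACHE_DIR) -> Optional[dict]:
    """Return the stored analysis dict, or None if this movie has not been researched."""
    path = cache_path(movie_name, cache_dir)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return None


def save_cached(movie_name: str, analysis: dict, cache_dir: Path = CACHE_DIR) -> Path:
    """Write the analysis dict to disk, creating the cache directory if needed."""
    path = cache_path(movie_name, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(analysis, indent=2), encoding="utf-8")
    return path
```

A wrapper could check load_cached before calling research_movie and call save_cached(movie_name, analysis.model_dump()) afterwards; ResearchAnalysis.model_validate(...) turns a cached dict back into the model.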