Building a YouTube Video Chat AI with Hyperbrowser and OpenAI
In this cookbook, we'll build a powerful YouTube Video Analyst that can automatically extract transcripts from any YouTube video and allow you to have interactive conversations about the content. This approach combines:
- Hyperbrowser for accessing YouTube and extracting transcripts in a reliable way
- Playwright for browser automation and interaction with YouTube's interface
- OpenAI's language models for understanding and responding to questions about the video content
By the end of this cookbook, you'll have a reusable tool that can help you extract insights from any YouTube video without having to watch it entirely!
Prerequisites
Before starting, make sure you have:
- A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one; it's free)
- An OpenAI API key
- Python 3.9+ installed
Both API keys should be stored in a .env file in the same directory as this notebook with the following format:
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
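Before going further, it can help to confirm that both keys are actually visible to the Python process. The following is a minimal sketch: `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration, not part of either SDK, and the check only looks for absent or blank values.

```python
from typing import List, Mapping

# The two variable names come from the .env format above.
REQUIRED_KEYS = ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required key names that are absent or blank in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Once `load_dotenv()` has run in Step 1, calling `missing_keys(os.environ)` should return an empty list if the .env file was picked up.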
Step 1: Set up imports and initialize clients
First, we'll import the necessary libraries and initialize our clients. We're using Hyperbrowser to access YouTube, Playwright for browser automation, and OpenAI for natural language processing.
import os
import asyncio
from IPython.display import display, Markdown
from playwright.async_api import async_playwright, Page
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletionMessageParam
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

from hyperbrowser.client.async_client import AsyncHyperbrowser
from hyperbrowser.models.session import CreateSessionParams
from openai.types.chat.chat_completion import ChatCompletion

# Initialize OpenAI client
client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Step 2: Create transcript extraction functions
Here we'll define functions to extract and format the transcript from a YouTube video. The process involves:
- Navigating to the YouTube video page
- Finding and clicking the "Show transcript" button
- Extracting the transcript text from the page
- Formatting the transcript for easy reading and analysis
async def get_youtube_transcript(page: Page, url: str):
    """Get the transcript of a YouTube video directly from the page using Playwright."""
    try:
        # Brief delay to ensure the video is loaded
        await asyncio.sleep(0.5)

        # Pause the video
        await page.keyboard.press("k")

        # Wait for the video player to load
        description_selector = await page.wait_for_selector("div#description", state="visible")
        if description_selector is None:
            return None
        await description_selector.click()

        # Click the "Show transcript" button
        transcript_show_more_selector = await page.wait_for_selector(
            'button[aria-label="Show transcript"]', state="visible"
        )
        if transcript_show_more_selector is None:
            return None
        await transcript_show_more_selector.click()

        # Wait for the transcript to be visible
        transcript_display_selector = await page.wait_for_selector(
            "ytd-transcript-segment-list-renderer", state="visible"
        )
        if transcript_display_selector is None:
            return None

        # Extract the transcript segments
        transcript_segments: list[dict] = await transcript_display_selector.evaluate(
            """()=>([...document
                .querySelector("ytd-transcript-segment-list-renderer")
                .querySelector("div#segments-container")
                .querySelectorAll("ytd-transcript-segment-renderer")
            ].map(e=>({text: e.querySelector("yt-formatted-string").innerText})))"""
        )

        # Return the transcript data
        if transcript_segments and isinstance(transcript_segments, list):
            return transcript_segments
        else:
            print("Failed to extract transcript segments")
            return None
    except Exception as e:
        print(f"Error getting transcript: {str(e)}")
        return None


def format_transcript(transcript_segments):
    """Format transcript segments into a readable string."""
    if not transcript_segments:
        return ""
    formatted_text = ""
    for segment in transcript_segments:
        formatted_text += f"{segment['text']} "
    return formatted_text.strip()


async def get_transcript(url: str):
    async with async_playwright() as playwright:
        client = AsyncHyperbrowser(
            api_key=os.getenv("HYPERBROWSER_API_KEY"),
        )
        session = await client.sessions.create(CreateSessionParams(use_proxy=True))
        browser_url = session.ws_endpoint
        if browser_url is None:
            raise Exception("Browser URL not found")

        browser = await playwright.chromium.connect_over_cdp(browser_url)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(url)

        # Get video transcript directly from YouTube page
        transcript_segments = await get_youtube_transcript(page, url)
        return transcript_segments
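Because the extraction opens a remote browser session before it ever looks at the URL, it may be worth rejecting obviously wrong links up front. The sketch below uses only the standard library; `extract_video_id` is a hypothetical helper, and it assumes just two URL shapes (standard watch pages and youtu.be short links), so other forms such as Shorts or embed URLs would return `None`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube watch pages (an assumption, not exhaustive).
WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if absent."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in WATCH_HOSTS and parsed.path == "/watch":
        # Standard watch pages carry the ID in the `v` query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```

A caller could check `extract_video_id(youtube_url) is not None` before calling `get_transcript`, which avoids spending a session on a URL that cannot work.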
Step 3: Implement the chat functionality
Now we'll create a function that allows us to chat with the video transcript. This function:
- Takes the transcript segments, a user prompt, and optional chat history
- Formats a message for the OpenAI API with system instructions
- Sends the message to OpenAI and gets a response
- Maintains conversation context for follow-up questions
async def chat_with_transcript(
    transcript_segments,
    prompt,
    chat_history=None,
):
    """Use OpenAI to chat with the transcript content."""
    try:
        # Format the transcript for the prompt
        formatted_transcript = format_transcript(transcript_segments)

        # Start with system message
        messages: list[ChatCompletionMessageParam] = [
            {
                "role": "system",
                "content": (
                    "You are an AI assistant that helps users understand the content of a YouTube video. "
                    f"Here is the transcript of the video:\n\n{formatted_transcript}\n\n"
                    "Answer questions based only on the content of this transcript. "
                    "If you don't know the answer, preface your response by saying that you are inferring "
                    "the answer based on your training data, and not the transcript."
                ),
            },
        ]

        # Add chat history to provide context
        if chat_history:
            for prev_query, prev_response in chat_history:
                messages.append({"role": "user", "content": prev_query})
                messages.append({"role": "assistant", "content": prev_response})

        # Add current user query
        messages.append({"role": "user", "content": prompt})

        # Call OpenAI API
        response: ChatCompletion = await client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, temperature=0.7, max_tokens=500
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"
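The function above places the whole transcript in the system message, so a very long video could exceed the model's context window. One rough mitigation is to cap the transcript length before building the prompt. This is only a sketch: `truncate_transcript` is a hypothetical helper, the 40,000-character default is an arbitrary assumption, and character count is a crude stand-in for token count.

```python
def truncate_transcript(text: str, max_chars: int = 40_000) -> str:
    """Cap `text` at roughly `max_chars`, cutting at the last space found."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last space so a partial word is not kept.
    last_space = cut.rfind(" ")
    if last_space > 0:
        cut = cut[:last_space]
    # Mark the cut so the model can tell the transcript is incomplete.
    return cut + " [transcript truncated]"
```

It could be applied as `truncate_transcript(format_transcript(transcript_segments))`. Truncation drops the end of the video, so for long content a chunk-and-summarize approach may serve better.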
Step 4: Extract the transcript from the YouTube video
Let's run our transcript extraction function on a YouTube video URL, which is set at the top of the cell below. This will open a browser session through Hyperbrowser, navigate to the video, and extract the transcript.
# Set your YouTube URL here
youtube_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"  # Replace with your video URL

transcript_segments = await get_transcript(youtube_url)
Step 5: Ask questions about the video content
Now that we have the transcript, let's ask some questions about the video! We'll start by initializing a chat history to keep track of our conversation, then ask our first question.
# Initialize chat history
chat_history = []

# Ask your first question here
question = "What is the main topic of this video?"
display(Markdown(f"**Question:** {question}"))

if transcript_segments:
    response = await chat_with_transcript(transcript_segments, question, chat_history)
    display(Markdown(f"**Answer:** {response}"))
    # Add to chat history
    chat_history.append((question, response))
else:
    display(Markdown("Cannot answer questions without a transcript. Please check the YouTube URL."))

Question: What is the main topic of this video?
Answer: The main topic of the video is the song "Never Gonna Give You Up" by Rick Astley. The lyrics express themes of commitment, loyalty, and reassurance in a romantic relationship.
Step 6: Ask follow-up questions
Now let's ask a follow-up question to demonstrate how the conversation history helps provide context for subsequent answers.
# Ask another question (copy this cell for more questions)
question = "What are the key points mentioned?"
display(Markdown(f"**Question:** {question}"))

if transcript_segments:
    response = await chat_with_transcript(transcript_segments, question, chat_history)
    display(Markdown(f"**Answer:** {response}"))
    # Add to chat history
    chat_history.append((question, response))
else:
    display(Markdown("Cannot answer questions without a transcript. Please check the YouTube URL."))

Question: What are the key points mentioned?
Answer: The key points mentioned in the transcript include:
- Commitment and Loyalty: The singer emphasizes that he will never give up, let down, or desert the person he is addressing.
- Emotional Connection: There is a recognition of a deep emotional bond, as both individuals have known each other for a long time and understand each other's feelings.
- Reassurance: The singer wants to convey how he feels and assures the other person that he will not hurt them or say goodbye.
- Acknowledgment of Feelings: The lyrics suggest that both individuals are aware of their feelings but may be hesitant to express them.
Overall, the song conveys a message of unwavering support and love.
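Each question cell repeats the same ask-then-record pattern, so it may be convenient to wrap it in a small helper instead of copying cells. The sketch below is one possible shape: `ask`, `echo_answer`, and `demo` are hypothetical names, and `echo_answer` is a stand-in for a real model call so the example runs without any API key.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# A conversation is a list of (question, answer) pairs, as in the cells above.
History = List[Tuple[str, str]]


async def ask(
    question: str,
    history: History,
    answer_fn: Callable[[str, History], Awaitable[str]],
) -> str:
    """Get an answer for `question` and record the turn in `history`."""
    response = await answer_fn(question, history)
    history.append((question, response))
    return response


async def echo_answer(question: str, history: History) -> str:
    """Stand-in for a real model call, used only for illustration."""
    return f"(turn {len(history) + 1}) you asked: {question}"


async def demo() -> History:
    """Run two turns through `ask` and return the recorded history."""
    history: History = []
    await ask("What is the main topic?", history, echo_answer)
    await ask("What are the key points?", history, echo_answer)
    return history
```

In the notebook, `answer_fn` could be something like `lambda q, h: chat_with_transcript(transcript_segments, q, h)`, so that every call both answers the question and extends `chat_history`.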
Conclusion
Congratulations! You've built a YouTube Video AI Analyst that can:
- Automatically extract transcripts from YouTube videos that offer a "Show transcript" option
- Answer questions about the video content
- Maintain conversation context for natural follow-up questions
- Save you time by avoiding watching entire videos when you only need specific information
This powerful tool combines the capabilities of Hyperbrowser for reliable browser automation, Playwright for YouTube interaction, and OpenAI for intelligent question answering.
Next Steps
To take this project further, you might consider:
- Extracting timestamps along with the transcript text so you can limit analysis to specific parts of the video
- Implementing automatic video summarization to get a quick overview of longer videos
- Creating a user interface for easier interaction
- Extracting and analyzing comments to understand viewer reactions and correlating them with timestamps
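As a starting point for the timestamp idea, the sketch below filters transcript segments to a time window. It assumes each segment carries a `timestamp` string such as "1:30" or "1:02:10" alongside `text`; the extraction code in this cookbook only collects `text`, so that field, the sample data, and the helper names are all hypothetical.

```python
from typing import Dict, List


def to_seconds(stamp: str) -> int:
    """Convert "m:ss" or "h:mm:ss" into a number of seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def segments_between(
    segments: List[Dict[str, str]], start: str, end: str
) -> List[Dict[str, str]]:
    """Keep segments whose timestamp falls within [start, end], inclusive."""
    lo, hi = to_seconds(start), to_seconds(end)
    return [s for s in segments if lo <= to_seconds(s["timestamp"]) <= hi]


# Hypothetical sample data; the `timestamp` field is an assumption.
sample = [
    {"timestamp": "0:05", "text": "intro"},
    {"timestamp": "1:30", "text": "main point"},
    {"timestamp": "1:02:10", "text": "wrap-up"},
]
selected = segments_between(sample, "1:00", "2:00")
```

The filtered list could then be passed to `chat_with_transcript` in place of the full transcript, so questions are answered from only that part of the video.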
Happy analyzing! 🎬
Relevant Links
- Hyperbrowser
- Playwright Documentation
- OpenAI API Documentation
- YouTube Data API (an alternative approach)