Building a YouTube Video Chat AI with Hyperbrowser and OpenAI

In this cookbook, we'll build a YouTube Video Analyst that extracts the transcript from any YouTube video that has one and lets you have an interactive conversation about its content. This approach combines:

Hyperbrowser for accessing YouTube and extracting transcripts reliably
Playwright for browser automation and interaction with YouTube's interface
OpenAI's language models for understanding and responding to questions about the video content

By the end of this cookbook, you'll have a reusable tool that can help you extract insights from any YouTube video without having to watch it entirely!

Prerequisites

Before starting, make sure you have:

A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one; it's free)
An OpenAI API key
Python 3.9+ installed

Both API keys should be stored in a .env file in the same directory as this notebook with the following format:

HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
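
Before going further, it can help to confirm that both keys are actually visible to the Python process. The sketch below is one possible way to do that; the helper name `missing_keys` is an assumption made for illustration and is not part of either SDK.

```python
import os

REQUIRED_KEYS = ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty.

    `missing_keys` is a hypothetical helper, not a Hyperbrowser or OpenAI API.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Illustrative check against a hand-built mapping instead of the real environment
print(missing_keys({"OPENAI_API_KEY": "placeholder"}))
```

In the notebook, calling `missing_keys()` with no arguments right after `load_dotenv()` would report any key that the .env file did not provide.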

Step 1: Set up imports and initialize clients

First, we'll import the necessary libraries and initialize our clients. We're using Hyperbrowser to access YouTube, Playwright for browser automation, and OpenAI for natural language processing.

import os
import asyncio

from dotenv import load_dotenv
from IPython.display import display, Markdown
from playwright.async_api import async_playwright, Page
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletionMessageParam
from openai.types.chat.chat_completion import ChatCompletion
from hyperbrowser.client.async_client import AsyncHyperbrowser
from hyperbrowser.models.session import CreateSessionParams

# Load environment variables from the .env file
load_dotenv()

# Initialize OpenAI client
client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Step 2: Create transcript extraction functions

Here we'll define functions to extract and format the transcript from a YouTube video. The process involves:

Navigating to the YouTube video page
Expanding the video description, then finding and clicking the "Show transcript" button
Extracting the transcript text from the page
Formatting the transcript for easy reading and analysis

async def get_youtube_transcript(page: Page, url: str):
    """Get the transcript of a YouTube video directly from the page using Playwright."""
    try:
        # Brief delay to ensure the video is loaded
        await asyncio.sleep(0.5)
        # Pause the video
        await page.keyboard.press("k")

        # Wait for the description box, then click it to expand it and reveal the transcript button
        description_selector = await page.wait_for_selector(
            "div#description", state="visible"
        )
        if description_selector is None:
            return None
        await description_selector.click()

        # Click the "Show transcript" button
        transcript_show_more_selector = await page.wait_for_selector(
            'button[aria-label="Show transcript"]', state="visible"
        )
        if transcript_show_more_selector is None:
            return None
        await transcript_show_more_selector.click()

        # Wait for the transcript to be visible
        transcript_display_selector = await page.wait_for_selector(
            "ytd-transcript-segment-list-renderer", state="visible"
        )
        if transcript_display_selector is None:
            return None

        # Extract the transcript segments
        transcript_segments: list[dict] = await transcript_display_selector.evaluate(
            """()=>(
                    [...document
                        .querySelector("ytd-transcript-segment-list-renderer")
                        .querySelector("div#segments-container")
                        .querySelectorAll("ytd-transcript-segment-renderer")
                    ].map(e=>({
                        text: e.querySelector("yt-formatted-string").innerText
                        })
                    )
                )"""
        )

        # Return the transcript data
        if transcript_segments and isinstance(transcript_segments, list):
            return transcript_segments
        else:
            print("Failed to extract transcript segments")
            return None

    except Exception as e:
        print(f"Error getting transcript: {str(e)}")
        return None


def format_transcript(transcript_segments):
    """Format transcript segments into a readable string."""
    if not transcript_segments:
        return ""

    formatted_text = ""
    for segment in transcript_segments:
        formatted_text += f"{segment['text']} "

    return formatted_text.strip()

async def get_transcript(url: str):
    """Open a Hyperbrowser session, load the video page, and return its transcript segments."""
    # Named hb_client so it is not confused with the module-level OpenAI client
    hb_client = AsyncHyperbrowser(
        api_key=os.getenv("HYPERBROWSER_API_KEY"),
    )
    session = await hb_client.sessions.create(CreateSessionParams(use_proxy=True))
    try:
        browser_url = session.ws_endpoint
        if browser_url is None:
            raise Exception("Browser URL not found")

        async with async_playwright() as playwright:
            browser = await playwright.chromium.connect_over_cdp(browser_url)
            context = await browser.new_context()
            page = await context.new_page()

            await page.goto(url)

            # Get video transcript directly from YouTube page
            return await get_youtube_transcript(page, url)
    finally:
        # Stop the remote session so it does not stay open after extraction
        await hb_client.sessions.stop(session.id)
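
Since each call to get_transcript opens a remote browser session, it may be worth rejecting obviously malformed URLs before that point. The following is a minimal sketch using only the standard library; `extract_video_id` is a hypothetical helper, and it deliberately covers just the standard watch URL and the youtu.be short link, not other shapes such as Shorts or embed URLs.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if absent.

    Hypothetical helper: only two common URL shapes are recognized.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # parse_qs maps each query key to a list of values
            values = parse_qs(parsed.query).get("v", [])
            return values[0] if values else None
        return None
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")
        return video_id or None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```

A caller could skip get_transcript entirely whenever this returns None, which avoids spending a session on a URL that cannot point at a video.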

Step 3: Implement the chat functionality

Now we'll create a function that allows us to chat with the video transcript. This function:

Takes the transcript segments, a user prompt, and optional chat history
Formats a message for the OpenAI API with system instructions
Sends the message to OpenAI and gets a response
Maintains conversation context for follow-up questions

async def chat_with_transcript(
    transcript_segments, prompt, chat_history=None,
):
    """Use OpenAI to chat with the transcript content."""
    try:
        # Format the transcript for the prompt
        formatted_transcript = format_transcript(transcript_segments)

        # Start with system message
        messages: list[ChatCompletionMessageParam] = [
            {
                "role": "system",
                "content": f"You are an AI assistant that helps users understand the content of a YouTube video. Here is the transcript of the video:\n\n{formatted_transcript}\n\nAnswer questions based only on the content of this transcript. If you don't know the answer, preface your response by saying that you are inferring the answer based on your training data, and not the transcript.",
            },
        ]

        # Add chat history to provide context
        if chat_history:
            for prev_query, prev_response in chat_history:
                messages.append({"role": "user", "content": prev_query})
                messages.append({"role": "assistant", "content": prev_response})

        # Add current user query
        messages.append({"role": "user", "content": prompt})

        # Call OpenAI API
        response: ChatCompletion = await client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, temperature=0.7, max_tokens=500
        )

        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"
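
Because the whole transcript is placed in the system message, a very long video could exceed the model's context window. One possible safeguard, sketched below, is to cap the transcript length before building the prompt. The function name `truncate_transcript` and the default budget of 40,000 characters are assumptions chosen for illustration, and a character count is only a rough proxy for tokens.

```python
def truncate_transcript(text: str, max_chars: int = 40_000) -> str:
    """Keep roughly the first max_chars characters of text.

    Hypothetical helper: cuts at a word boundary where possible and appends
    a marker so the model can tell the transcript is incomplete.
    """
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_space = cut.rfind(" ")
    if last_space > 0:
        # Avoid ending in the middle of a word
        cut = cut[:last_space]
    return cut.rstrip() + " [transcript truncated]"


print(truncate_transcript("never gonna give you up", max_chars=12))
```

In chat_with_transcript, this could be applied to the result of format_transcript before it is interpolated into the system message.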

Step 4: Extract the transcript from the YouTube video

Let's run our transcript extraction function on a YouTube video URL. This will open a browser session through Hyperbrowser, navigate to the video, and extract the transcript.

# Set your YouTube URL here
youtube_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"  # Replace with your video URL

transcript_segments = await get_transcript(youtube_url)

Step 5: Ask questions about the video content

Now that we have the transcript, let's ask some questions about the video! We'll start by initializing a chat history to keep track of our conversation, then ask our first question.

# Initialize chat history
chat_history = []

# Ask your first question here
question = "What is the main topic of this video?"

display(Markdown(f"**Question:** {question}"))

if transcript_segments:
    response = await chat_with_transcript(transcript_segments, question, chat_history)
    display(Markdown(f"**Answer:** {response}"))

    # Add to chat history
    chat_history.append((question, response))
else:
    display(
        Markdown(
            "Cannot answer questions without a transcript. Please check the YouTube URL."
        )
    )

Question: What is the main topic of this video?

Answer: The main topic of the video is the song "Never Gonna Give You Up" by Rick Astley. The lyrics express themes of commitment, loyalty, and reassurance in a romantic relationship.

Step 6: Ask follow-up questions

Now let's ask a follow-up question to demonstrate how the conversation history helps provide context for subsequent answers.

# Ask another question (copy this cell for more questions)
question = "What are the key points mentioned?"

display(Markdown(f"**Question:** {question}"))

if transcript_segments:
    response = await chat_with_transcript(transcript_segments, question, chat_history)
    display(Markdown(f"**Answer:** {response}"))

    # Add to chat history
    chat_history.append((question, response))
else:
    display(
        Markdown(
            "Cannot answer questions without a transcript. Please check the YouTube URL."
        )
    )

Question: What are the key points mentioned?

Answer: The key points mentioned in the transcript include:

Commitment and Loyalty: The singer emphasizes that he will never give up, let down, or desert the person he is addressing.
Emotional Connection: There is a recognition of a deep emotional bond, as both individuals have known each other for a long time and understand each other's feelings.
Reassurance: The singer wants to convey how he feels and assures the other person that he will not hurt them or say goodbye.
Acknowledgment of Feelings: The lyrics suggest that both individuals are aware of their feelings but may be hesitant to express them.

Overall, the song conveys a message of unwavering support and love.
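
Steps 5 and 6 repeat the same ask-then-record pattern, so for longer conversations it could be folded into a small helper instead of copying the cell. The sketch below is one way to do that under stated assumptions: `ask`, `echo_answer`, and `demo` are hypothetical names, and `echo_answer` is a stand-in for the real model call so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

History = List[Tuple[str, str]]


async def ask(
    question: str,
    history: History,
    answer_fn: Callable[[str, History], Awaitable[str]],
) -> str:
    """Get an answer for question, then record the exchange in history."""
    response = await answer_fn(question, history)
    history.append((question, response))
    return response


# Stand-in answer function (an assumption for this sketch, not a real model call)
async def echo_answer(question: str, history: History) -> str:
    return f"(turn {len(history) + 1}) you asked: {question}"


async def demo() -> History:
    history: History = []
    await ask("What is the main topic of this video?", history, echo_answer)
    await ask("What are the key points mentioned?", history, echo_answer)
    return history


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Inside the notebook, where an event loop is already running, `await demo()` would replace `asyncio.run(demo())`, and `answer_fn` could be `lambda q, h: chat_with_transcript(transcript_segments, q, h)`.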

Conclusion

Congratulations! You've built a YouTube Video Analyst that can:

Automatically extract transcripts from any YouTube video
Answer questions about the video content
Maintain conversation context for natural follow-up questions
Save you time when you only need specific information from a video

This tool combines Hyperbrowser for reliable access to YouTube, Playwright for browser automation, and OpenAI for question answering.

Next Steps

To take this project further, you might consider:

Extracting timestamps along with the transcript text so you can focus on specific parts of the video
Implementing automatic video summarization to get a quick overview of longer videos
Creating a user interface for easier interaction
Extracting and analyzing comments to understand viewer reactions and correlating them with timestamps
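
As a starting point for the timestamp idea, here is a minimal sketch of filtering segments by time range. It assumes each segment dictionary also carries a "timestamp" string such as "1:05", which the extraction code in this cookbook does not collect; that key, the sample data, and the helper names are all hypothetical.

```python
def parse_timestamp(stamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def segments_between(segments, start: str, end: str):
    """Keep segments whose timestamp falls within [start, end], inclusive."""
    lo, hi = parse_timestamp(start), parse_timestamp(end)
    return [s for s in segments if lo <= parse_timestamp(s["timestamp"]) <= hi]


# Made-up sample data; the "timestamp" key is an assumption for this sketch
sample = [
    {"timestamp": "0:18", "text": "first line"},
    {"timestamp": "1:05", "text": "second line"},
    {"timestamp": "1:02:03", "text": "third line"},
]
print(segments_between(sample, "1:00", "2:00"))
```

The filtered list has the same shape as the original segments, so it could be passed to format_transcript unchanged.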

Happy analyzing! 🎬

Relevant Links

Hyperbrowser
Playwright Documentation
OpenAI API Documentation
YouTube Data API (an alternative approach)