Building a Documentation Q&A Agent with Hyperbrowser and o3-mini

In this cookbook, we'll build a powerful documentation Q&A agent that can answer questions about any company's products by automatically scraping their documentation. This approach combines:

  • Hyperbrowser for reading web pages in LLM-friendly Markdown format
  • OpenAI's o3-mini reasoning model for natural language understanding and response generation
  • Tool-calling to create an agent that can browse the web autonomously

By the end of this cookbook, you'll have a reusable agent that can be configured for any company's documentation site!

Prerequisites

Before starting, make sure you have:

  1. A Hyperbrowser API key (sign up for free at hyperbrowser.ai if you don't have one)
  2. An OpenAI API key
  3. Python 3.9+ installed

Both API keys should be stored in a .env file in the same directory as this notebook with the following format:

HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here

Step 1: Set up imports and load environment variables

import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScrapeTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)

load_dotenv()

Step 2: Initialize clients

hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
oai = AsyncOpenAI()

Step 3: Create helper functions for tool handling

Next, we'll define a function to handle tool calls from the LLM. This function will accept a ChatCompletionMessageToolCall object and return a ChatCompletionToolMessageParam object.

This function currently only works with the WebsiteScrapeTool, but you can easily adapt the code to dispatch to your own custom tools.

async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")
    try:
        if tc.function.name != WebsiteScrapeTool.openai_tool_definition["function"]["name"]:
            raise ValueError(f"Tool not found: {tc.function.name}")
        args = json.loads(tc.function.arguments)
        content = await WebsiteScrapeTool.async_runnable(hb=hb, params=args)
        return {"role": "tool", "tool_call_id": tc.id, "content": content}
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        # The OpenAI tool-message schema has no error flag, so we report
        # failures back to the model via the message content instead.
        return {"role": "tool", "tool_call_id": tc.id, "content": err_msg}

Step 4: Implement the agent loop

Now we'll create the main agent loop that orchestrates the conversation between the user, the LLM, and the tools. This function:

  1. Takes a list of messages (including system prompt and user query)
  2. Sends them to the OpenAI API
  3. Processes any tool calls that the LLM makes
  4. Continues the conversation until the LLM provides a final answer

This is the core of our agent's functionality.

async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await oai.chat.completions.create(
            messages=messages,
            model="o3-mini",
            tools=[
                WebsiteScrapeTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )
        choice = response.choices[0]

        # Append the assistant's response to the running conversation
        messages.append(choice.message)

        # Handle tool calls
        if choice.finish_reason == "tool_calls":
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)
        elif choice.finish_reason == "stop":
            return choice.message.content
        else:
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")

Step 5: Design the system prompt

The system prompt is crucial for guiding the LLM's behavior. Our prompt:

  1. Establishes the LLM as an expert on a specific company's products
  2. Explains the available tools and how to use them
  3. Provides a structured approach for answering questions
  4. Sets guidelines for handling different types of queries

This prompt uses placeholders that will be filled in when we create a specific agent instance.

SYSTEM_PROMPT = """
You are an expert on {company_name}'s products and documentation. You have access to a 'scrape_webpage' tool \
that allows you to read web pages by providing a URL.
This is {company_name}'s documentation site's LLMs.txt URL: {llms_txt_url}.
The llms.txt file contains links to all {company_name}'s product documentation pages.
When answering questions:
1. If the question is about {company_name}'s products, use the 'scrape_webpage' tool to get the contents of \
the llms.txt file
2. If any of the URLs in the llms.txt file are relevant to the question, use the 'scrape_webpage' tool to \
get the contents of the page
3. Provide detailed answers with citations to the specific documentation pages
4. If you can't find the answer in the docs, respond with: "I don't know the answer to that. I couldn't find \
anything relevant to it in the docs, please try contacting the {company_name} team."
5. If the question is unrelated to {company_name}'s products, respond with: "I can only answer questions \
about {company_name}'s products"
Always cite your sources by including the relevant documentation URLs. Respond with your chain of thought \
and the final answer to the user.
""".strip()

Step 6: Create a factory function for generating Q&A agents

Now we'll create a factory function that generates a specialized Q&A agent for any company's documentation. This function:

  1. Takes a company name and documentation URL as input
  2. Formats the system prompt with these values
  3. Returns a function that can answer questions about that company's products

This approach makes our solution reusable for different companies and documentation sites.

def make_support_agent(company_name: str, docs_url: str):
    # Popular documentation providers like GitBook, Mintlify, etc. automatically
    # generate an llms.txt file for documentation sites hosted on their platforms.
    docs_url = docs_url.rstrip("/")
    if docs_url.startswith("http://") or docs_url.startswith("https://"):
        llms_txt_url = f"{docs_url}/llms.txt"
    else:
        llms_txt_url = f"https://{docs_url}/llms.txt"

    sysprompt = SYSTEM_PROMPT.format(
        company_name=company_name,
        llms_txt_url=llms_txt_url,
    )

    async def qna(question: str) -> str:
        return await agent_loop([
            {"role": "system", "content": sysprompt},
            {"role": "user", "content": question},
        ])

    return qna

Step 7: Test the agent with a real question

Let's test our agent by creating an instance for Hyperbrowser's documentation and asking it a question. This will demonstrate the full workflow:

  1. The agent receives a question about CAPTCHAs in Hyperbrowser
  2. It uses the scrape_webpage tool to access the documentation
  3. It processes the information and formulates a detailed answer
  4. It returns the answer with citations to the relevant documentation

You'll see the tool calls being made in real-time as the agent works through the question.

hyperbrowser_qna = make_support_agent("Hyperbrowser", "https://docs.hyperbrowser.ai")
question = "I'm getting blocked by CAPTCHAs when scraping a website with hyperbrowser. How do I fix it? I'm using the python sdk"
answer = await hyperbrowser_qna(question)
print("\n\n", "="*20, "Answer", "="*20, "\n\n")
print(answer)
Handling tool call: scrape_webpage
Handling tool call: scrape_webpage
Handling tool call: scrape_webpage

==================== Answer ====================
To address the issue of being blocked by CAPTCHAs when using Hyperbrowser's Python SDK, you can leverage Hyperbrowser's CAPTCHA solving feature. Here's a summary of how to integrate CAPTCHA solving into your scraping tasks:

1. **CAPTCHA Solving Feature**: Hyperbrowser provides an integrated CAPTCHA solver that allows you to scrape websites without being blocked. However, please note that to use this feature, you must be on a paid plan.

2. **Setup for CAPTCHA Solving**: When creating a Hyperbrowser session, enable CAPTCHA solving by setting the `solveCaptchas` parameter to `true`. This can be done via the SDK when creating a new session:

   ```python
   import asyncio
   import os

   from dotenv import load_dotenv
   from hyperbrowser import AsyncHyperbrowser

   load_dotenv()
   client = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))

   async def main():
       session = await client.sessions.create(solveCaptchas=True)
       # Your scraping code here
       await client.sessions.stop(session.id)

   asyncio.get_event_loop().run_until_complete(main())
   ```

3. **Session Creation**: The session configured with CAPTCHA solving will automatically handle CAPTCHAs encountered during navigation or data extraction processes.

4. **Running the Scraper**: Implement the rest of your code to perform scraping. The CAPTCHA solving process runs in the background, solving CAPTCHAs and continuing page loads without your script being explicitly aware of CAPTCHA handling.

For full details and more code examples, you can refer to the official documentation on [CAPTCHA Solving with Hyperbrowser](https://docs.hyperbrowser.ai/guides/captcha-solving) and the [Python SDK](https://docs.hyperbrowser.ai/reference/sdks/python).

By utilizing these features, you should be able to bypass CAPTCHA challenges and perform your web scraping tasks more smoothly.

Step 8: Try it with your own questions

Now that you've seen how the agent works, you can try asking your own questions about Hyperbrowser or create agents for other companies' documentation. Simply modify the code below with your question or create a new agent for a different company.

# Example: Create an agent for a different company
# anthropic_qna = make_support_agent("Anthropic", "https://docs.anthropic.com")
# question = "How do I build a computer use agent?"
# answer = await anthropic_qna(question)
# print(answer)

Feel free to experiment with different questions and documentation sites!

Conclusion

In this cookbook, we built a powerful documentation Q&A agent using Hyperbrowser and OpenAI. This agent can:

  1. Autonomously navigate documentation websites
  2. Extract relevant information based on user questions
  3. Provide detailed, cited answers from the documentation
  4. Be easily adapted for different companies and products
  5. Always stay up to date with the latest documentation because it scrapes the llms.txt file on every run

This pattern can be extended to create more sophisticated agents that can interact with multiple websites, use additional tools, or be integrated into larger applications.
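One natural extension is generalizing `handle_tool_call` from a single hard-coded check into a registry that dispatches to any number of tools. The sketch below illustrates the idea with hypothetical stand-in handlers (`scrape_webpage` and `search_docs` here just return canned strings); in the cookbook's code, each registry entry would map a tool's OpenAI function name to its real async runnable.

```python
import asyncio
import json


# Stand-in handlers: in the real agent these would call Hyperbrowser tools.
async def scrape_webpage(params: dict) -> str:
    return f"markdown for {params['url']}"


async def search_docs(params: dict) -> str:  # hypothetical second tool
    return f"results for {params['query']}"


# Map each tool's function name (as declared to the LLM) to its handler.
TOOL_REGISTRY = {
    "scrape_webpage": scrape_webpage,
    "search_docs": search_docs,
}


async def dispatch(name: str, arguments: str) -> str:
    """Look up the tool by name and run it with the LLM-provided JSON args."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"Error: tool not found: {name}"
    return await handler(json.loads(arguments))


result = asyncio.run(dispatch("scrape_webpage", '{"url": "https://example.com"}'))
```

With this shape, adding a new capability to the agent is just one more entry in `TOOL_REGISTRY` plus its tool definition in the `tools` list passed to the API.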

Next Steps

To take this further, you might consider:

  • Adding memory to the agent to remember previous questions and answers
  • Implementing caching to improve performance
  • Creating a web interface for easier interaction
  • Adding more tools for different types of web interactions
  • Making swarms of such agents to answer questions about integrating multiple products together
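As a starting point for the caching idea above, here is a minimal sketch of a memoizing wrapper for the `qna` function returned by `make_support_agent`. The `fake_qna` function in the demo is a stand-in so the sketch is self-contained; the cache keys on the normalized question text, which is a deliberately naive policy (semantically similar questions phrased differently will still miss).

```python
import asyncio


def with_cache(qna):
    """Wrap an async question-answering function with an in-memory cache."""
    cache: dict[str, str] = {}

    async def cached_qna(question: str) -> str:
        key = question.strip().lower()  # naive normalization
        if key not in cache:
            cache[key] = await qna(question)
        return cache[key]

    return cached_qna


async def demo():
    calls = 0

    async def fake_qna(question: str) -> str:  # stand-in for the real agent
        nonlocal calls
        calls += 1
        return f"answer to: {question}"

    cached = with_cache(fake_qna)
    a1 = await cached("How do I solve CAPTCHAs?")
    a2 = await cached("how do i solve captchas?  ")  # hits the cache
    return calls, a1, a2


calls, a1, a2 = asyncio.run(demo())
print(calls)  # 1: the second, equivalent question never reached the agent
```

In the real agent you would wrap the function returned by the factory, e.g. `hyperbrowser_qna = with_cache(make_support_agent("Hyperbrowser", "https://docs.hyperbrowser.ai"))`.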

Happy building!