Building a Documentation Q&A Agent with Hyperbrowser and o3-mini
In this cookbook, we'll build a powerful documentation Q&A agent that can answer questions about any company's products by automatically scraping its documentation. This approach combines:
- Hyperbrowser for reading web pages in LLM-friendly Markdown format
- OpenAI's o3-mini reasoning model for natural language understanding and response generation
- Tool-calling to create an agent that can browse the web autonomously
By the end of this cookbook, you'll have a reusable agent that can be configured for any company's documentation site!
Prerequisites
Before starting, make sure you have:
- A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one; it's free)
- An OpenAI API key
- Python 3.9+ installed
Both API keys should be stored in a `.env` file in the same directory as this notebook, with the following format:

```
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
```
Step 1: Set up imports and load environment variables
```python
import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScrapeTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)

load_dotenv()
```
Step 2: Initialize clients
```python
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
oai = AsyncOpenAI()
```
Step 3: Create helper functions for tool handling
Next, we'll define a function to handle tool calls from the LLM. This function accepts a `ChatCompletionMessageToolCall` object and returns a `ChatCompletionToolMessageParam` object.

This function currently only works with the `WebsiteScrapeTool`, but you can easily adapt the code to work with your own custom tools.
```python
async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")
    try:
        if (
            tc.function.name
            != WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            raise ValueError(f"Tool not found: {tc.function.name}")
        args = json.loads(tc.function.arguments)
        content = await WebsiteScrapeTool.async_runnable(hb=hb, params=args)
        return {"role": "tool", "tool_call_id": tc.id, "content": content}
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return {
            "role": "tool",
            "tool_call_id": tc.id,
            "content": err_msg,
            "is_error": True,
        }
```
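If you want the dispatcher to support several custom tools rather than just the `WebsiteScrapeTool`, one common pattern is a name-to-handler registry. Below is a minimal, self-contained sketch of that idea; the `echo` tool and the `register_tool`/`dispatch` helpers are hypothetical illustrations, not part of the Hyperbrowser or OpenAI SDKs:

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

# Map a tool name to an async handler that takes parsed arguments
# and returns the tool result as a string.
TOOL_REGISTRY: dict[str, Callable[[dict[str, Any]], Awaitable[str]]] = {}


def register_tool(name: str):
    """Decorator that adds a handler to the registry under `name`."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("echo")
async def echo_tool(args: dict[str, Any]) -> str:
    # Trivial example tool: return the text it was given.
    return args.get("text", "")


async def dispatch(name: str, raw_arguments: str) -> str:
    """Look up a tool by name and invoke it with the JSON-decoded arguments."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Tool not found: {name}")
    return await TOOL_REGISTRY[name](json.loads(raw_arguments))
```

Inside `handle_tool_call`, you would then replace the single-tool name check with a call to `dispatch(tc.function.name, tc.function.arguments)`, and register each tool's handler once at import time.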
Step 4: Implement the agent loop
Now we'll create the main agent loop that orchestrates the conversation between the user, the LLM, and the tools. This function:
- Takes a list of messages (including system prompt and user query)
- Sends them to the OpenAI API
- Processes any tool calls that the LLM makes
- Continues the conversation until the LLM provides a final answer
This is the core of our agent's functionality.
```python
async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await oai.chat.completions.create(
            messages=messages,
            model="o3-mini",
            tools=[
                WebsiteScrapeTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )
        choice = response.choices[0]

        # Append the assistant's response to the conversation
        messages.append(choice.message)

        # Handle tool calls
        if choice.finish_reason == "tool_calls":
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)
        elif choice.finish_reason == "stop":
            return choice.message.content
        else:
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
```
Step 5: Design the system prompt
The system prompt is crucial for guiding the LLM's behavior. Our prompt:
- Establishes the LLM as an expert on a specific company's products
- Explains the available tools and how to use them
- Provides a structured approach for answering questions
- Sets guidelines for handling different types of queries
This prompt uses placeholders that will be filled in when we create a specific agent instance.
```python
SYSTEM_PROMPT = """
You are an expert on {company_name}'s products and documentation. You have access to a 'scrape_webpage' tool \
that allows you to read web pages by providing a URL.

This is {company_name}'s documentation site's llms.txt URL: {llms_txt_url}. The llms.txt file contains links \
to all of {company_name}'s product documentation pages.

When answering questions:
1. If the question is about {company_name}'s products, use the 'scrape_webpage' tool to get the contents of \
the llms.txt file
2. If any of the URLs in the llms.txt file are relevant to the question, use the 'scrape_webpage' tool to \
get the contents of the page
3. Provide detailed answers with citations to the specific documentation pages
4. If you can't find the answer in the docs, respond with: "I don't know the answer to that. I couldn't find \
anything relevant to it in the docs, please try contacting the {company_name} team."
5. If the question is unrelated to {company_name}'s products, respond with: "I can only answer questions \
about {company_name}'s products"

Always cite your sources by including the relevant documentation URLs. Respond with your chain of thought \
and the final answer to the user.
""".strip()
```
Step 6: Create a factory function for generating Q&A agents
Now we'll create a factory function that generates a specialized Q&A agent for any company's documentation. This function:
- Takes a company name and documentation URL as input
- Formats the system prompt with these values
- Returns a function that can answer questions about that company's products
This approach makes our solution reusable for different companies and documentation sites.
```python
from typing import Awaitable, Callable


def make_support_agent(
    company_name: str, docs_url: str
) -> Callable[[str], Awaitable[str]]:
    # Popular documentation providers like GitBook, Mintlify, etc. automatically
    # generate an llms.txt file for documentation sites hosted on their platforms.
    docs_url = docs_url.rstrip("/")
    if docs_url.startswith("http://") or docs_url.startswith("https://"):
        llms_txt_url = f"{docs_url}/llms.txt"
    else:
        llms_txt_url = f"https://{docs_url}/llms.txt"

    sysprompt = SYSTEM_PROMPT.format(
        company_name=company_name,
        llms_txt_url=llms_txt_url,
    )

    async def qna(question: str) -> str:
        return await agent_loop(
            [
                {"role": "system", "content": sysprompt},
                {"role": "user", "content": question},
            ]
        )

    return qna
```
Step 7: Test the agent with a real question
Let's test our agent by creating an instance for Hyperbrowser's documentation and asking it a question. This will demonstrate the full workflow:
- The agent receives a question about CAPTCHAs in Hyperbrowser
- It uses the `scrape_webpage` tool to access the documentation
- It processes the information and formulates a detailed answer
- It returns the answer with citations to the relevant documentation
You'll see the tool calls being made in real-time as the agent works through the question.
```python
hyperbrowser_qna = make_support_agent("Hyperbrowser", "https://docs.hyperbrowser.ai")

question = (
    "I'm getting blocked by CAPTCHAs when scraping a website with hyperbrowser. "
    "How do I fix it? I'm using the python sdk"
)

answer = await hyperbrowser_qna(question)
print("\n\n", "=" * 20, "Answer", "=" * 20, "\n\n")
print(answer)
```
```
Handling tool call: scrape_webpage
Handling tool call: scrape_webpage
Handling tool call: scrape_webpage

==================== Answer ====================
```

To address the issue of being blocked by CAPTCHAs when using Hyperbrowser's Python SDK, you can leverage Hyperbrowser's CAPTCHA solving feature. Here's a summary of how to integrate CAPTCHA solving into your scraping tasks:

1. **CAPTCHA Solving Feature**: Hyperbrowser provides an integrated CAPTCHA solver that allows you to scrape websites without being blocked. However, please note that to use this feature, you must be on a paid plan.

2. **Setup for CAPTCHA Solving**: When creating a Hyperbrowser session, enable CAPTCHA solving by setting the `solveCaptchas` parameter to `true`. This can be done via the SDK when creating a new session:

```python
from hyperbrowser import AsyncHyperbrowser
import os
from dotenv import load_dotenv
import asyncio

load_dotenv()
client = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))

async def main():
    session = await client.sessions.create(solveCaptchas=True)
    # Your scraping code here
    await client.sessions.stop(session.id)

asyncio.get_event_loop().run_until_complete(main())
```

3. **Session Creation**: The session configured with CAPTCHA solving will automatically handle CAPTCHAs encountered during navigation or data extraction processes.

4. **Running the Scraper**: Implement the rest of your code to perform scraping. The CAPTCHA solving process runs in the background, solving CAPTCHAs and continuing page loads without your script being explicitly aware of CAPTCHA handling.

For full details and more code examples, you can refer to the official documentation on [CAPTCHA Solving with Hyperbrowser](https://docs.hyperbrowser.ai/guides/captcha-solving) and the [Python SDK](https://docs.hyperbrowser.ai/reference/sdks/python).

By utilizing these features, you should be able to bypass CAPTCHA challenges and perform your web scraping tasks more smoothly.
Step 8: Try it with your own questions
Now that you've seen how the agent works, you can try asking your own questions about Hyperbrowser or create agents for other companies' documentation. Simply modify the code below with your question or create a new agent for a different company.
```python
# Example: Create an agent for a different company
# anthropic_qna = make_support_agent("Anthropic", "https://docs.anthropic.com")
# question = "How do I build a computer use agent?"
# answer = await anthropic_qna(question)
# print(answer)
```
Feel free to experiment with different questions and documentation sites!
Conclusion
In this cookbook, we built a powerful documentation Q&A agent using Hyperbrowser and OpenAI. This agent can:
- Autonomously navigate documentation websites
- Extract relevant information based on user questions
- Provide detailed, cited answers from the documentation
- Be easily adapted for different companies and products
- Always stay up to date with the latest documentation, since it re-scrapes the llms.txt file on every run
This pattern can be extended to create more sophisticated agents that can interact with multiple websites, use additional tools, or be integrated into larger applications.
Next Steps
To take this further, you might consider:
- Adding memory to the agent to remember previous questions and answers
- Implementing caching to improve performance
- Creating a web interface for easier interaction
- Adding more tools for different types of web interactions
- Making swarms of such agents to answer questions about integrating multiple products together
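As an example of the caching idea above, you could wrap the function returned by `make_support_agent` in a thin in-memory cache so that repeated questions don't trigger new scrapes and LLM calls. This is a hypothetical sketch (the `with_cache` helper is not part of either SDK, and a real deployment would likely want expiry and persistence):

```python
import asyncio
from typing import Awaitable, Callable


def with_cache(
    qna: Callable[[str], Awaitable[str]],
) -> Callable[[str], Awaitable[str]]:
    """Wrap an async Q&A function with a naive in-memory answer cache."""
    cache: dict[str, str] = {}

    async def cached_qna(question: str) -> str:
        # Normalize the question so trivially different phrasings share a key.
        key = question.strip().lower()
        if key not in cache:
            cache[key] = await qna(question)
        return cache[key]

    return cached_qna
```

Usage would look like `cached_qna = with_cache(make_support_agent("Hyperbrowser", "https://docs.hyperbrowser.ai"))`, after which repeated calls with the same question return instantly from the cache.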
Happy building!