Building a Documentation-based Coding agent
In this cookbook, we'll create an intelligent coder that can extract information from technical documentation and generate code based on that documentation. This agent will:
- Navigate to any documentation page or website
- Extract and understand the technical content
- Follow links to gather additional context when needed
- Generate working code examples based on documentation specifications
- Provide clear explanations and instructions for implementation
This approach combines:
- Hyperbrowser for web scraping and documentation extraction
- OpenAI's GPT-4o for technical understanding and code generation
After going through this, you'll have a versatile tool that can help developers quickly implement features from documentation without having to parse through lengthy technical material manually!
Prerequisites
Before starting, you'll need:
- A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one)
- An OpenAI API key with access to GPT-4o
Store these API keys in a `.env` file in the same directory as this notebook:
```
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
```
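If you want to confirm the keys are actually picked up before running the rest of the notebook, a quick optional check (not part of the original setup) looks like this:

```python
import os

from dotenv import load_dotenv

load_dotenv()

# Fail fast if either key is missing from the environment.
assert os.getenv("HYPERBROWSER_API_KEY"), "HYPERBROWSER_API_KEY is not set"
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```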
Step 1: Set up imports and load environment variables
```python
import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScrapeTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)
from IPython.display import Markdown, display

load_dotenv()
```
Step 2: Initialize API clients
Here we create instances of the Hyperbrowser and OpenAI clients using our API keys. These clients will be responsible for web scraping and code generation respectively.
```python
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()
```
Step 3: Implement the tool handler
The tool handler function processes requests from the LLM to interact with our web scraping functionality.
- Receives tool call parameters from the LLM
- Validates that the requested tool is available; in this case we use the `scrape_webpage` tool from the Hyperbrowser package
- Executes the web scraping operation with the specified parameters
- Returns the scraped content or handles any errors that occur
This function is crucial for enabling the LLM to access documentation content dynamically as it explores technical specifications.
```python
async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")
    try:
        # Only the Hyperbrowser scrape_webpage tool is supported by this agent.
        if (
            tc.function.name
            != WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            raise ValueError(f"Tool not found: {tc.function.name}")
        args = json.loads(tc.function.arguments)
        print(args)
        # Run the scrape and return the result to the model as a tool message.
        content = await WebsiteScrapeTool.async_runnable(hb=hb, params=args)
        return {"role": "tool", "tool_call_id": tc.id, "content": content}
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return {
            "role": "tool",
            "tool_call_id": tc.id,
            "content": err_msg,
            "is_error": True,  # type: ignore
        }
```
Step 4: Create the agent loop
Now we implement the core agent loop that orchestrates the conversation between:
- The user (who asks for code implementation based on documentation)
- The LLM (which analyzes the request and determines what information is needed)
- Our tool (which fetches documentation content from websites)
This looping pattern allows for sophisticated interactions where the agent can gather information iteratively, exploring multiple documentation pages if necessary to fully understand the technical requirements before generating code.
```python
async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await llm.chat.completions.create(
            messages=messages,
            model="gpt-4o",
            tools=[
                WebsiteScrapeTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )
        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)
        elif choice.finish_reason == "stop" and choice.message.content is not None:
            return choice.message.content
        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
```
Step 5: Design the system prompt
The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the agent as an expert coder analyzer that can:
- Extract technical information from documentation pages
- Follow links to gather additional context when needed
- Determine if the current knowledge is sufficient to generate the solution
- Generate code based strictly on the documentation specifications
- Avoid making unsupported assumptions beyond what's in the documentation
This approach ensures that the code generated is as close as possible to the documented specifications.
```python
SYSTEM_PROMPT = """You are an expert coder analyzer. You have access to a 'scrape_webpage' tool which can be used to get markdown data from a webpage. You can analyze the webpage and generate code based on the information so scraped. Base whatever code you generate on the documentation you extract. Do not deviate from that or make your own assumptions.

Keep in mind that sometimes the information provided might not be sufficient, and you might have to scrape other pages to arrive at the appropriate documentation. Links to these pages can usually be obtained from the initial page itself.

This is the link to a piece of documentation {link}. Analyze the documentation and generate code based on whatever the user requires you to do.""".strip()
```
Step 6: Create a factory function for documentation analyzers
Now we'll create a factory function that generates specialized documentation analyzer agents for any docs site. This function:
- Takes a URL to a documentation page as input
- Ensures the URL has the proper format (adding https:// if needed)
- Formats the system prompt with this URL
- Returns a function that can answer questions and generate code based on that documentation
This approach makes our solution reusable for working with documentation from any library, API, or framework.
```python
from typing import Coroutine, Any, Callable


def make_documentation_analyzer(
    link_to_docs: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Popular documentation providers like Gitbook, Mintlify etc. automatically
    # generate an llms.txt file for documentation sites hosted on their platforms.
    # Normalize the link so it always carries a scheme before it reaches the prompt.
    if not (link_to_docs.startswith("http://") or link_to_docs.startswith("https://")):
        link_to_docs = f"https://{link_to_docs}"

    sysprompt = SYSTEM_PROMPT.format(
        link=link_to_docs,
    )

    async def document_analyzer(question: str) -> str:
        return await agent_loop(
            [
                {"role": "system", "content": sysprompt},
                {"role": "user", "content": question},
            ]
        )

    return document_analyzer
```
Step 7: Test the documentation analyzer
Let's test our agent by asking it to generate code based on the Hyperbrowser (our own!) documentation. We'll request a code example of how to get search results from Google using Hyperbrowser. This should demonstrate the agent's ability to navigate documentation and generate practical code examples.
One of the key strengths of this approach is that it doesn't box the agent in: the agent can explore all available options and choose the most suitable one.
```python
document_analyzer = make_documentation_analyzer("https://docs.hyperbrowser.ai")
doc_example = await document_analyzer(
    "Can you tell me how I could get the search result link and search result name from www.google.com?"
)
```
```
Handling tool call: scrape_webpage
{'url': 'https://docs.hyperbrowser.ai', 'scrape_options': {'include_tags': ['a'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://docs.hyperbrowser.ai/get-started/quickstart/puppeteer', 'scrape_options': {'include_tags': ['p', 'code'], 'exclude_tags': [], 'only_main_content': True}}
```
Step 8: Display the generated code
Now we'll display the results of our documentation analyzer. The agent has scraped the Hyperbrowser documentation, understood the relevant APIs for web scraping with Puppeteer, and generated a complete code example for extracting search results from Google.
```python
display(Markdown(doc_example))
```
To get the search result links and search result names from Google using Puppeteer with Hyperbrowser, you would need to set up an environment with Puppeteer and an API key from Hyperbrowser. Below is an example code in Node.js using Puppeteer:
```javascript
import { connect } from "puppeteer-core";
import { config } from "dotenv";

config();

const main = async () => {
  // Connect to the Hyperbrowser session
  const browser = await connect({
    browserWSEndpoint: `wss://connect.hyperbrowser.ai?apiKey=${process.env.HYPERBROWSER_API_KEY}`,
  });

  const [page] = await browser.pages();

  // Navigate to Google
  await page.goto("https://www.google.com");

  // Perform search
  await page.type('input[name="q"]', "example search query");
  await page.keyboard.press('Enter');
  await page.waitForNavigation();

  // Extract search results
  const searchResults = await page.evaluate(() => {
    const results = [];
    document.querySelectorAll('.tF2Cxc').forEach((result) => {
      const link = result.querySelector('a')?.href;
      const title = result.querySelector('h3')?.innerText;
      if (link && title) {
        results.push({ link, title });
      }
    });
    return results;
  });

  console.log(searchResults);

  // Clean up
  await browser.close();
};

main();
```
Instructions:
1. Install Required Packages:
   - Run `npm install puppeteer-core @hyperbrowser/sdk dotenv` to install the necessary packages.
2. Set Up Your Environment Variable:
   - Obtain your API key from the Hyperbrowser dashboard and add it to a `.env` file as `HYPERBROWSER_API_KEY=your_api_key_here`.
3. Run the Script:
   - Execute the script with Node.js, and it will print out the search results with links and titles.
This will scrape the search result links and names from Google using Puppeteer with Hyperbrowser.
Future Explorations
There are many exciting ways to extend and enhance this documentation analyzer. Here are some directions for developers and users to explore:
Advanced Documentation Processing
- Multi-page Crawling (a minimal sketch follows this list)
- Schema and Type Inference
Code Generation Capabilities
- Test Generation
- Code Customization Options
Documentation Analysis
- Documentation Quality Assessment
- Missing Documentation Detection
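As one way to explore the multi-page crawling idea, the agent loop could register a second Hyperbrowser tool alongside `scrape_webpage`. The sketch below assumes the Hyperbrowser SDK exposes a `WebsiteCrawlTool` with the same `openai_tool_definition` / `async_runnable` interface as `WebsiteScrapeTool`; if your SDK version names it differently, substitute accordingly.

```python
import json

# WebsiteCrawlTool is assumed to exist with the same interface as WebsiteScrapeTool.
from hyperbrowser.tools import WebsiteCrawlTool, WebsiteScrapeTool

# A tool list like this could replace the single-tool list inside agent_loop, so the
# model can either scrape one page or crawl a small set of linked pages in one call.
DOC_TOOLS = [
    WebsiteScrapeTool.openai_tool_definition,
    WebsiteCrawlTool.openai_tool_definition,
]


async def handle_any_tool_call(tc) -> dict:
    # Dispatch on the tool name instead of hard-coding a single tool.
    for tool in (WebsiteScrapeTool, WebsiteCrawlTool):
        if tc.function.name == tool.openai_tool_definition["function"]["name"]:
            args = json.loads(tc.function.arguments)
            content = await tool.async_runnable(hb=hb, params=args)
            return {"role": "tool", "tool_call_id": tc.id, "content": content}
    raise ValueError(f"Tool not found: {tc.function.name}")
```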
These are some exciting directions in which the documentation analyzer could develop further. It could become an even more powerful tool for developers, technical writers, and API designers, bridging the gap between documentation and implementation while improving the overall quality of both.
Conclusion
In this cookbook, we built a powerful documentation analyzer using Hyperbrowser and GPT-4o. This agent can:
- Automatically extract technical information from documentation websites
- Navigate between related documentation pages to gather comprehensive context
- Generate working code examples based on the documented specifications
- Provide step-by-step instructions for implementation
- Ensure code is compliant with the official API patterns and best practices
This pattern can be extended to create more sophisticated documentation tools, such as:
- Multi-framework code generators that provide implementations in various languages (a sketch follows this list)
- Integration assistants that combine multiple APIs according to documentation
- Migration assistants that help convert code between different library versions
- Documentation gap analyzers that identify missing or unclear sections
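As a minimal sketch of the first idea, the factory from Step 6 can be wrapped so every question also pins a target language. Everything here reuses code already defined in this cookbook; the `target_language` parameter and the `make_multilang_analyzer` name are illustrative additions, not part of the Hyperbrowser SDK.

```python
def make_multilang_analyzer(link_to_docs: str, target_language: str):
    # Reuse the documentation analyzer, but prefix each question with a
    # language constraint so the same docs can yield Python, TypeScript, etc.
    analyzer = make_documentation_analyzer(link_to_docs)

    async def ask(question: str) -> str:
        return await analyzer(f"Generate the solution in {target_language}. {question}")

    return ask


# Hypothetical usage:
# ts_analyzer = make_multilang_analyzer("https://docs.hyperbrowser.ai", "TypeScript")
# result = await ts_analyzer("How do I scrape a page and print its markdown?")
```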