Building a Documentation-based Coding agent
In this cookbook, we'll create an intelligent coder that can extract information from technical documentation and generate code based on that documentation. This agent will:
- Navigate to any documentation page or website
- Extract and understand the technical content
- Follow links to gather additional context when needed
- Generate working code examples based on documentation specifications
- Provide clear explanations and instructions for implementation
This approach combines:
- Hyperbrowser for web scraping and documentation extraction
- OpenAI's GPT-4o for technical understanding and code generation
After going through this, you'll have a versatile tool that can help developers quickly implement features from documentation without having to parse through lengthy technical material manually!
Prerequisites
Before starting, you'll need:
- A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one)
- An OpenAI API key with access to GPT-4o
Store these API keys in a `.env` file in the same directory as this notebook:
```
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
```
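If you want to confirm the keys are actually picked up before running the rest of the notebook, a quick optional check (not part of the original setup) looks like this:

```python
import os

from dotenv import load_dotenv

load_dotenv()

# Fail fast if either key is missing from the environment.
assert os.getenv("HYPERBROWSER_API_KEY"), "HYPERBROWSER_API_KEY is not set"
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```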
Step 1: Set up imports and load environment variables
```python
import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScrapeTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)
from IPython.display import Markdown, display

load_dotenv()
```
Step 2: Initialize API clients
Here we create instances of the Hyperbrowser and OpenAI clients using our API keys. These clients will be responsible for web scraping and code generation respectively.
```python
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()
```
Step 3: Implement the tool handler
The tool handler function processes requests from the LLM to interact with our web scraping functionality.
- Receives tool call parameters from the LLM
- Validates that the requested tool is available; in this case we use the `scrape_webpage` tool from the Hyperbrowser package
- Executes the web scraping operation with the specified parameters
- Returns the scraped content or handles any errors that occur
This function is crucial for enabling the LLM to access documentation content dynamically as it explores technical specifications.
```python
async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")
    try:
        # Only the Hyperbrowser scrape_webpage tool is supported by this agent.
        if (
            tc.function.name
            != WebsiteScrapeTool.openai_tool_definition["function"]["name"]
        ):
            raise ValueError(f"Tool not found: {tc.function.name}")
        args = json.loads(tc.function.arguments)
        print(args)
        # Run the scrape and return the result to the model as a tool message.
        content = await WebsiteScrapeTool.async_runnable(hb=hb, params=args)
        return {"role": "tool", "tool_call_id": tc.id, "content": content}
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return {
            "role": "tool",
            "tool_call_id": tc.id,
            "content": err_msg,
            "is_error": True,  # type: ignore
        }
```
Step 4: Create the agent loop
Now we implement the core agent loop that orchestrates the conversation between:
- The user (who asks for code implementation based on documentation)
- The LLM (which analyzes the request and determines what information is needed)
- Our tool (which fetches documentation content from websites)
This looping pattern allows for sophisticated interactions where the agent can gather information iteratively, exploring multiple documentation pages if necessary to fully understand the technical requirements before generating code.
```python
async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await llm.chat.completions.create(
            messages=messages,
            model="gpt-4o",
            tools=[
                WebsiteScrapeTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )
        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)
        elif choice.finish_reason == "stop" and choice.message.content is not None:
            return choice.message.content
        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
```
Step 5: Design the system prompt
The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the agent as an expert coder analyzer that can:
- Extract technical information from documentation pages
- Follow links to gather additional context when needed
- Determine if the current knowledge is sufficient to generate the solution
- Generate code based strictly on the documentation specifications
- Avoid making unsupported assumptions beyond what's in the documentation
This approach ensures that the code generated is as close as possible to the documented specifications.
```python
SYSTEM_PROMPT = """You are an expert coder analyzer. You have access to a 'scrape_webpage' tool which can be used to get markdown data from a webpage. You can analyze the webpage and generate code based on the information so scraped. Base whatever code you generate on the documentation you extract. Do not deviate from that or make your own assumptions.

Keep in mind that sometimes the information provided might not be sufficient, and you might have to scrape other pages to arrive at the appropriate documentation. Links to these pages can usually be obtained from the initial page itself.

This is the link to a piece of documentation {link}. Analyze the documentation and generate code based on whatever the user requires you to do.""".strip()
```
Step 6: Create a factory function for documentation analyzers
Now we'll create a factory function that generates specialized documentation analyzer agents for any docs site. This function:
- Takes a URL to a documentation page as input
- Ensures the URL has the proper format (adding https:// if needed)
- Formats the system prompt with this URL
- Returns a function that can answer questions and generate code based on that documentation
This approach makes our solution reusable for working with documentation from any library, API, or framework.
```python
from typing import Coroutine, Any, Callable


def make_documentation_analyzer(
    link_to_docs: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Popular documentation providers like Gitbook, Mintlify etc. automatically
    # generate an llms.txt file for documentation sites hosted on their platforms.
    # Normalize the link so it always carries a scheme before it reaches the prompt.
    if not (link_to_docs.startswith("http://") or link_to_docs.startswith("https://")):
        link_to_docs = f"https://{link_to_docs}"

    sysprompt = SYSTEM_PROMPT.format(
        link=link_to_docs,
    )

    async def document_analyzer(question: str) -> str:
        return await agent_loop(
            [
                {"role": "system", "content": sysprompt},
                {"role": "user", "content": question},
            ]
        )

    return document_analyzer
```
Step 7: Test the documentation analyzer
Let's test our agent by asking it to generate code based on the Hyperbrowser (our own!) documentation. We'll request a code example of how to get search results from Google using Hyperbrowser. This should demonstrate the agent's ability to navigate documentation and generate practical code examples.
One of the key strengths of this approach is that it doesn't box the agent in: the agent can explore all available options and choose the most suitable one.
```python
document_analyzer = make_documentation_analyzer("https://docs.hyperbrowser.ai")
doc_example = await document_analyzer(
    "Can you tell me how I could get the search result link and search result name from www.google.com?"
)
```
```
Handling tool call: scrape_webpage
{'url': 'https://docs.hyperbrowser.ai', 'scrape_options': {'include_tags': ['a'], 'exclude_tags': [], 'only_main_content': True}}
Handling tool call: scrape_webpage
{'url': 'https://docs.hyperbrowser.ai/get-started/quickstart/puppeteer', 'scrape_options': {'include_tags': ['p', 'code'], 'exclude_tags': [], 'only_main_content': True}}
```
Step 8: Display the generated code
Now we'll display the results of our documentation analyzer. The agent has scraped the Hyperbrowser documentation, understood the relevant APIs for web scraping with Puppeteer, and generated a complete code example for extracting search results from Google.
```python
display(Markdown(doc_example))
```
To get the search result links and search result names from Google using Puppeteer with Hyperbrowser, you would need to set up an environment with Puppeteer and an API key from Hyperbrowser. Below is an example code in Node.js using Puppeteer:
```javascript
import { connect } from "puppeteer-core";
import { config } from "dotenv";

config();

const main = async () => {
  // Connect to the Hyperbrowser session
  const browser = await connect({
    browserWSEndpoint: `wss://connect.hyperbrowser.ai?apiKey=${process.env.HYPERBROWSER_API_KEY}`,
  });

  const [page] = await browser.pages();

  // Navigate to Google
  await page.goto("https://www.google.com");

  // Perform search
  await page.type('input[name="q"]', "example search query");
  await page.keyboard.press('Enter');
  await page.waitForNavigation();

  // Extract search results
  const searchResults = await page.evaluate(() => {
    const results = [];
    document.querySelectorAll('.tF2Cxc').forEach((result) => {
      const link = result.querySelector('a')?.href;
      const title = result.querySelector('h3')?.innerText;
      if (link && title) {
        results.push({ link, title });
      }
    });
    return results;
  });

  console.log(searchResults);

  // Clean up
  await browser.close();
};

main();
```
Instructions:
1. Install Required Packages:
   - Run `npm install puppeteer-core @hyperbrowser/sdk dotenv` to install the necessary packages.
2. Set Up Your Environment Variable:
   - Obtain your API key from the Hyperbrowser dashboard and add it to a `.env` file as `HYPERBROWSER_API_KEY=your_api_key_here`.
3. Run the Script:
   - Execute the script with Node.js, and it will print out the search results with links and titles.
This will scrape the search result links and names from Google using Puppeteer with Hyperbrowser.
Future Explorations
There are many exciting ways to extend and enhance this documentation analyzer. Here are some directions for developers and users to explore:
Advanced Documentation Processing
- Multi-page Crawling (a minimal sketch follows this list)
- Schema and Type Inference
Code Generation Capabilities
- Test Generation
- Code Customization Options
Documentation Analysis
- Documentation Quality Assessment
- Missing Documentation Detection
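As one way to explore the multi-page crawling idea, the agent loop could register a second Hyperbrowser tool alongside `scrape_webpage`. The sketch below assumes the Hyperbrowser SDK exposes a `WebsiteCrawlTool` with the same `openai_tool_definition` / `async_runnable` interface as `WebsiteScrapeTool`; if your SDK version names it differently, substitute accordingly.

```python
import json

# WebsiteCrawlTool is assumed to exist with the same interface as WebsiteScrapeTool.
from hyperbrowser.tools import WebsiteCrawlTool, WebsiteScrapeTool

# A tool list like this could replace the single-tool list inside agent_loop, so the
# model can either scrape one page or crawl a small set of linked pages in one call.
DOC_TOOLS = [
    WebsiteScrapeTool.openai_tool_definition,
    WebsiteCrawlTool.openai_tool_definition,
]


async def handle_any_tool_call(tc) -> dict:
    # Dispatch on the tool name instead of hard-coding a single tool.
    for tool in (WebsiteScrapeTool, WebsiteCrawlTool):
        if tc.function.name == tool.openai_tool_definition["function"]["name"]:
            args = json.loads(tc.function.arguments)
            content = await tool.async_runnable(hb=hb, params=args)
            return {"role": "tool", "tool_call_id": tc.id, "content": content}
    raise ValueError(f"Tool not found: {tc.function.name}")
```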
These are some exciting directions in which the documentation analyzer could develop further. It could become an even more powerful tool for developers, technical writers, and API designers, bridging the gap between documentation and implementation while improving the overall quality of both.
Conclusion
In this cookbook, we built a powerful documentation analyzer using Hyperbrowser and GPT-4o. This agent can:
- Automatically extract technical information from documentation websites
- Navigate between related documentation pages to gather comprehensive context
- Generate working code examples based on the documented specifications
- Provide step-by-step instructions for implementation
- Ensure code is compliant with the official API patterns and best practices
This pattern can be extended to create more sophisticated documentation tools, such as:
- Multi-framework code generators that provide implementations in various languages (a sketch follows this list)
- Integration assistants that combine multiple APIs according to documentation
- Migration assistants that help convert code between different library versions
- Documentation gap analyzers that identify missing or unclear sections
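As a minimal sketch of the first idea, the factory from Step 6 can be wrapped so every question also pins a target language. Everything here reuses code already defined in this cookbook; the `target_language` parameter and the `make_multilang_analyzer` name are illustrative additions, not part of the Hyperbrowser SDK.

```python
def make_multilang_analyzer(link_to_docs: str, target_language: str):
    # Reuse the documentation analyzer, but prefix each question with a
    # language constraint so the same docs can yield Python, TypeScript, etc.
    analyzer = make_documentation_analyzer(link_to_docs)

    async def ask(question: str) -> str:
        return await analyzer(f"Generate the solution in {target_language}. {question}")

    return ask


# Hypothetical usage:
# ts_analyzer = make_multilang_analyzer("https://docs.hyperbrowser.ai", "TypeScript")
# result = await ts_analyzer("How do I scrape a page and print its markdown?")
```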