Building Autonomous Web Agents with Hyperbrowser and GPT-4o

In this cookbook, we'll demonstrate how to create autonomous web agents that can independently navigate the web and perform complex tasks without step-by-step human guidance. These agents can:

  1. Visit websites and understand their content
  2. Navigate between pages following logical paths
  3. Extract and synthesize information from multiple sources
  4. Perform specific tasks like summarization, research, or data collection

All of this can be done from within Hyperbrowser itself. We'll use Hyperbrowser's browser_use agent to handle the navigation, interaction, and summarization.

By the end of this cookbook, you'll understand how to deploy agents for various web-based tasks with minimal human intervention!

Prerequisites

Before starting, you'll need:

  1. A Hyperbrowser API key (sign up at hyperbrowser.ai if you don't have one)

Store this API key in a .env file in the same directory as this notebook:

HYPERBROWSER_API_KEY=your_hyperbrowser_key_here

Step 1: Import Libraries and Load Environment Variables

We start by importing the necessary packages and initializing our environment. The key components we'll use:

  • hyperbrowser: The main SDK for interacting with the Hyperbrowser API
  • AsyncHyperbrowser: The asynchronous client for making API calls
  • StartBrowserUseTaskParams: Parameters class for configuring autonomous browsing tasks
  • IPython.display: For rendering Markdown output in the notebook

We'll also load our environment variables from the .env file to authenticate our API clients.

import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.models import StartBrowserUseTaskParams
from IPython.display import Markdown, display

load_dotenv()

Step 2: Initialize API Clients

We'll create an instance of the Hyperbrowser client that will handle the web navigation, browsing, and data extraction.

hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))

Step 3: Define the Autonomous Agent Task Function

Now we'll define a function that creates and runs an autonomous web agent. This function demonstrates the simplicity of Hyperbrowser's agent interface:

  1. We define our task in natural language ("go to Hacker News and summarize the top 5 posts")
  2. The agent autonomously navigates to the website, identifies the relevant posts, and creates a summary
  3. The agent returns the summary in the requested format (markdown)

This approach requires minimal code compared to building the entire agent logic from scratch. The Hyperbrowser API handles complex web interactions, navigation decisions, and content extraction behind the scenes.

async def summarize_hn_top_posts():
    resp = await hb.agents.browser_use.start_and_wait(
        StartBrowserUseTaskParams(
            task="go to Hacker News and summarize the top 5 posts of the day. Respond in markdown format."
        )
    )
    if resp.data is not None:
        return resp.data.final_result
    return None

Step 4: Execute the Agent and Display Results

Now we'll run our agent and display its results. The process works as follows:

  1. We call our summarize_hn_top_posts() function asynchronously
  2. The agent performs the required navigation, interactions, and sorting internally, then returns the summary
  3. We display the formatted results in the notebook

The output below shows real-time results from Hacker News, demonstrating how the agent can autonomously gather and format information.

response = await summarize_hn_top_posts()
if response is not None:
    display(Markdown(response))
else:
    print("No response from the agent")
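Because an agent task can occasionally come back empty (as the None check above anticipates), it can be useful to retry before giving up. The helper below is a minimal sketch of that pattern; it wraps any zero-argument coroutine function, such as the summarize_hn_top_posts function defined in Step 3, and is not part of the Hyperbrowser SDK.

```python
import asyncio


async def run_with_retries(agent_fn, max_attempts=3, delay_seconds=2.0):
    """Call an async agent function, retrying when it returns None.

    agent_fn: any zero-argument coroutine function (e.g. a wrapper
    around a browser_use task, as in Step 3). The attempt count and
    delay are illustrative defaults, not SDK settings.
    """
    for attempt in range(1, max_attempts + 1):
        result = await agent_fn()
        if result is not None:
            return result
        if attempt < max_attempts:
            # Back off briefly before trying the task again.
            await asyncio.sleep(delay_seconds)
    return None
```

In a notebook you would call it with `response = await run_with_retries(summarize_hn_top_posts)` in place of the direct call above.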

Here's a summary of the top 5 posts from Hacker News:

Understanding Autonomous Agents: A New Paradigm in Web Automation

The Hacker News example above demonstrates a fundamental advancement in web automation technology. Traditional web automation requires developers to implement precise instructions: navigating to specific URLs, locating elements through selectors, and handling various edge cases. Autonomous agents, by contrast, operate at a higher level of abstraction, understanding and executing tasks through natural language descriptions.

Here are a few key capabilities that such autonomous agents provide:

Task Understanding and Execution
Instead of writing explicit navigation and extraction code, we simply describe the desired outcome. The agent determines how to reach the website, identify relevant content, and format the output appropriately.
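Since the task is just a natural-language string, it's easy to parameterize. The hypothetical helper below sketches one way to compose task descriptions from a site, a goal, and an output format; the agent only ever sees the final string, exactly like the hand-written task in Step 3.

```python
def build_task(site: str, goal: str, output_format: str = "markdown") -> str:
    """Compose a natural-language task description for a browser agent.

    The parameter names here are illustrative conveniences, not part of
    the Hyperbrowser API; they simply template the prompt string.
    """
    return f"Go to {site} and {goal}. Respond in {output_format} format."
```

For example, `build_task("Hacker News", "summarize the top 5 posts of the day")` reproduces the task string used earlier, while swapping in a different site and goal retargets the same agent function without any new scraping code.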

Adaptability and Resilience
When websites update their layouts or HTML structure, traditional scrapers often break. Autonomous agents can adapt automatically, understanding the purpose of the task rather than relying on specific selectors or patterns.

Development Efficiency
What might once have required specialized knowledge of browsers and automation can now be done with basic Python and a clear description of the task. That also means many more people can participate in the development process.

That said, autonomous agents are probably not going to be suitable for all scenarios immediately. Some applications require specific information, hidden inputs, or keystrokes that simply aren't known to the agent. Even so, they represent a significant advancement in web automation. As our example demonstrates, they let developers focus on what they want to accomplish rather than how to accomplish it, marking a fundamental shift in how we approach web automation tasks.
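One practical consequence of these limits is that it's worth sanity-checking the agent's output before rendering it. The heuristic below is an illustrative sketch, not part of the Hyperbrowser SDK: it checks that a result resembles the requested markdown summary (some minimum amount of list or heading structure) before you pass it to display.

```python
def looks_like_markdown_summary(text, expected_items: int = 5) -> bool:
    """Heuristic check that agent output resembles a markdown summary.

    Counts headings and list items (markdown bullets or numbered lines)
    as evidence of structure. The threshold matches the "top 5 posts"
    task; both the rule and the threshold are illustrative.
    """
    if not text or not text.strip():
        return False
    lines = [ln.lstrip() for ln in text.splitlines() if ln.strip()]
    structured = sum(
        1
        for ln in lines
        if ln.startswith(("-", "*", "#")) or ln[:2].rstrip(".").isdigit()
    )
    return structured >= expected_items
```

In the notebook, you might gate the display call on this check and fall back to printing the raw response when it fails, rather than rendering an unexpected result as markdown.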

Conclusion

In this cookbook, we've demonstrated how to create an autonomous web agent using Hyperbrowser's browser_use API. With just a few lines of code, we were able to:

  1. Create an agent that navigates to Hacker News
  2. Extract the current top 5 posts
  3. Format them into a clean markdown summary with links
  4. Display the results directly in our notebook

This example shows the power of autonomous agents for web automation tasks. Instead of writing complex web scraping code with selectors and navigation logic, we simply described our task in natural language and let the agent handle the details.