Building a Chess Move Solver with Hyperbrowser and Claude
In this cookbook, we'll build a smart chess puzzle solver that can analyze a chess position and recommend the best next move. This approach combines:
- Hyperbrowser for capturing screenshots of chess positions from websites
- Anthropic's Claude 3.7 Sonnet model for analyzing the position and determining the best move
- Tool-calling to create an agent that can work with visual chess data
By the end of this cookbook, you'll have a reusable agent that can solve chess puzzles from websites like Lichess!
Prerequisites
Before starting, make sure you have:
- A Hyperbrowser API key (sign up for free at hyperbrowser.ai if you don't have one)
- An Anthropic API key
- Python 3.9+ installed
Both API keys should be stored in a .env file in the same directory as this notebook, with the following format:
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
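If you want to verify that both keys are visible to the notebook before creating any clients, a quick optional check like the one below (assuming python-dotenv is installed) can save a confusing failure later:

```python
# Optional sanity check: make sure both keys load before creating any clients.
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("HYPERBROWSER_API_KEY", "ANTHROPIC_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing {key} - add it to your .env file")
```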
Step 1: Set up imports and load environment variables
```python
import asyncio
import os
import base64

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScreenshotTool
from anthropic import AsyncAnthropic
from anthropic.types import (
    MessageParam,
    ToolUseBlock,
    ToolResultBlockParam,
)
from typing import Coroutine, Any, Callable
from IPython.display import display, Markdown
import requests

load_dotenv()
```
Step 2: Initialize clients
```python
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncAnthropic()
```
Step 3: Create helper functions for tool handling
Next, we'll define a function to handle tool calls from the LLM. This function will process the screenshot tool calls and return the results to the agent.
```python
async def handle_tool_call(
    tc: ToolUseBlock,
) -> ToolResultBlockParam:
    print(f"Handling tool call: {tc.name}")
    try:
        if tc.name != WebsiteScreenshotTool.anthropic_tool_definition["name"]:
            raise ValueError(f"Tool not found: {tc.name}")

        args = tc.input
        print(args)
        # Convert args to dict if it's not already a dict
        params = args if isinstance(args, dict) else dict(args)  # type: ignore
        screenshot_url = await WebsiteScreenshotTool.async_runnable(hb=hb, params=params)

        response = requests.get(screenshot_url)
        if response.status_code == 200:
            image_base64 = base64.b64encode(response.content).decode("utf-8")
            screenshot = f"data:image/webp;base64,{image_base64}"
            return ToolResultBlockParam(
                tool_use_id=tc.id,
                type="tool_result",
                content=screenshot,
                is_error=False,
            )
        else:
            return ToolResultBlockParam(
                tool_use_id=tc.id,
                type="tool_result",
                content="Could not get screenshot from hyperbrowser screenshot tool",
                is_error=True,
            )
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return ToolResultBlockParam(
            tool_use_id=tc.id, type="tool_result", content=str(e), is_error=True
        )
```
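If you'd like to smoke-test the screenshot tool on its own before wiring up the full agent, a minimal call such as the sketch below should work. The `params` shape (just a `url` key) is an assumption based on the tool-call arguments printed later in this notebook, so adjust it to whatever your Hyperbrowser SDK version expects.

```python
# Optional standalone check of the screenshot tool, outside the agent loop.
# The params shape here is an assumption; see the tool-call arguments printed
# later in this notebook for the fields the model actually sends.
async def smoke_test_screenshot(url: str) -> str:
    screenshot_url = await WebsiteScreenshotTool.async_runnable(
        hb=hb, params={"url": url}
    )
    print(f"Screenshot available at: {screenshot_url}")
    return screenshot_url

# In a notebook cell:
# await smoke_test_screenshot("https://lichess.org/training/ntE6Z")
```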
Step 4: Implement the agent loop
Now we'll create the main agent loop that orchestrates the conversation between the user, the LLM, and the tools. This function:
- Takes a list of messages (including system prompt and user query)
- Sends them to the Anthropic API
- Processes any tool calls that the LLM makes
- Continues the conversation until the LLM provides a final answer
This is the core of our chess-solving agent's functionality.
```python
async def agent_loop(messages: list[MessageParam]) -> str:
    while True:
        response = await llm.messages.create(
            messages=messages,
            model="claude-3-7-sonnet-latest",
            max_tokens=8000,
            tools=[
                WebsiteScreenshotTool.anthropic_tool_definition,
            ],
        )

        if response.stop_reason == "tool_use":
            # Echo any text the model produced alongside its tool call
            if response.content[0].type == "text":
                print(response.content[0].text)

            messages.append(
                {
                    "role": "assistant",
                    "content": response.content,
                }
            )

            # Run every requested tool call and hand the results back to the model
            tool_result_messages = await asyncio.gather(
                *[
                    handle_tool_call(tc)
                    for tc in response.content
                    if tc.type == "tool_use"
                ]
            )
            messages.append(MessageParam(content=tool_result_messages, role="user"))
        elif response.stop_reason in ("stop_sequence", "end_turn"):
            # The model is done: return its final text answer
            text_block = next(
                block for block in response.content if block.type == "text"
            )
            return text_block.text
        else:
            print(response)
            raise ValueError(f"Unhandled finish reason: {response.stop_reason}")
```
Step 5: Design the system prompt
The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the LLM as a chess expert and provides instructions on how to analyze chess positions and report the best moves.
SYSTEM_PROMPT = """You are an expert chess solver. You have access to a 'scrape_webpage' tool which can be used to take a screenshot of the current position.This is the link to a chess game {chess_game_url}. You are given a position and you need to find the next move.The page contains the current position and tells you the color of the piece to move, usually listed as "Find the best move for white" or "Find the best move for black"."Make sure that the piece you're moving is actually of the color you're asked to move for. In addition, make sure that no piece blocks the natural movement of the piece you're trying to move.You are required to response with1. The best piece to move (one between a pawn, knight, bishop, rook, queen, or king)2. the current position of the piece to move (usually listed as "a4" or "h8")3. the next position of the piece to move (usually listed as "a5" or "h7")Try to answer the response sticking as close as possible to these 3 parameters. If you cannot tell the next best position according to the users prompts, then state that you cannot. Do not ask followup questions here.Return the final response formatted as markdown""".strip()
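As a quick sanity check, you can fill the `{chess_game_url}` placeholder with a sample puzzle URL (a hypothetical one here) and inspect the start of the rendered prompt:

```python
# Fill the placeholder with a sample (hypothetical) puzzle URL and inspect
# the first few lines of the rendered prompt.
example_prompt = SYSTEM_PROMPT.format(
    chess_game_url="https://lichess.org/training/ntE6Z"
)
print("\n".join(example_prompt.splitlines()[:3]))
```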
Step 6: Create a factory function for generating chess-solving agents
Now we'll create a factory function that generates a specialized chess-solving agent. This function:
- Takes a chess game URL as input
- Formats the system prompt with this URL
- Returns a function that can analyze and solve chess positions
This approach makes our solution reusable for different chess puzzles from various websites.
```python
def make_chess_agent(
    link_to_chess_game: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Normalize the link so it always has a scheme
    if not (
        link_to_chess_game.startswith("http://")
        or link_to_chess_game.startswith("https://")
    ):
        link_to_chess_game = f"https://{link_to_chess_game}"

    sysprompt = SYSTEM_PROMPT.format(
        chess_game_url=link_to_chess_game,
    )

    async def solve_chess(question: str) -> str:
        return await agent_loop(
            [
                {"role": "assistant", "content": sysprompt},
                {"role": "user", "content": question},
            ]
        )

    return solve_chess
```
Step 7: Test the agent with a real chess puzzle
Let's test our agent by creating an instance for a Lichess chess puzzle and asking it to find the best move. This will demonstrate the full workflow:
- The agent receives a question about the best move for a chess position
- It uses the `screenshot_webpage` tool to take a screenshot of the position
- It analyzes the position and determines the best move
- It returns the answer in the specified format
You'll see the tool calls being made in real-time as the agent works through the puzzle.
```python
link_to_chess_game = "https://lichess.org/training/ntE6Z"
question = "What is the best move for white?"

agent = make_chess_agent(link_to_chess_game)
response = await agent(question)
display(Markdown(response))
```
I'll help you solve this chess puzzle. First, let me take a screenshot of the current position to analyze it.

Handling tool call: screenshot_webpage
{'url': 'https://lichess.org/training/ntE6Z', 'scrape_options': {'include_tags': ['body'], 'exclude_tags': [], 'only_main_content': True, 'formats': ['screenshot']}}
Based on the chess position shown on the Lichess training puzzle, I can analyze the best move for white.
Looking at the current position, I can see that:
- White is asked to make the best move
- White has a rook on h1
- There's a pawn structure with white pawns advanced
- Black's king is on the kingside
The best move for white in this position is:
Rook from h1 to h8, delivering checkmate
This is a classic checkmate pattern where the rook delivers the final blow along the h-file, with no pieces able to block or capture the rook. The rook's movement to h8 delivers an immediate checkmate to the black king.
Conclusion
In this cookbook, we built a powerful chess puzzle solver using Hyperbrowser and Claude. This agent can:
- Access and capture screenshots of chess positions from websites
- Analyze the visual representation of a chess board
- Determine the best next move based on the current position
- Provide a clear, structured response with the piece, current position, and target position
This pattern can be extended to create more sophisticated chess analysis tools or be adapted for other visual puzzle-solving tasks.
Next Steps
To take this further, you might consider:
- Adding support for multiple chess puzzle platforms
- Implementing move validation to ensure the suggested moves are legal (see the sketch after this list)
- Creating a web interface where users can paste chess puzzle links
- Adding explanations for why a particular move is best
- Extending the agent to recommend multiple good moves with pros and cons
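As a starting point for the move-validation idea above, here is a minimal sketch using the python-chess library (`pip install chess`). It assumes you can obtain the position as a FEN string and the suggested move in UCI notation, neither of which the current agent produces out of the box, so treat it as an illustration rather than a drop-in step.

```python
# Minimal move-validation sketch using python-chess (pip install chess).
# The FEN and UCI move are assumed inputs; the agent above does not extract
# them, so this illustrates the idea rather than plugging in directly.
import chess


def is_legal(fen: str, uci_move: str) -> bool:
    """Return True if the UCI move (e.g. 'h1h8') is legal in the given position."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(uci_move)
    except ValueError:
        return False
    return move in board.legal_moves


# Example with the standard starting position (a placeholder, not the puzzle):
print(is_legal(chess.STARTING_FEN, "e2e4"))  # True
print(is_legal(chess.STARTING_FEN, "h1h8"))  # False - the rook is boxed in
```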
Happy chess solving!