Building a Chess Move Solver with Hyperbrowser and Claude
In this cookbook, we'll build a smart chess puzzle solver that can analyze a chess position and recommend the best next move. This approach combines:
- Hyperbrowser for capturing screenshots of chess positions from websites
- Anthropic's Claude 3.7 Sonnet model for analyzing the position and determining the best move
- Tool-calling to create an agent that can work with visual chess data
By the end of this cookbook, you'll have a reusable agent that can solve chess puzzles from websites like Lichess!
Prerequisites
Before starting, make sure you have:
- A Hyperbrowser API key (sign up for free at hyperbrowser.ai if you don't have one)
- An Anthropic API key
- Python 3.9+ installed
Both API keys should be stored in a .env file in the same directory as this notebook, with the following format:
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
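If you want to verify that both keys are visible to the notebook before creating any clients, a quick optional check like the one below (assuming python-dotenv is installed) can save a confusing failure later:

```python
# Optional sanity check: make sure both keys load before creating any clients.
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("HYPERBROWSER_API_KEY", "ANTHROPIC_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing {key} - add it to your .env file")
```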
Step 1: Set up imports and load environment variables
```python
import asyncio
import os
import base64

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteScreenshotTool
from anthropic import AsyncAnthropic
from anthropic.types import (
    MessageParam,
    ToolUseBlock,
    ToolResultBlockParam,
)
from typing import Coroutine, Any, Callable
from IPython.display import display, Markdown
import requests

load_dotenv()
```
Step 2: Initialize clients
```python
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncAnthropic()
```
Step 3: Create helper functions for tool handling
Next, we'll define a function to handle tool calls from the LLM. This function will process the screenshot tool calls and return the results to the agent.
```python
async def handle_tool_call(
    tc: ToolUseBlock,
) -> ToolResultBlockParam:
    print(f"Handling tool call: {tc.name}")
    try:
        if tc.name != WebsiteScreenshotTool.anthropic_tool_definition["name"]:
            raise ValueError(f"Tool not found: {tc.name}")

        args = tc.input
        print(args)
        # Convert args to dict if it's not already a dict
        params = args if isinstance(args, dict) else dict(args)  # type: ignore
        screenshot_url = await WebsiteScreenshotTool.async_runnable(hb=hb, params=params)

        response = requests.get(screenshot_url)
        if response.status_code == 200:
            image_base64 = base64.b64encode(response.content).decode("utf-8")
            screenshot = f"data:image/webp;base64,{image_base64}"
            return ToolResultBlockParam(
                tool_use_id=tc.id,
                type="tool_result",
                content=screenshot,
                is_error=False,
            )
        else:
            return ToolResultBlockParam(
                tool_use_id=tc.id,
                type="tool_result",
                content="Could not get screenshot from hyperbrowser screenshot tool",
                is_error=True,
            )
    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return ToolResultBlockParam(
            tool_use_id=tc.id, type="tool_result", content=str(e), is_error=True
        )
```
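If you'd like to smoke-test the screenshot tool on its own before wiring up the full agent, a minimal call such as the sketch below should work. The `params` shape (just a `url` key) is an assumption based on the tool-call arguments printed later in this notebook, so adjust it to whatever your Hyperbrowser SDK version expects.

```python
# Optional standalone check of the screenshot tool, outside the agent loop.
# The params shape here is an assumption; see the tool-call arguments printed
# later in this notebook for the fields the model actually sends.
async def smoke_test_screenshot(url: str) -> str:
    screenshot_url = await WebsiteScreenshotTool.async_runnable(
        hb=hb, params={"url": url}
    )
    print(f"Screenshot available at: {screenshot_url}")
    return screenshot_url

# In a notebook cell:
# await smoke_test_screenshot("https://lichess.org/training/ntE6Z")
```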
Step 4: Implement the agent loop
Now we'll create the main agent loop that orchestrates the conversation between the user, the LLM, and the tools. This function:
- Takes a list of messages (including system prompt and user query)
- Sends them to the Anthropic API
- Processes any tool calls that the LLM makes
- Continues the conversation until the LLM provides a final answer
This is the core of our chess-solving agent's functionality.
```python
async def agent_loop(messages: list[MessageParam]) -> str:
    while True:
        response = await llm.messages.create(
            messages=messages,
            model="claude-3-7-sonnet-latest",
            max_tokens=8000,
            tools=[
                WebsiteScreenshotTool.anthropic_tool_definition,
            ],
        )

        if response.stop_reason == "tool_use":
            # Echo any text the model produced alongside its tool call
            if response.content[0].type == "text":
                print(response.content[0].text)

            messages.append(
                {
                    "role": "assistant",
                    "content": response.content,
                }
            )

            # Run every requested tool call and hand the results back to the model
            tool_result_messages = await asyncio.gather(
                *[
                    handle_tool_call(tc)
                    for tc in response.content
                    if tc.type == "tool_use"
                ]
            )
            messages.append(MessageParam(content=tool_result_messages, role="user"))
        elif response.stop_reason in ("stop_sequence", "end_turn"):
            # The model is done: return its final text answer
            text_block = next(
                block for block in response.content if block.type == "text"
            )
            return text_block.text
        else:
            print(response)
            raise ValueError(f"Unhandled finish reason: {response.stop_reason}")
```
Step 5: Design the system prompt
The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the LLM as a chess expert and provides instructions on how to analyze chess positions and report the best moves.
SYSTEM_PROMPT = """You are an expert chess solver. You have access to a 'scrape_webpage' tool which can be used to take a screenshot of the current position.This is the link to a chess game {chess_game_url}. You are given a position and you need to find the next move.The page contains the current position and tells you the color of the piece to move, usually listed as "Find the best move for white" or "Find the best move for black"."Make sure that the piece you're moving is actually of the color you're asked to move for. In addition, make sure that no piece blocks the natural movement of the piece you're trying to move.You are required to response with1. The best piece to move (one between a pawn, knight, bishop, rook, queen, or king)2. the current position of the piece to move (usually listed as "a4" or "h8")3. the next position of the piece to move (usually listed as "a5" or "h7")Try to answer the response sticking as close as possible to these 3 parameters. If you cannot tell the next best position according to the users prompts, then state that you cannot. Do not ask followup questions here.Return the final response formatted as markdown""".strip()
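As a quick sanity check, you can fill the `{chess_game_url}` placeholder with a sample puzzle URL (a hypothetical one here) and inspect the start of the rendered prompt:

```python
# Fill the placeholder with a sample (hypothetical) puzzle URL and inspect
# the first few lines of the rendered prompt.
example_prompt = SYSTEM_PROMPT.format(
    chess_game_url="https://lichess.org/training/ntE6Z"
)
print("\n".join(example_prompt.splitlines()[:3]))
```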
Step 6: Create a factory function for generating chess-solving agents
Now we'll create a factory function that generates a specialized chess-solving agent. This function:
- Takes a chess game URL as input
- Formats the system prompt with this URL
- Returns a function that can analyze and solve chess positions
This approach makes our solution reusable for different chess puzzles from various websites.
```python
def make_chess_agent(
    link_to_chess_game: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Normalize the link so it always has a scheme
    if not (
        link_to_chess_game.startswith("http://")
        or link_to_chess_game.startswith("https://")
    ):
        link_to_chess_game = f"https://{link_to_chess_game}"

    sysprompt = SYSTEM_PROMPT.format(
        chess_game_url=link_to_chess_game,
    )

    async def solve_chess(question: str) -> str:
        return await agent_loop(
            [
                {"role": "assistant", "content": sysprompt},
                {"role": "user", "content": question},
            ]
        )

    return solve_chess
```
Step 7: Test the agent with a real chess puzzle
Let's test our agent by creating an instance for a Lichess chess puzzle and asking it to find the best move. This will demonstrate the full workflow:
- The agent receives a question about the best move for a chess position
- It uses the `screenshot_webpage` tool to take a screenshot of the position
- It analyzes the position and determines the best move
- It returns the answer in the specified format
You'll see the tool calls being made in real-time as the agent works through the puzzle.
```python
link_to_chess_game = "https://lichess.org/training/ntE6Z"
question = "What is the best move for white?"

agent = make_chess_agent(link_to_chess_game)
response = await agent(question)
display(Markdown(response))
```
I'll help you solve this chess puzzle. First, let me take a screenshot of the current position to analyze it.

Handling tool call: screenshot_webpage
{'url': 'https://lichess.org/training/ntE6Z', 'scrape_options': {'include_tags': ['body'], 'exclude_tags': [], 'only_main_content': True, 'formats': ['screenshot']}}
Based on the chess position shown on the Lichess training puzzle, I can analyze the best move for white.
Looking at the current position, I can see that:
- White is asked to make the best move
- White has a rook on h1
- There's a pawn structure with white pawns advanced
- Black's king is on the kingside
The best move for white in this position is:
Rook from h1 to h8, delivering checkmate
This is a classic checkmate pattern where the rook delivers the final blow along the h-file, with no pieces able to block or capture the rook. The rook's movement to h8 delivers an immediate checkmate to the black king.
Conclusion
In this cookbook, we built a powerful chess puzzle solver using Hyperbrowser and Claude. This agent can:
- Access and capture screenshots of chess positions from websites
- Analyze the visual representation of a chess board
- Determine the best next move based on the current position
- Provide a clear, structured response with the piece, current position, and target position
This pattern can be extended to create more sophisticated chess analysis tools or be adapted for other visual puzzle-solving tasks.
Next Steps
To take this further, you might consider:
- Adding support for multiple chess puzzle platforms
- Implementing move validation to ensure the suggested moves are legal (see the sketch after this list)
- Creating a web interface where users can paste chess puzzle links
- Adding explanations for why a particular move is best
- Extending the agent to recommend multiple good moves with pros and cons
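As a starting point for the move-validation idea above, here is a minimal sketch using the python-chess library (`pip install chess`). It assumes you can obtain the position as a FEN string and the suggested move in UCI notation, neither of which the current agent produces out of the box, so treat it as an illustration rather than a drop-in step.

```python
# Minimal move-validation sketch using python-chess (pip install chess).
# The FEN and UCI move are assumed inputs; the agent above does not extract
# them, so this illustrates the idea rather than plugging in directly.
import chess


def is_legal(fen: str, uci_move: str) -> bool:
    """Return True if the UCI move (e.g. 'h1h8') is legal in the given position."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(uci_move)
    except ValueError:
        return False
    return move in board.legal_moves


# Example with the standard starting position (a placeholder, not the puzzle):
print(is_legal(chess.STARTING_FEN, "e2e4"))  # True
print(is_legal(chess.STARTING_FEN, "h1h8"))  # False - the rook is boxed in
```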
Happy chess solving!