# Gemini Computer Use

> Automate browser tasks with Google's Gemini computer use capabilities

Gemini Computer Use lets Gemini interact with a computer directly, much as a human would. It can move the cursor, click buttons, type text, and navigate the web, which makes it possible to automate complex, multi-step workflows.

Hyperbrowser runs Gemini Computer Use tasks in managed cloud browsers. Start a task with a single API call, then either poll for results yourself or use the SDK's blocking methods, which wait for the task to finish and return the result.

You can view your Gemini Computer Use tasks in the [dashboard](https://app.hyperbrowser.ai/features/agents/gemini-computer-use).

## How It Works

You can use Gemini Computer Use in two ways:

1. **Start and Wait**: The SDKs provide a `startAndWait()` method that blocks until the task completes and returns the result.
2. **Async Pattern**: Start a task, get a job ID, then poll for status and results. This is useful for long-running tasks or when you want more control.

## Installation

<CodeGroup>
  ```bash npm theme={null}
  npm install @hyperbrowser/sdk dotenv
  ```

  ```bash yarn theme={null}
  yarn add @hyperbrowser/sdk dotenv
  ```

  ```bash pip theme={null}
  pip install hyperbrowser python-dotenv
  ```

  ```bash uv theme={null}
  uv add hyperbrowser python-dotenv
  ```
</CodeGroup>

## Quick Start

The simplest way to run a Gemini Computer Use task is with the `startAndWait()` method (`start_and_wait()` in Python), which starts the task and blocks until the result is available:

<CodeGroup>
  ```typescript Node.js theme={null}
  import { Hyperbrowser } from "@hyperbrowser/sdk";
  import { config } from "dotenv";

  config();

  const client = new Hyperbrowser({
    apiKey: process.env.HYPERBROWSER_API_KEY,
  });

  async function main() {
    const result = await client.agents.geminiComputerUse.startAndWait({
      task: "Go to Hacker News and tell me the title of the top post",
      maxSteps: 20,
    });

    console.log(`Output:\n${result.data?.finalResult}`);
  }

  main().catch((err) => {
    console.error(`Error: ${err.message}`);
  });
  ```

  ```python Python theme={null}
  from hyperbrowser import Hyperbrowser
  from hyperbrowser.models import StartGeminiComputerUseTaskParams
  import os
  from dotenv import load_dotenv

  load_dotenv()

  client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))

  result = client.agents.gemini_computer_use.start_and_wait(
      params=StartGeminiComputerUseTaskParams(
          task="Go to Hacker News and tell me the title of the top post",
          max_steps=20
      )
  )

  print(f"Output:\n{result.data.final_result}")
  ```

  ```bash cURL theme={null}
  # Start the task
  curl -X POST https://api.hyperbrowser.ai/api/task/gemini-computer-use \
    -H "Content-Type: application/json" \
    -H "x-api-key: YOUR_API_KEY" \
    -d '{
      "task": "Go to Hacker News and tell me the title of the top post",
      "maxSteps": 20
    }'

  # Response: {"jobId": "abc123", "liveUrl": "https://..."}

  # Check status
  curl https://api.hyperbrowser.ai/api/task/gemini-computer-use/abc123/status \
    -H "x-api-key: YOUR_API_KEY"

  # Get full results
  curl https://api.hyperbrowser.ai/api/task/gemini-computer-use/abc123 \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>

## Async Pattern

When you need more control, use the async pattern to start a task and poll for results:

<CodeGroup>
  ```typescript Node.js theme={null}
  import { Hyperbrowser } from "@hyperbrowser/sdk";
  import { config } from "dotenv";

  config();

  const client = new Hyperbrowser({
    apiKey: process.env.HYPERBROWSER_API_KEY,
  });

  async function main() {
    try {
      // Start the task
      const task = await client.agents.geminiComputerUse.start({
        task: "What is the title of the first post on Hacker News today?",
        maxSteps: 20,
      });

      console.log(`Task started: ${task.jobId}`);
      console.log(`Watch live: ${task.liveUrl}`);

      // Poll for completion
      let result;
      while (true) {
        result = await client.agents.geminiComputerUse.getStatus(task.jobId);
        console.log(`Status: ${result.status}`);

        if (result.status === "completed" || result.status === "failed") {
          break;
        }

        await new Promise((resolve) => setTimeout(resolve, 5000)); // Wait 5s
      }

      const fullResult = await client.agents.geminiComputerUse.get(task.jobId);

      if (fullResult.status === "completed") {
        console.log("Result:", fullResult.data?.finalResult);
        console.log("Steps taken:", fullResult.data?.steps?.length);
      } else {
        console.error("Task failed:", fullResult.error);
      }
    } catch (err) {
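      // `err` is typed `unknown` under strict TypeScript, so narrow it before reading `.message`
      console.error(`Error: ${err instanceof Error ? err.message : err}`);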
    }
  }

  main();
  ```

  ```python Python theme={null}
  import asyncio
  from hyperbrowser import AsyncHyperbrowser
  from hyperbrowser.models import StartGeminiComputerUseTaskParams
  from dotenv import load_dotenv
  import os

  load_dotenv()

  client = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


  async def main():
      try:
          # Start the task
          task = await client.agents.gemini_computer_use.start(
              params=StartGeminiComputerUseTaskParams(
                  task="What is the title of the first post on Hacker News today?",
                  max_steps=20,
              )
          )

          print(f"Task started: {task.job_id}")
          print(f"Watch live: {task.live_url}")

          # Poll for completion
          while True:
              result = await client.agents.gemini_computer_use.get_status(task.job_id)
              print(f"Status: {result.status}")

              if result.status in ["completed", "failed"]:
                  break

              await asyncio.sleep(5)  # Wait 5s

          full_result = await client.agents.gemini_computer_use.get(task.job_id)

          if full_result.status == "completed":
              print("Result:", full_result.data.final_result)
              print(
                  "Steps taken:",
                  len(full_result.data.steps) if full_result.data.steps else 0,
              )
          else:
              print("Task failed:", full_result.error)
      except Exception as e:
          print(f"Error: {e}")


  if __name__ == "__main__":
      asyncio.run(main())
  ```
</CodeGroup>
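
The polling loops above run until the task reports `completed` or `failed`, with no upper bound on how long they wait. A minimal sketch of a bounded variant follows. The helper name `wait_for_terminal_status` is hypothetical and not part of the SDK, and treating `stopped` as a terminal status is an assumption you should check against the API reference. The status lookup is passed in as a callable, so the helper does not depend on any particular client.

```python
import time
from typing import Callable

# Assumption: "stopped" is terminal, alongside the two statuses checked in
# the polling loops above. Adjust this set to match the API.
TERMINAL_STATUSES = {"completed", "failed", "stopped"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Task still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

With a synchronous client, `get_status` could be a small lambda that calls the status method for your job ID and returns its `status` field. The `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting.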

## Stop a Running Task

Stop a task before it completes:

<CodeGroup>
  ```typescript Node.js theme={null}
  await client.agents.geminiComputerUse.stop("job-id");
  ```

  ```python Python theme={null}
  client.agents.gemini_computer_use.stop("job-id")
  ```

  ```bash cURL theme={null}
  curl -X PUT https://api.hyperbrowser.ai/api/task/gemini-computer-use/job-id/stop \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>

## Parameters

<ParamField path="task" type="string" required>
  Natural language description of what you want Gemini to accomplish. Be specific for best results.
</ParamField>

<ParamField path="llm" type="string" default="gemini-3-flash-preview">
  Gemini model to use. Available options:

  * `"gemini-3-flash-preview"` - Gemini 3 Flash Preview (recommended)
  * `"gemini-2.5-computer-use-preview-10-2025"` - Gemini 2.5 Computer Use Preview
</ParamField>

<ParamField path="maxSteps" type="number" default="20">
  Maximum number of actions Gemini can take (clicks, typing, navigation, etc.). Increase for complex tasks.
</ParamField>

<ParamField path="maxFailures" type="number" default="3">
  Maximum consecutive failures before the task is aborted.
</ParamField>

<ParamField path="sessionId" type="string">
  ID of an existing browser session to reuse. Useful for multi-step workflows that need to maintain the same browser session.
</ParamField>

<ParamField path="keepBrowserOpen" type="boolean" default="false">
  Keep the browser session alive after task completion.
</ParamField>

<ParamField path="useComputerAction" type="boolean" default="false">
  Let the agent execute actions on the computer itself, not just within the page. The agent also sees the entire screen instead of only the page contents.
</ParamField>

<ParamField path="sessionOptions" type="object">
  [Session configuration](/api-reference/start-a-gemini-computer-use-task#body-session-options) (proxy, stealth, captcha solving, etc.). Only applies when creating a new session. If you provide an existing `sessionId`, these options are ignored.
</ParamField>

<ParamField path="useCustomApiKeys" type="boolean" default="false">
  Use your own Google API key instead of consuming Hyperbrowser credits for LLM calls. You will only be charged for browser usage.
</ParamField>

<ParamField path="apiKeys" type="object">
  API key for `google`. Required when `useCustomApiKeys` is `true`.

  ```typescript theme={null}
  {
    google: "..."
  }
  ```
</ParamField>

<Tip>
  The agent may not complete the task within the specified `maxSteps`. If that happens, try increasing the `maxSteps` parameter.

  Additionally, the browser session used by the AI Agent will time out based on your team's default Session Timeout settings or the session's `timeoutMinutes` parameter if provided. You can adjust the default Session Timeout in the [Settings page](https://app.hyperbrowser.ai/settings).
</Tip>

<Note>
  `useComputerAction` is often better at completing tasks but may require more steps. It is especially useful when the agent needs to interact with elements that Playwright cannot access or see. Without it, actions run through Playwright, which can only interact with the page via CDP. With it, the agent sees the entire screen and works with computer primitives directly (clicks, typing, scrolling, etc.), which makes it much more powerful.
</Note>

## Reuse Browser Sessions

Pass an existing `sessionId` to run a Gemini Computer Use task on that session. To keep the session open after the task finishes, set `keepBrowserOpen` to `true`.

<CodeGroup>
  ```typescript Node.js theme={null}
  import { Hyperbrowser } from "@hyperbrowser/sdk";
  import { config } from "dotenv";

  config();

  const client = new Hyperbrowser({
    apiKey: process.env.HYPERBROWSER_API_KEY,
  });

  const main = async () => {
    const session = await client.sessions.create();

    try {
      const result = await client.agents.geminiComputerUse.startAndWait({
        task: "What is the title of the first post on Hacker News today?",
        sessionId: session.id,
        keepBrowserOpen: true,
      });

      console.log(`Output:\n${result.data?.finalResult}`);

      const result2 = await client.agents.geminiComputerUse.startAndWait({
        task: "Tell me how many upvotes the first post has.",
        sessionId: session.id,
      });

      console.log(`\nOutput:\n${result2.data?.finalResult}`);
    } catch (err) {
      console.error(`Error: ${err}`);
    } finally {
      await client.sessions.stop(session.id);
    }
  };

  main().catch((err) => {
    console.error(`Error: ${err.message}`);
  });
  ```

  ```python Python theme={null}
  import os
  from hyperbrowser import Hyperbrowser
  from hyperbrowser.models import StartGeminiComputerUseTaskParams
  from dotenv import load_dotenv

  load_dotenv()

  client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


  def main():
      session = client.sessions.create()

      try:
          resp = client.agents.gemini_computer_use.start_and_wait(
              StartGeminiComputerUseTaskParams(
                  task="What is the title of the first post on Hacker News today?",
                  session_id=session.id,
                  keep_browser_open=True,
              )
          )

          print(f"Output:\n{resp.data.final_result}")

          resp2 = client.agents.gemini_computer_use.start_and_wait(
              StartGeminiComputerUseTaskParams(
                  task="Tell me how many upvotes the first post has.",
                  session_id=session.id,
              )
          )

          print(f"\nOutput:\n{resp2.data.final_result}")
      except Exception as e:
          print(f"Error: {e}")
      finally:
          client.sessions.stop(session.id)


  if __name__ == "__main__":
      try:
          main()
      except Exception as e:
          print(f"Error: {e}")
  ```
</CodeGroup>

<Warning>
  Set `keepBrowserOpen: true` on every task whose session you want to reuse afterward. Otherwise, the session is closed automatically when the task completes.
</Warning>
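
When a workflow has more than two subtasks, it is easy to forget `keepBrowserOpen` on one of them. As a rough sketch, the request bodies can be generated so that every subtask except the last keeps the session open. The helper name `plan_session_tasks` is hypothetical. The field names are the camelCase ones from the parameter list above, and the code only builds dictionaries without calling the API.

```python
from typing import Dict, List


def plan_session_tasks(tasks: List[str], session_id: str) -> List[Dict[str, object]]:
    """Build one request body per subtask, all targeting the same session.

    keepBrowserOpen is true on every subtask except the last, so the session
    stays available until the final subtask has run.
    """
    bodies: List[Dict[str, object]] = []
    for index, task in enumerate(tasks):
        is_last = index == len(tasks) - 1
        bodies.append(
            {
                "task": task,
                "sessionId": session_id,
                "keepBrowserOpen": not is_last,
            }
        )
    return bodies
```

Run the resulting bodies in order. If you plan to stop the session yourself afterward, as the examples above do in their cleanup step, leaving the last subtask at the default is harmless.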

## Using Your Own API Keys

Bring your own Google API key to avoid consuming Hyperbrowser credits for LLM calls. You are still charged for browser session usage, but you save on token costs.

<CodeGroup>
  ```typescript Node.js theme={null}
  import { Hyperbrowser } from "@hyperbrowser/sdk";
  import { config } from "dotenv";

  config();

  const client = new Hyperbrowser({
    apiKey: process.env.HYPERBROWSER_API_KEY,
  });

  const main = async () => {
    const result = await client.agents.geminiComputerUse.startAndWait({
      task: "What is the title of the first post on Hacker News today?",
      useCustomApiKeys: true,
      apiKeys: {
        google: "<GOOGLE_API_KEY>",
      },
    });

    console.log(`Output:\n\n${result.data?.finalResult}`);
  };

  main().catch((err) => {
    console.error(`Error: ${err.message}`);
  });
  ```

  ```python Python theme={null}
  import os
  from hyperbrowser import Hyperbrowser
  from hyperbrowser.models import StartGeminiComputerUseTaskParams, GeminiComputerUseApiKeys
  from dotenv import load_dotenv

  load_dotenv()

  client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


  def main():
      resp = client.agents.gemini_computer_use.start_and_wait(
          StartGeminiComputerUseTaskParams(
              task="What is the title of the first post on Hacker News today?",
              use_custom_api_keys=True,
              api_keys=GeminiComputerUseApiKeys(
                  google="<GOOGLE_API_KEY>",
              )
          )
      )

      print(f"Output:\n\n{resp.data.final_result}")


  if __name__ == "__main__":
      try:
          main()
      except Exception as e:
          print(f"Error: {e}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.hyperbrowser.ai/api/task/gemini-computer-use \
    -H "Content-Type: application/json" \
    -H "x-api-key: YOUR_HYPERBROWSER_API_KEY" \
    -d '{
      "task": "What is the title of the first post on Hacker News today?",
      "useCustomApiKeys": true,
      "apiKeys": {
        "google": "YOUR_GOOGLE_API_KEY"
      }
    }'
  ```
</CodeGroup>
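
The examples above use a placeholder for the Google key. In practice you would probably load it from the environment, as the examples already do for the Hyperbrowser key. The sketch below builds the same JSON body as the cURL example and adds the custom key fields only when a key is present. The helper name `build_task_body` and the `GOOGLE_API_KEY` variable name are assumptions made for illustration.

```python
import os
from typing import Dict, Mapping, Optional


def build_task_body(
    task: str,
    env: Optional[Mapping[str, str]] = None,
    key_var: str = "GOOGLE_API_KEY",  # assumed variable name, not defined by Hyperbrowser
) -> Dict[str, object]:
    """Return a request body, opting into custom keys only if one is configured."""
    env = os.environ if env is None else env
    body: Dict[str, object] = {"task": task}
    google_key = env.get(key_var)
    if google_key:
        # Both fields go together: apiKeys is required when useCustomApiKeys is true.
        body["useCustomApiKeys"] = True
        body["apiKeys"] = {"google": google_key}
    return body
```

If the variable is unset, the body omits both fields and the task falls back to the default billing described above.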

## Session Configuration

Customize the browser session used by Gemini Computer Use with session options.

<CodeGroup>
  ```typescript Node.js theme={null}
  import { Hyperbrowser } from "@hyperbrowser/sdk";
  import { config } from "dotenv";

  config();

  const client = new Hyperbrowser({
    apiKey: process.env.HYPERBROWSER_API_KEY,
  });

  const main = async () => {
    const result = await client.agents.geminiComputerUse.startAndWait({
      task: "What is the title of the first post on Hacker News today?",
      sessionOptions: {
        acceptCookies: true,
      }
    });

    console.log(`Output:\n\n${result.data?.finalResult}`);
  };

  main().catch((err) => {
    console.error(`Error: ${err.message}`);
  });
  ```

  ```python Python theme={null}
  import os
  from hyperbrowser import Hyperbrowser
  from hyperbrowser.models import StartGeminiComputerUseTaskParams, CreateSessionParams
  from dotenv import load_dotenv

  load_dotenv()

  client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


  def main():
      resp = client.agents.gemini_computer_use.start_and_wait(
          StartGeminiComputerUseTaskParams(
              task="What is the title of the first post on Hacker News today?",
              session_options=CreateSessionParams(
                  accept_cookies=True,
              ),
          )
      )

      print(f"Output:\n\n{resp.data.final_result}")


  if __name__ == "__main__":
      try:
          main()
      except Exception as e:
          print(f"Error: {e}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.hyperbrowser.ai/api/task/gemini-computer-use \
    -H 'Content-Type: application/json' \
    -H 'x-api-key: <YOUR_API_KEY>' \
    -d '{
        "task": "What is the title of the first post on Hacker News today?",
        "sessionOptions": {
            "acceptCookies": true
        }
    }'
  ```
</CodeGroup>

<Note>
  `sessionOptions` only applies when creating a new session. If you provide a `sessionId`, these options are ignored.
</Note>
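
Because ignored options do not produce an error, a request that sets both fields can silently behave differently from what you expect. A small pre-flight check, sketched below, can surface this before the request is sent. The function name `find_ignored_options` is hypothetical, and it inspects only the plain request body rather than calling the API.

```python
from typing import Any, Dict, List


def find_ignored_options(body: Dict[str, Any]) -> List[str]:
    """Return the sessionOptions keys that will be ignored because sessionId is set."""
    if body.get("sessionId") and body.get("sessionOptions"):
        return sorted(body["sessionOptions"])
    return []


body = {
    "task": "What is the title of the first post on Hacker News today?",
    "sessionId": "existing-session-id",
    "sessionOptions": {"acceptCookies": True},
}
ignored = find_ignored_options(body)
if ignored:
    print(f"Warning: sessionOptions ignored for existing session: {ignored}")
```

To apply those options, set them when you create the session instead.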

<Warning>
  Proxies and CAPTCHA solving add latency to page navigation. Only enable them when necessary for your use case.
</Warning>

## Best Practices

<AccordionGroup>
  <Accordion title="Write clear, specific task descriptions">
    Be explicit about what you want Gemini to do. Instead of "check the website", say "go to example.com, find the pricing page, and extract the cost of the Enterprise plan".
  </Accordion>

  <Accordion title="Set appropriate maxSteps">
    Simple tasks need 10-20 steps. Complex multi-page workflows might need 50+ steps. Monitor failed tasks and adjust accordingly.
  </Accordion>

  <Accordion title="Reuse sessions for multi-step workflows">
    It is usually better to split up complex tasks into smaller, more manageable ones and execute them as separate agent calls on the same session.
  </Accordion>
</AccordionGroup>
