Computer Actions let you drive a live session using screen-level primitives like click, type, drag, scroll, and screenshot. Use them when DOM automation is unreliable or impossible (canvas apps, remote desktops, non-standard controls, or stubborn overlays).

When To Use Computer Actions

  • You need full-screen interaction, not just DOM selectors.
  • The page uses canvas or custom rendering where selectors are unreliable.
  • You want a robust fallback when Playwright/Puppeteer actions fail.

How It Works

Every session includes a computerActionEndpoint. The SDKs wrap it for you as client.computerAction.* (Node.js) and client.computer_action.* (Python). Each method accepts either a session ID or the full session object; passing the session object is recommended.
Coordinates are in pixels, measured from the top-left corner of the session screen. The default screen size is 1280x720. If you change the screen setting when creating the session, adjust your coordinates accordingly.
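
If your coordinates were written against the default 1280x720 screen, one way to reuse them on a different screen size is to scale them proportionally. The following is a minimal sketch, not part of the SDK: scalePoint, ScreenSize, and Point are hypothetical names introduced here, and the approach assumes the page layout scales uniformly with the screen, which responsive pages often do not.

```typescript
// Hypothetical helper types, not provided by the Hyperbrowser SDK.
interface ScreenSize {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Default session screen size stated in this guide.
const DEFAULT_SCREEN: ScreenSize = { width: 1280, height: 720 };

// Scale a point authored for `reference` so that it lands on the same
// relative position of `target`. Results are rounded to whole pixels.
function scalePoint(
  point: Point,
  target: ScreenSize,
  reference: ScreenSize = DEFAULT_SCREEN
): Point {
  return {
    x: Math.round((point.x / reference.width) * target.width),
    y: Math.round((point.y / reference.height) * target.height),
  };
}

// The centre of a 1280x720 screen maps to the centre of a 1920x1080 screen.
const scaled = scalePoint({ x: 640, y: 360 }, { width: 1920, height: 1080 });
console.log(scaled); // { x: 960, y: 540 }
```

The scaled x and y values can then be passed to the coordinate-based actions shown under Action Examples.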

Quickstart

This example navigates to a page using keyboard input and captures a screenshot.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

async function main() {
  const session = await client.sessions.create();

  try {
    // Focus address bar and navigate
    await client.computerAction.pressKeys(session, ["Control_L", "l"]);
    await client.computerAction.typeText(session, "https://example.com");
    await client.computerAction.pressKeys(session, ["Return"]);

    const shot = await client.computerAction.screenshot(session);

    console.log(shot.screenshot); // base64
  } finally {
    await client.sessions.stop(session.id);
  }
}

main().catch(console.error);

Action Examples

Click

const response = await client.computerAction.click(
  "session-id",  // or session object
  500,  // x coordinate
  300,  // y coordinate
  "left",  // button: "left" | "right" | "middle" | "back" | "forward" | "wheel"
  1,  // number of clicks
  false  // do not return screenshot (default: false)
);

console.log(response.success);
console.log(response.screenshot);  // base64 if requested

Type Text

const response = await client.computerAction.typeText(
  "session-id",
  "Hello, World!",
  false  // do not return screenshot (default: false)
);

Press Keys

Key names use the xdotool format. For the full list, see https://github.com/sickcodes/xdotool-gui/blob/master/key_list.csv
const response = await client.computerAction.pressKeys(
  "session-id",
  ["Control_L", "a"],  // Key combination
  false  // do not return screenshot (default: false)
);

Move Mouse

const response = await client.computerAction.moveMouse(
  "session-id",
  500,  // x
  300,  // y
  false  // do not return screenshot (default: false)
);

Drag

const response = await client.computerAction.drag(
  "session-id",
  [
    { x: 100, y: 100 },
    { x: 200, y: 200 },
    { x: 300, y: 300 },
  ],
  false  // do not return screenshot (default: false)
);
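
The path passed to drag is a list of points, so a longer or smoother path can be generated rather than typed by hand. Below is a minimal sketch that linearly interpolates between two points. interpolatePath is a hypothetical helper, not an SDK function, and whether intermediate points change how the drag behaves in a given app is an assumption you should verify.

```typescript
// Hypothetical point type matching the { x, y } objects used in the drag path.
interface Point {
  x: number;
  y: number;
}

// Build a straight-line path from `start` to `end` with `steps` segments,
// i.e. steps + 1 points including both endpoints.
function interpolatePath(start: Point, end: Point, steps: number): Point[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps; // fraction of the way from start to end
    path.push({
      x: Math.round(start.x + (end.x - start.x) * t),
      y: Math.round(start.y + (end.y - start.y) * t),
    });
  }
  return path;
}

// Two segments reproduce the three-point path used in the example above.
const path = interpolatePath({ x: 100, y: 100 }, { x: 300, y: 300 }, 2);
console.log(path);
```

The resulting array can be passed as the second argument to client.computerAction.drag in place of the literal list.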

Scroll

const response = await client.computerAction.scroll(
  "session-id",
  500,  // x position
  300,  // y position
  0,  // scroll x delta
  100,  // scroll y delta
  false  // do not return screenshot (default: false)
);

Screenshot

const response = await client.computerAction.screenshot("session-id");
console.log(response.screenshot);  // base64
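
Because the screenshot is returned as a base64 string, you will usually want to decode it before saving or inspecting it. The sketch below uses only Node.js built-ins. saveBase64Image is a hypothetical helper, and both the optional data-URL prefix handling and the .png extension in the usage comment are assumptions, since this guide does not state the image format.

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: decode a base64 string and write the bytes to disk.
// Returns the number of bytes written.
function saveBase64Image(base64: string, filePath: string): number {
  // Strip a data-URL prefix if one is present (assumption: it may not be).
  const data = base64.replace(/^data:image\/\w+;base64,/, "");
  const bytes = Buffer.from(data, "base64");
  writeFileSync(filePath, bytes);
  return bytes.length;
}

// Stand-in payload so the sketch runs without a live session.
const samplePayload = Buffer.from("hello").toString("base64");
const samplePath = join(tmpdir(), "computer-action-sample.bin");
const written = saveBase64Image(samplePayload, samplePath);
console.log(written, readFileSync(samplePath).toString("utf8"));

// With a real response, usage would look like:
// saveBase64Image(response.screenshot, "screenshot.png");
```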