Computer Actions let you drive a live session using screen-level primitives like click, type, drag, scroll, and screenshot. Use them when DOM automation is unreliable or impossible (canvas apps, remote desktops, non-standard controls, or stubborn overlays).
When To Use Computer Actions
- You need full-screen interaction, not just DOM selectors.
- The page uses canvas or custom rendering where selectors are unreliable.
- You want a robust fallback when Playwright/Puppeteer actions fail.
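The fallback case in the last bullet can be sketched as a small wrapper that tries a DOM-level action first and only drops to a screen-level action if it throws. `withFallback` is a hypothetical helper name, not part of the Hyperbrowser SDK; the sketch assumes the DOM action signals failure by rejecting.

```typescript
// Hypothetical helper (not part of the SDK): run a DOM-level action first,
// then fall back to a screen-level action if the first one throws.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: (error: unknown) => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    // The original error is passed along so the fallback can log or inspect it.
    return await fallback(error);
  }
}

// Possible usage, assuming `page` is a Playwright/Puppeteer page attached to the session:
// await withFallback(
//   () => page.click("#submit"),
//   () => client.computerAction.click(session, 500, 300, "left", 1, false)
// );
```

Keeping the wrapper generic means the same pattern could cover typing or scrolling; the coordinates in the usage comment are placeholders.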
How It Works
Every session includes a computerActionEndpoint. The SDKs wrap it as client.computerAction.* (Node.js) and client.computer_action.* (Python). Each method accepts either a session ID or the full session object; passing the session object is recommended.
Coordinates are in pixels, measured from the top-left corner of the session screen. The default screen size is 1280x720. If you set a different screen size when creating the session, adjust your coordinates accordingly.
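One way to adjust coordinates is to scale them proportionally from the default 1280x720 screen to the size you configured. The sketch below is a minimal illustration; `scalePoint`, `ScreenSize`, and `Point` are hypothetical names, and it assumes the page layout scales uniformly with the screen, which is often not true for responsive pages.

```typescript
interface ScreenSize {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Default session screen size stated in the docs above.
const DEFAULT_SCREEN: ScreenSize = { width: 1280, height: 720 };

// Hypothetical helper: map a point authored for `base` onto a `target` screen.
function scalePoint(
  point: Point,
  target: ScreenSize,
  base: ScreenSize = DEFAULT_SCREEN
): Point {
  return {
    x: Math.round((point.x * target.width) / base.width),
    y: Math.round((point.y * target.height) / base.height),
  };
}

const scaled = scalePoint({ x: 500, y: 300 }, { width: 1920, height: 1080 });
console.log(scaled); // { x: 750, y: 450 }
```

When the layout does not scale uniformly, taking a screenshot and reading the coordinates off the actual image is likely more reliable than scaling.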
Quickstart
This example navigates to a URL using keyboard input and captures a screenshot.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

async function main() {
  const session = await client.sessions.create();
  try {
    // Focus address bar and navigate
    await client.computerAction.pressKeys(session, ["Control_L", "l"]);
    await client.computerAction.typeText(session, "https://example.com");
    await client.computerAction.pressKeys(session, ["Return"]);
    const shot = await client.computerAction.screenshot(session);
    console.log(shot.screenshot); // base64
  } finally {
    await client.sessions.stop(session.id);
  }
}

main().catch(console.error);
Action Examples
Click
const response = await client.computerAction.click(
  "session-id", // or session object
  500, // x coordinate
  300, // y coordinate
  "left", // button: "left" | "right" | "middle" | "back" | "forward" | "wheel"
  1, // number of clicks
  false // do not return screenshot (default: false)
);
console.log(response.success);
console.log(response.screenshot); // base64 if requested
Type Text
const response = await client.computerAction.typeText(
  "session-id",
  "Hello, World!",
  false // do not return screenshot (default: false)
);
Press Keys
Key names follow the xdotool format. For the list of valid names, see https://github.com/sickcodes/xdotool-gui/blob/master/key_list.csv
const response = await client.computerAction.pressKeys(
  "session-id",
  ["Control_L", "a"], // Key combination
  false // do not return screenshot (default: false)
);
Move Mouse
const response = await client.computerAction.moveMouse(
  "session-id",
  500, // x
  300, // y
  false // do not return screenshot (default: false)
);
Drag
const response = await client.computerAction.drag(
  "session-id",
  [
    { x: 100, y: 100 },
    { x: 200, y: 200 },
    { x: 300, y: 300 },
  ],
  false // do not return screenshot (default: false)
);
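Since drag takes an array of { x, y } points, a path can be generated rather than written by hand. The following is a minimal sketch of straight-line interpolation between two points; `linearPath` is a hypothetical helper, not an SDK function, and whether intermediate points are needed for a given app is an assumption to verify.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical helper: build `steps + 1` evenly spaced points from start to end,
// including both endpoints, rounded to whole pixels.
function linearPath(start: Point, end: Point, steps: number): Point[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  const points: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps; // fraction of the way from start to end
    points.push({
      x: Math.round(start.x + (end.x - start.x) * t),
      y: Math.round(start.y + (end.y - start.y) * t),
    });
  }
  return points;
}

// Reproduces the three-point path used in the drag example above.
const path = linearPath({ x: 100, y: 100 }, { x: 300, y: 300 }, 2);
console.log(path);
```

The resulting array could be passed as the second argument to client.computerAction.drag in place of the literal list.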
Scroll
const response = await client.computerAction.scroll(
  "session-id",
  500, // x position
  300, // y position
  0, // scroll x delta
  100, // scroll y delta
  false // do not return screenshot (default: false)
);
Screenshot
const response = await client.computerAction.screenshot("session-id");
console.log(response.screenshot); // base64
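Because the screenshot comes back as a base64 string, it usually needs decoding before it can be viewed. Below is a minimal sketch using Node's built-in Buffer and fs modules. `decodeBase64Image` and `saveBase64Image` are hypothetical helpers, and both the image format (the `.png` extension in the usage comment) and the optional data-URL prefix handling are assumptions, since the docs only state that the value is base64.

```typescript
import { Buffer } from "node:buffer";
import { writeFileSync } from "node:fs";

// Hypothetical helper: turn a base64 string into raw bytes.
// Defensively strips a "data:<mime>;base64," prefix in case one is present.
function decodeBase64Image(base64: string): Buffer {
  const payload = base64.replace(/^data:[^;]+;base64,/, "");
  return Buffer.from(payload, "base64");
}

// Hypothetical helper: write the decoded bytes to disk and return the byte count.
function saveBase64Image(base64: string, filePath: string): number {
  const bytes = decodeBase64Image(base64);
  writeFileSync(filePath, bytes);
  return bytes.length;
}

// Possible usage with the response above (file extension is an assumption):
// saveBase64Image(response.screenshot, "screenshot.png");
```

Separating decoding from writing keeps the decoding step reusable, for example when passing the bytes to an image library instead of saving them.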