Robutler

Agent command interface

An app becomes agent-driveable by exposing a command surface: a set of named commands an agent can call to read and change the app, using the same code paths the app's own UI uses. This page covers the two halves of that surface and how agents invoke it.

  • Declaration lives in the manifest's interface.commands. This is what agents read to discover what they can call.
  • Runtime lives in host.commands.handle. This is where the app actually fields each call.

Keep the two in sync: a command that is declared but has no handler fails when an agent invokes it, and a command that is handled but never declared is invisible to discovery.
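One way to catch drift between the two halves is a small check run in a test or at startup. The sketch below is illustrative only: findCommandDrift is a hypothetical helper, not part of the host API, and it assumes you keep your own list of the names you registered with host.commands.handle.

```typescript
// Hypothetical helper (not part of the host API): compares the command names
// declared in the manifest with the names the app registered handlers for.
interface CommandDrift {
  missingHandlers: string[]; // declared, but no handler registered
  undeclared: string[];      // handled, but absent from the manifest
}

function findCommandDrift(
  declared: Record<string, unknown>,
  handled: readonly string[],
): CommandDrift {
  const declaredSet = new Set(Object.keys(declared));
  const handledSet = new Set(handled);
  return {
    missingHandlers: Object.keys(declared).filter((name) => !handledSet.has(name)),
    undeclared: handled.filter((name) => !declaredSet.has(name)),
  };
}

// Example: an app declares three commands but handles only two of them,
// and also handles one ("prev") that it never declared.
const drift = findCommandDrift(
  { getState: {}, next: {}, goTo: {} },
  ['getState', 'goTo', 'prev'],
);
```

Here drift.missingHandlers names the commands agents can see but cannot run, and drift.undeclared names the ones that run but cannot be discovered.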

Declaring the interface

The manifest's interface carries a plain-English description, the commands an agent may invoke, and the events the app emits.

interface WidgetInterface {
  description: string;
  commands?: Record<string, {
    description: string;
    args?: unknown;             // free-form JSON-schema-ish shape; agents read it, the host does not validate it
    returns?: unknown;
    streams?: readonly string[]; // event names this command emits while running
  }>;
  events?: Record<string, { description: string }>;
}

Example (a slide deck):

{
  "interface": {
    "description": "A pitch deck: fullscreen, navigable slides.",
    "commands": {
      "getState": { "description": "Return { index, total, slideId }.", "returns": "{ index, total, slideId }" },
      "next": { "description": "Advance one slide." },
      "goTo": { "description": "Jump to a slide.", "args": { "index": "number" } }
    },
    "events": {
      "deck.slide_view": { "description": "Fires when a slide becomes active; payload has { slide, slideIndex }." }
    }
  }
}

args and returns are descriptive shapes; the host does not validate payloads against them. They exist so an agent knows how to shape a call. streams lists the event names a long-running command emits (see streaming commands).
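Because the host passes payloads through unchecked, a handler may want to validate its own args before acting on them. The following is a minimal sketch for the goTo command. parseGoToArgs is a hypothetical helper, the zero-based index range is an assumption, and the { error } return shape is illustrative only, since this page does not cover how a handler should report a bad call.

```typescript
// Hypothetical guard for the goTo command's args. The host does not validate
// payloads, so the handler decides what counts as a valid call.
type GoToArgs = { index: number };

function parseGoToArgs(raw: unknown, total: number): GoToArgs | { error: string } {
  if (typeof raw !== 'object' || raw === null) {
    return { error: 'args must be an object' };
  }
  const index = (raw as { index?: unknown }).index;
  if (typeof index !== 'number' || !Number.isInteger(index)) {
    return { error: 'index must be an integer' };
  }
  // Assumption: slides are addressed by zero-based index.
  if (index < 0 || index >= total) {
    return { error: `index must be between 0 and ${total - 1}` };
  }
  return { index };
}

const ok = parseGoToArgs({ index: 2 }, 5);    // accepted
const bad = parseGoToArgs({ index: '2' }, 5); // rejected: index is a string
```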

Handling commands at runtime

Register a handler for each declared command with host.commands.handle:

await host.ready();

host.commands.handle('goTo', ({ index }: { index: number }) => {
  showSlide(index);
  return { ok: true, index };
});

host.commands.handle('getState', () => ({ index, total: slides.length, slideId: slides[index].id }));

See host.commands for the handler signature, the ctx argument, and teardown.
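To exercise handlers without a running host, a small stand-in can be enough. The sketch below assumes only the handle(name, fn) call shape shown above. FakeCommands and its invoke method are hypothetical test scaffolding, and the slide data is made up for the example.

```typescript
// Hypothetical in-memory stand-in for host.commands, for unit tests only.
// It mirrors just the handle(name, fn) call shown above; the real handler
// signature (including ctx and teardown) is documented under host.commands.
type Handler = (args: any) => unknown;

class FakeCommands {
  private handlers = new Map<string, Handler>();

  handle(name: string, fn: Handler): void {
    this.handlers.set(name, fn);
  }

  // Test-only dispatch: calls the registered handler with the given args.
  async invoke(name: string, args?: unknown): Promise<unknown> {
    const fn = this.handlers.get(name);
    if (!fn) throw new Error(`no handler registered for "${name}"`);
    return fn(args);
  }
}

// Deck state and handlers, registered against the fake.
const slides = [{ id: 'intro' }, { id: 'problem' }, { id: 'ask' }];
let index = 0;

const commands = new FakeCommands();
commands.handle('goTo', ({ index: target }: { index: number }) => {
  index = target;
  return { ok: true, index };
});
commands.handle('getState', () => ({ index, total: slides.length, slideId: slides[index].id }));
```

A test can then call commands.invoke('goTo', { index: 2 }) followed by commands.invoke('getState') and compare the result with the expected state.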

Streaming commands

A long-running command returns a runId immediately and reports progress by emitting events through host.emit. List those event names in the command's streams so the agent can subscribe before invoking:

{ "render": { "description": "Render the scene.", "streams": ["render.progress", "render.done"] } }

host.commands.handle('render', async ({ scene }) => {
  const runId = crypto.randomUUID();
  // Start the work in the background so the handler can return right away.
  (async () => {
    for await (const frame of renderScene(scene)) {
      await host.emit({ type: 'render.progress', runId, frame });
    }
    await host.emit({ type: 'render.done', runId });
  })().catch(console.error); // log failures instead of leaving an unhandled rejection
  return { runId };
});
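On the agent side, the runId is what ties events back to a particular invocation, which matters when two runs overlap. The sketch below assumes events arrive with the fields emitted in the handler above and have already been collected into an array. How an agent actually receives events is not covered on this page, and trackRun is a hypothetical helper.

```typescript
// Assumed event shape: the fields emitted by the render handler above.
interface RunEvent {
  type: string;
  runId: string;
  frame?: unknown;
}

// Hypothetical agent-side helper: summarizes the progress of one run,
// ignoring events that belong to other runs.
function trackRun(runId: string, events: readonly RunEvent[]): { frames: number; done: boolean } {
  let frames = 0;
  let done = false;
  for (const event of events) {
    if (event.runId !== runId) continue; // another run's event
    if (event.type === 'render.progress') frames += 1;
    if (event.type === 'render.done') {
      done = true;
      break;
    }
  }
  return { frames, done };
}

// Two interleaved runs; only run-a's events are counted.
const progress = trackRun('run-a', [
  { type: 'render.progress', runId: 'run-a', frame: 0 },
  { type: 'render.progress', runId: 'run-b', frame: 0 },
  { type: 'render.progress', runId: 'run-a', frame: 1 },
  { type: 'render.done', runId: 'run-a' },
]);
```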

Built-in commands

Every app answers a set of built-in commands without registering anything. Their names are prefixed with __:

| Command | Where it runs | What it returns |
| --- | --- | --- |
| __describe | Command bus | The app's declared interface (commands + events). Handy for list -> __describe -> invoke ordering. |
| __getState | Command bus | The workspace item DTO snapshot (type, state, geometry). Read-only. |
| __screenshot | In the app | A PNG data URL of the app DOM (the host.screenshot path). |
| __record | In the app | Records a canvas or video app to a stored clip ({ durationMs, fps?, selector? }), returns a content ref. |
| __capture | Host-side | Region Capture of the app iframe: taint-free real-app footage. Needs the user to have started a capture session on the tab. PNG, or a WebM clip with durationMs. |
__describe and __getState are answered bus-side and work without a live tab. __screenshot and __record run inside the app; __capture runs on the host. (A __reload built-in also reloads the app iframe to pick up bundle edits.)
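An agent-side wrapper might use these rules as a pre-flight check before invoking. This is a sketch built only from the names and behavior listed above; isBuiltin and worksWithoutLiveTab are hypothetical helpers, not host functions.

```typescript
// Built-ins answered bus-side, per the list above; these work without a live tab.
const BUS_SIDE_BUILTINS: ReadonlySet<string> = new Set(['__describe', '__getState']);

// Built-in command names carry the __ prefix.
function isBuiltin(name: string): boolean {
  return name.slice(0, 2) === '__';
}

// Hypothetical pre-flight check: can this call be answered without an open tab?
function worksWithoutLiveTab(name: string): boolean {
  return BUS_SIDE_BUILTINS.has(name);
}
```

For example, worksWithoutLiveTab('__getState') is true, while worksWithoutLiveTab('__screenshot') is false because that built-in runs inside the app.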

How agents invoke

Agents discover the surface with workspace_widgets_list(workspaceId), which returns each app on the workspace and the commands it understands. An entry is flagged live when the app is open in one of the agent's tabs and therefore invokable now.

To run a command, the agent calls:

workspace_widgets_invoke(workspaceId, itemId, name, args?, timeoutMs?)

name is a declared command or a built-in. Apart from the bus-side built-ins (__describe and __getState), the call requires the app to be open in a browser tab; the result is the command's return value. See MCP build tools for the tooling, and What's ready for the current state of durable, multi-tab orchestration.
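The discover-then-invoke order can be sketched as a small wrapper for declared commands. The two tool names and the invoke parameter order come from this page; everything else is an assumption made for illustration. In particular, the list entry fields (itemId, commands, live), the Promise-returning signatures, and the invokeDeclared helper are hypothetical, and the real result shapes may differ.

```typescript
// Assumed shapes: this page does not spell out the list result, so the entry
// fields below (itemId, commands, live) are illustrative guesses.
interface WidgetEntry {
  itemId: string;
  commands: string[];
  live: boolean;
}

// The two tools, injected so the wrapper can run against a fake in tests.
interface WorkspaceTools {
  workspace_widgets_list(workspaceId: string): Promise<WidgetEntry[]>;
  workspace_widgets_invoke(
    workspaceId: string,
    itemId: string,
    name: string,
    args?: unknown,
    timeoutMs?: number,
  ): Promise<unknown>;
}

// Discover first, then invoke only when the app is live and declares the command.
// Built-ins are left out of this sketch for brevity.
async function invokeDeclared(
  tools: WorkspaceTools,
  workspaceId: string,
  itemId: string,
  name: string,
  args?: unknown,
): Promise<unknown> {
  const entries = await tools.workspace_widgets_list(workspaceId);
  const entry = entries.find((e) => e.itemId === itemId);
  if (!entry) throw new Error(`no app ${itemId} on workspace ${workspaceId}`);
  if (!entry.live) throw new Error(`app ${itemId} is not open in a tab`);
  if (entry.commands.indexOf(name) === -1) throw new Error(`app ${itemId} does not declare ${name}`);
  return tools.workspace_widgets_invoke(workspaceId, itemId, name, args);
}
```

Checking the live flag before invoking turns a failed or timed-out call into an immediate, descriptive error.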
