02/08/2026 - OpenClaw Architecture Deep Dive

# 02/08/2026 - OpenClaw Architecture Deep Dive

Generated by Opus 4.6 with a one-shot prompt.

> **Repository:** [github.com/openclaw/openclaw](https://github.com/openclaw/openclaw)
> **Stars:** ~175k | **License:** Open Source | **Runtime:** Node >= 22 (TypeScript)
> **Tagline:** "Your own personal AI assistant. Any OS. Any Platform. The lobster way."

---

## Table of Contents

[TOC]

---

## 1. High-Level Architecture Overview

OpenClaw is a **self-hosted, personal AI assistant** that you run on your own devices. It is not a cloud service; it is a local-first daemon that connects to the messaging platforms you already use (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Microsoft Teams, and many more) and routes inbound messages through an embedded AI agent runtime that can think, call tools, execute code, browse the web, control a visual canvas, and reply back through the same channel.

### The Core Mental Model

```
  Messaging Channels                    Device Nodes
  (WhatsApp, Telegram, Slack,           (macOS app, iOS, Android,
   Discord, Signal, iMessage,            headless node hosts)
   Teams, Matrix, WebChat, ...)
           │                                   │
           ▼                                   ▼
┌─────────────────────────────────────────────────────────┐
│                         GATEWAY                         │
│                (WebSocket Control Plane)                │
│                  ws://127.0.0.1:18789                   │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐   │
│  │ Sessions │  │ Channels │  │  Router  │  │ Config │   │
│  │ Manager  │  │ Manager  │  │ (bindings│  │ Store  │   │
│  └──────────┘  └──────────┘  │ + agents)│  └────────┘   │
│                              └──────────┘               │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐   │
│  │ Agent    │  │ Tools    │  │ Cron/    │  │ Plugin │   │
│  │ Runtime  │  │ Engine   │  │ Webhooks │  │ Loader │   │
│  │ (Pi-mono)│  │          │  │          │  │        │   │
│  └──────────┘  └──────────┘  └──────────┘  └────────┘   │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ Security │  │ Pairing  │  │ Canvas   │               │
│  │ + Auth   │  │ Store    │  │ Host     │               │
│  └──────────┘  └──────────┘  └──────────┘               │
└─────────────────────────────────────────────────────────┘
           │
           ├── Pi Agent Runtime (embedded, RPC-style)
           ├── CLI surface (openclaw ...)
           ├── Control UI + WebChat (served from Gateway)
           ├── macOS menu bar app
           └── iOS / Android companion nodes
```

The **Gateway** is the single, long-lived daemon that owns everything: all messaging connections, the agent runtime, session state, tool dispatch, cron jobs, webhooks, the WebSocket API, and the web-based Control UI. Everything else (the CLI, the macOS app, the iOS/Android nodes, the WebChat UI) is a **client** that connects to the Gateway over WebSocket.

### Design Philosophy

1. **Local-first:** The Gateway runs on your machine (or a small Linux server). Your data never leaves your infrastructure unless you explicitly configure cloud channels.
2. **Single control plane:** One Gateway per host. It is the sole owner of messaging sessions (e.g., the WhatsApp Baileys session), which prevents conflicts.
3. **Channel-agnostic:** The agent doesn't know or care which messaging platform a message came from. Channels are adapters that normalize inbound/outbound messages.
4. **Tool-rich:** The agent has first-class access to shell execution, file I/O, browser automation, device cameras, screen recording, canvas rendering, cron scheduling, and cross-session agent-to-agent messaging.
5. **Security-conscious:** OpenClaw treats every inbound DM as untrusted input. DM pairing, allowlists, sandboxing, and tool policies are layered defenses.

### Technology Stack

- **Language:** TypeScript (Node.js >= 22)
- **Build:** pnpm monorepo with tsdown bundler
- **Testing:** Vitest (unit, integration, e2e, gateway, extensions, live)
- **Protocol:** WebSocket with JSON text frames; TypeBox schemas for type safety
- **Channels:** Baileys (WhatsApp), grammY (Telegram), Bolt (Slack), discord.js (Discord), signal-cli, and many more via plugins
- **Agent Core:** Derived from pi-mono (the Pi coding agent runtime)
- **Companion Apps:** Swift (macOS/iOS), Kotlin (Android)

---

## 2. The Gateway: WebSocket Control Plane

The Gateway is the beating heart of OpenClaw. It is a single Node.js process that multiplexes WebSocket and HTTP on one port (default `18789`), bound to loopback by default for security.

### What the Gateway Owns

- **Provider connections** to all configured messaging channels (WhatsApp, Telegram, Slack, etc.)
- A **typed WebSocket API** with request/response and server-push event patterns
- **Session state** for all agents (stored as JSONL transcripts on disk)
- **Agent execution:** the embedded pi-mono agent runtime runs within the Gateway process
- **Tool dispatch:** tools like `exec`, `browser`, `canvas`, `cron` are registered and executed here
- **Cron scheduling** and **webhook ingestion**
- **Device pairing** and **node management**
- **The Control UI** and **WebChat:** served as static assets directly from the Gateway's HTTP surface

### Wire Protocol

The Gateway uses a **JSON-over-WebSocket** protocol. Communication follows a strict lifecycle:

1. **Handshake:** The first frame from any client **must** be a `connect` request. It includes device identity, auth credentials, and role (`operator` or `node`).
2. **Request/Response:** After handshake, clients send `{type: "req", id, method, params}` and receive `{type: "res", id, ok, payload|error}`.
3. **Server-Push Events:** The Gateway pushes events like `{type: "event", event, payload, seq?, stateVersion?}` for agent streaming, presence, health, heartbeats, and cron.
4. **Auth:** If `OPENCLAW_GATEWAY_TOKEN` or a password is configured, `connect.params.auth.token` must match or the socket is closed immediately.
5. **Idempotency:** Side-effecting methods (`send`, `agent`) require idempotency keys; the server maintains a short-lived dedupe cache.
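The frame shapes and the idempotency rule above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not OpenClaw's actual implementation: the type names (`ReqFrame`, `ResFrame`, `EventFrame`), the `idempotencyKey` parameter name, the string-valued `error`, the `DedupeCache` class, and its TTL are all hypothetical. Only the `type`, `id`, `method`, `params`, `ok`, `payload`, `event`, `seq`, and `stateVersion` fields come from the description.

```typescript
// Frame shapes from the lifecycle description; the type names are hypothetical.
type ReqFrame = { type: "req"; id: string; method: string; params?: Record<string, unknown> };
type ResFrame =
  | { type: "res"; id: string; ok: true; payload: unknown }
  | { type: "res"; id: string; ok: false; error: string }; // error shape is an assumption
type EventFrame = { type: "event"; event: string; payload: unknown; seq?: number; stateVersion?: number };

// Hypothetical short-lived dedupe cache keyed by idempotency key.
class DedupeCache {
  private entries = new Map<string, { res: ResFrame; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs; // the real TTL is not documented here
  }

  get(key: string, now: number): ResFrame | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired entries are dropped lazily
      return undefined;
    }
    return hit.res;
  }

  set(key: string, res: ResFrame, now: number): void {
    this.entries.set(key, { res, expiresAt: now + this.ttlMs });
  }
}

// Methods the text names as side-effecting.
const SIDE_EFFECTING = new Set<string>(["send", "agent"]);

// Re-address a cached response to the id of the replayed request.
function withId(res: ResFrame, id: string): ResFrame {
  return res.ok
    ? { type: "res", id, ok: true, payload: res.payload }
    : { type: "res", id, ok: false, error: res.error };
}

// Run a request at most once per idempotency key while the cache entry lives.
function handleRequest(
  req: ReqFrame,
  cache: DedupeCache,
  run: (req: ReqFrame) => unknown,
  now: number,
): ResFrame {
  if (!SIDE_EFFECTING.has(req.method)) {
    return { type: "res", id: req.id, ok: true, payload: run(req) };
  }
  const key = req.params?.["idempotencyKey"]; // hypothetical parameter name
  if (typeof key !== "string" || key.length === 0) {
    return { type: "res", id: req.id, ok: false, error: "idempotency key required" };
  }
  const cached = cache.get(key, now);
  if (cached) return withId(cached, req.id);
  const res: ResFrame = { type: "res", id: req.id, ok: true, payload: run(req) };
  cache.set(key, res, now);
  return res;
}
```

Passing `now` explicitly keeps the sketch deterministic; a real server would presumably read a clock and also bound the cache size.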
### Connection Lifecycle

```
Client                               Gateway
  |                                     |
  |---- req:connect ------------------>|
  |<------ res (ok) -------------------|   (hello-ok carries snapshot: presence + health)
  |                                     |
  |<------ event:presence -------------|
  |<------ event:tick -----------------|
  |                                     |
  |------- req:agent ----------------->|
  |<------ res:agent -------------------|  (ack: {runId, status:"accepted"})
  |<------ event:agent ----------------|   (streaming: assistant deltas, tool events)
  |<------ res:agent -------------------|  (final: {runId, status, summary})
  |                                     |
```

### Protocol Typing and Codegen

The protocol is defined using **TypeBox** schemas, from which:

- **JSON Schema** is generated for runtime validation
- **Swift models** are generated for the macOS/iOS companion apps
- Both clients and server share the same type definitions

### Pairing and Local Trust

All WebSocket clients include a **device identity** on connect. New device IDs trigger a pairing approval flow. Local connects (from loopback) can be auto-approved, while non-local connects require explicit approval and challenge-nonce signing. This is separate from Gateway auth (token/password), which applies to all connections.

### Operations

- **Start:** `openclaw gateway` (foreground) or install as a daemon (`openclaw onboard --install-daemon`)
- **Health:** `health` method over WS, also included in the `hello-ok` handshake response
- **Supervision:** launchd (macOS) or systemd (Linux) user service for auto-restart
- **Doctor:** `openclaw doctor` checks for common misconfigurations and can auto-fix some issues

---

## 3. Agent Runtime & Agent Loop

### Pi-Mono Foundation

OpenClaw's agent runtime is derived from **pi-mono**, the open-source Pi coding agent. However, OpenClaw owns its own session management, discovery, and tool wiring; it does not use Pi's session folders or agent settings. The agent runs in **embedded RPC mode** within the Gateway process.
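In the connection lifecycle for the Gateway, one `agent` request yields two `res` frames (an ack, then a final result) with streamed `agent` events in between. A client talking to this RPC-style runtime might track a run roughly as follows. This is a sketch under assumptions: the `RunTracker` class and the `text` field on events are hypothetical, while `runId`, `status`, `summary`, and the `stream` values `assistant`, `tool`, and `lifecycle` are taken from this document.

```typescript
// Payload shapes inferred from the lifecycle diagram; type names are hypothetical.
type AgentAck = { runId: string; status: "accepted" };
type AgentFinal = { runId: string; status: string; summary: string };
type AgentEvent = {
  runId: string;
  stream: "assistant" | "tool" | "lifecycle";
  text?: string; // assumed field carrying an assistant delta
};

type RunState = {
  phase: "accepted" | "done";
  assistantText: string;
  toolEvents: number;
  final?: AgentFinal;
};

class RunTracker {
  private runs = new Map<string, RunState>();

  // Handles both res:agent frames: the ack opens a run, the final closes it.
  onResponse(payload: AgentAck | AgentFinal): void {
    if ("summary" in payload) {
      const run = this.runs.get(payload.runId);
      if (run) {
        run.phase = "done";
        run.final = payload;
      }
      return;
    }
    this.runs.set(payload.runId, { phase: "accepted", assistantText: "", toolEvents: 0 });
  }

  // Accumulates streamed events for runs that were acked and are not finished.
  onEvent(ev: AgentEvent): void {
    const run = this.runs.get(ev.runId);
    if (!run || run.phase === "done") return;
    if (ev.stream === "assistant") {
      run.assistantText += ev.text ?? "";
    } else if (ev.stream === "tool") {
      run.toolEvents += 1;
    }
  }

  get(runId: string): RunState | undefined {
    return this.runs.get(runId);
  }
}
```

Events for unknown or finished runs are ignored here; whether a real client should buffer early events instead is a design choice this sketch does not settle.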
### The Agent Workspace

Every agent has a **workspace directory** (default: `~/.openclaw/workspace`) that serves as the agent's working directory for all tool execution. The workspace contains user-editable bootstrap files that are injected into the agent's context on the first turn of each session:

| File | Purpose |
|------|---------|
| `AGENTS.md` | Operating instructions and persistent "memory" notes |
| `SOUL.md` | Persona definition: boundaries, tone, personality |
| `TOOLS.md` | User-maintained notes on tool usage and conventions |
| `BOOTSTRAP.md` | One-time first-run ritual (deleted after completion) |
| `IDENTITY.md` | Agent name, vibe, emoji |
| `USER.md` | User profile and preferences |

These files give the user fine-grained control over the agent's behavior without modifying any code. Large files are automatically trimmed with markers so prompts stay within token limits.

### The Agent Loop (End-to-End)

An agent loop is a single, serialized execution that turns an inbound message into tool calls, reasoning, and a final reply. Here is the full lifecycle:

1. **Entry:** The `agent` RPC validates parameters, resolves the session (by key or ID), persists session metadata, and returns `{runId, acceptedAt}` immediately.
2. **Preparation:**
   - Resolve model + thinking/verbose defaults
   - Load skills snapshot
   - Acquire a per-session write lock
   - Resolve and create workspace; sandboxed runs may redirect to an isolated workspace
   - Load bootstrap/context files
3. **Prompt Assembly:**
   - Build system prompt from: base prompt + skills prompt + bootstrap context + per-run overrides
   - Enforce model-specific limits and compaction reserve tokens
4. **Execution (`runEmbeddedPiAgent`):**
   - Serialize the run through per-session + global queues (prevents races)
   - Resolve model + auth profile and build the Pi session
   - Subscribe to pi-mono events and stream assistant/tool deltas
   - Enforce timeout: abort if exceeded (default: 600 seconds)
   - The agent can: think, call tools (shell, files, browser, canvas, etc.), and produce text output
5. **Streaming:**
   - `subscribeEmbeddedPiSession` bridges pi-mono events to OpenClaw's event system:
     - Tool events → `stream: "tool"`
     - Assistant deltas → `stream: "assistant"`
     - Lifecycle events → `stream: "lifecycle"` (`phase: "start" | "end" | "error"`)
   - Block streaming can emit partial replies as the agent writes
6. **Reply Shaping:**
   - Assemble final payloads from assistant text, tool summaries, and error text
   - Filter `NO_REPLY` tokens (silent completions)
   - Deduplicate messaging tool sends
   - If no renderable payloads remain and a tool errored, emit a fallback error reply
7. **Compaction & Retries:**
   - Auto-compaction triggers when context approaches limits
   - On retry, in-memory buffers and tool summaries are reset

### Steering While Streaming

When queue mode is `steer`, inbound messages during an active run are injected into the current execution. The queue is checked **after each tool call**: if a queued message is present, remaining tool calls are skipped, and the queued message is injected before the next assistant response. This allows real-time course correction of long-running agent tasks.

### Hook Points

OpenClaw provides two hook systems for intercepting the agent lifecycle:

- **Gateway hooks:** Event-driven scripts for commands and lifecycle events (`agent:bootstrap`, `/new`, `/reset`, etc.)
- **Plugin hooks:** Extension points inside the agent/tool lifecycle: `before_agent_start`, `agent_end`, `before_tool_call`, `after_tool_call`, `message_received`, `message_sending`, `session_start`, `session_end`, `gateway_start`, `gateway_stop`, and more.

---

## 4. Session Management & Memory

### Session Model

OpenClaw's session model is central to how conversations are tracked, isolated, and persisted.

**Core principle:** One direct-chat session per agent is primary. All DMs collapse to `agent:<agentId>:<mainKey>` (default key: `main`), while group/channel chats get isolated session keys.

### DM Scoping

The `session.dmScope` setting controls how direct messages are grouped:

| Scope | Behavior |
|-------|----------|
| `main` (default) | All DMs share the main session for continuity across devices/channels |
| `per-peer` | Isolate by sender ID across channels |
| `per-channel-peer` | Isolate by channel + sender (recommended for multi-user inboxes) |
| `per-account-channel-peer` | Isolate by account + channel + sender (for multi-account inboxes) |

**Identity links** allow the same person to share a session across channels by mapping provider-prefixed peer IDs (e.g., `telegram:123456`) to a canonical identity.

### Session Key Mapping

| Source | Key Pattern |
|--------|-------------|
| Direct chat (main) | `agent:<agentId>:<mainKey>` |
| Direct chat (per-peer) | `agent:<agentId>:dm:<peerId>` |
| Direct chat (per-channel-peer) | `agent:<agentId>:<channel>:dm:<peerId>` |
| Group chat | `agent:<agentId>:<channel>:group:<id>` |
| Telegram forum topic | `...:topic:<threadId>` appended to group key |
| Cron job | `cron:<job.id>` |
| Webhook | `hook:<uuid>` |
| Node run | `node-<nodeId>` |

### Where State Lives

All session state is **owned by the Gateway**:

- **Store file:** `~/.openclaw/agents/<agentId>/sessions/sessions.json`, a map of `sessionKey → { sessionId, updatedAt, ... }`
- **Transcripts:** `~/.openclaw/agents/<agentId>/sessions/<SessionId>.jsonl`, the full conversation history in JSON Lines format
- UI clients must query the Gateway for session lists and token counts; they do not parse transcripts directly

### Session Lifecycle

- **Daily reset** (default): Sessions expire at 4:00 AM local time. A session is stale once its last update predates the most recent daily reset.
- **Idle reset** (optional): A sliding idle window in minutes. When both daily and idle resets are configured, whichever expires first forces a new session.
- **Per-type overrides:** Different policies for `dm`, `group`, and `thread` session types.
- **Per-channel overrides:** Different policies per channel.
- **Manual reset:** `/new` or `/reset` commands start a fresh session. `/new <model>` accepts a model alias to switch models.

### Session Pruning

OpenClaw trims **old tool results** from in-memory context before LLM calls. This does not rewrite JSONL history; it is a view-layer optimization that keeps the context window focused on recent, relevant information.

### Compaction

When a session nears auto-compaction thresholds, OpenClaw can run a **silent memory flush** turn that reminds the model to write durable notes to disk before the context is summarized and compressed.

### Memory

OpenClaw supports pluggable memory backends:

- **memory-core**: Built-in memory search plugin (default)
- **memory-lancedb**: Long-term memory with auto-recall and capture using LanceDB for vector storage

---

## 5. Chat Channels: Multi-Platform Messaging

OpenClaw's channel system supports a large number of messaging platforms through a unified adapter architecture.
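Because every adapter normalizes to one message shape, the session key patterns from Section 4 can be computed without knowing which platform a message came from. The sketch below illustrates that idea. The field names on `InboundMessage` (`channel`, `chatType`, `senderId`, `groupId`, `text`) and the function `sessionKeyFor` are assumptions made for illustration; the key patterns themselves come from the Session Key Mapping table. The `per-account-channel-peer` scope is left out because the table does not list its pattern.

```typescript
// DM scopes whose key patterns appear in the Session Key Mapping table.
type DmScope = "main" | "per-peer" | "per-channel-peer";

// Hypothetical normalized message; the real InboundMessage fields may differ.
interface InboundMessage {
  channel: string; // e.g. "whatsapp", "telegram"
  chatType: "direct" | "group";
  senderId: string;
  groupId?: string;
  text: string;
}

// Derive the session key for a normalized message using the documented patterns.
function sessionKeyFor(
  msg: InboundMessage,
  agentId: string,
  dmScope: DmScope = "main",
  mainKey: string = "main",
): string {
  if (msg.chatType === "group") {
    if (!msg.groupId) throw new Error("group message without groupId");
    return `agent:${agentId}:${msg.channel}:group:${msg.groupId}`;
  }
  if (dmScope === "main") {
    // All DMs collapse into the agent's main session.
    return `agent:${agentId}:${mainKey}`;
  }
  if (dmScope === "per-peer") {
    // Same sender shares a session across channels.
    return `agent:${agentId}:dm:${msg.senderId}`;
  }
  // per-channel-peer: isolate by channel and sender.
  return `agent:${agentId}:${msg.channel}:dm:${msg.senderId}`;
}
```

With the default `main` scope, a WhatsApp DM and a Telegram DM from different people land in the same session, which is why the table recommends `per-channel-peer` for multi-user inboxes.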
### Supported Channels

#### Built-in (Core)

| Channel | Library/Protocol | Notes |
|---------|-----------------|-------|
| **WhatsApp** | Baileys | QR pairing; most popular channel |
| **Telegram** | grammY (Bot API) | Simple bot token setup; supports groups and forum topics |
| **Discord** | discord.js | Bot API + Gateway; servers, channels, DMs, threads |
| **Slack** | Bolt SDK | Workspace apps; Socket Mode |
| **Signal** | signal-cli | Privacy-focused; requires external signal-cli binary |
| **iMessage (legacy)** | imsg CLI | macOS-only; deprecated in favor of BlueBubbles |
| **WebChat** | Gateway WS | Built-in; no separate configuration needed |

#### Plugin/Extension Channels

| Channel | Notes |
|---------|-------|
| **BlueBubbles** | Recommended iMessage integration via REST API |
| **Google Chat** | Google Chat API webhook app |
| **Microsoft Teams** | Bot Framework; enterprise support |
| **Matrix** | Matrix protocol |
| **Feishu/Lark** | WebSocket bot |
| **LINE** | LINE Messaging API |
| **Mattermost** | Bot API + WebSocket |
| **Nextcloud Talk** | Self-hosted chat |
| **Nostr** | Decentralized DMs via NIP-04 |
| **Tlon** | Urbit-based messenger |
| **Twitch** | IRC connection |
| **Zalo** | Vietnam's popular messenger (OA Bot API) |
| **Zalo Personal** | Personal account via QR login |

### Channel Architecture

Every channel implements a common adapter interface:

1. **Config:** `listAccountIds()` + `resolveAccount()` to enumerate and resolve accounts
2. **Capabilities:** Declare supported chat types (direct, group), media support, threading, etc.
3. **Outbound:** `sendText()` + optional `sendMedia()`, `sendReaction()`, `editMessage()`, etc.
4. **Inbound:** Normalize incoming messages into a common `InboundMessage` format with sender, content, media, and routing metadata
5. **Security:** DM policy enforcement (pairing/allowlist/open/disabled)
6. **Optional adapters:** Setup wizard, mentions, threading, streaming, native commands

### Multi-Account Support

Channels like WhatsApp support **multiple accounts** via `accountId`. Each account can be independently configured and routed to different agents:

```json5
{
  channels: {
    whatsapp: {
      accounts: {
        personal: { /* config */ },
        business: { /* config */ },
      },
    },
  },
}
```

### Group Behavior

- Group chats get isolated session keys (`agent:<agentId>:<channel>:group:<id>`)
- **Mention gating:** The bot only responds when @mentioned (configurable per channel)
- **Group allowlists:** Restrict which groups the bot will participate in
- **Group policy:** `allowlist` (default) or `open` (per-group sender filtering available)
- **Broadcast groups:** Send one message to multiple groups simultaneously

### DM Safety

All DM-capable channels enforce a DM policy **before** message processing:

- `pairing` (default): Unknown senders get a short pairing code; bot ignores the message until approved
- `allowlist`: Unknown senders are silently blocked
- `open`: Anyone can DM (requires explicit `"*"` in allowlist as a deliberate opt-in)
- `disabled`: Ignore all DMs

---

## 6. Tools System

OpenClaw exposes a rich set of **first-class agent tools**: typed function calls that the agent can invoke directly, with no shell scripting required.

### Tool Inventory

#### Filesystem & Runtime

| Tool | Description |
|------|-------------|
| `exec` | Run shell commands in the workspace. Supports background execution, timeouts, PTY mode, elevated mode, and routing to sandbox/gateway/node hosts. |
| `process` | Manage background exec sessions (list, poll, log, write, kill, clear). |
| `read` / `write` / `edit` | File I/O within the workspace. |
| `apply_patch` | Apply structured multi-hunk patches across files (experimental, OpenAI models only). |

#### Web & Search

| Tool | Description |
|------|-------------|
| `web_search` | Brave Search API integration with caching. |
| `web_fetch` | Fetch and extract readable content from URLs (HTML → markdown). |
| `browser` | Full browser automation via OpenClaw-managed Chrome/Chromium (CDP). |

#### Messaging & Coordination

| Tool | Description |
|------|-------------|
| `message` | Send messages and channel actions across all platforms (send, react, edit, delete, pin, thread, search, polls, member/role management). |
| `sessions_list` | Discover active sessions and their metadata. |
| `sessions_history` | Fetch transcript logs for any session. |
| `sessions_send` | Message another session with optional reply-back ping-pong. |
| `sessions_spawn` | Start a sub-agent run (non-blocking). |
| `session_status` | Check current session status and model info. |
| `agents_list` | List agent IDs available for sub-agent targeting. |

#### Device & Media

| Tool | Description |
|------|-------------|
| `nodes` | Discover and target paired nodes; send notifications; capture camera/screen; get location. |
| `canvas` | Drive the node Canvas: present HTML, evaluate JS, A2UI push, snapshot. |
| `image` | Analyze images with the configured image model. |

#### Automation

| Tool | Description |
|------|-------------|
| `cron` | Manage Gateway cron jobs and wakeups (add, update, remove, run, list). |
| `gateway` | Restart, apply config updates, or trigger updates to the running Gateway. |

### Tool Policy System

OpenClaw provides a sophisticated, layered tool policy system:

1. **Tool profiles** set a base allowlist: `minimal`, `coding`, `messaging`, or `full`
2. **Provider-specific policies** can further restrict tools for specific model providers or individual models
3. **Global allow/deny lists** (`tools.allow` / `tools.deny`); deny always wins
4. **Per-agent overrides:** each agent can have its own tool profile and allow/deny
5. **Tool groups:** shorthand references like `group:fs`, `group:runtime`, `group:sessions`, `group:web`, `group:ui`, `group:automation`, `group:messaging`, `group:nodes`

Example: A support agent with messaging-only tools plus Slack and Discord actions:

```json5
{
  tools: {
    profile: "messaging",
    allow: ["slack", "discord"],
  },
}
```

### How Tools Are Presented to the Agent

Tools are exposed in **two parallel channels**:

1. **System prompt text:** A human-readable list with usage guidance
2. **Tool schema:** Structured function definitions sent to the model API

Both channels must include a tool for the model to be able to call it.

---

## 7. Nodes: Device Peripherals

Nodes are **companion devices** that extend the Gateway's reach to physical hardware. A node connects to the same Gateway WebSocket (port 18789) with `role: "node"` and exposes a command surface for device-local actions.

### What Nodes Can Do

| Command | Description |
|---------|-------------|
| `canvas.present` / `canvas.hide` / `canvas.eval` / `canvas.snapshot` | Control and capture the on-device visual canvas |
| `canvas.a2ui_push` / `canvas.a2ui_reset` | Push A2UI components to the canvas |
| `camera.snap` / `camera.clip` | Take photos or record video clips (front/back camera) |
| `screen.record` | Record the device screen (up to 60 seconds) |
| `location.get` | Get GPS coordinates (lat/lon/accuracy/timestamp) |
| `system.run` | Execute shell commands on the node (with exec approvals) |
| `system.notify` | Send OS notifications |
| `sms.send` | Send SMS (Android only, with permission) |

### Node Types

| Type | Description |
|------|-------------|
| **macOS app (node mode)** | The menu bar app connects as a node, exposing canvas/camera/system commands |
| **iOS node** | Canvas, Voice Wake, Talk Mode, camera, screen recording, Bonjour pairing |
| **Android node** | Canvas, Talk Mode, camera, screen recording, optional SMS |
| **Headless node host** | Cross-platform (Linux/Windows/macOS); exposes `system.run` / `system.which` for remote execution |

### What Runs Where

This is a critical architectural distinction:

- **Gateway host:** Receives messages, runs the model, routes tool calls. `exec` runs here by default.
- **Node host:** Executes device-local actions (`system.run`, camera, screen, location, notifications) via `node.invoke`.

In short: **the agent's brain lives on the Gateway; device capabilities live on nodes.**

### Pairing & Security

Nodes use device pairing: they present a device identity during WebSocket connect, and the Gateway creates a pairing request for approval. Exec approvals on nodes are enforced locally via `~/.openclaw/exec-approvals.json`, with three security levels:

- `deny`: Block all execution
- `allowlist`: Only allow pre-approved commands
- `full`: Allow any command (dangerous)

---

## 8. Multi-Agent Routing

OpenClaw can host **multiple fully isolated agents** within a single Gateway process. Each agent is a complete "brain" with its own workspace, state directory, auth profiles, session store, and persona.

### What Defines an Agent

| Component | Location |
|-----------|----------|
| Workspace | `~/.openclaw/workspace-<agentId>` |
| State directory | `~/.openclaw/agents/<agentId>/agent` |
| Session store | `~/.openclaw/agents/<agentId>/sessions` |
| Auth profiles | `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` |
| Skills | Per-agent via `<workspace>/skills/`, shared via `~/.openclaw/skills` |

### Routing via Bindings

Inbound messages are routed to agents via **bindings**: declarative rules that match on channel, account, peer (DM/group/channel ID), guild (Discord), or team (Slack). Most-specific match wins:

1. `peer` match (exact DM/group/channel ID), highest priority
2. `guildId` (Discord)
3. `teamId` (Slack)
4. `accountId` match for a channel
5. Channel-level match (`accountId: "*"`)
6. Fallback to default agent

### Use Cases

**Two WhatsApp numbers, two agents:**

```json5
{
  agents: {
    list: [
      { id: "home", workspace: "~/.openclaw/workspace-home" },
      { id: "work", workspace: "~/.openclaw/workspace-work" },
    ],
  },
  bindings: [
    { agentId: "home", match: { channel: "whatsapp", accountId: "personal" } },
    { agentId: "work", match: { channel: "whatsapp", accountId: "biz" } },
  ],
}
```

**One WhatsApp number, different people get different agents:**

```json5
{
  bindings: [
    { agentId: "alex", match: { channel: "whatsapp", peer: { kind: "dm", id: "+15551230001" } } },
    { agentId: "mia", match: { channel: "whatsapp", peer: { kind: "dm", id: "+15551230002" } } },
  ],
}
```

**WhatsApp for casual chat, Telegram for deep work (different models):**

```json5
{
  agents: {
    list: [
      { id: "chat", model: "anthropic/claude-sonnet-4-5" },
      { id: "opus", model: "anthropic/claude-opus-4-6" },
    ],
  },
  bindings: [
    { agentId: "chat", match: { channel: "whatsapp" } },
    { agentId: "opus", match: { channel: "telegram" } },
  ],
}
```

### Per-Agent Sandboxing & Tool Policy

Each agent can have its own sandbox configuration and tool restrictions. A personal agent might have full host access, while a family agent in a shared group might be sandboxed with read-only tools:

```json5
{
  agents: {
    list: [
      { id: "personal", sandbox: { mode: "off" } },
      {
        id: "family",
        sandbox: { mode: "all", scope: "agent" },
        tools: { allow: ["read"], deny: ["exec", "write", "edit"] },
      },
    ],
  },
}
```

---

## 9. Skills & ClawHub Registry

### What Are Skills?

Skills are **AgentSkills-compatible** instruction bundles that teach the agent how to use specific tools and workflows. Each skill is a directory containing a `SKILL.md` file with YAML frontmatter and markdown instructions.
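To make the `SKILL.md` layout concrete, here is a minimal sketch of splitting such a file into its frontmatter fields and its markdown body. It is deliberately not a YAML parser: it only handles flat `key: value` lines and keeps every value as a raw string. The function name `parseSkillFile` and the `ParsedSkill` type are hypothetical, and OpenClaw's real loader may work quite differently.

```typescript
// Hypothetical result shape: raw frontmatter strings plus the instruction body.
interface ParsedSkill {
  frontmatter: Record<string, string>;
  body: string;
}

// Split a SKILL.md source into frontmatter (between the first two `---` lines) and body.
// Simplification: flat `key: value` pairs only; nested YAML is not interpreted.
function parseSkillFile(source: string): ParsedSkill {
  const lines = source.split(/\r?\n/);
  const first = lines[0];
  if (first === undefined || first.trim() !== "---") {
    return { frontmatter: {}, body: source };
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    // Unterminated frontmatter: treat the whole file as body.
    return { frontmatter: {}, body: source };
  }
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip blank or malformed lines
    const key = line.slice(0, colon).trim();
    frontmatter[key] = line.slice(colon + 1).trim();
  }
  return { frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}
```

Splitting on the first colon means a single-line JSON value (such as a `metadata` field) survives intact as a string and could be passed to `JSON.parse` afterwards.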
### Skill Locations (Precedence) | Location | Precedence | Description | |----------|------------|-------------| | Workspace skills (`<workspace>/skills/`) | Highest | Per-agent, user-owned | | Managed/local skills (`~/.openclaw/skills/`) | Medium | Shared across all agents on the machine | | Bundled skills (shipped with install) | Lowest | Come with the npm package or app | Name conflicts are resolved by precedence — workspace wins. ### Skill Format ```markdown --- name: nano-banana-pro description: Generate or edit images via Gemini 3 Pro Image metadata: {"openclaw":{"requires": {"bins": ["uv"], "env": ["GEMINI_API_KEY"]}, "primaryEnv": "GEMINI_API_KEY"}} --- Instructions for the agent on how to use this skill... Use `{baseDir}` to reference the skill folder path. ``` ### Load-Time Gating Skills are **filtered at load time** using metadata gates: - `requires.bins`: Required binaries on PATH - `requires.anyBins`: At least one must exist - `requires.env`: Required environment variables (or config-provided) - `requires.config`: Required truthy config paths in `openclaw.json` - `os`: Platform filter (`darwin`, `linux`, `win32`) - `always: true`: Skip all gates ### Environment Injection When an agent run starts, OpenClaw reads skill metadata, applies any configured environment variables or API keys to `process.env`, builds the system prompt with eligible skills, and restores the original environment after the run ends. This is scoped per-run, not global. ### ClawHub **ClawHub** ([clawhub.com](https://clawhub.com)) is the public skills registry. Users can discover, install, update, and sync skills: ```bash clawhub install <skill-slug> clawhub update --all clawhub sync --all ``` ### Token Impact Skills cost tokens in the system prompt. The overhead formula is: ``` total_chars = 195 + Σ(97 + len(name) + len(description) + len(location)) ``` At ~4 chars/token, each skill costs roughly 25+ tokens plus field lengths. --- ## 10. 
Plugin / Extension System ### Architecture Plugins are **TypeScript modules** loaded at runtime via jiti (just-in-time TypeScript execution). They run **in-process** with the Gateway, giving them full access to the Gateway API. ### What Plugins Can Register | Capability | Example | |-----------|---------| | Gateway RPC methods | `voicecall.start`, `voicecall.status` | | Gateway HTTP handlers | Custom webhook endpoints | | Agent tools | `voice_call` tool for the agent to use | | CLI commands | `openclaw voicecall start` | | Background services | Long-running processes within the Gateway | | Auto-reply commands | Slash commands that execute without the AI agent | | Skills | Bundled skill folders shipped with the plugin | | Hooks | Event-driven automation bundled with the plugin | | Messaging channels | Full channel adapters (see Channel Plugins below) | | Model provider auth | OAuth, API key, device code flows for model providers | ### Plugin Discovery & Precedence 1. Config paths (`plugins.load.paths`) 2. Workspace extensions (`<workspace>/.openclaw/extensions/`) 3. Global extensions (`~/.openclaw/extensions/`) 4. Bundled extensions (shipped with OpenClaw, **disabled by default**) Each plugin must include a `openclaw.plugin.json` manifest. ### Plugin Configuration ```json5 { plugins: { enabled: true, allow: ["voice-call"], deny: ["untrusted-plugin"], entries: { "voice-call": { enabled: true, config: { provider: "twilio" }, }, }, slots: { memory: "memory-core", // exclusive slot — only one memory plugin active }, }, } ``` ### Channel Plugins Plugins can register **full messaging channel adapters** that behave identically to built-in channels. The channel config lives under `channels.<id>` and the plugin provides: - Config resolution (account listing, account resolution) - Capabilities declaration - Outbound delivery (send text, media, reactions, etc.) 
- Inbound message normalization - Optional: setup wizard, security, status, mentions, threading, streaming, native commands ### Distribution Plugins are distributed as npm packages under `@openclaw/*`. Installation is straightforward: ```bash openclaw plugins install @openclaw/voice-call ``` The installer uses `npm pack`, extracts into `~/.openclaw/extensions/<id>/`, and enables the plugin in config. --- ## 11. Streaming, Chunking & Message Delivery OpenClaw has a sophisticated multi-layer streaming system for delivering agent responses to messaging channels. ### Two Streaming Layers 1. **Block streaming (all channels):** Emit completed blocks of text as the assistant writes. These are normal channel messages — not token deltas. 2. **Token-ish streaming (Telegram only):** Update a draft bubble with partial text during generation; the final message replaces it. There is **no real token streaming** to external channel messages. Telegram draft streaming is the only partial-stream surface. ### Block Streaming Pipeline ``` Model output └─ text_delta events ├─ (blockStreamingBreak=text_end) │ └─ chunker emits blocks as buffer grows └─ (blockStreamingBreak=message_end) └─ chunker flushes at message_end └─ channel send (block replies) ``` ### Chunking Algorithm The `EmbeddedBlockChunker` implements intelligent text splitting: - **Low bound:** Don't emit until buffer >= `minChars` - **High bound:** Prefer splits before `maxChars`; hard-break at `maxChars` if forced - **Break preference:** paragraph → newline → sentence → whitespace → hard break - **Code fence safety:** Never split inside code fences; when forced, close and reopen the fence to keep Markdown valid - **Channel caps:** `maxChars` is clamped to per-channel `textChunkLimit` ### Coalescing To reduce "single-line spam," consecutive block chunks can be merged before sending: - Wait for idle gaps (`idleMs`) before flushing - Buffers are capped by `maxChars` and flush if exceeded - `minChars` prevents tiny fragments 
from sending - Different defaults per channel (e.g., Signal/Slack/Discord default to 1500 char minimum) ### Human-Like Pacing Optional randomized pauses between block replies (800–2500ms in "natural" mode) make multi-bubble responses feel more human. ### Telegram Draft Streaming Telegram is special — it supports `sendMessageDraft` for updating a draft bubble in real-time: - `partial` mode: Update with the latest stream text - `block` mode: Update in chunked blocks - `off`: No draft streaming When draft streaming is active, block streaming is disabled for that reply to avoid double-streaming. --- ## 12. Command Queue & Concurrency ### Why a Queue? Multiple inbound messages arriving close together can cause expensive agent runs to collide — competing for session files, logs, and upstream rate limits. OpenClaw serializes inbound runs through a lane-aware FIFO queue. ### How It Works - A **lane-aware FIFO queue** drains each lane with configurable concurrency - **Per-session lanes** (`session:<key>`) guarantee only one active run per session - Each session run is also queued into a **global lane** (default `main`, concurrency cap via `agents.defaults.maxConcurrent`) - Additional lanes (`cron`, `subagent`) run in parallel without blocking inbound replies - Typing indicators fire immediately on enqueue for good UX ### Queue Modes | Mode | Behavior | |------|----------| | `collect` (default) | Coalesce all queued messages into a single followup turn | | `steer` | Inject immediately into the current run (cancels pending tool calls at next boundary) | | `followup` | Enqueue for the next agent turn after the current run ends | | `steer-backlog` | Steer now AND preserve for a followup turn | | `interrupt` (legacy) | Abort active run, run newest message | ### Queue Options - `debounceMs` (default 1000): Wait for quiet before starting a followup turn - `cap` (default 20): Max queued messages per session - `drop` (`old` / `new` / `summarize`): Overflow policy. 
`summarize` keeps a bullet list of dropped messages.

---

## 13. Security Model

### Threat Model

Running an AI agent with shell access is inherently risky. OpenClaw acknowledges this directly and takes a layered defense approach:

**The agent can:**

- Execute arbitrary shell commands
- Read/write files
- Access network services
- Send messages to anyone (if given channel access)

**Attackers can:**

- Send crafted messages to manipulate the agent (prompt injection)
- Social-engineer access to data
- Probe for infrastructure details

### Defense Layers

#### Layer 1: Identity (Who Can Talk to the Bot?)

- **DM pairing** (default): Unknown senders receive a pairing code; the bot ignores their messages until they are approved
- **Allowlists**: Explicit lists of allowed senders per channel
- **Group allowlists**: Restrict which groups the bot participates in
- **Mention gating**: Bot only responds when @mentioned in groups

#### Layer 2: Scope (Where Can the Bot Act?)

- **Tool policies**: Allow/deny lists per agent, per channel, per provider
- **Sandboxing**: Non-main sessions can run in per-session Docker containers
- **Exec approvals**: Node-level allowlists for shell commands
- **Elevated mode**: Gated, per-session toggle for host-level shell access

#### Layer 3: Model (Assume Manipulation)

- Recommend modern, instruction-hardened models (e.g., Opus 4.6)
- Design so manipulation has limited blast radius
- System prompt guardrails are soft guidance only — hard enforcement comes from tool policy

### Sandboxing

```json5
{
  agents: {
    defaults: {
      sandbox: {
        mode: "non-main", // sandbox groups/channels but not your main DM
      },
    },
  },
}
```

Sandbox defaults:

- **Allow:** bash, process, read, write, edit, sessions_list, sessions_history, sessions_send, sessions_spawn
- **Deny:** browser, canvas, nodes, cron, discord, gateway

### Security Audit

`openclaw security audit` checks for:

- Inbound access exposure (DM policies, group policies, allowlists)
- Tool blast radius (elevated tools + open rooms)
- Network
exposure (Gateway bind/auth, Tailscale, weak tokens)
- Browser control exposure
- Local disk hygiene (permissions, symlinks)
- Plugin trust
- Model hygiene

`--fix` can automatically tighten common misconfigurations.

### Prompt Injection

OpenClaw treats prompt injection as **not solved**. Even with locked-down DMs, injection can happen via any untrusted content the bot reads (web pages, emails, attachments, pasted code).

Mitigations:

- Use a read-only "reader agent" to summarize untrusted content
- Keep `web_search` / `web_fetch` / `browser` off for tool-enabled agents unless needed
- Enable sandboxing and strict tool allowlists
- Keep secrets out of prompts

### The Trust Hierarchy

```
You (the operator) — full trust
  └─ Gateway config — defines boundaries
       └─ Allowlisted peers — can trigger the bot
            └─ The AI model — limited by tool policy
                 └─ Untrusted content — zero trust
```

---

## 14. Automation — Cron, Webhooks, Hooks

### Cron Jobs

OpenClaw has a built-in cron scheduler that can trigger agent runs on a schedule:

- Each cron job gets an isolated session (`cron:<job.id>`)
- Jobs mint a fresh `sessionId` per run (no idle reuse)
- The agent can create, update, list, and remove cron jobs via the `cron` tool
- Cron jobs run in a separate queue lane so they don't block inbound replies

### Webhooks

External systems can trigger agent runs via HTTP webhooks:

- Each webhook gets an isolated session (`hook:<uuid>`)
- Webhook payloads are passed to the agent as context
- Secured via `hooks.token`

### Hooks (Event-Driven Scripts)

Hooks are event-driven scripts that intercept Gateway lifecycle events:

- `agent:bootstrap`: Runs while building bootstrap files before the system prompt is finalized
- Command hooks: `/new`, `/reset`, `/stop`, and other command events
- Hooks can be bundled with plugins

### Gmail Pub/Sub

OpenClaw can subscribe to Gmail notifications via Google Pub/Sub, triggering agent runs when new emails arrive.
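
The session-key conventions this section describes for cron jobs and webhooks can be illustrated with a small TypeScript sketch. It is not OpenClaw's implementation: the `Trigger` and `RunTarget` types and the `resolveRunTarget` and `tokenMatches` helpers are hypothetical names introduced here, and minting a new UUID for every webhook call is an assumption (the text only gives the `hook:<uuid>` shape). Only the `cron:<job.id>` and `hook:<uuid>` key formats, the fresh `sessionId` per run, and the existence of a shared `hooks.token` come from the text above.

```typescript
import { randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical trigger shapes for illustration; the real types are not shown here.
type Trigger = { kind: "cron"; jobId: string } | { kind: "webhook" };

interface RunTarget {
  sessionKey: string; // isolates the run from chat sessions
  sessionId: string; // minted fresh for every run, never reused
}

// Map an automation trigger to the session it should run in.
function resolveRunTarget(trigger: Trigger): RunTarget {
  const sessionId = randomUUID();
  if (trigger.kind === "cron") {
    // Stable key per job, new sessionId per run.
    return { sessionKey: `cron:${trigger.jobId}`, sessionId };
  }
  // Assumption: each webhook invocation gets its own UUID-based key.
  return { sessionKey: `hook:${randomUUID()}`, sessionId };
}

// Compare a presented webhook token with the configured one in constant time.
// timingSafeEqual throws on length mismatch, so lengths are checked first.
function tokenMatches(presented: string, expected: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Calling `resolveRunTarget({ kind: "cron", jobId: "daily-digest" })` twice yields the same `sessionKey` but two different `sessionId` values, which is the "no idle reuse" behavior described for cron jobs.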
### Polls

Polling mechanisms watch external data sources and can trigger agent actions.

---

## 15. Companion Apps — macOS, iOS, Android

### macOS App (OpenClaw.app)

The macOS menu bar app is a **full control plane** for the Gateway:

- Gateway health monitoring and management
- **Voice Wake** — always-on keyword detection for hands-free activation
- **Push-to-talk** overlay
- **WebChat** — built-in chat interface
- Debug tools
- **Remote gateway control** over SSH tunnels
- **Node mode** — the Mac itself becomes a node, exposing canvas/camera/system commands

Signed builds are required for macOS permissions (TCC) to persist across rebuilds.

### iOS Node

The iOS companion pairs as a node via the Bridge:

- Canvas surface for visual output
- Voice Wake for trigger word detection
- Talk Mode for continuous voice conversation
- Camera capture (snap and clip)
- Screen recording
- Bonjour/mDNS auto-discovery for pairing

### Android Node

Similar to iOS, the Android node exposes:

- Canvas surface
- Talk Mode
- Camera capture
- Screen recording
- Optional SMS sending (with permission)

### Key Architecture Point

The companion apps are **not** gateways themselves. They are thin clients and device-capability providers. All intelligence runs on the Gateway; the apps provide:

1. A UI surface (WebChat, Canvas)
2. Device hardware access (camera, microphone, screen, GPS, notifications)
3. Voice interfaces (Wake, Talk)

---

## 16. Browser Control

OpenClaw manages a dedicated Chrome/Chromium instance for web automation, controlled via CDP (Chrome DevTools Protocol).
### Core Actions

| Action | Description |
|--------|-------------|
| `status` / `start` / `stop` | Lifecycle management |
| `tabs` / `open` / `focus` / `close` | Tab management |
| `snapshot` (AI or ARIA) | Get page structure for agent understanding |
| `screenshot` | Capture visual state |
| `act` | UI interactions: click, type, press, hover, drag, select, fill, resize, wait, evaluate |
| `navigate` / `console` / `pdf` / `upload` / `dialog` | Specialized actions |

### Multi-Profile Support

The browser tool supports multiple profiles:

- Each profile gets an auto-allocated CDP port (18800–18899, ~100 profiles max)
- Profiles have isolated user data directories
- Remote profiles are attach-only (no start/stop)
- `browser.defaultProfile` sets the default (defaults to "chrome")

### Snapshots

Snapshots are the primary way the agent "sees" web pages:

- **AI snapshots** return a structured representation with numeric refs (e.g., `12`)
- **ARIA/role snapshots** return the accessibility tree with refs (e.g., `e12`)
- The agent uses refs from snapshots to target actions

### Safety

- Browser control is `enabled: true` by default but can be disabled
- In sandboxed environments, the browser can run within the sandbox
- Browser sessions inherit the Gateway's auth context
- `browser.loginUrl` supports pre-authenticated browser sessions

---

## 17. Canvas & A2UI

### Canvas

The Canvas is an **agent-driven visual workspace** — a WebView surface on companion devices that the agent can control programmatically.

| Action | Description |
|--------|-------------|
| `present` | Show the canvas with a URL or local file |
| `hide` | Dismiss the canvas |
| `navigate` | Navigate to a new URL |
| `eval` | Execute JavaScript in the canvas context |
| `snapshot` | Capture the canvas as an image |
| `a2ui_push` | Push A2UI components |
| `a2ui_reset` | Clear A2UI state |

### A2UI (Agent-to-UI)

A2UI is a structured protocol for the agent to build UI components without writing raw HTML.
The agent pushes JSONL payloads describing UI elements, and the Canvas renders them. This is currently v0.8 (v0.9/createSurface is not yet supported).

The Canvas uses the Gateway's `node.invoke` under the hood — if no node is specified, it picks a default (single connected node or local mac node).

---

## 18. Voice Wake & Talk Mode

### Voice Wake

Voice Wake provides **always-on keyword detection** for hands-free agent activation on macOS, iOS, and Android. When the wake word is detected, OpenClaw transitions into Talk Mode.

### Talk Mode

Talk Mode enables **continuous voice conversation** with the agent:

- Speech-to-text for user input
- Text-to-speech for agent responses (for example via ElevenLabs or OpenAI TTS)
- Continuous listening with turn detection
- Available on macOS (menu bar overlay), iOS, and Android

### Configuration

- TTS providers: ElevenLabs (recommended), OpenAI TTS, Edge TTS
- Wake word configuration via the companion app settings
- Talk Mode overlay appearance and behavior customization

---

## 19. Configuration & Operations

### Configuration File

All configuration lives in `~/.openclaw/openclaw.json` (JSON5 format):

```json5
{
  // Minimal config — just set the model
  agent: {
    model: "anthropic/claude-opus-4-6",
  },
}
```

The configuration system supports:

- **Model selection** and fallback chains
- **OAuth and API key** authentication for model providers
- **Channel configuration** for each messaging platform
- **Agent configuration** (workspace, identity, tools, sandbox)
- **Multi-agent bindings** and routing rules
- **Tool policies** (profiles, allow/deny, per-provider, per-agent)
- **Session policies** (DM scope, reset schedules, send policies)
- **Automation** (cron jobs, webhooks, hooks)
- **Security** (Gateway auth, DM policies, group policies, sandboxing)
- **Browser, Canvas, Nodes, TTS, Skills** settings

### Onboarding Wizard

The `openclaw onboard` wizard is the recommended setup path:

1. Install the Gateway daemon (launchd/systemd)
2. Configure model provider authentication
3. Set up channels (WhatsApp QR, Telegram bot token, etc.)
4. Initialize the workspace with bootstrap files
5. Configure skills

### Doctor

`openclaw doctor` is the diagnostic tool:

- Checks for common misconfigurations
- Verifies daemon health
- Can generate gateway tokens
- Warns about risky permissions
- Offers auto-fixes for some issues

### Deployment Options

| Method | Notes |
|--------|-------|
| **Local (recommended)** | `openclaw onboard --install-daemon` on macOS/Linux |
| **Remote Linux** | Run Gateway on a VPS; connect via Tailscale Serve/Funnel or SSH tunnels |
| **Docker** | Full Docker support with sandbox containers for non-main sessions |
| **Nix** | Declarative configuration via `nix-openclaw` |

### Remote Access

- **Tailscale Serve** (tailnet-only) or **Funnel** (public) — keeps Gateway on loopback while providing HTTPS access
- **SSH tunnels** — `ssh -N -L 18789:127.0.0.1:18789 user@host`
- Token/password auth applies over tunnels
- TLS + optional certificate pinning for WS in remote setups

### Logging

- Structured logging with configurable levels
- Sensitive data redaction (`logging.redactSensitive`)
- Session transcripts stored as JSONL for full audit trails

---

## 20. Source Code Organization

The repository is a **pnpm monorepo** with the following major directories:

### `/src/` — Core Application (50+ subdirectories)

| Directory | Purpose |
|-----------|---------|
| `src/gateway/` | WebSocket server, protocol handling, RPC dispatch |
| `src/agents/` | Agent runtime, multi-agent routing, workspace management |
| `src/sessions/` | Session management, pruning, compaction |
| `src/channels/` | Channel adapter framework and common utilities |
| `src/whatsapp/` | WhatsApp channel (Baileys) |
| `src/telegram/` | Telegram channel (grammY) |
| `src/discord/` | Discord channel (discord.js) |
| `src/slack/` | Slack channel (Bolt) |
| `src/signal/` | Signal channel (signal-cli) |
| `src/imessage/` | Legacy iMessage channel |
| `src/browser/` | Browser automation (CDP/Playwright) |
| `src/canvas-host/` | Canvas WebView host |
| `src/node-host/` | Headless node host implementation |
| `src/cli/` | CLI entry points and command registration |
| `src/commands/` | Slash command handling |
| `src/config/` | Configuration loading, validation, schema |
| `src/cron/` | Cron job scheduler |
| `src/daemon/` | Daemon management (launchd/systemd) |
| `src/hooks/` | Hook system (event-driven scripts) |
| `src/media/` | Media pipeline (images, audio, video) |
| `src/media-understanding/` | Media analysis and transcription |
| `src/link-understanding/` | URL content extraction |
| `src/memory/` | Memory system core |
| `src/pairing/` | Device and DM pairing |
| `src/plugin-sdk/` | Plugin SDK for extension authors |
| `src/plugins/` | Plugin loader and registry |
| `src/providers/` | Model provider integration (Anthropic, OpenAI, etc.) |
| `src/routing/` | Message routing and binding resolution |
| `src/security/` | Security audit, sandboxing, exec approvals |
| `src/tts/` | Text-to-speech (ElevenLabs, OpenAI, Edge) |
| `src/tui/` | Terminal UI components |
| `src/web/` | Control UI and WebChat serving |
| `src/wizard/` | Onboarding wizard |
| `src/auto-reply/` | Auto-reply pipeline |
| `src/process/` | Background process management |
| `src/logging/` | Structured logging |
| `src/types/` | Shared TypeScript types |
| `src/utils/` | Common utilities |
| `src/acp/` | Agent Client Protocol (ACP) |
| `src/line/` | LINE channel |
| `src/macos/` | macOS-specific utilities |
| `src/markdown/` | Markdown processing |
| `src/terminal/` | Terminal integration |
| `src/infra/` | Infrastructure utilities |
| `src/compat/` | Compatibility layer |
| `src/docs/` | Documentation generation |
| `src/scripts/` | Build/utility scripts |

### `/extensions/` — Plugin Extensions (30+ extensions)

| Extension | Type |
|-----------|------|
| `bluebubbles/` | iMessage via BlueBubbles |
| `telegram/`, `discord/`, `slack/`, `whatsapp/` | Channel extensions |
| `googlechat/`, `msteams/`, `matrix/`, `signal/` | Channel extensions |
| `feishu/`, `line/`, `mattermost/`, `nextcloud-talk/` | Channel extensions |
| `nostr/`, `tlon/`, `twitch/`, `zalo/`, `zalouser/` | Channel extensions |
| `memory-core/`, `memory-lancedb/` | Memory plugins |
| `voice-call/` | Telephony plugin |
| `llm-task/` | LLM task execution |
| `lobster/` | Workflow runtime |
| `copilot-proxy/` | VS Code Copilot bridge |
| `google-antigravity-auth/`, `google-gemini-cli-auth/` | Provider auth |
| `qwen-portal-auth/`, `minimax-portal-auth/` | Provider auth |
| `diagnostics-otel/` | OpenTelemetry diagnostics |
| `open-prose/` | Prose editing |

### `/ui/` — Frontend (Control UI + WebChat)

Web-based interfaces served directly from the Gateway's HTTP surface.

### `/apps/` — Companion Apps

Native applications for macOS (Swift), iOS (Swift), and Android (Kotlin).
### `/Swabble/` — macOS App (Swift)

The macOS menu bar application (OpenClaw.app).

### `/packages/` — Sub-packages

- `clawdbot/` — Bot personality package
- `moltbot/` — Molty bot personality

### `/skills/` — Bundled Skills

Skill definitions shipped with the install.

### `/docs/` — Documentation Source

Documentation source files.

### `/test/` — End-to-End Tests

Integration and end-to-end test suites.

### Root Files

| File | Purpose |
|------|---------|
| `openclaw.mjs` | Main entry point |
| `package.json` | Root package manifest |
| `pnpm-workspace.yaml` | Monorepo workspace config |
| `tsdown.config.ts` | Build configuration |
| `tsconfig.json` | TypeScript configuration |
| `vitest.*.config.ts` | Multiple Vitest configs (unit, e2e, gateway, extensions, live) |
| `Dockerfile` | Main Docker image |
| `Dockerfile.sandbox` | Sandbox container image |
| `Dockerfile.sandbox-browser` | Sandbox with browser support |
| `docker-compose.yml` | Docker Compose configuration |
| `fly.toml` / `render.yaml` | Cloud deployment configs |
| `AGENTS.md` / `CLAUDE.md` | AI assistant instructions |
| `CONTRIBUTING.md` | Contribution guidelines |

---

## Summary

OpenClaw is a remarkably ambitious and well-architected system. At its core, it solves a specific problem: **giving a single person a personal AI assistant that lives across all their communication channels and devices**.

The key architectural decisions that make this work:

1. **Single Gateway daemon** — one process owns all state, preventing conflicts and enabling atomic operations across channels and sessions.
2. **Channel abstraction** — messaging platforms are treated as interchangeable transport layers, allowing the same agent logic to work everywhere.
3. **WebSocket-first protocol** — everything (clients, nodes, tools, UIs) connects via one typed WebSocket API, keeping the architecture simple and extensible.
4. **Plugin system** — new channels, tools, and capabilities can be added without modifying core code, enabling rapid community growth (30+ extensions already).
5. **Layered security** — identity (who), scope (where), and model (assume manipulation) form three independent defense layers.
6. **Node architecture** — separating "brain" (Gateway) from "body" (device nodes) allows the agent to run on a server while still accessing cameras, screens, and local hardware on your devices.
7. **Multi-agent routing** — binding rules map inbound messages to isolated agents, supporting families, teams, and multiple personas from a single Gateway.

The codebase is massive (50+ source directories, 30+ extensions) but well-organized around these architectural boundaries. It represents a production-grade implementation of the "personal AI assistant" concept, with serious attention to security, reliability, and user experience across a wide range of platforms and devices.