Whiteboard pipeline methodology

This is how clawdSlate (and the fathom-whiteboard npm package) turns pasted content — a research paper, a slide deck, a code architecture, a screenshot of a meeting whiteboard — into an Excalidraw whiteboard. Plain-language reading order; no source code required.

TL;DR

pasted content ──► Claude (Agent SDK) ──► excalidraw-mcp ──► scene.elements[]
                    │
                    └─ system prompt = coleam SKILL.md + clawdSlate suffix
                       allowed tools = read_me + create_view (+ Read for paths)
                       settingSources = []

Two MCP tools, one or two turns of agent work, one elements array out.
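
The options in the diagram can be sketched as one plain object. This is a minimal sketch, not the code in src/pipeline.ts: the PipelineOptions type and the buildOptions helper are hypothetical, the blank-line separator between SKILL and suffix is an assumption, and the option field names (systemPrompt, allowedTools, settingSources, mcpServers) should be checked against the Agent SDK version in use.

```typescript
// Hypothetical local type: mirrors only the option fields named in the diagram.
interface PipelineOptions {
  systemPrompt: string;
  allowedTools: string[];
  settingSources: string[];
  mcpServers: Record<string, { type: 'http'; url: string }>;
}

// Hosted endpoint named later in this document.
const HOSTED_MCP_URL = 'https://mcp.excalidraw.com/mcp';

function buildOptions(
  skillMarkdown: string,   // coleam SKILL.md, verbatim
  suffix: string,          // clawdSlate-specific suffix
  contentIsPath: boolean,  // true when the pasted content is a file path
  mcpUrl: string = HOSTED_MCP_URL,
): PipelineOptions {
  const allowedTools = [
    'mcp__excalidraw__read_me',
    'mcp__excalidraw__create_view',
  ];
  // Read is only needed when Claude has to open the content itself.
  if (contentIsPath) allowedTools.push('Read');

  return {
    // Assumption: SKILL and suffix are joined with a blank line.
    systemPrompt: `${skillMarkdown}\n\n${suffix}`,
    allowedTools,
    // Empty on purpose: no host-side CLAUDE.md or user config is loaded.
    settingSources: [],
    // The server key must match the "excalidraw" segment in the tool names.
    mcpServers: { excalidraw: { type: 'http', url: mcpUrl } },
  };
}
```

The server key matters: the tool names mcp__excalidraw__read_me and mcp__excalidraw__create_view only resolve if the MCP server is registered under the name excalidraw.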

The pipeline run

  1. System prompt is built by concatenating the coleam SKILL.md verbatim with a short clawdSlate-specific suffix that says “you’re explaining a research paper as a teaching whiteboard, plan one canvas, call read_me once, call create_view multiple times so the canvas updates progressively.” The SKILL is ~24KB; the suffix is ~70 lines.
  2. Pasted content is the user message. If the host passes { kind: 'text', markdown } we inline the markdown directly; if { kind: 'path' } we tell Claude where to read it from and allow the Read tool too. A focus block surfaces the user’s optional emphasis (“focus on X”) near the top of the message.
  3. The MCP transport connects over HTTP. Default: hosted endpoint (https://mcp.excalidraw.com/mcp). Optional: spawn vendor/excalidraw-mcp locally on an OS-assigned port; the launcher tails stdout for the “MCP server listening on …” line and parses out the URL.
  4. allowedTools is set to exactly mcp__excalidraw__read_me and mcp__excalidraw__create_view (plus Read when the pasted content is a file path). This is the durable filter: the SDK respects allowedTools even when other tools would otherwise be enabled. settingSources: [] ensures no host-side CLAUDE.md or user config bleeds into the run.
  5. The stream is consumed. As Claude emits assistant blocks, we record:
    • text deltas (onAssistantText, also logged via onLog)
    • tool_use events (onToolUse, plus onLog)
    • the input of every mcp__excalidraw__create_view call — resolveSceneFromInput applies vendor delta semantics (restoreCheckpoint + delete) and yields the new resolved scene; we fire onSceneUpdate so the renderer paints progressively.
  6. Result event carries the cost in USD; we surface that in the result + final log line.
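
Steps 5 and 6 above amount to a small dispatcher over the message stream. The sketch below is an illustration under stated assumptions: the Block and StreamMessage types are simplified stand-ins for the SDK's message types, the dispatch helper is hypothetical, and it handles complete text blocks rather than deltas. It also forwards the raw create_view input instead of resolving it, since resolveSceneFromInput is not reproduced here.

```typescript
// Simplified, hypothetical message shapes; the real SDK types carry more fields.
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; name: string; input: unknown };

type StreamMessage =
  | { type: 'assistant'; message: { content: Block[] } }
  | { type: 'result'; total_cost_usd: number };

// Callback names follow the ones mentioned in the steps above.
interface Callbacks {
  onAssistantText?: (text: string) => void;
  onToolUse?: (name: string, input: unknown) => void;
  onSceneUpdate?: (createViewInput: unknown) => void;
  onLog?: (line: string) => void;
}

const CREATE_VIEW = 'mcp__excalidraw__create_view';

// Routes one stream message to the callbacks. Returns the cost in USD when
// the message is the final result, otherwise undefined.
function dispatch(msg: StreamMessage, cb: Callbacks): number | undefined {
  if (msg.type === 'result') {
    cb.onLog?.(`done, cost $${msg.total_cost_usd.toFixed(2)}`);
    return msg.total_cost_usd;
  }
  for (const block of msg.message.content) {
    if (block.type === 'text') {
      cb.onAssistantText?.(block.text);
      cb.onLog?.(`text: ${block.text}`);
    } else {
      cb.onToolUse?.(block.name, block.input);
      cb.onLog?.(`tool_use: ${block.name}`);
      if (block.name === CREATE_VIEW) {
        // The real pipeline resolves the input into a full scene first;
        // this sketch forwards the raw input unchanged.
        cb.onSceneUpdate?.(block.input);
      }
    }
  }
  return undefined;
}
```

Only create_view calls trigger onSceneUpdate; read_me is logged as a tool use but never repaints the canvas.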

That’s the whole pipeline. The full source is in src/pipeline.ts — about 460 lines.
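
For the optional local server in step 3, tailing stdout for the listening line could look like the sketch below. Both helpers are hypothetical, and the regular expression assumes an http(s) URL follows the phrase “MCP server listening on”; the exact log format of vendor/excalidraw-mcp is not reproduced here.

```typescript
// Extracts the URL from one stdout line, or returns null if the line is not
// the listening announcement. Assumes the URL directly follows the phrase.
function parseListeningUrl(line: string): string | null {
  const match = /MCP server listening on\s+(https?:\/\/\S+)/.exec(line);
  return match?.[1] ?? null;
}

// stdout arrives in arbitrary chunks, so a line can be split across two
// chunks. The scanner buffers the trailing partial line until it completes.
function createUrlScanner(): (chunk: string) => string | null {
  let buffer = '';
  return (chunk: string): string | null => {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the incomplete last line for next time
    for (const line of lines) {
      const url = parseListeningUrl(line);
      if (url !== null) return url;
    }
    return null;
  };
}
```

A launcher would feed each stdout chunk to the scanner and connect the MCP transport as soon as it returns a non-null URL.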

Why this shape (vs the pre-pivot version)

Pre-pivot, the elaborate version derived from Fathom’s whiteboard tab was ~3,000 LOC of pipeline code.

A control experiment — vanilla Agent SDK + unmodified excalidraw-mcp + coleam’s SKILL — ran the same paper for $0.95 in 3 turns and produced a tighter, more designed-feeling diagram. The shape of the diagram (variety of element types, container/free-floating mix, evidence artefacts) was visibly closer to the SKILL’s quality bar than what our elaborate pipeline emitted.

The takeaway: when the upstream MCP and a well-written SKILL are doing the heavy lifting, additional pipeline layers suppress quality rather than add to it. Templates locked the agent into a small set of layouts; the custom 9-tool wrapper made it harder for the agent to think in plain Excalidraw elements; the visual critic second-guessed perfectly fine output. So we threw all of it away.

What’s still important

Cost profile

End-to-end ReconViaGen paper (62KB markdown), Opus 4.7 via the Agent SDK:

If you’re processing dozens of papers in batch, prefer running on the user’s CLI auth (which the SDK uses by default) over an API key that is rate-limited separately.

Failure modes

Logging

The pipeline emits via GenerateCallbacks.onLog:

Hosts that wire this to a file get a usable transcript out of the box. clawdSlate’s Electron host forwards via webContents.send so the renderer log viewer sees every line in real time. Embedded hosts (e.g. Fathom) wire it to their own log surface.
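
A host that wants both a transcript and a live view can fan one onLog callback out to several sinks. This is a minimal sketch: fanOutLog and LogSink are hypothetical names, and swallowing sink errors is a design choice of the sketch, not something the package is known to do.

```typescript
// A sink is anything that accepts one log line: a file appender,
// an IPC forwarder, an in-memory buffer.
type LogSink = (line: string) => void;

// Returns a single callback suitable for onLog that forwards each line to
// every sink. A throwing sink is isolated so it cannot abort the run or
// starve the sinks after it.
function fanOutLog(...sinks: LogSink[]): LogSink {
  return (line: string): void => {
    for (const sink of sinks) {
      try {
        sink(line);
      } catch {
        // Ignore sink failures; logging must never break generation.
      }
    }
  };
}
```

In an Electron host, one sink would append to a file and another would call webContents.send.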

Persistence

The pipeline does not persist anything. The host’s WhiteboardHost.saveScene / loadScene methods are responsible for that.

In clawdSlate, scenes live in the per-session sidecar at:

~/Library/Application Support/clawdSlate/sessions/last/
  whiteboard.excalidraw           — full Excalidraw scene file
  whiteboard.viewport.json        — last-known scrollX/scrollY/zoom
  paper.json                      — most recent pasted content
  assets/                         — saved attachments (images, PDFs)

This matches CLAUDE.md §1 “Persist by default”: once the user has paid the API cost (~$0.95) to generate a whiteboard, regenerating it because clawdSlate forgot to save is a design failure. On reopen, the renderer hydrates from disk first and regenerates only if no saved state exists. The ONLY paths to deletion are: (a) the user explicitly clicks “Clear” / “Regenerate”; (b) the user manually deletes the session dir.
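
The hydrate-first rule can be sketched as a small function over the host's load and save methods. The SceneStore interface below is a hypothetical, simplified stand-in for the real WhiteboardHost contract, whose exact signatures are not shown in this document; openWhiteboard is likewise an illustrative name.

```typescript
// Hypothetical, simplified contract: loadScene resolves to null when
// nothing has been saved yet.
interface SceneStore<Scene> {
  loadScene(): Promise<Scene | null>;
  saveScene(scene: Scene): Promise<void>;
}

// Loads the saved scene if one exists; otherwise generates once and
// persists the result before returning it.
async function openWhiteboard<Scene>(
  store: SceneStore<Scene>,
  generate: () => Promise<Scene>,
): Promise<{ scene: Scene; source: 'disk' | 'generated' }> {
  const saved = await store.loadScene();
  if (saved !== null) {
    return { scene: saved, source: 'disk' };
  }
  const scene = await generate();
  // Save before returning so a crash after this point never forces the
  // user to pay for the same generation twice.
  await store.saveScene(scene);
  return { scene, source: 'generated' };
}
```

Calling openWhiteboard twice against the same store should invoke generate only on the first call; the second call is served from the saved scene.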

Embedded host: how Fathom uses this

Fathom’s per-paper “Whiteboard” tab embeds <Whiteboard> from this same npm package. Fathom’s host implementation:

The pipeline is identical in both cases — clawdSlate and Fathom run the same generateWhiteboard against the same MCP. The host’s only job is to wire loadScene/saveScene to its own persistence and generate/refine to a streaming IPC. See WhiteboardHost in src/Whiteboard.tsx for the contract.