Motivation
OpenScreen is a desktop app for screen recording and video editing, but every operation currently requires the GUI. As AI agents increasingly drive developer tools, OpenScreen needs a programmatic interface that agents can call directly, without a display or mouse.
Projects like Remotion have shown that this model works: a CLI as the primary agent interface, agent skills for guidance, and a lightweight MCP server for discovery. This issue proposes bringing the same approach to OpenScreen.
What we're building
1. CLI (openscreen) — 20+ commands
Project management (pure Node.js, instant):
project create — Create .openscreen project from video recording
project info — Inspect project metadata, regions, settings
project validate — Validate project file integrity
project edit — Modify settings (wallpaper, padding, aspect ratio, etc.)
Region management (pure Node.js, instant):
zoom add/list/remove — Manage zoom effects (depth 1-6, focus coordinates)
trim add/list/remove — Cut sections from the video
speed add/list/remove — Change playback speed (0.25x - 2x)
annotate add/list/remove — Add text, image, or arrow overlays
Rendering (Electron headless, seconds to minutes):
render — Export as MP4 with all effects applied (WebCodecs + PixiJS)
gif — Export as animated GIF
still — Capture single frame as PNG/JPEG (planned)
frames — Export frame sequence (planned)
All commands support --json for machine-readable output.
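A minimal sketch of how the --json switch could produce dual output from one result object. The CommandResult shape and the formatResult helper are assumptions made for illustration, not the actual CLI internals.

```typescript
// Hypothetical result shape; the real CLI's fields may differ.
interface CommandResult {
  ok: boolean;
  command: string;
  data: Record<string, unknown>;
  message: string; // human-readable summary
}

// Render one result as a single JSON line (for agents) or as plain text (for people).
function formatResult(result: CommandResult, json: boolean): string {
  if (json) {
    return JSON.stringify({ ok: result.ok, command: result.command, data: result.data });
  }
  const prefix = result.ok ? "OK" : "ERROR";
  return `${prefix} ${result.command}: ${result.message}`;
}

// Illustrative values only.
const added: CommandResult = {
  ok: true,
  command: "zoom add",
  data: { id: "zoom-1", depth: 3 },
  message: "added zoom region zoom-1 (depth 3)",
};

console.log(formatResult(added, true));
console.log(formatResult(added, false));
```

Keeping the structured data and the prose message in one object means both output modes stay in sync as commands evolve.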
2. Shared code extraction (src/shared/)
Pure TypeScript types and logic extracted from the Electron app so both CLI and GUI share the same validation, normalization, and project schema code. Zero breaking changes — original files become thin re-exports.
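As a rough illustration of the kind of logic that could live in src/shared/, the sketch below validates zoom and speed regions against the ranges listed in the command overview (depth 1-6, speed 0.25x - 2x). The type names, field names, millisecond timestamps, integer depth, and normalized 0..1 focus coordinates are all assumptions, not the project's actual schema.

```typescript
// Ranges taken from the command list above.
const ZOOM_DEPTH_MIN = 1;
const ZOOM_DEPTH_MAX = 6;
const SPEED_MIN = 0.25;
const SPEED_MAX = 2;

// Hypothetical region shapes; the real schema may use different fields.
interface ZoomRegion {
  startMs: number;
  endMs: number;
  depth: number;
  focus: { x: number; y: number }; // assumed normalized to 0..1
}

interface SpeedRegion {
  startMs: number;
  endMs: number;
  speed: number;
}

// Shared check: a region must start at or after 0 and end after it starts.
function validateSpan(startMs: number, endMs: number): string[] {
  const errors: string[] = [];
  if (!isFinite(startMs) || startMs < 0) errors.push("startMs must be a non-negative number");
  if (!isFinite(endMs) || endMs <= startMs) errors.push("endMs must be greater than startMs");
  return errors;
}

function validateZoomRegion(region: ZoomRegion): string[] {
  const errors = validateSpan(region.startMs, region.endMs);
  const depthIsInteger = Math.floor(region.depth) === region.depth;
  if (!depthIsInteger || region.depth < ZOOM_DEPTH_MIN || region.depth > ZOOM_DEPTH_MAX) {
    errors.push(`depth must be an integer from ${ZOOM_DEPTH_MIN} to ${ZOOM_DEPTH_MAX}`);
  }
  const { x, y } = region.focus;
  if (!(x >= 0 && x <= 1 && y >= 0 && y <= 1)) errors.push("focus x and y must be within 0..1");
  return errors;
}

function validateSpeedRegion(region: SpeedRegion): string[] {
  const errors = validateSpan(region.startMs, region.endMs);
  if (!isFinite(region.speed) || region.speed < SPEED_MIN || region.speed > SPEED_MAX) {
    errors.push(`speed must be between ${SPEED_MIN} and ${SPEED_MAX}`);
  }
  return errors;
}

// An out-of-range depth yields one error message.
console.log(validateZoomRegion({ startMs: 0, endMs: 1000, depth: 7, focus: { x: 0.5, y: 0.5 } }));
```

Returning a list of messages rather than throwing lets the CLI report every problem at once and lets the GUI map each message to a form field.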
3. Electron headless bridge
A --cli-export mode in electron/main.ts that creates a hidden BrowserWindow, runs the full export pipeline (WebCodecs + PixiJS), and streams progress to stdout. This reuses 100% of the existing rendering code — no FFmpeg dependency, no reimplementation.
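On the CLI side, progress streamed over stdout has to be reassembled from arbitrary chunks. The sketch below assumes a newline-delimited JSON format with a hypothetical {"type":"progress","percent":N} event; the issue does not specify the actual wire format, so treat both the event shape and the ProgressLineParser name as placeholders.

```typescript
// Assumed event shape: one JSON object per stdout line.
interface ProgressEvent {
  type: "progress";
  percent: number;
}

// Runtime check, since stdout may also carry unrelated output.
function isProgressEvent(value: unknown): value is ProgressEvent {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { type?: unknown; percent?: unknown };
  return candidate.type === "progress" && typeof candidate.percent === "number";
}

class ProgressLineParser {
  private buffer = "";

  // Feed a raw stdout chunk; returns the events found in every complete line so far.
  push(chunk: string): ProgressEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it for the next chunk.
    this.buffer = lines.pop() || "";
    const events: ProgressEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        const value: unknown = JSON.parse(trimmed);
        if (isProgressEvent(value)) events.push(value);
      } catch (_err) {
        // Lines that are not JSON (for example log noise) are ignored.
      }
    }
    return events;
  }
}

// A chunk boundary in the middle of an event is handled by the buffer.
const parser = new ProgressLineParser();
console.log(parser.push('{"type":"progress","per'));
console.log(parser.push('cent":50}\n'));
```

In the real bridge the chunks would come from the spawned Electron process's stdout; the parser itself has no dependency on Electron or Node.js APIs.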
4. Agent skills (SKILL.md + rules)
Following the agentskills.io protocol and Remotion's pattern: a master SKILL.md with 7 rule files covering project setup, zoom regions, trim/speed, annotations, export, frames, and troubleshooting.
5. Minimal MCP server
Single openscreen_help tool for documentation search. Agents with shell access call the CLI directly; the MCP server is just for discovery.
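The core of such a help tool could be a small keyword search over documentation sections. This is only a sketch under assumptions: DocSection, searchDocs, and the sample sections are hypothetical, the scoring is a naive term match, and the MCP transport and tool registration are left out entirely.

```typescript
// Hypothetical documentation entry, e.g. one per skill rule file.
interface DocSection {
  title: string;
  body: string;
}

// Rank sections by how many query terms they contain; title hits count double.
function searchDocs(query: string, sections: DocSection[], limit = 3): DocSection[] {
  const terms = query.toLowerCase().split(/\s+/).filter((term) => term.length > 0);
  const scored = sections.map((section) => {
    const title = section.title.toLowerCase();
    const body = section.body.toLowerCase();
    let score = 0;
    for (const term of terms) {
      if (title.indexOf(term) !== -1) score += 2;
      if (body.indexOf(term) !== -1) score += 1;
    }
    return { section, score };
  });
  return scored
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.section);
}

// Illustrative sections paraphrasing the command list above.
const sampleDocs: DocSection[] = [
  { title: "Zoom regions", body: "zoom add/list/remove manage zoom effects with depth 1-6 and focus coordinates" },
  { title: "Export", body: "render exports MP4 with all effects applied; gif exports an animated GIF" },
  { title: "Trim and speed", body: "trim cuts sections from the video; speed changes playback from 0.25x to 2x" },
];

console.log(searchDocs("zoom depth", sampleDocs).map((section) => section.title));
```

Keeping the search as a pure function makes it easy to test without starting an MCP server.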
Architecture
CLI (Node.js)                   Electron (headless)
┌─────────────────────┐         ┌───────────────────────────┐
│ project create/edit │         │ --cli-export mode         │
│ zoom/trim/speed add │         │ CliExportRenderer         │
│ annotate add/remove │         │ VideoExporter (WebCodecs) │
│ shortcuts get/set   │         │ GifExporter (PixiJS)      │
│ --json output       │ ──────▶ │ Progress via IPC          │
└─────────────────────┘  spawn  └───────────────────────────┘
           │                                  │
           ▼                                  ▼
      src/shared/                    Valid MP4/GIF files
  (types, validation,             with all effects applied
    project schema)
Current status
🚧 Work in progress — Core implementation complete and tested E2E. Draft PR linked below.
What's done
--json dual output, input validation)
What's next
still command: capture single frame as PNG/JPEG
frames command: export frame sequence