The first practical AI integration for UXP — proving that LLM-powered tools can coexist with transparency and user control.
Split is an Adobe Photoshop UXP plugin that translates natural-language instructions into structured, deterministic editing operations via Google Gemini.
Built for: Adobe Innovate4AI Hackathon
Team: Alexandru Toboșaru, Matei Sandu, Cyrus Sepahpour
Stack: Adobe UXP | Vanilla JavaScript | Google Gemini API
Most AI-powered creative tools present a false choice:
- Fully automatic = no transparency, no control
- Disconnected = manual workflows, no integration
Split bridges this gap: AI intent → human approval → deterministic execution
Natural Language
↓
[Gemini] (structured prompt with JSON schema)
↓
JSON Editing Plan { operations: [...] }
↓
[User Review] (inspect + approve)
↓
[Validator] (type check, bounds check, safe execution)
↓
Photoshop batchPlay
| Layer | Tech | Responsibility |
|---|---|---|
| Plugin | Adobe UXP | Context gathering, batchPlay dispatch |
| Reasoning | Gemini API | Generate schema-compliant JSON plans |
| Validation | Vanilla JS | Enforce schema integrity + safe bounds |
| UI | HTML/CSS/JS | Review interface, no framework bloat |
- Resize — Scale canvas & content with interpolation choice
- Crop — Intelligent crop with position (center, top, bottom, left, right)
- Blur — Gaussian, motion (with angle), radial (parameterized radius)
- Color — Brightness, contrast, saturation, temperature adjustments
- Sharpness — Unsharp mask with configurable amount, radius, and threshold
- Merge Layers — Concatenate multiple layers by name into a single layer
- Select Subject — Intelligent subject detection and separation using auto-cutout
- Layer Selection — Select specific layers by index or name
✓ Schema-First Design — Gemini forced into strict JSON contract
✓ No Auto-Execute — User approval required; modality enforced via executeAsModal
✓ Reversible — All ops use Photoshop's native batchPlay descriptors
✓ Zero Framework — Plugin UI is vanilla HTML/JS (UXP sandbox constraints)
✓ Context-Aware — Document metadata fed to Gemini (DPI, layer count, bounds)
# 1. Install UXP Developer Tool (Adobe official)
# 2. In UDT, click "Add Plugin" → select manifest.json from plugin-uxp/
# 3. Load the plugin
uxp plugin load
# 4. Photoshop will open the plugin panel automaticallySet your Gemini API key in modules/config.js:
const GEMINI_API_KEY = "your_api_key_here";For production, use environment variables or a secure backend (see Roadmap).
# Auto-reload on file changes
sh watch.sh- Write natural-language instruction in the panel text area
- Example: "Rotate to portrait, brighten by 20%, add gaussian blur"
- Click "Analyze & Apply" → plugin queries Gemini
- Review the JSON plan in the response section
- Click "Execute Changes" to apply
- Or use Manual Controls for direct adjustments (resize, crop, blur, color)
- All changes are undoable via Photoshop's undo stack
- Text area for natural-language prompts
- AI Response section (shows JSON plan before execution)
- Manual Controls: collapsible sections for each operation type
- Sliders for numeric parameters (blur radius, brightness, contrast, etc.)
- Status messages (success/error feedback)
- Initializes plugin via
modules/ui.js - Minimal setup; delegates to modular architecture
ai.js— Gemini integration, prompt execution, schema validationui.js— Panel initialization, UI state management, event listenersoperations.js— All editing operations (resize, crop, blur, color correction)layers.js— Layer manipulation and analysisdocument.js— Document context extraction (metadata, bounds)prompts.js— Gemini system/user prompt templatesconfig.js— Configuration constants (API keys, schemas, bounds)utils.js— Helper functions (validation, error handling)
Key pattern: All operations wrapped in executeAsModal() to prevent conflicts.
- Host: Photoshop 24.0+
- Permissions: Network access to
generativelanguage.googleapis.com - Panel: 280-800px width, 400-1200px height
- Type: Floating/docked panel
- Glassmorphic design (backdrop blur, transparency)
- Color-coded sections (blue for AI, green for execute)
- Custom scrollbars, sliders, and buttons
- Responsive to panel resize
- API key configuration in
modules/config.js(use environment variables for production) - Operations — Covers core types (resize, crop, blur, color) with room for expansion
- No undo/redo integration — Uses Photoshop's native undo (works but not custom stack)
- Document context — Basic metadata; layer analysis available in
layers.js
- Speech-to-Text — Voice input for natural language instructions
- Memory Bank — Store and recall editing workflows for quick reuse
- Multi-Agent System — Coordinate multiple AI agents for complex tasks (e.g., one agent for composition, one for color, one for enhancement)
- Expand operation registry (new editing types)
- Backend proxy server for API security
- Batch processing (process multiple files)
- Advanced constraints (e.g., brand guidelines, style preservation)
- Performance: cache Gemini schemas to reduce API calls
| Challenge | Solution |
|---|---|
| Gemini hallucinating invalid operations | Strict JSON schema + type validation |
| Modal execution blocking UI | executeAsModal wrapper with promise chaining |
| Large document context | Only pass key metadata (dims, name, not layers) |
| Parameter bounds safety | Min/max validation before batchPlay |
| UXP sandbox limitations | Vanilla JS (no npm packages, no frameworks) |
split-ai/
├── index.html # UI: AI prompt + manual controls
├── index.js # Plugin entry point
├── manifest.json # Plugin config (PS 24+, permissions, panel size)
├── package.json # Package metadata
├── style.css # Dark theme UI (glassmorphic)
├── watch.sh # Dev: auto-reload script
├── LICENSE # Apache 2.0
├── README.md # This file
├── architecture.png # Architecture diagram
├── icons/ # Plugin icons
├── modules/ # Modular architecture
│ ├── ai.js # Gemini API integration, schema validation
│ ├── ui.js # Panel initialization, event handling
│ ├── operations.js # All editing operations (resize, crop, blur, color)
│ ├── layers.js # Layer manipulation
│ ├── document.js # Document context extraction
│ ├── prompts.js # Gemini prompt templates
│ ├── config.js # Config constants, API keys, schemas
│ └── utils.js # Helper utilities
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
Built during Adobe Innovate4AI Hackathon in 48 hours in Bucharest. Special thanks to the Bucharest event team for hosting, mentorship, and valuable insights that shaped the direction of this project.
