Commit c977215

feat: Add GenerateImages function tool for natural language image generation
1 parent 77a03ec commit c977215

1 file changed: 250 additions, 0 deletions

# Image Generation as AI Function Tool

## Overview

Add `GenerateImages` as an AI function tool to enable natural language image generation during chat sessions, complementing the existing `cycod imagine` CLI command.

## Current State

- `cycod imagine "prompt"` - CLI command for explicit image generation
- `--image FILE` / `/image FILE` - Add existing images to the conversation (input only)
- No way to generate images from within a chat session without leaving the conversation

## Proposed Solution

Implement `GenerateImages` as a function tool that the AI can invoke during a conversation.

### User Experience

**Instead of:**
```bash
# Exit chat or use complex slash command
cycod imagine "weather app icon" --count 3 --size 1024x1024
```

**Natural conversation:**
```
User: Can you create me three weather app icons? Make them minimalist and blue.

AI: I'll generate three weather app icons for you.
[Function call: GenerateImages approved]
✓ Generated weather-icon-20250113-143022-1.png
✓ Generated weather-icon-20250113-143022-2.png
✓ Generated weather-icon-20250113-143022-3.png

I've created three minimalist blue weather app icons...
```

## Benefits

1. **More Natural** - Users describe intent, AI handles implementation
2. **Better Iteration** - Conversational refinement workflow
   - "Create a logo" → "Make it more vintage" → "Add warm colors"
3. **AI Value-Add** - Prompt engineering, smart defaults, context awareness
4. **Consistent** - Matches existing function tool patterns
5. **Flexible** - CLI command remains for scripting/batch operations

## Function Tool Design

### Schema

```typescript
// Tool definition as exposed to the model; only `prompt` is required.
const generateImagesTool = {
  name: "GenerateImages",
  description: "Generate images from text descriptions using AI (DALL-E)",
  parameters: {
    prompt: {
      type: "string",
      description: "Detailed image description. Be specific about style, colors, composition.",
      required: true
    },
    count: {
      type: "integer",
      description: "Number of variations (1-10)",
      default: 1
    },
    size: {
      type: "string",
      enum: ["1024x1024", "1792x1024", "1024x1792"],
      description: "Dimensions. Use 1792x1024 for hero/landscape",
      default: "1024x1024"
    },
    quality: {
      type: "string",
      enum: ["standard", "hd"],
      default: "standard"
    },
    style: {
      type: "string",
      enum: ["vivid", "natural"],
      description: "vivid=dramatic, natural=photorealistic",
      default: "vivid"
    },
    add_to_conversation: {
      type: "boolean",
      description: "Add generated images to conversation for analysis",
      default: false
    },
    output_directory: {
      type: "string",
      description: "Save location",
      default: "."
    }
  }
};
```
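The schema implies a validation step before any provider call: `prompt` must be present, `count` must stay within 1-10, the enum fields must match their allowed values, and omitted fields fall back to the listed defaults. A minimal sketch of that step follows. The names `GenerateImagesArgs`, `pickEnum`, and `normalizeGenerateImagesArgs` are hypothetical and not part of cycod, and TypeScript is used only to match the schema's notation.

```typescript
// Hypothetical types and helpers; illustrative only, not cycod's actual code.
type ImageSize = "1024x1024" | "1792x1024" | "1024x1792";
type ImageQuality = "standard" | "hd";
type ImageStyle = "vivid" | "natural";

interface GenerateImagesArgs {
  prompt: string;
  count: number;
  size: ImageSize;
  quality: ImageQuality;
  style: ImageStyle;
  add_to_conversation: boolean;
  output_directory: string;
}

const SIZES: readonly string[] = ["1024x1024", "1792x1024", "1024x1792"];
const QUALITIES: readonly string[] = ["standard", "hd"];
const STYLES: readonly string[] = ["vivid", "natural"];

// Returns the fallback when the value is absent, the value when it is allowed,
// and throws otherwise.
function pickEnum<T extends string>(
  name: string, value: unknown, allowed: readonly string[], fallback: T
): T {
  if (value === undefined) return fallback;
  if (typeof value === "string" && allowed.indexOf(value) !== -1) return value as T;
  throw new Error(name + " must be one of: " + allowed.join(", "));
}

// Validates raw tool-call arguments and applies the defaults from the schema.
function normalizeGenerateImagesArgs(raw: Record<string, unknown>): GenerateImagesArgs {
  const prompt = raw["prompt"];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  const count = raw["count"] === undefined ? 1 : raw["count"];
  if (typeof count !== "number" || Math.floor(count) !== count || count < 1 || count > 10) {
    throw new Error("count must be an integer between 1 and 10");
  }
  const add = raw["add_to_conversation"] === undefined ? false : raw["add_to_conversation"];
  if (typeof add !== "boolean") {
    throw new Error("add_to_conversation must be a boolean");
  }
  const dir = raw["output_directory"] === undefined ? "." : raw["output_directory"];
  if (typeof dir !== "string") {
    throw new Error("output_directory must be a string");
  }
  return {
    prompt: prompt,
    count: count,
    size: pickEnum<ImageSize>("size", raw["size"], SIZES, "1024x1024"),
    quality: pickEnum<ImageQuality>("quality", raw["quality"], QUALITIES, "standard"),
    style: pickEnum<ImageStyle>("style", raw["style"], STYLES, "vivid"),
    add_to_conversation: add,
    output_directory: dir
  };
}
```

Calling `normalizeGenerateImagesArgs({ prompt: "weather app icon" })` fills in every default, while an out-of-range `count` or an unknown `size` is rejected with a message the tool handler could pass back to the model.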

### Tool Response Format

```json
{
  "success": true,
  "images": [
    {
      "path": "./weather-icon-20250113-143022-1.png",
      "prompt": "weather app icon, minimalist design, blue and white",
      "size": "1024x1024",
      "format": "png"
    }
  ],
  "count": 1,
  "added_to_conversation": false
}
```
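To keep `count` consistent with the `images` array, the response could be built through a small helper rather than assembled by hand. The sketch below mirrors the fields in the JSON example. The `error` field and the `failureResult` helper are assumptions, since the proposal shows only the success case.

```typescript
// Shapes mirror the JSON example; all names here are illustrative.
interface GeneratedImage {
  path: string;
  prompt: string;
  size: string;
  format: string;
}

interface GenerateImagesResult {
  success: boolean;
  images: GeneratedImage[];
  count: number;
  added_to_conversation: boolean;
  error?: string; // assumption: the failure shape is not specified in the proposal
}

// Derives `count` from the array so the two can never disagree.
function successResult(images: GeneratedImage[], addedToConversation: boolean): GenerateImagesResult {
  return {
    success: true,
    images: images,
    count: images.length,
    added_to_conversation: addedToConversation
  };
}

// Hypothetical failure shape carrying a user-friendly message.
function failureResult(message: string): GenerateImagesResult {
  return { success: false, images: [], count: 0, added_to_conversation: false, error: message };
}

const exampleResult = successResult(
  [{
    path: "./weather-icon-20250113-143022-1.png",
    prompt: "weather app icon, minimalist design, blue and white",
    size: "1024x1024",
    format: "png"
  }],
  false
);

// Serialized form returned to the model as the tool result.
const examplePayload = JSON.stringify(exampleResult, null, 2);
```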

## Implementation Plan

### 1. Extract Image Generation Logic

- Move logic from `ImagineCommand` to shared service
- Create `ImageGenerationService.cs` (or similar)
- Service handles both CLI and function tool calls
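The extraction described above amounts to one service interface with two thin callers. The following sketch shows that shape in TypeScript to match the schema notation, although the project files named above are C#. Every identifier is hypothetical, and `generate` is synchronous only for brevity, where a real provider call would be asynchronous.

```typescript
// Illustrative only: a shared service with a CLI caller and a tool caller.
interface ImageRequest {
  prompt: string;
  count: number;
  size: string;
  outputDirectory: string;
}

interface ImageGenerationService {
  // Returns the paths of the files it wrote.
  generate(request: ImageRequest): string[];
}

// Stand-in that records requests instead of calling a provider.
class RecordingImageService implements ImageGenerationService {
  requests: ImageRequest[] = [];

  generate(request: ImageRequest): string[] {
    this.requests.push(request);
    const paths: string[] = [];
    for (let i = 1; i <= request.count; i++) {
      paths.push(request.outputDirectory + "/image-" + i + ".png");
    }
    return paths;
  }
}

// CLI path: formats one console line per generated file.
function runImagineCommand(
  service: ImageGenerationService, prompt: string, count: number, size: string
): string {
  const paths = service.generate({ prompt: prompt, count: count, size: size, outputDirectory: "." });
  return paths.map(function (p) { return "Generated " + p; }).join("\n");
}

// Function tool path: returns a JSON string for the model.
function handleGenerateImagesTool(service: ImageGenerationService, args: ImageRequest): string {
  const paths = service.generate(args);
  return JSON.stringify({
    success: true,
    images: paths.map(function (p) { return { path: p }; }),
    count: paths.length
  });
}
```

Because both callers depend only on the interface, the unit tests listed under Testing could substitute a stand-in like `RecordingImageService` and never touch a real provider.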

### 2. Create Function Tool

- Add to function tool catalog
- Implement tool handler (e.g., `ImageGenerationFunctionTool.cs`)
- Handle parameter validation and defaults

### 3. Integration

- Register tool in function tool system
- Ensure provider compatibility (Azure OpenAI, OpenAI)
- Handle errors gracefully with user-friendly messages

### 4. Documentation

- Update function calls help
- Add examples to help system
- Document auto-approval options

### 5. Testing

- Unit tests for service extraction
- Integration tests for tool invocation
- Test with different providers
- Test auto-add to conversation feature

## Key Design Decisions to Consider

### 1. Cost Control

Image generation costs money:
- **Option A**: Require approval by default (like write operations)
- **Option B**: Covered by `--auto-approve write`
- **Option C**: Specific approval: `--auto-approve GenerateImages`
- **Recommendation**: Require approval by default; let users opt in to auto-approval
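The three options can be read as one decision function with different inputs. The sketch below is a hypothetical model of that decision, not cycod's approval code. `ApprovalPolicy`, `decideApproval`, and the `writeCoversImages` toggle are invented for illustration, and the way `--auto-approve` values reach the function is assumed.

```typescript
// Hypothetical model of the approval decision; not cycod's actual implementation.
type ApprovalDecision = "auto-approved" | "needs-approval";

interface ApprovalPolicy {
  // Values the user passed to --auto-approve, e.g. ["write"] or ["GenerateImages"].
  autoApprove: readonly string[];
  // Assumed toggle: whether "write" approval also covers image generation (Option B).
  writeCoversImages: boolean;
}

function decideApproval(toolName: string, policy: ApprovalPolicy): ApprovalDecision {
  // Option C: the tool is approved by name.
  if (policy.autoApprove.indexOf(toolName) !== -1) {
    return "auto-approved";
  }
  // Option B: approval for write operations extends to image generation.
  if (policy.writeCoversImages && policy.autoApprove.indexOf("write") !== -1) {
    return "auto-approved";
  }
  // Option A: anything else prompts the user.
  return "needs-approval";
}
```

With `writeCoversImages` set to `false`, this matches the recommendation: the user is asked unless `GenerateImages` was approved explicitly.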

### 2. Auto-Add to Conversation

When should generated images be added to conversation?
- **Option A**: Always auto-add (AI can "see" what it created)
- **Option B**: Never auto-add (keeps conversation light)
- **Option C**: AI decides via `add_to_conversation` parameter
- **Recommendation**: Option C - context-dependent via parameter

### 3. Prompt Engineering

Who crafts the DALL-E prompt?
- **Option A**: Pass user request verbatim
- **Option B**: AI enhances prompt for better results
- **Recommendation**: Option B - AI adds details for optimal generation

### 4. File Management

Where do generated images go?
- Default to current directory (like CLI)
- AI can specify `output_directory` parameter
- Could add smart defaults based on prompt context
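The file names in the examples (`weather-icon-20250113-143022-1.png`) suggest a `base-YYYYMMDD-HHMMSS-index.png` pattern. The sketch below reproduces that pattern under a few assumptions: the base name is passed in, because the proposal leaves its source open, the timestamp is rendered in UTC, and `buildImageFileNames` is a hypothetical helper.

```typescript
// Hypothetical helper reproducing the naming pattern seen in the examples.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Formats a date as YYYYMMDD-HHMMSS, using UTC so the result is deterministic.
function formatStamp(when: Date): string {
  return (
    String(when.getUTCFullYear()) +
    pad2(when.getUTCMonth() + 1) +
    pad2(when.getUTCDate()) +
    "-" +
    pad2(when.getUTCHours()) +
    pad2(when.getUTCMinutes()) +
    pad2(when.getUTCSeconds())
  );
}

// Builds one path per image: <dir>/<base>-<stamp>-<index>.png, indexed from 1.
function buildImageFileNames(
  baseName: string, when: Date, count: number, outputDirectory: string
): string[] {
  const dir = outputDirectory === "" ? "." : outputDirectory;
  const separator = dir.charAt(dir.length - 1) === "/" ? "" : "/";
  const stamp = formatStamp(when);
  const names: string[] = [];
  for (let i = 1; i <= count; i++) {
    names.push(dir + separator + baseName + "-" + stamp + "-" + i + ".png");
  }
  return names;
}

// Usage: three icons generated at 2025-01-13 14:30:22 UTC into the current directory.
const iconNames = buildImageFileNames(
  "weather-icon", new Date(Date.UTC(2025, 0, 13, 14, 30, 22)), 3, "."
);
// iconNames[0] is "./weather-icon-20250113-143022-1.png"
```

Sharing one timestamp across a batch keeps related images adjacent in a directory listing, and the index suffix keeps their names distinct.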

## Use Cases

### Iterative Refinement
```
User: Create a coffee shop logo
AI: [generates]
User: Make it more vintage
AI: [regenerates with vintage style]
User: Add warm brown tones
AI: [refines further]
```

### Batch Generation with Context
```
User: I need icons for sunny, rainy, and cloudy weather
AI: [generates 3 with consistent style]
```

### Smart Defaults
```
User: Generate a hero image for my landing page
AI: [infers 1792x1024 landscape, hd quality, natural style]
```

## Comparison: CLI vs Function Tool

| Aspect | `cycod imagine` | Function Tool |
|--------|-----------------|---------------|
| Explicitness | High | Low |
| Convenience | Low (exit chat) | High (in flow) |
| Natural language | No | Yes |
| AI enhancement | No | Yes |
| Iteration | Awkward | Natural |
| Control | Full | Delegated |
| Scripting | Excellent | N/A |

**Both should coexist** - different use cases.

## Open Questions

1. Should there be rate limiting or cost warnings?
2. How verbose should approval prompts be?
3. Should we track and report cumulative costs?
4. File naming: timestamps vs AI-suggested names?
5. What limits should apply to multi-image batches?
6. How should this integrate with the existing `--image` / `/image` features?

## Related Work

- Existing `ImagineCommand` implementation
- Function tool infrastructure
- Image handling in conversation context
- Provider abstraction (Azure OpenAI, OpenAI)

## Priority

**Medium** - Nice quality-of-life improvement, not critical functionality.

## Related Considerations: `--add-image` Naming

While implementing this, also consider:
- Renaming `--image` to `--add-image` for consistency with `--add-system-prompt`, `--add-user-prompt`
- Could keep `--image` as alias for convenience
- Slash command `/image` should probably stay short (all slash commands are brief)

## Notes

- Image generation is marked as experimental (`MEAI001`)
- Requires a provider with DALL-E support
- Generated files include timestamps in names
- Generation takes roughly 10-30 seconds per image
