JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization

Yunlong Lin^*, Linqing Wang^*, Kunjie Lin^*, Zixu Lin^*, Kaixiong Gong, Wenbo Li, Bin Lin, Zhenxi Li, Shiyi Zhang, Yuyang Peng, Wenxun Dai, Xinghao Ding^♣, Chunyu Wang^†, Qinglin Lu^†

Tencent Hunyuan, Xiamen University

^*Equal Contributions ^†Project Leader ^♣Corresponding Author

💡 We also have other image editing agents that may interest you ✨.

[NeurIPS' 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin, Zixu Lin, Kunjie Lin, et al.

📮 News

[2025.12.29] We are grateful to 机器之心 (Synced) and 量子位 (QbitAI) for their coverage. Thank you for the support!
[2025.12.16] 🎉 JarvisEvo's project page and paper are now available!

🎪 Open-source Plan

Create repo and project page
Release inference code and checkpoints
Release Agent-to-Lightroom Protocol (server-client communication protocol for multi-machine, multi-GPU training with distributed Lightroom instances)
Release ArtEdit-Bench
Release SFT training code
Release SEPO and RFT training code

🧭 Table of Contents

📮 News
🎪 Open-source Plan
🧭 Overview
- 📝 Key Features
- 📊 Visual Comparison
💻 Getting Started
🙏 Acknowledgements
🌤️ Discussion Group
📧 Contact
📚 Citation
📜 License

🧭 Overview

JarvisEvo performs interleaved multimodal Chain-of-Thought (iMCoT) reasoning for image editing, combining multi-step planning, dynamic tool orchestration, and iterative visual feedback. This closed-loop workflow includes self-evaluation and refinement so that the final output is both visually compelling and faithful to the user's creative intent. By integrating professional tools such as Adobe Lightroom for precise adjustments and Qwen-Image-Edit for generative tasks, the system pairs expert-level retouching with creative synthesis.

📝 Key Features

🧠 Interleaved Multimodal Chain-of-Thought (iMCoT)

Closed-Loop Reasoning: "Thinks" with both text and images, validating steps against visual feedback to minimize hallucinations and error propagation.
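
The closed loop above can be illustrated with a toy example. This is a minimal sketch, not JarvisEvo's implementation: it assumes the "image" is a dict of hypothetical slider values, that visual feedback reduces to the distance from a target setting, and that each step moves one parameter halfway toward its target. Each iteration records a textual thought, a tool call, and the observed score, mirroring the text/tool/feedback interleaving described above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str   # textual reasoning emitted before the tool call
    tool: str      # hypothetical tool name, e.g. "set_exposure"
    delta: float   # parameter change applied by the tool
    score: float   # feedback after the step (negative L1 distance to target)


def closed_loop_edit(image, target, max_steps=8, tol=0.05):
    """Toy think -> act -> observe -> verify loop over scalar image parameters.

    `image` and `target` are hypothetical dicts of slider values standing in
    for a rendered photo and the user's intent.
    """
    current = dict(image)
    trace = []
    for _ in range(max_steps):
        # Observe: the parameter furthest from the target plays the role of visual feedback.
        name = max(target, key=lambda k: abs(target[k] - current.get(k, 0.0)))
        gap = target[name] - current.get(name, 0.0)
        if abs(gap) <= tol:
            break  # every parameter is within tolerance, so stop refining
        # Act conservatively (half the gap) so a single step never overshoots.
        delta = gap / 2
        current[name] = current.get(name, 0.0) + delta
        score = -sum(abs(target[k] - current.get(k, 0.0)) for k in target)
        trace.append(Step(f"{name} is off by {gap:+.2f}", f"set_{name}", delta, score))
    return current, trace


edited, trace = closed_loop_edit({"exposure": 0.0, "contrast": 0.0},
                                 {"exposure": 1.0, "contrast": 0.2})
for step in trace:
    print(step.tool, round(step.delta, 3), round(step.score, 3))
```

Checking feedback before every new action is what keeps an early mistake from propagating: the loop stops as soon as verification passes instead of running a fixed script.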

🔄 Synergistic Editor-Evaluator Optimization (SEPO)

Self-Evolving Framework: A dual-loop reinforcement learning system where the model acts as both editor and evaluator, refining strategies via intrinsic rewards without relying on static external models.
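
To make the editor/evaluator split concrete, here is a toy sketch under stated assumptions. The README does not give SEPO's objective, so the advantage computation below swaps in group-relative normalization in the style of GRPO, and `DualRoleModel`, `preference`, and `group_advantages` are hypothetical names. What it illustrates is that the reward comes from the same model's evaluator role rather than from a separate, frozen reward model.

```python
import statistics


class DualRoleModel:
    """Hypothetical stand-in for one policy used in two roles.

    A single `preference` value plays the part of shared weights: the editor
    role proposes edits and the evaluator role scores them with the same state.
    """

    def __init__(self, preference):
        self.preference = preference

    def edit(self, image, strength):
        # Editor role: apply a candidate adjustment.
        return image + strength

    def evaluate(self, edited):
        # Evaluator role: intrinsic reward, higher when closer to the model's own judgment.
        return -abs(edited - self.preference)


def group_advantages(rewards):
    """Group-normalized advantages (GRPO-style; assumed, not SEPO's published objective)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # all candidates tie: no learning signal
    return [(r - mean) / std for r in rewards]


model = DualRoleModel(preference=0.7)
candidates = [model.edit(0.0, s) for s in (0.2, 0.5, 0.8)]  # editor role: sample a group
rewards = [model.evaluate(c) for c in candidates]           # evaluator role: self-score
advantages = group_advantages(rewards)
best = max(range(len(candidates)), key=advantages.__getitem__)
print(best, [round(a, 3) for a in advantages])
```

Only the editor-side update signal is shown; how SEPO keeps the evaluator role itself improving is not specified here and is left out.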

🎨 Unified Preservative & Generative Editing

Comprehensive Toolset: Seamlessly integrates Adobe Lightroom (200+ tools) for precise adjustments and Qwen-Image-Edit for creative synthesis (object removal, style transfer), handling the full spectrum of editing tasks.
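
One way to picture the unified toolset is a router that sends parametric adjustments to Lightroom and content-changing edits to Qwen-Image-Edit. The sketch below is an assumption for illustration: the task names, the `Backend` enum, and `route`/`plan_backends` are hypothetical, and in JarvisEvo the choice of tools is made by the agent's reasoning rather than by a fixed lookup table.

```python
from enum import Enum
from itertools import groupby


class Backend(Enum):
    LIGHTROOM = "lightroom"              # preservative: parametric adjustments
    QWEN_IMAGE_EDIT = "qwen-image-edit"  # generative: content synthesis


# Hypothetical task taxonomy; the README cites 200+ Lightroom tools in the real system.
GENERATIVE_TASKS = {"object_removal", "style_transfer"}
PRESERVATIVE_TASKS = {"exposure", "contrast", "white_balance", "hsl", "crop"}


def route(task):
    """Map an editing task to the backend that should execute it."""
    if task in GENERATIVE_TASKS:
        return Backend.QWEN_IMAGE_EDIT
    if task in PRESERVATIVE_TASKS:
        return Backend.LIGHTROOM
    raise ValueError(f"unknown editing task: {task!r}")


def plan_backends(tasks):
    """Group consecutive tasks by backend, keeping the original order."""
    return [(backend, list(group)) for backend, group in groupby(tasks, key=route)]


plan = plan_backends(["exposure", "contrast", "object_removal", "white_balance"])
for backend, group in plan:
    print(backend.value, group)
```

Consecutive tasks for the same backend are batched to save round-trips, but order is preserved, since a generative edit such as object removal changes the pixels that later Lightroom adjustments operate on.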

🪞 Self-Reflective Learning Mechanism

Autonomous Improvement: Automatically generates reflection trajectories upon suboptimal results, enabling the model to learn from mistakes and continuously optimize its tool selection logic.
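
The README does not specify the format of reflection trajectories. Assuming a failed attempt and an improved retry are both logged with an evaluator score, a minimal sketch of turning them into a training sample might look like this; `Trajectory`, `ReflectionSample`, `build_reflection`, the `tool:value` strings, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Trajectory:
    instruction: str
    tool_calls: Tuple[str, ...]  # hypothetical "tool:value" strings
    score: float                 # evaluator score in [0, 1]


@dataclass(frozen=True)
class ReflectionSample:
    instruction: str
    failed_calls: Tuple[str, ...]
    reflection: str
    corrected_calls: Tuple[str, ...]


def build_reflection(attempt: Trajectory, retry: Trajectory,
                     threshold: float = 0.6) -> Optional[ReflectionSample]:
    """Turn a suboptimal attempt plus a better retry into one training sample."""
    if attempt.score >= threshold or retry.score <= attempt.score:
        return None  # first try was good enough, or the retry did not help
    removed = [c for c in attempt.tool_calls if c not in retry.tool_calls]
    added = [c for c in retry.tool_calls if c not in attempt.tool_calls]
    reflection = (f"Score {attempt.score:.2f} is below {threshold:.2f}; "
                  f"drop {removed or 'nothing'}, add {added or 'nothing'}.")
    return ReflectionSample(attempt.instruction, attempt.tool_calls,
                            reflection, retry.tool_calls)


first = Trajectory("warm sunset look", ("exposure:+1.5", "temperature:-10"), 0.4)
second = Trajectory("warm sunset look", ("exposure:+0.5", "temperature:+15"), 0.8)
sample = build_reflection(first, second)
print(sample.reflection)
```

Requiring both conditions (the first attempt scored below threshold and the retry actually improved) keeps the model from learning "corrections" that made things worse.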

📊 Visual Comparison

Comparison with ChatGPT x Adobe Photoshop

Comparison with Leading Image Editing Models

💻 Getting Started

For batch inference, please follow:

Batch Inference

For training, please follow:

Training Guide

For evaluation, please follow:

Evaluation

For details of the Agent-to-Lightroom Protocol, please see:

Agent-to-Lightroom Protocol

🙏 Acknowledgements

We thank LLaMA-Factory for its valuable open-source contributions, which provided important technical references for our work.

🌤️ Discussion Group

If you have questions about trying, running, or deploying JarvisEvo, or ideas and suggestions for the project, feel free to join our WeChat discussion group!

Scan the QR code to join the WeChat discussion group.

📧 Contact

For any questions or inquiries, please reach out to us:

Yunlong Lin: [email protected]

📚 Citation

If you find JarvisEvo useful in your research, please consider citing:

@article{lin2025jarvisevo,
  title={JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization},
  author={Lin, Yunlong and Wang, Linqing and Lin, Kunjie and Lin, Zixu and Gong, Kaixiong and Li, Wenbo and Lin, Bin and Li, Zhenxi and Zhang, Shiyi and Peng, Yuyang and others},
  journal={arXiv preprint arXiv:2511.23002},
  year={2025}
}

📜 License

JarvisEvo is released under the Apache License 2.0.
