Yolo Yunlong Tang1, Daiki Shimada2, Hang Hua3, Chao Huang1, Jing Bi1, Rogerio Feris3, Chenliang Xu1
1University of Rochester, 2Sony Group Corporation, 3MIT-IBM Watson AI Lab
- [2025-11-23] Introducing Video-R4, a reinforced video agent with visual rumination for text-rich video reasoning. The arXiv paper has been released. Code, model, and dataset are coming soon.
conda create -n video-r4 python=3.10
conda activate video-r4
git clone https://github.com/yunlong10/Video-R4.git
cd Video-R4
pip install -r requirements.txt

If you find this work useful, please consider citing:
@article{tang2025video-r4,
title={Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination},
author={Tang, Yunlong and Shimada, Daiki and Hua, Hang and Huang, Chao and Bi, Jing and Feris, Rogerio and Xu, Chenliang},
journal={arXiv preprint arXiv:2511.17490},
year={2025}
}

This work was supported by Sony Group Corporation. We would like to thank Sayaka Nakamura and Jerry Jun Yokono for their insightful discussion.
We also thank the authors of the following projects for their contributions:


