yunlong10/Video-R4

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination

Yolo Yunlong Tang¹, Daiki Shimada², Hang Hua³, Chao Huang¹, Jing Bi¹, Rogerio Feris³, Chenliang Xu¹

¹University of Rochester, ²Sony Group Corporation, ³MIT-IBM Watson AI Lab

arXiv Paper | Project Page | Hugging Face Dataset | Hugging Face Model

🌟 News

  • [2025-11-23] Introducing Video-R4, a reinforced video agent with visual rumination for text-rich video reasoning. The arXiv paper has been released. Code, model, and dataset are coming soon.

🚀 Video-R4 Training Framework

📊 Data Curation Pipeline

📈 Performance

📦 Installation

conda create -n video-r4 python=3.10
conda activate video-r4
git clone https://github.com/yunlong10/Video-R4.git
cd Video-R4
pip install -r requirements.txt

📖 Citation

If you find this work useful, please consider citing:

@article{tang2025video-r4,
  title={Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination},
  author={Tang, Yunlong and Shimada, Daiki and Hua, Hang and Huang, Chao and Bi, Jing and Feris, Rogerio and Xu, Chenliang},
  journal={arXiv preprint arXiv:2511.17490},
  year={2025}
}

🀝 Acknowledgments

This work was supported by Sony Group Corporation. We would like to thank Sayaka Nakamura and Jerry Jun Yokono for their insightful discussion.

We also thank the authors of the following projects for their contributions:
