[Project Page] | [arXiv] | [Artifacts] | [SAGE-Bench] | [BibTeX]
This repo contains the code for our paper SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning.
## News

- **[December 16, 2025]**: 🚀 SAGE is publicly released. We also open-source all the artifacts, including model checkpoints, datasets, and the benchmark, on the Hugging Face Hub! 🎁
## Installation

- Clone this repository.

  ```bash
  git lfs install
  git clone https://github.com/allenai/SAGE
  cd SAGE
  ```

- Set up a conda environment with the base dependencies.

  ```bash
  conda create --name sage -y python=3.11 && conda activate sage
  sudo apt-get update && sudo apt-get install -y ffmpeg  # required for extracting video clips and transcript audio
  pip install decord qwen_vl_utils
  pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu126
  pip install -e . && pip install -e verl/
  pip install transformers==4.57.0
  pip install vllm==0.11.0
  pip install flash-attn==2.7.3
  pip install trl deepspeed  # only needed for SFT
  ```
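Optionally, sanity-check the install before moving on. This is a minimal sketch (not a repo script); it only confirms that the pinned packages import cleanly and that PyTorch sees your GPUs.

```bash
# Quick environment sanity check (illustrative, not part of the repo)
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
python -c "import vllm, transformers; print(vllm.__version__, transformers.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
ffmpeg -version | head -n 1   # ffmpeg must be on PATH for video/audio extraction
```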
## Tool Setup

You need to set up three services to use SAGE (a quick smoke test for them is sketched after this list):
- Obtain a Serper API key (some free credits are available) for the `web-search` tool.
- vLLM server for the `ground-event` and `analyze` tools.

  ```bash
  bash scripts/start_vllm_qwen3vl.sh  # requires 2x 80G A100 GPUs for Qwen/Qwen3-VL-30B-A3B-Instruct
  ```
- Whisper model API for the `transcribe-speech` tool.

  ```bash
  bash scripts/start_transcribe_api.sh  # single GPU, whisper large-v3
  ```
- [OPTIONAL] If you want to use Gemini-2.5-Flash as a tool, obtain a Gemini API key and set it in your environment:

  ```bash
  export GEMINI_API_KEY=<YOUR-API-KEY>
  ```
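Once the services are running, you can smoke-test them from the shell. The snippet below is a sketch under two assumptions: that `scripts/start_vllm_qwen3vl.sh` exposes vLLM's standard OpenAI-compatible API on `localhost:8000` (adjust the host/port to whatever the script actually binds), and that your Serper key is exported as `SERPER_API_KEY`. The last call uses Serper's public search endpoint.

```bash
# List the models served by vLLM (assumes the default OpenAI-compatible
# endpoint on localhost:8000; change the URL if the script binds elsewhere).
curl http://localhost:8000/v1/models

# Minimal chat request against the tool-calling model.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-VL-30B-A3B-Instruct",
       "messages": [{"role": "user", "content": "Say hi."}],
       "max_tokens": 16}'

# Verify the Serper key against Serper's search API.
curl https://google.serper.dev/search \
  -H "X-API-KEY: $SERPER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"q": "long video reasoning"}'
```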
## Demo

You can use the Gradio interface to analyze videos with SAGE locally. You need to set up a few APIs for tool calls before running SAGE:
- Set the environment variables at the top of the demo.sh script:

  ```bash
  export SERPER_API_KEY="YOUR_SERPER_API_KEY"
  export TOOL_CALL_MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
  export VLLM_CLIENT_URL="vLLM_API_URL_FOR_TOOL_CALLING"
  export TRANSCRIBE_API_URL="API_URL_FOR_TRANSCRIPTION"
  ```
- Run the Gradio demo:

  ```bash
  bash scripts/demo.sh
  ```
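If the demo's tool calls fail, a common cause is an unreachable vLLM endpoint. Below is a minimal reachability check, assuming `VLLM_CLIENT_URL` points at a vLLM OpenAI-compatible server without a trailing `/v1` (adjust the path to match your deployment):

```bash
# Hypothetical pre-flight check, not a repo script; edit the path if your
# URL already includes /v1 or the server mounts the API elsewhere.
curl -sf "$VLLM_CLIENT_URL/v1/models" >/dev/null \
  && echo "vLLM endpoint reachable" \
  || echo "vLLM endpoint NOT reachable at $VLLM_CLIENT_URL"
```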
## Training and Evaluation

Please see Training.md for training commands and dataset preparation.

Please see Evaluation.md for evaluation commands and preparing SAGE-Bench.
## Citation

```bibtex
@article{jain2025sage,
  title={{SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning}},
  author={Jitesh Jain and Jialuo Li and Zixian Ma and Jieyu Zhang and Chris Dongjoo Kim and Sangho Lee and Rohun Tripathi and Tanmay Gupta and Christopher Clark and Humphrey Shi},
  journal={arXiv},
  year={2025}
}
```

## Acknowledgements

We thank the authors of verl and verl-agent for open-sourcing their code, which helped us implement our RL training pipeline.
