[NeurIPS'25 Spotlight] Official implementation of "JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation"
audio-video multimodal mllm multimodal-large-language-models sounding-video-generation joint-audio-video-generation audiovisual-uderstanding unified-mllm audiovisual-synchronization
Updated Jan 10, 2026 - Python