mPresenter is an end-to-end multilingual agentic framework designed to transform static academic papers (PDFs) into presentation videos.
- 🤖 Multi-Agent Collaboration: Orchestrates 4 specialized agents (Planner, Reviewer, Coder, Interpreter) to plan, critique, and generate professional slides.
- 👁️ Cross-Lingual Interpretation: A dedicated Interpreter Agent analyzes figures to explain visual elements that remain in the source language.
- 🎨 True-to-Layout Generation: Writes executable Beamer LaTeX code and visually inspects rendered slides to ensure high readability.
- 📈 High Information Density: Designed for effective knowledge transfer, achieving higher accuracy in QA benchmarks compared to prior systems.
- ⚡ Cost-Effective Efficiency: Achieves the lowest token consumption among baselines and substantially reduces latency, making high-quality video generation affordable and scalable.
Demo videos:
- English: mpresenter_en.mp4
- Chinese: mpresenter_zh.mp4
An expert-curated multilingual benchmark designed to evaluate Effective Information Transfer (EIT) of Paper2Video systems.
- 📄 40 Academic Papers:
- 20 English Papers (10 from NeurIPS 2025, 10 from ACL 2025).
- 20 Chinese Papers (from Chinese Journal of Computers).
- Covers diverse topics: Vision, NLP, Graph, Security, RL, BioMed, and Systems.
- ❓ 1,600 Multilingual Questions:
- Each paper is paired with 8 expert-written multiple-choice questions.
- All questions are translated into 5 languages: English, Chinese, German, Japanese, and Arabic.
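The 1,600 figure follows directly from the composition above; a one-line sanity check:

```python
# Benchmark size: 40 papers x 8 questions each, translated into 5 languages.
papers = 20 + 20          # English + Chinese papers
questions_per_paper = 8   # expert-written multiple-choice questions
languages = 5             # English, Chinese, German, Japanese, Arabic
total_questions = papers * questions_per_paper * languages
print(total_questions)    # 1600
```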
The benchmark targets four core aspects of scientific communication:
- Motivation: Research context and gaps in related work.
- Method: Technical mechanisms and figure interpretation.
- Experiment: Experimental setup and result analysis.
- Conclusion: Key takeaways and supported claims.
System dependencies:
- TeX Live (XeLaTeX)
- Poppler (`pdftoppm`) or ImageMagick
- Fonts: Source Han Sans and Fira Sans
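A quick way to confirm the dependencies are on `PATH` before a first run (a sketch; the CLI entry points `xelatex` and `pdftoppm` come from the list above, and ImageMagick's `magick` command is an assumption about your install):

```shell
# Report whether a required tool is available on PATH.
check_dep() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

for tool in xelatex pdftoppm magick; do
  check_dep "$tool"
done
```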
- Install dependencies with pixi:

  ```shell
  pixi install
  ```

- To enable CosyVoice TTS, clone it into `third_party/`:

  ```shell
  git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git third_party/CosyVoice
  ```

- Set an LLM key:
  - OpenAI: set `llm.openai_api_key` in `config.json`.
  - Gemini: set `llm.gemini_api_key` in `config.json`, or export `GEMINI_API_KEY`/`GOOGLE_API_KEY`.
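For reference, the Gemini key lookup described above can be sketched as a config-first fallback chain. The exact precedence mPresenter uses is not documented here, so this ordering (config file, then `GEMINI_API_KEY`, then `GOOGLE_API_KEY`) is an assumption:

```python
import os

# Sketch of a plausible key-resolution order (an assumption, not the
# project's documented behavior): config.json value first, then the
# GEMINI_API_KEY and GOOGLE_API_KEY environment variables.
def resolve_gemini_key(config):
    return (
        config.get("llm", {}).get("gemini_api_key")
        or os.environ.get("GEMINI_API_KEY")
        or os.environ.get("GOOGLE_API_KEY")
    )
```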
- Place your PDF at `input/paper.pdf` or pass `--source-pdf`.
- Run:

  ```shell
  pixi shell
  python main.py --source-pdf input/paper.pdf --target-language English --output-root output
  ```

Docker:

Build the image:

```shell
docker build -t mpresenter:latest .
```

Then run:
```shell
mkdir -p input output cache
cp /path/to/paper.pdf input/paper.pdf
docker run --rm -it \
  -v "$(pwd)/input:/app/input" \
  -v "$(pwd)/output:/app/output" \
  -v "$(pwd)/cache:/app/cache" \
  -v "$(pwd)/config.json:/app/config.json" \
  -e GEMINI_API_KEY=YOUR_KEY \
  mpresenter:latest \
  --source-pdf /app/input/paper.pdf --target-language English --output-root /app/output
```

Common command-line options:
- `--source-pdf`: override `source_pdf` (default `input/paper.pdf`)
- `--output-root`: override `output_root`
- `--cache-root`: override `cache_root`
- `--target-language`: override `target_language` (default: English)
- `--planner-note`: override `planner_note`
- `--backbone`: override all LLM model names with a single model
Other configuration is defined in `config.json`.
Session cache IDs are derived from the input PDF filename (stem), so re-running with the same PDF name reuses the same cache directory under cache/.
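The stem-based session ID described above can be sketched in a few lines (an illustration of the naming scheme, not the project's actual code; the helper name is hypothetical):

```python
from pathlib import Path

# Illustrative sketch: the cache directory name is the PDF filename
# without its extension (the "stem"), so input/paper.pdf -> cache/paper.
def session_cache_dir(source_pdf, cache_root="cache"):
    return Path(cache_root) / Path(source_pdf).stem
```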
Cache directory (`cache/<session_id>/`) key files and folders:

- `steps_status.json`: pipeline step status for resume.
- `run.log`: main run log.
- `final_outline.json`: finalized outline JSON.
- `slides_manifest.json`: slide-level manifest (image path + note).
- `slides.tex` and `slides.pdf`: merged Beamer source and PDF.
- `slides_llm/`: slide PNGs for LLM review.
- `slides/`: slide PNGs for video synthesis.
- `final_scripts.json`: finalized narration scripts.
- `audio/`: synthesized audio clips per slide.
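Since `steps_status.json` drives resume, it can be handy to see which steps a session has already finished. A minimal sketch, assuming the file maps step names to status strings with `"done"` marking completion (the schema is a guess, not documented):

```python
import json
from pathlib import Path

# Sketch: list pipeline steps a resumed run would skip. Assumes
# steps_status.json is a flat {step_name: status_string} mapping,
# which is an assumption about the file's schema.
def completed_steps(cache_dir):
    status = json.loads((Path(cache_dir) / "steps_status.json").read_text())
    return [step for step, state in status.items() if state == "done"]
```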
Final output:

- `output/<source_pdf_stem>.mp4`: final video.
mPresenter maintains low latency and minimal API costs while producing superior video quality.
Thanks to these open-source projects:
- PaddleOCR (PP-DocLayout-L + OCR): https://github.com/PaddlePaddle/PaddleOCR
- Metropolis Beamer theme (mtheme): https://github.com/matze/mtheme
- FunAudioLLM/CosyVoice (TTS): https://github.com/FunAudioLLM/CosyVoice


