mPresenter is an end-to-end multilingual agentic framework designed to transform static academic papers (PDFs) into presentation videos.
- 🤖 Multi-Agent Collaboration: Orchestrates 4 specialized agents (Planner, Reviewer, Coder, Interpreter) to plan, critique, and generate professional slides.
- 👁️ Cross-Lingual Interpretation: A dedicated Interpreter Agent analyzes figures to explain visual elements that remain in the source language.
- 🎨 True-to-Layout Generation: Writes executable Beamer LaTeX code and visually inspects rendered slides to ensure high readability.
- 📈 High Information Density: Designed for effective knowledge transfer, achieving higher accuracy in QA benchmarks compared to prior systems.
- ⚡ Cost-Effective Efficiency: Achieves the lowest token consumption among baselines and substantially reduces latency, making high-quality video generation affordable and scalable.
Demo videos:
- English: mpresenter_en.mp4
- Chinese: mpresenter_zh.mp4
An expert-curated multilingual benchmark designed to evaluate Effective Information Transfer (EIT) of Paper2Video systems.
- 📄 40 Academic Papers:
- 20 English Papers (10 from NeurIPS 2025, 10 from ACL 2025).
- 20 Chinese Papers (from Chinese Journal of Computers).
- Covers diverse topics: Vision, NLP, Graph, Security, RL, BioMed, and Systems.
- ❓ 1,600 Multilingual Questions:
- Each paper is paired with 8 expert-written multiple-choice questions.
- All questions are translated into 5 languages: English, Chinese, German, Japanese, and Arabic.
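The 1,600 figure follows directly from the composition above; a one-line sanity check:

```python
# Benchmark size: 40 papers x 8 questions each, translated into 5 languages.
papers = 20 + 20          # English + Chinese papers
questions_per_paper = 8   # expert-written multiple-choice questions
languages = 5             # English, Chinese, German, Japanese, Arabic
total_questions = papers * questions_per_paper * languages
print(total_questions)    # 1600
```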
The benchmark targets four core aspects of scientific communication:
- Motivation: Research context and gaps in related work.
- Method: Technical mechanisms and figure interpretation.
- Experiment: Experimental setup and result analysis.
- Conclusion: Key takeaways and supported claims.
System dependencies:
- TeX Live (XeLaTeX)
- Poppler (`pdftoppm`) or ImageMagick
- Fonts: Source Han Sans and Fira Sans
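A quick way to confirm the dependencies are on `PATH` before a first run (a sketch; the CLI entry points `xelatex` and `pdftoppm` come from the list above, and ImageMagick's `magick` command is an assumption about your install):

```shell
# Report whether a required tool is available on PATH.
check_dep() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

for tool in xelatex pdftoppm magick; do
  check_dep "$tool"
done
```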
- Install dependencies with pixi:

  ```shell
  pixi install
  ```

- To enable CosyVoice TTS, clone it into `third_party/`:

  ```shell
  git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git third_party/CosyVoice
  ```

- Set an LLM key:
  - OpenAI: set `llm.openai_api_key` in `config.json`.
  - Gemini: set `llm.gemini_api_key` in `config.json`, or export `GEMINI_API_KEY`/`GOOGLE_API_KEY`.
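For reference, the Gemini key lookup described above can be sketched as a config-first fallback chain. The exact precedence mPresenter uses is not documented here, so this ordering (config file, then `GEMINI_API_KEY`, then `GOOGLE_API_KEY`) is an assumption:

```python
import os

# Sketch of a plausible key-resolution order (an assumption, not the
# project's documented behavior): config.json value first, then the
# GEMINI_API_KEY and GOOGLE_API_KEY environment variables.
def resolve_gemini_key(config):
    return (
        config.get("llm", {}).get("gemini_api_key")
        or os.environ.get("GEMINI_API_KEY")
        or os.environ.get("GOOGLE_API_KEY")
    )
```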
- Place your PDF at `input/paper.pdf` or pass `--source-pdf`.
- Run:

  ```shell
  pixi shell
  python main.py --source-pdf input/paper.pdf --target-language English --output-root output
  ```

Docker:

Build the image:

```shell
docker build -t mpresenter:latest .
```

Then run:
```shell
mkdir -p input output cache
cp /path/to/paper.pdf input/paper.pdf
docker run --rm -it \
  -v "$(pwd)/input:/app/input" \
  -v "$(pwd)/output:/app/output" \
  -v "$(pwd)/cache:/app/cache" \
  -v "$(pwd)/config.json:/app/config.json" \
  -e GEMINI_API_KEY=YOUR_KEY \
  mpresenter:latest \
  --source-pdf /app/input/paper.pdf --target-language English --output-root /app/output
```

Common command-line options:
- `--source-pdf`: override `source_pdf` (default `input/paper.pdf`)
- `--output-root`: override `output_root`
- `--cache-root`: override `cache_root`
- `--target-language`: override `target_language` (default: English)
- `--planner-note`: override `planner_note`
- `--backbone`: override all LLM model names with a single model
Other configuration is defined in `config.json`.
Session cache IDs are derived from the input PDF filename (stem), so re-running with the same PDF name reuses the same cache directory under cache/.
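The stem-based session ID described above can be sketched in a few lines (an illustration of the naming scheme, not the project's actual code; the helper name is hypothetical):

```python
from pathlib import Path

# Illustrative sketch: the cache directory name is the PDF filename
# without its extension (the "stem"), so input/paper.pdf -> cache/paper.
def session_cache_dir(source_pdf, cache_root="cache"):
    return Path(cache_root) / Path(source_pdf).stem
```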
Cache directory (`cache/<session_id>/`) key files and folders:

- `steps_status.json`: pipeline step status for resume.
- `run.log`: main run log.
- `final_outline.json`: finalized outline JSON.
- `slides_manifest.json`: slide-level manifest (image path + note).
- `slides.tex` and `slides.pdf`: merged Beamer source and PDF.
- `slides_llm/`: slide PNGs for LLM review.
- `slides/`: slide PNGs for video synthesis.
- `final_scripts.json`: finalized narration scripts.
- `audio/`: synthesized audio clips per slide.
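Since `steps_status.json` drives resume, it can be handy to see which steps a session has already finished. A minimal sketch, assuming the file maps step names to status strings with `"done"` marking completion (the schema is a guess, not documented):

```python
import json
from pathlib import Path

# Sketch: list pipeline steps a resumed run would skip. Assumes
# steps_status.json is a flat {step_name: status_string} mapping,
# which is an assumption about the file's schema.
def completed_steps(cache_dir):
    status = json.loads((Path(cache_dir) / "steps_status.json").read_text())
    return [step for step, state in status.items() if state == "done"]
```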
Final output:

- `output/<source_pdf_stem>.mp4`: final video.
mPresenter maintains low latency and minimal API costs while producing superior video quality.
Thanks to these open-source projects:
- PaddleOCR (PP-DocLayout-L + OCR): https://github.com/PaddlePaddle/PaddleOCR
- Metropolis Beamer theme (mtheme): https://github.com/matze/mtheme
- FunAudioLLM/CosyVoice (TTS): https://github.com/FunAudioLLM/CosyVoice


