Document VLM Captioning for Infographics #1369

Open
kheiss-uwzoo wants to merge 12 commits into NVIDIA:main from kheiss-uwzoo:kheiss/vlm-info-capt

@kheiss-uwzoo (Collaborator)

Summary

Added documentation, spread across four documentation files, that explains how to use vision-language models (VLMs) to caption infographics.

The changes provide:

  • Clear explanations of VLM captioning for infographics
  • Code examples showing practical implementation
  • Proper VLM acronym definitions
  • Cross-references between related documentation sections

Changes Made

  1. vlm-embed.md (Added new section: "VLM Captioning for Infographics Example")
      • Explains why infographics benefit from VLM captioning
      • Provides three approaches:
          • Approach 1: Extract and caption infographics (generates text descriptions)
          • Approach 2: Embed infographics as images (preserves visual characteristics)
          • Combining Both: Use both captioning and embedding together
      • Includes complete code examples for each approach
      • Added guidance on when to use each method
  2. nv-ingest-python-api.md (Enhanced existing section)
      • Enhanced "Extract Captions from Images" section to explicitly mention infographics
      • Added new subsection: "Captioning Infographics"
      • Includes code example showing extract_infographics=True with .caption()
      • Added note about requiring the vlm profile
      • Cross-referenced to vlm-embed.md
  3. quickstart-guide.md (Enhanced profile documentation + example)
      • Updated VLM profile description to mention infographics
      • Added explanation that the profile enables the .caption() method
      • Added new subsection: "Example: Using the VLM Profile for Infographic Captioning"
      • Includes docker compose command with both retrieval and vlm profiles
      • Provides complete end-to-end Python example
      • Added tip explaining benefits of VLM captioning for complex visuals
  4. support-matrix.md (Clarified VLM feature)
      • Defined VLM acronym on first use (line 27)
      • Updated VLM feature description to include infographics
      • Ensures consistency across documentation
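
For reviewers who want a quick picture of the call pattern the new sections describe, here is a minimal sketch, not the code from the docs themselves. Only `extract_infographics=True` and `.caption()` come from this PR description; the `.files()` and `.extract()` chaining, the helper name `build_infographic_caption_job`, and the `RecordingIngestor` stand-in are assumptions made for illustration. The stand-in records calls instead of contacting a running service, so the snippet runs without the client library installed.

```python
from typing import Any


def build_infographic_caption_job(ingestor: Any, path: str) -> Any:
    """Chain the calls named in this PR onto an Ingestor-like object.

    Hypothetical helper: the fluent .files()/.extract() steps are assumed,
    while extract_infographics=True and .caption() are taken from the PR text.
    """
    return (
        ingestor
        .files(path)                         # assumed: select the input document
        .extract(extract_infographics=True)  # from the PR: pull out infographics
        .caption()                           # from the PR: needs the vlm profile
    )


class RecordingIngestor:
    """Hypothetical stand-in that records each chained call and its arguments."""

    def __init__(self) -> None:
        self.calls = []

    def __getattr__(self, name: str):
        # Only reached for attributes that do not exist, i.e. the chained methods.
        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
            return self  # returning self keeps the chain fluent
        return record


if __name__ == "__main__":
    recorder = RecordingIngestor()
    build_infographic_caption_job(recorder, "report.pdf")
    for name, args, kwargs in recorder.calls:
        print(name, args, kwargs)
```

With a real deployment, the object passed in would presumably be the client's ingestor from nv-ingest-python-api.md, and the service would need to be started with the vlm profile as the quickstart change notes.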

Documentation Standards

✅ Cross-references added between related sections
✅ Code examples are complete and runnable
✅ Progressive detail: quick reference → detailed examples → comprehensive guide
✅ Multiple entry points for users to discover this information

Files Modified

  • docs/docs/extraction/vlm-embed.md (+98 lines)
  • docs/docs/extraction/nv-ingest-python-api.md (~30 lines modified/added)
  • docs/docs/extraction/quickstart-guide.md (~50 lines modified/added)
  • docs/docs/extraction/support-matrix.md (~1 line modified)

@kheiss-uwzoo requested a review from a team as a code owner on February 4, 2026 16:57
@kheiss-uwzoo added the doc label (Improvements or additions to documentation) on Feb 4, 2026

@nkmcalli (Collaborator) left a comment

A few suggestions for you

@kheiss-uwzoo requested a review from nkmcalli on February 4, 2026 19:13