Skip to content

Fixed summarization UDF to work with latest upstream changes#1213

Open
KyleZheng1284 wants to merge 1 commit intoNVIDIA:mainfrom
KyleZheng1284:feature/llm-summarizer-udf
Open

Fixed summarization UDF to work with latest upstream changes#1213
KyleZheng1284 wants to merge 1 commit intoNVIDIA:mainfrom
KyleZheng1284:feature/llm-summarizer-udf

Conversation

@KyleZheng1284
Copy link
Collaborator

@KyleZheng1284 KyleZheng1284 commented Dec 17, 2025

Description

Fixed changes within summarization UDF

  1. Pre Split (32 pages) was causing multiple summaries per file depending on the length. Example if total pages in PDF is 32-64 it will 2 summaries, if it has 64-96 pages it will have 3 summaries, ...

    • Added simple logic to skip non-first sections, so only the first and last chunk of the first section (1-32 pages) will be used for summarization.
  2. Fixed custom pipeline to use nemotron_parse* from depreciated nemoretriever-parse*

  3. Minor upstream fixes

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@KyleZheng1284 KyleZheng1284 requested a review from a team as a code owner December 17, 2025 18:50
@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

max_tokens=400,
temperature=0.7,
)
url = f"{base_url.rstrip('/')}/chat/completions"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only place this code will be used to call the summarizer? If not, should we consider making this into something that we can call in other places like a class or simply a function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants