Hi,
The `TextChunker` occasionally oversizes the last paragraph: the orphan-chunk gluing logic in `ProcessParagraphs` uses the number of words instead of the number of tokens, so the last chunk can exceed the target length. As a result, `TextChunker.SplitPlainTextParagraphs` sometimes returns chunks that are too large, causing (frequently silent) loss of information when generating embeddings or using a reranker.
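To illustrate the mismatch, here is a minimal, language-agnostic sketch in Python. It is not the library's code: `glue_orphan`, `count_words`, and `count_tokens` are hypothetical names, and the "one token per 4 characters" counter is only an assumed stand-in for a real tokenizer. It shows how a gluing check based on word count can accept a merge that a token-based check would reject.

```python
import math


def count_words(text: str) -> int:
    # Whitespace-separated word count.
    return len(text.split())


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer (assumption):
    # roughly one token per 4 characters, rounded up.
    return math.ceil(len(text) / 4)


def glue_orphan(chunks, max_tokens, measure):
    """Merge the last chunk into the previous one if `measure` says it fits.

    Simplified illustration of orphan-chunk gluing, not the actual
    ProcessParagraphs implementation.
    """
    if len(chunks) < 2:
        return list(chunks)
    merged = chunks[-2] + " " + chunks[-1]
    if measure(merged) <= max_tokens:
        return chunks[:-2] + [merged]
    return list(chunks)


chunks = [
    "internationalization considerations overview",
    "tokenization details",
]
max_tokens = 12

# Word-based check: the merged text has only 5 words, so the merge is accepted.
by_words = glue_orphan(chunks, max_tokens, count_words)

# Token-based check: the merged text is over the limit, so no merge happens.
by_tokens = glue_orphan(chunks, max_tokens, count_tokens)

print(len(by_words), count_tokens(by_words[0]))
print(len(by_tokens), [count_tokens(c) for c in by_tokens])
```

With the word-based measure the two chunks are glued into one whose token count exceeds `max_tokens`; with the token-based measure both chunks stay within the limit. Using the same measure for the gluing check as for the rest of the splitting would avoid the oversized last chunk.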
Platform