Text Chunker is a lightweight, browser-based tool that splits long texts into sentence-aware chunks with configurable overlap, ideal for RAG pipelines, embeddings, and vector database ingestion.
Everything runs locally in your browser: no server, no setup, no dependencies.
- Sentence-Aware Chunking: Splits text at sentence boundaries for cleaner, context-aware results.
- Configurable Overlap: Preserves context between chunks (default 12%).
- Detailed Stats: Displays total word count, chunk count, and average chunk size.
- One-Click Copy: Copies any chunk to your clipboard.
- 100% Client-Side: Works fully offline; no backend required.
- Modern UI: Clean, responsive design built with pure HTML, CSS, and JavaScript.
- Sentence Splitting: The text is scanned for sentence-ending punctuation (., !, ?, ।, 。) and split accordingly.
- Balanced Chunking: Sentences are grouped dynamically to balance word counts across chunks without exceeding the maximum limit.
- Context Overlap: Each chunk (except the first) includes a small portion (~12%) of the previous chunk's tail sentences to maintain semantic continuity, which is ideal for RAG, LLMs, or embedding generation.
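
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in textChunkerRagTool.html: the function names `splitSentences`, `countWords`, and `groupSentences` are hypothetical, and the balancing rule (aim for an even share of the total words per chunk, never going over the limit unless a single sentence is itself longer than the limit) is an assumption about what "balanced" means here. The simple regex also splits on abbreviations and decimals such as "e.g." or "1.5".

```javascript
// Hypothetical helpers for illustration; names are not taken from the tool's source.

// Split text after runs of sentence-ending punctuation (., !, ?, ।, 。).
// A trailing fragment without a terminator is kept as its own sentence.
function splitSentences(text) {
  const matches = text.match(/[^.!?।。]+[.!?।。]+|[^.!?।。]+$/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Count whitespace-separated words.
function countWords(s) {
  const trimmed = s.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Group sentences into chunks of roughly equal word count (assumed balancing rule).
function groupSentences(sentences, maxWords = 400) {
  const counts = sentences.map(countWords);
  const total = counts.reduce((a, b) => a + b, 0);
  if (total === 0) return [];

  // Fewest chunks that can hold the text, then an even per-chunk target.
  const chunkCount = Math.ceil(total / maxWords);
  const target = Math.ceil(total / chunkCount);

  const chunks = [];
  let current = [];
  let currentWords = 0;
  sentences.forEach((sentence, i) => {
    const w = counts[i];
    // Close the chunk if this sentence would break the hard limit,
    // or if the chunk has already reached its balanced target.
    if (current.length > 0 && (currentWords + w > maxWords || currentWords >= target)) {
      chunks.push(current);
      current = [];
      currentWords = 0;
    }
    current.push(sentence);
    currentWords += w;
  });
  if (current.length > 0) chunks.push(current);
  return chunks; // array of chunks, each an array of sentences
}

// Usage: a tiny limit of 6 words makes the grouping visible.
const sample = "One two three. Four five six! Seven eight nine? Ten eleven twelve.";
console.log(groupSentences(splitSentences(sample), 6).map((c) => c.join(" ")));
```

Returning each chunk as an array of sentences, rather than a joined string, keeps the sentence boundaries available for the overlap step.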
All settings can be adjusted from the app interface (Configuration Settings) or directly via code.
| Setting | Default | Description |
|---|---|---|
| Max Chunk Size (words) | 400 | Maximum word count per chunk. |
| Overlap Percentage | 12 | Percentage of previous chunk words to overlap. |
| Overlap Flexibility | 1.5 | Allows up to 1.5× overlap range to include full sentences. |
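
To make the interaction between the overlap settings concrete, here is a hedged sketch. The `DEFAULT_SETTINGS` object, its key names, and the `selectOverlap` function are hypothetical and are not taken from the tool's source. It assumes the overlap target is the given percentage of the previous chunk's word count, and that the flexibility factor sets a ceiling that whole sentences may reach. For a full 400-word chunk, that gives a target of 48 words and a ceiling of 72.

```javascript
// Defaults mirror the settings table; the object and key names are hypothetical.
const DEFAULT_SETTINGS = {
  maxChunkWords: 400,
  overlapPercent: 12,
  overlapFlexibility: 1.5,
};

// Count whitespace-separated words.
const wordCount = (s) => s.trim().split(/\s+/).filter(Boolean).length;

// Pick whole sentences from the end of the previous chunk to carry into the next one.
function selectOverlap(prevSentences, settings = DEFAULT_SETTINGS) {
  const prevWords = prevSentences.reduce((n, s) => n + wordCount(s), 0);
  // Assumed rule: target is a percentage of the previous chunk's words.
  const target = Math.round((prevWords * settings.overlapPercent) / 100);
  // Assumed rule: flexibility scales the target into an upper bound.
  const ceiling = Math.floor(target * settings.overlapFlexibility);

  const tail = [];
  let words = 0;
  // Walk backwards from the last sentence until the target is met.
  for (let i = prevSentences.length - 1; i >= 0 && words < target; i--) {
    const w = wordCount(prevSentences[i]);
    if (words + w > ceiling) break; // this sentence would overshoot the ceiling
    tail.unshift(prevSentences[i]); // keep original sentence order
    words += w;
  }
  return { sentences: tail, words, target, ceiling };
}
```

With this rule, the overlap can land below the target when the next sentence back is too long to fit under the ceiling. That trade-off favors intact sentences over an exact word count.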
- Clone this repository:
  git clone https://github.com/<your-username>/<your-repo-name>.git
  cd <your-repo-name>
- Open the file textChunkerRagTool.html in your browser.
- Paste your text, adjust settings if needed, and click "Chunk Text".
- Copy chunks easily using the Copy buttons.
Interface Overview
- Input Section: Paste or write the text you want to chunk.
- Stats Panel: Displays total word count, chunk count, and average size.
- Chunked Output: Lists each chunk with overlap information and a copy button.
- Settings Panel: Configure chunk size and overlap interactively.
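
The figures shown in the Stats Panel could be computed along these lines. This is only a sketch: `chunkStats` is a hypothetical name, chunks are assumed to be plain strings, and whether the tool counts overlap words in the average is an assumption (here they are counted, so the sum of chunk sizes can exceed the total word count).

```javascript
// Hypothetical helper: summarizes the source text and its chunks (array of strings).
function chunkStats(sourceText, chunks) {
  // Count whitespace-separated words; empty or blank strings count as 0.
  const count = (s) => s.trim().split(/\s+/).filter(Boolean).length;

  const totalWords = count(sourceText);
  const chunkWordSum = chunks.map(count).reduce((a, b) => a + b, 0);
  // Average includes any overlap words carried into each chunk (assumed behavior).
  const averageChunkSize =
    chunks.length === 0 ? 0 : Math.round(chunkWordSum / chunks.length);

  return { totalWords, chunkCount: chunks.length, averageChunkSize };
}

// Usage: the second chunk repeats "c" as overlap.
console.log(chunkStats("a b c d e f", ["a b c", "c d e f"]));
```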
Privacy
- All processing occurs entirely in your browser.
- No data is sent to external servers, so it is safe for confidential or private text.
Tech Stack
- HTML5
- CSS3
- Vanilla JavaScript

No external dependencies or frameworks required. Works on all modern browsers.
Roadmap Ideas
- Export chunks as JSON / TXT
- Add token-based chunking (e.g., using tiktoken)
- Multilingual sentence detection
- Drag-and-drop file input (PDF/DOCX via client-side parsing)
- Semantic similarity visualization between chunks
Contributing
- Pull requests and feature ideas are welcome!
- Please keep the project lightweight and dependency-free.
- If you submit UI changes, include a short before/after example or screenshot.
License
- This project is licensed under the MIT License.
- You are free to use, modify, and distribute it for both personal and commercial purposes.
Author
- Developed by Ali Enver Yılmaz (me)
- A simple yet powerful open-source tool for developers working with RAG, LLMs, and NLP pipelines who need fast and reliable text segmentation.