Extract DOCX documents to structured data and rebuild them programmatically. This library provides full round-trip conversion with preservation of formatting, styles, tables, images, and numbering.
npm install @habibrosyad/docx-extractor
import { DocxExtractor, DocxBuilder } from '@habibrosyad/docx-extractor';
import fs from 'fs';
// Extract DOCX to structured data
const extractor = new DocxExtractor();
const buffer = fs.readFileSync('document.docx');
const document = await extractor.extract(buffer);
// Modify the document structure
document.paragraphs.forEach(para => {
  if (para.runs) {
    para.runs.forEach(run => {
      // Modify text, formatting, etc.
    });
  }
});
// Rebuild DOCX
const builder = new DocxBuilder();
const newBuffer = await builder.build(document);
fs.writeFileSync('output.docx', newBuffer);
- ✅ Full Structure Extraction: Paragraphs, tables, images, styles, numbering
- ✅ Formatting Preservation: Fonts, colors, spacing, alignment, borders
- ✅ Round-Trip Conversion: Extract → Modify → Rebuild without data loss
- ✅ Style Support: All paragraph and character styles with inheritance
- ✅ Table Support: Complete table structure with cell properties
- ✅ Image Support: Extract and embed images in documents
- ✅ Numbering Support: Bullet lists and numbered lists
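The "Modify" step of the Extract → Modify → Rebuild flow is ordinary object manipulation on the extracted structure. The following is a minimal sketch, not part of the library: `replaceText` is a hypothetical helper, and it assumes each run stores its text in a `text` property, which the property lists in this README do not confirm. The sample object stands in for the result of `extractor.extract(buffer)`.

```javascript
// Hypothetical helper: replaces every occurrence of `search` in run text.
// Assumption: each run exposes its text as `run.text`.
function replaceText(document, search, replacement) {
  let changed = 0;
  for (const para of document.paragraphs ?? []) {
    for (const run of para.runs ?? []) {
      if (typeof run.text === 'string' && run.text.includes(search)) {
        run.text = run.text.split(search).join(replacement);
        changed += 1;
      }
    }
  }
  return changed; // number of runs that were modified
}

// Stand-in for an extracted document
const sample = {
  paragraphs: [
    { runs: [{ text: 'Dear {{name}},', bold: true }] },
    { runs: [{ text: 'Welcome, {{name}}!' }, { text: ' Regards.' }] },
    {}, // a paragraph without runs is skipped
  ],
};

const changedRuns = replaceText(sample, '{{name}}', 'Ada');
console.log(changedRuns, sample.paragraphs[0].runs[0].text);
```

Because only the text is rewritten, the formatting carried by each run (here `bold`) is left untouched. A placeholder that Word has split across two runs would not be matched by this sketch.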
Extract a DOCX file to structured data.
const extractor = new DocxExtractor();
const document = await extractor.extract(buffer);
Returns: ExtractedDocument with:
- paragraphs: Array of paragraphs
- tables: Array of tables
- body: Ordered sequence of paragraphs and tables
- styles: Map of style definitions
- defaults: Document-wide run formatting defaults
- paragraphDefaults: Document-wide paragraph defaults (spacing, etc.)
- numbering: Map of numbering definitions
- mediaFiles: Map of embedded images/media
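To get a quick overview of what an extraction returned, the fields listed above can be counted. This is a sketch under assumptions: `summarize` and `countEntries` are hypothetical helpers, and since it is not stated whether "Map" means a JavaScript `Map` or a plain object, both are handled. The sample object is hand-written and only mimics the documented field names.

```javascript
// Counts entries in either a Map or a plain object, because the exact
// container type behind "Map of ..." is an assumption here.
function countEntries(collection) {
  if (collection instanceof Map) return collection.size;
  return collection ? Object.keys(collection).length : 0;
}

// Hypothetical helper: summarizes the top-level fields of an ExtractedDocument.
function summarize(doc) {
  return {
    paragraphs: doc.paragraphs?.length ?? 0,
    tables: doc.tables?.length ?? 0,
    bodyItems: doc.body?.length ?? 0,
    styles: countEntries(doc.styles),
    numbering: countEntries(doc.numbering),
    mediaFiles: countEntries(doc.mediaFiles),
  };
}

// Stand-in for the result of extractor.extract(buffer)
const extracted = {
  paragraphs: [{ runs: [] }, { runs: [] }],
  tables: [{ rows: [] }],
  body: [{}, {}, {}],
  styles: new Map([['Heading1', {}]]),
  numbering: {},
  mediaFiles: new Map(),
};

const summary = summarize(extracted);
console.log(summary);
```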
Build a DOCX file from structured data.
const builder = new DocxBuilder();
const buffer = await builder.build(document);
Parameters: ExtractedDocument (from extractor or construct it yourself)
Returns: Promise<Uint8Array> - DOCX file buffer (compatible with Node.js, browsers, and edge runtimes)
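Because the result is a plain `Uint8Array`, it can be handed to standard web APIs without conversion. The sketch below wraps the bytes in a `Blob` with the DOCX MIME type; `toDocxBlob` is a hypothetical helper, and the four sample bytes merely stand in for real `builder.build(document)` output.

```javascript
const DOCX_MIME =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

// Hypothetical helper: wraps DOCX bytes in a Blob for download or upload.
function toDocxBlob(bytes) {
  return new Blob([bytes], { type: DOCX_MIME });
}

// Stand-in bytes: a DOCX file is a ZIP archive, so it starts with "PK".
const fakeBytes = new Uint8Array([0x50, 0x4b, 0x03, 0x04]);
const blob = toDocxBlob(fakeBytes);
console.log(blob.size, blob.type);
```

In a browser, the blob can be passed to `URL.createObjectURL` to offer the file as a download. `Blob` is also available as a global in Node.js 18 and later.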
This library works across all JavaScript runtimes:
- ✅ Node.js: Full support (v18+)
- ✅ Cloudflare Workers: Full support
- ✅ Browsers: Full support
- ✅ Deno: Full support
- ✅ Bun: Full support
import { DocxExtractor, DocxBuilder } from '@habibrosyad/docx-extractor';

export default {
  async fetch(request) {
    // Read the uploaded DOCX file from the request body
    const arrayBuffer = await request.arrayBuffer();
    // Extract and process
    const extractor = new DocxExtractor();
    const document = await extractor.extract(arrayBuffer);
    // Modify document...
    // Rebuild
    const builder = new DocxBuilder();
    const outputBuffer = await builder.build(document);
    // Return the rebuilt file as a download
    return new Response(outputBuffer, {
      headers: {
        'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'Content-Disposition': 'attachment; filename="output.docx"'
      }
    });
  }
};
Paragraph properties:
- runs: Array of text runs with formatting
- spacing: Paragraph spacing (before, after, line)
- alignment: Left, center, right, justify
- indentation: Left, right, firstLine, hanging
- styleName: Applied paragraph style
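The paragraph properties above can be adjusted before rebuilding. The following sketch centers every paragraph that uses a given style and sets its spacing after. `centerStyledParagraphs` is a hypothetical helper; the property names come from the list above, but the value formats (`'center'` as a lowercase string, numeric spacing values) are assumptions.

```javascript
// Hypothetical helper: returns a new array in which paragraphs with the
// given style are centered and get the requested spacing after them.
// Assumptions: alignment is a lowercase string, spacing values are numbers.
function centerStyledParagraphs(paragraphs, styleName, spacingAfter) {
  return paragraphs.map((para) =>
    para.styleName === styleName
      ? {
          ...para,
          alignment: 'center',
          spacing: { ...para.spacing, after: spacingAfter },
        }
      : para
  );
}

// Stand-in for document.paragraphs
const inputParagraphs = [
  { styleName: 'Title', alignment: 'left', spacing: { before: 120 }, runs: [] },
  { styleName: 'Normal', alignment: 'left', runs: [] },
];

const centered = centerStyledParagraphs(inputParagraphs, 'Title', 240);
console.log(centered[0].alignment, centered[0].spacing);
```

The helper copies the paragraphs it changes instead of mutating them, so the originally extracted structure stays available for comparison.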
Table properties:
- rows: Array of table rows
- rows[i].cells: Array of table cells
- cells[i].runs: Text content with formatting
- cells[i].width, backgroundColor, borders: Cell properties
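The rows → cells → runs nesting makes it straightforward to flatten a table into a grid of strings, for example to export it as CSV. This is a sketch only: `tableToText` is a hypothetical helper, and it assumes each run stores its text in a `text` property.

```javascript
// Hypothetical helper: converts a table into a 2D array of cell text.
// Assumption: each run exposes its text as `run.text`.
function tableToText(table) {
  return (table.rows ?? []).map((row) =>
    (row.cells ?? []).map((cell) =>
      (cell.runs ?? []).map((run) => run.text ?? '').join('')
    )
  );
}

// Stand-in for one entry of document.tables
const sampleTable = {
  rows: [
    { cells: [{ runs: [{ text: 'Name', bold: true }] }, { runs: [{ text: 'Qty', bold: true }] }] },
    { cells: [{ runs: [{ text: 'Wid' }, { text: 'get' }] }, { runs: [{ text: '3' }] }] },
  ],
};

const grid = tableToText(sampleTable);
console.log(grid);
```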
Run properties:
- bold, italic, underline, strike
- fontSize, fontFamily, color
- highlight color
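Word often splits visually uniform text into several runs, which makes text edits awkward. One way to handle this is to merge neighboring runs whose formatting is identical. The sketch below is not part of the library: `mergeAdjacentRuns` is a hypothetical helper, the `text` and `highlight` property names are assumptions, and only the formatting keys listed in `FORMAT_KEYS` are compared, so runs carrying other data should not be merged this way.

```javascript
// Formatting keys compared when deciding whether two runs can be merged.
// `highlight` is an assumed property name.
const FORMAT_KEYS = [
  'bold', 'italic', 'underline', 'strike',
  'fontSize', 'fontFamily', 'color', 'highlight',
];

function sameFormatting(a, b) {
  return FORMAT_KEYS.every((key) => a[key] === b[key]);
}

// Hypothetical helper: merges neighboring runs with identical formatting.
// Assumption: each run exposes its text as `run.text`.
function mergeAdjacentRuns(runs) {
  const merged = [];
  for (const run of runs) {
    const last = merged[merged.length - 1];
    if (last && sameFormatting(last, run)) {
      last.text += run.text ?? '';
    } else {
      merged.push({ ...run, text: run.text ?? '' }); // copy, do not mutate input
    }
  }
  return merged;
}

// Stand-in for para.runs
const splitRuns = [
  { text: 'Hel', bold: true },
  { text: 'lo', bold: true },
  { text: ' world' },
];

const mergedRuns = mergeAdjacentRuns(splitRuns);
console.log(mergedRuns);
```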
- Modern JavaScript runtime with ES modules support (Node.js >= 18, Deno, Bun, Cloudflare Workers, or modern browsers)
MIT