PPTX Content Extractor is a Node.js library for extracting slides, notes, and media content (e.g., images) from .pptx files. This tool leverages JSZip for unpacking .pptx archives and xml2js for parsing XML-based content.
- Extract text content from PowerPoint slides (.pptx).
- Retrieve media files (e.g., images) embedded in the presentation.
- Extract speaker notes for each slide.
- Modular structure for extracting specific content types (slides, media, or notes).
Install the library via npm:
npm install --save pptx-content-extractor

Extract all slides, media, and notes from a .pptx file:
import { extractPptx } from 'pptx-content-extractor';
(async () => {
  const result = await extractPptx('/path/to/presentation.pptx');
  console.log('Slides:', result.slides);
  console.log('Media:', result.media);
  console.log('Notes:', result.notes);
})();

Extract only the slides:
import { extractPptxSlides } from 'pptx-content-extractor';
(async () => {
  const slides = await extractPptxSlides('/path/to/presentation.pptx');
  console.log('Slides:', slides);
})();

Extract only the media:
import { extractPptxMedia } from 'pptx-content-extractor';
(async () => {
  const media = await extractPptxMedia('/path/to/presentation.pptx');
  console.log('Media:', media);
})();

Extract only the notes:
import { extractPptxNotes } from 'pptx-content-extractor';
(async () => {
  const notes = await extractPptxNotes('/path/to/presentation.pptx');
  console.log('Notes:', notes);
})();

extractPptx(filePath): Extracts slides, media, and notes from a .pptx file.
- filePath: Path to the .pptx file.
- Returns: A Promise<ParsedPowerPoint> containing:
  - slides: An array of parsed slides.
  - media: An array of media content.
  - notes: An array of parsed notes.
extractPptxSlides(filePath): Extracts only the slides.
- filePath: Path to the .pptx file.
- Returns: A Promise<ParsedSlide[]> containing parsed slides.
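Each parsed slide carries its text as a list of elements, each with its own array of strings, so a common follow-up step is flattening a slide into plain text. The sketch below is one way to do that. It declares a local copy of the ParsedSlide shape documented in this README so it runs without the library installed; slideToPlainText is a hypothetical helper, not part of the library, and the sample id and type values are made up for illustration.

```typescript
// Local copy of the ParsedSlide shape documented in this README,
// so the sketch compiles without the library installed.
interface ParsedSlide {
  name: string;
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[];
}

// Hypothetical helper (not part of the library): collects every text
// entry of every element on a slide, drops blank entries, and joins
// the rest with newlines.
function slideToPlainText(slide: ParsedSlide): string {
  const lines: string[] = [];
  for (const element of slide.content) {
    for (const entry of element.text) {
      const trimmed = entry.trim();
      if (trimmed.length > 0) {
        lines.push(trimmed);
      }
    }
  }
  return lines.join('\n');
}

// Hand-written sample data; the id and type values are illustrative only.
const sampleSlide: ParsedSlide = {
  name: 'slide1.xml',
  content: [
    { id: '2', type: 'title', text: ['Quarterly Review'] },
    { id: '3', type: 'body', text: ['Revenue up', '   ', 'Costs flat'] },
  ],
  mediaNames: [],
};

console.log(slideToPlainText(sampleSlide));
```

With real data, the same helper could be mapped over the array returned by extractPptxSlides.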
extractPptxMedia(filePath): Extracts only the media content.
- filePath: Path to the .pptx file.
- Returns: A Promise<ParsedMedia[]> containing media content.
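Because media content is returned as a Base64 string, it can be embedded directly in a data URI or sized without decoding it. The following is a minimal sketch under a few assumptions: the media name ends in a file extension (as in the 'image23.jpeg' example in this README), and the content is standard padded Base64 with no whitespace. toDataUri, decodedByteLength, and the MIME table are hypothetical helpers, not part of the library.

```typescript
// Local copy of the ParsedMedia shape documented in this README.
interface ParsedMedia {
  name: string;
  content: string; // Base64-encoded media content
}

// Small illustrative MIME table; extend it for other media types.
const MIME_BY_EXTENSION: Record<string, string> = {
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  png: 'image/png',
  gif: 'image/gif',
};

// Hypothetical helper: builds a data URI, guessing the MIME type
// from the file extension and falling back to a generic binary type.
function toDataUri(media: ParsedMedia): string {
  const dot = media.name.lastIndexOf('.');
  const extension = dot >= 0 ? media.name.slice(dot + 1).toLowerCase() : '';
  const mime = MIME_BY_EXTENSION[extension] || 'application/octet-stream';
  return 'data:' + mime + ';base64,' + media.content;
}

// Hypothetical helper: computes the decoded size in bytes of a padded
// Base64 string (every 4 characters encode 3 bytes, minus padding).
function decodedByteLength(base64: string): number {
  let padding = 0;
  if (base64.charAt(base64.length - 1) === '=') padding += 1;
  if (base64.charAt(base64.length - 2) === '=') padding += 1;
  return (base64.length / 4) * 3 - padding;
}

// Placeholder payload: 'aGVsbG8=' is the text "hello", not a real image.
const sampleMedia: ParsedMedia = { name: 'image1.PNG', content: 'aGVsbG8=' };

console.log(toDataUri(sampleMedia));
console.log(decodedByteLength(sampleMedia.content));
```

Guessing the MIME type from the extension keeps the sketch dependency-free; inspecting the decoded bytes would be more robust for files with misleading names.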
extractPptxNotes(filePath): Extracts only the notes.
- filePath: Path to the .pptx file.
- Returns: A Promise<ParsedNote[]> containing parsed notes.
Base interface for parsed content.
export interface ParsedContent {
  name: string;
  content: unknown;
}

export interface ParsedPowerPoint {
  slides: ParsedSlide[];
  media: ParsedMedia[];
  notes: ParsedNote[];
}

export interface ParsedSlide extends ParsedContent {
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[]; // names of media files, e.g. ['image23.jpeg']
}

export interface ParsedMedia extends ParsedContent {
  content: string; // Base64-encoded media content
}

export interface ParsedNote extends ParsedContent {
  content: string;
}
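
The mediaNames field on a slide suggests a way to link each slide to the media entries it uses. The sketch below assumes, without confirmation from the library, that each entry in mediaNames matches the name field of a ParsedMedia item exactly. mediaForSlide is a hypothetical helper, and the interfaces are local copies of the ones above so the example is self-contained.

```typescript
// Local copies of the interfaces documented above.
interface ParsedMedia {
  name: string;
  content: string; // Base64-encoded media content
}

interface ParsedSlide {
  name: string;
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[];
}

// Hypothetical helper: returns the media items referenced by a slide,
// in the order they appear in slide.mediaNames. Names without a
// matching media item are skipped.
// Assumption: mediaNames entries equal ParsedMedia.name exactly.
function mediaForSlide(slide: ParsedSlide, media: ParsedMedia[]): ParsedMedia[] {
  const byName: Record<string, ParsedMedia> = {};
  for (const item of media) {
    byName[item.name] = item;
  }
  const found: ParsedMedia[] = [];
  for (const name of slide.mediaNames) {
    if (Object.prototype.hasOwnProperty.call(byName, name)) {
      found.push(byName[name]);
    }
  }
  return found;
}

// Hand-written sample data with placeholder Base64 payloads.
const allMedia: ParsedMedia[] = [
  { name: 'image1.png', content: 'AAAA' },
  { name: 'image23.jpeg', content: 'BBBB' },
];

const slideWithImages: ParsedSlide = {
  name: 'slide2.xml',
  content: [],
  mediaNames: ['image23.jpeg', 'missing.gif'],
};

console.log(mediaForSlide(slideWithImages, allMedia).map((m) => m.name));
```

If the library reports media names with a path prefix, the lookup key would need to be normalized first.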