Extract clean, readable content from any webpage in seconds.
| Feature | Description |
|---|---|
| One-Click Extraction | Paste URL, click extract, get clean markdown |
| Multiple Formats | Export as Markdown, HTML, or plain text |
| Beautiful UI | Modern interface with dark/light theme support |
| Local History | Auto-saves extraction history in browser |
| MCP Integration | Works with Claude, ChatGPT, and AI agents |
| REST API | Programmatic access for automation |
| Blazing Fast | Uses Mozilla Readability for instant parsing |

Prerequisites:

- Node.js 18+
- pnpm (recommended) or npm

Installation:

    # Clone the repository
    git clone https://github.com/isshiki-dev/web-content-extractor.git
    cd web-content-extractor

    # Install dependencies
    pnpm install

    # Start development servers
    pnpm dev:full

Tip: `pnpm dev:full` starts both the frontend (port 5173) and the API server (port 3001) concurrently.


To run the two servers separately instead:

    # Terminal 1: Frontend
    pnpm dev

    # Terminal 2: API Server
    pnpm server

Usage:

- Open http://localhost:5173
- Paste any URL in the input field
- Click Extract
- View, copy, or save the extracted content


| Shortcut | Action |
|---|---|
| Ctrl/Cmd + Enter | Extract URL |
| Ctrl/Cmd + C | Copy content |
| Ctrl/Cmd + S | Save as file |

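
Extraction can also be scripted against the REST API documented in this README. The sketch below is a minimal TypeScript client for `POST /api/extract`, not the project's own code. It assumes the API server is reachable at `http://localhost:3001` (the port mentioned for `pnpm dev:full`), that Node.js 18+ provides a global `fetch`, and that a failed extraction comes back with `success: false`; the error shape is not documented, so that branch is a guess. `buildExtractRequest` and `extractArticle` are illustrative names.

```typescript
// Fields mirror the sample response shown for POST /api/extract.
interface ExtractData {
  title: string;
  content: string; // Markdown
  textContent: string;
  excerpt: string;
  byline: string;
  siteName: string;
  length: number;
  url: string;
}

interface ExtractResponse {
  success: boolean;
  data?: ExtractData;
}

interface JsonPostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: the API server listens on port 3001, as with `pnpm dev:full`.
const API_BASE = "http://localhost:3001";

// Build the request separately so it can be inspected without a network call.
function buildExtractRequest(
  url: string,
  base: string = API_BASE,
): { endpoint: string; init: JsonPostInit } {
  new URL(url); // throws a TypeError if `url` is not an absolute URL
  return {
    endpoint: `${base}/api/extract`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    },
  };
}

// Send the request and unwrap the `data` object from the response.
async function extractArticle(url: string): Promise<ExtractData> {
  const { endpoint, init } = buildExtractRequest(url);
  const res = await fetch(endpoint, init);
  if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
  const json = (await res.json()) as ExtractResponse;
  // Assumption: a failed extraction is signalled by success: false.
  if (!json.success || !json.data) throw new Error("Extraction failed");
  return json.data;
}
```

With the API server running, `extractArticle("https://example.com/article")` should resolve to the `data` object, whose `content` field holds the Markdown.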

Request:

    POST /api/extract
    Content-Type: application/json

    {
      "url": "https://example.com/article"
    }

Response:

    {
      "success": true,
      "data": {
        "title": "Article Title",
        "content": "# Article Title\n\nExtracted content...",
        "textContent": "Plain text version...",
        "excerpt": "Brief summary...",
        "byline": "Author Name",
        "siteName": "Example Site",
        "length": 1234,
        "url": "https://example.com/article"
      }
    }

Request:

    POST /api/save
    Content-Type: application/json

    {
      "content": "# Title\n\nContent...",
      "filename": "article.md"
    }

Other endpoints:

    GET /api/files
    GET /sitemap.xml

The MCP (Model Context Protocol) server allows AI agents like Claude to extract web content directly.

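
For context, MCP clients invoke tools by sending JSON-RPC 2.0 `tools/call` requests, which clients such as Claude Desktop do automatically once configured. The TypeScript sketch below only builds such messages for the `extract_content` and `extract_multiple` tools named in this README. The helper names are hypothetical, and a real session also needs the MCP `initialize` handshake first, which is omitted here. It assumes the server uses the standard stdio transport, where each message is one line of JSON; `"markdown"` is the only `format` value the README shows, so other values are left to the caller.

```typescript
// JSON-RPC 2.0 request shape used by MCP for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1; // each request needs a unique id within the session

function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical helper for the extract_content tool.
function extractContentCall(url: string, format: string = "markdown"): ToolCallRequest {
  return toolCall("extract_content", { url, format });
}

// Hypothetical helper for the extract_multiple tool.
function extractMultipleCall(urls: string[], format: string = "markdown"): ToolCallRequest {
  if (urls.length === 0) throw new Error("urls must not be empty");
  return toolCall("extract_multiple", { urls, format });
}

// stdio transport: one JSON message per line, no embedded newlines.
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}
```

Writing `frame(extractContentCall("https://example.com/article"))` to the server's stdin after initialization would request a single extraction.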

Start the MCP server:

    pnpm mcp

extract_content: Extract content from a single URL.

    {
      "name": "extract_content",
      "arguments": {
        "url": "https://example.com/article",
        "format": "markdown"
      }
    }

extract_multiple: Extract content from multiple URLs in parallel.


    {
      "name": "extract_multiple",
      "arguments": {
        "urls": [
          "https://example.com/article1",
          "https://example.com/article2"
        ],
        "format": "markdown"
      }
    }

Add to your claude_desktop_config.json:


    {
      "mcpServers": {
        "web-extractor": {
          "command": "npx",
          "args": ["tsx", "/path/to/web-content-extractor/server/mcp.ts"]
        }
      }
    }

Project structure:

    web-content-extractor/
    ├── public/                  # Static assets
    ├── server/
    │   ├── api.ts               # Express REST API
    │   └── mcp.ts               # MCP server for AI agents
    ├── src/
    │   ├── components/
    │   │   ├── ExtractorForm.tsx
    │   │   ├── ExtractedContent.tsx
    │   │   ├── History.tsx
    │   │   ├── MarkdownDisplay.tsx
    │   │   ├── SaveDialog.tsx
    │   │   └── ui/              # shadcn/ui components
    │   ├── hooks/
    │   │   └── useExtractor.ts
    │   ├── lib/
    │   │   ├── extractor.ts     # Server extraction logic
    │   │   ├── client-extractor.ts
    │   │   └── api-handler.ts
    │   ├── App.tsx
    │   └── main.tsx
    ├── package.json
    ├── vite.config.ts
    └── tailwind.config.js


Tech stack: React 18, TypeScript, Vite, Tailwind, Node.js, Express

Key dependencies:

- @mozilla/readability - Content extraction engine
- jsdom - DOM parsing for Node.js
- @modelcontextprotocol/sdk - MCP integration
- Framer Motion - Smooth animations
- shadcn/ui - Beautiful UI components

| Metric | Value |
|---|---|
| Average extraction time | < 500ms |
| Bundle size (gzipped) | ~85kb |
| Lighthouse score | 95+ |

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by isshiki-dev

⭐ Star this repo if you find it useful!