🌐 Web Content Extractor

Extract clean, readable content from any webpage in seconds.

Demo · Documentation · API Reference · MCP Integration

✨ Features

Feature	Description
🚀 One-Click Extraction	Paste URL, click extract, get clean markdown
📝 Multiple Formats	Export as Markdown, HTML, or plain text
🎨 Beautiful UI	Modern interface with dark/light theme support
💾 Local History	Auto-saves extraction history in browser
🤖 MCP Integration	Works with Claude, ChatGPT, and AI agents
🔌 REST API	Programmatic access for automation
⚡ Blazing Fast	Uses Mozilla Readability; extraction averages under 500 ms

🚀 Quick Start

Prerequisites

Node.js 18+
pnpm (recommended) or npm

Installation

# Clone the repository
git clone https://github.com/isshiki-dev/web-content-extractor.git
cd web-content-extractor

# Install dependencies
pnpm install

# Start development servers
pnpm dev:full

💡 Tip: pnpm dev:full starts both the frontend (port 5173) and API server (port 3001) concurrently.

Manual Start

# Terminal 1: Frontend
pnpm dev

# Terminal 2: API Server
pnpm server

📖 Usage

Web Interface

Open http://localhost:5173
Paste any URL in the input field
Click Extract
View, copy, or save the extracted content

Keyboard Shortcuts

Shortcut	Action
`Ctrl/Cmd + Enter`	Extract URL
`Ctrl/Cmd + C`	Copy content
`Ctrl/Cmd + S`	Save as file
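
The shortcut table above can be read as a small lookup from a key press to an action. The sketch below is one way such a mapping might look; the `resolveShortcut` function, the `KeyLike` type, and the action names are hypothetical and are not taken from the project's source.

```typescript
// Hypothetical action names for the three documented shortcuts.
type ShortcutAction = "extract" | "copy" | "save";

// Minimal subset of a browser KeyboardEvent that the lookup needs.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
}

// Returns the action for a key press, or null when no shortcut matches.
function resolveShortcut(event: KeyLike): ShortcutAction | null {
  // Ctrl on Windows/Linux, Cmd (metaKey) on macOS.
  if (!event.ctrlKey && !event.metaKey) {
    return null;
  }
  switch (event.key.toLowerCase()) {
    case "enter":
      return "extract";
    case "c":
      return "copy";
    case "s":
      return "save";
    default:
      return null;
  }
}
```

In a browser, a `keydown` listener could pass its event to `resolveShortcut` and call `preventDefault()` only when a non-null action comes back, so unrelated key presses keep their default behavior.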

🔌 API Reference

Extract Content

POST /api/extract
Content-Type: application/json

{
  "url": "https://example.com/article"
}

Response:

{
  "success": true,
  "data": {
    "title": "Article Title",
    "content": "# Article Title\n\nExtracted content...",
    "textContent": "Plain text version...",
    "excerpt": "Brief summary...",
    "byline": "Author Name",
    "siteName": "Example Site",
    "length": 1234,
    "url": "https://example.com/article"
  }
}
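
As an illustration of calling this endpoint from code, here is a minimal client sketch. It assumes the API server is reachable at `http://localhost:3001` (the port named in the Quick Start tip) and that failures are reported either through a non-2xx status or `success: false`; the error shape is not documented above, so that part is a guess. The `extractArticle` function and the `FetchLike` type are hypothetical names introduced for this example.

```typescript
// Shape of the "data" object in the sample response above.
interface ExtractedArticle {
  title: string;
  content: string;
  textContent: string;
  excerpt: string;
  byline: string;
  siteName: string;
  length: number;
  url: string;
}

interface ExtractResponse {
  success: boolean;
  data?: ExtractedArticle;
}

// Minimal subset of fetch used here, so a stub can be injected in tests.
type FetchLike = (
  input: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumption: API server on port 3001, as in the Quick Start tip.
const DEFAULT_BASE_URL = "http://localhost:3001";

async function extractArticle(
  url: string,
  fetchImpl: FetchLike,
  baseUrl: string = DEFAULT_BASE_URL,
): Promise<ExtractedArticle> {
  const response = await fetchImpl(`${baseUrl}/api/extract`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url }),
  });
  if (!response.ok) {
    throw new Error(`Extraction request failed with HTTP ${response.status}`);
  }
  const payload = (await response.json()) as ExtractResponse;
  // Assumed failure signal: success is false or data is missing.
  if (!payload.success || !payload.data) {
    throw new Error("Extraction did not succeed");
  }
  return payload.data;
}
```

On Node.js 18+ the built-in `fetch` can be passed as the second argument, for example `await extractArticle("https://example.com/article", fetch)`.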

Save Content

POST /api/save
Content-Type: application/json

{
  "content": "# Title\n\nContent...",
  "filename": "article.md"
}
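
A save request can be assembled the same way. The sketch below only builds the request, because the response format of `/api/save` is not documented here. The `buildSaveRequest` helper, the base URL default (port 3001 from the Quick Start tip), and the empty-filename guard are assumptions made for illustration, not behavior confirmed by the server.

```typescript
// Request body fields shown in the example above.
interface SavePayload {
  content: string;
  filename: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the HTTP request for POST /api/save without sending it.
function buildSaveRequest(
  payload: SavePayload,
  baseUrl: string = "http://localhost:3001", // assumed API server address
): HttpRequest {
  // Client-side guard added for this sketch; the server may validate differently.
  if (payload.filename.trim() === "") {
    throw new Error("filename must not be empty");
  }
  return {
    url: `${baseUrl}/api/save`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

The result can be handed to `fetch` by splitting off the URL: `const { url, ...init } = buildSaveRequest({ content, filename }); await fetch(url, init);`.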

List Saved Files

GET /api/files

Get Sitemap

GET /sitemap.xml

🤖 MCP Server

The MCP (Model Context Protocol) server allows AI agents like Claude to extract web content directly.

Starting the MCP Server

pnpm mcp

Available Tools

`extract_content`

Extract content from a single URL.

{
  "name": "extract_content",
  "arguments": {
    "url": "https://example.com/article",
    "format": "markdown"
  }
}
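
For context, MCP clients send tool invocations as JSON-RPC 2.0 requests with the method `tools/call`, and the `name` and `arguments` fields above travel inside `params`. The sketch below shows how such a message could be assembled by hand; `buildToolCall` is a hypothetical helper, and in practice an MCP client library would normally build this envelope.

```typescript
// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Wraps a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const exampleCall = buildToolCall(1, "extract_content", {
  url: "https://example.com/article",
  format: "markdown",
});

console.log(JSON.stringify(exampleCall, null, 2));
```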

`extract_multiple`

Extract content from multiple URLs in parallel.

{
  "name": "extract_multiple",
  "arguments": {
    "urls": [
      "https://example.com/article1",
      "https://example.com/article2"
    ],
    "format": "markdown"
  }
}
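
To make "in parallel" concrete, here is a sketch of fanning out over several URLs while keeping one failed URL from discarding the other results. It is not the code in `server/mcp.ts`, which may work differently; `extractAll`, `Extractor`, and `BatchResult` are hypothetical names, and the single-URL extractor is passed in as a parameter.

```typescript
// Any function that extracts one URL, such as a wrapper around POST /api/extract.
type Extractor<T> = (url: string) => Promise<T>;

// Per-URL outcome: either the extracted value or an error message.
type BatchResult<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: string };

// Starts all extractions at once and resolves with results in input order.
function extractAll<T>(
  urls: string[],
  extractOne: Extractor<T>,
): Promise<BatchResult<T>[]> {
  const tasks = urls.map((url) =>
    extractOne(url).then(
      (value): BatchResult<T> => ({ url, ok: true, value }),
      (reason: unknown): BatchResult<T> => ({
        url,
        ok: false,
        error: reason instanceof Error ? reason.message : String(reason),
      }),
    ),
  );
  // Each task already converts rejection into a result, so Promise.all never rejects here.
  return Promise.all(tasks);
}
```

Because every request starts immediately, a very long URL list could open many connections at once; a concurrency limit would be a sensible addition for large batches.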

Claude Desktop Integration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "web-extractor": {
      "command": "npx",
      "args": ["tsx", "/path/to/web-content-extractor/server/mcp.ts"]
    }
  }
}

🏗️ Project Structure

web-content-extractor/
├── 📁 public/              # Static assets
├── 📁 server/
│   ├── api.ts              # Express REST API
│   └── mcp.ts              # MCP server for AI agents
├── 📁 src/
│   ├── 📁 components/
│   │   ├── ExtractorForm.tsx
│   │   ├── ExtractedContent.tsx
│   │   ├── History.tsx
│   │   ├── MarkdownDisplay.tsx
│   │   ├── SaveDialog.tsx
│   │   └── ui/             # shadcn/ui components
│   ├── 📁 hooks/
│   │   └── useExtractor.ts
│   ├── 📁 lib/
│   │   ├── extractor.ts    # Server extraction logic
│   │   ├── client-extractor.ts
│   │   └── api-handler.ts
│   ├── App.tsx
│   └── main.tsx
├── package.json
├── vite.config.ts
└── tailwind.config.js

🛠️ Tech Stack

React 18 · TypeScript · Vite · Tailwind · Node.js · Express

Key Dependencies

@mozilla/readability - Content extraction engine
jsdom - DOM parsing for Node.js
@modelcontextprotocol/sdk - MCP integration
Framer Motion - Smooth animations
shadcn/ui - Beautiful UI components

📊 Performance

Metric	Value
Average extraction time	< 500 ms
Bundle size (gzipped)	~85 kB
Lighthouse score	95+

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by isshiki-dev

⭐ Star this repo if you find it useful!
