A lightweight frontend for self-hosted Firecrawl API instances. This playground provides a user-friendly interface for using Firecrawl's web scraping and crawling capabilities.
- Scrape Mode: Convert a single URL to markdown, HTML, or take screenshots
- Crawl Mode: Discover and scrape multiple pages from a starting URL
- Extract Mode: Extract structured data from web pages using an LLM
- CORS-free: Uses a proxy server to avoid CORS issues when connecting to your Firecrawl API instance
- Configure environment variables:

  ```shell
  cp .example.env .env
  ```

  Edit the `.env` file to set your desired configuration.
- Install dependencies and run:

  ```shell
  npm i
  npm start
  ```

- Open your browser and navigate to `http://localhost:3000`
- Enter your Firecrawl API endpoint (default: `http://firecrawl:3002`)
- Enter your API key if required
- Choose a mode (Scrape, Crawl, or Extract), enter a URL, and click "Run"
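The exact variables depend on what `.example.env` defines; a hypothetical `.env` might look like the following (these variable names are illustrative assumptions, not the project's actual keys — copy from `.example.env` for the real ones):

```
# Port for the playground's local server (assumed variable name)
PORT=3000
# Default Firecrawl endpoint pre-filled in the UI (assumed variable name)
FIRECRAWL_API_URL=http://firecrawl:3002
```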
- Configure environment variables:

  ```shell
  cp .example.env .env
  ```

  Then edit the `.env` file to set your desired configuration.
- Build and run using Docker Compose:

  ```shell
  docker-compose up -d
  ```

- Open your browser and navigate to `http://localhost:3000`
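For reference, a Compose service of roughly this shape would cover the setup above (the service name and build context here are assumptions — the repository's own `docker-compose.yml` is authoritative):

```yaml
services:
  playground:
    build: .          # build from the repo's Dockerfile
    ports:
      - "3000:3000"   # expose the playground UI
    env_file:
      - .env          # load configuration from the .env created above
```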
Scrape mode allows you to convert a single URL to various formats:
- Markdown: Clean, readable markdown format
- HTML: Raw HTML content
- Screenshot: Visual capture of the page
- Links: Extract all links from the page
Advanced options include:
- Only Main Content: Filter out navigation, footers, etc.
- Remove Base64 Images: Exclude embedded images
- Wait For: Time to wait for dynamic content to load
- Timeout: Maximum time to wait for the page to load
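These options map onto fields in Firecrawl's v1 scrape request body (`POST /v1/scrape`). As a rough sketch of how such a body could be assembled — field names follow the public v1 API as documented, but verify them against your instance's version:

```javascript
// Build the JSON body for POST /v1/scrape on a Firecrawl instance.
// Field names are based on the public v1 API; confirm against your version.
function buildScrapeBody(url, opts = {}) {
  return {
    url,
    formats: opts.formats ?? ['markdown'],          // 'markdown' | 'html' | 'screenshot' | 'links'
    onlyMainContent: opts.onlyMainContent ?? true,  // filter out navigation, footers, etc.
    removeBase64Images: opts.removeBase64Images ?? false, // exclude embedded images
    waitFor: opts.waitFor ?? 0,                     // ms to wait for dynamic content
    timeout: opts.timeout ?? 30000,                 // max ms to wait for the page to load
  };
}

// Example: markdown plus a screenshot, waiting 3 seconds for dynamic content
const body = buildScrapeBody('https://news.ycombinator.com', {
  formats: ['markdown', 'screenshot'],
  waitFor: 3000,
});
```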
Crawl mode allows you to discover and scrape multiple pages from a starting URL:
- Max Depth: How many links deep to crawl
- Page Limit: Maximum number of pages to crawl
- Ignore Sitemap: Skip sitemap.xml discovery
- Allow External Links: Crawl links to external domains
- Include/Exclude Paths: Filter which paths to crawl
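These controls correspond to fields on Firecrawl's v1 crawl request (`POST /v1/crawl`). A representative body, with field names as in the public v1 API (verify against your instance):

```json
{
  "url": "https://smcleod.net",
  "maxDepth": 2,
  "limit": 10,
  "ignoreSitemap": false,
  "allowExternalLinks": false,
  "includePaths": ["blog", "posts"],
  "excludePaths": ["admin", "login"],
  "scrapeOptions": { "formats": ["markdown"] }
}
```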
Extract mode allows you to extract structured data from web pages using an LLM:
- Extraction Prompt: Instructions for what data to extract
- JSON Schema: Optional schema for structured data extraction
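On the v1 API, extraction is typically expressed as a scrape request using the `extract` format, with the prompt and optional schema nested under an `extract` key. A representative body (field names per the public v1 API — check what your instance's version expects):

```json
{
  "url": "https://smcleod.net",
  "formats": ["extract"],
  "extract": {
    "prompt": "Extract the main heading, summary, and author from this page.",
    "schema": {
      "type": "object",
      "properties": {
        "heading": { "type": "string" },
        "summary": { "type": "string" },
        "author": { "type": "string" }
      }
    }
  }
}
```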
This playground is designed to work with self-hosted Firecrawl API instances. It's compatible with the Firecrawl API v1 endpoints.
This is a lightweight application built with vanilla JavaScript, HTML, and CSS. Dependencies are loaded from CDNs:
- Milligram CSS for minimal styling
- Marked.js for markdown rendering
- Highlight.js for syntax highlighting
No build process is required - simply edit the files and refresh the browser to see changes.
- Server: Node.js with Express
- Proxy: Custom HTTP proxy middleware
- Configuration: Environment variables via dotenv (.env file)
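The proxy's job reduces to rewriting incoming request paths onto the configured Firecrawl base URL, so the browser only ever talks to the playground's own origin and never makes a cross-origin request. A minimal sketch of that core logic — the `/api` prefix and function name are illustrative, not the app's actual code:

```javascript
// Map a playground request path to the upstream Firecrawl URL.
// Assumes the frontend calls the proxy under an /api prefix (illustrative).
function upstreamUrl(requestPath, firecrawlBase) {
  const suffix = requestPath.replace(/^\/api/, ''); // strip the local prefix
  return new URL(suffix, firecrawlBase).toString(); // join with the Firecrawl base
}

// e.g. upstreamUrl('/api/v1/scrape', 'http://firecrawl:3002')
//      → 'http://firecrawl:3002/v1/scrape'
```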
Here are some examples of how to use the playground in its different modes.
- Enter URL: `https://smcleod.net`
- Select Format: `markdown`
- Enable "Only Main Content"
- Click "Run"
- Enter URL: `https://news.ycombinator.com`
- Select Formats: `markdown`, `screenshot`
- Set Wait For: `3000` (3 seconds)
- Click "Run"
- Enter URL: `https://github.com`
- Select Formats: `html`, `markdown`
- Disable "Only Main Content" to get the full page
- Click "Run"
- Switch to "Crawl" mode
- Enter URL: `https://smcleod.net`
- Set Max Depth: `2`
- Set Page Limit: `10`
- Select Format: `markdown`
- Click "Run"
- Switch to "Crawl" mode
- Enter URL: `https://smcleod.net/about`
- Set Max Depth: `3`
- Set Page Limit: `20`
- Include Paths: `blog`, `posts`
- Exclude Paths: `admin`, `login`, `register`
- Click "Run"
- Switch to "Extract" mode
- Enter URL: `https://smcleod.net`
- Extraction Prompt: `Extract the main heading, summary, and author from this page.`
- Click "Run"
- Switch to "Extract" mode
- Enter URL: `https://news.ycombinator.com`
- Extraction Prompt: `Extract the top 5 stories with their titles, points, and authors.`
- JSON Schema:

  ```json
  {
    "type": "object",
    "properties": {
      "stories": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "points": { "type": "number" },
            "author": { "type": "string" }
          }
        }
      }
    }
  }
  ```

- Click "Run"
