A lightweight frontend for self-hosted Firecrawl API instances. This playground provides a user-friendly interface for using Firecrawl's web scraping and crawling capabilities.
- Scrape Mode: Convert a single URL to markdown, HTML, or take screenshots
- Crawl Mode: Discover and scrape multiple pages from a starting URL
- Extract Mode: Extract structured data from web pages using an LLM
- CORS-free: Uses a proxy server to avoid CORS issues when connecting to your Firecrawl API instance
- Configure environment variables:

  ```shell
  cp .example.env .env
  ```

  Edit the `.env` file to set your desired configuration.
- Install dependencies and run:

  ```shell
  npm i
  npm start
  ```

- Open your browser and navigate to `http://localhost:3000`
- Enter your Firecrawl API endpoint (default: `http://firecrawl:3002`)
- Enter your API key if required
- Choose a mode (Scrape, Crawl, or Extract), enter a URL, and click "Run"
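The exact variables depend on what `.example.env` defines; a hypothetical `.env` might look like the following (these variable names are illustrative assumptions, not the project's actual keys — copy from `.example.env` for the real ones):

```
# Port for the playground's local server (assumed variable name)
PORT=3000
# Default Firecrawl endpoint pre-filled in the UI (assumed variable name)
FIRECRAWL_API_URL=http://firecrawl:3002
```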
- Configure environment variables:

  ```shell
  cp .example.env .env
  ```

  Then edit the `.env` file to set your desired configuration.
- Build and run using Docker Compose:

  ```shell
  docker-compose up -d
  ```

- Open your browser and navigate to `http://localhost:3000`
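For reference, a Compose service of roughly this shape would cover the setup above (the service name and build context here are assumptions — the repository's own `docker-compose.yml` is authoritative):

```yaml
services:
  playground:
    build: .          # build from the repo's Dockerfile
    ports:
      - "3000:3000"   # expose the playground UI
    env_file:
      - .env          # load configuration from the .env created above
```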
Scrape mode allows you to convert a single URL to various formats:
- Markdown: Clean, readable markdown format
- HTML: Raw HTML content
- Screenshot: Visual capture of the page
- Links: Extract all links from the page
Advanced options include:
- Only Main Content: Filter out navigation, footers, etc.
- Remove Base64 Images: Exclude embedded images
- Wait For: Time to wait for dynamic content to load
- Timeout: Maximum time to wait for the page to load
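These options map onto fields in Firecrawl's v1 scrape request body (`POST /v1/scrape`). As a rough sketch of how such a body could be assembled — field names follow the public v1 API as documented, but verify them against your instance's version:

```javascript
// Build the JSON body for POST /v1/scrape on a Firecrawl instance.
// Field names are based on the public v1 API; confirm against your version.
function buildScrapeBody(url, opts = {}) {
  return {
    url,
    formats: opts.formats ?? ['markdown'],          // 'markdown' | 'html' | 'screenshot' | 'links'
    onlyMainContent: opts.onlyMainContent ?? true,  // filter out navigation, footers, etc.
    removeBase64Images: opts.removeBase64Images ?? false, // exclude embedded images
    waitFor: opts.waitFor ?? 0,                     // ms to wait for dynamic content
    timeout: opts.timeout ?? 30000,                 // max ms to wait for the page to load
  };
}

// Example: markdown plus a screenshot, waiting 3 seconds for dynamic content
const body = buildScrapeBody('https://news.ycombinator.com', {
  formats: ['markdown', 'screenshot'],
  waitFor: 3000,
});
```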
Crawl mode allows you to discover and scrape multiple pages from a starting URL:
- Max Depth: How many links deep to crawl
- Page Limit: Maximum number of pages to crawl
- Ignore Sitemap: Skip sitemap.xml discovery
- Allow External Links: Crawl links to external domains
- Include/Exclude Paths: Filter which paths to crawl
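These controls correspond to fields on Firecrawl's v1 crawl request (`POST /v1/crawl`). A representative body, with field names as in the public v1 API (verify against your instance):

```json
{
  "url": "https://smcleod.net",
  "maxDepth": 2,
  "limit": 10,
  "ignoreSitemap": false,
  "allowExternalLinks": false,
  "includePaths": ["blog", "posts"],
  "excludePaths": ["admin", "login"],
  "scrapeOptions": { "formats": ["markdown"] }
}
```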
Extract mode allows you to extract structured data from web pages using an LLM:
- Extraction Prompt: Instructions for what data to extract
- JSON Schema: Optional schema for structured data extraction
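On the v1 API, extraction is typically expressed as a scrape request using the `extract` format, with the prompt and optional schema nested under an `extract` key. A representative body (field names per the public v1 API — check what your instance's version expects):

```json
{
  "url": "https://smcleod.net",
  "formats": ["extract"],
  "extract": {
    "prompt": "Extract the main heading, summary, and author from this page.",
    "schema": {
      "type": "object",
      "properties": {
        "heading": { "type": "string" },
        "summary": { "type": "string" },
        "author": { "type": "string" }
      }
    }
  }
}
```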
This playground is designed to work with self-hosted Firecrawl API instances. It's compatible with the Firecrawl API v1 endpoints.
This is a lightweight application built with vanilla JavaScript, HTML, and CSS. Dependencies are loaded from CDNs:
- Milligram CSS for minimal styling
- Marked.js for markdown rendering
- Highlight.js for syntax highlighting
No build process is required - simply edit the files and refresh the browser to see changes.
- Server: Node.js with Express
- Proxy: Custom HTTP proxy middleware
- Configuration: Environment variables via dotenv (.env file)
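The proxy's job reduces to rewriting incoming request paths onto the configured Firecrawl base URL, so the browser only ever talks to the playground's own origin and never makes a cross-origin request. A minimal sketch of that core logic — the `/api` prefix and function name are illustrative, not the app's actual code:

```javascript
// Map a playground request path to the upstream Firecrawl URL.
// Assumes the frontend calls the proxy under an /api prefix (illustrative).
function upstreamUrl(requestPath, firecrawlBase) {
  const suffix = requestPath.replace(/^\/api/, ''); // strip the local prefix
  return new URL(suffix, firecrawlBase).toString(); // join with the Firecrawl base
}

// e.g. upstreamUrl('/api/v1/scrape', 'http://firecrawl:3002')
//      → 'http://firecrawl:3002/v1/scrape'
```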
Here are some examples of how to use the playground in its different modes.
- Enter URL: `https://smcleod.net`
- Select Format: `markdown`
- Enable "Only Main Content"
- Click "Run"
- Enter URL: `https://news.ycombinator.com`
- Select Formats: `markdown`, `screenshot`
- Set Wait For: `3000` (3 seconds)
- Click "Run"
- Enter URL: `https://github.com`
- Select Formats: `html`, `markdown`
- Disable "Only Main Content" to get the full page
- Click "Run"
- Switch to "Crawl" mode
- Enter URL: `https://smcleod.net`
- Set Max Depth: `2`
- Set Page Limit: `10`
- Select Format: `markdown`
- Click "Run"
- Switch to "Crawl" mode
- Enter URL: `https://smcleod.net/about`
- Set Max Depth: `3`
- Set Page Limit: `20`
- Include Paths: `blog`, `posts`
- Exclude Paths: `admin`, `login`, `register`
- Click "Run"
- Switch to "Extract" mode
- Enter URL: `https://smcleod.net`
- Extraction Prompt: `Extract the main heading, summary, and author from this page.`
- Click "Run"
- Switch to "Extract" mode
- Enter URL: `https://news.ycombinator.com`
- Extraction Prompt: `Extract the top 5 stories with their titles, points, and authors.`
- JSON Schema:

  ```json
  {
    "type": "object",
    "properties": {
      "stories": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "points": { "type": "number" },
            "author": { "type": "string" }
          }
        }
      }
    }
  }
  ```

- Click "Run"
