
🌐 ScrapeGraph AI SDKs


Official SDKs for the ScrapeGraph AI API - Intelligent web scraping and search powered by AI. Extract structured data from any webpage or perform AI-powered web searches with natural language prompts.

Get your API key!

Features

  • πŸ€– SmartScraper: Extract structured data from webpages using natural language prompts
  • πŸ” SearchScraper: AI-powered web search with structured results and reference URLs
  • πŸ“ Markdownify: Convert any webpage into clean, formatted markdown
  • πŸ•·οΈ SmartCrawler: Intelligently crawl and extract data from multiple pages
  • πŸ€– AgenticScraper: Perform automated browser actions with AI-powered session management
  • πŸ“„ Scrape: Convert webpages to HTML with JavaScript rendering and custom headers
  • ⏰ Scheduled Jobs: Create and manage automated scraping workflows with cron scheduling
  • πŸ’³ Credits Management: Monitor API usage and credit balance
  • πŸ’¬ Feedback System: Provide ratings and feedback to improve service quality

πŸš€ Quick Links

ScrapeGraphAI offers seamless integration with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python or Node.js, using LLM frameworks, or working with no-code platforms, we've got you covered with comprehensive integration options.

πŸ“¦ Installation

Python

pip install scrapegraph-py

JavaScript

npm install scrapegraph-js

🎯 Core Features

  • πŸ€– AI-Powered Extraction & Search: Use natural language to extract data or search the web
  • πŸ“Š Structured Output: Get clean, structured data with optional schema validation (see the sketch after this list)
  • πŸ”„ Multiple Formats: Extract data as JSON, Markdown, or custom schemas
  • ⚑ High Performance: Concurrent processing and automatic retries
  • πŸ”’ Enterprise Ready: Production-grade security and rate limiting
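
For example, structured output with schema validation in Python can use a Pydantic model. A minimal sketch, assuming smartscraper accepts an output_schema parameter as in the SDK's schema examples:

Python:

from pydantic import BaseModel, Field
from scrapegraph_py import Client
import os

class PageInfo(BaseModel):
    title: str = Field(description="Main heading of the page")
    description: str = Field(description="Short description of the page")

client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Passing the model constrains the result to the declared fields
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=PageInfo,  # assumption: keyword name per SDK examples
)

print(response["result"])
client.close()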

πŸ› οΈ Available Endpoints

πŸ€– SmartScraper

Extract structured data from any webpage or HTML content using natural language prompts.

Example Usage:

Python:

from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Extract data from a webpage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading, description, and summary of the webpage",
)

print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")

client.close()

JavaScript:

import { smartScraper } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
const prompt = 'Extract the main heading, description, and summary of the webpage';

try {
  const response = await smartScraper(apiKey, url, prompt);
  console.log(response);
} catch (error) {
  console.error(error);
}

πŸ” SearchScraper

Perform AI-powered web searches with structured results and reference URLs.

Example Usage:

Python:

from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Perform AI-powered web search
response = client.searchscraper(
    user_prompt="What is the latest version of Python and what are its main features?",
    num_results=3,  # Number of websites to search (default: 3)
)

print(f"Result: {response['result']}")
print("\nReference URLs:")
for url in response["reference_urls"]:
    print(f"- {url}")

client.close()

JavaScript:

import { searchScraper } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;
const prompt = 'What is the latest version of Python and what are its main features?';

try {
  const response = await searchScraper(apiKey, prompt, 3); // 3 websites
  console.log('Result:', response.result);
  console.log('\nReference URLs:');
  response.reference_urls?.forEach(url => console.log(`- ${url}`));
} catch (error) {
  console.error(error);
}

πŸ“ Markdownify

Convert any webpage into clean, formatted markdown.

Example Usage:

Python:

from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Convert webpage to markdown
response = client.markdownify(
    website_url="https://example.com",
)

print(f"Request ID: {response['request_id']}")
print(f"Markdown: {response['result']}")

client.close()

JavaScript:

import { markdownify } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';

try {
  const response = await markdownify(apiKey, url);
  console.log(response);
} catch (error) {
  console.error(error);
}

πŸ•·οΈ SmartCrawler

Intelligently crawl and extract data from multiple pages with configurable depth and batch processing.

Example Usage:

Python:

from scrapegraph_py import Client
import os
import time
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Start crawl job
crawl_response = client.crawl(
    url="https://example.com",
    prompt="Extract page titles and main headings",
    data_schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "headings": {"type": "array", "items": {"type": "string"}}
        }
    },
    depth=2,
    max_pages=5,
    same_domain_only=True,
)

crawl_id = crawl_response.get("id") or crawl_response.get("task_id")

# Poll for results
if crawl_id:
    for _ in range(10):
        time.sleep(5)
        result = client.get_crawl(crawl_id)
        if result.get("status") == "success":
            print("Crawl completed:", result["result"]["llm_result"])
            break

client.close()

JavaScript:

import { crawl, getCrawlRequest } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
const prompt = 'Extract page titles and main headings';
const schema = {
  type: "object",
  properties: {
    title: { type: "string" },
    headings: { type: "array", items: { type: "string" } }
  }
};

try {
  const crawlResponse = await crawl(apiKey, url, prompt, schema, {
    depth: 2,
    maxPages: 5,
    sameDomainOnly: true,
  });

  const crawlId = crawlResponse.id || crawlResponse.task_id;
  
  // Poll for results
  if (crawlId) {
    for (let i = 0; i < 10; i++) {
      await new Promise(resolve => setTimeout(resolve, 5000));
      const result = await getCrawlRequest(apiKey, crawlId);
      if (result.status === 'success') {
        console.log('Crawl completed:', result.result.llm_result);
        break;
      }
    }
  }
} catch (error) {
  console.error(error);
}

πŸ€– AgenticScraper

Perform automated browser actions on webpages using AI-powered agentic scraping with session management.

Example Usage:

Python:

from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Perform automated browser actions
response = client.agenticscraper(
    url="https://example.com",
    use_session=True,
    steps=[
        "Type [email protected] in email input box",
        "Type password123 in password inputbox",
        "click on login"
    ],
    ai_extraction=False  # Set to True for AI extraction
)

print(f"Request ID: {response['request_id']}")
print(f"Status: {response.get('status')}")

# Get results
result = client.get_agenticscraper(response['request_id'])
print(f"Result: {result.get('result')}")

client.close()

JavaScript:

import { agenticScraper, getAgenticScraperRequest } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
const steps = [
  'Type user@example.com in the email input box',
  'Type password123 in the password input box',
  'Click on login'
];

try {
  const response = await agenticScraper(apiKey, url, steps, true);
  console.log('Request ID:', response.request_id);
  
  // Get results
  const result = await getAgenticScraperRequest(apiKey, response.request_id);
  console.log('Status:', result.status);
  console.log('Result:', result.result);
} catch (error) {
  console.error(error);
}

πŸ“„ Scrape

Convert webpages into HTML format with optional JavaScript rendering and custom headers.

Example Usage:

Python:

from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Get HTML content from webpage
response = client.scrape(
    website_url="https://example.com",
    render_heavy_js=False,  # Set to True for JavaScript-heavy sites
)

print(f"Request ID: {response['request_id']}")
print(f"HTML length: {len(response.get('html', ''))} characters")

client.close()

JavaScript:

import { scrape } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';

try {
  const response = await scrape(apiKey, url, {
    renderHeavyJs: false,  // Set to true for JavaScript-heavy sites
    headers: { 'User-Agent': 'Custom Agent' }
  });
  console.log('HTML length:', response.html?.length, 'characters');
} catch (error) {
  console.error(error);
}

⏰ Scheduled Jobs

Create, manage, and monitor scheduled scraping jobs with cron expressions and execution history.
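
A minimal Python sketch of creating a cron-scheduled SmartScraper job. The method name create_scheduled_job and its parameters (job_name, service_type, cron_expression, job_config) are assumptions based on the SDK's naming conventions; check the SDK reference for the exact API.

Python:

from scrapegraph_py import Client
import os

client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Hypothetical call: run a SmartScraper request every day at 09:00 UTC
job = client.create_scheduled_job(
    job_name="daily-example-scrape",
    service_type="smartscraper",
    cron_expression="0 9 * * *",  # standard five-field cron syntax
    job_config={
        "website_url": "https://example.com",
        "user_prompt": "Extract the main heading and description",
    },
)

print(f"Job ID: {job.get('id')}")
client.close()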

πŸ’³ Credits

Check your API credit balance and usage.
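
A minimal Python sketch, assuming the client exposes a get_credits() helper and that the response includes remaining_credits and total_credits_used fields:

Python:

from scrapegraph_py import Client
import os

client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Fetch the current balance and usage for this API key
credits = client.get_credits()
print(f"Remaining credits: {credits.get('remaining_credits')}")
print(f"Total credits used: {credits.get('total_credits_used')}")

client.close()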

πŸ’¬ Feedback

Send feedback and ratings for scraping requests to help improve the service.
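
A minimal Python sketch, assuming a submit_feedback() helper that takes the request ID of a previous call plus a rating; the parameter names are assumptions:

Python:

from scrapegraph_py import Client
import os

client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Attach a rating and an optional comment to an earlier request
client.submit_feedback(
    request_id="your-request-id",  # ID returned by a previous call
    rating=5,
    feedback_text="Accurate extraction, fast response",
)

client.close()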

🌟 Key Benefits

  • πŸ“ Natural Language Queries: No complex selectors or XPath needed
  • 🎯 Precise Extraction: AI understands context and structure
  • πŸ”„ Adaptive Processing: Works with both web content and direct HTML
  • πŸ“Š Schema Validation: Ensure data consistency with Pydantic/TypeScript
  • ⚑ Async Support: Handle multiple requests efficiently (see the sketch after this list)
  • πŸ” Source Attribution: Get reference URLs for search results

πŸ’‘ Use Cases

  • 🏒 Business Intelligence: Extract company information and contacts
  • πŸ“Š Market Research: Gather product data and pricing
  • πŸ“° Content Aggregation: Convert articles to structured formats
  • πŸ” Data Mining: Extract specific information from multiple sources
  • πŸ“± App Integration: Feed clean data into your applications
  • 🌐 Web Research: Perform AI-powered searches with structured results

πŸ“– Documentation

For detailed documentation and examples, see the official ScrapeGraphAI documentation.

πŸ’¬ Support & Feedback

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❀️ by ScrapeGraph AI
