Official SDKs for the ScrapeGraph AI API - Intelligent web scraping and search powered by AI. Extract structured data from any webpage or perform AI-powered web searches with natural language prompts.
Get your API key!

- SmartScraper: Extract structured data from webpages using natural language prompts
- SearchScraper: AI-powered web search with structured results and reference URLs
- Markdownify: Convert any webpage into clean, formatted markdown
- SmartCrawler: Intelligently crawl and extract data from multiple pages
- AgenticScraper: Perform automated browser actions with AI-powered session management
- Scrape: Convert webpages to HTML with JavaScript rendering and custom headers
- Scheduled Jobs: Create and manage automated scraping workflows with cron scheduling
- Credits Management: Monitor API usage and credit balance
- Feedback System: Provide ratings and feedback to improve service quality
ScrapeGraphAI integrates seamlessly with popular frameworks and tools. Whether you're building with Python or Node.js, using LLM frameworks, or working with no-code platforms, the integration options below have you covered.
You can find more information at the following links.
Integrations:
- API: Documentation
- SDKs: Python, Node
- LLM Frameworks: Langchain, Llama Index, Crew.ai, CamelAI
- Low-code Frameworks: Pipedream, Bubble, Zapier, n8n, LangFlow
- MCP server: Link
Python: pip install scrapegraph-py
Node.js: npm install scrapegraph-js
- AI-Powered Extraction & Search: Use natural language to extract data or search the web
- Structured Output: Get clean, structured data with optional schema validation
- Multiple Formats: Extract data as JSON, Markdown, or custom schemas
- High Performance: Concurrent processing and automatic retries
- Enterprise Ready: Production-grade security and rate limiting
SmartScraper uses AI to extract structured data from any webpage or HTML content with natural language prompts.
Example Usage:
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Extract data from a webpage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading, description, and summary of the webpage",
)
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")
client.close()
JavaScript:
import { smartScraper } from 'scrapegraph-js';
import 'dotenv/config';
const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
const prompt = 'Extract the main heading, description, and summary of the webpage';
try {
  const response = await smartScraper(apiKey, url, prompt);
  console.log(response);
} catch (error) {
  console.error(error);
}
SearchScraper performs AI-powered web searches with structured results and reference URLs.
Example Usage:
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Perform AI-powered web search
response = client.searchscraper(
    user_prompt="What is the latest version of Python and what are its main features?",
    num_results=3,  # Number of websites to search (default: 3)
)
print(f"Result: {response['result']}")
print("\nReference URLs:")
for url in response["reference_urls"]:
    print(f"- {url}")
client.close()
JavaScript:
import { searchScraper } from 'scrapegraph-js';
import 'dotenv/config';
const apiKey = process.env.SGAI_APIKEY;
const prompt = 'What is the latest version of Python and what are its main features?';
try {
  const response = await searchScraper(apiKey, prompt, 3); // 3 websites
  console.log('Result:', response.result);
  console.log('\nReference URLs:');
  response.reference_urls?.forEach(url => console.log(`- ${url}`));
} catch (error) {
  console.error(error);
}
Markdownify converts any webpage into clean, formatted markdown.
Example Usage:
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Convert webpage to markdown
response = client.markdownify(
    website_url="https://example.com",
)
print(f"Request ID: {response['request_id']}")
print(f"Markdown: {response['result']}")
client.close()
JavaScript:
import { markdownify } from 'scrapegraph-js';
import 'dotenv/config';
const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
try {
  const response = await markdownify(apiKey, url);
  console.log(response);
} catch (error) {
  console.error(error);
}
SmartCrawler intelligently crawls and extracts data from multiple pages with configurable depth and batch processing.
Example Usage:
Python:
from scrapegraph_py import Client
import os
import time
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Start crawl job
crawl_response = client.crawl(
    url="https://example.com",
    prompt="Extract page titles and main headings",
    data_schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "headings": {"type": "array", "items": {"type": "string"}}
        }
    },
    depth=2,
    max_pages=5,
    same_domain_only=True,
)
crawl_id = crawl_response.get("id") or crawl_response.get("task_id")
# Poll for results
if crawl_id:
    for _ in range(10):
        time.sleep(5)
        result = client.get_crawl(crawl_id)
        if result.get("status") == "success":
            print("Crawl completed:", result["result"]["llm_result"])
            break
client.close()
JavaScript:
import { crawl, getCrawlRequest } from 'scrapegraph-js';
import 'dotenv/config';
const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
const prompt = 'Extract page titles and main headings';
const schema = {
  type: "object",
  properties: {
    title: { type: "string" },
    headings: { type: "array", items: { type: "string" } }
  }
};
try {
  const crawlResponse = await crawl(apiKey, url, prompt, schema, {
    depth: 2,
    maxPages: 5,
    sameDomainOnly: true,
  });
  const crawlId = crawlResponse.id || crawlResponse.task_id;
  // Poll for results
  if (crawlId) {
    for (let i = 0; i < 10; i++) {
      await new Promise(resolve => setTimeout(resolve, 5000));
      const result = await getCrawlRequest(apiKey, crawlId);
      if (result.status === 'success') {
        console.log('Crawl completed:', result.result.llm_result);
        break;
      }
    }
  }
} catch (error) {
  console.error(error);
}
AgenticScraper performs automated browser actions on webpages using AI-powered agentic scraping with session management.
Example Usage:
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Perform automated browser actions
response = client.agenticscraper(
    url="https://example.com",
    use_session=True,
    steps=[
        "Type [email protected] in the email input box",
        "Type password123 in the password input box",
        "Click on login"
    ],
    ai_extraction=False  # Set to True for AI extraction
)
print(f"Request ID: {response['request_id']}")
print(f"Status: {response.get('status')}")
# Get results
result = client.get_agenticscraper(response['request_id'])
print(f"Result: {result.get('result')}")
client.close()
JavaScript:
import { agenticScraper, getAgenticScraperRequest } from 'scrapegraph-js';
import 'dotenv/config';
const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
const steps = [
  'Type [email protected] in the email input box',
  'Type password123 in the password input box',
  'Click on login'
];
try {
  const response = await agenticScraper(apiKey, url, steps, true);
  console.log('Request ID:', response.request_id);
  // Get results
  const result = await getAgenticScraperRequest(apiKey, response.request_id);
  console.log('Status:', result.status);
  console.log('Result:', result.result);
} catch (error) {
  console.error(error);
}
Scrape converts webpages into HTML with optional JavaScript rendering and custom headers.
Example Usage:
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Get HTML content from webpage
response = client.scrape(
    website_url="https://example.com",
    render_heavy_js=False,  # Set to True for JavaScript-heavy sites
)
print(f"Request ID: {response['request_id']}")
print(f"HTML length: {len(response.get('html', ''))} characters")
client.close()
JavaScript:
import { scrape } from 'scrapegraph-js';
import 'dotenv/config';
const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com';
try {
  const response = await scrape(apiKey, url, {
    renderHeavyJs: false, // Set to true for JavaScript-heavy sites
    headers: { 'User-Agent': 'Custom Agent' }
  });
  console.log('HTML length:', response.html?.length, 'characters');
} catch (error) {
  console.error(error);
}
Scheduled Jobs let you create, manage, and monitor scheduled scraping jobs with cron expressions and execution history.
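A minimal Python sketch follows; the create_scheduled_job method name, its parameters, and the id response field are assumptions based on the feature description above, not confirmed API, so verify them against the API reference.
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Create a job that runs a SmartScraper request every Monday at 09:00
# NOTE: method and parameter names are assumptions; consult the API reference
job = client.create_scheduled_job(
    job_name="weekly-headings",
    service_type="smartscraper",
    cron_expression="0 9 * * 1",  # cron: minute hour day-of-month month day-of-week
    job_config={
        "website_url": "https://example.com",
        "user_prompt": "Extract the main heading",
    },
)
print(f"Job ID: {job.get('id')}")
client.close()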
Credits Management lets you check your API credit balance and usage.
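A short Python sketch, assuming the SDK exposes a get_credits method and that the response includes remaining_credits and total_credits_used fields; verify the exact keys in the API reference.
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Fetch the credit balance for this API key
credits = client.get_credits()
# Field names below are assumptions; inspect the raw response for the exact keys
print(f"Remaining credits: {credits.get('remaining_credits')}")
print(f"Total credits used: {credits.get('total_credits_used')}")
client.close()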
The Feedback System lets you send ratings and comments for scraping requests to help improve the service.
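A minimal Python sketch, assuming a submit_feedback method that takes the request_id returned by a scraping call, a numeric rating, and an optional comment; the signature is an assumption, so check the SDK reference.
Python:
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Rate a completed request by the request_id returned from any scraping call
# NOTE: the signature below is an assumption; check the SDK reference
client.submit_feedback(
    request_id="your-request-id",  # hypothetical placeholder
    rating=5,  # e.g. on a 1-5 scale
    feedback_text="Accurate extraction, fast response",
)
client.close()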
- Natural Language Queries: No complex selectors or XPath needed
- Precise Extraction: AI understands context and structure
- Adaptive Processing: Works with both web content and direct HTML
- Schema Validation: Ensure data consistency with Pydantic/TypeScript (see the sketch after this list)
- Async Support: Handle multiple requests efficiently
- Source Attribution: Get reference URLs for search results
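As referenced in the list above, here is a minimal sketch of schema validation in Python; it assumes smartscraper accepts an output_schema parameter that takes a Pydantic model class (the parameter name is an assumption; check the SDK reference).
Python:
from pydantic import BaseModel
from scrapegraph_py import Client
import os
from dotenv import load_dotenv
load_dotenv()

# Define the shape the extracted data must follow
class PageInfo(BaseModel):
    title: str
    description: str

client = Client(api_key=os.getenv("SGAI_API_KEY"))
# Passing a Pydantic model constrains the result to the declared fields
# NOTE: the output_schema parameter name is an assumption; see the SDK reference
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the page title and description",
    output_schema=PageInfo,
)
print(response["result"])
client.close()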
- Business Intelligence: Extract company information and contacts
- Market Research: Gather product data and pricing
- Content Aggregation: Convert articles to structured formats
- Data Mining: Extract specific information from multiple sources
- App Integration: Feed clean data into your applications
- Web Research: Perform AI-powered searches with structured results
For detailed documentation and examples, visit the official documentation.
Support:
- Email: [email protected]
- GitHub Issues: Create an issue
- Feature Requests: Request a feature
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by ScrapeGraph AI