A production-ready Model Context Protocol (MCP) server that transforms natural language into safe, executable SQL queries. Built for LLMs with comprehensive type safety, multi-database support, and intelligent schema analysis.
This project provides an "all drivers" container image that can connect to the major databases without post-build steps. Drivers are modeled as optional dependencies and the Dockerfile installs the minimal OS libraries required at runtime.
- PostgreSQL: Python driver `psycopg[binary]` (v3); OS libs: `libpq5`.
- MySQL/MariaDB: Python driver `mysqlclient`; OS libs: `libmariadb3`.
- SQL Server: Python driver `pyodbc`; OS libs: `unixodbc`, Microsoft `msodbcsql18` (+ optional `mssql-tools18`).
- SQLite: stdlib `sqlite3` (no extra OS libs).
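To check which drivers are actually importable in a given environment or image, a quick probe works; this is a minimal sketch, with module names following the list above:

```python
# Minimal sketch: probe which optional DB driver modules are importable.
# Availability depends on which extras were installed or baked into the image.
import importlib.util

DRIVERS = {
    "postgresql": "psycopg",   # psycopg v3
    "mysql": "MySQLdb",        # provided by mysqlclient
    "mssql": "pyodbc",
    "sqlite": "sqlite3",       # stdlib, always present
}

for backend, module in DRIVERS.items():
    status = "available" if importlib.util.find_spec(module) else "missing"
    print(f"{backend}: {module} is {status}")
```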
Local dev installs:

- All drivers: `uv sync --extra drivers-all`
- Specific backends: `uv sync --extra postgres --extra mysql` (as needed)
Build the all-drivers image:

```bash
docker build -t nl2sql-mcp:drivers-all .
docker run --rm -p 8000:8000 nl2sql-mcp:drivers-all
```
Notes:
- Debian slim base maximizes prebuilt wheel compatibility. Avoid Alpine for DB stacks.
- To switch MySQL to a pure-Python driver, replace `mysqlclient` with `mysql-connector-python` in `pyproject.toml` and re-lock with `uv lock --extra drivers-all`.
- Intent-first query planning with structured schema context and join recommendations
- Semantic schema analysis with automatic table classification (fact, dimension, bridge, reference)
- Multi-modal table discovery using lexical matching, embeddings, and graph traversal
- Column role detection (key, date, metric, category) with semantic tagging
- SELECT-only execution with comprehensive SQL validation
- Dialect normalization and transpilation via SQLGlot (see the sketch after this list)
- Row limits and cell truncation to prevent resource exhaustion
- Comprehensive error handling with actionable feedback
- SQLAlchemy-powered compatibility with PostgreSQL, MySQL, SQL Server, SQLite, Oracle, Snowflake
- Dialect-aware SQL generation and optimization
- Cross-platform schema reflection and analysis
- Type-safe Pydantic models throughout
- Dependency injection for testability
- Background initialization with graceful degradation
- Zero hardcoded assumptions - adapts to any database schema
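A minimal sketch of the safety guardrails above, using SQLGlot directly; this illustrates the idea, not the server's internal validation code:

```python
# Illustrative guardrails: parse, reject non-SELECT statements, and enforce
# a row limit before transpiling to the target dialect.
import sqlglot
from sqlglot import exp

def validate_and_transpile(sql: str, dialect: str = "postgres") -> str:
    tree = sqlglot.parse_one(sql, read=dialect)
    if not isinstance(tree, exp.Select):
        raise ValueError("Only SELECT statements are allowed")
    # Add a row limit if the query does not already have one
    if tree.args.get("limit") is None:
        tree = tree.limit(1000)
    return tree.sql(dialect=dialect)

print(validate_and_transpile(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
))
```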
- Python 3.13+
- uv package manager
- Database with appropriate drivers installed
```bash
# Clone the repository
git clone https://github.com/jb3cloud/nl2sql-mcp.git
cd nl2sql-mcp

# Install dependencies
uv sync

# Configure your database
cp .env.example .env
# Edit .env with your database connection details

# Start the MCP server
uv run nl2sql-mcp

# Or run directly
uv run python -m nl2sql_mcp.server

# Or run using uvx
uvx --from git+https://github.com/jb3cloud/nl2sql-mcp nl2sql-mcp
```

The repository includes a Makefile that standardizes common tasks. All targets invoke tooling via `uv run` to ensure a consistent environment.
```bash
# Discover available targets and variables
make help

# Run the full Quality Gauntlet (format → lint → typecheck → test)
make quality

# Individual steps (as needed)
make format     # Ruff format
make lint       # Ruff check --fix
make typecheck  # basedpyright (strict)
make test       # pytest
make clean      # Remove caches and build artifacts
```

Deployment helpers for Azure Container Apps are also provided:
```bash
# Create docker container
make docker

# Run docker container
make docker-run

# Verify Azure CLI login and (optionally) set subscription
make preflight AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000"

# Build from source and deploy
make publish \
  APP_NAME=my-nl2sql-mcp \
  RESOURCE_GROUP=my-rg \
  ENVIRONMENT=my-aca-env \
  LOCATION=eastus \
  AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000" \
  INGRESS=external \
  TARGET_PORT=8000 \
  ENV_ARGS="NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@host:5432/db"
```

Supported variables (can be provided via environment or CLI): APP_NAME, RESOURCE_GROUP, LOCATION, ENVIRONMENT, AZ_SUBSCRIPTION, INGRESS (default external), TARGET_PORT (default 8000).
Create a `.env` file with your database connection:

```bash
# Required: Database connection
NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@localhost:5432/dbname

# Optional: Result limits and debugging
NL2SQL_MCP_ROW_LIMIT=1000
NL2SQL_MCP_MAX_CELL_CHARS=500
NL2SQL_MCP_DEBUG_TOOLS=1  # Enable find_tables/find_columns tools
```

```mermaid
graph TD
A[LLM Client] -->|MCP Protocol| B[FastMCP Server]
subgraph "Core Services"
B --> C[Schema Service]
B --> D[SQLGlot Service]
B --> E[Execution Engine]
end
subgraph "Intelligence Layer"
C --> F[Schema Explorer]
C --> G[Query Engine]
F --> H[Reflection & Profiling]
F --> I[Graph Analysis]
G --> J[Multi-Modal Retrieval]
G --> K[Graph Expansion]
end
subgraph "Data Layer"
E --> L[SQLAlchemy Engine]
H --> L
L --> M[(Your Database)]
end
style A fill:#e1f5fe
style B fill:#f3e5f5
style M fill:#e8f5e8
```
The system follows a two-phase architecture:
- Schema Building Phase: Comprehensive database analysis, relationship mapping, and semantic understanding
- Query Processing Phase: Real-time natural language to SQL conversion with context-aware planning
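A toy illustration of this split; all names here are hypothetical, and the real schema index holds far richer metadata:

```python
# Toy sketch of the two-phase architecture described above. Names are
# hypothetical; the real index holds profiles, column roles, and join graphs.
from dataclasses import dataclass, field

@dataclass
class SchemaIndex:
    tables: dict[str, list[str]] = field(default_factory=dict)  # table -> columns

# Phase 1: build once at startup (reflection, profiling, semantics)
def build_index() -> SchemaIndex:
    return SchemaIndex(tables={"sales.orders": ["id", "region", "amount", "order_date"]})

# Phase 2: serve many questions against the prebuilt index
def relevant_tables(index: SchemaIndex, question: str) -> list[str]:
    words = set(question.lower().split())
    return [t for t, cols in index.tables.items() if words & set(cols)]

print(relevant_tables(build_index(), "total amount by region"))
```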
Check server readiness and initialization progress.
```json
// Response
{
  "phase": "READY",
  "attempts": 1,
  "started_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:30:45Z"
}
```

High-level database summary with subject areas.
```json
// Request
{
  "include_subject_areas": true,
  "area_limit": 8
}

// Response
{
  "total_tables": 45,
  "total_schemas": 3,
  "subject_areas": [
    {
      "name": "Sales Analytics",
      "tables": ["orders", "customers", "products"],
      "summary": "Customer orders and product sales data"
    }
  ]
}
```

Intent-first SQL planning with structured guidance.
```json
// Request
{
  "request": "Show monthly revenue by region for 2024",
  "constraints": {
    "time_range": "2024-01-01..2024-12-31",
    "metric": "revenue"
  },
  "detail_level": "standard"
}

// Response
{
  "relevant_tables": [
    {
      "table_key": "sales.orders",
      "relevance_score": 0.95,
      "why_relevant": "Contains revenue data and date columns"
    }
  ],
  "join_plan": [
    ["sales.orders.customer_id", "customers.id"]
  ],
  "main_table": "sales.orders",
  "draft_sql": "SELECT DATE_TRUNC('month', order_date) as month...",
  "confidence": 0.92
}
```

Safe SQL execution with validation and results.
```json
// Request
{
  "sql": "SELECT region, SUM(amount) as revenue FROM sales.orders WHERE order_date >= '2024-01-01' GROUP BY region"
}

// Response
{
  "success": true,
  "rows": [
    {"region": "North", "revenue": 125000.50},
    {"region": "South", "revenue": 98750.25}
  ],
  "row_count": 2,
  "columns": [
    {"name": "region", "type": "VARCHAR"},
    {"name": "revenue", "type": "DECIMAL"}
  ],
  "execution_time_ms": 45,
  "next_action": "success"
}
```

Detailed table metadata optimized for SQL generation.
```json
// Request
{
  "table_key": "sales.orders",
  "include_samples": true,
  "column_role_filter": ["key", "date", "metric"]
}
```

Additional tools provide:

- Fast table discovery by natural language intent
- Column search for SELECT and WHERE clause building
- SQL syntax and structure validation
- Automatic SQL dialect detection and conversion
- SQL optimization for your database engine
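A hedged example of calling these tools from Python, assuming FastMCP's client and an HTTP deployment on port 8000 (adjust the URL and transport to your setup):

```python
# Sketch only: tool names match the examples above; the client API is
# FastMCP's, and the server URL is an assumption about your deployment.
import asyncio
from fastmcp import Client

async def main() -> None:
    async with Client("http://localhost:8000/mcp") as client:
        plan = await client.call_tool(
            "plan_query_for_intent",
            {"request": "Show monthly revenue by region for 2024"},
        )
        print(plan)

asyncio.run(main())
```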
| Database | SQLAlchemy Driver | Connection String Example |
|---|---|---|
| PostgreSQL | psycopg2 | `postgresql://user:pass@host:5432/db` |
| MySQL | pymysql | `mysql+pymysql://user:pass@host:3306/db` |
| SQL Server | pyodbc | `mssql+pyodbc://user:pass@host/db?driver=ODBC+Driver+17` |
| SQLite | Built-in | `sqlite:///path/to/database.db` |
| Oracle | cx_oracle | `oracle://user:pass@host:1521/service` |
| Snowflake | snowflake-sqlalchemy | `snowflake://user:pass@account/db/schema` |
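To sanity-check a connection string before pointing the server at it, a quick SQLAlchemy probe works; the URL below is a placeholder:

```python
# Minimal connectivity probe; swap in your own URL from the table above.
# The matching driver package must be installed.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost:5432/dbname")
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # prints 1 on success
```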
Control schema analysis behavior:

```bash
# Via environment or configuration
NL2SQL_MCP_INCLUDE_SCHEMAS=public,analytics
NL2SQL_MCP_EXCLUDE_SCHEMAS=temp,audit
NL2SQL_MCP_SAMPLE_SIZE=100             # Rows per table for profiling

# Memory and processing limits
NL2SQL_MCP_ROW_LIMIT=1000              # Max rows returned
NL2SQL_MCP_MAX_CELL_CHARS=500          # Truncate long text
NL2SQL_MCP_SAMPLE_TIMEOUT=5            # Sampling timeout (seconds)
NL2SQL_MCP_MAX_COLS_FOR_EMBEDDINGS=20  # Column embedding limit
NL2SQL_MCP_REFLECT_TIMEOUT=15          # Reflection timeout per statement (seconds)
NL2SQL_MCP_ENABLE_LIGHTWEIGHT_NER=1    # Toggle NER enrichment during profiling (0 disables)
```

```bash
# Install development dependencies
uv sync --dev
# Run type checking
uv run basedpyright
# Format and lint code
uv run ruff format .
uv run ruff check --fix .
# Run tests
uv run pytest -v
```

```bash
# Schema intelligence testing
uv run python scripts/test_intelligence_harness.py "show sales by region"
# SQL tools testing
uv run python scripts/test_sqlglot_harness.py "SELECT TOP 10 * FROM customers"
```

- 100% type coverage with basedpyright strict mode
- Comprehensive test suite with pytest
- Zero linting violations with ruff
- Dependency injection for testability
- Pure functions where possible
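A sketch of the dependency-injection style this enables; the names are illustrative, not the project's actual classes:

```python
# Illustrative only: inject the executor so tests can substitute a fake.
from typing import Protocol

class QueryExecutor(Protocol):
    def run(self, sql: str) -> list[dict]: ...

class FakeExecutor:
    def run(self, sql: str) -> list[dict]:
        return [{"region": "North", "revenue": 125000.50}]

def top_region(executor: QueryExecutor) -> str:
    rows = executor.run("SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region")
    return max(rows, key=lambda r: r["revenue"])["region"]

assert top_region(FakeExecutor()) == "North"
```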
For programmatic access, use the service layer directly:

```python
from nl2sql_mcp.services import ConfigService, SchemaServiceManager

# Initialize with custom configuration
config = ConfigService()
manager = SchemaServiceManager.get_instance()

# Access schema service after initialization
schema_service = manager.get_schema_service()
result = schema_service.analyze_query_schema(
    "Find customers with high lifetime value",
    max_tables=10,
    expand_strategy="fk_following",
)
```

The MCP server integrates seamlessly with multi-agent frameworks:
```python
# Example with your LLM framework
async def query_database(natural_language_query: str):
    # 1. Plan the query
    plan_result = await mcp_client.call_tool(
        "plan_query_for_intent",
        {"request": natural_language_query},
    )

    # 2. Execute the draft SQL
    if plan_result.draft_sql:
        execution_result = await mcp_client.call_tool(
            "execute_query",
            {"sql": plan_result.draft_sql},
        )
        return execution_result.rows
```

- Python 3.13+ with strict type checking
- Pydantic models for all data structures
- Pure functions and dependency injection for testability
- Comprehensive docstrings for public APIs
- Zero tolerance for type errors, lint violations, or test failures
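For instance, a response like execute_query's could be modeled with Pydantic; field names below follow the example shown earlier, not the project's actual model:

```python
# Hedged sketch of the Pydantic-model style; mirrors the execute_query
# response shown above, not the server's real class definitions.
from pydantic import BaseModel

class ColumnInfo(BaseModel):
    name: str
    type: str

class QueryResult(BaseModel):
    success: bool
    rows: list[dict[str, object]]
    row_count: int
    columns: list[ColumnInfo]
    execution_time_ms: int

result = QueryResult.model_validate({
    "success": True,
    "rows": [{"region": "North", "revenue": 125000.50}],
    "row_count": 1,
    "columns": [{"name": "region", "type": "VARCHAR"}],
    "execution_time_ms": 45,
})
print(result.columns[0].name)  # "region"
```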
Production deployments are configured through environment variables:

```bash
# Required
NL2SQL_MCP_DATABASE_URL=postgresql://...

# Optional performance tuning
NL2SQL_MCP_ROW_LIMIT=5000
NL2SQL_MCP_MAX_CELL_CHARS=1000
NL2SQL_MCP_SAMPLE_SIZE=200
```

Monitor server health via the `get_init_status()` tool:
- `READY`: Server fully operational
- `STARTING`: Initialization in progress
- `FAILED`: Initialization failed; check logs
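A hedged polling sketch, reusing the FastMCP client assumption from earlier; result handling is deliberately simplified:

```python
# Sketch: poll get_init_status until READY. Inspect the structured payload
# in real code instead of string-matching the result.
import asyncio
from fastmcp import Client

async def wait_until_ready(url: str = "http://localhost:8000/mcp") -> None:
    async with Client(url) as client:
        while True:
            status = await client.call_tool("get_init_status", {})
            if "READY" in str(status):
                return
            if "FAILED" in str(status):
                raise RuntimeError("Initialization failed; check server logs")
            await asyncio.sleep(2)

asyncio.run(wait_until_ready())
```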
This implementation incorporates state-of-the-art research in text-to-SQL:
- Multi-agent architectures for complex query decomposition
- Schema linking with bidirectional context understanding
- Error correction frameworks with multi-turn refinement
- Semantic understanding via lightweight NER and role classification
For detailed research context, see NL2SQL_RESEARCH.md.
- FastMCP for the excellent MCP server framework
- SQLGlot for multi-dialect SQL parsing and transpilation
- SQLAlchemy for robust database abstraction
- The text-to-SQL research community for advancing the field