Gran Sabio LLM Engine

Multi-Layer AI Quality Assurance for Content Generation

Your AI generates content, but you can't trust it blindly? Put Gran Sabio LLM in the middle. Multiple AI models evaluate, score, and approve every piece of content before it reaches you.


Python 3.10+ | License: MIT | FastAPI | Self-Hosted


The Problem

Every developer using AI for content generation faces the same challenges:

| Problem | What Happens | The Cost |
|---|---|---|
| Hallucinations | AI invents facts, dates, or events | Credibility destroyed, corrections needed |
| Quality inconsistency | Sometimes great, sometimes terrible | Manual review of every output |
| No validation | You get content, hope it's good | Time wasted on unusable content |
| Single point of failure | One model, one opinion | Bias and blind spots undetected |
| Format violations | JSON that doesn't match your schema | Parsing errors, retry loops |
| Repetitive vocabulary | Same phrases appearing everywhere | Unprofessional, robotic text |

Traditional solution: Review everything manually or accept the risk.

Gran Sabio LLM solution: Let multiple AI models evaluate every output with configurable quality criteria, automatic retry on failure, and a "Great Sage" arbiter for final decisions.


How It Works

Your Request
     |
     v
[Preflight Validation] --> Detects contradictions before wasting tokens
     |
     v
[Content Generation] --> Your chosen AI model generates content
     |
     v
[Multi-Layer QA] --> Multiple AI models evaluate different aspects
     |                - Historical accuracy
     |                - Literary quality
     |                - Format compliance
     |                - Custom criteria you define
     v
[Consensus Engine] --> Calculates scores across all evaluators
     |
     v
[Pass?] --No--> [Iterate with Feedback] --> Back to generation
     |
    Yes
     |
     v
[Deal Breaker?] --Yes--> [Gran Sabio Escalation] --> Premium model decides
     |
    No
     |
     v
[Approved Content] --> Delivered with confidence scores
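
The same flow, driven over HTTP: a minimal sketch of a request that exercises each stage. The payload mirrors the Quick Start example further down; the response shape is documented in the interactive API docs.

import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "Write a 500-word biography of Marie Curie",
        "generator_model": "gpt-4o",                      # Content Generation
        "qa_models": ["gpt-4o", "claude-sonnet-4"],       # Multi-Layer QA evaluators
        "qa_layers": [
            {
                "name": "Accuracy",
                "criteria": "Verify historical facts",
                "min_score": 8.0,                          # Consensus threshold for this layer
                "deal_breaker_criteria": "invents facts",  # Gran Sabio escalation trigger
            }
        ],
        "min_global_score": 8.0,
        "max_iterations": 3,                               # Iterate with Feedback limit
    },
)
print(resp.json())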

See It In Action

Note: Gran Sabio LLM is fundamentally an API-first tool designed to integrate into your content generation pipelines. The web interface below is a development/demo UI to help visualize and test the API capabilities - not a production-ready application. Think of it as a reference implementation showing what's possible when you build on top of this API.

Web Interface (Demo)

Access the interactive demo at http://localhost:8000/ - configure your generation, select models, define QA layers, and watch results in real-time.

Main Interface: configure prompts, models, QA layers, and quality thresholds from an intuitive web UI.


Live Matrix: Real-Time Generation Monitoring

Click "Live Matrix" to watch the entire process unfold:

  • Content chunks streaming as they're generated
  • QA evaluations appearing for each layer and model
  • Scores updating as consensus is calculated
  • Deal-breaker escalations and Gran Sabio decisions

Live Matrix: watch content generation, QA evaluation, and scoring happen in real time.


200+ Models via OpenRouter

Beyond direct API connections (OpenAI, Anthropic, Google, xAI), you can access all models available on OpenRouter - including Mistral, DeepSeek, LLaMA, Qwen, and many more.

OpenRouter Models: access hundreds of models through the OpenRouter integration.


Session Debugger: Full Transparency

Every generation is logged in detail. Access /debugger to inspect:

  • Complete request payloads and parameters
  • Every iteration with content and scores
  • QA evaluations per layer and model
  • Consensus calculations
  • Gran Sabio escalations and decisions
  • Token usage and costs per phase

Session Debugger: inspect every detail of your generation sessions.


Key Features

Multi-Model Quality Assurance

Define what "quality" means for YOUR use case:

{
  "qa_layers": [
    {
      "name": "Factual Accuracy",
      "criteria": "Verify all dates, names, and events are historically correct",
      "min_score": 8.5,
      "deal_breaker_criteria": "invents facts or presents false information"
    },
    {
      "name": "Narrative Flow",
      "criteria": "Evaluate prose quality, transitions, and reader engagement",
      "min_score": 7.5
    }
  ],
  "qa_models": ["gpt-4o", "claude-sonnet-4", "gemini-2.0-flash"]
}

Each layer is evaluated by ALL configured QA models. If GPT-4o passes but Claude finds an issue, you'll know. Consensus is calculated automatically.
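
As an illustration of the consensus step (not the engine's exact aggregation, which may weight models differently), averaging each layer's scores across evaluators and comparing against that layer's min_score looks roughly like this:

# Illustrative only: average each layer's scores across QA models and compare
# against the layer's min_score.
layer_scores = {
    "Factual Accuracy": {"gpt-4o": 9.0, "claude-sonnet-4": 8.0, "gemini-2.0-flash": 8.5},
    "Narrative Flow":   {"gpt-4o": 7.0, "claude-sonnet-4": 7.0, "gemini-2.0-flash": 7.5},
}
min_scores = {"Factual Accuracy": 8.5, "Narrative Flow": 7.5}

for layer, scores in layer_scores.items():
    consensus = sum(scores.values()) / len(scores)
    verdict = "pass" if consensus >= min_scores[layer] else "fail"
    print(f"{layer}: consensus={consensus:.2f} -> {verdict}")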


Deal Breakers: Stop Problems Immediately

Some issues are too serious to just lower the score:

  • Majority deal-breaker (>50% of models): Forces immediate regeneration
  • Minority deal-breaker (<50%): Escalates to Gran Sabio for arbitration
  • Tie (50%): Gran Sabio decides if it's a real issue or false positive

Why this matters: You define what's unacceptable. "Invented facts" can be a deal-breaker while "slightly awkward phrasing" just lowers the score.

{
  "deal_breaker_criteria": "uses offensive language or invents historical events"
}
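
A minimal sketch of the routing rules above, assuming one boolean deal-breaker vote per QA model; the engine's internal logic tracks more state than this:

def route_deal_breaker(flags):
    """Illustrative routing of deal-breaker votes (not the engine's exact code).

    flags maps each QA model name to whether it raised the deal-breaker.
    """
    total = len(flags)
    raised = sum(flags.values())
    if raised == 0:
        return "continue"                 # nobody objected
    if raised * 2 > total:
        return "regenerate"               # majority (>50%): force regeneration
    return "escalate_to_gran_sabio"       # minority or tie: Gran Sabio arbitrates

print(route_deal_breaker({"gpt-4o": True, "claude-sonnet-4": False, "gemini-2.0-flash": False}))
# -> escalate_to_gran_sabio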

Gran Sabio: The Final Arbiter

When evaluators disagree or max iterations are reached, the "Great Sage" steps in:

  • Uses premium reasoning models (Claude Opus 4.5 with 30K thinking tokens by default)
  • Analyzes the conflict: Was it a real issue or false positive?
  • Can modify content: Fixes minor issues without full regeneration
  • Tracks model reliability: Learns which models produce more false positives
  • Flexible model choice: Use GPT-5.2-Pro for maximum accuracy or Claude Opus 4.5 for deep reasoning
{
  "gran_sabio_model": "claude-opus-4-5-20251101",
  "gran_sabio_call_limit_per_session": 15
}

Or use OpenAI's most powerful model:

{
  "gran_sabio_model": "gpt-5.2-pro"
}

Preflight Validation: Don't Waste Tokens

Before spending money on generation, the system checks if your request makes sense:

Request: "Write a fiction story about dragons"
QA Layer: "Verify historical accuracy of all events"

Preflight Response:
{
  "decision": "reject",
  "issues": [{
    "code": "contradiction_detected",
    "severity": "critical",
    "message": "Fiction content cannot be validated for historical accuracy"
  }]
}

No tokens wasted on impossible requests.


Word Count Enforcement

AI models are notoriously bad at hitting word targets. Gran Sabio LLM solves this:

{
  "min_words": 800,
  "max_words": 1200,
  "word_count_enforcement": {
    "enabled": true,
    "flexibility_percent": 15,
    "direction": "both",
    "severity": "deal_breaker"
  }
}

The system automatically injects a QA layer that counts words and triggers regeneration if the target isn't met.
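
As a rough sketch of how the flexibility window could widen the accepted range (the exact rounding is the engine's, and the "below"/"above" direction values are assumptions):

def word_count_bounds(min_words, max_words, flexibility_percent, direction="both"):
    """Illustrative widening of the accepted word-count window."""
    lo, hi = min_words, max_words
    if direction in ("both", "below"):
        lo = round(min_words * (1 - flexibility_percent / 100))
    if direction in ("both", "above"):
        hi = round(max_words * (1 + flexibility_percent / 100))
    return lo, hi

def within_target(text, lo, hi):
    return lo <= len(text.split()) <= hi

print(word_count_bounds(800, 1200, 15))   # -> (680, 1380)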


Lexical Diversity Guard

Detect and prevent repetitive vocabulary:

  • MTLD, HD-D, Yule's K, Herdan's C metrics calculated automatically
  • GREEN/AMBER/RED grading based on configurable thresholds
  • Window analysis finds exactly where repetition clusters appear
  • Top words report shows which words are overused
{
  "lexical_diversity": {
    "enabled": true,
    "metrics": "auto",
    "decision": {
      "deal_breaker_on_red": true,
      "deal_breaker_on_amber": false
    }
  }
}
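
For intuition, here is a simplified forward-only MTLD with a placeholder grading function; the engine computes the full metric set and grades against configurable thresholds:

def mtld_forward(tokens, ttr_threshold=0.72):
    """Simplified, forward-only MTLD for intuition. The engine also computes
    HD-D, Yule's K, and Herdan's C, and uses more robust variants."""
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok.lower())
        if len(types) / count <= ttr_threshold:
            factors += 1
            types, count = set(), 0
    if count:                                    # partial factor for the tail
        factors += (1 - len(types) / count) / (1 - ttr_threshold)
    return len(tokens) / factors if factors else float(len(tokens))

def grade(score, red=40, amber=60):              # placeholder thresholds
    return "RED" if score < red else "AMBER" if score < amber else "GREEN"

text = "the cat sat on the mat and then the cat sat on the mat again"
score = mtld_forward(text.split())
print(round(score, 1), grade(score))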

Phrase Frequency Guard

Block specific phrases or patterns:

{
  "phrase_frequency": {
    "enabled": true,
    "rules": [
      {
        "name": "no_then_went_to",
        "phrase": "then went to",
        "max_repetitions": 1,
        "severity": "deal_breaker"
      },
      {
        "name": "short_phrases",
        "min_length": 3,
        "max_length": 6,
        "max_repetitions": 3,
        "severity": "warn"
      }
    ]
  }
}
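
A minimal sketch of what a single phrase rule checks; the engine's matching options are richer:

import re

def count_phrase(text, phrase):
    """Count case-insensitive occurrences of an exact phrase."""
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text, flags=re.IGNORECASE))

text = "He then went to the market. She then went to the park."
hits = count_phrase(text, "then went to")
if hits > 1:                                     # max_repetitions: 1 in the rule above
    print(f"deal_breaker: 'then went to' appears {hits} times")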

Evidence Grounding (Confabulation Detection)

Detect when AI models claim to use evidence but actually ignore it:

{
  "evidence_grounding": {
    "enabled": true,
    "model": "gpt-4o-mini",
    "budget_gap_threshold": 0.5,
    "on_flag": "deal_breaker",
    "max_flagged_claims": 2
  }
}

How it works:

  1. Extracts verifiable claims from generated content
  2. Measures P(claim | evidence) vs P(claim | no evidence) using logprobs
  3. Flags claims where confidence doesn't drop when evidence is removed

This catches:

  • "According to the sources, Marie Curie was born in Paris" (context says Warsaw)
  • Claims that sound referenced but ignore the actual context

Configuration modes:

| Mode | on_flag | When to use |
|---|---|---|
| Verification-only | "warn" | General content, informational logging |
| Fail-fast | "deal_breaker" | Critical factual content, medical/legal |
| Regenerate | "regenerate" | Auto-fix on detection |

Cost: ~$0.003 per request for 10 claims (2-6% overhead)
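
For intuition only, here is the gap check in miniature: the claim is scored with and without the evidence in context, and a small gap means the evidence wasn't really used. The helper and numbers below are hypothetical; the engine derives the actual log-probabilities from the configured model's logprobs.

def grounding_gap(logprob_with_evidence, logprob_without_evidence):
    """Hypothetical helper: per-claim log-probability scored by a small model
    with and without the evidence in its context."""
    return logprob_with_evidence - logprob_without_evidence

# If removing the evidence barely lowers the model's confidence in the claim,
# the claim probably wasn't grounded in that evidence. Numbers are made up.
gap = grounding_gap(logprob_with_evidence=-0.8, logprob_without_evidence=-1.1)
budget_gap_threshold = 0.5
if gap < budget_gap_threshold:
    print(f"flag: gap {gap:.2f} below threshold; claim may be confabulated")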


Vision/Image Support

Process images alongside text for multimodal content generation:

{
  "prompt": "Describe these product images in detail",
  "generator_model": "gpt-4o",
  "username": "your_username",
  "images": [
    {"upload_id": "img_001", "username": "your_username", "detail": "high"},
    {"upload_id": "img_002", "username": "your_username", "detail": "auto"}
  ],
  "image_detail": "auto",
  "qa_layers": []
}

Supported vision models:

  • OpenAI: GPT-4o, GPT-5, GPT-5 Pro, O1, O3, O3-Pro
  • Anthropic: Claude Sonnet 4, Claude Opus 4.5, Haiku 4
  • Google: Gemini 2.0 Flash, Gemini 2.5 Pro/Flash
  • xAI: Grok 2+ models

Detail levels (OpenAI-style):

| Level | Tokens | Use Case |
|---|---|---|
| low | ~85 fixed | Quick classification, thumbnails |
| high | Variable (tiles) | Detailed analysis, text extraction |
| auto | Model decides | General use (default) |

QA with vision - Let QA models see input images for accuracy validation:

{
  "prompt": "Describe this architectural diagram",
  "generator_model": "claude-sonnet-4-20250514",
  "images": [{"upload_id": "diagram", "username": "user1"}],
  "qa_with_vision": true,
  "qa_layers": [
    {
      "name": "Visual Accuracy",
      "criteria": "Verify description matches the actual diagram elements",
      "include_input_images": true,
      "min_score": 8.5
    },
    {
      "name": "Writing Quality",
      "criteria": "Evaluate clarity and technical accuracy",
      "include_input_images": false,
      "min_score": 7.5
    }
  ]
}

Limits and auto-processing:

  • Default: 20 images per request (configurable up to 100)
  • Auto-resize to optimal dimensions per provider
  • Automatic format conversion (HEIC/HEIF to JPEG)
  • Preflight validation rejects requests when model lacks vision capability

JSON Schema Structured Outputs

100% format guarantee across all major providers:

{
  "generator_model": "gpt-5",
  "json_output": true,
  "json_schema": {
    "type": "object",
    "properties": {
      "title": {"type": "string"},
      "summary": {"type": "string"},
      "key_points": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["title", "summary"]
  }
}

Supported providers:

  • OpenAI: GPT-4o, GPT-5, GPT-5.2-Pro, O1/O3 series
  • Anthropic: Claude 4 Sonnet, Claude Opus 4.5
  • Google: Gemini 2.0+, Gemini 2.5
  • xAI: Grok 4
  • OpenRouter: All compatible models (Mistral, DeepSeek, LLaMA, Qwen, and 200+ more)
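
On the client side you can double-check the approved content against the same schema, for example with the jsonschema package (a usage sketch, not part of the engine):

import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "summary"],
}

content = json.loads('{"title": "Gran Sabio LLM", "summary": "Multi-model QA", "key_points": []}')
try:
    validate(instance=content, schema=schema)
    print("content matches schema")
except ValidationError as exc:
    print("schema violation:", exc.message)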

Reasoning Models Support

For complex tasks, enable deep thinking:

OpenAI Reasoning:

{
  "generator_model": "gpt-5",
  "reasoning_effort": "high"
}

Claude Thinking Mode:

{
  "generator_model": "claude-sonnet-4-20250514",
  "thinking_budget_tokens": 8000
}

Both work for QA evaluation too - your evaluators can "think" before scoring.


QA Bypass for Rapid Prototyping

Need fast generation without QA? Just send empty layers:

{
  "prompt": "Write a quick draft",
  "qa_layers": []
}

Content is approved immediately. Perfect for testing, bulk generation, or content that will be manually edited.


Quick Start

1. Clone and Install

git clone https://github.com/jordicor/Gran_Sabio_LLM.git
cd Gran_Sabio_LLM
python quick_start.py

2. Configure Your API Keys

Create a .env file with your own API keys from each provider:

# Get your keys from each provider's dashboard:
# - OpenAI: https://platform.openai.com/api-keys
# - Anthropic: https://console.anthropic.com/
# - Google: https://aistudio.google.com/apikey
# - xAI: https://console.x.ai/
# - OpenRouter: https://openrouter.ai/keys

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
XAI_API_KEY=xai-...
OPENROUTER_API_KEY=sk-or-...
PEPPER=any-random-string-here

Note: You only need keys for the providers you want to use. At minimum, configure one provider.

3. Start the Server

python main.py

Server starts at http://localhost:8000

4. Generate Your First Content

curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a 500-word biography of Marie Curie",
    "content_type": "biography",
    "generator_model": "gpt-4o",
    "qa_models": ["gpt-4o", "claude-sonnet-4"],
    "qa_layers": [
      {
        "name": "Accuracy",
        "criteria": "Verify historical facts",
        "min_score": 8.0,
        "deal_breaker_criteria": "invents facts"
      }
    ],
    "min_global_score": 8.0,
    "max_iterations": 3
  }'

Demos & Examples

The demos/ folder contains 11 ready-to-run scripts showcasing different capabilities. Here are the highlights:

| Demo | Description | Complexity |
|---|---|---|
| YouTube Script Generator | Multi-phase pipeline: topic analysis, script, scenes, thumbnails. Uses JSON Schema, lexical diversity, and project grouping. | Advanced |
| Code Analyzer | Dynamic JSON output for code review. Detects security issues, performance problems. Shows when to use flexible JSON vs strict schemas. | Advanced |
| Reasoning Models | GPT-5 reasoning effort, Claude thinking mode. Complex analysis with deep thinking. | Advanced |
| JSON Structured Output | 100% format guarantee with json_schema. Multi-provider support. | Intermediate |
| Text Quality Analyzer | Analyze existing text without generating. Lexical diversity, AI pattern detection. | Intermediate |
| Parallel Generation | Bulk content creation with async parallel execution. | Advanced |

Quick start:

# Start the API server
python main.py

# Run any demo
python demos/03_youtube_script_generator.py --topic "How AI is Changing Music"

See the complete list and documentation: demos/README.md


API Documentation

Full interactive documentation available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
  • Custom Docs: http://localhost:8000/api-docs (recommended)

Core Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /generate | POST | Start content generation with QA |
| /status/{session_id} | GET | Check session status |
| /stream/project/{project_id} | GET | Real-time SSE progress stream (project_id = session_id when not explicit) |
| /result/{session_id} | GET | Get final approved content |
| /stop/{session_id} | POST | Cancel active generation |
| /models | GET | List available AI models |
| /debugger | GET | Session history and inspection UI |
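
A minimal polling loop against the endpoints above. The response field names ("session_id", "status") and status values are assumptions here; check /api-docs for the exact schema.

import time
import requests

BASE = "http://localhost:8000"

# Start a generation (payload shape as in the Quick Start example).
payload = {"prompt": "Write a short product description", "generator_model": "gpt-4o", "qa_layers": []}
session = requests.post(f"{BASE}/generate", json=payload).json()
session_id = session["session_id"]                         # assumed field name

# Poll the status endpoint, then fetch the approved content.
while True:
    status = requests.get(f"{BASE}/status/{session_id}").json()
    if status.get("status") in ("completed", "failed"):    # assumed status values
        break
    time.sleep(2)

print(requests.get(f"{BASE}/result/{session_id}").json())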

Analysis Endpoints

Standalone text analysis tools (no generation required):

| Endpoint | Method | Description |
|---|---|---|
| /analysis/lexical-diversity | POST | Vocabulary richness metrics (MTLD, HD-D, etc.) |
| /analysis/repetition | POST | N-gram repetition analysis with clustering |
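
For example, a standalone lexical-diversity call might look like this; the payload shape is an assumption, so see /api-docs for the exact schema.

import requests

resp = requests.post(
    "http://localhost:8000/analysis/lexical-diversity",
    json={"text": "Paste the draft you want to analyze here."},   # assumed payload shape
)
print(resp.json())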

Project Management

Group multiple sessions under a single project ID:

| Endpoint | Method | Description |
|---|---|---|
| /project/new | POST | Reserve a new project ID |
| /project/start/{id} | POST | Activate a project |
| /project/stop/{id} | POST | Cancel all project sessions |
| /stream/project/{id} | GET | Stream all project events |
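
A sketch of the project lifecycle using these endpoints; the response fields and how generation sessions attach to a project are assumptions, so consult /api-docs for the real schema.

import requests

BASE = "http://localhost:8000"

project = requests.post(f"{BASE}/project/new").json()
project_id = project["project_id"]               # assumed field name
requests.post(f"{BASE}/project/start/{project_id}")

# Stream every event for the project's sessions (SSE over plain HTTP).
with requests.get(f"{BASE}/stream/project/{project_id}", stream=True) as stream:
    for line in stream.iter_lines():
        if line:
            print(line.decode())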

Python Client SDK

A ready-to-use Python client is available for easy integration:

from gransabio_client import GranSabioClient

client = GranSabioClient("http://localhost:8000")

# Simple generation
result = client.generate(
    prompt="Write a product description",
    generator_model="gpt-4o",
    qa_layers=[{"name": "Quality", "criteria": "...", "min_score": 8.0}]
)

print(result.content)
print(f"Score: {result.final_score}")

Stream progress:

for event in client.stream_generate(prompt="...", qa_layers=[...]):
    print(f"[{event.phase}] {event.message}")

MCP Integration (Claude Code, Gemini CLI, Codex CLI)

Gran Sabio LLM includes a Model Context Protocol (MCP) server that integrates directly with AI coding assistants. Get multi-model code review and analysis without leaving your terminal.

What You Get

| Tool | Description |
|---|---|
| gransabio_analyze_code | Analyze code for bugs, security issues, and best practices |
| gransabio_review_fix | Validate a proposed fix before applying it |
| gransabio_generate_with_qa | Generate content with multi-model QA |
| gransabio_check_health | Verify Gran Sabio LLM API connectivity |
| gransabio_list_models | List available AI models |

Quick Setup

1. Install MCP dependencies:

pip install -r mcp/requirements.txt

2. Run the installer script:

Windows:

install_mcp.bat

Linux/macOS:

./install_mcp.sh

The scripts automatically detect paths and register the MCP server with Claude Code.

Manual installation (if you prefer):

# Use absolute paths - relative paths won't work!
claude mcp add gransabio-llm -- python /path/to/Gran_Sabio_LLM/mcp_server/gransabio_mcp_server.py

Gemini CLI (~/.gemini/settings.json):

{
  "mcpServers": {
    "gransabio-llm": {
      "command": "python",
      "args": ["/path/to/Gran_Sabio_LLM/mcp_server/gransabio_mcp_server.py"]
    }
  }
}

Codex CLI (~/.codex/config.toml):

[mcp_servers.gransabio-llm]
command = "python"
args = ["/path/to/Gran_Sabio_LLM/mcp_server/gransabio_mcp_server.py"]

Example Usage

You: Analyze this code for security issues using Gran Sabio

Claude: [Calls gransabio_analyze_code]

Gran Sabio Analysis (Score: 8.2/10):
- [CRITICAL] SQL injection at line 45
- [HIGH] Hardcoded credentials at line 12
- [MEDIUM] Missing input validation at line 30

Reviewed by: GPT-5-Codex, Claude Opus 4.5, GLM-4.7
Consensus: 3/3 models agree

Remote/SaaS Configuration

For hosted Gran Sabio LLM instances:

claude mcp add gransabio-llm \
  --env GRANSABIO_API_URL=https://api.gransabio.example.com \
  --env GRANSABIO_API_KEY=your-api-key \
  -- python /path/to/gransabio_mcp_server.py

See full documentation: mcp/README.md


Self-Hosting (Bring Your Own API Keys)

Gran Sabio LLM is currently a self-hosted solution. You deploy it on your infrastructure and use your own API keys from each AI provider.

What This Means

| Aspect | Self-Hosted |
|---|---|
| API Keys | You obtain and configure keys from OpenAI, Anthropic, Google, xAI, and/or OpenRouter |
| Billing | Each provider bills you directly based on your usage |
| Infrastructure | You host and maintain the server |
| Data Privacy | Your prompts and content stay on your infrastructure |
| Models Available | All models your API keys have access to, plus 200+ via OpenRouter |

Why Self-Hosting?

  • Full control over your data and costs
  • No intermediaries - direct connection to AI providers
  • Use your existing accounts - no new subscriptions needed
  • Enterprise compliance - deploy in your own cloud/datacenter
  • Unlimited usage - no rate limits beyond provider limits

Requirements

  • Python 3.10+
  • API keys for at least one provider (OpenAI, Anthropic, Google, xAI, or OpenRouter)
  • ~500MB disk space for dependencies
  • Recommended: 4GB RAM minimum
  • Pillow library (auto-installed, required for vision/image processing)

Quality assurance: The codebase includes 950+ automated tests covering API endpoints, engines, client SDK, and integrations.

Production Deployment

# With uvicorn
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

# With gunicorn + uvicorn workers
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

Coming Soon: Gran Sabio LLM Cloud

Don't want to manage API keys and infrastructure? A hosted version is in development.

What's Coming

| Feature | Cloud Version |
|---|---|
| API Keys | None required - we handle all provider connections |
| Setup | Zero - just sign up and start making API calls |
| Billing | Single subscription covers all AI providers |
| Models | All supported models, always up to date |
| Features | Everything in self-hosted, fully managed |
| Web Interface | Polished, production-ready UI for non-developers |

Get Early Access

Want to be notified when the Cloud version launches? Star this repo and follow me on GitHub or my social media channels - I'll announce early access there first.

Self-hosting will always remain available for those who prefer full control.


Cost Tracking

Every request can include cost breakdown:

{
  "show_query_costs": 2,
  "prompt": "..."
}

Returns detailed token usage and costs:

{
  "content": "Generated content...",
  "costs": {
    "grand_totals": {
      "input_tokens": 4370,
      "output_tokens": 2156,
      "cost": 0.018765
    },
    "phases": {
      "generation": {"cost": 0.008234},
      "qa": {"cost": 0.003456},
      "gran_sabio": {"cost": 0.005678}
    }
  }
}
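
Reading the breakdown back is straightforward; for example, printing the per-phase costs from the response shown above:

response = {
    "costs": {
        "grand_totals": {"input_tokens": 4370, "output_tokens": 2156, "cost": 0.018765},
        "phases": {
            "generation": {"cost": 0.008234},
            "qa": {"cost": 0.003456},
            "gran_sabio": {"cost": 0.005678},
        },
    }
}

for phase, data in response["costs"]["phases"].items():
    print(f"{phase}: ${data['cost']:.6f}")
print(f"total: ${response['costs']['grand_totals']['cost']:.6f}")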

About the Name

From BioAI to Gran Sabio

This project was originally called BioAI Unified - "Bio" for biography (its first use case was validating AI-generated biographies) and "Unified" because it brought together multiple AI providers into a single, coherent QA system.

However, "BioAI" consistently caused confusion. People assumed this was a biomedical or bioinformatics tool, expecting features for DNA analysis or drug discovery. The name created friction before the tool could even be evaluated.

Why "Gran Sabio LLM"?

The new name directly reflects what makes this engine unique:

"Gran Sabio" (Spanish for "Great Sage") is not just a brand - it's a core architectural component. When multiple AI models disagree during quality evaluation, a premium reasoning model called the Gran Sabio (the wise arbiter) steps in to make the final decision. This concept of a "council of sages" deliberating on content quality is central to how the system works.

"LLM" (Large Language Model) clarifies that this is AI infrastructure for text generation - not a fantasy game, not biomedicine, but a practical tool for orchestrating language models.

The result: a name that immediately tells you what you're getting - an AI content pipeline with a wise, multi-model arbitration system at its heart.

Previous name: BioAI Unified (2024). Rebranded to Gran Sabio LLM in January 2025.


Stay Updated

This project is actively developed. If you find it useful:

  • Star this repo to follow updates and new features
  • Follow me on social media for development insights, AI tips, and early announcements about the upcoming Cloud version

Find my social links on my GitHub profile.


License

MIT License - see LICENSE for details.


Trust your AI output.
Let multiple models validate before you ship.
