yihune21/genAPI

Verified Generation API

A FastAPI implementation of the Verified Generation API, providing capabilities for verifying and inspecting language model outputs with deterministic generation (temperature=0) and detailed token-level information.

Features

  • Verified Text Completion (/v1/completions/verified): Generate text completions with enforced temperature=0 and detailed token information
  • Verified Chat Completion (/v1/chat/completions/verified): Generate chat completions with deterministic output and token details
  • Verify Decoding (/v1/verify_decoding): Verify if a given completion could have been generated by greedy decoding

Installation

  1. Install dependencies:
pip install -r requirements.txt

Running the API

Start the server:

python main.py

The API will be available at http://localhost:8000

API Documentation

Interactive API documentation is available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Testing

Run the test suite:

python test_api.py

API Endpoints

1. Verified Text Completion

  • POST /v1/completions/verified
  • Generates a text completion with temperature=0 and token details
  • Returns token IDs, log probabilities, and ranks
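A response from this endpoint carries per-token detail alongside the generated text. The field names below are illustrative, inferred from the description above rather than taken from the actual schema in models.py:

```python
# Illustrative shape of a verified-completion response; field names are
# assumptions based on the endpoint description, not the actual schema.
example_response = {
    "text": " Bonjour",
    "temperature": 0.0,
    "tokens": [
        {"token_id": 12847, "logprob": -0.02, "rank": 1},
        {"token_id": 13, "logprob": -0.10, "rank": 1},
    ],
}

# With temperature=0 (greedy decoding), every emitted token should be
# the model's top-ranked (rank 1) choice at its position.
assert all(t["rank"] == 1 for t in example_response["tokens"])
```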

2. Verified Chat Completion

  • POST /v1/chat/completions/verified
  • Generates a chat completion with deterministic output
  • Provides detailed token information for both prompt and completion

3. Verify Decoding

  • POST /v1/verify_decoding
  • Verifies whether a completion follows greedy decoding
  • Checks each token against the model's greedy choice
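The per-token check can be sketched as follows. This is a minimal illustration of the idea, not the actual implementation in endpoints.py; the function name and signature are hypothetical:

```python
def verify_greedy(completion_ids, greedy_choices):
    """Check that each completion token matches the model's greedy
    (argmax) choice at that position.

    completion_ids: token IDs of the claimed completion.
    greedy_choices: token ID the model would pick greedily at each step.
    Returns (is_greedy, index_of_first_mismatch_or_None).
    """
    for i, (actual, greedy) in enumerate(zip(completion_ids, greedy_choices)):
        if actual != greedy:
            return False, i
    return True, None

# A completion that matches the greedy choices at every step verifies:
assert verify_greedy([5, 9, 2], [5, 9, 2]) == (True, None)
# One that diverges reports the first mismatching position:
assert verify_greedy([5, 7, 2], [5, 9, 2]) == (False, 1)
```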

Example Usage

Verified Completion

import asyncio

import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/v1/completions/verified",
            json={
                "model": "your-model",
                "prompt": "Translate 'hello' to French:",
                "max_tokens": 10
            }
        )
        data = response.json()
        print(data)

asyncio.run(main())

Verify Decoding

import asyncio

import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/v1/verify_decoding",
            json={
                "model": "your-model",
                "prompt": "The quick brown fox",
                "completion": "jumps over the lazy dog",
                "check_greedy": True
            }
        )
        print(response.json())

asyncio.run(main())

Architecture

  • models.py: Pydantic models for request/response schemas
  • endpoints.py: API endpoint implementations
  • mock_llm.py: Mock LLM engine for demonstration (replace with actual vLLM integration)
  • main.py: FastAPI application setup and configuration

Production Considerations

In production, replace the MockLLMEngine in mock_llm.py with an actual vLLM (or other LLM backend) integration. The mock engine is provided for demonstration and testing purposes only.
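One way to keep that swap clean is to put the engine behind a small interface that both the mock and a real backend implement. The class and method names below are a hypothetical sketch, not the repository's actual API (only MockLLMEngine in mock_llm.py is):

```python
from abc import ABC, abstractmethod

class LLMEngine(ABC):
    """Hypothetical engine interface; the mock and a real vLLM-backed
    engine would both implement it, so endpoints never import a
    concrete backend directly."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int) -> dict:
        """Return the completion text plus per-token details."""

class EchoEngine(LLMEngine):
    """Trivial stand-in backend for demonstration; a vLLM-backed
    implementation would replace this class in production."""

    def generate(self, prompt: str, max_tokens: int) -> dict:
        # Echo a truncated prompt instead of sampling from a model.
        return {"text": prompt[:max_tokens], "tokens": []}

engine: LLMEngine = EchoEngine()
result = engine.generate("hello world", max_tokens=5)
assert result["text"] == "hello"
```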
