yihune21/genAPI

Verified Generation API

A FastAPI implementation of the Verified Generation API, providing capabilities for verifying and inspecting language model outputs with deterministic generation (temperature=0) and detailed token-level information.

Features

  • Verified Text Completion (/v1/completions/verified): Generate text completions with enforced temperature=0 and detailed token information
  • Verified Chat Completion (/v1/chat/completions/verified): Generate chat completions with deterministic output and token details
  • Verify Decoding (/v1/verify_decoding): Verify if a given completion could have been generated by greedy decoding

Installation

  1. Install dependencies:
pip install -r requirements.txt

Running the API

Start the server:

python main.py

The API will be available at http://localhost:8000

API Documentation

Interactive API documentation is available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Testing

Run the test suite:

python test_api.py

API Endpoints

1. Verified Text Completion

  • POST /v1/completions/verified
  • Generates a text completion with temperature=0 and token details
  • Returns token IDs, log probabilities, and ranks
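A response from this endpoint carries per-token detail alongside the generated text. The field names below are illustrative, inferred from the description above rather than taken from the actual schema in models.py:

```python
# Illustrative shape of a verified-completion response; field names are
# assumptions based on the endpoint description, not the actual schema.
example_response = {
    "text": " Bonjour",
    "temperature": 0.0,
    "tokens": [
        {"token_id": 12847, "logprob": -0.02, "rank": 1},
        {"token_id": 13, "logprob": -0.10, "rank": 1},
    ],
}

# With temperature=0 (greedy decoding), every emitted token should be
# the model's top-ranked (rank 1) choice at its position.
assert all(t["rank"] == 1 for t in example_response["tokens"])
```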

2. Verified Chat Completion

  • POST /v1/chat/completions/verified
  • Generates a chat completion with deterministic output
  • Provides detailed token information for both prompt and completion

3. Verify Decoding

  • POST /v1/verify_decoding
  • Verifies whether a completion follows greedy decoding
  • Checks each token against the model's greedy choice
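The per-token check can be sketched as follows. This is a minimal illustration of the idea, not the actual implementation in endpoints.py; the function name and signature are hypothetical:

```python
def verify_greedy(completion_ids, greedy_choices):
    """Check that each completion token matches the model's greedy
    (argmax) choice at that position.

    completion_ids: token IDs of the claimed completion.
    greedy_choices: token ID the model would pick greedily at each step.
    Returns (is_greedy, index_of_first_mismatch_or_None).
    """
    for i, (actual, greedy) in enumerate(zip(completion_ids, greedy_choices)):
        if actual != greedy:
            return False, i
    return True, None

# A completion that matches the greedy choices at every step verifies:
assert verify_greedy([5, 9, 2], [5, 9, 2]) == (True, None)
# One that diverges reports the first mismatching position:
assert verify_greedy([5, 7, 2], [5, 9, 2]) == (False, 1)
```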

Example Usage

Verified Completion

import asyncio

import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/v1/completions/verified",
            json={
                "model": "your-model",
                "prompt": "Translate 'hello' to French:",
                "max_tokens": 10
            }
        )
        data = response.json()
        print(data)

asyncio.run(main())

Verify Decoding

import asyncio

import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/v1/verify_decoding",
            json={
                "model": "your-model",
                "prompt": "The quick brown fox",
                "completion": "jumps over the lazy dog",
                "check_greedy": True
            }
        )
        print(response.json())

asyncio.run(main())

Architecture

  • models.py: Pydantic models for request/response schemas
  • endpoints.py: API endpoint implementations
  • mock_llm.py: Mock LLM engine for demonstration (replace with actual vLLM integration)
  • main.py: FastAPI application setup and configuration

Production Considerations

In production, replace the MockLLMEngine in mock_llm.py with an actual vLLM (or other LLM backend) integration. The mock engine is provided for demonstration and testing purposes only.
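One way to keep that swap clean is to put the engine behind a small interface that both the mock and a real backend implement. The class and method names below are a hypothetical sketch, not the repository's actual API (only MockLLMEngine in mock_llm.py is):

```python
from abc import ABC, abstractmethod

class LLMEngine(ABC):
    """Hypothetical engine interface; the mock and a real vLLM-backed
    engine would both implement it, so endpoints never import a
    concrete backend directly."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int) -> dict:
        """Return the completion text plus per-token details."""

class EchoEngine(LLMEngine):
    """Trivial stand-in backend for demonstration; a vLLM-backed
    implementation would replace this class in production."""

    def generate(self, prompt: str, max_tokens: int) -> dict:
        # Echo a truncated prompt instead of sampling from a model.
        return {"text": prompt[:max_tokens], "tokens": []}

engine: LLMEngine = EchoEngine()
result = engine.generate("hello world", max_tokens=5)
assert result["text"] == "hello"
```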
