A FastAPI implementation of the Verified Generation API, providing endpoints for verifying and inspecting language model outputs with deterministic generation (temperature=0) and detailed token-level information.
- Verified Text Completion (`/v1/completions/verified`): Generate text completions with enforced temperature=0 and detailed token information (illustrated below)
- Verified Chat Completion (`/v1/chat/completions/verified`): Generate chat completions with deterministic output and token details
- Verify Decoding (`/v1/verify_decoding`): Verify whether a given completion could have been generated by greedy decoding
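To give a sense of the token-level detail, a verified completion response might look roughly like the sketch below. The field names are illustrative assumptions; the actual schemas live in `models.py`.

```python
# Hypothetical response shape -- the real schema is defined in models.py.
verified_completion = {
    "text": " Bonjour",
    "tokens": [
        {
            "token_id": 12345,   # ID of the generated token (made-up value)
            "logprob": -0.12,    # log probability assigned by the model
            "rank": 1,           # rank 1 means it was the greedy (argmax) choice
        },
    ],
}
```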
- Install dependencies: `pip install -r requirements.txt`
- Start the server: `python main.py`

The API will be available at http://localhost:8000.
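Once started, a quick way to confirm the server is reachable (assumes the default host and port above):

```python
# Expect HTTP 200 from the Swagger UI route if the server is running.
import httpx

print(httpx.get("http://localhost:8000/docs").status_code)
```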
Interactive API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Run the test suite:

```bash
python test_api.py
```

The API exposes three endpoints:

- POST `/v1/completions/verified`
  - Generates a text completion with temperature=0 and token details
  - Returns token IDs, log probabilities, and ranks
- POST `/v1/chat/completions/verified`
  - Generates a chat completion with deterministic output
  - Provides detailed token information for both prompt and completion
- POST `/v1/verify_decoding`
  - Verifies whether a completion follows greedy decoding
  - Checks each token against the model's greedy choice (see the sketch below)
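Conceptually, the greedy check passes only if every completion token was the model's top-ranked choice given its prefix. A minimal sketch of that logic, assuming token details shaped like the example earlier (the function names are hypothetical, not the actual implementation in `endpoints.py`):

```python
# Hypothetical sketch: a completion is greedy-decodable iff every token
# was the model's argmax (rank 1) choice at its position.
def is_greedy_completion(token_details: list[dict]) -> bool:
    return all(detail["rank"] == 1 for detail in token_details)

# Index of the first token that deviates from the greedy choice, if any.
def first_divergence(token_details: list[dict]) -> int | None:
    for i, detail in enumerate(token_details):
        if detail["rank"] != 1:
            return i
    return None
```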
Request a verified completion (wrapped in a small script so it runs as-is):

```python
import asyncio

import httpx

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/v1/completions/verified",
            json={
                "model": "your-model",
                "prompt": "Translate 'hello' to French:",
                "max_tokens": 10
            }
        )
        data = response.json()
        print(data)

asyncio.run(main())
```

Verify whether a completion could have been produced by greedy decoding (run inside the same `async with` block as above):

```python
response = await client.post(
    "http://localhost:8000/v1/verify_decoding",
"http://localhost:8000/v1/verify_decoding",
json={
"model": "your-model",
"prompt": "The quick brown fox",
"completion": "jumps over the lazy dog",
"check_greedy": True
}
)models.py: Pydantic models for request/response schemasendpoints.py: API endpoint implementationsmock_llm.py: Mock LLM engine for demonstration (replace with actual vLLM integration)main.py: FastAPI application setup and configuration
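For orientation, the request schema for `/v1/verify_decoding` in `models.py` might look roughly like this; the class name is a hypothetical stand-in, though the fields match the usage example above.

```python
# Hypothetical sketch -- see models.py for the actual schema definitions.
from pydantic import BaseModel

class VerifyDecodingRequest(BaseModel):
    model: str
    prompt: str
    completion: str
    check_greedy: bool = True
```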
In production, replace the `MockLLMEngine` in `mock_llm.py` with an actual vLLM (or other LLM backend) integration; a rough sketch of such a replacement follows. The mock engine is provided for demonstration and testing purposes only.
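The engine's exact interface isn't shown in this README, so the class and method names below are assumptions; the vLLM calls follow its offline `LLM` API.

```python
# Hypothetical drop-in for MockLLMEngine; the generate() signature is assumed,
# not taken from mock_llm.py.
from vllm import LLM, SamplingParams

class VLLMEngine:
    def __init__(self, model_name: str):
        self.llm = LLM(model=model_name)

    def generate(self, prompt: str, max_tokens: int) -> dict:
        # temperature=0 forces greedy decoding, matching the API's guarantee;
        # logprobs=1 asks vLLM to record the chosen token's log probability.
        params = SamplingParams(temperature=0, max_tokens=max_tokens, logprobs=1)
        output = self.llm.generate([prompt], params)[0].outputs[0]
        return {"text": output.text, "token_ids": list(output.token_ids)}
```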