Convert FastAPI/OpenAPI openapi.json into JSONL chunks (id, text, metadata) optimized for RAG pipelines (embeddings + hybrid search).
This enhanced version provides:
- Parametric configuration via
ConverterConfig(token vs char chunking, chunk sizes, overlap, expand refs, emit raw schemas) - FastAPI router with query-parameter overrides
- CLI with flags, unit tests, mypy + ruff CI workflow
- Optional tokenization using
tiktokenwhen available (graceful fallback otherwise)
Install dev extras for tokenization & dev tools:
pip install .[dev]CLI:
openapi-rag example_openapi.json -o api.jsonl --expand-refs --emit-raw-schemasFastAPI router:
from fastapi import FastAPI
from openapi_rag.fastapi_integration import router as rag_router
app = FastAPI()
app.include_router(rag_router)
# visit /openapi-rag.jsonlRun tests and linters:
pip install .[dev]
ruff check .
mypy openapi_rag
pytest -q