Universal, library-style agent that plans what to fetch, fetches context from pluggable providers, and synthesizes an output.
Pipeline: PLAN → FETCH → (ASSESS/REFETCH)* → SYNTH → VERIFY → (REFINE)* → SAVE
fetchgraph is a library-style LLM agent orchestrator.
You bring:
- your LLM (OpenAI, local, whatever),
- your data providers (DBs, APIs, files),
and fetchgraph handles:
- planning what context to fetch,
- calling providers with JSON selectors,
- packing context into the prompt,
- verifying / refining the result.

Key features:

- JSON-only selectors with JSON Schema hints for planners
- Pluggable context providers (APIs, relational sources, etc.)
- Relational providers with semantic clauses
- CSV semantic backend (TF-IDF) for pandas providers
- pgvector / LangChain vector store integration
- Library-style API: no framework lock-in
```bash
pip install fetchgraph
```

Providers receive a `selectors` argument that must be JSON-serializable. The
shared alias `SelectorsDict` (see `fetchgraph/json_types.py`) represents
`Dict[str, JSONValue]` and is used across protocols and models. The planner/LLM
produces this structure, so do not place runtime-only Python objects (e.g.
connections, DataFrames) into `selectors`; pass such hints through `**kwargs`
instead. Providers can publish the expected shape via `ProviderInfo.selectors_schema`
(a JSON Schema) and optional `examples` containing stringified JSON payloads.
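A minimal sketch of that split, with a hypothetical `OrdersProvider` standing
in for a real implementation:

```python
import pandas as pd

class OrdersProvider:  # hypothetical ContextProvider implementation
    name = "orders"

    def fetch(self, feature_name, selectors=None, **kwargs):
        sql = (selectors or {}).get("sql", "select * from orders")  # JSON-only plan input
        frame = kwargs.get("frame")  # runtime-only hint, never part of the plan
        rows = 0 if frame is None else len(frame)
        return {"content": f"{sql!r} against {rows} rows"}

provider = OrdersProvider()
provider.fetch(
    "FeatureX",
    selectors={"op": "query", "sql": "select 1"},  # JSON-serializable: plannable
    frame=pd.DataFrame({"id": [1, 2]}),            # live object: passed via **kwargs
)
```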
Relational providers require `selectors` to include a string field `"op"` that
chooses the operation (e.g., `"schema"`, `"semantic_only"`, `"query"`). The
complete set of supported shapes is described by the schema returned from
`RelationalDataProvider.describe()`.
```python
from fetchgraph import (
    BaseGraphAgent, ContextPacker, BaselineSpec, ContextFetchSpec,
    TaskProfile, RawLLMOutput,
)
from fetchgraph.core import make_llm_plan_generic, make_llm_synth_generic

# Define providers (implement the ContextProvider protocol).
class SpecProvider:
    name = "spec"

    def fetch(self, feature_name, selectors=None, **kw):
        return {"content": f"Spec for {feature_name}"}

    def serialize(self, obj):
        return obj.get("content", "") if isinstance(obj, dict) else str(obj)

# A stub LLM: answers the planner and synthesizer prompts with canned output.
def dummy_llm(prompt: str, sender: str) -> str:
    if sender == "generic_plan":
        return '{"required_context":["spec"],"context_plan":[{"provider":"spec","mode":"full"}]}'
    if sender == "generic_synth":
        return "result: ok"
    return ""

profile = TaskProfile(
    task_name="Demo",
    goal="Produce YAML doc from spec",
    output_format="YAML: result: <...>",
)

agent = BaseGraphAgent(
    llm_plan=make_llm_plan_generic(dummy_llm, profile, {"spec": SpecProvider()}),
    llm_synth=make_llm_synth_generic(dummy_llm, profile),
    domain_parser=lambda raw: raw.text,       # RawLLMOutput -> Any
    saver=lambda feature_name, parsed: None,  # save side-effect
    providers={"spec": SpecProvider()},
    verifiers=[type("Ok", (), {"name": "ok", "check": lambda self, out: []})()],
    packer=ContextPacker(max_tokens=2000, summarizer_llm=lambda t: t[:200]),
    baseline=[BaselineSpec(ContextFetchSpec(provider="spec"))],
)

print(agent.run("FeatureX"))
```

- Plan-time inputs: the planner/LLM crafts `selectors` (a `SelectorsDict`) for
  each `ContextFetchSpec`. These inputs must be JSON-serializable and should be
  validated by providers against their published JSON Schema (see the sketch
  after this list).
- Provider contract: implementations of `ContextProvider.fetch` should accept
  `selectors: Optional[SelectorsDict] = None` and treat `**kwargs` as optional
  runtime hints that may be non-serializable.
- Schema + examples: providers can guide planners by returning
  `ProviderInfo(selectors_schema=..., examples=[...])` from `describe()`.
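As a validation sketch, using the third-party `jsonschema` package (an
assumption; fetchgraph does not prescribe a validator):

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

def validated_fetch(provider, feature_name, selectors):
    """Check selectors against the schema the provider publishes, then fetch."""
    schema = provider.describe().selectors_schema
    try:
        validate(instance=selectors, schema=schema)
    except ValidationError as exc:
        raise ValueError(f"bad selectors for {provider.name}: {exc.message}") from exc
    return provider.fetch(feature_name, selectors=selectors)
```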
Example for a relational provider that requires an "op" selector:
```python
from fetchgraph.json_types import SelectorsDict
from fetchgraph.models import ProviderInfo

class RelationalDataProvider:
    name = "relational"

    def fetch(self, feature_name: str, selectors: SelectorsDict, **kwargs):
        op = selectors.get("op")
        if not op:
            raise ValueError("selectors.op is required")
        ...  # existing logic for schema/semantic_only/query

    def describe(self) -> ProviderInfo:
        schema = {
            "oneOf": [
                {"type": "object", "required": ["op"], "properties": {"op": {"const": "schema"}}},
                {"type": "object", "required": ["op", "sql"], "properties": {"op": {"const": "query"}, "sql": {"type": "string"}}},
            ]
        }
        return ProviderInfo(
            name=self.name,
            selectors_schema=schema,
            examples=['{"op":"schema"}', '{"op":"query","sql":"select 1"}'],
        )
```

During planning you can feed `selectors` into `ContextFetchSpec` to fix the
operation:

```python
fetch_spec = ContextFetchSpec(provider="relational", selectors={"op": "schema"})
```

`fetchgraph.semantic_backend` ships a lightweight TF-IDF backend that turns a CSV
file into semantic embeddings and reuses them across runs. The flow is:

- Build embeddings from a CSV once using `CsvEmbeddingBuilder` and persist them
  alongside the CSV.
- Configure a `CsvSemanticBackend` with one or more `CsvSemanticSource` entries
  (one per entity) pointing at the CSV and saved embeddings.
- Pass that backend into `PandasRelationalDataProvider` so semantic clauses can
  delegate matching to the precomputed vectors.
Example setup:
```python
from pathlib import Path

from fetchgraph.semantic_backend import (
    CsvEmbeddingBuilder,
    CsvSemanticBackend,
    CsvSemanticSource,
)
from fetchgraph.relational_models import EntityDescriptor, ColumnDescriptor
from fetchgraph.relational_pandas import PandasRelationalDataProvider

csv_path = Path("products.csv")
embedding_path = Path("products_embeddings.json")

# Build once (e.g., during deployment) to avoid recomputing embeddings at runtime.
CsvEmbeddingBuilder(
    csv_path=csv_path,
    entity="product",
    id_column="id",
    text_fields=["name", "description"],
    output_path=embedding_path,
).build()

semantic_backend = CsvSemanticBackend(
    {"product": CsvSemanticSource("product", csv_path, embedding_path)}
)

entities = [
    EntityDescriptor(
        name="product",
        columns=[
            ColumnDescriptor(name="id", role="primary_key"),
            ColumnDescriptor(name="name"),
            ColumnDescriptor(name="description"),
        ],
    )
]

provider = PandasRelationalDataProvider(
    name="products",
    entities=entities,
    relations=[],
    frames={"product": ...},  # supply the loaded DataFrame here
    semantic_backend=semantic_backend,
)
```

You can plug in an embedding model (for example, an OpenAI client) to build and
query dense embeddings instead of the default TF-IDF vectors:
```python
from pathlib import Path

from fetchgraph.semantic_backend import (
    EmbeddingModel,
    CsvSemanticSource,
    CsvEmbeddingBuilder,
    CsvSemanticBackend,
)

class OpenAIEmbeddingModel:
    def __init__(self, client):
        self.client = client

    def embed_documents(self, texts):
        # Replace with a real call, e.g. client.embeddings(...).
        return [[1.0, 0.0] for _ in texts]

    def embed_query(self, text):
        return self.embed_documents([text])[0]

embedding = OpenAIEmbeddingModel(client)  # client: your OpenAI client instance

CsvEmbeddingBuilder(
    csv_path="fbs.csv",
    entity="fbs",
    id_column="id",
    text_fields=["name", "description"],
    output_path="fbs_embeddings.json",
    embedding_model=embedding,
).build()

csv_backend = CsvSemanticBackend(
    {
        "fbs": CsvSemanticSource(
            entity="fbs",
            csv_path=Path("fbs.csv"),
            embedding_path=Path("fbs_embeddings.json"),
        )
    },
    embedding_model=embedding,
)
```

At query time, `SemanticClause` filters sent to the relational provider will
call `semantic_backend.search(...)` with the requested entity, fields, and
query text. Fields must be a subset of the indexed CSV columns (not including
the reserved `__all__` combined projection). By default, field similarities are
summed; adjust the backend if you need a different aggregation strategy.
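For orientation, a direct call against the backend from the example above might
look like this; the keyword names are assumptions read off the description
here, not a documented signature:

```python
# Hypothetical direct query; field names must be indexed CSV columns.
matches = csv_backend.search(
    entity="fbs",
    fields=["name", "description"],
    query="budget wireless headphones",
)
for match in matches:  # SemanticMatch entries
    print(match)
```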
If you already manage embeddings in PostgreSQL with pgvector via LangChain,
you can supply your existing vector stores directly:
```python
from langchain_community.vectorstores.pgvector import PGVector

from fetchgraph.semantic_backend import PgVectorSemanticBackend, PgVectorSemanticSource

vector_store = PGVector.from_existing_index(
    collection_name="product_vectors",
    connection_string="postgresql+psycopg://...",
)

semantic_backend = PgVectorSemanticBackend(
    {
        "product": PgVectorSemanticSource(
            entity="product",
            vector_store=vector_store,
            metadata_entity_key="entity",  # optional, defaults to "entity"
            metadata_field_key="field",    # optional, defaults to "field"
            id_metadata_keys=("id",),      # optional metadata key(s) holding the row identifier
            score_kind="distance",         # convert pgvector distances into similarity scores
        )
    }
)
```

The backend will filter returned documents by entity and requested fields using
`Document` metadata before converting scores into `SemanticMatch` entries.
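For that filtering to succeed, each indexed document must carry the configured
metadata keys. A minimal sketch using LangChain's standard `add_texts` (the
text and id values are illustrative):

```python
# One text per (entity, field) pair; keys mirror metadata_entity_key /
# metadata_field_key / id_metadata_keys configured above.
vector_store.add_texts(
    texts=["Acme noise-cancelling headphones"],
    metadatas=[{"entity": "product", "field": "name", "id": "42"}],
)
```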
MIT License
Copyright (c) 2025 ...
Permission is hereby granted, free of charge, to any person obtaining a copy
...