
fetchgraph

Universal, library-style agent that plans what to fetch, fetches context from pluggable providers, and synthesizes an output.

Pipeline: PLAN → FETCH → (ASSESS/REFETCH)* → SYNTH → VERIFY → (REFINE)* → SAVE

Why fetchgraph?

fetchgraph is a library-style LLM agent orchestrator.
You bring:

  • your LLM (OpenAI, local, whatever),
  • your data providers (DBs, APIs, files),

and fetchgraph handles:

  • planning what context to fetch,
  • calling providers with JSON selectors,
  • packing context into the prompt,
  • verifying / refining the result.

Features

  • JSON-only selectors with JSON Schema hints for planners
  • Pluggable context providers (APIs, relational sources, etc.)
  • Relational providers with semantic clauses
  • CSV semantic backend (TF-IDF) for pandas providers
  • pgvector / LangChain vector store integration
  • Library-style API: no framework lock-in

Install

pip install fetchgraph

Quick Start

Selectors are JSON-only

Providers receive a selectors argument that must be JSON-serializable. The shared alias SelectorsDict (see fetchgraph/json_types.py) represents Dict[str, JSONValue] and is used across protocols and models. The planner/LLM produces this structure, so do not place runtime-only Python objects (e.g. connections, DataFrames) into selectors; pass such hints through **kwargs instead. Providers can publish the expected shape via ProviderInfo.selectors_schema (a JSON Schema) and optional examples containing stringified JSON payloads.

Relational providers require selectors to include a string field "op" that chooses the operation (e.g., "schema", "semantic_only", "query"). The complete set of supported shapes is described by the schema returned from RelationalDataProvider.describe().
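For instance, two minimal payloads that satisfy this contract (the same shapes appear in the describe() example further down):

selectors = {"op": "schema"}                    # ask the provider for its schema
selectors = {"op": "query", "sql": "select 1"}  # run a concrete query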

from fetchgraph import (
  BaseGraphAgent, ContextPacker, BaselineSpec, ContextFetchSpec,
  TaskProfile, RawLLMOutput
)
from fetchgraph.core import make_llm_plan_generic, make_llm_synth_generic

# Define providers (implement ContextProvider protocol)
class SpecProvider:
    name = "spec"

    def fetch(self, feature_name, selectors=None, **kw):
        return {"content": f"Spec for {feature_name}"}

    def serialize(self, obj):
        return obj.get("content", "") if isinstance(obj, dict) else str(obj)

def dummy_llm(prompt: str, sender: str) -> str:
    if sender == "generic_plan":
        return '{"required_context":["spec"],"context_plan":[{"provider":"spec","mode":"full"}]}'
    if sender == "generic_synth":
        return "result: ok"
    return ""

profile = TaskProfile(
  task_name="Demo",
  goal="Produce YAML doc from spec",
  output_format="YAML: result: <...>"
)

agent = BaseGraphAgent(
  llm_plan=make_llm_plan_generic(dummy_llm, profile, {"spec": SpecProvider()}),
  llm_synth=make_llm_synth_generic(dummy_llm, profile),
  domain_parser=lambda raw: raw.text,  # RawLLMOutput -> Any
  saver=lambda feature_name, parsed: None,  # save side-effect
  providers={"spec": SpecProvider()},
  verifiers=[type("Ok",(),{"name":"ok","check":lambda self,out: []})()],
  packer=ContextPacker(max_tokens=2000, summarizer_llm=lambda t: t[:200]),
  baseline=[BaselineSpec(ContextFetchSpec(provider="spec"))],
)

print(agent.run("FeatureX"))
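With the stub LLM above, the plan requests only the spec provider and synthesis yields result: ok; what agent.run returns exactly depends on BaseGraphAgent, but with the pass-through domain_parser it should reflect that text.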

Working with selectors

  • Plan-time inputs: The planner/LLM crafts selectors (a SelectorsDict) for each ContextFetchSpec. These inputs must be JSON-serializable and should be validated by providers using their published JSON Schema.
  • Provider contract: Implementations of ContextProvider.fetch should accept selectors: Optional[SelectorsDict] = None and treat **kwargs as optional runtime hints that may be non-serializable.
  • Schema + examples: Providers can guide planners by returning ProviderInfo(selectors_schema=..., examples=[...]) from describe().

Example for a relational provider that requires an "op" selector:

from typing import Optional

from fetchgraph.json_types import SelectorsDict
from fetchgraph.models import ProviderInfo

class RelationalDataProvider:
    name = "relational"

    def fetch(self, feature_name: str, selectors: Optional[SelectorsDict] = None, **kwargs):
        op = (selectors or {}).get("op")
        if not op:
            raise ValueError("selectors.op is required")
        ...  # existing logic for schema/semantic_only/query

    def describe(self) -> ProviderInfo:
        schema = {
            "oneOf": [
                {"type": "object", "required": ["op"], "properties": {"op": {"const": "schema"}}},
                {"type": "object", "required": ["op", "sql"], "properties": {"op": {"const": "query"}, "sql": {"type": "string"}}},
            ]
        }
        return ProviderInfo(
            name=self.name,
            selectors_schema=schema,
            examples=["{\"op\":\"schema\"}", "{\"op\":\"query\",\"sql\":\"select 1\"}"],
        )

During planning you can pass selectors to ContextFetchSpec to pin the operation up front:

fetch_spec = ContextFetchSpec(provider="relational", selectors={"op": "schema"})

CSV semantic backend for Pandas providers

fetchgraph.semantic_backend ships a lightweight TF-IDF backend that turns a CSV file into semantic embeddings and reuses them across runs. The flow is:

  1. Build embeddings from a CSV once using CsvEmbeddingBuilder and persist them alongside the CSV.
  2. Configure a CsvSemanticBackend with one or more CsvSemanticSource entries (one per entity) pointing at the CSV and saved embeddings.
  3. Pass that backend into PandasRelationalDataProvider so semantic clauses can delegate matching to the precomputed vectors.

Example setup:

from pathlib import Path
from fetchgraph.semantic_backend import (
    CsvEmbeddingBuilder,
    CsvSemanticBackend,
    CsvSemanticSource,
)
from fetchgraph.relational_models import EntityDescriptor, ColumnDescriptor
from fetchgraph.relational_pandas import PandasRelationalDataProvider

csv_path = Path("products.csv")
embedding_path = Path("products_embeddings.json")

# Build once (e.g., during deployment) to avoid recomputing embeddings at runtime.
CsvEmbeddingBuilder(
    csv_path=csv_path,
    entity="product",
    id_column="id",
    text_fields=["name", "description"],
    output_path=embedding_path,
).build()

semantic_backend = CsvSemanticBackend(
    {"product": CsvSemanticSource("product", csv_path, embedding_path)}
)

entities = [
    EntityDescriptor(
        name="product",
        columns=[
            ColumnDescriptor(name="id", role="primary_key"),
            ColumnDescriptor(name="name"),
            ColumnDescriptor(name="description"),
        ],
    )
]

provider = PandasRelationalDataProvider(
    name="products",
    entities=entities,
    relations=[],
    frames={"product": ...},  # the pandas DataFrame backing this entity
    semantic_backend=semantic_backend,
)
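The frames entry is left as a placeholder above. A minimal sketch of filling it, assuming each entry is a pandas DataFrame keyed by entity name (pd.read_csv here is illustrative; any DataFrame with the declared columns should work):

import pandas as pd

frames = {"product": pd.read_csv(csv_path)}  # columns: id, name, description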

You can plug in an embedding model (for example, an OpenAI client) to build and query dense embeddings instead of the default TF-IDF vectors:

from pathlib import Path

from fetchgraph.semantic_backend import (
    EmbeddingModel,  # interface the custom model below conforms to
    CsvSemanticSource,
    CsvEmbeddingBuilder,
    CsvSemanticBackend,
)


class OpenAIEmbeddingModel:
    def __init__(self, client):
        self.client = client

    def embed_documents(self, texts):
        # Placeholder vectors; replace with a real embeddings call
        # (e.g. client.embeddings.create(...) for the OpenAI SDK).
        return [[1.0, 0.0] for _ in texts]

    def embed_query(self, text):
        return self.embed_documents([text])[0]


# `client` is your embeddings SDK client (construction omitted here).
embedding = OpenAIEmbeddingModel(client)

CsvEmbeddingBuilder(
    csv_path="fbs.csv",
    entity="fbs",
    id_column="id",
    text_fields=["name", "description"],
    output_path="fbs_embeddings.json",
    embedding_model=embedding,
).build()

csv_backend = CsvSemanticBackend(
    {
        "fbs": CsvSemanticSource(
            entity="fbs",
            csv_path=Path("fbs.csv"),
            embedding_path=Path("fbs_embeddings.json"),
        )
    },
    embedding_model=embedding,
)
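Passing the same embedding_model to both the builder and the backend keeps query vectors (embed_query) in the same space as the stored document vectors (embed_documents).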

At query time, SemanticClause filters sent to the relational provider will call semantic_backend.search(...) with the requested entity, fields, and query text. Fields must be a subset of the indexed CSV columns (not including the reserved __all__ combined projection). By default, field similarities are summed; adjust the backend if you need a different aggregation strategy.
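A sketch of what such a semantic selector might look like, assuming a "semantic_only" operation whose clause names the entity, the indexed fields, and the query text (the key names here are illustrative; the authoritative shape is the schema returned by RelationalDataProvider.describe()):

selectors = {
    "op": "semantic_only",
    "entity": "product",
    "fields": ["name", "description"],  # must be a subset of the indexed CSV columns
    "query": "wireless noise-cancelling headphones",
}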

pgvector / LangChain vector stores

If you already manage embeddings in PostgreSQL with pgvector via LangChain, you can supply your existing vector stores directly:

from langchain_community.vectorstores.pgvector import PGVector
from fetchgraph.semantic_backend import PgVectorSemanticBackend, PgVectorSemanticSource

vector_store = PGVector.from_existing_index(
    collection_name="product_vectors", connection_string="postgresql+psycopg://..."
)

semantic_backend = PgVectorSemanticBackend(
    {
        "product": PgVectorSemanticSource(
            entity="product",
            vector_store=vector_store,
            metadata_entity_key="entity",  # optional, defaults to "entity"
            metadata_field_key="field",    # optional, defaults to "field"
            id_metadata_keys=("id",),       # optional metadata key(s) to read the row identifier
            score_kind="distance",          # convert pgvector distances into similarity scores
        )
    }
)

The backend will filter returned documents by entity and requested fields using Document metadata before converting scores into SemanticMatch entries.
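For that filtering to work, each stored document needs the metadata keys the source was configured with. A minimal sketch, assuming LangChain's Document type and the defaults shown above:

from langchain_core.documents import Document

# Hypothetical stored document: "entity" and "field" let the backend filter,
# and "id" (per id_metadata_keys) supplies the row identifier.
doc = Document(
    page_content="Over-ear wireless headphones with active noise cancelling.",
    metadata={"entity": "product", "field": "description", "id": "42"},
)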


LICENSE

MIT License

Copyright (c) 2025 ...

Permission is hereby granted, free of charge, to any person obtaining a copy
...
