A lightweight toolkit for connecting LangChain agents to dataframes, making it easy to ask natural language questions about your data.
Tabular data — especially when it spans thousands of rows or lives across multiple tables — can quickly consume an LLM's entire context window. Yet these models excel at reasoning about and analyzing structured data when they can access it effectively. dfkit bridges that gap by equipping LLM agents with tools to query and interact with dataframes directly, powering rich analysis without ever pulling the raw data into the context window.
uv pip install git+https://github.com/libertininick/dfkit.git

import polars as pl
from dfkit import DataFrameToolkit
# Load your data into a Polars DataFrame
df = pl.DataFrame({
    "product": ["Widget A", "Widget B", "Widget C"],
    "revenue": [1200, 3400, 5600],
    "units_sold": [120, 340, 560],
})
# Initialize the toolkit and register your DataFrame
toolkit = DataFrameToolkit()
toolkit.register_dataframe(
    name="Sales Data",
    dataframe=df,
    description="Quarterly product sales with revenue and units sold.",
    column_descriptions={
        "product": "Product name.",
        "revenue": "Total revenue in USD.",
        "units_sold": "Number of units sold.",
    },
)

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage
# Initialize your LLM (requires an API key set in your environment)
model = init_chat_model("claude-haiku-4-5")
# Create an agent with the toolkit's tools and system prompt
agent = create_agent(
    model=model,
    tools=toolkit.get_tools(),
    system_prompt=toolkit.get_system_prompt(),
)
# Ask a question about your data
response = agent.invoke({
    "messages": [HumanMessage("Which product has the highest revenue?")]
})
print(response["messages"][-1].content)

The agent uses the toolkit to translate your natural language questions into SQL queries against the registered DataFrames and returns the results.
Using uv for Python environment management:
# Clone repository
git clone git@github.com:libertininick/dfkit.git
cd dfkit
# Install dependencies
uv sync
# Set up pre-commit hooks
uv run pre-commit install

uv lock --upgrade-package <package name>

uv pip show <package name>

- Update uv tool
- Upgrade Python version installed by uv
- Upgrade all dependencies in uv.lock file
- Sync virtual environment with updated dependencies
- Prune uv cache to remove dependencies that are no longer needed

uv self update \
&& uv python upgrade \
&& uv lock --upgrade \
&& uv sync \
&& uv cache prune

uv run pre-commit install-hooks \
&& uv run pre-commit autoupdate

uv run ruff check . \
&& uv run ty check . \
&& uv tool run pydoclint src/ tests/ \
&& uv run pytest --cov src/ .