
VarunSendilraj/Pebl-Maps


Pebl Maps

License: MIT | Python 3.10+ | Node.js 18+

This is part of a larger project called Pebl, an open source project that helps teams stop iterating on vibes and start iterating on evidence. To learn more, check out the Pebl website.


The Goal of Pebl Maps

Pebl Maps helps you understand how your users interact with your agents. It automatically clusters and visualizes conversation traces, and lets you investigate issues by drilling down from broad clusters all the way into individual conversations. To pinpoint more specific issues and get more detailed answers, you can chat with an agent that deeply understands your data.

To get started, upload your traces. Pebl Maps will generate topics, organize them into a hierarchical structure, and provide an interactive visualization for exploring patterns in your data.




✨ Features

| Feature | Description |
|---|---|
| 🏷️ Automatic Topic Generation | Uses LLMs to summarize each conversation trace |
| 🌳 Hierarchical Clustering | Organizes traces into a three-level hierarchy, from broad categories down to specific topics |
| 🫧 Interactive Bubble Visualization | Explore clusters on a zoomable, color-coded canvas |
| 🔍 Trace Viewer | Drill down from clusters into individual conversations |
| 🤖 AI-Powered Analysis | Chat with an AI agent that answers questions about your traces |

Note: Agent mode is currently unavailable but is coming soon!


🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • API Keys:
    • OpenAI (for embeddings & cluster labeling)
    • DeepSeek (for topic generation), or use OpenAI instead
    • Pinecone (for vector storage)

1. Clone & Install

git clone https://github.com/VarunSendilraj/Pebl-Maps.git
cd Pebl-Maps

# Install Python dependencies
pip install -r requirements.txt

# Install frontend dependencies
cd client
npm install
cd ..

2. Configure Environment

# Copy the example env files
cp .env.example .env
cp client/.env.example client/.env.local

# Edit .env and add your API keys
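Before running the pipeline, it can help to confirm that your keys are actually visible to Python. The minimal sketch below assumes variable names such as OPENAI_API_KEY and PINECONE_API_KEY; these names are guesses, so check .env.example for the ones the project really uses. It only inspects the process environment, so load .env first (for example with python-dotenv's load_dotenv()).

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical variable names; .env.example defines the real ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY")
# A DeepSeek key (e.g. DEEPSEEK_API_KEY, also hypothetical) is only needed
# if you use DeepSeek rather than OpenAI for topic generation.


def missing_keys(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required keys that are unset or blank in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling missing_keys() returns an empty list when everything is set, which makes it easy to fail fast at the top of a script.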

3. Prepare Your Data

Your input CSV needs a column containing the conversation text. Each cell in that column should hold a dictionary of turns keyed by turn number, where each turn has a "user" and an "assistant" message:

# Example conversation format
{
  1: {"user": "Hello, how are you?", "assistant": "I'm doing well, thanks!"},
  2: {"user": "What's the weather?", "assistant": "I don't have access to weather data."}
}
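If your traces live in Python objects, the standard-library sketch below writes them to a CSV in this shape and reads one row back. It assumes the pipeline parses each cell as a Python-style dict literal (what repr() of a dict produces); the README does not say how cells are parsed, so treat that as an assumption. ast.literal_eval is used for the round trip because it evaluates literals only and never runs code. The to_transcript helper is a hypothetical illustration of walking the turns in order.

```python
import ast
import csv
import io

# Example data in the turn-dictionary format shown above.
conversations = [
    {
        1: {"user": "Hello, how are you?", "assistant": "I'm doing well, thanks!"},
        2: {"user": "What's the weather?", "assistant": "I don't have access to weather data."},
    },
]


def write_traces_csv(convs, out, column="Conversation"):
    """Write one row per conversation, storing the turn dict as its Python repr."""
    writer = csv.DictWriter(out, fieldnames=[column])
    writer.writeheader()
    for conv in convs:
        writer.writerow({column: repr(conv)})


def read_turns(cell: str) -> dict:
    """Parse a cell back into a dict; literal_eval accepts literals only."""
    return ast.literal_eval(cell)


def to_transcript(turns: dict) -> str:
    """Flatten turns into 'user:' / 'assistant:' lines, ordered by turn number."""
    lines = []
    for _, turn in sorted(turns.items()):
        lines.append(f"user: {turn['user']}")
        lines.append(f"assistant: {turn['assistant']}")
    return "\n".join(lines)


buffer = io.StringIO()
write_traces_csv(conversations, buffer)
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(to_transcript(read_turns(rows[0]["Conversation"])))
```

To write a real file, pass open("your_traces.csv", "w", newline="", encoding="utf-8") instead of the StringIO buffer; the csv module takes care of quoting the commas and quotes inside each dict.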

4. Run the Pipeline

python -m pipeline.run --input your_traces.csv --conversation_col "Conversation"

This will:

  1. Generate topic summaries for each conversation
  2. Create embeddings using OpenAI
  3. Cluster topics hierarchically (L2 → L1 → L0, from broad categories down to specific topics)
  4. Index everything into Pinecone

5. Start the Frontend

cd client
npm run dev

Open http://localhost:3000 to explore your traces!


πŸ—οΈ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         Pebl Maps                                │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│  │   Pipeline   │───▶│   Pinecone   │◀───│   Frontend   │        │
│  │              │    │   (Vectors)  │    │   (Next.js)  │        │
│  └──────────────┘    └──────────────┘    └──────────────┘        │
│         │                                       │                │
│         ▼                                       ▼                │
│  ┌──────────────┐                        ┌──────────────┐        │
│  │  OpenAI /    │                        │  Bubble      │        │
│  │  DeepSeek    │                        │  Canvas      │        │
│  │  (LLM APIs)  │                        │  + TreeView  │        │
│  └──────────────┘                        └──────────────┘        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Pipeline (pipeline/)

| Step | Script | Description |
|---|---|---|
| 1 | generate_topics.py | Summarizes each conversation using an LLM |
| 2 | embed_topics.py | Creates vector embeddings for topics |
| 3 | cluster.py | Performs hierarchical k-means clustering |
| 4 | upsert.py | Indexes clusters and topics into Pinecone |

Frontend (client/)

  • Next.js 14 with App Router
  • D3.js for bubble/circle packing visualization
  • Tailwind CSS for styling
  • Interactive cluster tree navigation
  • Trace viewer with conversation replay
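D3's d3.hierarchy() reads nested objects through a children array by default, and d3.pack() sizes circles from values aggregated with node.sum(). The Python sketch below shows how one (L2, L1, L0) label path per trace could be folded into that nested shape. Whether the frontend's API routes return exactly this structure is an assumption, and the labels in the example are invented.

```python
import json
from collections import Counter


def build_hierarchy(paths: list[tuple[str, str, str]], root_name: str = "All traces") -> dict:
    """Fold (l2_label, l1_label, l0_label) paths, one per trace, into nested nodes.

    Inner nodes get {"name", "children"}; each L0 leaf gets {"name", "value"},
    where value is the number of traces in that leaf.
    """
    tree: dict[str, dict[str, dict[str, int]]] = {}
    for (l2, l1, l0), n in Counter(paths).items():
        tree.setdefault(l2, {}).setdefault(l1, {})[l0] = n

    return {
        "name": root_name,
        "children": [
            {
                "name": l2,
                "children": [
                    {"name": l1, "children": [{"name": l0, "value": n} for l0, n in leaves.items()]}
                    for l1, leaves in mids.items()
                ],
            }
            for l2, mids in tree.items()
        ],
    }


# Invented labels, purely for illustration.
example = build_hierarchy([
    ("Billing", "Refunds", "Late refunds"),
    ("Billing", "Refunds", "Late refunds"),
    ("Billing", "Invoices", "Missing invoice"),
    ("Account", "Login", "Password reset"),
])
print(json.dumps(example, indent=2))
```

On the frontend, something like d3.hierarchy(data).sum(d => d.value) would then give d3.pack() the leaf sizes it needs; inner nodes without a value simply contribute 0 to the sum.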

⚙️ Configuration

Pipeline Options

python -m pipeline.run --help

Options:
  --input              Input CSV file path (required)
  --output_dir         Directory for intermediate files (default: ./output)
  --conversation_col   Column name for conversation text (default: Conversation)
  --skip_generation    Skip topic generation step
  --skip_embedding     Skip embedding step
  --skip_clustering    Skip clustering step
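The options map onto the pipeline stages roughly as follows. This is a hypothetical re-creation with argparse that mirrors the documented flags, not the contents of run.py; since upsert has no skip flag, it always runs in this sketch.

```python
import argparse

# Pipeline order; only the first three stages have a documented --skip_* flag.
STAGES = ("generation", "embedding", "clustering", "upsert")


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented pipeline.run options."""
    parser = argparse.ArgumentParser(prog="pipeline.run")
    parser.add_argument("--input", required=True, help="Input CSV file path")
    parser.add_argument("--output_dir", default="./output", help="Directory for intermediate files")
    parser.add_argument("--conversation_col", default="Conversation",
                        help="Column name for conversation text")
    for stage in ("generation", "embedding", "clustering"):
        parser.add_argument(f"--skip_{stage}", action="store_true", help=f"Skip {stage} step")
    return parser


def stages_to_run(args: argparse.Namespace) -> list[str]:
    """Return the stages that will execute, in pipeline order."""
    return [s for s in STAGES if not getattr(args, f"skip_{s}", False)]
```

Skipping a stage presumably means the run picks up that stage's intermediate files from --output_dir instead; the option descriptions imply this but do not spell it out.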

Clustering Parameters

Edit pipeline/cluster.py to customize:

from dataclasses import dataclass

@dataclass
class ClusterConfig:
    l2_clusters: int = 5   # Number of top-level categories
    l1_clusters: int = 5   # Sub-categories per L2
    l0_clusters: int = 5   # Topics per L1
    model: str = "gpt-4o"  # Model for labeling
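The three counts read most naturally as a top-down split: l2_clusters groups at the top level, then l1_clusters inside each of those, then l0_clusters inside each L1 group. The pure-Python sketch below implements that reading with plain Lloyd's k-means. It is an assumption about how cluster.py works, it re-declares a trimmed ClusterConfig (without the labeling model) so it runs on its own, and it skips the LLM labeling step entirely.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

Vector = list[float]


@dataclass
class ClusterConfig:
    l2_clusters: int = 5  # top-level categories
    l1_clusters: int = 5  # sub-categories per L2
    l0_clusters: int = 5  # topics per L1


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Vector, centroids: list[Vector]) -> int:
    return min(range(len(centroids)), key=lambda c: _dist2(p, centroids[c]))


def kmeans(points: list[Vector], k: int, iters: int = 20, seed: int = 0) -> list[int]:
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    if not points:
        return []
    k = min(k, len(points))
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign: list[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign


def cluster_hierarchy(points: list[Vector], cfg: ClusterConfig | None = None) -> list[tuple[int, int, int]]:
    """Split points into L2 groups, each L2 group into L1 groups, each L1 group into L0 groups.

    Returns one (l2, l1, l0) index path per input point.
    """
    cfg = cfg or ClusterConfig()
    paths: list[tuple[int, int, int]] = [(0, 0, 0)] * len(points)
    l2 = kmeans(points, cfg.l2_clusters)
    for a in sorted(set(l2)):
        idx2 = [i for i, c in enumerate(l2) if c == a]
        l1 = kmeans([points[i] for i in idx2], cfg.l1_clusters)
        for b in sorted(set(l1)):
            idx1 = [idx2[j] for j, c in enumerate(l1) if c == b]
            l0 = kmeans([points[i] for i in idx1], cfg.l0_clusters)
            for i, c in zip(idx1, l0):
                paths[i] = (a, b, c)
    return paths
```

With the default settings this yields at most 5 × 5 × 5 = 125 leaf clusters; a group with fewer points than the requested count simply gets fewer clusters. A production version would more likely use a library implementation such as scikit-learn's KMeans.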

📁 Project Structure

Pebl-Maps/
├── client/                 # Next.js frontend
│   ├── src/
│   │   ├── app/           # App router pages & API routes
│   │   ├── components/    # React components
│   │   ├── contexts/      # React contexts
│   │   └── lib/           # Utilities & types
│   └── ...
├── pipeline/              # Data ingestion pipeline
│   ├── run.py             # Main entry point
│   ├── generate_topics.py # Topic generation
│   ├── embed_topics.py    # Embedding creation
│   ├── cluster.py         # Hierarchical clustering
│   └── upsert.py          # Pinecone indexing
├── .env.example           # Environment template
├── requirements.txt       # Python dependencies
└── README.md

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


About

Pebl turns your agent traces into actionable insights so you can start building on evidence, not vibes.
