Last update: January 21, 2026
[THIS PROJECT IS IN PROGRESS AND THE CHATBOT WILL BE DEPLOYED ON THE CLOUD FOR PUBLIC ACCESS]
This repo is the second part of a larger GraphRAG application. It demonstrates how the GraphRAG pattern supports intelligent question-answering over a music knowledge base. Part 1, the GraphRAG data pipeline, is available here: GraphRAG Part 1: Data Pipeline
This system is a high-performance, privacy-focused Retrieval-Augmented Generation (RAG) agent that orchestrates three distinct data retrieval strategies based on user intent:
- Graph Retrieval (Neo4j) — For deterministic facts and relationship traversals
- Vector Retrieval (ChromaDB) — For semantic/contextual queries and "vibe" questions
- Deep Metadata (JSON Sidecar) — For detailed attributes like barcodes, packaging, and social stats
The architecture follows a "Skinny Graph, Fat Context" philosophy: the graph database stays lean and optimized for traversals, while rich contextual data lives in appropriate external stores. All layers are unified via a strict Identity Fabric using Wikidata QIDs and MusicBrainz MBIDs.
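As a rough illustration of the "Skinny Graph, Fat Context" split, the sketch below joins a lean graph row to its deep metadata through a shared MBID. The one-file-per-MBID layout, the `load_sidecar` and `hydrate` names, and the `metadata` key are assumptions made for this example, not the project's actual sidecar format.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_sidecar(sidecar_dir: Path, mbid: str) -> Optional[dict[str, Any]]:
    """Return the deep-metadata record for an MBID, or None if no sidecar exists."""
    # Hypothetical layout: one JSON file per MBID inside sidecar_dir.
    path = sidecar_dir / f"{mbid}.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def hydrate(graph_row: dict[str, Any], sidecar_dir: Path) -> dict[str, Any]:
    """Attach sidecar metadata to a graph row, joined on the shared MBID."""
    extra = load_sidecar(sidecar_dir, graph_row["mbid"]) or {}
    # The graph row stays untouched; rich attributes live under one extra key.
    return {**graph_row, "metadata": extra}
```

The graph only needs to store the identifier, so attributes such as barcodes or packaging can change shape in the sidecar without a graph migration.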
Figure 1. LangGraph orchestration graph
This system is specifically tuned for the Electronic Music domain. It captures the rich, interconnected history of electronic artists, from early pioneers to contemporary producers. The dataset encompasses a wide range of sub-genres—including Techno, House, Ambient, IDM, and Drum & Bass—modeling the complex relationships between artists, their releases, and the evolving taxonomy of electronic musical styles.
| Category | Technology |
|---|---|
| Orchestration | LangGraph |
| Graph Database | Neo4j (cloud-hosted on Neo4j Aura) |
| Vector Store | ChromaDB |
| Embeddings | nomic-ai/nomic-embed-text-v1.5 |
| LLM Inference | MLX (Apple Silicon native) |
| Models | gpt-oss-20b-MLX-8bit (Router/Generalist), Gemma-3-4B-Instruct (Text-to-Cypher) |
| Web UI | Chainlit |
| Configuration | Pydantic / pydantic-settings |
| Logging | structlog |
| Package Manager | uv (Astral) |
The system functions as a meta-agent that orchestrates specialized retrievers based on user intent. It prioritizes precision (Graph) over approximation (Vector), and a Safety Net protocol makes failed or empty retrievals fall back to another retriever or a "not found" response rather than a hallucinated answer.
┌─────────────────────────────────────────────────────────────────────────┐
│                               USER QUERY                                │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                       ENTITY RESOLUTION & ROUTING                       │
│         (Generative NER + Neo4j Fulltext Index Disambiguation)          │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
             ┌───────────────────────┼───────────────────────┐
             │                       │                       │
             ▼                       ▼                       ▼
   ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
   │   TIER 1: GRAPH   │   │  TIER 2: SIDECAR  │   │  TIER 3: VECTOR   │
   │      (Neo4j)      │   │      (JSON)       │   │    (ChromaDB)     │
   │                   │   │                   │   │                   │
   │ • Relationships   │   │ • Barcodes        │   │ • Biographies     │
   │ • Topology        │   │ • Packaging       │   │ • Reviews         │
   │ • Aggregations    │   │ • Social stats    │   │ • Semantic search │
   └───────────────────┘   └───────────────────┘   └───────────────────┘
             │                       │                       │
             └───────────────────────┼───────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           FUSION & SYNTHESIS                            │
│                   (GPT-OSS 20B Context Window Fusion)                   │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              FINAL ANSWER                               │
└─────────────────────────────────────────────────────────────────────────┘
The orchestration follows a Perception → Strategy → Execution → Synthesis cycle implemented as a LangGraph state machine:
flowchart TD
A[User Query] --> B[Entity Resolution]
B --> C{Router}
C -->|GRAPH_CYPHER| D[Generate Cypher]
C -->|GRAPH_TOOL| E[Track Search]
C -->|VECTOR_ONLY| F[Vector Search]
C -->|SIDECAR| G[Fetch Metadata]
D --> H[Execute Cypher]
H -->|Success| I[Fusion]
H -->|Error| D
H -->|Empty/Fallback| F
E -->|Success| I
E -->|Fallback| F
F --> I
G --> I
I --> J[Final Answer]
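The routing and fallback edges in the flowchart can be sketched in plain Python. This is not the LangGraph definition in `src/agent/graph.py`. The function name, the callables passed in, and the `CypherSyntaxError` class are hypothetical stand-ins. The retry cap of 3 follows the self-correction rule described in this README, and falling back to vector search once retries are exhausted is an assumption.

```python
from typing import Callable, Optional

MAX_CYPHER_ATTEMPTS = 3  # retry cap for the Cypher self-correction loop


class CypherSyntaxError(Exception):
    """Stand-in for whatever error the real graph driver raises on bad Cypher."""


def route_query(
    route: str,
    generate_cypher: Callable[[Optional[str]], str],
    execute_cypher: Callable[[str], list],
    track_search: Callable[[], list],
    vector_search: Callable[[], list],
    fetch_metadata: Callable[[], list],
) -> tuple[str, list]:
    """Follow one path through the flowchart; return (source, context) for fusion."""
    if route == "GRAPH_CYPHER":
        last_error: Optional[str] = None
        for _ in range(MAX_CYPHER_ATTEMPTS):
            # The previous error message is fed back so the model can correct itself.
            query = generate_cypher(last_error)
            try:
                rows = execute_cypher(query)
            except CypherSyntaxError as exc:
                last_error = str(exc)
                continue
            if rows:
                return "graph", rows
            break  # valid query, empty result: stop retrying
        # Empty result or retries exhausted: fall back to semantic search.
        return "vector", vector_search()
    if route == "GRAPH_TOOL":
        rows = track_search()
        return ("graph_tool", rows) if rows else ("vector", vector_search())
    if route == "SIDECAR":
        return "sidecar", fetch_metadata()
    if route == "VECTOR_ONLY":
        return "vector", vector_search()
    raise ValueError(f"unknown route: {route}")
```

Returning the source label alongside the context lets the fusion step tell the model whether it is looking at exact graph rows or approximate semantic matches.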
| Component | Model/Technology | Role |
|---|---|---|
| Router | gpt-oss-20b (MLX) | Generative NER - extracts entities, types, and intent |
| Resolver | Neo4j Fulltext Indexes | Disambiguates entities via Lucene scoring |
| Graph Specialist | Gemma-3-4B-Instruct (Text-to-Cypher) | Generates read-only Cypher queries |
| Metadata Fetcher | Python + JSON | Retrieves deep metadata from sidecar files |
| Vector Specialist | ChromaDB + Nomic embeddings | Semantic search with "Walled Garden" filtering |
| Generalist | gpt-oss-20b (MLX) | Synthesizes final answers from combined context |
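To make the Resolver's role more concrete, here is a minimal sketch of score-based disambiguation. The Cypher string uses Neo4j's `db.index.fulltext.queryNodes` procedure. The `Candidate` class, the `resolve` function, and the `min_score` and `margin` thresholds are hypothetical and would need tuning against real Lucene scores.

```python
from dataclasses import dataclass

# Query a Neo4j fulltext index; $index, $text and $limit are supplied by the caller.
FULLTEXT_QUERY = """
CALL db.index.fulltext.queryNodes($index, $text) YIELD node, score
RETURN node.id AS entity_id, node.name AS name, score
ORDER BY score DESC LIMIT $limit
"""


@dataclass(frozen=True)
class Candidate:
    entity_id: str
    name: str
    score: float  # Lucene relevance score from the fulltext index


def resolve(
    candidates: list[Candidate], min_score: float = 1.0, margin: float = 0.2
) -> tuple[str, list[Candidate]]:
    """Classify a candidate list as 'resolved', 'ambiguous', or 'not_found'."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if not ranked or ranked[0].score < min_score:
        return "not_found", []
    best = ranked[0]
    # Anything within `margin` (relative) of the best score counts as a rival match.
    rivals = [c for c in ranked if best.score - c.score <= margin * best.score]
    if len(rivals) > 1:
        return "ambiguous", rivals  # caller can ask "Did you mean ...?"
    return "resolved", [best]
```

An `ambiguous` result is what would drive a clarification prompt, for example when two artists share the same name and score almost identically.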
erDiagram
Artist ||--o{ Genre : PLAYS_GENRE
Artist ||--o{ Artist : SIMILAR_TO
Artist ||--o| Country : FROM_COUNTRY
Release ||--|| Artist : PERFORMED_BY
Genre ||--o{ Genre : SUBGENRE_OF
Artist {
string id
string name
string mbid
string qid
string_list aliases
}
Release {
string id
string title
int year
string_list tracks
}
Genre {
string id
string name
string_list aliases
}
Country {
string id
string name
}
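Because the schema is this small, generated Cypher can be checked against it before execution. The sketch below is one possible approach, not the project's validation code: it extracts node labels and relationship types with regular expressions and reports any that are missing from the diagram above. The regexes are deliberately simple and do not handle every Cypher construct (for example, patterns embedded in string literals).

```python
import re

# Labels and relationship types taken from the ER diagram.
NODE_LABELS = {"Artist", "Release", "Genre", "Country"}
RELATIONSHIP_TYPES = {
    "PLAYS_GENRE", "SIMILAR_TO", "FROM_COUNTRY", "PERFORMED_BY", "SUBGENRE_OF",
}

# "(a:Artist" or "(:Artist:Extra" -> captures "Artist" / "Artist:Extra"
_NODE_PATTERN = re.compile(r"\(\s*\w*\s*:\s*([A-Za-z_][\w:]*)")
# "[r:SIMILAR_TO|PLAYS_GENRE" or "[:SUBGENRE_OF*1..3" -> captures the type list
_REL_PATTERN = re.compile(r"\[\s*\w*\s*:\s*([A-Za-z_][\w|:]*)")


def unknown_schema_terms(cypher: str) -> set[str]:
    """Return labels and relationship types used in the query but absent from the schema."""
    labels = {
        name
        for match in _NODE_PATTERN.findall(cypher)
        for name in match.split(":")
        if name
    }
    rel_types = {
        name
        for match in _REL_PATTERN.findall(cypher)
        for name in re.split(r"[|:]", match)
        if name
    }
    return (labels - NODE_LABELS) | (rel_types - RELATIONSHIP_TYPES)
```

A non-empty result is a cheap signal that the model invented part of the schema, and the offending names can be included in the retry prompt.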
The architecture implements robust error handling and hallucination prevention:
- Self-Correction Loop: Cypher syntax errors are fed back to the model for retry (max 3 attempts)
- Vector Safety Net: Empty graph results trigger automatic fallback to semantic search
- Hallucination Prevention: Low-confidence results return a graceful "not found" response instead of fabricated answers
- Read-Only Enforcement: Generated Cypher queries are validated to prevent destructive operations
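Read-only enforcement can be approximated with a keyword denylist, sketched below. This is an illustrative assumption about how such a check might look, not the contents of `src/agent/security.py`. The keyword list is conservative and incomplete: it does not inspect procedure calls, which can also write, so connecting with a read-only database user remains the stronger safeguard.

```python
import re

# Clauses that modify data or schema. Hypothetical denylist for illustration.
_WRITE_PATTERN = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)

# Single- or double-quoted string literals, including backslash escapes.
_STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'" + "|" + r'"(?:\\.|[^"\\])*"')


def is_read_only(cypher: str) -> bool:
    """Return True if the query contains no write clause outside string literals."""
    # Blank out literals first so a title such as 'Set Me Free' is not flagged.
    without_literals = _STRING_LITERAL.sub("''", cypher)
    return _WRITE_PATTERN.search(without_literals) is None
```

A false positive here only costs a retry or a fallback to vector search, whereas a false negative could alter the graph, so the check errs on the side of rejecting queries.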
src/
├── agent/
│   ├── graph.py              # LangGraph workflow definition
│   ├── state.py              # TypedDict state schema
│   ├── entity_resolver.py    # Entity extraction & linking
│   ├── specialist.py         # Cypher generation (Gemma 4B)
│   ├── generalist.py         # Answer synthesis (GPT-OSS 20B)
│   └── security.py           # Cypher validation
├── utils/
│   ├── neo4j_helper.py       # Neo4j driver & queries
│   └── vector_helper.py      # ChromaDB client & embeddings
├── settings.py               # Pydantic configuration
└── schemas.py                # Orchestration schemas
tests/
├── unit_tests/               # Component tests
├── integration_tests/        # Full workflow tests
└── conftest.py               # Pytest fixtures
main.py                       # Chainlit UI entry point
The system is validated using a comprehensive pool of questions designed to test specific architectural components, from simple graph traversals to complex multi-hop reasoning.
Tests the Cypher generation and Graph Specialist.
- What country is the band Kraftwerk from?
- List all subgenres of "Industrial Techno".
- Which artists are legally considered "similar to" Depeche Mode according to the graph?
- What year was the album "Violator" released?
- How many distinct genres are associated with Aphex Twin?
- Find the shortest path between Daft Punk and The Chemical Brothers.
- Which artist has the most releases in the database?
- List all artists who have released albums in 1997.
Tests the Entity Hydrator and specific attribute lookup from JSON files.
- What is the barcode for the 2006 Digipak re-release of "Speak & Spell"?
- How many Twitter followers did Depeche Mode have in 2021?
- What is the specific catalog number for the US vinyl release of "Music for the Masses"?
- Did the 2006 remaster of "Violator" come in a jewel case or digipak?
- What is the exact release date (YYYY-MM-DD) of the French edition of "Homework"?
- Which record label published the Japanese version of "Selected Ambient Works 85-92"?
- What are the packaging dimensions or format details for the "Exai" box set?
- Retrieve the ISRC codes for all tracks on the album "Mezzanine".
- What is the "packaging" type listed for the 1990 UK release of "Violator"?
- Find the release with the barcode "0094635797923".
Tests the Vector Specialist, embeddings, and "Walled Garden" filtering.
- Describe the political influence on Depeche Mode's sound in the early 1980s.
- What do critics say about the production quality of "Syro"?
- How did the break-up of Boards of Canada's previous band influence their sound?
- Find reviews that mention "claustrophobic atmosphere" in relation to Massive Attack.
- Summarize the critical reception of "Come to Daddy" at the time of its release.
- What are the recurring lyrical themes in Portishead's "Dummy"?
- Describe the "vibe" of early 90s Intelligent Dance Music (IDM).
- Find artist biographies that mention "Detroit" as a key influence.
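This README does not define "Walled Garden" filtering. One plausible reading is that vector search is restricted to documents belonging to the entities already resolved for the query. Under that assumption, the sketch below builds a ChromaDB metadata filter from a list of MBIDs. The `mbid` metadata field and the function name are hypothetical.

```python
from typing import Any, Optional


def walled_garden_filter(mbids: list[str]) -> Optional[dict[str, Any]]:
    """Build a ChromaDB `where` filter limiting results to the given MBIDs."""
    # Drop empty values and duplicates while preserving order.
    unique = list(dict.fromkeys(m for m in mbids if m))
    if not unique:
        return None  # no resolved entities: search the whole collection
    if len(unique) == 1:
        return {"mbid": {"$eq": unique[0]}}
    return {"mbid": {"$in": unique}}


# Intended use with an existing ChromaDB collection (not executed here):
# collection.query(query_texts=[question], n_results=5,
#                  where=walled_garden_filter(resolved_mbids))
```

Filtering on metadata before ranking keeps a review of one artist from being returned for a question about another artist with a similar description.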
Tests the Orchestrator's ability to combine Graph, Vector, and Sidecar data.
- Did the album with the highest track count by Autechre receive positive reviews?
- Compare the critical reception of Depeche Mode's 1981 releases vs. their 1990 releases.
- Which artist from France has the most followers on Twitter?
- List the genres played by artists who use "sampling" heavily in their production (based on bios).
- How does the release frequency of Aphex Twin correlate with his critical acclaim over time?
- Find albums released in 1994 by artists who are "similar to" Massive Attack.
- Did the "Digipak" version of "Violator" get better reviews than the standard jewel case version?
- Which genre has the most artists with "political" themes in their biographies?
- List all releases by German artists that are mentioned as "influential" in vector search results.
- Who is the most popular artist (by social stats) in the "Glitch" genre?
Tests the Router, Fulltext Indexes, and "Did You Mean" logic.
- Tell me about "Nirvana" (expecting clarification between US and UK bands).
- Who is "The Boss" of techno? (Testing alias/nickname resolution).
- List albums by "DM" (Testing acronym resolution for Depeche Mode).
- Information about the band "Burial" (Testing for potential ambiguity with other entities).
- Tell me about "Homework" (Is it the album by Daft Punk or something else?).
- Stats for "Prince" (Testing disambiguation if multiple artists exist).
- "The Twins" (Testing alias resolution for Aphex Twin or other entities).
Tests Hallucination Prevention and Fallback logic.
- What is the tracklist for the 2025 album by Daft Punk? (Should verify non-existence).
- Who played bass on the album "Unknown Pleasures" by Autechre? (False premise/hallucination check).
- What is the barcode for a release that doesn't exist?
- Tell me about the artist "FakeName123".
- What genre is the band "NonExistentEntity" associated with?
- Retrieve metadata for a null MBID.
- Generate a biography for an artist not in the database.
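The negative cases above depend on the system declining to answer when retrieval finds nothing relevant. A minimal sketch of such a gate follows, assuming each vector hit carries a distance where lower means closer. The `max_distance` threshold, the message text, and the `synthesize` callable are hypothetical.

```python
from typing import Callable, Sequence

NOT_FOUND_MESSAGE = "I could not find that in the knowledge base."


def answer_or_not_found(
    question: str,
    hits: Sequence[tuple[str, float]],
    synthesize: Callable[[str, list[str]], str],
    max_distance: float = 0.35,
) -> str:
    """Synthesize an answer only from sufficiently close hits; otherwise decline."""
    # Each hit is (text, distance); discard anything too far from the query.
    context = [text for text, distance in hits if distance <= max_distance]
    if not context:
        # Skip the LLM entirely so it has no opportunity to fabricate an answer.
        return NOT_FOUND_MESSAGE
    return synthesize(question, context)
```

With this shape, a question about "FakeName123" would return the fixed message as long as no stored document falls within the distance threshold.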