This project implements a complaint analysis system that combines a Neo4j Knowledge Graph for structured data storage and retrieval with Large Language Models (LLMs), specifically OpenAI's GPT models, for natural language understanding, query generation, and semantic search.
The system is designed to process customer review data, extract key information such as sentiment and aspects, build a knowledge graph, and then allow users to query this graph using natural language to identify and analyze complaints.
- Knowledge Graph Construction: Ingests customer review data from a CSV, extracts entities (Users, Products, Reviews, Sentiments, Aspects, Scores), and establishes relationships within a Neo4j graph database.
- Sentiment and Aspect Detection: Simple rule-based detection of sentiment (positive, negative, neutral) and aspects (taste, price, packaging, delivery, general) from review text; a rough sketch of this approach appears after this list.
- OpenAI Embeddings: Generates vector embeddings for review texts to enable semantic search capabilities.
- Cypher Query Generation: Utilizes LLMs to translate natural language questions into precise Cypher queries for retrieving structured information from the Neo4j graph.
- Semantic Search (Retrieval-Augmented Generation): Combines vector search with knowledge graph traversal to provide relevant review snippets and context for user queries.
- Enhanced Complaint Analysis: Provides specialized functionalities to identify and summarize complaints based on review scores, keywords, and specific aspects.
- Conversational AI Agent: Integrates a LangChain-powered agent for interactive natural language conversations with the knowledge graph.
- Streamlit User Interface: Provides a simple web-based chat interface for interacting with the AI assistant.
- Environment Testing: Includes a utility to verify the correct setup of environment variables and connections to OpenAI and Neo4j.
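To make the rule-based detection concrete, here is a minimal sketch of how sentiment and aspects could be derived from review text. The keyword lists and function names are illustrative assumptions; the actual logic lives in `create_kg.py` and may differ.

```python
# Illustrative keyword lists; the real lists in create_kg.py may differ.
NEGATIVE_WORDS = {"bad", "terrible", "awful", "disappointed", "broken", "stale"}
POSITIVE_WORDS = {"great", "delicious", "love", "excellent", "perfect"}
ASPECT_KEYWORDS = {
    "taste": ["taste", "flavor", "flavour"],
    "price": ["price", "expensive", "cheap", "cost"],
    "packaging": ["package", "packaging", "box", "seal"],
    "delivery": ["delivery", "shipping", "arrived", "late"],
}

def detect_sentiment(text: str) -> str:
    """Classify a review as positive, negative, or neutral via keyword counts."""
    words = text.lower().split()
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def detect_aspects(text: str) -> list[str]:
    """Return every aspect whose keywords appear in the review, else 'general'."""
    lowered = text.lower()
    found = [aspect for aspect, keys in ASPECT_KEYWORDS.items()
             if any(k in lowered for k in keys)]
    return found or ["general"]
```

Keyword matching like this is cheap and transparent, which is why the feature list describes it as "simple"; replacing it with fine-tuned LLMs or dedicated NLP models is listed under future enhancements.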
The project consists of the following Python files:
- `test_environment.py`: Contains unit tests to ensure that the necessary environment variables (`.env` file, OpenAI API key, Neo4j connection details) are correctly configured and that connections to OpenAI and Neo4j are successful.
- `create_kg.py`: Responsible for building the Neo4j Knowledge Graph. It reads review data from `Reviews_10k.csv`, performs sentiment and aspect detection, generates OpenAI embeddings for each review, and populates the Neo4j database with nodes (`User`, `Product`, `Review`, `Sentiment`, `Aspect`, `Score`) and relationships. It also creates a vector index for efficient similarity search.
- `simple_try.py`: Demonstrates a basic natural language query interface. It uses LangChain's `GraphCypherQAChain` to convert user questions into Cypher queries and execute them against the Neo4j graph, providing direct answers.
- `retriever.py`: Implements a semantic search retriever. It leverages Neo4j as a vector store and OpenAI embeddings to find semantically similar review chunks, then uses a LangChain `create_retrieval_chain` to answer user questions based on the retrieved context, including review text and associated metadata.
- `query_kg.py`: Provides an enhanced complaint analysis system. It features an improved Cypher generation template and a sophisticated QA template to summarize complaints, count reviews, and extract main themes. It also includes functions for direct queries to identify complaints based on low scores, specific keywords, or aspects like delivery and price.
- `llm.py`: Defines and initializes the Large Language Model (LLM) and embedding models (OpenAI's `ChatOpenAI` and `OpenAIEmbeddings`) used throughout the project (a hedged sketch of this module and `graph.py` appears after this list).
- `graph.py`: Initializes and provides a `Neo4jGraph` object, establishing the connection to the Neo4j database.
- `vector.py`: Contains the `Neo4jVector` setup for semantic search, including the definition of the vector index and the retrieval query for review chunks.
- `cypher.py`: Implements the `GraphCypherQAChain` for converting natural language questions into Cypher queries and executing them against the Neo4j graph. It includes a detailed Cypher generation template.
- `utils.py`: Provides utility functions, such as `get_session_id` for managing chat sessions in the Streamlit application.
- `agent.py`: Implements the core conversational AI agent using LangChain's `create_react_agent`. It defines the tools available to the agent (General Chat, Reviews content search, Knowledge Graph information) and orchestrates the interaction between the user, LLM, and knowledge graph.
- `bot.py`: Sets up the Streamlit web application, handling the chat interface, displaying messages, and invoking the AI agent for generating responses.
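Since nearly every script reuses the shared model and graph connection, here is a hedged sketch of what `llm.py` and `graph.py` likely set up. The model name, import paths, and exported variable names are assumptions and may differ from the actual files.

```python
# Sketch of llm.py and graph.py; model names and import paths are assumptions.
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph

load_dotenv()

# llm.py: chat model and embedding model shared across the project
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
embeddings = OpenAIEmbeddings()

# graph.py: single Neo4jGraph connection reused by the chains and the agent
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)
```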
- Clone the repository:

  ```bash
  git clone <repository_url>
  cd <repository_directory>
  ```

- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables: Create a `.env` file in the root directory of the project with the following variables:

  ```
  OPENAI_API_KEY="your_openai_api_key_here"
  NEO4J_URI="bolt://"  # your Neo4j AuraDB connection URI
  NEO4J_USERNAME="neo4j"
  NEO4J_PASSWORD="your_neo4j_password_here"
  ```

  Replace the placeholder values with your actual OpenAI API key and Neo4j connection details.

- Prepare the data: Ensure you have a `Reviews_10k.csv` file in the `data/` directory (or update the path in `create_kg.py` if it's located elsewhere). This CSV file should contain your customer review data with columns like `Id`, `UserId`, `ProductId`, `Score`, `Summary`, and `Text`.

- Start your Neo4j database: Make sure your Neo4j instance is running and accessible at the `NEO4J_URI` specified in your `.env` file.
Before proceeding, run the environment tests to ensure everything is set up correctly:
```bash
python test_environment.py
```

This will verify your `.env` file and the connections to OpenAI and Neo4j.
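In spirit, these checks amount to loading the `.env` file and making one round-trip to each service. A rough standalone equivalent (the real tests are structured as unit tests and may assert more) could look like:

```python
# Minimal connectivity check in the spirit of test_environment.py.
import os
from dotenv import load_dotenv
from neo4j import GraphDatabase
from openai import OpenAI

load_dotenv()

assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is missing from .env"
assert os.getenv("NEO4J_URI"), "NEO4J_URI is missing from .env"

# One embedding call verifies the OpenAI key works
OpenAI().embeddings.create(model="text-embedding-ada-002", input="ping")

# verify_connectivity() confirms the Neo4j credentials and URI are valid
with GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
) as driver:
    driver.verify_connectivity()

print("OpenAI and Neo4j connections look good.")
```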
Run the `create_kg.py` script to populate your Neo4j database with the review data and build the knowledge graph:

```bash
python create_kg.py
```

This process will also create a vector index for the embeddings.
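For a sense of what this step does per row, the sketch below writes a single review into the graph using a `Neo4jGraph` connection. The relationship names (`WROTE`, `ABOUT`, `HAS_SCORE`) and the Cypher itself are assumptions; the real `create_kg.py` also attaches `Sentiment` and `Aspect` nodes and stores the review embedding for the vector index.

```python
# Illustrative per-row graph write; labels follow the README, Cypher is assumed.
import os
from dotenv import load_dotenv
from langchain_community.graphs import Neo4jGraph

load_dotenv()
graph = Neo4jGraph(url=os.getenv("NEO4J_URI"),
                   username=os.getenv("NEO4J_USERNAME"),
                   password=os.getenv("NEO4J_PASSWORD"))

# One review row as it might come out of Reviews_10k.csv (values are made up)
row = {"Id": "1", "UserId": "A1", "ProductId": "B001", "Score": 2,
       "Summary": "Stale chips", "Text": "The chips arrived stale and broken."}

graph.query(
    """
    MERGE (u:User {id: $UserId})
    MERGE (p:Product {id: $ProductId})
    MERGE (r:Review {id: $Id})
      SET r.summary = $Summary, r.text = $Text
    MERGE (s:Score {value: $Score})
    MERGE (u)-[:WROTE]->(r)
    MERGE (r)-[:ABOUT]->(p)
    MERGE (r)-[:HAS_SCORE]->(s)
    """,
    params=row,
)
```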
Basic QA (`simple_try.py`):

For a simple natural language interface to query the graph:

```bash
python simple_try.py
```

Type your questions at the `🧠 User:` prompt. Type `exit` or `quit` to end the session.
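A minimal version of that loop, assuming `graph.py` and `llm.py` export objects named `graph` and `llm` (the prompt template and chain options in `simple_try.py` may differ):

```python
# Sketch of the question loop; exported names and chain options are assumptions.
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain

from graph import graph  # Neo4jGraph connection defined in graph.py (assumed name)
from llm import llm      # ChatOpenAI model defined in llm.py (assumed name)

# Asks the LLM to write Cypher, runs it on Neo4j, and phrases the answer
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,                  # print the generated Cypher for debugging
    allow_dangerous_requests=True, # required by recent LangChain releases
)

while True:
    question = input("🧠 User: ")
    if question.lower() in {"exit", "quit"}:
        break
    print(chain.invoke({"query": question})["result"])
```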
Semantic Search (`retriever.py`):

To use the semantic search retriever for answering questions based on review content:

```bash
python retriever.py
```

Type your questions at the `>` prompt. Type `exit` to end the session.
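A hedged sketch of the retrieval flow, assuming the vector index is named `review_embeddings` (check `vector.py` and `create_kg.py` for the real index name and retrieval query):

```python
# Retrieval-augmented answering over the review vector index; index name assumed.
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Neo4jVector
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

load_dotenv()
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Use the vector index built by create_kg.py as a retriever
store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
    index_name="review_embeddings",  # assumed name; see vector.py
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only these review excerpts:\n\n{context}"),
    ("human", "{input}"),
])
chain = create_retrieval_chain(
    store.as_retriever(search_kwargs={"k": 5}),
    create_stuff_documents_chain(llm, prompt),
)

result = chain.invoke({"input": "What do customers complain about most?"})
print(result["answer"])
```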
Enhanced Complaint Analysis (`query_kg.py`):

For detailed complaint analysis and direct queries:

```bash
python query_kg.py
```

This script provides options for direct queries (e.g., `direct:low_scores`, `direct:delivery_issues`) and also allows natural language questions for more complex complaint analysis. Type `exit` to quit.
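A direct query such as `direct:low_scores` essentially reduces to one Cypher statement over the graph. An illustrative sketch, where the `HAS_SCORE` relationship name and the score threshold are assumptions:

```python
# Illustrative "low scores" complaint query; relationship name and threshold assumed.
import os
from dotenv import load_dotenv
from langchain_community.graphs import Neo4jGraph

load_dotenv()
graph = Neo4jGraph(url=os.getenv("NEO4J_URI"),
                   username=os.getenv("NEO4J_USERNAME"),
                   password=os.getenv("NEO4J_PASSWORD"))

# Reviews scoring 1 or 2 are treated as likely complaints
rows = graph.query(
    """
    MATCH (r:Review)-[:HAS_SCORE]->(s:Score)
    WHERE s.value <= 2
    RETURN r.summary AS summary, r.text AS text, s.value AS score
    ORDER BY score
    LIMIT 10
    """
)
for row in rows:
    print(f"[{row['score']}] {row['summary']}")
```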
To interact with the full AI-powered complaint analysis system via a web interface:
```bash
streamlit run bot.py
```

This will open a web browser with the chat interface, allowing you to ask questions and receive responses from the AI agent.
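Under the hood, `bot.py` is a standard Streamlit chat app. A simplified sketch is shown below; the real app also tracks a session id via `utils.get_session_id` and routes the prompt through the LangChain agent rather than echoing it.

```python
# Simplified Streamlit chat loop in the spirit of bot.py.
import streamlit as st

st.title("Complaint Analysis Assistant")

# Persist the conversation across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask about customer complaints..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # bot.py calls the LangChain agent here; a placeholder keeps the sketch runnable
    response = f"(agent response to: {prompt})"
    st.session_state.messages.append({"role": "assistant", "content": response})
    with st.chat_message("assistant"):
        st.markdown(response)
```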
- Python
- Neo4j: Graph database for structured data storage.
- LangChain: Framework for developing applications powered by language models.
- OpenAI API: For Large Language Model capabilities (GPT-3.5/GPT-4) and text embeddings.
- Streamlit: For building interactive web applications.
- python-dotenv: For managing environment variables.
- pandas: For data manipulation and CSV reading.
- More sophisticated sentiment and aspect extraction using fine-tuned LLMs or dedicated NLP models.
- Integration with real-time data streams for continuous graph updates.
- Advanced analytics and visualization of complaint patterns.
- Support for more diverse data sources and relationship types.
Here are some screenshots demonstrating the interactive Streamlit chat interface and its capabilities:









