AI-powered complaint analysis system with Neo4j knowledge graph and GPT-based natural language querying.

ozeraysenur/review-graph-chatbot

Complaint Analysis System with Neo4j Knowledge Graph and LLMs

This project implements a complaint analysis system that uses a Neo4j Knowledge Graph for structured data storage and retrieval, and Large Language Models (LLMs), specifically OpenAI's GPT models, for natural language understanding, query generation, and semantic search.

Project Overview

The system is designed to process customer review data, extract key information such as sentiment and aspects, build a knowledge graph, and then allow users to query this graph using natural language to identify and analyze complaints.

Key Features

  • Knowledge Graph Construction: Ingests customer review data from a CSV, extracts entities (Users, Products, Reviews, Sentiments, Aspects, Scores), and establishes relationships within a Neo4j graph database.
  • Sentiment and Aspect Detection: Simple rule-based detection of sentiment (positive, negative, neutral) and aspects (taste, price, packaging, delivery, general) from review text.
  • OpenAI Embeddings: Generates vector embeddings for review texts to enable semantic search capabilities.
  • Cypher Query Generation: Utilizes LLMs to translate natural language questions into precise Cypher queries for retrieving structured information from the Neo4j graph.
  • Semantic Search (Retrieval-Augmented Generation): Combines vector search with knowledge graph traversal to provide relevant review snippets and context for user queries.
  • Enhanced Complaint Analysis: Provides specialized functionalities to identify and summarize complaints based on review scores, keywords, and specific aspects.
  • Conversational AI Agent: Integrates a LangChain-powered agent for interactive natural language conversations with the knowledge graph.
  • Streamlit User Interface: Provides a simple web-based chat interface for interacting with the AI assistant.
  • Environment Testing: Includes a utility to verify the correct setup of environment variables and connections to OpenAI and Neo4j.
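The rule-based sentiment and aspect detection listed above can be pictured as plain keyword matching. The following is a minimal sketch under that assumption; the keyword lists and the function names `detect_sentiment` and `detect_aspects` are illustrative and are not taken from create_kg.py.

```python
import re

# Hypothetical keyword lists; the project's actual rules may differ.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "delicious"}
NEGATIVE_WORDS = {"bad", "terrible", "awful", "stale", "disappointed"}
ASPECT_KEYWORDS = {
    "taste": {"taste", "flavor", "delicious", "stale"},
    "price": {"price", "expensive", "cheap", "cost"},
    "packaging": {"package", "packaging", "box", "damaged"},
    "delivery": {"delivery", "shipping", "arrived", "late"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def detect_sentiment(text: str) -> str:
    """Compare counts of positive and negative keywords."""
    words = tokenize(text)
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


def detect_aspects(text: str) -> list[str]:
    """Return every aspect with a matching keyword, or 'general' if none match."""
    words = set(tokenize(text))
    found = [aspect for aspect, keywords in ASPECT_KEYWORDS.items() if words & keywords]
    return found or ["general"]
```

A rule set like this is cheap and transparent, but it misses negation ("not good") and sarcasm, which is one reason the Future Enhancements section mentions model-based extraction.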

Project Structure

The project consists of the following Python files:

  • test_environment.py: Contains unit tests to ensure that the necessary environment variables (.env file, OpenAI API key, Neo4j connection details) are correctly configured and that connections to OpenAI and Neo4j are successful.
  • create_kg.py: This script is responsible for building the Neo4j Knowledge Graph. It reads review data from Reviews_10k.csv, performs sentiment and aspect detection, generates OpenAI embeddings for each review, and populates the Neo4j database with nodes (User, Product, Review, Sentiment, Aspect, Score) and relationships. It also creates a vector index for efficient similarity search.
  • simple_try.py: Demonstrates a basic natural language query interface. It uses LangChain's GraphCypherQAChain to convert user questions into Cypher queries and execute them against the Neo4j graph, providing direct answers.
  • retriever.py: Implements a semantic search retriever. It leverages Neo4j as a vector store and OpenAI embeddings to find semantically similar review chunks. It then uses a LangChain create_retrieval_chain to answer user questions based on the retrieved context, including review text and associated metadata.
  • query_kg.py: Provides an enhanced complaint analysis system. It features an improved Cypher generation template and a sophisticated QA template to summarize complaints, count reviews, and extract main themes. It also includes functions for direct queries to identify complaints based on low scores, specific keywords, or aspects like delivery and price.
  • llm.py: Defines and initializes the Large Language Model (LLM) and embedding models (OpenAI's ChatOpenAI and OpenAIEmbeddings) used throughout the project.
  • graph.py: Initializes and provides a Neo4jGraph object, establishing the connection to the Neo4j database.
  • vector.py: Contains the Neo4jVector setup for semantic search, including the definition of the vector index and the retrieval query for review chunks.
  • cypher.py: Implements the GraphCypherQAChain for converting natural language questions into Cypher queries and executing them against the Neo4j graph. It includes a detailed Cypher generation template.
  • utils.py: Provides utility functions, such as get_session_id for managing chat sessions in the Streamlit application.
  • agent.py: Implements the core conversational AI agent using LangChain's create_react_agent. It defines the tools available to the agent (General Chat, Reviews content search, Knowledge Graph information) and orchestrates the interaction between the user, LLM, and knowledge graph.
  • bot.py: Sets up the Streamlit web application, handling the chat interface, displaying messages, and invoking the AI agent for generating responses.

Setup and Installation

  1. Clone the repository:

    git clone <repository_url>
    cd <repository_directory>
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: `venv\Scripts\activate`
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up environment variables: Create a .env file in the root directory of the project with the following variables:

    OPENAI_API_KEY="your_openai_api_key_here"
    NEO4J_URI="bolt://"  # your Neo4j connection URI (AuraDB URIs typically use the neo4j+s:// scheme)
    NEO4J_USERNAME="neo4j"
    NEO4J_PASSWORD="your_neo4j_password_here"
    

    Replace the placeholder values with your actual OpenAI API key and Neo4j connection details.

  5. Prepare the data: Ensure you have a Reviews_10k.csv file in the data/ directory (or update the path in create_kg.py if it's located elsewhere). This CSV file should contain your customer review data with columns like Id, UserId, ProductId, Score, Summary, and Text.

  6. Start your Neo4j Database: Make sure your Neo4j instance is running and accessible at the NEO4J_URI specified in your .env file.
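Before building the graph, it can help to confirm that the CSV from step 5 has the expected header. The sketch below uses only the standard library `csv` module rather than pandas, and it treats the six columns named in step 5 as required, which is an assumption (the README only says "columns like"). The helper name `missing_columns` is hypothetical.

```python
import csv
import io

# Assumed to be required; step 5 lists these as typical columns.
REQUIRED_COLUMNS = {"Id", "UserId", "ProductId", "Score", "Summary", "Text"}


def missing_columns(csv_file) -> set[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return REQUIRED_COLUMNS - header


# Example with an in-memory file that lacks the Summary column.
sample = io.StringIO("Id,UserId,ProductId,Score,Text\n1,u1,p1,5,Great\n")
print(missing_columns(sample))  # {'Summary'}
```

For the real file, the same function can be called on `open("data/Reviews_10k.csv", newline="", encoding="utf-8")`.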

Usage

1. Test Your Environment

Before proceeding, run the environment tests to ensure everything is set up correctly:

python test_environment.py

This will verify your .env file and connections to OpenAI and Neo4j.
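The variable check part of that verification might look roughly like the sketch below. It is not the code in test_environment.py: the helper name `missing_env_vars` is hypothetical, it only checks that the four variables from the .env example are non-empty, and it does not attempt any connection to OpenAI or Neo4j.

```python
import os

# The four variables named in the .env example above.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    With python-dotenv installed, load_dotenv() would normally be called first
    so that values from the .env file are present in os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
```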

2. Create the Knowledge Graph

Run the create_kg.py script to populate your Neo4j database with the review data and build the knowledge graph:

python create_kg.py

This process will also create a vector index for embeddings.
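To give a feel for how one CSV row could become graph nodes and relationships, here is a hedged sketch. The node labels (User, Product, Review, Score) come from this README, but the relationship types (`WROTE`, `ABOUT`, `HAS_SCORE`), the property names, and the helper `review_params` are assumptions and may not match create_kg.py. Embedding generation and the vector index are left out.

```python
# Parameterized Cypher for a single review row.
# Relationship types and property names are illustrative assumptions.
MERGE_REVIEW_CYPHER = """
MERGE (u:User {id: $user_id})
MERGE (p:Product {id: $product_id})
MERGE (r:Review {id: $review_id})
SET r.text = $text, r.summary = $summary
MERGE (s:Score {value: $score})
MERGE (u)-[:WROTE]->(r)
MERGE (r)-[:ABOUT]->(p)
MERGE (r)-[:HAS_SCORE]->(s)
"""


def review_params(row: dict) -> dict:
    """Map one CSV row (keyed by column name) to the query parameters."""
    return {
        "user_id": row["UserId"],
        "product_id": row["ProductId"],
        "review_id": str(row["Id"]),
        "text": row["Text"],
        "summary": row.get("Summary", ""),
        "score": int(row["Score"]),
    }


# With the official neo4j Python driver, the pair would typically be passed
# to session.run(MERGE_REVIEW_CYPHER, review_params(row)).
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so rerunning the script does not duplicate users, products, or reviews.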

3. Query the Knowledge Graph

Basic QA (simple_try.py):

For a simple natural language interface to query the graph:

python simple_try.py

Type your questions at the 🧠 User: prompt. Type exit or quit to end the session.

Semantic Search (retriever.py):

To use the semantic search retriever for answering questions based on review content:

python retriever.py

Type your questions at the > prompt. Type exit to end the session.
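Under the hood, semantic search ranks reviews by how close their embedding vectors are to the embedding of the question. In this project that ranking is handled by the Neo4j vector index with OpenAI embeddings; the toy sketch below only illustrates the idea using cosine similarity (one common choice of metric) and hand-made three-dimensional vectors. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=2):
    """Return the texts of the k items most similar to the query.

    `items` is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy vectors standing in for real embeddings.
reviews = [
    ("late delivery", [1.0, 0.0, 0.0]),
    ("too expensive", [0.0, 1.0, 0.0]),
    ("arrived late and damaged", [0.9, 0.0, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], reviews))
```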

Enhanced Complaint Analysis (query_kg.py):

For detailed complaint analysis and direct queries:

python query_kg.py

This script provides options for direct queries (e.g., direct:low_scores, direct:delivery_issues) and also allows natural language questions for more complex complaint analysis. Type exit to quit.
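The split between `direct:` commands and free-form questions can be sketched as a small router. The command names `low_scores` and `delivery_issues` come from the examples above; the Cypher strings, the relationship types in them, the score threshold, and the function name `route_input` are assumptions made for illustration, not the contents of query_kg.py.

```python
# Hypothetical Cypher for each direct command. Labels follow this README;
# relationship types, properties, and the threshold are assumptions.
DIRECT_QUERIES = {
    "low_scores": (
        "MATCH (r:Review)-[:HAS_SCORE]->(s:Score) "
        "WHERE s.value <= 2 RETURN r.text LIMIT 10"
    ),
    "delivery_issues": (
        "MATCH (r:Review)-[:MENTIONS]->(:Aspect {name: 'delivery'}) "
        "RETURN r.text LIMIT 10"
    ),
}


def route_input(user_input: str):
    """Classify one line of input as exit, a direct query, or a question.

    Returns a (kind, payload) tuple.
    """
    text = user_input.strip()
    if text.lower() == "exit":
        return ("exit", None)
    if text.lower().startswith("direct:"):
        name = text[len("direct:"):].strip().lower()
        if name in DIRECT_QUERIES:
            return ("direct", DIRECT_QUERIES[name])
        return ("unknown_direct", name)
    # Anything else would be handed to the LLM-based Cypher generation chain.
    return ("natural_language", text)
```

Keeping the direct queries in a lookup table means they run without an LLM call, which makes them predictable and fast for the most common complaint checks.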

4. Run the Chatbot UI

To interact with the full AI-powered complaint analysis system via a web interface:

streamlit run bot.py

This will open a web browser with the chat interface, allowing you to ask questions and receive responses from the AI agent.

Technologies Used

  • Python
  • Neo4j: Graph database for structured data storage.
  • LangChain: Framework for developing applications powered by language models.
  • OpenAI API: For Large Language Model capabilities (GPT-3.5/GPT-4) and text embeddings.
  • Streamlit: For building interactive web applications.
  • python-dotenv: For managing environment variables.
  • pandas: For data manipulation and CSV reading.

Future Enhancements

  • More sophisticated sentiment and aspect extraction using fine-tuned LLMs or dedicated NLP models.
  • Integration with real-time data streams for continuous graph updates.
  • Advanced analytics and visualization of complaint patterns.
  • Support for more diverse data sources and relationship types.

Screenshots of the Running Application

Here are some screenshots demonstrating the interactive Streamlit chat interface and its capabilities:

[Screenshots 1 to 10: Streamlit chat interface]
