
Real-time sentiment analysis pipeline for Amazon Kindle reviews.


SakkoumHamza/realtime-review-classifier


🧠 Amazon Kindle Real-Time Review Classifier 🚀

Data Engineering & AI Project

Real-time sentiment classification of Kindle reviews using a Kafka + Spark + LSTM (TensorFlow/Keras) + Cassandra pipeline.

🚀 Features

✅ Real-time streaming from Kafka
✅ Scalable, fault-tolerant pipeline using PySpark Structured Streaming
✅ LSTM deep learning model with 97.5% accuracy on unseen data
✅ Seamless integration with Apache Cassandra, a distributed NoSQL database


📸 Screenshots

🔹 Confusion matrix of the model

🔹 Kafka producer logs

🔹 Spark consumer logs

🔹 Cassandra target table


🛠️ Tech Stack

| Component         | Technology                        |
|-------------------|-----------------------------------|
| Ingestion         | Apache Kafka                      |
| Stream Processing | Apache Spark Structured Streaming |
| AI Model          | LSTM (Keras)                      |
| Database          | Apache Cassandra                  |
| Model Format      | .h5 (Keras)                       |

⚙️ Pipeline Architecture

Kafka (Kindle reviews stream)
        ↓
Spark Structured Streaming
        ↓
Text Preprocessing + LSTM Sentiment Inference
        ↓
Apache Cassandra (target database)
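The four stages above could be wired together roughly as follows. This is a minimal sketch, not the contents of `src/spark_consumer.py`: the broker address, topic name, checkpoint directory, JSON message layout, and the `classify` callable are all assumptions. Only the keyspace and table names come from the `cqlsh` query in this README. It also assumes the Kafka source and the Spark Cassandra Connector packages are available to the Spark session.

```python
from typing import Callable, Dict, List

KAFKA_BOOTSTRAP = "localhost:9092"          # assumption
KAFKA_TOPIC = "kindle_reviews"              # assumption
CHECKPOINT_DIR = "/tmp/kindle_checkpoint"   # assumption
CASSANDRA_KEYSPACE = "kindle_reviews"       # from the cqlsh query in this README
CASSANDRA_TABLE = "reviews"                 # from the cqlsh query in this README


def kafka_source_options(bootstrap: str, topic: str) -> Dict[str, str]:
    """Options passed to Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def run_pipeline(classify: Callable[[List[str]], List[str]]) -> None:
    """Kafka -> Spark -> classify -> Cassandra. Blocks until the query stops."""
    # Imported lazily so this module can be imported without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("kindle-review-classifier").getOrCreate()

    # Assumed message layout: one JSON object per Kafka record.
    schema = StructType([
        StructField("reviewID", StringType()),
        StructField("reviewerName", StringType()),
        StructField("review_text", StringType()),
        StructField("reviewTime", StringType()),
    ])
    out_schema = StructType(schema.fields + [StructField("sentiment", StringType())])

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(KAFKA_BOOTSTRAP, KAFKA_TOPIC))
        .load()
    )
    reviews = raw.select(
        from_json(col("value").cast("string"), schema).alias("r")
    ).select("r.*")

    def write_batch(batch_df, batch_id):
        # Run inference on the driver for each micro-batch, then append the
        # labelled rows to Cassandra. Column names must match the target table.
        rows = batch_df.collect()
        if not rows:
            return
        labels = classify([r["review_text"] or "" for r in rows])
        labelled = [(*r, label) for r, label in zip(rows, labels)]
        (
            spark.createDataFrame(labelled, schema=out_schema)
            .write.format("org.apache.spark.sql.cassandra")
            .options(keyspace=CASSANDRA_KEYSPACE, table=CASSANDRA_TABLE)
            .mode("append")
            .save()
        )

    query = (
        reviews.writeStream.foreachBatch(write_batch)
        .option("checkpointLocation", CHECKPOINT_DIR)
        .start()
    )
    query.awaitTermination()
```

Running inference inside `foreachBatch` keeps the model on the driver, which is simple but limits throughput. A pandas UDF would distribute inference across executors at the cost of loading the model on each worker.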

👨🏻‍💻 Project Structure

├── data/                    
├── model/
│   ├── model.h5                # Trained LSTM model
│   ├── model_creation.ipynb    # Model creation notebook
│   └── tokenizer.pkl           # Tokenizer for text preprocessing
├── src/
│   ├── spark_consumer.py      
│   ├── kafka_producer.py      
│   └── download_data.py            
├── requirements.txt            # Python dependencies
├── docker-compose.yml             
├── checkpoint.txt
└── README.md                   # You're here!
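The two artifacts under `model/` could be loaded and used for inference along these lines. This is a sketch under several assumptions: that `tokenizer.pkl` is a pickled Keras `Tokenizer`, that the model has a single sigmoid output, and that sequences are padded to a hypothetical `MAX_LEN`. The real values live in `model_creation.ipynb`. The `clean_text` helper is illustrative only and may differ from the preprocessing used in training.

```python
import pickle
import re
from typing import List

MODEL_PATH = "model/model.h5"
TOKENIZER_PATH = "model/tokenizer.pkl"
MAX_LEN = 100  # assumption: must match the length used during training


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def load_artifacts():
    """Load the trained model and the fitted tokenizer from disk."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    from tensorflow.keras.models import load_model

    model = load_model(MODEL_PATH)
    with open(TOKENIZER_PATH, "rb") as fh:
        tokenizer = pickle.load(fh)
    return model, tokenizer


def predict_scores(model, tokenizer, texts: List[str]) -> List[float]:
    """Return one score per text, assuming a single sigmoid output unit."""
    # Older TensorFlow releases expose pad_sequences under
    # tensorflow.keras.preprocessing.sequence instead.
    from tensorflow.keras.utils import pad_sequences

    sequences = tokenizer.texts_to_sequences([clean_text(t) for t in texts])
    padded = pad_sequences(sequences, maxlen=MAX_LEN)
    return [float(p[0]) for p in model.predict(padded, verbose=0)]
```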

🧪 Example


Before:

| reviewID  | reviewerName | review_text                 | reviewTime |
|-----------|--------------|-----------------------------|------------|
| 123abc    | Hamza        | The book was wonderful!     | 1-18-2013  |

After:

| reviewID  | reviewerName | review_text                 | sentiment  | reviewTime |
|-----------|--------------|-----------------------------|------------|------------|
| 123abc    | Hamza        | The book was wonderful!     | Positive   | 1-18-2013  |
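The before/after transformation amounts to adding one `sentiment` field to each record. The sketch below illustrates that step with a stand-in scoring function. The 0.5 threshold, the `Negative` label, and `stub_score` are assumptions made for illustration, since the README only shows a `Positive` example.

```python
from typing import Callable, Dict


def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Map a model score in [0, 1] to a label (binary labels are assumed)."""
    return "Positive" if score >= threshold else "Negative"


def add_sentiment(
    review: Dict[str, str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Return a copy of the review with a 'sentiment' field added."""
    enriched = dict(review)
    enriched["sentiment"] = label_from_score(score_fn(review["review_text"]), threshold)
    return enriched


def stub_score(text: str) -> float:
    """Hypothetical stand-in for the LSTM model, for demonstration only."""
    return 0.9 if "wonderful" in text.lower() else 0.1


before = {
    "reviewID": "123abc",
    "reviewerName": "Hamza",
    "review_text": "The book was wonderful!",
    "reviewTime": "1-18-2013",
}
after = add_sentiment(before, stub_score)
print(after["sentiment"])
```

In the real pipeline, `score_fn` would wrap the trained LSTM rather than a keyword check.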

🔧 Setup Instructions

📦 Install dependencies

pip install -r requirements.txt

🐳 Start the containers

docker-compose up -d

🚀 Launch the Kafka producer

python src/kafka_producer.py
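For orientation, a producer of this kind might look like the sketch below. It is not the contents of `src/kafka_producer.py`. It assumes the `kafka-python` client, a broker on `localhost:9092`, a topic named `kindle_reviews`, and JSON-encoded messages, none of which are confirmed by this README.

```python
import json
import time
from typing import Dict, Iterable


def serialize_review(review: Dict[str, str]) -> bytes:
    """Encode one review as UTF-8 JSON for the Kafka message value."""
    return json.dumps(review, ensure_ascii=False).encode("utf-8")


def publish_reviews(
    reviews: Iterable[Dict[str, str]],
    topic: str = "kindle_reviews",       # assumption
    bootstrap: str = "localhost:9092",   # assumption
    delay_s: float = 0.0,
) -> int:
    """Send each review to Kafka and return the number of messages sent."""
    # Imported lazily so serialize_review can be used without kafka-python.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=serialize_review,
    )
    sent = 0
    try:
        for review in reviews:
            producer.send(topic, value=review)
            sent += 1
            time.sleep(delay_s)  # optional pacing to simulate a live stream
    finally:
        producer.flush()
        producer.close()
    return sent
```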

🔄 Launch the Spark Structured Streaming job

python src/spark_consumer.py

📊 Monitor Cassandra

docker exec -it cassandra cqlsh
-- Query the reviews table
SELECT * FROM kindle_reviews.reviews;
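The same table can also be inspected from Python. The sketch below assumes the DataStax `cassandra-driver` package, a node reachable on `127.0.0.1`, and a `sentiment` column as shown in the example table. It performs a full table scan, so it is only suitable for small demo datasets.

```python
from collections import Counter
from typing import Iterable


def count_sentiments(rows: Iterable) -> Counter:
    """Tally rows by their 'sentiment' attribute."""
    return Counter(row.sentiment for row in rows)


def fetch_sentiment_counts(
    host: str = "127.0.0.1",            # assumption
    keyspace: str = "kindle_reviews",   # from the cqlsh query in this README
) -> Counter:
    """Query the reviews table and count rows per sentiment label."""
    # Imported lazily so count_sentiments works without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster([host])
    try:
        session = cluster.connect(keyspace)
        rows = session.execute("SELECT sentiment FROM reviews")
        return count_sentiments(rows)
    finally:
        cluster.shutdown()
```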
