🧠 Amazon Kindle Real-Time Review Classifier 🚀

Data Engineering & AI Project

Real-time sentiment classification of Kindle reviews using a Kafka + Spark + LSTM (TensorFlow/Keras) + Cassandra pipeline.

🚀 Features

✅ Real-time streaming from Kafka
✅ Scalable, fault-tolerant pipeline using PySpark Structured Streaming
✅ LSTM deep learning model with 97.5% accuracy on unseen data
✅ Seamless integration with Apache Cassandra, a distributed NoSQL database

📸 Screenshots

🔹 Confusion matrix of the model

🔹 Kafka producer logs

🔹 Spark consumer logs

🔹 Cassandra target table

🛠️ Tech Stack

| Component         | Technology                        |
|-------------------|-----------------------------------|
| Ingestion         | Apache Kafka                      |
| Stream Processing | Apache Spark Structured Streaming |
| AI Model          | LSTM (Keras)                      |
| Database          | Apache Cassandra                  |
| Model Format      | `.h5` (Keras)                     |

⚙️ Pipeline Architecture

Kafka (Kindle reviews stream)
        ↓
Spark Structured Streaming
        ↓
Text Preprocessing + LSTM Sentiment Inference
        ↓
Apache Cassandra (target database)
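
To make the "Text Preprocessing" stage more concrete, here is a minimal pure-Python sketch of how a review could be turned into a fixed-length integer sequence before it reaches the LSTM. It is not the project's code: the vocabulary, `MAX_LEN`, `OOV_ID` and the helper names `clean` and `encode` are assumptions for illustration, and the real pipeline would load its mapping from `model/tokenizer.pkl`. The left-padding and left-truncation mirror the defaults of Keras' `pad_sequences`.

```python
import re

MAX_LEN = 8   # assumed sequence length; the real value is fixed at training time
OOV_ID = 1    # assumed id for words missing from the vocabulary
PAD_ID = 0    # id used for padding


def clean(text):
    """Lowercase the text, drop punctuation, and split on whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def encode(tokens, vocab, max_len=MAX_LEN):
    """Map tokens to ids, keep the last max_len ids, and left-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens]
    ids = ids[-max_len:]
    return [PAD_ID] * (max_len - len(ids)) + ids


# Toy vocabulary (hypothetical); the misspelled "wonderfull" falls back to OOV_ID.
vocab = {"the": 2, "book": 3, "was": 4, "wonderful": 5}
tokens = clean("The book was wonderfull!")
encoded = encode(tokens, vocab)
print(encoded)
```

The resulting list has the shape a Keras model expects for a single sample once it is wrapped in a batch dimension.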

👨🏻‍💻 Project Structure

├── data/                    
├── model/
│   ├── model.h5                # Trained LSTM model
│   ├── model_creation.ipynb    # Model creation notebook
│   └── tokenizer.pkl           # Tokenizer for text preprocessing
├── src/
│   ├── spark_consumer.py      
│   ├── kafka_producer.py      
│   └── download_data.py            
├── requirements.txt            # Python dependencies
├── docker-compose.yml             
├── checkpoint.txt
└── README.md                   # You're here!

🧪 Example


Before ->

| reviewID  | reviewerName |  review_text                | reviewTime |
|-----------|--------------|-----------------------------|------------|
| 123abc    | Hamza        | The book was wonderfull!    |  1-18-2013 |

After ->

| reviewID  | reviewerName |  review_text                | sentiment  | reviewTime |
|-----------|--------------|-----------------------------|------------|------------|
| 123abc    |  Hamza       | The book was wonderfull!    | Positive   |  1-18-2013 |
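
The before/after tables above can be sketched as a small row transformation. This is an illustration under assumptions, not the project's implementation: `fake_predict` is a hypothetical stand-in for the LSTM, the 0.5 threshold is assumed, and the `Negative` label is assumed as the counterpart of the `Positive` label shown in the table.

```python
def label_from_score(score, threshold=0.5):
    """Turn a model score in [0, 1] into a sentiment label (threshold is assumed)."""
    return "Positive" if score >= threshold else "Negative"


def enrich(row, predict):
    """Return a copy of the row with a sentiment column placed before reviewTime."""
    return {
        "reviewID": row["reviewID"],
        "reviewerName": row["reviewerName"],
        "review_text": row["review_text"],
        "sentiment": label_from_score(predict(row["review_text"])),
        "reviewTime": row["reviewTime"],
    }


def fake_predict(text):
    """Hypothetical stand-in for the LSTM; the real model is loaded from model.h5."""
    return 0.9 if "wonderful" in text.lower() else 0.1


row = {
    "reviewID": "123abc",
    "reviewerName": "Hamza",
    "review_text": "The book was wonderfull!",
    "reviewTime": "1-18-2013",
}
enriched = enrich(row, fake_predict)
print(enriched)
```

Passing the predictor in as an argument keeps the row logic testable without loading TensorFlow.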

🔧 Setup Instructions

📦 Install dependencies

pip install -r requirements.txt

🐳 Start the containers

docker-compose up -d

🚀 Launch the Kafka producer

python src/kafka_producer.py
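
For readers curious about what a producer of this kind has to do before sending, here is a rough sketch of the serialization step: each review row becomes a key/value pair of bytes. It makes assumptions that the source does not confirm: that the reviews are available as CSV with the columns from the example table, that messages are JSON-encoded, and that `reviewID` is used as the message key. The function name `review_messages` is hypothetical.

```python
import csv
import io
import json


def review_messages(csv_text):
    """Yield (key, value) byte pairs, one per review row in the CSV text."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["reviewID"].encode("utf-8")
        value = json.dumps(row, ensure_ascii=False).encode("utf-8")
        yield key, value


# Sample input using the columns from the example table (CSV format is assumed).
sample = (
    "reviewID,reviewerName,review_text,reviewTime\n"
    "123abc,Hamza,The book was wonderfull!,1-18-2013\n"
)
messages = list(review_messages(sample))
for key, value in messages:
    print(key, value)
```

With a client such as kafka-python, each pair could then be handed to `KafkaProducer.send(topic, key=key, value=value)`; the topic name depends on how the project is configured.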

🔄 Launch the Spark Structured Streaming job

python src/spark_consumer.py

📊 Monitor Cassandra

docker exec -it cassandra cqlsh

-- Query the reviews table
SELECT * FROM kindle_reviews.reviews;

Repository: SakkoumHamza/realtime-review-classifier

Folders and files

Latest commit

History

Repository files navigation

🧠 Amazon Kindle Real-Time Review Classifier 🚀

🚀 Functionalities

📸 Screenshots

🔹 Confusion matrix of the model

🔹 Kafka producer logs

🔹 Spark consumer logs

🔹 Cassandra target table

🛠️ Tech Stack

⚙️ Pipeline Architecture

👨🏻‍💻 Structure de projet

🧪 Exemple

🔧 Setup Instructions

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages