This project implements a complete multiclass sentiment analysis pipeline using Amazon product reviews. The objective is to classify customer reviews into three sentiment categories: Negative, Neutral, and Positive. The project is designed to demonstrate a structured and reproducible NLP workflow, combining traditional text processing techniques with supervised machine learning models.
The workflow begins with cleaning and preprocessing the raw review text, followed by feature extraction using TF-IDF vectorization. Multiple baseline approaches are explored, including lexicon-based sentiment scoring and a stronger machine learning baseline that uses XGBoost for multiclass classification.
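As a rough illustration of the cleaning step and the lexicon-based baseline, the sketch below uses only the standard library. The cleaning rules, the tiny `LEXICON` dictionary, the `threshold` parameter, and the function names are all hypothetical and chosen for brevity. They are not taken from the notebook, and a real baseline would more likely rely on a published lexicon such as VADER or AFINN.

```python
import re

# Hypothetical toy lexicon for illustration only; not the project's lexicon.
LEXICON = {
    "great": 2, "love": 2, "excellent": 2, "good": 1,
    "bad": -1, "poor": -1, "terrible": -2, "broke": -2,
}


def clean_text(text: str) -> str:
    """Lowercase, strip HTML tags, and keep only letters and single spaces."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def lexicon_label(text: str, threshold: int = 1) -> str:
    """Sum word polarities and map the total to one of the three classes."""
    score = sum(LEXICON.get(token, 0) for token in clean_text(text).split())
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


reviews = [
    "Great product, I <b>love</b> it!",
    "Terrible quality. It broke after 2 days.",
    "It arrived on Tuesday.",
]
predictions = [lexicon_label(review) for review in reviews]
print(predictions)  # ['Positive', 'Negative', 'Neutral']
```

A baseline like this needs no training data, which is what makes it a useful reference point for the learned model.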
To reduce the optimistic bias that arises when the same data is used for both tuning and evaluation, the project applies nested cross-validation: inner folds are used for hyperparameter tuning, and outer folds are used only for model evaluation. Model performance is assessed using standard classification metrics: accuracy, precision, recall, macro-averaged F1, confusion matrices, and ROC-AUC (where applicable).
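The nested cross-validation scheme described above can be sketched with scikit-learn by passing a `GridSearchCV` object (inner loop) to `cross_val_score` (outer loop). This is a minimal sketch under stated assumptions, not the notebook's code: the toy corpus is made up, the fold counts and the `C` grid are arbitrary, and `LogisticRegression` is swapped in for XGBoost to keep the example lightweight. An `xgboost.XGBClassifier` could presumably take its place in the pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Made-up toy corpus: 6 reviews per class (0=Negative, 1=Neutral, 2=Positive).
texts = [
    "terrible product broke after one day",
    "very poor quality waste of money",
    "stopped working and support was awful",
    "bad fit and cheap material, disappointed",
    "awful smell, returned it immediately",
    "worst purchase, does not work at all",
    "it is okay, nothing special",
    "average product, works as described",
    "does the job, neither good nor bad",
    "acceptable for the price, just okay",
    "fine overall, some pros and some cons",
    "mediocre, it is an average item",
    "excellent quality, love it",
    "great product, works perfectly",
    "love this, highly recommend",
    "fantastic value and great build quality",
    "perfect fit, very happy with it",
    "amazing, excellent and works great",
]
labels = [0] * 6 + [1] * 6 + [2] * 6

# Keeping TF-IDF inside the pipeline means its vocabulary and IDF weights
# are refit on each training fold, so no test-fold text leaks into them.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),  # stand-in for XGBoost
])
param_grid = {"clf__C": [0.1, 1.0, 10.0]}  # arbitrary illustrative grid

inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Inner loop: choose hyperparameters using macro-averaged F1.
search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=inner_cv)

# Outer loop: score the tuned model on folds the search never saw.
outer_scores = cross_val_score(search, texts, labels, scoring="f1_macro", cv=outer_cv)
print("macro-F1 per outer fold:", outer_scores)
print("mean macro-F1:", outer_scores.mean())
```

The mean of the outer-fold scores is the performance estimate. The hyperparameters selected inside each outer fold may differ, which is expected, because the procedure evaluates the tuning process as a whole rather than one fixed configuration.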
This project emphasizes good machine learning practices such as proper label encoding, stratified sampling, cross-validation, and clear separation of training and evaluation logic. The code is modular and readable, which makes it suitable for academic submissions as well as real-world NLP applications.
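To make label encoding and stratified sampling concrete, here is a small sketch using scikit-learn's `LabelEncoder` and `train_test_split`. The star-rating thresholds in `rating_to_sentiment` are an assumption (1-2 Negative, 3 Neutral, 4-5 Positive), since the mapping used by the project is not stated here, and the ratings themselves are made up.

```python
from collections import Counter

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def rating_to_sentiment(stars: int) -> str:
    """Assumed mapping (not confirmed by the project): 1-2 / 3 / 4-5 stars."""
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"


# Made-up, imbalanced ratings: 20 Positive, 5 Neutral, 5 Negative.
ratings = [5] * 12 + [4] * 8 + [3] * 5 + [2] * 2 + [1] * 3
sentiments = [rating_to_sentiment(r) for r in ratings]

# LabelEncoder sorts class names, so Negative=0, Neutral=1, Positive=2.
encoder = LabelEncoder()
y = encoder.fit_transform(sentiments)

# stratify=y keeps the class proportions the same in both splits.
indices = list(range(len(y)))
train_idx, test_idx, y_train, y_test = train_test_split(
    indices, y, test_size=0.2, stratify=y, random_state=42
)

print("classes:", list(encoder.classes_))
print("train counts:", Counter(y_train.tolist()))
print("test counts:", Counter(y_test.tolist()))
```

Without stratification, a small test split of imbalanced data can end up with very few or no Neutral or Negative reviews, which makes per-class metrics such as macro-averaged F1 unreliable.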
Overall, this repository shows how classical NLP methods can be systematically applied, evaluated, and compared for sentiment classification on large-scale text data.
Key features:

- Multiclass sentiment classification (Negative / Neutral / Positive)
- Text preprocessing and feature engineering using TF-IDF
- Nested cross-validation for robust model evaluation
- Hyperparameter tuning with GridSearchCV
- XGBoost-based multiclass classification
- Comprehensive evaluation using multiple performance metrics
- Reproducible and well-structured Jupyter Notebook workflow

Technologies:

- Python
- pandas, numpy
- scikit-learn
- XGBoost
- Natural Language Processing (NLP)
- Jupyter Notebook

Use cases:

- Customer review sentiment analysis
- Product feedback monitoring
- Business intelligence and customer insights
- Benchmarking NLP classification models