Multiclass sentiment analysis on Amazon product reviews using lexicon-based and machine learning models, including AFINN, TF-IDF + XGBoost with nested cross-validation.

Keerthana4110/Amazon-Review-Sentiment-Analysis

Project Description

This project implements a complete multiclass sentiment analysis pipeline using Amazon product reviews. The objective is to classify customer reviews into three sentiment categories: Negative, Neutral, and Positive. The project is designed to demonstrate a structured and reproducible NLP workflow, combining traditional text processing techniques with supervised machine learning models.

The workflow begins with cleaning and preprocessing the raw review text, followed by feature extraction using TF-IDF vectorization. Two approaches are then compared: a lexicon-based baseline that scores sentiment with AFINN, and a stronger machine learning model that uses XGBoost for multiclass classification on the TF-IDF features.
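
The lexicon-based baseline can be sketched in a few lines. This is a minimal illustration, not the notebook's code: `TOY_LEXICON` is a hypothetical six-word stand-in for the real AFINN word list, and the `threshold` used to map a summed score to one of the three classes is an assumed cut-off.

```python
import re

# Hypothetical AFINN-style lexicon: word -> integer valence.
# The real AFINN list is far larger; these values are for illustration only.
TOY_LEXICON = {
    "great": 3, "love": 3, "good": 2,
    "bad": -2, "broken": -2, "terrible": -3,
}


def lexicon_score(text):
    """Sum the valence of every lexicon word found in the lowercased text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)


def score_to_label(score, threshold=1):
    """Map a summed score to a sentiment class (threshold is an assumption)."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


reviews = [
    "Great phone, I love it",
    "Terrible battery and a broken screen",
    "It arrived on Tuesday",
]
labels = [score_to_label(lexicon_score(review)) for review in reviews]
print(labels)
```

A baseline like this needs no training data, which is what makes it a useful reference point for the learned model.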

To obtain a reliable and unbiased performance estimate, the project applies nested cross-validation: inner folds are used for hyperparameter tuning and outer folds are used for model evaluation, so no data used to select hyperparameters is also used to score the selected model. Performance is assessed with accuracy, precision, recall, macro-averaged F1, confusion matrices, and ROC-AUC (where applicable).
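
The nested scheme can be sketched with scikit-learn by passing a `GridSearchCV` object to `cross_val_score`. Several things here are assumptions rather than the repository's settings: the twelve-review corpus is made up, the fold counts and parameter grid are arbitrary, and `GradientBoostingClassifier` stands in for XGBoost so that the example depends only on scikit-learn.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Made-up corpus: 4 reviews per class (0=Negative, 1=Neutral, 2=Positive).
texts = [
    "terrible product broke fast", "bad quality very poor",
    "awful waste of money", "poor build and bad battery",
    "it is okay nothing special", "average product works fine",
    "okay for the price", "fine but nothing special",
    "great product love it", "excellent quality very happy",
    "love it works great", "happy with this excellent buy",
]
y = [0] * 4 + [1] * 4 + [2] * 4

# TF-IDF lives inside the pipeline so it is refit on each training fold only.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # Stand-in for XGBoost (assumption); any scikit-learn style classifier fits here.
    ("clf", GradientBoostingClassifier(n_estimators=10, random_state=0)),
])
param_grid = {"clf__max_depth": [1, 2]}  # illustrative grid only

inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)

# Inner loop: hyperparameter search. Outer loop: score the tuned model.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="f1_macro")
outer_scores = cross_val_score(search, texts, y, cv=outer_cv, scoring="f1_macro")
print(outer_scores.mean())
```

Keeping the vectorizer inside the pipeline matters: fitting TF-IDF on the full dataset before splitting would leak vocabulary and document-frequency statistics from the evaluation folds into training.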

This project emphasizes good machine learning practices such as proper label encoding, stratified sampling, cross-validation, and clear separation of training and evaluation logic. The code is written in a modular and readable manner, making it suitable for academic submissions as well as real-world NLP applications.
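
As a rough sketch of the label encoding and stratified sampling mentioned above, assuming scikit-learn's `LabelEncoder` and `train_test_split` (the class counts and placeholder texts are invented for illustration):

```python
from collections import Counter

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Invented, imbalanced label distribution with placeholder review texts.
labels = ["Negative"] * 4 + ["Neutral"] * 2 + ["Positive"] * 6
texts = [f"review {i}" for i in range(len(labels))]

# LabelEncoder sorts class names, so Negative=0, Neutral=1, Positive=2.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# stratify=y keeps the class proportions the same in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.5, stratify=y, random_state=0
)

print(list(encoder.classes_))
print(Counter(y_test.tolist()))
```

Integer labels starting at zero are what gradient-boosting classifiers such as XGBoost's scikit-learn wrapper generally expect, and `encoder.inverse_transform` recovers the original class names for reporting.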

Overall, this repository shows how classical NLP methods can be systematically applied, evaluated, and compared for sentiment classification on large-scale text data.

Key Features

  • Multiclass sentiment classification (Negative / Neutral / Positive)
  • Text preprocessing and feature engineering using TF-IDF
  • Nested cross-validation for robust model evaluation
  • Hyperparameter tuning with GridSearchCV
  • XGBoost-based multiclass classification
  • Comprehensive evaluation using multiple performance metrics
  • Reproducible and well-structured Jupyter Notebook workflow
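
To make the macro-averaged F1 metric concrete, here is a small hand-rolled sketch. The labels and predictions are invented, and in practice scikit-learn's `f1_score(..., average="macro")` would be the usual choice.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, so every class counts equally."""
    f1_scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Invented example: six reviews, four classified correctly.
y_true = ["Negative", "Negative", "Neutral", "Positive", "Positive", "Positive"]
y_pred = ["Negative", "Neutral", "Neutral", "Positive", "Positive", "Negative"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, ["Negative", "Neutral", "Positive"])
print(round(accuracy, 3), round(score, 3))
```

Because each class contributes equally to the average, macro F1 penalizes a model that ignores a small class (often Neutral in review data) more than accuracy does.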

Technologies Used

  • Python
  • pandas, numpy
  • scikit-learn
  • XGBoost
  • Natural Language Processing (NLP)
  • Jupyter Notebook

Applications

  • Customer review sentiment analysis
  • Product feedback monitoring
  • Business intelligence and customer insights
  • Benchmarking NLP classification models
