Text Classification Project: Sentiment Analysis of Movie Reviews

Introduction

Welcome to the Text Classification project focused on Sentiment Analysis of Movie Reviews! In a period of global change and economic uncertainty, the film industry faces serious challenges. This Jupyter Notebook applies sentiment analysis to movie reviews, with the aim of providing insights that can help the film industry adapt and succeed.

Objective

The primary objective is to perform sentiment analysis with text classification models and to compare the effectiveness of different algorithms. The task is to classify movie reviews into two classes: positive or negative sentiment. Although the approach is not novel, the results can inform filmmakers, production companies, and other stakeholders when they make decisions about content creation, marketing, and audience engagement.

Implementation

The project proceeds through data cleaning, simple textual analysis, and the construction of multiple machine learning classification models. The models are trained on the Large Movie Review Dataset, a balanced set of 50,000 reviews. Accuracy, precision, recall, and F1-score are used to identify the best-performing model.

Choice of Dataset

The Large Movie Review Dataset, published by the Stanford Artificial Intelligence Laboratory, was chosen for its size, diversity, and credibility. It consists of 25,000 training and 25,000 test reviews and provides a balanced representation of positive and negative sentiments.

Data Exploration

Rating Distribution

Description: This image depicts the distribution of ratings for the movie reviews in the dataset.
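The ratings behind this distribution can be recovered from the raw dataset itself: the official archive stores each labelled review as `<split>/<pos|neg>/<id>_<rating>.txt`, with a 1-10 star rating in the file name. The sketch below assumes the archive has been extracted to a local `aclImdb/` directory; the function names `load_imdb_split` and `rating_distribution` are hypothetical, and the notebook may instead read the repository's CSV files.

```python
from collections import Counter
from pathlib import Path


def load_imdb_split(root, split):
    """Read labelled reviews from <root>/<split>/{pos,neg}/<id>_<rating>.txt."""
    records = []
    for label in ("pos", "neg"):
        for path in sorted(Path(root, split, label).glob("*.txt")):
            # File names follow the pattern <id>_<rating>.txt, e.g. 0_9.txt
            review_id, rating = path.stem.split("_")
            records.append({
                "id": int(review_id),
                "rating": int(rating),
                "label": 1 if label == "pos" else 0,  # 1 = positive, 0 = negative
                "text": path.read_text(encoding="utf-8"),
            })
    return records


def rating_distribution(records):
    """Count how many reviews carry each star rating, sorted by rating."""
    return dict(sorted(Counter(r["rating"] for r in records).items()))
```

For example, `rating_distribution(load_imdb_split("aclImdb", "train"))` returns a rating-to-count mapping that can be plotted as a bar chart. Because the dataset excludes neutral reviews, ratings 5 and 6 should not appear in the labelled splits.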

Words Distribution Before Stopword Removal

Description: The distribution of words before stopword removal, lemmatization, expansion of contractions to their full form, and removal of the identified features.
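The cleaning steps named in this caption can be sketched as a single function. This is a minimal illustration, not the notebook's pipeline: the contraction map and stopword set are small hypothetical subsets, `<br />` stripping is included because IMDB reviews contain those HTML line breaks, and lemmatization is left out to keep the sketch dependency-free (NLTK's `WordNetLemmatizer` is one common choice for that step).

```python
import re

# Illustrative subsets only (assumption): a real pipeline would use a full
# contraction map and a complete stopword list such as NLTK's.
CONTRACTIONS = {
    "can't": "cannot", "won't": "will not", "isn't": "is not",
    "don't": "do not", "doesn't": "does not", "it's": "it is", "i'm": "i am",
}
# Negations such as "not" are deliberately kept because they flip sentiment.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "this",
    "that", "i", "am", "was", "do", "does", "will",
}
TOKEN_RE = re.compile(r"[a-z']+")


def preprocess(text):
    """Lowercase, strip <br /> tags, expand contractions, drop stopwords."""
    text = text.lower().replace("<br />", " ")
    tokens = []
    for tok in TOKEN_RE.findall(text):
        # Expand a contraction into its full form, e.g. "isn't" -> "is not"
        tokens.extend(CONTRACTIONS.get(tok, tok).split())
    return [tok for tok in tokens if tok not in STOPWORDS]
```

For example, `preprocess("This movie isn't good.<br />I can't recommend it!")` yields `["movie", "not", "good", "cannot", "recommend"]`. Keeping negations is a design choice: a stock stopword list would drop "not" and turn "not good" into "good".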

Comparison of Words Distribution After Stopword Removal

Description: This image compares the distribution of frequently used words after stopword removal and the other preprocessing steps.
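A before/after comparison like this one comes down to counting tokens twice, once with and once without a stopword filter. The sketch below is a hypothetical helper, `top_words`, built on `collections.Counter`; it tokenizes with a simple regular expression, which is an assumption rather than the notebook's tokenizer.

```python
import re
from collections import Counter


def top_words(docs, k=10, stopwords=frozenset()):
    """Return the k most frequent lowercase tokens across docs, skipping stopwords."""
    counts = Counter(
        tok
        for doc in docs
        for tok in re.findall(r"[a-z']+", doc.lower())
        if tok not in stopwords
    )
    return counts.most_common(k)
```

Calling `top_words(reviews)` and then `top_words(reviews, stopwords=...)` on the same reviews gives the two rankings to plot side by side; without a filter, the top of the list is typically dominated by function words such as "the".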

WordNet for Negative and Positive Sentiment

Description: WordNet visualization for negative and positive sentiments.

Machine Learning Models

The following models and text representations were employed for sentiment analysis:

Recurrent Neural Network (RNN) with L2 Regularization
Support Vector Machine (SVM)
Bag of Words
BERT by Google (Based on textattack/bert-base-uncased-imdb)
Term Frequency-Inverse Document Frequency (TF-IDF)
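Bag of Words and TF-IDF in the list above are ways of turning reviews into numeric feature vectors, which a classifier such as the SVM or the RNN can then consume. The sketch below computes both from already-tokenized documents using the plain formula tf-idf(t, d) = count(t, d) × ln(N / df(t)). The function names are hypothetical, and scikit-learn's `TfidfVectorizer` differs by default (it smooths the idf term and L2-normalizes each row), so its numbers will not match these exactly.

```python
import math
from collections import Counter


def bag_of_words(tokenized_docs):
    """Build a sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in tokenized_docs:
        vec = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


def tf_idf(tokenized_docs):
    """Weight raw counts by idf(t) = ln(N / df(t)); unsmoothed and unnormalized."""
    vocab, counts = bag_of_words(tokenized_docs)
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents containing each term
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[c * w for c, w in zip(vec, idf)] for vec in counts]
    return vocab, weighted
```

With the two documents `["good", "film"]` and `["bad", "film"]`, "film" appears everywhere and receives weight 0, while "good" and "bad" receive ln 2: TF-IDF down-weights words that do not help distinguish one review from another.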

Evaluation Methodology

The models are evaluated with accuracy, precision, recall, and F1-score. A confusion matrix gives a detailed breakdown of each model's predictions against the actual sentiments. Accuracy is the primary metric, which is meaningful here because the classes are balanced; the other metrics are considered during model optimization.
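These metrics all derive from the four cells of a binary confusion matrix. The following sketch, with hypothetical function names, assumes labels are encoded as 1 = positive and 0 = negative and returns the cells in the order (tn, fp, fn, tp), the same order scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` produces for 0/1 labels.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = positive, 0 = negative."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, guarding against division by zero."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For `y_true = [1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 0]`, the matrix is (2, 1, 1, 2), accuracy is 4/6, and precision, recall, and F1 are each 2/3.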

Results

The models developed in this project are compared with Google's BERT model, using a version fine-tuned on a similar dataset (textattack/bert-base-uncased-imdb).

Explore the Notebook

Explore the notebook to uncover insights into audience sentiment toward films. The approach also has potential for transfer learning to fields beyond movie reviews. Your journey into understanding and leveraging sentiment analysis begins here!

Repository contents:
.ipynb_checkpoints
images
LICENSE
README.md
RNN_tf_idf.h5
balanced_data.csv
sentiment_analysis.ipynb
test_data.csv
train_data.csv
trained_model_RNN.h5
trained_model_RNN_tf_idf.h5

Repository: Jieoi/Text_Classification