
Sentiment Analysis

Austin Cullar edited this page Oct 12, 2024 · 2 revisions

Overview

Astro uses sentiment analysis to evaluate how positive or negative each comment is. This is currently done with Python's NLTK (Natural Language Toolkit) module, which provides access to SentiWordNet. The code in sentiment.py was partly informed by an article written by "AI & Tech by Nidhika, PhD".

How it works

Sentiment analysis works by assigning positive and negative weights to words. When analyzing a string such as "This is amazing!", the text is split into tokens and a "synset" is identified for each token. A synset is a set of synonymous words that share one meaning, and SentiWordNet stores positive, negative, and objective scores for it. So "This" and "is" would likely be scored as "objective" (carrying no real positive/negative sentiment value), while "amazing" would be matched to a synset with a positive weight, possibly one it shares with a synonym such as "good", and that weight would be applied.
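
The weighting idea above can be sketched in a few lines. This is a minimal illustration and not the code in sentiment.py: `TOY_LEXICON`, its weights, `tokenize`, and `score_text` are all hypothetical stand-ins, and the lexicon is keyed by plain words, whereas SentiWordNet attaches its scores to synsets (in NLTK these are reached through `nltk.corpus.sentiwordnet`).

```python
import re

# Hypothetical toy lexicon: word -> (positive weight, negative weight).
# The weights are made up for illustration; SentiWordNet stores its
# scores per synset, not per raw word.
TOY_LEXICON = {
    "this": (0.0, 0.0),      # objective: no sentiment either way
    "is": (0.0, 0.0),        # objective
    "amazing": (0.75, 0.0),
    "good": (0.625, 0.0),
    "terrible": (0.0, 0.75),
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text):
    """Return (net score, number of tokens found in the lexicon)."""
    total = 0.0
    matched = 0
    for token in tokenize(text):
        if token in TOY_LEXICON:
            pos, neg = TOY_LEXICON[token]
            total += pos - neg  # positive weight minus negative weight
            matched += 1
    return total, matched


print(score_text("This is amazing!"))
```

In this sketch "This" and "is" are matched but contribute nothing, so the net score comes entirely from "amazing".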

Limitations

Sentiment analysis is not always accurate, because natural language is hard to parse, particularly when it includes slang or malformed sentences. I plan to experiment with other sentiment analysis modules in later releases (such as Python's pattern module) to see whether they are more accurate for my purposes.

Example of challenges

During a test run, one comment I pulled from a video said in part "Giants dorp keys ...", which contains the typo "dorp". SentiWordNet doesn't handle typos well, so the sentiment information I got from this comment was based only on the rest of the comment text following the typo. A user then replied to this comment with the string "dorp lol", which SentiWordNet could not process at all, so that reply was assigned no sentiment value.
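
The behavior described here, where unknown tokens are skipped and a comment with no known tokens gets no value, can be sketched as follows. This is an assumption-laden illustration, not the actual code: `KNOWN_WEIGHTS`, its values, and `comment_sentiment` are hypothetical, and "lol" is left out of the toy lexicon to mirror what happened in the test run.

```python
import re

# Hypothetical toy lexicon of net weights (positive minus negative).
# "dorp" and "lol" are deliberately absent, as in the test run above.
KNOWN_WEIGHTS = {
    "great": 0.75,
    "awful": -0.75,
    "keys": 0.0,
}


def comment_sentiment(text):
    """Average the weights of known tokens; return None if none are known."""
    tokens = re.findall(r"[a-z']+", text.lower())
    known = [KNOWN_WEIGHTS[t] for t in tokens if t in KNOWN_WEIGHTS]
    if not known:
        # Nothing in the comment could be scored, so report "no value"
        # instead of a misleading neutral 0.0.
        return None
    return sum(known) / len(known)


print(comment_sentiment("dorp lol"))     # no known tokens
print(comment_sentiment("great dorp"))   # typo skipped, "great" still counts
```

Returning `None` rather than `0.0` keeps "could not be analyzed" distinct from "analyzed and found neutral", which matters if scores are later averaged across comments.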

Aspirations

It would be interesting to see whether other sentiment analysis modules perform better at identifying specific types of sentiment, such as political sentiment, irony, or sarcasm. In the future I would also like to try a hybrid approach to sentiment analysis, combining results from a few different modules.
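
One possible shape for such a hybrid approach is sketched below. Nothing here exists in Astro yet: `hybrid_sentiment` and the two stand-in analyzers are hypothetical, and a plain average is only one of several ways the results could be combined.

```python
from statistics import mean


def hybrid_sentiment(text, analyzers):
    """Average the scores of several analyzers, ignoring any that return None."""
    scores = [analyze(text) for analyze in analyzers]
    usable = [s for s in scores if s is not None]
    return mean(usable) if usable else None


# Hypothetical stand-ins for real sentiment modules. Each takes a string
# and returns a score in [-1.0, 1.0], or None if it cannot score the text.
def analyzer_a(text):
    return 0.5 if "amazing" in text.lower() else None


def analyzer_b(text):
    return 0.25 if text.strip() else None


print(hybrid_sentiment("This is amazing!", [analyzer_a, analyzer_b]))
```

Skipping `None` results means one module failing on slang or a typo does not drag the combined score toward zero.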
