A comprehensive sentiment analysis tool that scrapes customer reviews, analyzes sentiment using NLP and machine learning, and provides insights through an interactive dashboard with GPT integration.
- Web Scraping: Scrape reviews from Amazon, Flipkart, and other websites
- Text Processing: Clean, preprocess, and analyze text using NLTK
- Sentiment Analysis: Train custom ML models (Logistic Regression, Random Forest, SVM)
- Database Storage: Store data in SQLite or MongoDB
- Interactive Dashboard: Streamlit-based web interface
- GPT Integration: AI-powered insights and natural language queries
- Multilingual Support: Detect and translate non-English reviews
- Aspect-Based Analysis: Analyze specific aspects (delivery, quality, price, etc.)
- Real-time Visualization: Charts, word clouds, and trend analysis
- Natural Language Q&A: Ask questions about reviews in plain English
- Business Recommendations: AI-generated actionable insights
```bash
# Clone the repository
git clone <repository-url>
cd sentiment-analysis-tool

# Install dependencies
pip install -r requirements.txt

# Download NLTK data (will be done automatically on first run)
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('vader_lexicon')"
```

```bash
# Copy environment file
cp .env.example .env

# Edit .env file with your API keys
# Add your OpenAI API key for GPT features (optional)
OPENAI_API_KEY=your_api_key_here
```

```bash
# Run the demo workflow
python main.py

# Launch the interactive dashboard
streamlit run dashboard.py
```

- Key Metrics: Total reviews, average rating, sentiment distribution (see the dashboard sketch after this feature list)
- Visualizations: Pie charts, histograms, trend lines
- Word Clouds: Visual representation of positive/negative themes
- Recent Reviews: Latest review data
- Web Scraping: Automated review collection from e-commerce sites
- File Upload: Import CSV data
- Data Filtering: Filter by sentiment, source, rating
- Multiple Algorithms: Logistic Regression, Random Forest, SVM
- Hyperparameter Tuning: Automated optimization
- Model Evaluation: Accuracy metrics, confusion matrix
- Real-time Testing: Test model with custom text
- Review Summarization: AI-generated summaries
- Aspect Analysis: Focus on specific areas (delivery, quality, etc.)
- Natural Language Q&A: Ask questions in plain English
- Business Recommendations: Actionable insights
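The overview pieces above come down to a handful of Streamlit and Plotly calls. Here is a minimal sketch, assuming reviews have already been scraped and scored into a CSV with `rating` and `sentiment` columns; the file name and column names are illustrative, not the project's actual layout:

```python
import pandas as pd
import plotly.express as px
import streamlit as st

# Illustrative input: a DataFrame of already-analyzed reviews
df = pd.read_csv("reviews_scored.csv")

# Key metrics row
col1, col2, col3 = st.columns(3)
col1.metric("Total reviews", len(df))
col2.metric("Average rating", f"{df['rating'].mean():.2f}")
col3.metric("Positive share", f"{(df['sentiment'] == 'positive').mean():.0%}")

# Sentiment distribution pie chart
fig = px.pie(df, names="sentiment", title="Sentiment distribution")
st.plotly_chart(fig, use_container_width=True)
```

A page like this is what `streamlit run dashboard.py` serves in the quick-start commands above.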
```python
from scraper import ReviewScraper
scraper = ReviewScraper()
# Scrape Amazon reviews
amazon_url = "https://www.amazon.com/dp/PRODUCT_ID"
reviews_df = scraper.scrape_amazon_reviews(amazon_url, max_pages=5)
# Scrape Flipkart reviews
flipkart_url = "https://www.flipkart.com/product-name/p/PRODUCT_ID"
reviews_df = scraper.scrape_flipkart_reviews(flipkart_url, max_pages=3)
```
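Under the hood, this kind of scraping is a loop of fetching pages with `requests`, parsing them with BeautifulSoup, and applying CSS selectors, with a delay between requests for rate limiting. The sketch below shows the general pattern only; the function name, selector argument, and pagination parameter are illustrative assumptions, not the project's actual code:

```python
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

def scrape_reviews_generic(url, review_selector, max_pages=3, delay=2.0):
    """Collect review texts from a paginated listing using a CSS selector."""
    headers = {"User-Agent": "Mozilla/5.0 (compatible; review-scraper)"}
    rows = []
    for page in range(1, max_pages + 1):
        response = requests.get(url, params={"page": page}, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        for node in soup.select(review_selector):
            rows.append({"review_text": node.get_text(strip=True), "source": url})
        time.sleep(delay)  # be polite: pause between requests
    return pd.DataFrame(rows)
```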
```python
from text_processor import TextProcessor
processor = TextProcessor()
# Process single text
text = "This product is amazing! Great quality and fast delivery."
processed = processor.preprocess_text(text)
sentiment_scores = processor.get_sentiment_scores(processed)
# Process entire dataframe
df = processor.process_dataframe(reviews_df)
```
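The text processing stage follows the usual NLTK recipe (tokenize, drop stopwords and punctuation, lemmatize) and uses VADER for the sentiment scores, which is why `vader_lexicon` is downloaded during installation. This is a sketch of that general approach, not the exact `TextProcessor` internals:

```python
import string

import nltk
from nltk.corpus import stopwords
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.stem import WordNetLemmatizer

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
vader = SentimentIntensityAnalyzer()

def preprocess(text):
    """Lowercase, tokenize, remove stopwords and punctuation, lemmatize."""
    tokens = nltk.word_tokenize(text.lower())
    kept = [lemmatizer.lemmatize(t) for t in tokens
            if t not in stop_words and t not in string.punctuation]
    return " ".join(kept)

def score(text):
    """VADER returns pos/neu/neg scores plus a compound score in [-1, 1]."""
    scores = vader.polarity_scores(text)
    if scores["compound"] >= 0.05:
        scores["sentiment"] = "positive"
    elif scores["compound"] <= -0.05:
        scores["sentiment"] = "negative"
    else:
        scores["sentiment"] = "neutral"
    return scores
```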
```python
from sentiment_model import SentimentModel
model = SentimentModel()
# Prepare data
X, y = model.prepare_data(df)
# Train model
results = model.train_model(X, y, model_type='logistic')
# Make predictions
prediction = model.predict_single("Great product, highly recommend!")
```
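Training the custom models is the standard scikit-learn text-classification recipe: TF-IDF features feeding a classifier, with hyperparameters tuned by cross-validated grid search. The sketch below continues from the processed `df` above and assumes `processed_text` and `sentiment` column names plus an illustrative parameter grid:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X_train, X_test, y_train, y_test = train_test_split(
    df["processed_text"], df["sentiment"],
    test_size=0.2, stratify=df["sentiment"], random_state=42,
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Automated hyperparameter tuning via cross-validated grid search
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1_macro")
grid.fit(X_train, y_train)

predictions = grid.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
```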
```python
from gpt_integration import GPTAnalyzer
analyzer = GPTAnalyzer()
# Generate summary
summary = analyzer.summarize_reviews(reviews_list)
# Answer questions
answer = analyzer.answer_question(df, "What do people say about delivery?")
# Extract insights
insights = analyzer.extract_insights(df, aspect="quality")
```
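Behind question answering, the usual pattern is to pack a sample of review texts into a prompt and send it to the OpenAI chat completions API. Here is a minimal sketch using the official `openai` client; the model name, sample size, and prompt wording are assumptions, and `OPENAI_API_KEY` must be set in the environment:

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def answer_question(df, question, sample_size=50, model="gpt-4o-mini"):
    """Answer a plain-English question using a sample of review texts as context."""
    reviews = df["review_text"].dropna()
    sample = reviews.sample(min(sample_size, len(reviews)), random_state=0)
    context = "\n".join(f"- {text}" for text in sample)
    prompt = (
        "You are analyzing customer reviews. Answer the question using only "
        f"the reviews below.\n\nReviews:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```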
```sql
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY,
    rating REAL,
    review_text TEXT,
    processed_text TEXT,
    reviewer_name VARCHAR(255),
    date VARCHAR(100),
    source VARCHAR(100),
    product_url TEXT,
    language VARCHAR(10),
    sentiment VARCHAR(20),
    compound_score REAL,
    pos_score REAL,
    neu_score REAL,
    neg_score REAL,
    word_count INTEGER,
    char_count INTEGER,
    created_at DATETIME
);
```

```json
{
  "rating": 5.0,
  "review_text": "Great product!",
  "processed_text": "great product",
  "reviewer_name": "John Doe",
  "date": "2024-01-15",
  "source": "Amazon",
  "product_url": "https://example.com/product",
  "language": "en",
  "sentiment": "positive",
  "compound_score": 0.8,
  "pos_score": 0.9,
  "neu_score": 0.1,
  "neg_score": 0.0,
  "word_count": 2,
  "char_count": 14,
  "created_at": "2024-01-15T10:30:00Z"
}
```
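Writing an analyzed review into SQLite needs nothing beyond the standard-library sqlite3 module. A minimal sketch, assuming the reviews table from the schema above already exists and using the record from the document example; the database file name matches the SQLITE_DB_PATH default shown below:

```python
import sqlite3

record = {
    "rating": 5.0, "review_text": "Great product!", "processed_text": "great product",
    "reviewer_name": "John Doe", "date": "2024-01-15", "source": "Amazon",
    "product_url": "https://example.com/product", "language": "en",
    "sentiment": "positive", "compound_score": 0.8, "pos_score": 0.9,
    "neu_score": 0.1, "neg_score": 0.0, "word_count": 2, "char_count": 14,
    "created_at": "2024-01-15T10:30:00Z",
}

# Insert one row using named placeholders; the table must already exist.
with sqlite3.connect("sentiment_analysis.db") as conn:
    columns = ", ".join(record)
    placeholders = ", ".join(f":{key}" for key in record)
    conn.execute(f"INSERT INTO reviews ({columns}) VALUES ({placeholders})", record)
```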
- Database: SQLite path, MongoDB URI
- OpenAI: API key for GPT features
- Scraping: User agent, request delays
- Models: File paths for saved models
```
OPENAI_API_KEY=your_openai_api_key
MONGODB_URI=mongodb://localhost:27017/
SQLITE_DB_PATH=sentiment_analysis.db
```
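These values are typically read once at startup. A minimal sketch assuming python-dotenv (any .env loader works; the library choice is an assumption, with defaults matching the values above):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # pulls key=value pairs from .env into the process environment

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MONGODB_URI = os.getenv("MONGODB_URI", "mongodb://localhost:27017/")
SQLITE_DB_PATH = os.getenv("SQLITE_DB_PATH", "sentiment_analysis.db")
```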
The tool supports multiple ML algorithms:

- Logistic Regression: Fast, interpretable, good baseline
- Random Forest: Robust, handles non-linear patterns
- SVM: Effective for text classification
Typical performance metrics:
- Accuracy: 85-92% on balanced datasets
- Precision/Recall: Varies by sentiment class
- F1-Score: 0.85-0.90 average
- Amazon: Product reviews with ratings
- Flipkart: Product reviews and ratings
- Generic: Any website with CSS selectors
- Primary: English (full support)
- Multilingual: Auto-detection and translation (see the sketch after this list)
- Supported: Any language supported by TextBlob
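TextBlob is the library named here; its translation helpers have historically relied on an external Google endpoint and may not be available in newer releases. As an illustrative alternative only (not part of this project's stated stack), detection and translation can be sketched with langdetect and deep-translator:

```python
from deep_translator import GoogleTranslator
from langdetect import detect

def to_english(text):
    """Detect the review language and translate to English when needed."""
    language = detect(text)  # e.g. "en", "fr", "hi"
    if language != "en":
        text = GoogleTranslator(source="auto", target="en").translate(text)
    return language, text

print(to_english("Produit excellent, livraison rapide."))
```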
- Respect robots.txt: Check site policies
- Rate Limiting: Built-in delays between requests
- Legal Compliance: Ensure compliance with terms of service
- OpenAI: Requires API key and credits
- Rate Limits: Automatic handling of API limits
- Cost Management: Monitor usage for cost control
- Local Storage: Data stored locally by default
- No External Sharing: Reviews not shared externally
- Anonymization: Personal data can be anonymized
- Scraping Failures
  - Check internet connection
  - Verify URL format
  - Update CSS selectors if needed
- Model Training Errors
  - Ensure sufficient data (minimum 10 samples)
  - Check for missing values
  - Verify text preprocessing
- GPT Integration Issues
  - Verify OpenAI API key
  - Check API quota and billing
  - Handle rate limiting
- Database Errors
  - Check file permissions
  - Verify MongoDB connection
  - Handle concurrent access
- Large Datasets: Use batch processing (see the sketch after this list)
- Memory Usage: Process data in chunks
- Speed: Use appropriate model complexity
- Storage: Regular database maintenance
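For large review exports, the batch and chunking advice above maps directly onto pandas' chunked CSV reading. A sketch with an illustrative file name and a placeholder cleaning step:

```python
import pandas as pd

processed_chunks = []
# Read and process the file in fixed-size chunks instead of loading it all at once
for chunk in pd.read_csv("reviews_large.csv", chunksize=10_000):
    # Placeholder for the real preprocessing step
    chunk["processed_text"] = chunk["review_text"].astype(str).str.lower()
    processed_chunks.append(chunk)

df = pd.concat(processed_chunks, ignore_index=True)
```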
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- NLTK: Natural Language Toolkit
- scikit-learn: Machine Learning library
- Streamlit: Web app framework
- OpenAI: GPT integration
- BeautifulSoup: Web scraping
- Plotly: Interactive visualizations
For issues and questions:
- Check the troubleshooting section
- Search existing issues
- Create a new issue with details
- Include error messages and system info
Happy Analyzing! 🎯