This project classifies news articles as Fake, Misleading, or Real using Natural Language Processing (NLP) and classical Machine Learning techniques. The goal is a robust, explainable, internship-ready system that analyzes news text and predicts its authenticity together with a confidence score.
- Multi-class classification: Fake | Misleading | Real
- TF-IDF based text vectorization (unigrams + bigrams)
- Passive Aggressive Classifier for fast and scalable learning
- Confidence-based inference for real-world usability
- Confusion Matrix & detailed classification metrics
- Clean, readable, and interview-ready code
- Source: Public Fake News dataset
- Size: ~20,000 news articles
- Columns:
  - text → news content
  - label → fake / misleading / real
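
A quick inspection of the two columns described above might look like the following sketch. The in-memory CSV and its three rows are hypothetical stand-ins, since the dataset's file name and contents are not given here; only the `text` and `label` column names come from the description.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real dataset file.
sample_csv = io.StringIO(
    "text,label\n"
    "Scientists confirm the moon is made of cheese,Fake\n"
    "New study proves coffee cures every illness,misleading\n"
    "City council approves the annual budget,real\n"
)
df = pd.read_csv(sample_csv)

# Basic inspection: size, missing values, and column names.
print(df.shape)
print(df.isna().sum())

# Normalise label spelling before counting, in case of stray case or spaces.
df["label"] = df["label"].str.strip().str.lower()

# The label distribution shows whether any class is under-represented.
label_counts = df["label"].value_counts()
print(label_counts)
```

Checking the label distribution early is useful because a small Misleading class would affect both the stratified split and the per-class metrics.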
- Data Loading & Inspection
- Train–Test Split (Stratified)
- Text Vectorization using TF-IDF
- Model Training (Passive Aggressive Classifier)
- Model Evaluation (Accuracy, Precision, Recall, F1)
- Visualization using Confusion Matrix
- Confidence-based Prediction on new samples
- Accuracy: ~90%
- Strong performance on Fake & Real classes
- Misleading class highlights real-world ambiguity in news classification
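
The workflow and evaluation steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration rather than the project's actual code: the toy corpus, the split size, and the `max_iter` and `random_state` values are assumptions made for the example. Only the TF-IDF unigram + bigram setting, the stratified split, and the Passive Aggressive Classifier come from the description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: one seed sentence per class, repeated with a small
# variation. The real project trains on ~20,000 labelled articles.
seeds = {
    "fake": "shocking miracle cure that doctors are hiding from you",
    "misleading": "study suggests coffee may possibly change everything",
    "real": "the city council approved the annual budget on monday",
}
texts, labels = [], []
for label, sentence in seeds.items():
    for i in range(6):
        texts.append(f"{sentence} update {i}")
        labels.append(label)

# Stratified split keeps the class proportions equal in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=6, stratify=labels, random_state=42
)

# TF-IDF over unigrams and bigrams feeding a Passive Aggressive Classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    PassiveAggressiveClassifier(max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)

# Evaluate with accuracy, per-class precision/recall/F1, and a confusion matrix.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, zero_division=0)
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)

print(f"Accuracy: {accuracy:.2f}")
print(report)
print(cm)
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only, so no information from the test articles leaks into the features.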
Prediction: 🟥 Fake News | Confidence: 59.35%
Prediction: 🟨 Misleading News | Confidence: 57.38%
Prediction: 🟩 Real News | Confidence: 56.07%
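
How these confidence percentages are computed is not stated here. scikit-learn's Passive Aggressive Classifier does not provide `predict_proba`, so one common approach, assumed in this sketch, is to apply a softmax to the per-class scores returned by `decision_function`. The helper name `softmax_confidence` and the example scores are hypothetical.

```python
import math


def softmax_confidence(scores, class_names):
    """Turn raw per-class decision scores into (predicted label, confidence)."""
    # Subtract the maximum score first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the one with the highest softmax probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical decision scores for one article, ordered like class_names.
class_names = ["Fake", "Misleading", "Real"]
label, confidence = softmax_confidence([0.9, -0.2, -0.4], class_names)
print(f"Prediction: {label} News | Confidence: {confidence:.2%}")
```

A softmax over margins gives a convenient relative score, not a calibrated probability. If calibrated probabilities matter, wrapping the classifier in scikit-learn's `CalibratedClassifierCV` is one option to explore.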
- Python
- Pandas, NumPy
- Scikit-learn
- Matplotlib, Seaborn
- NLP (TF-IDF)
- How NLP converts text into numerical features
- Handling multi-class classification problems
- Importance of confidence scores in ML predictions
- Evaluating models beyond accuracy
- Understanding class imbalance and misleading content
- Handle class imbalance using resampling
- Try transformer models (BERT)
- Improve misleading news detection
- Deploy as REST API using FastAPI
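
As an illustration of the resampling idea in the list above, the sketch below performs simple random oversampling in plain Python: every minority class is topped up with randomly repeated examples until it matches the largest class. The function name `random_oversample` and the sample data are hypothetical, and other strategies such as undersampling or class weights may suit the real dataset better.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=42):
    """Duplicate minority-class examples until all classes are the same size."""
    rng = random.Random(seed)

    # Group the texts by their label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical imbalanced training set: 4 fake, 1 misleading, 3 real.
texts = ["f1", "f2", "f3", "f4", "m1", "r1", "r2", "r3"]
labels = ["fake"] * 4 + ["misleading"] + ["real"] * 3

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

Resampling should be applied to the training split only, after the train-test split, so that duplicated articles never appear in the test set.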
Aryaman Jain, Data Science Intern