GitHub - Misprect/fake-news-detection-NLP: A machine learning project that classifies news articles as Fake, Misleading, or Real using TF-IDF vectorization and a Passive Aggressive Classifier, with detailed evaluation metrics and inference confidence scores.

Fake News Detection using NLP & Machine Learning

Project Overview

This project detects Fake, Real, and Misleading news articles using Natural Language Processing (NLP) and classical Machine Learning techniques. The goal is a robust, explainable system that analyzes news text and predicts its authenticity together with a confidence score.

Key Features

Multi-class classification: Fake | Misleading | Real
TF-IDF based text vectorization (unigrams + bigrams)
Passive Aggressive Classifier for fast and scalable learning
Confidence-based inference for real-world usability
Confusion Matrix & detailed classification metrics
Clean, readable, and interview-ready code

Dataset

Source: Public Fake News dataset
Size: ~20,000 news articles
Columns:
- text → news content
- label → fake / misleading / real
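
A minimal sketch of loading and inspecting data with this two-column schema is shown below. The rows are invented stand-ins so the snippet runs on its own; the file name in the comment is taken from the repository's file listing, and the cleaning steps are assumptions rather than the notebook's actual code.

```python
import io

import pandas as pd

# Stand-in for the real CSV; these rows are invented for illustration.
csv_text = """text,label
"Scientists publish peer-reviewed study on vaccines",real
"Celebrity secretly replaced by a clone, insiders say",fake
"Study shows coffee cures all illness, headline claims",misleading
"City council approves new transit budget",real
"""

# With the real data this would be: pd.read_csv("fake_news_dataset.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Check that the expected columns are present before going further.
expected_columns = {"text", "label"}
missing = expected_columns - set(df.columns)
if missing:
    raise ValueError(f"dataset is missing columns: {sorted(missing)}")

# Drop incomplete rows and normalize label spelling (assumed cleaning steps).
df = df.dropna(subset=["text", "label"])
df["label"] = df["label"].str.strip().str.lower()

# Class balance matters later for stratified splitting and evaluation.
label_counts = df["label"].value_counts()
print(df.shape)
print(label_counts)
```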

Machine Learning Pipeline

Data Loading & Inspection
Train–Test Split (Stratified)
Text Vectorization using TF-IDF
Model Training (Passive Aggressive Classifier)
Model Evaluation (Accuracy, Precision, Recall, F1)
Visualization using Confusion Matrix
Confidence-based Prediction on new samples
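
The steps above can be sketched end to end with scikit-learn as follows. This is an illustrative outline, not the notebook's code: the toy corpus, `test_size`, `max_iter`, and `random_state` values are all assumptions, and newer scikit-learn releases may flag `PassiveAggressiveClassifier` as deprecated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy corpus, invented for illustration; the real dataset has ~20,000 rows.
fake = [
    "aliens secretly run the government",
    "miracle pill makes you immortal",
    "moon landing staged in a basement",
    "celebrity replaced by robot clone",
]
real = [
    "council approves annual transit budget",
    "central bank holds interest rates steady",
    "university publishes peer reviewed study",
    "court schedules hearing for next month",
]
misleading = [
    "coffee cures every disease headline claims",
    "crime doubled says chart with cropped axis",
    "study proves chocolate causes weight loss",
    "quote shared without its original context",
]
texts = fake + real + misleading
labels = ["fake"] * 4 + ["real"] * 4 + ["misleading"] * 4

# Stratified split keeps the class proportions equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF (unigrams + bigrams) feeding a Passive Aggressive Classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    PassiveAggressiveClassifier(max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)

# Evaluate with accuracy plus per-class precision, recall, and F1.
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
```

Wrapping the vectorizer and classifier in one pipeline ensures the TF-IDF vocabulary is learned from the training split only.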

Model Performance

Accuracy: ~90%
Strong performance on the Fake and Real classes
The Misleading class is harder to separate, which reflects real-world ambiguity in news classification
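
To show why per-class metrics matter alongside overall accuracy, the sketch below computes a confusion matrix, its row-normalized form, and per-class precision and recall. The labels and predictions are invented for illustration and are not the project's results.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

class_names = ["fake", "misleading", "real"]

# Invented ground truth and predictions, four articles per class.
y_true = ["fake"] * 4 + ["misleading"] * 4 + ["real"] * 4
y_pred = [
    "fake", "fake", "fake", "misleading",        # true class: fake
    "misleading", "misleading", "fake", "real",  # true class: misleading
    "real", "real", "real", "real",              # true class: real
]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=class_names)

# normalize="true" divides each row by its total, so the diagonal is recall.
cm_normalized = confusion_matrix(
    y_true, y_pred, labels=class_names, normalize="true"
)

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=class_names, zero_division=0
)

print(cm)
print(cm_normalized)
for name, p, r in zip(class_names, precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case the overall accuracy is 75%, yet recall for the misleading class is only 50%, which a single accuracy figure would hide.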

Sample Inference Output

Prediction: 🟥 Fake News | Confidence: 59.35%
Prediction: 🟨 Misleading News | Confidence: 57.38%
Prediction: 🟩 Real News | Confidence: 56.07%
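
This README does not say how the confidence values are computed. `PassiveAggressiveClassifier` exposes `decision_function` but no `predict_proba`, so one common heuristic, assumed here, is to pass the per-class decision scores through a softmax. The helper name `confidence_from_scores` and the example scores are hypothetical, and the result is an uncalibrated score rather than a true probability.

```python
import numpy as np


def confidence_from_scores(scores, class_names):
    """Turn per-class decision scores into (label, confidence) via softmax.

    Assumption: softmax is used as the confidence heuristic; the output is
    not a calibrated probability.
    """
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])


class_names = ["fake", "misleading", "real"]

# Hypothetical decision scores for one article, e.g. one row of
# model.decision_function([text]) from a fitted classifier.
label, confidence = confidence_from_scores([0.9, 0.1, -0.4], class_names)
print(f"Prediction: {label} | Confidence: {confidence:.2%}")
```

With three classes, a confidence just above 33% means the model barely prefers one label, so values in the 55 to 60% range indicate a moderate margin.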

Technologies Used

Python
Pandas, NumPy
Scikit-learn
Matplotlib, Seaborn
NLP (TF-IDF)

Key Learnings

How NLP converts text into numerical features
Handling multi-class classification problems
Importance of confidence scores in ML predictions
Evaluating models beyond accuracy
Understanding class imbalance and misleading content

Future Improvements

Handle class imbalance using resampling
Try transformer models (BERT)
Improve misleading news detection
Deploy as REST API using FastAPI
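
As one possible starting point for the resampling item above, the sketch below performs random oversampling with pandas: every class is topped up with randomly repeated rows until it matches the largest class. The tiny DataFrame and its class counts are invented for illustration; other strategies such as undersampling or class weights are equally valid.

```python
import pandas as pd

# Invented, deliberately imbalanced data: 5 real, 4 fake, 1 misleading.
df = pd.DataFrame(
    {
        "text": [f"article {i}" for i in range(10)],
        "label": ["real"] * 5 + ["fake"] * 4 + ["misleading"] * 1,
    }
)

# Size of the largest class, used as the target for every class.
target = df["label"].value_counts().max()

parts = []
for label, group in df.groupby("label"):
    parts.append(group)  # keep every original row
    deficit = target - len(group)
    if deficit > 0:
        # Sample with replacement to duplicate rows of the smaller class.
        parts.append(group.sample(n=deficit, replace=True, random_state=0))

# Concatenate and shuffle so duplicated rows are not grouped together.
balanced = (
    pd.concat(parts).sample(frac=1.0, random_state=0).reset_index(drop=True)
)
print(balanced["label"].value_counts())
```

Resampling should be applied to the training split only, after the train-test split, so duplicated rows never leak into the test set.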

Author

Aryaman Jain, Data Science Intern

