PhishGuard is an NLP-powered phishing detection model built using BERT.
It classifies text (e.g., emails, URLs, or messages) as phishing or legitimate.
The trained model is hosted on Hugging Face, while this repository contains the training pipeline, preprocessing scripts, and evaluation code.
The fine-tuned model is available on Hugging Face:
👉 [bert-phishing-detector](https://huggingface.co/Swathi37/bert-phishing-detector)
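You can load it directly in Python:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Download the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Swathi37/bert-phishing-detector")
model = AutoModelForSequenceClassification.from_pretrained("Swathi37/bert-phishing-detector")

text = "Your account has been suspended. Click here to verify."
inputs = tokenizer(text, return_tensors="pt", truncation=True)  # truncate long inputs

with torch.no_grad():  # inference only, no gradient tracking
    outputs = model(**inputs)

print(outputs.logits)  # raw, unnormalized score for each class
```

To set up the repository locally:

```bash
git clone https://github.com/SwathiPriya37/PhishGuard.git
cd PhishGuard
pip install -r requirements.txt
```

To fine-tune BERT on your phishing dataset: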
```bash
python src/train_bert.py
```

Evaluate the trained model:

```bash
python src/evaluate_model.py
```

To train on your own dataset, prepare a CSV file with the following format:
```csv
text,label
"Your account is locked. Verify now.",phishing
"Meeting is scheduled at 3 PM tomorrow.",legitimate
```

Then run:

```bash
python src/train_bert.py --data data/your_dataset.csv
```
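Before launching a long training run, it can help to sanity-check the CSV. The sketch below uses only Python's standard library to read a file in the `text,label` format shown above, reject empty texts and unknown labels, and map labels to integer class IDs. The `load_phishing_csv` helper and the `LABEL2ID` mapping (`legitimate` → 0, `phishing` → 1) are hypothetical; check `src/train_bert.py` for the mapping it actually uses.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Hypothetical label-to-ID mapping; src/train_bert.py may use a different one
LABEL2ID = {"legitimate": 0, "phishing": 1}


def load_phishing_csv(path: str | Path) -> tuple[list[str], list[int]]:
    """Read a text,label CSV and return the texts plus integer label IDs.

    Raises ValueError if the header is not exactly text,label, a row has
    empty text, or a label is not one of the keys in LABEL2ID.
    """
    texts: list[str] = []
    labels: list[int] = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or set(reader.fieldnames) != {"text", "label"}:
            raise ValueError(f"expected columns text,label, got {reader.fieldnames}")
        # Count data rows from 1 (the header row is not included)
        for row_no, row in enumerate(reader, start=1):
            # Missing fields come back as None, so fall back to ""
            text = (row["text"] or "").strip()
            label = (row["label"] or "").strip().lower()
            if not text:
                raise ValueError(f"row {row_no}: empty text")
            if label not in LABEL2ID:
                raise ValueError(f"row {row_no}: unknown label {label!r}")
            texts.append(text)
            labels.append(LABEL2ID[label])
    return texts, labels
```

For example, `texts, labels = load_phishing_csv("data/your_dataset.csv")` returns two parallel lists that can be passed to the tokenizer and used as training targets.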
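The loading example above prints raw logits. To turn one row of logits into a label and a confidence score, apply a softmax and take the highest-scoring class. The sketch below does this in plain Python so it can be checked without downloading the model. `logits_to_prediction` is a hypothetical helper, and the label names come from whatever `id2label` mapping you pass in. If no label names were set during fine-tuning, the checkpoint's `model.config.id2label` holds generic names such as `LABEL_0` and `LABEL_1`, so confirm which index means phishing before relying on the output.

```python
from __future__ import annotations

import math


def logits_to_prediction(logits: list[float], id2label: dict[int, str]) -> tuple[str, float]:
    """Return (label, probability) for the highest-scoring class.

    `logits` is a single row of model output, e.g. outputs.logits[0].tolist().
    """
    # Numerically stable softmax: subtract the largest logit before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (the first one wins on ties)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]
```

With the model and outputs from the loading example, this could be called as `label, p = logits_to_prediction(outputs.logits[0].tolist(), model.config.id2label)`.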
Pull requests are welcome! For major changes, please open an issue first to discuss what you’d like to add.
This project is licensed under the MIT License.
Developed by Swathi Priya R

Model: Swathi37/bert-phishing-detector