A complete machine learning–based security system that detects whether an SQL query is normal or a SQL injection attack using feature engineering and an SVM classifier.
This project demonstrates an end-to-end ML pipeline: from dataset preparation to real-time prediction.
The system analyzes raw SQL queries, extracts statistical and structural features, and classifies them using a trained Support Vector Machine (SVM).
SQL Query
│
▼
Preprocessing
(cleaning & normalization)
│
▼
Feature Extraction
(length, keywords, symbols, digits, etc.)
│
▼
SVM Classifier
│
▼
Prediction
(Normal / SQL Injection)
sql-injection-attack-detection/
├── dataset/
│ ├── normal_queries.csv
│ └── sql_injection_queries.csv
├── src/
│ ├── preprocessing.py # Query cleaning
│ ├── feature_extraction.py # Feature engineering
│ ├── train_model.py # SVM training
│ ├── evaluate_model.py # Metrics & evaluation
│ └── predict.py # Real-time prediction demo
├── requirements.txt
├── README.md
└── LICENSE
# Clone repository
git clone https://github.com/ares-coding/sql-injection-attack-detection.git
cd sql-injection-attack-detection
# Install dependencies
pip install -r requirements.txtTrain the SVM model using the prepared dataset:
python src/train_model.pyThis will generate a trained model file:
models/svm_model.pkl
Evaluate the trained model using standard classification metrics:
python src/evaluate_model.pyMetrics included:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
Run the prediction script:
python src/predict.py' OR 1=1 --
Prediction: SQL Injection
Confidence: 0.97
Another example:
Input
SELECT * FROM users WHERE id = 5
Output
Prediction: Normal Query
Confidence: 0.94
The model is trained on handcrafted features extracted from SQL queries:
- Query length
- Number of SQL keywords
- Number of special characters
- Number of digits
- Whitespace count
These features help distinguish malicious patterns from legitimate queries.
Traditional rule-based systems struggle with:
- Obfuscated SQL injection
- New attack patterns
This ML-based approach generalizes better and adapts to unseen attacks.
This project is licensed under the MIT License.
Au Amores (ares-coding) Software Developer & AI Engineer
⭐ If you find this project useful, consider starring the repository.