This project trains supervised ML models to classify encrypted VPN and non-VPN traffic from flow-level features.
- Dataset: ISCX VPN-nonVPN 2016 (ISCXVPN2016, often called CIC VPN 2016; VPN and non-VPN traffic with 14 application labels)
- Models:
  - Baseline: RandomForestClassifier
  - Boosted: HistGradientBoostingClassifier (tree-based boosting)
- Performance (test set):
  - Random Forest accuracy: ~0.90
  - Boosted model accuracy: ~0.90–0.91
  - Macro ROC-AUC: ~0.93
- Explainability: SHAP TreeExplainer to understand which flow features (duration, total bytes, inter-arrival time, etc.) drive predictions.
encrypted-traffic-classification/
├── notebooks/
│   └── 01_exploration.ipynb    # Data loading, RF + HGB models, confusion matrix, SHAP plots
├── reports/
│   ├── confusion_matrix_rf.png
│   └── confusion_matrix_hgb.png
├── requirements.txt            # Python dependencies
└── .gitignore
How to Run
# 1. Clone the repo
git clone https://github.com/Siddarthkutumbaka/Encrypted-Traffic-Classification-ML.git
cd Encrypted-Traffic-Classification-ML
# 2. Create virtualenv (optional but recommended)
python3 -m venv venv
source venv/bin/activate # (Mac/Linux)
# 3. Install dependencies
pip install -r requirements.txt
# 4. Open the notebook
jupyter notebook notebooks/01_exploration.ipynb
Key Results
• Roughly 0.90 test accuracy across 14 encrypted traffic classes (browsing, chat, VoIP, VPN subtypes, etc.).
• Confusion matrices show strong separation between VPN sub-classes.
• SHAP analysis highlights duration, total_biat, mean_biat, and flowPktsPerSecond, among others, as the most influential features.
Potential Future Work
• Deploy as an online classifier (REST API or streaming).
• Compare with deep learning models (1D CNN / LSTM on flow sequences).
• Enhance adversarial robustness and generalization to new VPN protocols.
4. Press **Cmd + S** to save.
(If you want, we can tweak the accuracy numbers later to match exactly what your notebook prints.)
---
## 3️⃣ (Optional but recommended) Add SHAP screenshot to `reports/`
If you want a SHAP image in the repo:
1. Open your notebook in VS Code (`notebooks/01_exploration.ipynb`).
2. Scroll to the SHAP summary plot.
3. Take a screenshot of the plot:
- Press **Cmd + Shift + 4** and drag around just the SHAP figure.
- It’ll save to your Desktop as something like `Screenshot ... .png`.
4. In **Finder**, find that screenshot on your Desktop and:
- Rename it to `shap_summary_rf.png`.
- Drag it into the **`reports`** folder inside `encrypted-traffic-classification` (in Finder).
We’ll reference it in the README later if you like.
---
## 4️⃣ Commit & push the changes
Back in **Terminal** (already in the project folder):
```bash
git status