A reproducible notebook for preprocessing exploitative attack data from the CICDDoS2019 dataset.
This repo streamlines how I handle, clean, and structure this dataset for downstream ML/DL experiments on DDoS detection.
- Loads CICDDoS2019 exploitative attack flows
- Cleans the feature set and encodes categorical variables
- Preserves class proportions across train/test sets with stratified splits
- Provides baseline EDA plots and class stats
- Exports clean CSV for training anomaly detection models
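
The cleaning, encoding, and stratified-split steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `clean_flows` and `encode_and_split`, the column names, and the tiny synthetic frame are all assumptions made so the example runs without the real dataset. It also assumes the raw CSV headers may carry stray whitespace and that rate columns may contain infinite values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def clean_flows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy headers, drop unusable rows and duplicates."""
    out = df.copy()
    # Strip stray whitespace from column names (assumed to occur in raw CSVs).
    out.columns = out.columns.str.strip()
    # Treat +/-inf in numeric columns as missing, then drop incomplete rows.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].replace([np.inf, -np.inf], np.nan)
    out = out.dropna()
    return out.drop_duplicates().reset_index(drop=True)


def encode_and_split(df, label_col="Label", test_size=0.25, seed=42):
    """Hypothetical helper: integer-encode labels and split with stratification."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(df[label_col])
    X = df.drop(columns=[label_col])
    # stratify=y keeps each class's share the same in train and test.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    return X_train, X_test, y_train, y_test, encoder


# Synthetic stand-in for real flows: 18 rows, two of which are unusable.
raw = pd.DataFrame({
    " Flow Duration": list(range(1, 19)),
    "Flow Bytes/s": [np.inf] + [float(i) for i in range(2, 18)] + [np.nan],
    " Label": ["BENIGN"] * 9 + ["Syn"] * 9,
})

clean = clean_flows(raw)
X_train, X_test, y_train, y_test, encoder = encode_and_split(clean)
print(clean.shape, np.bincount(y_train), np.bincount(y_test))
```

Note that a stratified split preserves the class ratio rather than changing it. The cleaned frame could then be written out with `DataFrame.to_csv` for downstream training.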
Notebook:
notebooks/Preprocessing_Exploitative_Attack_Data_from_the_CICDDOS2019_Dataset.ipynb
- Source: CICDDoS2019 (Canadian Institute for Cybersecurity)
- Download the original dataset from CIC and place CSVs in data/
- Update notebook paths if needed
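
Once the CSVs are in place, loading them might look something like the sketch below. The function name `load_flows`, the `*.csv` pattern, and the throwaway files `a.csv` and `b.csv` are hypothetical and used only so the example runs on its own. In practice you would point it at your local `data/` directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_flows(data_dir, pattern="*.csv"):
    """Hypothetical helper: read every matching CSV and stack the rows."""
    paths = sorted(Path(data_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {data_dir}")
    # low_memory=False makes pandas infer each column's dtype from the whole file.
    frames = [pd.read_csv(p, low_memory=False) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Demonstration on temporary files so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame(
        {"Flow Duration": [1, 2], "Label": ["BENIGN", "Syn"]}
    ).to_csv(Path(tmp) / "a.csv", index=False)
    pd.DataFrame(
        {"Flow Duration": [3], "Label": ["UDP"]}
    ).to_csv(Path(tmp) / "b.csv", index=False)
    flows = load_flows(tmp)

print(flows.shape)
```

With the real dataset, the call would be `load_flows("data")`, adjusted to wherever the CSVs actually live.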
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Launch Jupyter
pip install notebook
jupyter notebook
- pandas
- numpy
- scikit-learn
- matplotlib
cicddos2019-preprocessing/
├─ notebooks/
│ └─ Preprocessing_Exploitative_Attack_Data_from_the_CICDDOS2019_Dataset.ipynb
├─ docs/
├─ scripts/
├─ .gitignore
├─ LICENSE
├─ README.md
└─ requirements.txt
- Add helper script to preprocess without Jupyter
- Add visualizations for class imbalance
- Add ML baselines on processed data
MIT © 2025 Henry Onyeka