
hconyeka/cicddos2019-preprocessing

CICDDoS2019 — Exploitative Attack Data Preprocessing & Analysis

A reproducible notebook for preprocessing exploitative attack data from the CICDDoS2019 dataset.
This repo streamlines how I handle, clean, and structure this dataset for downstream ML/DL experiments on DDoS detection.

Highlights

  • Loads CICDDoS2019 exploitative attack flows
  • Cleans feature set, encodes categorical variables
  • Handles class distribution with stratified splits, which keep class proportions consistent across train and test sets
  • Provides baseline EDA plots and class stats
  • Exports a clean CSV for training anomaly-detection models
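The cleaning, encoding and splitting steps listed above can be sketched with pandas and scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's exact code: the `Label` column name, the `ID_COLUMNS` list and both function names are hypothetical, and the whitespace and infinite-value handling assumes those issues show up in the raw CSVs.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical identifier-style columns to drop; adjust to the real CSV headers.
ID_COLUMNS = ["Unnamed: 0", "Flow ID", "Source IP", "Destination IP", "Timestamp"]


def clean_flows(df: pd.DataFrame, label_col: str = "Label") -> pd.DataFrame:
    """Return a cleaned copy of a flow table (sketch, not the notebook's exact steps)."""
    out = df.copy()
    # Strip stray whitespace from header names (assumed to occur in the raw CSVs).
    out.columns = out.columns.str.strip()
    # Drop identifier columns if present; errors="ignore" skips missing ones.
    out = out.drop(columns=ID_COLUMNS, errors="ignore")
    # Treat +/-inf (e.g. in rate features) as missing, then drop incomplete rows and duplicates.
    out = out.replace([np.inf, -np.inf], np.nan).dropna().drop_duplicates()
    # Normalise label strings as well.
    out[label_col] = out[label_col].astype(str).str.strip()
    return out.reset_index(drop=True)


def encode_and_split(df, label_col="Label", test_size=0.2, seed=42):
    """Label-encode the target, one-hot encode remaining text columns, stratified split."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(df[label_col])
    # get_dummies only expands object/category columns; numeric features pass through.
    X = pd.get_dummies(df.drop(columns=[label_col]), dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    return X_train, X_test, y_train, y_test, encoder
```

Note that `stratify=y` keeps each class's share roughly equal in the train and test sets; it does not rebalance the classes, so resampling or class weights would be a separate step.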

Notebook: notebooks/Preprocessing_Exploitative_Attack_Data_from_the_CICDDOS2019_Dataset.ipynb

Dataset

  • Source: CICDDoS2019 (Canadian Institute for Cybersecurity)
  • Download the original dataset from CIC and place the CSV files in a data/ folder
  • Update the file paths in the notebook if your layout differs
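A sketch for loading every CSV in data/ into a single DataFrame, assuming the files share a common header. `load_flows` and the `source_file` column are hypothetical names, and `nrows` is only a convenience for sampling large files while exploring.

```python
from pathlib import Path

import pandas as pd


def load_flows(data_dir="data", pattern="*.csv", nrows=None):
    """Concatenate every CSV matching pattern under data_dir (sketch)."""
    paths = sorted(Path(data_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {data_dir!r}")
    frames = []
    for path in paths:
        # low_memory=False makes pandas infer each column's dtype from the whole file.
        frame = pd.read_csv(path, nrows=nrows, low_memory=False)
        # Record which file each row came from (useful if files are split by attack type).
        frame["source_file"] = path.name
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```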

Environment (Python)

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Install and launch Jupyter (not in the requirements list below)
pip install notebook
jupyter notebook

Requirements

  • pandas
  • numpy
  • scikit-learn
  • matplotlib

Project Structure

cicddos2019-preprocessing/
├─ notebooks/
│  └─ Preprocessing_Exploitative_Attack_Data_from_the_CICDDOS2019_Dataset.ipynb
├─ docs/
├─ scripts/
├─ .gitignore
├─ LICENSE
├─ README.md
└─ requirements.txt

Roadmap

  • Add a helper script to run the preprocessing without Jupyter
  • Add visualizations for class imbalance
  • Add ML baselines on processed data
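For the first roadmap item, a command-line script could look roughly like the sketch below (for example saved as a hypothetical scripts/preprocess.py). The arguments, the `Label` default and the cleaning steps are assumptions for illustration, not the repo's implementation; it also prints per-class counts and shares, which are the raw input for a class-imbalance plot.

```python
import argparse

import numpy as np
import pandas as pd


def class_counts(df, label_col="Label"):
    """Absolute and relative class frequencies, largest class first."""
    counts = df[label_col].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Preprocess a CICDDoS2019 CSV without Jupyter (sketch)."
    )
    parser.add_argument("input", help="raw flow CSV")
    parser.add_argument("output", help="where to write the cleaned CSV")
    parser.add_argument("--label-col", default="Label")
    args = parser.parse_args(argv)

    df = pd.read_csv(args.input, low_memory=False)
    # Strip header whitespace, treat +/-inf as missing, drop incomplete and duplicate rows.
    df.columns = df.columns.str.strip()
    df = df.replace([np.inf, -np.inf], np.nan).dropna().drop_duplicates()
    df.to_csv(args.output, index=False)
    # Print class counts so imbalance is visible from the command line.
    print(class_counts(df, args.label_col).to_string())
    return df


if __name__ == "__main__":
    main()
```

Usage would be along the lines of `python scripts/preprocess.py data/raw.csv data/clean.csv`, with both paths hypothetical.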

License

MIT © 2025 Henry Onyeka

About

Data Preprocessing, Feature Selection and Data Analysis: Exploitative Data from the CIC-DDoS2019 Dataset