## Table of Contents

- Project Overview
- Project Structure
- Installation
- Makefile Commands
- Modeling Details
- MLflow Tracking
- Notebooks
- Notes
- Reproducibility
- Authors & Contacts
- License
## Project Overview

This repository contains the full data science pipeline for preprocessing, modeling, evaluating, and explaining clinical outcomes related to laser circumcision procedures. It focuses specifically on predicting the `Bleeding_Edema_Outcome` complication using multiple supervised learning approaches. The workflow includes data cleaning, feature engineering, model training with different sampling strategies, evaluation, and SHAP-based explainability.
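The train-then-evaluate loop described above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual code: the real pipeline lives in `modeling/train.py` and reads the processed parquet features.

```python
# Minimal sketch of the train/evaluate loop (illustration only; the real
# pipeline in modeling/train.py reads the processed parquet features).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Stand-in for X.parquet / y_Bleeding_Edema_Outcome.parquet:
# an imbalanced binary-classification problem.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.85, 0.15], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]          # probability of complication
ap = average_precision_score(y_te, scores)        # the project's chosen metric
print(f"average_precision: {ap:.3f}")
```

Average precision is used (rather than accuracy) because the complication outcome is a minority class, where accuracy is easy to inflate by always predicting "no complication".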
## Project Structure

```
circ_milan/
├── assets/                  # Slide decks and static visuals
│   ├── CUT_MD.svg
│   └── my_slides.html
├── data/                    # Datasets at different stages
│   ├── external/            # Original source files
│   ├── raw/                 # Raw ingested data
│   │   └── Laser_Circumcision_Excel_31.03.2024.xlsx
│   ├── interim/             # Intermediate cleaned files
│   └── processed/           # Final data for modeling
│       ├── training/        # Training features and labels
│       │   ├── X.parquet
│       │   └── y_Bleeding_Edema_Outcome.parquet
│       └── inference/       # Inference features and outputs
│           ├── df_inference_process.parquet
│           └── X.parquet
├── images/                  # Exported plots and figures
│   └── figures/
├── mlruns/                  # MLflow tracking server backend logs
├── preprocessing/           # Data cleaning & feature engineering
│   ├── __init__.py
│   ├── preprocessing.py     # Cleans raw data and saves interim/processed
│   └── feat_gen.py          # Generates model-ready feature sets
├── modeling/                # Modeling & explainability scripts
│   ├── __init__.py
│   ├── train.py             # Train LR, RF, SVM with sampling pipelines
│   ├── evaluation.py        # Evaluate model performance
│   ├── explainer.py         # Select best model & build SHAP explainer
│   ├── explanations_training.py   # Compute SHAP values on training data
│   ├── explanations_inference.py  # Compute SHAP values on inference data
│   └── predict.py           # Run production predictions
├── models/                  # Stored model artifacts & metrics
│   ├── results/             # Logs & metrics per outcome
│   │   └── Bleeding_Edema_Outcome/
│   └── eval/                # Evaluation reports per outcome
│       └── Bleeding_Edema_Outcome/
├── notebooks/               # Jupyter notebooks for analysis & reporting
│   ├── circ_milan_eda.ipynb
│   ├── circ_milan_model_artifacts_dash.ipynb
│   ├── circ_milan_model_results.ipynb
│   ├── circ_milan_model_explanations.ipynb
│   └── post_modeling_eda.ipynb
├── unittests/               # Unit tests for core modules
├── config.py                # Central configuration settings
├── constants.py             # Global constants
├── functions.py             # General helper functions
├── project_functions.py     # Project-specific utilities
├── requirements.txt         # Python dependencies
├── setup.py                 # Packaging/install script
├── Makefile                 # Automates setup, training, evaluation, inference
└── README.md                # Project overview and usage instructions
```
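The `interim` → `processed` split above can be illustrated with a toy cleaning and feature-generation step. Column names and rules here are hypothetical; the real logic lives in `preprocessing/preprocessing.py` and `preprocessing/feat_gen.py`.

```python
# Hypothetical illustration of the cleaning -> feature-generation split.
# Column names and cleaning rules are invented for this sketch; the real
# logic lives in preprocessing/preprocessing.py and preprocessing/feat_gen.py.
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Interim stage: normalize column names, drop rows missing the label."""
    df = df.rename(columns=lambda c: c.strip().replace(" ", "_"))
    return df.dropna(subset=["Bleeding_Edema_Outcome"])

def make_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Processed stage: split into model-ready X and y."""
    y = df["Bleeding_Edema_Outcome"].astype(int)
    X = df.drop(columns=["Bleeding_Edema_Outcome"]).select_dtypes("number")
    return X, y

raw = pd.DataFrame({
    "Age ": [35, 42, 29],                       # note the stray space
    "Bleeding_Edema_Outcome": [0, 1, None],     # one unlabeled row
})
X, y = make_features(clean(raw))
print(X.shape, y.tolist())
```

In the actual pipeline the two stages write their outputs to `data/interim/` and `data/processed/` as parquet files rather than returning them in memory.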
## Installation

1. Clone the repo:

   ```bash
   git clone https://github.com/your-username/circ_milan.git
   cd circ_milan
   ```

2. Create an environment:

   - Conda:

     ```bash
     conda create -n conda_circ_311 python=3.11
     conda activate conda_circ_311
     ```

   - venv:

     ```bash
     python -m venv venv_circ_311
     source venv_circ_311/bin/activate
     ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Makefile Commands

| Command | Description |
|---|---|
| `make create_venv` | Create a virtual environment |
| `make requirements` | Install dependencies |
| `make preproc_pipeline` | Run preprocessing + feature generation for training |
| `make train_all_models` | Train LR, RF, and SVM models |
| `make eval_all_models` | Evaluate all trained models |
| `make preproc_train_eval` | Full pipeline: preprocessing → training → evaluation |
| `make model_explaining_training` | Run SHAP explainability on training data |
| `make preproc_pipeline_inf` | Run preprocessing + feature generation for inference |
| `make predict` | Run inference and output predictions |
| `make mlflow_ui` | Launch MLflow UI on port 5501 |

To list available commands:

```bash
make help
```

## Modeling Details

- Outcome: `Bleeding_Edema_Outcome`
- Sampling pipelines:
  - `orig` (original data)
  - `smote` (Synthetic Minority Oversampling)
  - `over` (random oversampling)
- Models:
  - Logistic Regression (`lr`)
  - Random Forest (`rf`)
  - Support Vector Machine (`svm`)
- Metric: `average_precision`
- Explainability: SHAP feature attributions via `explainer.py`
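As a rough sketch of what an `over` sampling pipeline does, here is a hand-rolled random-oversampling step feeding a Random Forest. This is an illustration only: the project's actual pipelines may use a library such as imbalanced-learn for `smote` and `over`, and the helper below is invented for this example.

```python
# Hand-rolled "over" (random oversampling) step, for illustration.
# The project's real sampling pipelines may use imbalanced-learn instead.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

def random_oversample(X, y, rng):
    """Duplicate rows of each class (with replacement) until classes balance."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_max, replace=True)
        for c in classes
    ])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Resample the TRAINING split only -- oversampling before the split would
# leak duplicated rows into the test set.
X_res, y_res = random_oversample(X_tr, y_tr, rng)
rf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
ap = average_precision_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"rf + over: average_precision = {ap:.3f}")
```

The same train-only rule applies to SMOTE, which synthesizes new minority samples by interpolating between minority-class neighbors instead of duplicating rows.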
## MLflow Tracking

All runs, parameters, and metrics are tracked with MLflow.

Launch the UI:

```bash
make mlflow_ui
```

## Notebooks

- `circ_milan_eda.ipynb` – Exploratory Data Analysis
- `circ_milan_model_results.ipynb` – Model performance visuals
- `circ_milan_model_explanations.ipynb` – SHAP visualizations
- `post_modeling_eda.ipynb` – Further diagnostics
## Notes

- SHAP outputs and model artifacts are stored in `data/processed/` and `models/`
- Inference predictions are saved to `./data/processed/inference/predictions_Bleeding_Edema_Outcome.csv`

## Reproducibility

Run the full pipeline with:

```bash
make preproc_train_eval
```

## Authors & Contacts

- Leonid Shpaner, M.S., Data Scientist | Adjunct Professor
- Giuseppe Saitta, M.D., Medical Consultant (data provider and clinical insights)

## License

This project is licensed under the MIT License. Research and educational use only; all rights reserved unless stated otherwise.