Skip to content

A hybrid machine learning & deep learning pipeline for predicting heartdisease from clinical data. Includes EDA,preprocessing with ColumnTransformer, models (Logistic Regression, Random Forest, XGBoost) and a Keras ANN. Evaluation covers ROC-AUC, Precision-Recall, calibration, with SHAP explainability ensuring transparency and trust in healthcareAI

License

Notifications You must be signed in to change notification settings

RohitMukkala/Heart-Disease-Prediction-Model

Repository files navigation

πŸ«€ Heart Disease Prediction using Machine Learning & Deep Learning

Python Scikit-learn TensorFlow License: MIT

πŸ“Œ Description

Heart disease remains one of the leading causes of mortality worldwide. Early detection can significantly improve patient outcomes by enabling timely diagnosis and preventive measures.

This project builds two complementary approaches to predict heart disease:

  1. Classical Machine Learning models (Logistic Regression, Random Forest, XGBoost) with explainability and calibration.
  2. Deep Learning model (Keras ANN) using TensorFlow with data balancing (SMOTE) and EarlyStopping.

The combination of ML and DL allows us to compare interpretability vs. flexibility in predictive modeling.


πŸš€ Key Features

πŸ”Ή Machine Learning Pipeline

  • Dataset Preprocessing with Pipeline and ColumnTransformer
  • Imbalanced Data Handling (Stratified split, CV evaluation)
  • Model Training & Comparison: Logistic Regression, Random Forest, and optional XGBoost
  • Robust Evaluation: ROC-AUC, F1-score, Precision-Recall curves
  • Explainability: SHAP feature importance and local explanations
  • Model Calibration for reliable probability predictions
  • Artifacts Saved: trained pipeline + test predictions

πŸ”Ή Deep Learning Pipeline (Keras ANN)

  • StandardScaler preprocessing
  • SMOTE applied for class imbalance
  • Model Architecture: Multi-layer Perceptron (Dense layers with ReLU + Sigmoid)
  • Initialization: He-uniform for better convergence
  • Regularization: EarlyStopping to avoid overfitting
  • Evaluation: Confusion Matrix, Classification Report, ROC Curve, AUC

πŸ—οΈ Architecture Overview

graph TD
    A[Data] --> B[Preprocessing]

    subgraph "Classical ML"
        B --> C1[Logistic Regression]
        B --> C2[Random Forest]
        B --> C3[XGBoost]
    end

    subgraph "Deep Learning - Keras"
        B --> D["ANN (Dense Layers)"]
    end

    C1 --> E[Evaluation]
    C2 --> E[Evaluation]
    C3 --> E[Evaluation]
    D --> E[Evaluation]

    E --> F["Explainability (ML only)"]
    E --> G[Deployment Artifacts]
Loading

βš™οΈ Tech Stack / Tools

  • Python (NumPy, Pandas, Matplotlib, Seaborn)
  • Scikit-learn (Pipelines, Logistic Regression, Random Forest)
  • TensorFlow / Keras (ANN model)
  • Imbalanced-learn (SMOTE for handling imbalance)
  • Joblib (Model saving)
  • SHAP (Interpretability, optional)
  • Jupyter Notebook (Experiments & documentation)

πŸ“Š Dataset Information

Preprocessing:

  • Median imputation for numeric features (ML pipeline)
  • Most frequent imputation for categorical (ML pipeline)
  • Standard scaling (numeric)
  • One-hot encoding (categorical, ML pipeline)
  • SMOTE oversampling (DL pipeline)

πŸ› οΈ Installation & Requirements

git clone <your-repo-url>
cd HeartDiseasePrediction
pip install -r requirements.txt

Dependencies

  • numpy
  • pandas
  • scikit-learn
  • matplotlib
  • seaborn
  • shap (optional)
  • joblib
  • tensorflow
  • imbalanced-learn

▢️ Usage Instructions

πŸ”Ή Run Classical ML Pipeline

jupyter notebook HeartDiseasePrediction.ipynb

πŸ”Ή Run Deep Learning Model

jupyter notebook Heart_Disease.ipynb

Example Inference with ML Saved Model

import joblib
model = joblib.load("artifacts/best_pipeline_logreg.joblib")
pred = model.predict(new_data)  # new_data must be preprocessed format

πŸ“ˆ Results & Performance

Model Accuracy Precision Recall F1 ROC-AUC
Logistic Regression 86.9% 85.7% 90.9% 88.2% 0.91
Random Forest (evaluated, not selected) β€” β€” β€” β€”
XGBoost (optional) (evaluated, not selected) β€” β€” β€” β€”

πŸ”Ή Deep Learning Model (ANN)

Model Accuracy Precision Recall F1 ROC-AUC
ANN (Keras) 85.3% 85.0% 86.4% 85.7% 0.91

Visualizations:

  • Class balance
  • Confusion Matrix (ML & DL)
  • ROC Curves
  • Precision-Recall Curves
  • Calibration Plot (ML only)

🌍 Applications & Impact

  • Assist healthcare professionals in early screening of heart disease.
  • Can be integrated into clinical decision support systems.
  • Demonstrates the tradeoff between explainability (ML) and flexibility (DL).

⚠️ Limitations & Future Work

  • Small dataset size (303 samples).
  • External validation needed on real-world clinical data.
  • Deep learning model can benefit from more data + hyperparameter tuning.
  • Explore 1D CNNs or hybrid ensemble models.
  • Potential for mobile deployment & real-time monitoring.

πŸ“‚ Project Structure

HeartDiseasePrediction/
│── HeartDiseasePrediction.ipynb   # Classical ML models
│── Heart_Disease.ipynb            # Deep Learning (Keras ANN)
│── README.md
│── LICENSE

🀝 Contributing

Pull requests are welcome! For major changes, please open an issue first.


πŸ“œ License

MIT License


πŸ™ Acknowledgments

About

A hybrid machine learning & deep learning pipeline for predicting heartdisease from clinical data. Includes EDA,preprocessing with ColumnTransformer, models (Logistic Regression, Random Forest, XGBoost) and a Keras ANN. Evaluation covers ROC-AUC, Precision-Recall, calibration, with SHAP explainability ensuring transparency and trust in healthcareAI

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published