This project is a web application that predicts the probability of diabetes from diagnostic measures.
It builds on a comparative analysis of Naive Bayes classifiers, in which Gaussian Naive Bayes was the most accurate model on this dataset (90.48% accuracy).
We compared three Naive Bayes variants:
- Gaussian NB: suited to continuous features such as Glucose and BMI. (Selected model)
- Bernoulli NB: suited to binary features.
- Multinomial NB: suited to count data.
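To show why Gaussian NB fits continuous measures like Glucose and BMI, here is a minimal, self-contained sketch of the technique in plain Python. It is not the project's code (which presumably uses scikit-learn's `GaussianNB`); the class name `TinyGaussianNB` and the `eps` variance floor are hypothetical. Each class gets a prior, and each (class, feature) pair is modelled as an independent normal distribution whose log-densities are summed.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class TinyGaussianNB:
    """Hypothetical minimal Gaussian Naive Bayes (illustration only)."""

    def __init__(self, eps: float = 1e-9) -> None:
        # Small constant added to every variance so a feature that is
        # constant within a class does not cause division by zero.
        self.eps = eps

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "TinyGaussianNB":
        rows_by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        n = len(X)
        self.classes_ = sorted(rows_by_class)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            rows = rows_by_class[c]
            self.priors_[c] = len(rows) / n          # P(class)
            cols = list(zip(*rows))                  # one tuple per feature
            means = [sum(col) / len(col) for col in cols]
            self.means_[c] = means
            self.vars_[c] = [
                sum((v - m) ** 2 for v in col) / len(col) + self.eps
                for col, m in zip(cols, means)
            ]
        return self

    def _log_joint(self, row: Sequence[float]) -> Dict[int, float]:
        # log P(class) + sum over features of log N(x; mean, var)
        scores = {}
        for c in self.classes_:
            s = math.log(self.priors_[c])
            for x, m, v in zip(row, self.means_[c], self.vars_[c]):
                s += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            scores[c] = s
        return scores

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        out = []
        for row in X:
            scores = self._log_joint(row)
            top = max(scores.values())  # subtract max for numerical stability
            exp = {c: math.exp(s - top) for c, s in scores.items()}
            total = sum(exp.values())
            out.append([exp[c] / total for c in self.classes_])
        return out

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [
            self.classes_[max(range(len(p)), key=p.__getitem__)]
            for p in self.predict_proba(X)
        ]
```

Because every class-feature pair has its own mean and variance, a high glucose reading raises the likelihood of whichever class has its glucose distribution centred higher, with no need to bin or binarize the value first.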
| Model | Accuracy |
|---|---|
| Gaussian NB | 90.48% |
| Bernoulli NB | ~88% |
| Multinomial NB | ~76% |
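A comparison like the one in the table can be run by fitting every model on the same training split and scoring it on the same test split. The sketch below is a hypothetical harness, not the project's code: `compare_models` and `accuracy` are illustrative names, and the models are duck-typed so any object with `fit` and `predict` works. Accuracy here is simply correct predictions divided by total predictions.

```python
from typing import Any, Dict, Sequence


def accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_models(
    models: Dict[str, Any],
    X_train: Sequence[Sequence[float]],
    y_train: Sequence[Any],
    X_test: Sequence[Sequence[float]],
    y_test: Sequence[Any],
) -> Dict[str, float]:
    """Fit each model on the same split and return accuracies, best first."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(list(y_test), list(model.predict(X_test)))
    # Sort so the highest-accuracy model comes first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

With scikit-learn installed, `models` could map names to `GaussianNB()`, `BernoulliNB()` and `MultinomialNB()` from `sklearn.naive_bayes`, with the split produced by `sklearn.model_selection.train_test_split`. Note that `MultinomialNB` expects non-negative feature values.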
- Clone the repository and enter it:
  ```bash
  git clone https://github.com/Muhammad-Shahan/Diabetes-Risk-Prediction.git
  cd Diabetes-Risk-Prediction
  ```
- Install the dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the app:
  ```bash
  streamlit run app.py
  ```
- `app.py`: the main Streamlit interface.
- `train_model.py`: script used to train and save the model.
- `analysis.ipynb`: Jupyter Notebook containing the research and EDA.
- `diabetes_model.pkl`: the trained GaussianNB model file.
- `diabetes_prediction_dataset.csv`: the dataset used for training.
If you find this useful, please star the repo!