This project builds a machine learning model to predict individual healthcare charges using demographic and behavioral data. The model is trained and evaluated using Random Forest Regressor with hyperparameter tuning via GridSearchCV.
- Python (pandas, scikit-learn, seaborn, matplotlib)
- Jupyter Notebook
- Streamlit (for optional dashboard)
Healthcare_Cost_Prediction.ipynb: Complete analysis, model training, and evaluationdata.csv: Sample dataset (or instructions to load it)requirements.txt: Python dependencies
- Best model RMSE: ~$4,800
- RΒ² Score: ~0.82
- Major cost drivers: Smoking status, BMI, Age
Clone the repository and open the notebook with Jupyter:
git clone https://github.com/Mekusgood/healthcare-cost-prediction.git
cd healthcare-cost-prediction
jupyter lab