Project live at: https://healthy-life-webapp.azurewebsites.net/
A machine learning project that predicts life expectancy based on Finnish regional health and demographic data (2013-2021).
The predictions come from an XGBoost model trained on lifestyle, health, and socioeconomic factors. The project includes:
- Data Analysis Pipeline - Jupyter notebooks for data processing and model training
- Web Application - Full-stack web app for interactive predictions
- ML Model - Trained XGBoost model with 34 features
healthy-life/
├── healthy-life-webapp/        # Production Web Application
│   ├── backend/                # Flask API + ML model
│   ├── frontend/               # React + TypeScript UI
│   └── README.md               # Full webapp documentation
│
├── notebooks/                  # Data Science & Development
│   ├── analysis.ipynb          # Model training & evaluation
│   ├── data_processing.ipynb
│   └── import_data.ipynb
│
└── regional_data/              # Raw Data Sources
    ├── thl/                    # Finnish Institute for Health and Welfare
    └── tilastokeskus/          # Statistics Finland
cd healthy-life-webapp
docker-compose up -d
Visit http://localhost to use the application.
See healthy-life-webapp/README.md for full documentation.
# Activate virtual environment
source .venv/bin/activate
# Start Jupyter
cd notebooks
jupyter notebook
Open analysis.ipynb to see the model training process.
- THL (Finnish Institute for Health and Welfare): Health indicators, lifestyle factors
- Statistics Finland (Tilastokeskus): Demographics, socioeconomics, education
See notebooks/data_sources.txt for direct links.
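As a rough illustration of how indicators from the two sources could be combined into one table, here is a minimal sketch using only the Python standard library. It assumes both sources can be keyed by region and year. The column names (`smoking_pct`, `median_income_eur`), the helper names, and every number in the inline CSV text are invented for the example; they are not taken from the real files or from the project's notebooks.

```python
import csv
import io

# Invented miniature stand-ins for the two kinds of source files.
# The values are placeholders, not real statistics.
THL_CSV = """region,year,smoking_pct
Uusimaa,2020,11.5
Lappi,2020,14.2
"""

STAT_CSV = """region,year,median_income_eur
Uusimaa,2020,27000
Lappi,2020,23500
Kainuu,2020,22000
"""


def read_keyed(text):
    """Index CSV rows by (region, year); remaining columns become floats."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row.pop("region"), int(row.pop("year")))
        rows[key] = {name: float(value) for name, value in row.items()}
    return rows


def merge_sources(*sources):
    """Inner-join keyed tables, keeping only keys present in every source."""
    shared = set(sources[0]).intersection(*sources[1:])
    merged = {}
    for key in sorted(shared):
        combined = {}
        for source in sources:
            combined.update(source[key])
        merged[key] = combined
    return merged


table = merge_sources(read_keyed(THL_CSV), read_keyed(STAT_CSV))
for key, features in table.items():
    print(key, features)
```

The inner join drops Kainuu here because it appears in only one of the two toy tables; whether the actual pipeline drops or imputes such rows is not stated in this README.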
- Algorithm: XGBoost (Extreme Gradient Boosting)
- Features: 34 features including income, education, smoking, alcohol, exercise, mental health
- Target: Life expectancy (years)
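To give a feel for what gradient boosting does, the following dependency-free sketch fits a sequence of one-split trees (stumps) to the residuals of the running prediction. It is a toy illustration of the boosting idea only, not XGBoost and not the project's training code: XGBoost builds regularized trees over all features, while this example uses a single made-up feature (`exercise_hours`) and invented life expectancy values.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature with the lowest squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Boost stumps on squared error: each round fits the current residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


# Invented toy data: weekly exercise hours vs. life expectancy in years.
exercise_hours = [0, 1, 2, 3, 4, 5, 6, 7]
life_expectancy = [78.0, 78.5, 79.2, 80.1, 80.9, 81.4, 81.8, 82.0]

model = fit_boosted(exercise_hours, life_expectancy)
print(round(predict(model, 3), 2))
```

Each stump corrects part of what the previous rounds got wrong, and the learning rate keeps any single stump from dominating the final prediction.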
- Backend: Flask, Python 3.11, XGBoost, scikit-learn
- Frontend: React 18, TypeScript, Vite, Tailwind CSS
- Deployment: Docker, Docker Compose, Nginx
- Analysis: Jupyter, pandas, numpy, matplotlib, seaborn
- ML: XGBoost, scikit-learn
- Data: CSV files from Finnish government sources
- webapp README - Web application setup and usage
- DOCKER_GUIDE - Docker deployment instructions
- FEATURE_CONVERSION_GUIDE - Technical explanation of feature engineering
This tool provides statistical estimates based on population data and should not be considered medical advice. Individual health outcomes vary significantly based on many factors not captured in this model. Always consult with healthcare professionals for personal health decisions.
The model is trained on Finnish population data (2013-2021) and predictions are most accurate for populations with similar demographics and healthcare systems.
This is an educational project demonstrating machine learning applications in public health. It aims to raise awareness about factors that influence longevity.
- Data Sources: THL (Finnish Institute for Health and Welfare) and Statistics Finland (Tilastokeskus)
Built with Python, React, XGBoost, and data from Finnish health authorities
For questions or issues, see the documentation in the healthy-life-webapp/ directory.