A comprehensive machine learning pipeline that predicts whether an individual's income exceeds $50K/year based on census data. Built with Python and deployed as a web application using Flask.
Adult-Income-Prediction-ML/
├── 📄 app.py # Flask web application
├── 📊 adult.csv # Dataset
├── 📓 ML_project_live_class.ipynb # Jupyter notebook for analysis
├── 📝 problem_statement.txt # Project requirements
├── 📚 readme.md # Project documentation
├── 📋 requirements.txt # Python dependencies
├── ⚙️ setup.py # Package setup
├── 📂 artifacts/ # Generated model artifacts
│ ├── 📥 data_ingestion/
│ ├── 🔄 data_transformation/
│ └── 🤖 model_trainer/
├── 🌐 env/ # Virtual environment
├── 📝 logs/ # Application logs
├── 📓 notebook/ # Jupyter notebooks
│ └── 📊 data/
├── 🔧 src/ # Source code
└── 🎨 templates/ # HTML templates
- Python 3.7+
- pip (Python package manager)
- Git (for cloning the repository)
- Clone the repository:
  git clone <repository-url>
  cd Adult-Income-Prediction-ML
- Create a virtual environment:
  python -m venv env
- Activate the environment:
  Windows:
  .\env\Scripts\activate
  Linux/Mac:
  source env/bin/activate
- Install dependencies:
  pip install -r requirements.txt
Start the Flask web application:
python app.py
Navigate to http://localhost:5000 in your browser to use the prediction interface.
For interactive data exploration and model development:
jupyter notebook ML_project_live_class.ipynb
Data ingestion:
- Loads and validates the raw census data (adult.csv)
- Handles missing values and data quality checks
- Splits data into training and testing sets
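The loading, cleaning, and splitting steps above can be sketched without any third-party dependencies. This is only an illustration, not the project's actual code in src/: the inline sample stands in for adult.csv, the column names are a small subset, the helper names (`load_rows`, `drop_incomplete`, `split_rows`) are hypothetical, and treating blanks and "?" as missing values is an assumption about the data.

```python
import csv
import io
import random

# Tiny inline sample standing in for adult.csv (assumed column subset).
SAMPLE_CSV = """age,workclass,hours-per-week,income
39,State-gov,40,<=50K
50,?,13,<=50K
38,Private,40,<=50K
53,Private,40,>50K
28,Private,,<=50K
37,Private,40,>50K
"""

MISSING_MARKERS = {"", "?"}  # assumption: blanks and "?" both mean "missing"


def load_rows(text):
    """Parse CSV text into a list of dicts with whitespace-stripped values."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{key: value.strip() for key, value in row.items()} for row in reader]


def drop_incomplete(rows):
    """Keep only rows in which no field is a missing-value marker."""
    return [r for r in rows if not any(v in MISSING_MARKERS for v in r.values())]


def split_rows(rows, test_ratio=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


rows = drop_incomplete(load_rows(SAMPLE_CSV))
train, test = split_rows(rows)
print(len(rows), len(train), len(test))
```

Dropping incomplete rows is the simplest policy; imputing a default value instead would keep more data at the cost of some noise.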
Data transformation:
- Feature engineering and preprocessing
- Categorical variable encoding
- Feature scaling and normalization
- Data pipeline creation
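As a rough illustration of categorical encoding and feature scaling, the sketch below one-hot encodes one categorical column and standardizes one numeric column in plain Python. The records, the function names, and the choice of one-hot encoding and standardization are assumptions made for the example; the project's own pipeline may use different techniques or libraries.

```python
from statistics import mean, pstdev


def fit_one_hot(values):
    """Learn the sorted list of categories from training values."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode a value as a 0/1 vector; unseen categories become all zeros."""
    return [1.0 if value == category else 0.0 for category in categories]


def fit_scaler(values):
    """Learn mean and population standard deviation from training values."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for constant columns
    return mu, sigma


def scale(value, mu, sigma):
    """Standardize a value to zero mean and unit variance."""
    return (value - mu) / sigma


# Hypothetical training records with one numeric and one categorical feature.
train = [
    {"age": 20, "sex": "Male"},
    {"age": 40, "sex": "Female"},
    {"age": 60, "sex": "Male"},
]

# Fit on training data only, then reuse the fitted parameters everywhere.
categories = fit_one_hot(r["sex"] for r in train)
mu, sigma = fit_scaler([r["age"] for r in train])


def transform(record):
    """Turn one record into a flat numeric feature vector."""
    return [scale(record["age"], mu, sigma)] + one_hot(record["sex"], categories)


features = [transform(r) for r in train]
```

Fitting the encoder and scaler on the training split alone, and applying the same parameters to test data and live requests, avoids leaking test information into training.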
Model training:
- Multiple algorithm evaluation
- Hyperparameter tuning
- Model selection and validation
- Performance metrics calculation
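The evaluate-tune-select loop can be sketched as a small grid search. The algorithms the project actually compares are not listed here, so this example substitutes toy threshold classifiers and made-up validation data purely to show the shape of the loop: score every candidate on held-out data and keep the best.

```python
def accuracy(model, data):
    """Fraction of (features, label) pairs the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def make_threshold_model(feature, threshold):
    """Toy classifier: predict 1 when the feature value exceeds the threshold."""
    return lambda x: int(x[feature] > threshold)


# Hypothetical validation data: (features, label), where 1 means income >50K.
validation = [
    ({"age": 25, "hours": 20}, 0),
    ({"age": 38, "hours": 45}, 0),
    ({"age": 45, "hours": 50}, 1),
    ({"age": 52, "hours": 40}, 1),
    ({"age": 23, "hours": 60}, 0),
    ({"age": 61, "hours": 35}, 1),
]

# Candidate grid: each feature plays the role of an "algorithm",
# each threshold the role of a hyperparameter value.
grid = {
    "age": [30, 40, 50],
    "hours": [30, 40, 50],
}

results = []
for feature, thresholds in grid.items():
    for threshold in thresholds:
        model = make_threshold_model(feature, threshold)
        results.append((accuracy(model, validation), feature, threshold))

# Select the candidate with the highest validation accuracy.
best_score, best_feature, best_threshold = max(results)
```

With real estimators the loop stays the same; only `make_threshold_model` and the grid change.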
Deployment:
- Flask web application
- User-friendly prediction interface
- Real-time prediction capabilities
- Input validation and error handling
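Input validation for the prediction form might look roughly like the following. The field names, allowed ranges, and accepted category values are all hypothetical, and the function is framework-independent; in a Flask route, the dict passed in would typically come from the submitted form.

```python
def validate_input(form):
    """Return (cleaned, errors) for a dict of raw form strings."""
    cleaned, errors = {}, {}

    # Numeric fields with hypothetical allowed ranges.
    numeric_ranges = {"age": (17, 100), "hours_per_week": (1, 99)}
    for field, (low, high) in numeric_ranges.items():
        raw = form.get(field, "").strip()
        try:
            value = int(raw)
        except ValueError:
            errors[field] = "must be a whole number"
            continue
        if low <= value <= high:
            cleaned[field] = value
        else:
            errors[field] = f"must be between {low} and {high}"

    # Categorical field with a hypothetical set of accepted values.
    sex = form.get("sex", "").strip()
    if sex in {"Female", "Male"}:
        cleaned["sex"] = sex
    else:
        errors["sex"] = "must be one of: Female, Male"

    return cleaned, errors


ok, ok_errors = validate_input({"age": "39", "hours_per_week": "40", "sex": "Male"})
bad, bad_errors = validate_input({"age": "abc", "hours_per_week": "400"})
```

Returning all errors at once, rather than stopping at the first, lets the interface show the user everything that needs fixing in a single round trip.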
The model uses the following features for prediction:
- Age
- Work Class
- Education Level
- Marital Status
- Occupation
- Relationship
- Race
- Sex
- Capital Gain/Loss
- Hours per Week
- Native Country
The trained model is evaluated on the test data using:
- Accuracy: share of all predictions that are correct
- Precision: share of predicted >$50K cases that are actually >$50K
- Recall: share of actual >$50K cases that the model identifies
- F1-Score: harmonic mean of precision and recall
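For reference, the four metrics can be computed from true and predicted labels as in this minimal sketch. The labels are invented for illustration and are not results from this project; 1 is assumed to denote the >$50K class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Because the census data has far fewer >$50K records than <=$50K ones, precision, recall, and F1 are usually more informative than accuracy alone.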
Comprehensive logging system:
- Location: logs/ directory
- Features: Error tracking, performance monitoring, debugging information
- Format: Structured logs with timestamps and severity levels
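A logging setup along these lines can be built with Python's standard `logging` module. The sketch below is an assumption about how such a setup could look, not the project's actual configuration: the function name, logger name, file-naming scheme, and format string are all hypothetical.

```python
import logging
import os
from datetime import datetime


def setup_logging(log_dir="logs", level=logging.INFO):
    """Create a timestamped log file in log_dir and return (logger, path)."""
    os.makedirs(log_dir, exist_ok=True)
    file_name = datetime.now().strftime("%Y_%m_%d_%H_%M_%S") + ".log"
    log_path = os.path.join(log_dir, file_name)

    handler = logging.FileHandler(log_path, encoding="utf-8")
    # Each line carries a timestamp, severity level, logger name, and line number.
    handler.setFormatter(
        logging.Formatter("[%(asctime)s] %(levelname)s %(name)s:%(lineno)d - %(message)s")
    )

    logger = logging.getLogger("income_prediction")  # hypothetical logger name
    logger.setLevel(level)
    # Remove handlers from earlier calls so messages are not duplicated.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    logger.addHandler(handler)
    return logger, log_path
```

A typical call would be `logger, path = setup_logging()` at application start, followed by `logger.info(...)` or `logger.error(...)` where events occur.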
# Install in development mode
pip install -e .
# Run tests
python -m pytest
# Check code quality
flake8 src/
- Create feature branch
- Implement changes
- Add tests
- Update documentation
- Submit pull request
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Sourav Upadhyay
- Census Bureau for providing the dataset
- Open source community for amazing tools
- Contributors and supporters
⭐ Star this repository if you found it helpful! ⭐