BhaveshBhakta/Salary-Prediction-Using-ML

Salary Prediction Using ML

Project Overview

This project aims to predict an individual's salary from a dataset of job-related and demographic features. It analyzes factors such as age, gender, education level, job title, and years of experience to develop a regression model that estimates salary. Such a model can support career planning, salary negotiation, and labor market analysis.


Technical Highlights

  • Dataset: A dataset named Salary Data.csv is used, which contains salary information and various employee attributes.
  • Size: 375 entries, 6 columns.
  • Key Features:
    • Age, Gender, Education Level, Job Title, Years of Experience.
  • Approach:
    • Data Cleaning: Rows containing NaN values are dropped, and duplicate rows are removed. The Age, Gender, and Job Title columns are also dropped, which leaves Education Level and Years of Experience as the only predictors of Salary.
    • Exploratory Data Analysis: Histograms and box plots were used to visualize the distribution of numerical features and their relationship with salary. Count plots were also used for categorical features.
    • Label Encoding: Applied to the Education Level column to convert it into a numerical format.
    • Regression Task: The target variable is Salary.
    • Models Used:
      • A suite of regression models was trained: Linear Regression, Ridge, XGBoost, Random Forest, AdaBoost, Gradient Boosting, and Bagging.
  • Best R² Score:
    • 0.901 with Linear Regression and Ridge Regressor.
    • 0.892 with Gradient Boosting Regressor.
    • An R² near 0.90 means these models explain roughly 90% of the variance in salary on the data they were scored on, using only the features retained after cleaning.
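
The approach above can be sketched end to end as follows. This is a minimal illustration, not the repository's actual code: it assumes the column names listed above, replaces Salary Data.csv with a small randomly generated stand-in (the helper `make_synthetic_salary_data` is hypothetical), assumes an 80/20 train/test split, and uses only the scikit-learn models from the list so that it runs without xgboost.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_synthetic_salary_data(n_rows=200, seed=0):
    """Hypothetical stand-in for Salary Data.csv; the values are random, not real."""
    rng = np.random.default_rng(seed)
    education = rng.choice(["Bachelor's", "Master's", "PhD"], size=n_rows)
    experience = rng.integers(0, 26, size=n_rows).astype(float)
    bonus = pd.Series(education).map(
        {"Bachelor's": 0, "Master's": 15000, "PhD": 30000}
    ).to_numpy()
    salary = 30000 + 5000 * experience + bonus + rng.normal(0, 8000, size=n_rows)
    df = pd.DataFrame({
        "Age": 22 + experience + rng.integers(0, 5, size=n_rows),
        "Gender": rng.choice(["Male", "Female"], size=n_rows),
        "Education Level": education,
        "Job Title": "Analyst",
        "Years of Experience": experience,
        "Salary": salary,
    })
    df.loc[0, "Salary"] = np.nan  # one missing value to exercise dropna
    return pd.concat([df, df.iloc[[1]]], ignore_index=True)  # one duplicate row


def clean_and_encode(df):
    """Drop NaN rows and duplicates, drop three columns, label-encode education."""
    df = df.dropna().drop_duplicates()
    df = df.drop(columns=["Age", "Gender", "Job Title"]).copy()
    df["Education Level"] = LabelEncoder().fit_transform(df["Education Level"])
    return df


def evaluate_models(df, seed=0):
    """Fit several regressors and return their R² on a held-out split."""
    X = df[["Education Level", "Years of Experience"]]
    y = df["Salary"]
    # The 80/20 split is an assumption; the README does not state the split used.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Ridge": Ridge(),
        "Random Forest": RandomForestRegressor(random_state=seed),
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


cleaned = clean_and_encode(make_synthetic_salary_data())
scores = evaluate_models(cleaned)
for name, score in scores.items():
    print(f"{name}: R² = {score:.3f}")
```

To try this on the real data, the synthetic frame could be swapped for `pd.read_csv("Salary Data.csv")`, assuming the file uses the same column names. Note that `LabelEncoder` assigns integers in sorted label order, so whether the encoded values follow the natural ranking of education levels depends on the labels present in the file.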

Purpose and Applications

  • Accurate Salary Forecasting: Enables individuals to estimate their potential earnings based on their experience and education.
  • Recruitment and Compensation: Assists companies in setting competitive salary ranges for different roles.
  • Career Planning: Provides insights into how education and experience levels correlate with income.
  • Labor Market Analysis: Supports data-driven research on salary trends and economic factors.

Installation

Install the necessary libraries:

pip install pandas numpy seaborn matplotlib scikit-learn xgboost

Collaboration

We welcome contributions to improve the project. You can help by:

  • Re-evaluating the feature selection process, as dropping Age, Gender, and Job Title may remove valuable predictive information.
  • Exploring more robust methods for handling missing values and duplicates.
  • Performing comprehensive hyperparameter tuning and cross-validation for all regression models to maximize predictive performance.
  • Adding explainability (e.g., SHAP or LIME) to understand which factors are the most significant drivers of salary.
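
As a starting point for the cross-validation and tuning item, here is a hedged sketch using scikit-learn's `cross_val_score` and `GridSearchCV`. The data is randomly generated to mimic two encoded features (education level and years of experience), and the parameter grid is an arbitrary illustration, not a recommended search space.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in data (assumption): column 0 is an encoded education level,
# column 1 is years of experience; the target is a noisy linear salary.
rng = np.random.default_rng(0)
X = np.column_stack(
    [rng.integers(0, 3, size=150), rng.integers(0, 26, size=150)]
).astype(float)
y = 30000 + 15000 * X[:, 0] + 5000 * X[:, 1] + rng.normal(0, 8000, size=150)

# Shuffled 5-fold cross-validation, reused for both the baseline and the search.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: cross-validated R² of the model with default hyperparameters.
baseline = cross_val_score(
    GradientBoostingRegressor(random_state=0), X, y, cv=cv, scoring="r2"
)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0), param_grid, cv=cv, scoring="r2"
)
search.fit(X, y)

print(f"Baseline mean R²: {baseline.mean():.3f}")
print(f"Best params: {search.best_params_}")
print(f"Best cross-validated R²: {search.best_score_:.3f}")
```

Reporting the mean R² across folds rather than a single split's score gives a steadier estimate on a dataset of only a few hundred rows, which is why the same `cv` object is shared between the baseline and the search.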