Skip to content

This project aims to predict the salary of developers using the Stack Overflow Developer Survey 2022 dataset. The prediction model is built using the Gradient Boosting Regressor (GBR) algorithm.

License

Notifications You must be signed in to change notification settings

AmirHosseinSoleymani/Pred-Developer-Salary-With-GBR-model-

Repository files navigation

Predict Developer Salary Using Gradient Boosting Regressor

This project aims to predict the salary of developers using the Stack Overflow Developer Survey 2022 dataset. The prediction model is built using the Gradient Boosting Regressor (GBR) algorithm.

Table of Contents

Introduction

In this project, we use the Stack Overflow Developer Survey 2022 dataset to predict developer salaries based on various features such as experience, location, education, etc. The Gradient Boosting Regressor (GBR) model is employed for this task due to its robustness and accuracy in regression problems.

Dataset

The dataset used in this project is the Stack Overflow Developer Survey 2022 dataset, which contains information about developers' demographics, education, experience, and salaries. The dataset can be downloaded from the Stack Overflow Developer Survey website.

Libraries and Tools

The following libraries and tools are used in this project:

  • Python 3.x
  • Jupyter Notebook
  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib
  • Seaborn

Data Preprocessing

Data preprocessing steps include:

  • Handling missing values
  • Encoding categorical variables
  • Feature scaling
  • Train-test split

Model Training

The Gradient Boosting Regressor (GBR) model is trained on the preprocessed dataset. Hyperparameter tuning is performed to optimize the model's performance. The key steps involved in model training are:

  1. Importing the necessary libraries and dataset
  2. Data cleaning and preprocessing
  3. Splitting the dataset into training and testing sets
  4. Training the GBR model
  5. Hyperparameter tuning using Grid Search

Evaluation

The model's performance is evaluated using various metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared score. The evaluation includes:

  • Calculating the performance metrics on the test set
  • Visualizing the actual vs predicted salaries
  • Analyzing feature importance

Results

The results of the model are analyzed and visualized to understand its performance and the importance of different features in predicting developer salaries. Key findings and visualizations include:

  • The performance metrics of the model
  • Feature importance plot
  • Actual vs predicted salary plot

Usage

To use this notebook, follow these steps:

  1. Clone the repository:
    git clone https://github.com/AmirHosseinSoleymani/Pred-Developer-Salary-With-GBR-model-.git
  2. Navigate to the project directory:
    cd Pred-Developer-Salary-With-GBR-model-
  3. Open the Jupyter Notebook:
    jupyter notebook programmer-salaray-pred-stackoverflowdataset.ipynb
  4. Follow the instructions in the notebook to run the code and reproduce the results.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request if you have any improvements or suggestions.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

We would like to thank Stack Overflow for providing the dataset used in this project.


Feel free to customize the above template further based on the specific details and findings in your Jupyter notebook.

About

This project aims to predict the salary of developers using the Stack Overflow Developer Survey 2022 dataset. The prediction model is built using the Gradient Boosting Regressor (GBR) algorithm.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published