This project aims to predict the salary of developers using the Stack Overflow Developer Survey 2022 dataset. The prediction model is built using the Gradient Boosting Regressor (GBR) algorithm.
- Introduction
- Dataset
- Libraries and Tools
- Data Preprocessing
- Model Training
- Evaluation
- Results
- Usage
- Contributing
- License
In this project, we use the Stack Overflow Developer Survey 2022 dataset to predict developer salaries based on various features such as experience, location, education, etc. The Gradient Boosting Regressor (GBR) model is employed for this task due to its robustness and accuracy in regression problems.
The dataset used in this project is the Stack Overflow Developer Survey 2022 dataset, which contains information about developers' demographics, education, experience, and salaries. The dataset can be downloaded from the Stack Overflow Developer Survey website.
The following libraries and tools are used in this project:
- Python 3.x
- Jupyter Notebook
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
Data preprocessing steps include:
- Handling missing values
- Encoding categorical variables
- Feature scaling
- Train-test split
The Gradient Boosting Regressor (GBR) model is trained on the preprocessed dataset. Hyperparameter tuning is performed to optimize the model's performance. The key steps involved in model training are:
- Importing the necessary libraries and dataset
- Data cleaning and preprocessing
- Splitting the dataset into training and testing sets
- Training the GBR model
- Hyperparameter tuning using Grid Search
The model's performance is evaluated using various metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared score. The evaluation includes:
- Calculating the performance metrics on the test set
- Visualizing the actual vs predicted salaries
- Analyzing feature importance
The results of the model are analyzed and visualized to understand its performance and the importance of different features in predicting developer salaries. Key findings and visualizations include:
- The performance metrics of the model
- Feature importance plot
- Actual vs predicted salary plot
To use this notebook, follow these steps:
- Clone the repository:
git clone https://github.com/AmirHosseinSoleymani/Pred-Developer-Salary-With-GBR-model-.git
- Navigate to the project directory:
cd Pred-Developer-Salary-With-GBR-model- - Open the Jupyter Notebook:
jupyter notebook programmer-salaray-pred-stackoverflowdataset.ipynb
- Follow the instructions in the notebook to run the code and reproduce the results.
Contributions are welcome! Please feel free to submit a Pull Request if you have any improvements or suggestions.
This project is licensed under the MIT License. See the LICENSE file for more details.
We would like to thank Stack Overflow for providing the dataset used in this project.
Feel free to customize the above template further based on the specific details and findings in your Jupyter notebook.