Python version used: 3.8.7
Packages used:
- Pandas 1.2.3
- Matplotlib 3.4.1
Before running the analysis, make sure the following two folders exist in this repository's folder on your machine (a small setup sketch follows this list):
- output: the notebook saves plots to this folder
- datasets: the notebook looks for the survey datasets in this folder during the import step at the beginning of the notebook
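If either folder is missing, a minimal sketch along these lines can create them, assuming you run it (or the notebook) from the repository root; the folder names simply mirror the list above:

```python
from pathlib import Path

# Assumed folder names, matching the list above; adjust if your layout differs
for folder in ("output", "datasets"):
    Path(folder).mkdir(exist_ok=True)  # create the folder only if it does not already exist
```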
After examining Stack Overflow's annual survey data, I found that 2018, 2019, and 2020 were years in which developers were asked about the languages they knew and the languages they planned to learn. These responses can serve as useful indicators of the language repertoires developers feel they need to succeed in their fields. That, in turn, can help companies understand where talent in certain languages tends to reside, making recruiting decisions smarter.
With this project, I took a step into that space by answering three questions about developer talent in the US and India:
- Which languages gained and lost popularity between 2018 and 2020? (i.e., which became easier or harder to find in each market)
- How multilingual are programmers in the different programming fields covered by the survey? (i.e., how many languages do developers in each field tend to know; a small counting sketch follows this list)
- If many responses in a given year indicate a desire to learn a particular language, is that a meaningful sign that the language will become significantly more popular in that region in the years that follow?
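As an illustration of the second question, the sketch below counts languages per respondent from a semicolon-delimited column. The column names LanguageWorkedWith and DevType follow the 2018-2020 survey schemas, but the notebook's actual logic may differ, and the two-row frame here is purely a made-up example:

```python
import pandas as pd

# Hypothetical example frame; the real data comes from the survey CSVs in the datasets folder
df = pd.DataFrame({
    "DevType": ["Data scientist", "Back-end developer"],
    "LanguageWorkedWith": ["Python;R;SQL", "Java;SQL"],
})

# Count how many languages each respondent reports knowing
df["num_languages"] = df["LanguageWorkedWith"].str.split(";").str.len()

# Average repertoire size per developer field
print(df.groupby("DevType")["num_languages"].mean())
```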
stackoverflowsurvey_2018_2020.ipynb is the only file included. All analyses take place in that notebook, and all necessary custom functions are defined there as well.
The analyses use two datasets created by Stack Overflow via surveys it conducted in 2018 and 2020.
- Master list of survey datasets at Stack Overflow
- Direct link to 2018 dataset
- Direct link to 2020 dataset
Links to the Stack Overflow Survey source datasets also appear above the import statements in that notebook.
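For reference, a minimal loading sketch like the one below reads the survey CSVs from the datasets folder; the file names shown are assumptions and may not match the ones used in the notebook:

```python
import pandas as pd

# Assumed file names; substitute the CSVs actually downloaded from Stack Overflow / Kaggle
survey_2018 = pd.read_csv("datasets/survey_results_public_2018.csv", low_memory=False)
survey_2020 = pd.read_csv("datasets/survey_results_public_2020.csv", low_memory=False)

# Quick sanity check on the number of rows and columns in each year's dataset
print(survey_2018.shape, survey_2020.shape)
```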
The main findings of the analysis can be found in this Medium post, which I wrote in April 2021.
The data was collected and made available by Stack Overflow. Information about the datasets and their licensing can be found on Kaggle. Feel free to use the code provided in this repository at your own discretion.