This repository hosts an analysis of data sourced from Lingbuzz, a repository for preprints in the field of linguistics. As of August 2023, Lingbuzz hosted around 7,500 manuscripts, each with metadata on its authors, keywords, place of publication, number of downloads, and abstract. The data was collected as a CSV file using lingbuzz_scraper; the full CSV file can be downloaded from here.
The analysis was performed in a Jupyter Notebook. You can see it here.
The notebook includes the following topics:
- Most downloaded manuscripts
- Authors with the most manuscripts in the repository
- Number of downloads per author
- Most frequent keywords
- Trends in subdisciplines over time
- Collaborations and co-authorship networks
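
As a rough illustration of how a few of these topics can be computed, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code. The column names (`title`, `authors`, `keywords`, `downloads`), the `;` separator inside fields, and the sample rows are all assumptions made up for this example; the real CSV may be laid out differently.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Made-up placeholder rows standing in for the real CSV.
# Column names and the ";" separator are assumptions, not the actual schema.
SAMPLE_CSV = """title,authors,keywords,downloads
Paper one,"Author A; Author B","syntax; semantics",120
Paper two,"Author A","phonology; syntax",80
Paper three,"Author B; Author C","semantics",50
"""


def split_field(value, sep=";"):
    """Split a multi-valued field and drop empty entries."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def summarize(rows):
    """Return downloads per author, keyword counts, and co-author pair counts."""
    downloads_per_author = Counter()
    keyword_counts = Counter()
    coauthor_pairs = Counter()
    for row in rows:
        authors = split_field(row["authors"])
        downloads = int(row["downloads"])
        # Each co-author is credited with the manuscript's full download count.
        for author in authors:
            downloads_per_author[author] += downloads
        keyword_counts.update(split_field(row["keywords"]))
        # Every unordered pair of co-authors is one edge in the network.
        coauthor_pairs.update(combinations(sorted(authors), 2))
    return downloads_per_author, keyword_counts, coauthor_pairs


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
downloads_per_author, keyword_counts, coauthor_pairs = summarize(rows)

print(downloads_per_author.most_common(3))
print(keyword_counts.most_common(3))
print(coauthor_pairs.most_common(3))
```

To try this on the real data, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the downloaded CSV and adjust the column names to match. Crediting every co-author with the full download count is one possible choice; splitting downloads evenly among authors would give a different ranking.
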
Suggestions for improvements or further topics to explore are very welcome!