Code and datasets for the thesis "Classification of Cyber-Security Requirements based on open datasets and GitHub harvesting"
- Models Training and Testing.ipynb - notebook for training and testing the models
- language_detection.py - script that detects the language of a text using CLD2
- github_scraper.py - script that harvests issues and repositories from GitHub
- data:
  - security_terms.csv - list of security terms
  - repositories.csv - table of the GitHub repositories and links that we used for harvesting
  - datasets - folder with 4 datasets for model training