🖊 This is the NLP group project from the 2024 Data Science Winter School at Imperial College London, completed by me and two teammates.
🏆 In the end, the judges gave our project the BEST PERFORMANCE award.
⚠ The code was tidied up after the event and still has many gaps; it is intended only as a reference for future participants of the winter school.
## Recommended environment
- Python 3.7 or newer
- Free disk space: at least 100 GB
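Before downloading, it can help to confirm that the machine meets the requirements listed above. The snippet below is a minimal sketch, not part of the original project: `check_environment` is a hypothetical helper, and it assumes the 100 GB figure means decimal gigabytes (10^9 bytes) on the filesystem that will hold the `data` folder.

```python
import shutil
import sys

# Hypothetical thresholds taken from the "Recommended environment" list.
MIN_PYTHON = (3, 7)
MIN_FREE_GB = 100


def check_environment(path=".", min_python=MIN_PYTHON, min_free_gb=MIN_FREE_GB):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Compare (major, minor) of the running interpreter with the minimum.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append("Python {}.{}+ required, found {}".format(
            min_python[0], min_python[1], found))
    # shutil.disk_usage reports bytes for the filesystem that contains `path`;
    # this assumes decimal gigabytes (1 GB = 10**9 bytes).
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append("{} GB free disk space recommended, found {:.1f} GB".format(
            min_free_gb, free_gb))
    return problems


if __name__ == "__main__":
    issues = check_environment(".")
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK")
```

Run it from the repository root (or pass the path of the `data` folder) so the disk check measures the drive where the archive will be extracted.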
## Download the data
```bash
# navigate to the data folder
cd data
# download the data file
# (also available from https://www.semanticscholar.org/cord19/download)
wget https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/2021-07-26/document_parses.tar.gz
# decompress the archive (this may take several minutes)
tar -xf document_parses.tar.gz
# this creates a folder named document_parses
```

For more information about the dataset, see https://www.semanticscholar.org/cord19/download
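Once the archive is extracted, the parsed papers are individual JSON files inside `document_parses`. The sketch below streams them one file at a time, so the full dataset never has to fit in memory. It is an illustration under stated assumptions, not the project's own loader: `iter_papers` is a hypothetical helper, and the field names (`paper_id`, `metadata` → `title`, `body_text` as a list of `{"text": ...}` paragraphs) are assumed from the CORD-19 parse format. Missing keys fall back to empty values instead of raising.

```python
import json
from pathlib import Path


def iter_papers(root="data/document_parses"):
    """Yield (paper_id, title, full_text) for every JSON file under `root`.

    Assumed schema: "paper_id", "metadata" -> "title", and "body_text"
    as a list of paragraph dicts, each with a "text" field.
    """
    # rglob also descends into any subfolders of the extracted archive.
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            doc = json.load(f)
        # Fall back to the file name if "paper_id" is missing.
        paper_id = doc.get("paper_id", path.stem)
        title = doc.get("metadata", {}).get("title", "")
        # Join the paragraphs of the body into a single string.
        body = "\n".join(p.get("text", "") for p in doc.get("body_text", []))
        yield paper_id, title, body


if __name__ == "__main__":
    for i, (paper_id, title, body) in enumerate(iter_papers()):
        print(paper_id, "|", title[:80], "|", len(body), "chars")
        if i >= 4:  # preview only the first five papers
            break
```

Because `iter_papers` is a generator, it can feed a tokenizer or a vectorizer directly without first loading every paper into a list.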