AriesCHENFISH/Imperial-College-2024-Data-Science-NLP-Project

Introduction

🖊 The NLP group project for the 2024 Data Science Winter School at Imperial College London, which I completed together with two teammates.

🏆 Our project received the BEST PERFORMANCE award from the judges.

⚠ The code was tidied up after the event, so it has many gaps. It is intended only as a reference for future participants in the camp.

Setup and Prerequisites

Recommended environment

  • Python 3.7 or newer
  • Free disk space: 100GB
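
The two recommendations above can be checked before starting the download. The following is a minimal sketch using only the standard library; the function name `check_environment` and its parameters are hypothetical and not part of this repository.

```python
import shutil
import sys
from typing import List, Tuple


def check_environment(path: str = ".",
                      min_python: Tuple[int, int] = (3, 7),
                      min_free_gb: float = 100.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: the thresholds mirror the README's recommendations.
    """
    problems = []

    # Compare only (major, minor) against the recommended minimum version.
    if sys.version_info[:2] < min_python:
        problems.append("Python {}.{}+ recommended, found {}.{}".format(
            min_python[0], min_python[1], *sys.version_info[:2]))

    # Free space on the filesystem that contains `path`, in decimal gigabytes.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append("{:.0f} GB free at {!r}, {:.0f} GB recommended".format(
            free_gb, path, min_free_gb))

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it from the repository root (or pass the path of the drive that will hold the `data` folder) so that the disk-space check looks at the right filesystem.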

Download the data

# navigate to the data folder
cd data

# download the data file
# which is also available at https://www.semanticscholar.org/cord19/download
wget https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/2021-07-26/document_parses.tar.gz

# decompress the file which may take several minutes
tar -xf document_parses.tar.gz

# which creates a folder named document_parses

For more information about the dataset: https://www.semanticscholar.org/cord19/download
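
Once the archive is extracted, the parsed papers can be read one file at a time. The sketch below is an illustration, not the project's actual loader: it assumes the folder is at `data/document_parses` relative to the repository root and that each file follows the published CORD-19 JSON layout (`paper_id`, `metadata.title`, and a `body_text` list of paragraphs with a `text` field). The function name `iter_papers` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Optional


def iter_papers(root: str = "data/document_parses",
                limit: Optional[int] = None) -> Iterator[Dict[str, str]]:
    """Yield {'paper_id', 'title', 'text'} for each JSON file under `root`.

    Assumes the CORD-19 parse layout; missing fields fall back to defaults.
    Files are visited in filesystem order, which is not guaranteed to be sorted.
    """
    count = 0
    for path in Path(root).rglob("*.json"):
        if limit is not None and count >= limit:
            break
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
        paragraphs = doc.get("body_text", [])
        yield {
            # Fall back to the file name if the parse has no paper_id field.
            "paper_id": doc.get("paper_id", path.stem),
            "title": doc.get("metadata", {}).get("title", ""),
            # Join paragraph texts into one string per paper.
            "text": "\n".join(p.get("text", "") for p in paragraphs),
        }
        count += 1


if __name__ == "__main__":
    # Peek at a few papers without loading the whole corpus into memory.
    for paper in iter_papers(limit=3):
        print(paper["paper_id"], "|", paper["title"][:80])
```

Because the function is a generator, it streams files lazily, and `limit` makes it easy to experiment on a small sample before processing the full dataset.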
