Natural Language Processing

This repo collects small Natural Language Processing projects written in Python.

Featured Notebooks

Making Transformers Efficient in Production
Date Created: Oct-2023
Fine-tune BERT for a multiclass classification problem using the Clinc150 dataset. The objective of this notebook is to fine-tune the BERT model with a classification head and then further improve the model's performance by applying two techniques:
- Knowledge distillation.
- Model quantization.
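The two techniques listed above can be sketched without any framework. The block below is not taken from the notebook: it is a dependency-free illustration, and the names `distillation_loss`, `quantize_int8`, `alpha` and `temperature` are assumptions chosen for this example. It shows a common form of the distillation objective (hard-label cross-entropy mixed with a temperature-softened KL term) and a symmetric per-tensor int8 quantization of a weight list.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 for all-zero input
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in q_weights]
```

In a real training loop these pieces would usually come from the deep learning framework itself; the sketch only makes the arithmetic visible.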
LinkedIn job posts summarization
Date Created: Oct-2023
Fine-tune BART-Base for a summarization task using the LinkedIn dataset.
The objective of this notebook is to fine-tune the encoder-decoder model BART to generate job titles from their descriptions. Some key features of this notebook include:
- Preprocessing of job titles (missing values, duplicates and handling of special characters).
- Quick EDA (char and word counts).
- Training with fp16 mixed precision and the 8-bit Adam optimizer to reduce GPU memory consumption and training time.
- Evaluation with ROUGE metric.
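To make the ROUGE evaluation step more concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) in plain Python. It is a simplification and not the notebook's code: it assumes whitespace tokenization and lowercasing, with no stemming, and the function name `rouge1` and the example strings are made up for illustration. Established ROUGE packages apply more careful tokenization and also report ROUGE-2 and ROUGE-L.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Unigram-overlap precision, recall and F1 on lowercased, whitespace-split tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference title and generated title.
scores = rouge1("Senior Data Engineer", "Data Engineer")
```

Here the candidate covers two of the three reference unigrams, so recall is 2/3 while precision is 1.0.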
Sentiment Analysis RoBERTa vs VADER
Date Created: Jan-2023
Comparison of NLTK VADER and Twitter-RoBERTa-sentiment on Amazon reviews.
BART infilling masking scheme
Date Created: Sep-2023
Custom data collator implementing the text infilling masking scheme described in the original facebook/BART article.
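As a rough illustration of the infilling scheme, the sketch below replaces randomly chosen token spans with a single mask token each, drawing span lengths from a Poisson distribution (the BART article uses λ = 3 and masks about 30% of tokens). This is not the notebook's collator: it works on plain Python lists instead of tensors, picks span starts with a simple per-position probability, and the names `text_infilling`, `sample_poisson`, `mask_ratio` and `poisson_lambda` are assumptions for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, poisson_lambda=3.0, seed=0):
    """Replace random spans with one mask token each, hiding at most mask_ratio of the tokens."""
    rng = random.Random(seed)
    budget = round(len(tokens) * mask_ratio)  # maximum number of tokens to hide
    start_prob = mask_ratio / poisson_lambda  # rough chance that a span starts at a position
    output, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < start_prob:
            span = min(sample_poisson(poisson_lambda, rng), budget, len(tokens) - i)
            output.append(mask_token)  # one mask per span, whatever its length
            i += span
            budget -= span
            if span > 0:
                continue
            # A zero-length span inserts a mask and leaves the current token in place.
        output.append(tokens[i])
        i += 1
    return output


corrupted = text_infilling("the quick brown fox jumps over the lazy dog".split(), seed=1)
```

Because a whole span collapses into one mask, the model also has to predict how many tokens are missing, which is what distinguishes this scheme from per-token masking.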

Recurrent Neural Nets from scratch
Date Created: Oct-2023
Based on the book by Jeremy Howard, Deep Learning for Coders with FastAI and PyTorch (Book Link). This is a test notebook that implements RNNs from scratch using PyTorch components to create basic language models that predict the next token of a sequence based on the provided context. Nothing fancy.
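The recurrence at the heart of such a model fits in a few lines. The sketch below is a dependency-free illustration and not the notebook's PyTorch code: it shows only the hidden-state update of a vanilla (Elman) RNN and omits the embedding and output layers a language model would add around it. The names `rnn_step`, `run_rnn`, `w_xh`, `w_hh` and `b` are assumptions for this example.

```python
import math


def rnn_step(x, h, w_xh, w_hh, b):
    """One vanilla RNN update: h' = tanh(W_xh x + W_hh h + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))    # input contribution
            + sum(w * hi for w, hi in zip(w_hh[j], h))  # recurrent contribution
            + b[j]
        )
        for j in range(len(h))
    ]


def run_rnn(inputs, hidden_size, w_xh, w_hh, b):
    """Feed a sequence of input vectors through the cell and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h
```

The final hidden state summarizes the context; a next-token model would pass it through a linear layer and a softmax over the vocabulary.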
Bag of Words
Date Created: Dec-2022
First contact with NLP: testing a simple linear SVC (an SVM with a 'linear' kernel) classifier with BoW (bag-of-words) features and spaCy's word vectors.
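For readers new to the bag-of-words representation, here is a minimal sketch of how documents become count vectors. It is an illustration rather than the notebook's code: it assumes lowercased whitespace tokenization, and the function names and example sentences are made up. The resulting vectors are the kind of features a linear classifier would be trained on.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct lowercased token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def bag_of_words(document, vocabulary):
    """Count vector for one document; out-of-vocabulary tokens are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical two-document corpus.
docs = ["the movie was great", "the movie was not great at all"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
```

Word order is discarded, so the two example sentences differ only in the columns for "not", "at" and "all".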

Efficient Deep Learning Models in Production
Course offered by MIT HAN Lab on methods to reduce the computational cost of deep learning models in production.
