This is a repo with some small projects on Natural Language Processing using Python.
-
Making Transformers Efficient in Production
Date Created: Oct-2023
Fine-tune BERT for a multiclass classification problem using the Clinc150 dataset. The objective of this notebook is to fine-tune the BERT model with a classification head and then further improve model performance by applying two techniques:
- Knowledge distillation.
- Model quantization.
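
As a rough illustration of the knowledge distillation step, the sketch below combines a hard-label cross-entropy term with a temperature-softened KL term between teacher and student outputs. It is a minimal pure-Python sketch, not the notebook's code: the function names and the `alpha` and `temperature` defaults are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of cross-entropy and KL(teacher || student).

    `alpha` and `temperature` are illustrative defaults (assumptions).
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL divergence between softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable to the CE term.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0)

    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    print(distillation_loss([1.5, 0.2, -0.3], [3.0, 0.1, -1.0], label=0))
```

Setting `alpha=1.0` reduces this to plain cross-entropy, while `alpha=0.0` trains the student only to match the teacher's softened distribution.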
-
LinkedIn job posts summarization
Date Created: Oct-2023
Fine-tune BART-Base for a summarization task using the LinkedIn dataset.
The objective of this notebook is to fine-tune the encoder-decoder model BART to generate job titles from their descriptions. Some key features of this notebook include:
- Preprocessing of job titles (missing values, duplicates, and special characters)
- Quick EDA (char and word counts)
- Training with fp16 mixed precision and the 8-bit Adam optimizer to reduce GPU memory consumption and training time.
- Evaluation with the ROUGE metric.
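
To make the ROUGE evaluation concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) for a generated title against a reference title. It is a simplified assumption-laden version: it tokenizes on whitespace and skips stemming, so its scores can differ from those of a full ROUGE library. The function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()

    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge_1("senior data scientist", "data scientist"))
```

In this example the candidate recovers two of the three reference words, so precision is 1.0 and recall is 2/3.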
-
Sentiment Analysis RoBERTa vs VADER
Date Created: Jan-2023
Comparison of NLTK VADER and Twitter-RoBERTa-sentiment on Amazon reviews.
-
BART infilling masking scheme
Date Created: Sep-2023
Custom Data Collator based on the original facebook/BART article.
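
The idea behind text infilling can be sketched in a few lines: spans of tokens, with lengths drawn from a Poisson distribution, are each replaced by a single mask token, and a zero-length span simply inserts a mask. The sketch below works on plain string tokens rather than token IDs and tensors, so it is not the notebook's collator. The function names, `start_prob` heuristic, and default `mask_ratio=0.3` and `lam=3.0` are assumptions made for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, lam=3.0, seed=0):
    """Replace random spans with a single mask token (simplified sketch)."""
    rng = random.Random(seed)
    budget = int(round(len(tokens) * mask_ratio))  # max tokens to remove
    start_prob = mask_ratio / lam  # heuristic chance of starting a span (assumption)
    out, masked, i = [], 0, 0
    while i < len(tokens):
        if masked < budget and rng.random() < start_prob:
            # Clip the span so it fits the budget and the remaining tokens.
            span = min(sample_poisson(lam, rng), budget - masked, len(tokens) - i)
            out.append(mask_token)
            masked += span
            i += span
            if span == 0:
                # Zero-length span: a mask is inserted and the token is kept.
                out.append(tokens[i])
                i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog today".split()
    print(text_infilling(words, seed=1))
```

Because a whole span collapses into one mask, the model has to predict how many tokens are missing as well as which ones. This sketch does not merge adjacent masks, which a production collator would likely handle.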
-
Recurrent Neural Nets from scratch
Date Created: Oct-2023
Based on the book by Jeremy Howard, Deep Learning for Coders with FastAI and PyTorch (Book Link). This is a test notebook that implements RNNs from scratch using PyTorch components to create basic language models that predict the next token of a sequence from the provided context. Nothing fancy.
-
Bag of Words
Date Created: Dec-2022
First contact with NLP: testing a simple linear SVC (SVM with a linear kernel) classifier with BoW (bag-of-words) features and spaCy's word vectors.
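
For readers new to the bag-of-words representation, the following sketch shows how documents become count vectors that a linear classifier can consume. It is a pure-Python illustration with hypothetical helper names and whitespace tokenization. The notebook itself may rely on library vectorizers instead, which is an assumption here.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct lowercase token to a column index (sorted for stability)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Count how often each vocabulary token occurs; unknown tokens are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


if __name__ == "__main__":
    docs = ["good movie", "bad movie bad plot"]
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(doc, "->", bag_of_words(doc, vocab))
```

Word order is discarded entirely, which is the main limitation that dense word vectors aim to soften.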
- Efficient Deep Learning Models in Production
Course offered by MIT HAN Lab on methods to reduce the computational cost of deep learning models in production.