brenoingwersen/natural-language-processing

Natural Language Processing

This is a repo with some small projects on Natural Language Processing using Python.

Featured Notebooks

  • Making Transformers Efficient in Production
    Date Created: Oct-2023
    Fine-tune BERT for a multiclass classification problem using the Clinc150 dataset. The objective of this notebook is to fine-tune a BERT model with a classification head and then further improve the model's performance by applying two techniques:

    • Knowledge distillation.
    • Model quantization.
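
As a rough illustration of the two techniques listed above, here is a minimal pure-Python sketch. It is not the notebook's code: the function names, the `alpha` and `temperature` defaults, and the symmetric per-tensor int8 scheme are all assumptions chosen for readability.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher || student).

    alpha and temperature are illustrative defaults, not tuned values.
    """
    # Hard-label term: ordinary cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between the softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(p_teacher, p_student) if p > 0)
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl

def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map the stored integers back to approximate float weights."""
    return [q * scale for q in quantized]
```

When the student and teacher logits match, the KL term vanishes and only the weighted cross-entropy remains. The quantization round trip loses at most half a quantization step per weight.
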
  • LinkedIn job posts summarization
    Date Created: Oct-2023
    Fine-tune BART-Base for a summarization task using the LinkedIn dataset.
    The objective of this notebook is to fine-tune the encoder-decoder model BART to generate the jobs' titles based on their description. Some key features of this notebook include:

    • Preprocessing of job titles (missing values, duplicates and handling of special characters)
    • Quick EDA (char and word counts)
    • Training with fp16 mixed precision and the 8-bit Adam optimizer to reduce GPU memory consumption and training time.
    • Evaluation with ROUGE metric.
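
To give an idea of what the ROUGE evaluation above measures, here is a simplified sketch of ROUGE-N as clipped n-gram overlap. It assumes lowercase whitespace tokenization and a hypothetical `rouge_n` helper. Dedicated ROUGE packages typically add proper tokenization, stemming and ROUGE-L, so scores from this sketch will not match them exactly.

```python
from collections import Counter

def rouge_n(reference, candidate, n=1):
    """Return (precision, recall, f1) for n-gram overlap between two strings."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Hypothetical example: a reference job title vs. a generated one.
precision, recall, f1 = rouge_n("senior data engineer", "data engineer")
```
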
  • Sentiment Analysis RoBERTa vs VADER
    Date Created: Jan-2023
    Comparison of NLTK VADER and Twitter-RoBERTa-sentiment on Amazon reviews.

  • BART infilling masking scheme
    Date Created: Sep-2023
    Custom data collator based on the original BART paper (facebook/BART).
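
For context, the BART paper describes text infilling as sampling spans with lengths drawn from a Poisson distribution (λ = 3) and replacing each span, including zero-length ones, with a single mask token, hiding about 30% of the tokens. The sketch below works on a plain list of tokens rather than on batched tensors, and is only an assumption about how such a collator might behave. The function names and the span-start heuristic are illustrative.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's algorithm."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, lam=3.0, seed=0):
    """Replace random spans of tokens with a single mask token each."""
    rng = random.Random(seed)
    budget = round(len(tokens) * mask_ratio)  # max number of tokens to hide
    start_prob = mask_ratio / lam  # rough heuristic for how often a span starts
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < start_prob:
            span = min(sample_poisson(lam, rng), budget, len(tokens) - i)
            out.append(mask_token)  # the whole span collapses into ONE mask
            i += span               # a zero-length span just inserts a mask
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

# Hypothetical usage on a whitespace-tokenized sentence.
masked = text_infilling("the quick brown fox jumps over the lazy dog".split())
```

Because a span of any length maps to one mask token, the model has to predict how many tokens are missing as well as which ones.
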

Testing stuff

  • Recurrent Neural Nets from scratch
    Date Created: Oct-2023
    Based on the book by Jeremy Howard, Deep Learning for Coders with fastai and PyTorch. This is a test notebook that implements RNNs from scratch using PyTorch components to build basic language models that predict the next token of a sequence from the provided context. Nothing fancy.
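
The idea of a recurrent language model can be sketched without any framework. The example below is a dependency-free forward pass of a simple (Elman-style) RNN with random, untrained weights. It is not the notebook's PyTorch code, and names such as `make_rnn` and `next_token_probs` are made up for illustration.

```python
import math
import random

def make_rnn(vocab_size, hidden_size, seed=0):
    """Create small random parameters for a single-layer RNN language model."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "emb": init(vocab_size, hidden_size),     # token id -> input vector
        "w_hh": init(hidden_size, hidden_size),   # hidden -> hidden
        "w_out": init(vocab_size, hidden_size),   # hidden -> vocabulary logits
    }

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def next_token_probs(params, token_ids):
    """Run the context through the RNN and return next-token probabilities."""
    hidden = [0.0] * len(params["w_hh"])
    for token_id in token_ids:
        x = params["emb"][token_id]
        recurrent = matvec(params["w_hh"], hidden)
        # New hidden state mixes the current input with the previous state.
        hidden = [math.tanh(a + b) for a, b in zip(x, recurrent)]
    logits = matvec(params["w_out"], hidden)
    m = max(logits)  # stable softmax over the vocabulary
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical usage: 5-token vocabulary, context of three token ids.
probs = next_token_probs(make_rnn(vocab_size=5, hidden_size=4), [0, 3, 1])
```

Training would adjust the three weight matrices so that the probability of the true next token goes up. That part is left out here.
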

  • Bag of Words
    Date Created: Dec-2022
    First contact with NLP: testing a simple linear SVC (an SVM with a linear kernel) classifier with BoW (bag-of-words) features and spaCy's word vectors.
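
A bag-of-words representation is just a vector of word counts over a fixed vocabulary. Here is a minimal sketch, assuming lowercase whitespace tokenization and hypothetical helper names. A real pipeline would feed such vectors to a classifier such as a linear SVC.

```python
from collections import Counter

def fit_vocabulary(documents):
    """Map every distinct token in the corpus to a column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def bow_vector(document, vocabulary):
    """Count how often each vocabulary token occurs in the document."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector

# Hypothetical two-document corpus.
vocabulary = fit_vocabulary(["good book", "bad book book"])
vector = bow_vector("bad book book", vocabulary)
```

Word order is discarded entirely, which is the main limitation compared with word vectors or sequence models.
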

Interesting Links

Free NLP Courses
