This repository contains a collection of Python scripts that I prepared to demonstrate key Natural Language Processing (NLP) techniques and workflows. Each file focuses on a specific stage of the NLP pipeline, from data cleaning and preprocessing to tokenization, text normalization, and sentiment analysis. The goal is to provide clear, modular, and reproducible code examples that help learners and practitioners implement core NLP concepts in Python.

Topics covered:
- Text Cleaning
- Tokenization
- Stemming and Lemmatization
- Stop Words Removal
- Bag of Words
- TF-IDF
- N-Grams (Text Representation)
- Word2Vec (Word Embedding)
- FastText (Word Embedding)
- N-Gram Models (Probabilistic Language Models)
- Hidden Markov Models (Probabilistic Language Models)
- Maximum Entropy Models (Probabilistic Language Models)
- RNN (Deep Learning)
- LSTM (Deep Learning)
- BERT (Transformers)
- GPT (Transformers)
- LLaMA (Transformers)
- T5 (Transformers)
- Text Classification with Decision Trees
- Named Entity Recognition
- Morphological Analysis (spaCy)
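
To give a feel for how the first few topics chain together, here is a minimal sketch in plain Python covering text cleaning, tokenization, stop-word removal, bag of words, and TF-IDF. It is not taken from the scripts in this repository: the function names, the tiny `STOP_WORDS` set, and the two-sentence corpus are all assumptions made for illustration, and the TF-IDF uses the simple unsmoothed form `tf * log(N / df)`, which may differ from the weighting a given library applies.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of"}


def clean(text):
    """Lowercase the text and replace every character that is not
    a-z, 0-9, or whitespace with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Clean the text, then split it on whitespace."""
    return clean(text).split()


def remove_stop_words(tokens):
    """Drop tokens that appear in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Count how often each token occurs in one document."""
    return Counter(tokens)


def tf_idf(docs):
    """Compute unsmoothed TF-IDF for a list of token lists.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    Returns one {term: weight} dict per document.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


# Assumed toy corpus, not data from this repository.
corpus = ["The cat sat on the mat.", "The dog chased the cat!"]
docs = [remove_stop_words(tokenize(d)) for d in corpus]
bows = [bag_of_words(d) for d in docs]
scores = tf_idf(docs)

print(docs)
print(bows)
print(scores)
```

In this toy corpus, a term that appears in every document gets a weight of zero, which is the intended effect of the inverse document frequency factor. Real projects would normally rely on a library tokenizer and vectorizer rather than hand-rolled helpers like these.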