Sexism Detection with Bidirectional LSTM (EXIST 2023 Task 2)

This project implements and evaluates neural and transformer-based models for sexism intention classification, following the EXIST 2023 Task 2 specification.

The goal is to classify tweets into four categories:

  • DIRECT
  • REPORTED
  • JUDGEMENTAL
  • - (non-sexist)

The system is based on:

  • GloVe word embeddings
  • LSTM and Bidirectional LSTM neural networks
  • A transformer model (Twitter-RoBERTa for hate speech)

Task Description

The task is to determine the intention behind sexist messages. The categories are:

  • DIRECT: The tweet itself is sexist or promotes sexist behavior.
  • REPORTED: The tweet reports a sexist event or statement.
  • JUDGEMENTAL: The tweet describes a sexist situation in order to criticize or condemn it.
  • -: Non-sexist content.

Each tweet is annotated by six annotators. The final label is obtained using majority voting.


Dataset Processing

The dataset is a subset of the EXIST 2023 corpus and contains tweets in English and Spanish. Only English tweets are used.

For Task 2, the labels are aggregated using majority voting. Tweets without a clear majority are removed. The final labels are encoded as:

  • '-' -> 0
  • 'DIRECT' -> 1
  • 'JUDGEMENTAL' -> 2
  • 'REPORTED' -> 3

Only the following fields are kept:

  • id_EXIST
  • lang
  • tweet
  • label
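The aggregation and encoding steps above can be sketched as follows. The label map comes from this README; the helper names are illustrative, not the project's actual code:

```python
from collections import Counter

# Label encoding as specified above
LABEL2ID = {'-': 0, 'DIRECT': 1, 'JUDGEMENTAL': 2, 'REPORTED': 3}

def majority_label(annotations):
    """Return the majority label across annotators, or None if no strict majority."""
    label, count = Counter(annotations).most_common(1)[0]
    return label if count > len(annotations) / 2 else None

def encode(annotations):
    """Encode a tweet's six annotations; None means the tweet is removed."""
    label = majority_label(annotations)
    return LABEL2ID[label] if label is not None else None

encode(['DIRECT', 'DIRECT', 'DIRECT', 'DIRECT', '-', 'REPORTED'])  # 1
encode(['DIRECT', 'DIRECT', 'DIRECT', '-', '-', '-'])              # None (3-3 tie)
```

A 3-3 tie among the six annotators yields no strict majority, so the tweet is dropped, matching the filtering described above.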

Data Cleaning

Tweets are noisy and informal. The following preprocessing steps are applied:

  • Emojis are removed
  • Hashtags are removed
  • Mentions (e.g., @user) are removed
  • URLs are removed
  • Special characters and symbols are removed
  • Curly quotes and special quotation marks are normalized
  • Lemmatization is applied to reduce words to their base form

This preprocessing produces cleaner and more consistent textual input for the models.
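The cleaning steps could be implemented with regular expressions roughly as below. The exact patterns are assumptions, and the lemmatization step (typically done with a library such as spaCy or NLTK) is omitted from this sketch:

```python
import re

def clean_tweet(text):
    """Minimal cleaning sketch: remove URLs, mentions, hashtags and symbols,
    and normalize curly quotes. Lemmatization would follow as a separate step."""
    text = re.sub(r'http\S+|www\.\S+', '', text)   # URLs
    text = re.sub(r'@\w+', '', text)               # mentions
    text = re.sub(r'#\w+', '', text)               # hashtags
    text = (text.replace('\u201c', '"').replace('\u201d', '"')
                .replace('\u2018', "'").replace('\u2019', "'"))  # curly quotes
    text = re.sub(r'[^\w\s\'"]', ' ', text)        # emojis, special characters
    return re.sub(r'\s+', ' ', text).strip()       # collapse whitespace
```

For example, `clean_tweet('see https://t.co/abc now @user #topic')` leaves only `'see now'`.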


Text Encoding with GloVe

Word embeddings are built using pre-trained GloVe vectors.

The vocabulary is constructed as:

  • All tokens appearing in the training set
  • Plus all tokens present in GloVe

Out-of-vocabulary handling follows these rules:

  • If a token appears in the training set but not in GloVe, it is added to the vocabulary and assigned a custom embedding.
  • If a token appears in validation or test but not in the vocabulary, it is mapped to a special <UNK> token.

The <UNK> token is assigned a static embedding (e.g., random initialization).

This ensures that all training tokens have an embedding while unseen tokens are handled consistently.
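A sketch of this construction, assuming `glove` is a dict mapping word to vector (as loaded from a GloVe file); the special-token ids and the random initialization scale are illustrative:

```python
import numpy as np

def build_embedding_matrix(train_tokens, glove, dim=100, seed=0):
    """Vocabulary = training tokens plus all GloVe tokens; training tokens
    missing from GloVe get a custom random embedding."""
    rng = np.random.default_rng(seed)
    vocab = {'<PAD>': 0, '<UNK>': 1}
    # <UNK> gets a static (randomly initialized) embedding
    vectors = [np.zeros(dim), rng.normal(scale=0.1, size=dim)]
    for tok in sorted(set(train_tokens) | set(glove)):
        vocab[tok] = len(vocab)
        # GloVe vector if available, otherwise a custom embedding
        vectors.append(glove.get(tok, rng.normal(scale=0.1, size=dim)))
    return vocab, np.stack(vectors)

def encode_tokens(tokens, vocab):
    """Unseen validation/test tokens fall back to <UNK>."""
    return [vocab.get(t, vocab['<UNK>']) for t in tokens]
```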


Neural Models

Two recurrent neural architectures are implemented.

Baseline Model

  • Embedding layer initialized with GloVe embeddings
  • One Bidirectional LSTM layer
  • Dense softmax classification layer

Stacked Model

  • Embedding layer initialized with GloVe embeddings
  • Two stacked Bidirectional LSTM layers
  • Dense softmax classification layer

The embedding layer can be either frozen or fine-tuned during training.
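Both architectures can be sketched in one class; this is an illustrative PyTorch version (the project's actual framework and hyperparameters may differ). `num_layers=1` gives the baseline, `num_layers=2` the stacked model, and `freeze_embeddings` toggles frozen versus fine-tuned GloVe:

```python
import numpy as np
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Hypothetical sketch of the baseline (num_layers=1) and
    stacked (num_layers=2) BiLSTM classifiers described above."""
    def __init__(self, embedding_matrix, hidden=128, num_layers=1,
                 num_classes=4, freeze_embeddings=True):
        super().__init__()
        # GloVe-initialized embedding layer, optionally frozen
        self.embedding = nn.Embedding.from_pretrained(
            torch.as_tensor(np.asarray(embedding_matrix), dtype=torch.float32),
            freeze=freeze_embeddings)
        self.lstm = nn.LSTM(self.embedding.embedding_dim, hidden,
                            num_layers=num_layers, bidirectional=True,
                            batch_first=True)
        # dense classification layer; softmax is folded into the loss
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):
        _, (h, _) = self.lstm(self.embedding(token_ids))
        # concatenate the final forward and backward hidden states
        return self.fc(torch.cat([h[-2], h[-1]], dim=-1))
```

Training with `nn.CrossEntropyLoss` applies the softmax implicitly, which is why the head is a plain linear layer here.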


Training and Evaluation

Each model is trained using at least three different random seeds to obtain robust estimates.

The models are trained on the training set and evaluated on the validation set. The following metrics are computed:

  • Macro F1-score
  • Precision
  • Recall

Mean and standard deviation across seeds are reported for each metric. The best model is selected based on the macro F1-score.
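The metric computation and the across-seed aggregation can be sketched as follows; the macro averages are written out explicitly here (in practice a library such as scikit-learn computes the same quantities):

```python
import statistics

def macro_scores(y_true, y_pred, labels=(0, 1, 2, 3)):
    """Macro-averaged precision, recall and F1 over the four classes."""
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        precs.append(prec)
        recs.append(rec)
    n = len(labels)
    return {'macro_f1': sum(f1s) / n,
            'precision': sum(precs) / n,
            'recall': sum(recs) / n}

def summarize(per_seed_results):
    """Mean and standard deviation of each metric across seed runs."""
    return {m: (statistics.mean([r[m] for r in per_seed_results]),
                statistics.stdev([r[m] for r in per_seed_results]))
            for m in per_seed_results[0]}
```

Model selection then picks the configuration with the highest mean macro F1 in the `summarize` output.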


Transformer Model

A transformer-based model, Twitter-RoBERTa-base for Hate Speech Detection, is used.

The workflow is:

  • Load the tokenizer and the model from HuggingFace
  • Tokenize the tweets
  • Train using the HuggingFace Trainer
  • Evaluate on the test set using the same metrics as the LSTM models

The transformer is compared against the BiLSTM models in terms of performance and error patterns.
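The checkpoint name and training configuration are project-specific, so the sketch below only shows the two framework-independent pieces of the workflow: decoding the transformer's logits into the four labels, and the shape of the `compute_metrics` callback passed to the HuggingFace Trainer (plain accuracy here for brevity; the project reports macro F1, precision and recall):

```python
import numpy as np

# Label map taken from the dataset-processing section of this README
ID2LABEL = {0: '-', 1: 'DIRECT', 2: 'JUDGEMENTAL', 3: 'REPORTED'}

def decode_predictions(logits):
    """Map output logits to label strings via argmax."""
    return [ID2LABEL[i] for i in np.argmax(np.asarray(logits), axis=-1)]

def compute_metrics(eval_pred):
    """Callback shape expected by Trainer(compute_metrics=...):
    receives (logits, labels) and returns a dict of named metrics."""
    logits, labels = eval_pred
    preds = np.argmax(np.asarray(logits), axis=-1)
    return {'accuracy': float((preds == np.asarray(labels)).mean())}
```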


Error Analysis

The main sources of error include:

  • Out-of-vocabulary and rare words
  • Informal language and slang typical of tweets
  • Class imbalance between DIRECT, REPORTED, and JUDGEMENTAL
  • Confusion between reported and judgemental cases

The transformer generally handles context and rare words better than the LSTM models, while the LSTM models are more sensitive to vocabulary coverage and preprocessing quality.

Possible improvements include:

  • More advanced tweet-specific preprocessing
  • Data augmentation
  • Using multilingual or larger transformer models
  • Improving handling of rare and unseen words

Report

The final report summarizes all experiments and results following the NLP course template. It includes:

  • Description of preprocessing and models
  • Performance tables
  • Learning curves
  • Error analysis

The report is provided in PDF format together with the notebook used for the experiments.


Contributors


Credits and Contacts

Teaching Assistants:

Professor:

This project is developed for academic and educational purposes.
