This project implements and evaluates neural and transformer-based models for sexism intention classification, following the EXIST 2023 Task 2 specification.
The goal is to classify tweets into four categories:
- DIRECT
- REPORTED
- JUDGEMENTAL
- '-' (non-sexist)
The system is based on:
- GloVe word embeddings
- LSTM and Bidirectional LSTM neural networks
- A transformer model (Twitter-RoBERTa for hate speech)
The task is to determine the intention behind sexist messages. The categories are:
- DIRECT: The tweet itself is sexist or promotes sexist behavior.
- REPORTED: The tweet reports a sexist event or statement.
- JUDGEMENTAL: The tweet describes a sexist situation in order to criticize or condemn it.
- '-': Non-sexist content.
Each tweet is annotated by six annotators. The final label is obtained using majority voting.
The dataset is a subset of the EXIST 2023 corpus and contains tweets in English and Spanish. Only English tweets are used.
For Task 2, the labels are aggregated using majority voting. Tweets without a clear majority are removed. The final labels are encoded as:
- '-' -> 0
- 'DIRECT' -> 1
- 'JUDGEMENTAL' -> 2
- 'REPORTED' -> 3
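A minimal sketch of this aggregation and encoding step is shown below (the per-annotator column name labels_task_2 is an assumption; the actual field name in the EXIST data may differ):

```python
from collections import Counter

import pandas as pd

LABEL2ID = {"-": 0, "DIRECT": 1, "JUDGEMENTAL": 2, "REPORTED": 3}

def majority_label(annotations):
    """Return the majority label among the six annotations, or None when there is a tie."""
    counts = Counter(annotations).most_common(2)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority: the tweet is discarded
    return counts[0][0]

def aggregate_task2(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-annotator labels into one encoded label per tweet and keep the relevant fields."""
    df = df.copy()
    df["majority"] = df["labels_task_2"].apply(majority_label)  # column name is an assumption
    df = df.dropna(subset=["majority"])                         # remove tweets without a clear majority
    df["label"] = df["majority"].map(LABEL2ID)
    return df[["id_EXIST", "lang", "tweet", "label"]]
```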
Only the following fields are kept:
- id_EXIST
- lang
- tweet
- label
Tweets are noisy and informal. The following preprocessing steps are applied:
- Emojis are removed
- Hashtags are removed
- Mentions (e.g., @user) are removed
- URLs are removed
- Special characters and symbols are removed
- Curly quotes and special quotation marks are normalized
- Lemmatization is applied to reduce words to their base form
This preprocessing produces cleaner and more consistent textual input for the models.
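A minimal cleaning sketch illustrating these steps, assuming spaCy with the en_core_web_sm model for lemmatization (the exact regular expressions in the notebook may differ):

```python
import re

import spacy

# Assumes the small English pipeline is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def clean_tweet(text: str) -> str:
    text = text.replace("“", '"').replace("”", '"').replace("’", "'")  # normalize curly quotes
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)                 # remove URLs
    text = re.sub(r"@\w+", " ", text)                                  # remove mentions
    text = re.sub(r"#\w+", " ", text)                                  # remove hashtags
    text = re.sub(r"[^\w\s'.,!?]", " ", text)                          # drop emojis and other symbols
    text = re.sub(r"\s+", " ", text).strip()
    return " ".join(token.lemma_ for token in nlp(text.lower()))       # lemmatize
```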
Word embeddings are built using pre-trained GloVe vectors.
The vocabulary is constructed as:
- All tokens appearing in the training set
- Plus all tokens present in GloVe
Out-of-vocabulary handling follows these rules:
- If a token appears in the training set but not in GloVe, it is added to the vocabulary and assigned a custom embedding.
- If a token appears in validation or test but not in the vocabulary, it is mapped to a special <UNK> token.
The <UNK> token is assigned a static embedding (e.g., random initialization).
This ensures that all training tokens have an embedding while unseen tokens are handled consistently.
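A sketch of how the embedding matrix can be built under these rules (the embedding dimension and the reserved <UNK> index are assumptions):

```python
import numpy as np

EMBEDDING_DIM = 100  # assumption: 100-dimensional GloVe vectors
UNK_INDEX = 0        # assumption: index 0 is reserved for <UNK>

def build_embedding_matrix(vocab, glove, seed=42):
    """vocab: token -> index (including <UNK>); glove: token -> pre-trained vector."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab), EMBEDDING_DIM), dtype=np.float32)
    for token, idx in vocab.items():
        if token in glove:
            matrix[idx] = glove[token]                               # pre-trained GloVe vector
        else:
            matrix[idx] = rng.normal(scale=0.6, size=EMBEDDING_DIM)  # custom vector (training-only token or <UNK>)
    return matrix

def encode(tokens, vocab):
    """Map tokens to indices, falling back to <UNK> for unseen tokens."""
    return [vocab.get(token, UNK_INDEX) for token in tokens]
```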
Two recurrent neural architectures are implemented.

BiLSTM (single layer):
- Embedding layer initialized with GloVe embeddings
- One Bidirectional LSTM layer
- Dense softmax classification layer

Stacked BiLSTM (two layers):
- Embedding layer initialized with GloVe embeddings
- Two stacked Bidirectional LSTM layers
- Dense softmax classification layer
The embedding layer can be either frozen or fine-tuned during training.
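A PyTorch sketch of the single-layer variant (framework and hyperparameters are assumptions; the stacked variant simply uses two LSTM layers):

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, embedding_matrix, hidden_size=128, num_layers=1,
                 num_classes=4, freeze_embeddings=True):
        super().__init__()
        weights = torch.as_tensor(embedding_matrix, dtype=torch.float32)
        # freeze=True keeps the GloVe vectors fixed; freeze=False fine-tunes them
        self.embedding = nn.Embedding.from_pretrained(weights, freeze=freeze_embeddings)
        self.lstm = nn.LSTM(weights.shape[1], hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)                  # (batch, seq_len, emb_dim)
        _, (hidden, _) = self.lstm(embedded)                  # hidden: (2 * num_layers, batch, hidden_size)
        final = torch.cat([hidden[-2], hidden[-1]], dim=-1)   # last forward and backward states
        return self.classifier(final)                         # logits; softmax is applied inside the loss

# The stacked variant is obtained with num_layers=2.
```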
Each model is trained using at least three different random seeds to obtain robust estimates.
The models are trained on the training set and evaluated on the validation set. The following metrics are computed:
- Macro F1-score
- Precision
- Recall
Mean and standard deviation across seeds are reported for each metric. The best model is selected based on the macro F1-score.
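A sketch of the per-seed evaluation and aggregation, assuming scikit-learn for the metrics:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def evaluate(y_true, y_pred):
    """Macro-averaged metrics for a single run."""
    return {
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
    }

def summarize(per_seed_results):
    """Mean and standard deviation of each metric across the seeds."""
    summary = {}
    for metric in per_seed_results[0]:
        values = [run[metric] for run in per_seed_results]
        summary[metric] = (float(np.mean(values)), float(np.std(values)))
    return summary
```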
A transformer-based model, Twitter-RoBERTa-base for Hate Speech Detection, is used.
The workflow is:
- Load the tokenizer and the model from HuggingFace
- Tokenize the tweets
- Train using the HuggingFace Trainer
- Evaluate on the test set using the same metrics as the LSTM models
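A sketch of this workflow, assuming the cardiffnlp/twitter-roberta-base-hate checkpoint and DataFrames named train_df and val_df with tweet and label columns (the pre-trained two-class hate-speech head is replaced with a four-class head):

```python
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score, precision_score, recall_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "cardiffnlp/twitter-roberta-base-hate"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=4, ignore_mismatched_sizes=True  # swap the 2-class head for a 4-class head
)

def tokenize(batch):
    return tokenizer(batch["tweet"], truncation=True, padding="max_length", max_length=128)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "macro_f1": f1_score(labels, preds, average="macro"),
        "precision": precision_score(labels, preds, average="macro", zero_division=0),
        "recall": recall_score(labels, preds, average="macro", zero_division=0),
    }

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)  # train_df / val_df are assumed names
val_ds = Dataset.from_pandas(val_df).map(tokenize, batched=True)

args = TrainingArguments(output_dir="twitter-roberta-exist", num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=val_ds, compute_metrics=compute_metrics)
trainer.train()
```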
The transformer is compared against the BiLSTM models in terms of performance and error patterns.
The main sources of error include:
- Out-of-vocabulary and rare words
- Informal language and slang typical of tweets
- Class imbalance between DIRECT, REPORTED, and JUDGEMENTAL
- Confusion between REPORTED and JUDGEMENTAL cases
The transformer generally handles context and rare words better than the LSTM models, while the LSTM models are more sensitive to vocabulary coverage and preprocessing quality.
Possible improvements include:
- More advanced tweet-specific preprocessing
- Data augmentation
- Using multilingual or larger transformer models
- Improving handling of rare and unseen words
The final report summarizes all experiments and results following the NLP course template. It includes:
- Description of preprocessing and models
- Performance tables
- Learning curves
- Error analysis
The report is provided in PDF format together with the notebook used for the experiments.
Teaching Assistants:
- Federico Ruggeri, [email protected]
- Eleonora Mancini, [email protected]
Professor:
- Paolo Torroni, [email protected]
This project is developed for academic and educational purposes.