Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model

Authors: Elaheh Baharlouei, Mahsa Shafaei, Yigeng Zhang, Hugo Jair Escalante, Thamar Solorio

This repository contains the dataset and the implementation of the model proposed in the paper "Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model", published at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).

Comic-Mischief-Prediction

Figure 1 shows the overall structure of the proposed HIerarchical Cross Attention with CAPtion (HICCAP) model.

Repository Structure

Comic-Mischief-Prediction
├── Binary
│   └── source
│       ├── config.py
│       ├── nlp_comic_binary.py
│       └── models
│           ├── attention.py
│           └── unified_model_binary.py
├── Data
│   ├── train_features_lrec_camera.json
│   ├── val_features_lrec_camera.json
│   └── test_features_lrec_camera.json
├── Hybrid-Pretraining
│   ├── nlp_comic_contrastive_loss_LREC.py
│   ├── nlp_comic_pretraining_Hybrid_LREC.py
│   └── unified_model_hybrid_LREC.py
├── Multi-Task
│   └── source
│       ├── config.py
│       ├── nlp_comic_multi_task.py
│       └── models
│           ├── attention.py
│           └── multi_task_model.py
└── HICCAP.pdf

Data

In this directory, we provide three JSON files containing the metadata of the train/val/test sets. These files include the names of the videos on YouTube, the original subtitles and their tokens extracted with the BERT model, the labels, and some additional information about each video. Due to policy restrictions, we are not allowed to release the video data. If you need the extracted features, please email Elaheh Baharlouei ([email protected]) and we will provide the following data:

1. Video features extracted using I3D model
2. Audio features extracted using VGGish model
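The metadata files can be inspected before any training code is run. The following is a minimal sketch, assuming only that each file is valid JSON; the helper names `load_split` and `summarize` are hypothetical and not part of this repository, and the sketch deliberately makes no assumption about the field names inside the files.

```python
import json
from pathlib import Path

# File names are taken from the Data directory listing in this README.
SPLIT_FILES = {
    "train": "train_features_lrec_camera.json",
    "val": "val_features_lrec_camera.json",
    "test": "test_features_lrec_camera.json",
}


def load_split(data_dir, split):
    """Load one metadata file (hypothetical helper) and return the parsed JSON."""
    path = Path(data_dir) / SPLIT_FILES[split]
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(metadata, max_keys=5):
    """Describe the top-level shape without assuming any field names."""
    if isinstance(metadata, dict):
        return {
            "type": "dict",
            "size": len(metadata),
            "sample_keys": list(metadata)[:max_keys],
        }
    if isinstance(metadata, list):
        return {"type": "list", "size": len(metadata), "sample_keys": []}
    return {"type": type(metadata).__name__, "size": None, "sample_keys": []}
```

For example, `summarize(load_split("Data", "train"))` reports whether the training file is a dictionary or a list at the top level and how many entries it holds, which is a useful first check before relying on any specific field.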

Binary

This directory contains the binary implementation of our approach. Its source directory includes 1) the proposed HICCAP model implementation, 2) the nlp_comic_binary.py script for training, and 3) config.py, which contains the hyperparameters and configuration variables.

Multi-Task

Similar to the "Binary" directory, this directory contains the multi-task implementation of the HICCAP approach. Its source directory includes 1) the proposed HICCAP model implementation, 2) the nlp_comic_multi_task.py script for training, and 3) config.py, which contains the hyperparameters and configuration variables.

Data

This directory contains 1) the metadata of the train/val/test sets, 2) VGGish audio feature vectors, and 3) I3D video feature vectors.

Hybrid-Pretraining

This directory contains the implementation of the hybrid-pretraining approach, including 1) nlp_comic_contrastive_loss_LREC.py for pretraining with contrastive learning, 2) nlp_comic_pretraining_Hybrid_LREC.py for loading the checkpoint of the model pretrained with contrastive learning and continuing pretraining with various matching-based pretraining approaches, and 3) unified_model_hybrid_LREC.py, a sample implementation of the HICCAP architecture with the layers required for the hybrid pretraining approach.
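To illustrate what pretraining with contrastive learning optimizes in general, here is a minimal sketch of a generic InfoNCE-style loss over paired embeddings. It is an assumption for illustration only: the loss in nlp_comic_contrastive_loss_LREC.py may be formulated differently, and the function names, the temperature value, and the toy vectors are all hypothetical. It is written in plain Python so that it runs without dependencies; a real training script would use a tensor library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] and no other row.

    anchors and positives are equally long lists of embeddings, e.g. two
    modalities of the same clips (hypothetical pairing for illustration).
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching pair as the correct class.
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each anchor is most similar to its own positive and grows when the pairs are mismatched, which is the property a contrastive pretraining stage relies on to align the representations of different modalities.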

Citation

@inproceedings{baharlouei-etal-2024-labeling,
    title = "Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model",
    author = "Baharlouei, Elaheh  and
      Shafaei, Mahsa  and
      Zhang, Yigeng  and
      Escalante, Hugo Jair  and
      Solorio, Thamar",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.874/",
    pages = "9999--10013",
    abstract = "We address the challenge of detecting questionable content in online media, specifically the subcategory of comic mischief. This type of content combines elements such as violence, adult content, or sarcasm with humor, making it difficult to detect. Employing a multimodal approach is vital to capture the subtle details inherent in comic mischief content. To tackle this problem, we propose a novel end-to-end multimodal system for the task of comic mischief detection. As part of this contribution, we release a novel dataset for the targeted task consisting of three modalities: video, text (video captions and subtitles), and audio. We also design a HIerarchical Cross-attention model with CAPtions (HICCAP) to capture the intricate relationships among these modalities. The results show that the proposed approach makes a significant improvement over robust baselines and state-of-the-art models for comic mischief detection and its type classification. This emphasizes the potential of our system to empower users, to make informed decisions about the online content they choose to see."
}

## Contact
Feel free to get in touch via email at [email protected].
