A multimodal deep learning system that fuses video, crowd audio, and commentary text to robustly detect soccer highlights using early- and late-fusion architectures.

Kryp6405/Multimodal-Soccer-Highlight-Detection

Multimodal Soccer Highlight Detection

A deep learning system for detecting highlight moments in soccer videos using multimodal data (vision, audio, and text).

Overview

This project combines three modalities to identify highlight events in soccer matches:

  • Vision: Extracted video frames and spectrograms for visual features
  • Audio: Mel-spectrograms computed from the match audio track (crowd noise and commentary)
  • Text: Commentary text extracted from match broadcasts
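
The difference between the early- and late-fusion architectures mentioned in the project description can be sketched in PyTorch as follows. This is a minimal illustration only, not the repository's model: the class names, the embedding sizes (512, 128, 768), and the averaging rule used for late fusion are all assumptions, and each modality is assumed to have already been encoded into a fixed-size vector.

```python
import torch
import torch.nn as nn


class EarlyFusion(nn.Module):
    """Concatenate per-modality embeddings, then classify the joint vector."""

    def __init__(self, vision_dim, audio_dim, text_dim, hidden_dim=128):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(vision_dim + audio_dim + text_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # single logit: highlight vs. non-highlight
        )

    def forward(self, vision, audio, text):
        fused = torch.cat([vision, audio, text], dim=-1)
        return self.classifier(fused).squeeze(-1)


class LateFusion(nn.Module):
    """Score each modality separately, then average the three logits."""

    def __init__(self, vision_dim, audio_dim, text_dim):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d, 1) for d in (vision_dim, audio_dim, text_dim)]
        )

    def forward(self, vision, audio, text):
        logits = [head(x) for head, x in zip(self.heads, (vision, audio, text))]
        return torch.stack(logits, dim=0).mean(dim=0).squeeze(-1)


# Hypothetical embedding sizes and a batch of 4 random "clips".
torch.manual_seed(0)
vision, audio, text = torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 768)
early = EarlyFusion(512, 128, 768)
late = LateFusion(512, 128, 768)
early_probs = torch.sigmoid(early(vision, audio, text))  # shape: (4,)
late_probs = torch.sigmoid(late(vision, audio, text))    # shape: (4,)
```

Early fusion lets the classifier learn interactions between modalities, while late fusion keeps each modality's decision independent, which may degrade more gracefully when one input (for example, the commentary text) is missing or noisy.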

Dataset

The dataset consists of 4,552 clips from soccer matches, split into:

  • Train: 3,111 clips
  • Validation: 664 clips
  • Test: 778 clips
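
As a quick sanity check, the split proportions can be computed directly from the counts listed above. The dictionary keys are illustrative names, not identifiers from the repository.

```python
# Split sizes as listed in this README.
split_sizes = {"train": 3111, "validation": 664, "test": 778}

# Note: the listed splits sum to 4,553, while the stated total is 4,552.
total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name:<10} {count:>5} clips  {fractions[name]:.1%}")
```

The counts correspond to roughly a 68/15/17 train/validation/test split.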

Each clip contains:

  • Video frames (8-second clips from matches)
  • Audio waveforms and mel-spectrograms
  • Textual commentary
  • Binary highlight/non-highlight labels
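
One way to represent a single clip in code is a small record type holding the four items listed above. The sketch below is hypothetical: the class name, field names, and path-based layout are assumptions for illustration and are not taken from the repository. Only the 8-second clip length and the binary label come from this README.

```python
from dataclasses import dataclass

CLIP_SECONDS = 8  # clip length stated in this README


@dataclass
class ClipSample:
    """One dataset item; field names are illustrative, not the repository's."""

    frames_path: str    # location of the extracted video frames
    waveform_path: str  # raw audio waveform for the clip
    mel_path: str       # precomputed mel-spectrogram
    commentary: str     # commentary text for the clip
    label: int          # 1 = highlight, 0 = non-highlight

    def __post_init__(self):
        # Labels are binary, so reject anything else early.
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label}")


def expected_audio_samples(sample_rate_hz: int) -> int:
    """Number of waveform samples in one clip at the given sample rate."""
    return CLIP_SECONDS * sample_rate_hz


sample = ClipSample(
    frames_path="clips/0001/frames",
    waveform_path="clips/0001/audio.wav",
    mel_path="clips/0001/mel.npy",
    commentary="What a strike from outside the box!",
    label=1,
)
```

For example, at an assumed sample rate of 16 kHz, `expected_audio_samples(16000)` gives 128,000 samples per clip.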

Requirements

  • Python 3.8+
  • PyTorch
  • TorchAudio
  • TorchVision
  • NumPy, Pandas, Scikit-learn
  • Matplotlib, Seaborn

Paper Link
