This project was carried out with my friend blajox during the final year of our master's degree. We chose to tackle the problem posed by the NFL in its Kaggle competition. The competition focuses on the "crown jewel of American sports", the downfield pass, and the critical uncertainty that keeps audiences engaged: will it be a touchdown, an interception, or an incomplete pass?
The objective is to leverage Next Gen Stats data to predict player movement during the entire phase when the football is in the air. This analysis is crucial for helping the NFL better understand player trajectories and movement patterns during contested catch situations.
Participants are tasked with building models to predict the precise location and movement of key players (the targeted receiver and converging defenders) for every frame, starting the moment the quarterback releases the ball and ending when the ball lands.
The data provided for prediction includes information available right up to the moment of ball release:
- Pre-Pass Tracking Data: Detailed NGS tracking data leading up to the moment the quarterback releases the ball.
- Targeted Player: Identification of the offensive player (the targeted receiver) who is the intended recipient of the pass.
- Ball Landing Location: The final $(x, y)$ coordinates where the pass is expected to land.
- Tracking Frequency: The NFL tracking data is recorded at 10 frames per second (FPS).
- Prediction Granularity: If a pass is in the air for $T$ seconds, participants must predict $10 \times T$ frames of location data for each player.
- Excluded Plays: To ensure focus on relevant downfield pass analysis, the competition data excludes the following types of plays: quick passes (duration less than 0.5 seconds), deflected passes and throwaway passes.
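As a quick illustration of the prediction granularity, the number of frames to predict follows directly from the air time and the 10 FPS tracking frequency. The helper below is a minimal sketch: the names `frames_to_predict` and `is_quick_pass` are hypothetical, and rounding to the nearest whole frame is an assumption, since the rule above only states $10 \times T$.

```python
FPS = 10              # tracking frequency stated above (frames per second)
MIN_AIR_TIME_S = 0.5  # passes shorter than this are excluded as quick passes


def frames_to_predict(air_time_s: float, fps: int = FPS) -> int:
    """Frames to predict per player for a pass in the air for air_time_s seconds."""
    if air_time_s < 0:
        raise ValueError("air time must be non-negative")
    # Assumption: round to the nearest whole frame.
    return round(air_time_s * fps)


def is_quick_pass(air_time_s: float) -> bool:
    """True if the pass would fall under the 'quick pass' exclusion."""
    return air_time_s < MIN_AIR_TIME_S


n_frames = frames_to_predict(2.5)  # a pass in the air for 2.5 s
```

With these assumptions, a pass that stays in the air for 2.5 seconds requires 25 predicted frames per player.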
Note: Due to their size, the input CSV files are not included in this repository. Please download the dataset from the Kaggle Competition Page and place the input_*.csv files in a data/ folder.
Link to the data: https://drive.google.com/file/d/1ym1gsHwswDrgso-xznJeHwedd5RBGb5S/view?usp=sharing
Generate models that output predicted movement (location coordinates) for each relevant player across all frames while the ball is traveling in the air. The ultimate goal is to generate outputs that most closely match the actual eventual player movement.
- Analyze Trajectories: Use the pre-pass movement data to determine initial velocities and intentions.
- Model Player Intent: Integrate the knowledge of the Targeted Player and Ball Landing Location, as these heavily influence player movement during the pass.
- Time-Series Modeling: Develop robust models capable of forecasting multi-step, multi-player time-series data accurately.
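The "Analyze Trajectories" step above can be sketched with a simple finite difference over the last pre-pass frames. This is only an illustration, not necessarily how the project computes its features: the function names are hypothetical, positions are assumed to be `(x, y)` pairs sampled at 10 FPS, and the tracking data may already provide comparable quantities directly.

```python
import math

FPS = 10        # tracking frequency (frames per second)
DT = 1.0 / FPS  # time between two consecutive frames, in seconds


def estimate_velocity(positions):
    """Estimate (vx, vy) from the last two (x, y) positions of a track."""
    if len(positions) < 2:
        raise ValueError("need at least two frames to estimate a velocity")
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    return (x1 - x0) / DT, (y1 - y0) / DT


def speed_and_heading(vx, vy):
    """Convert a velocity vector into speed and heading (degrees from the x-axis)."""
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))


# Made-up pre-pass track, for illustration only.
track = [(10.0, 20.0), (10.3, 20.4), (10.6, 20.8)]
vx, vy = estimate_velocity(track)
speed, heading = speed_and_heading(vx, vy)
```

Using only the last two frames keeps the sketch short; averaging over more frames would be less sensitive to tracking noise.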
We have developed a Deep Learning model based on a Transformer Sequence-to-Sequence (Seq2Seq) architecture. Unlike simple regression models that predict a single final point, our model generates the future trajectory frame-by-frame in an autoregressive manner.
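To make the autoregressive idea concrete, the sketch below generates a trajectory one frame at a time, feeding each prediction back in as input for the next step. It is a simplified illustration rather than the project's code: `rollout` and `constant_velocity_step` are hypothetical names, and the constant-velocity step is a stand-in for the trained Transformer.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def rollout(step_fn: Callable[[List[Point]], Point],
            history: List[Point],
            num_frames: int) -> List[Point]:
    """Predict num_frames future positions, one frame at a time."""
    trajectory = list(history)  # copy so the caller's history is not modified
    predictions = []
    for _ in range(num_frames):
        next_point = step_fn(trajectory)  # predict from everything seen so far
        predictions.append(next_point)
        trajectory.append(next_point)     # the prediction becomes the next input
    return predictions


def constant_velocity_step(trajectory: List[Point]) -> Point:
    """Stand-in for the model: repeat the last observed displacement."""
    (x0, y0), (x1, y1) = trajectory[-2], trajectory[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


history = [(0.0, 0.0), (1.0, 0.5)]
future = rollout(constant_velocity_step, history, num_frames=3)
```

Because each step consumes earlier predictions, errors can accumulate over long passes, which is one reason the context supplied to the model matters.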
- Feature Engineering:
  - Target Player: Encoded with historical physics (speed, acceleration, direction) over the last 10 frames.
  - Social Context: Integration of the 22 other players on the field relative to the target (Distance, Friend/Foe flag).
- Encoder (Context Understanding):
  - Uses Multi-Head Attention to weigh the importance of each defender and teammate.
  - Creates a "Context Vector" summarizing the tactical situation.
- Decoder (Trajectory Generation):
  - Predicts the next position $(x, y)$ based on the encoder memory and the previous position.
  - Uses Positional Encoding to respect the temporal sequence.
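The Social Context features from the Feature Engineering step (distance and Friend/Foe flag relative to the target) could look roughly like the sketch below. All names here (`PlayerState`, `social_context`, the dictionary keys) are hypothetical, and sorting by distance is an assumption made for readability; the actual feature layout in `processing.py` may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PlayerState:
    player_id: int
    x: float
    y: float
    on_offense: bool


def social_context(target: PlayerState, others: List[PlayerState]) -> List[Dict[str, float]]:
    """Describe every other player relative to the target player."""
    features = []
    for player in others:
        dx, dy = player.x - target.x, player.y - target.y
        features.append({
            "player_id": player.player_id,
            "dx": dx,
            "dy": dy,
            "distance": math.hypot(dx, dy),
            # Friend/Foe flag: 1.0 for a teammate, 0.0 for an opponent.
            "is_teammate": 1.0 if player.on_offense == target.on_offense else 0.0,
        })
    # Assumption: order players from nearest to farthest.
    features.sort(key=lambda f: f["distance"])
    return features


# Made-up positions, for illustration only.
receiver = PlayerState(player_id=1, x=50.0, y=25.0, on_offense=True)
defender = PlayerState(player_id=2, x=53.0, y=29.0, on_offense=False)
teammate = PlayerState(player_id=3, x=50.0, y=35.0, on_offense=True)
context = social_context(receiver, [teammate, defender])
```

Expressing positions relative to the target keeps the features independent of where on the field the play happens.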
The project is organized as follows:
nfl-project/
│
├── data/ # PyTorch Dataset class
│ ├── nfl_dataset.py # Dataset class implementation
│ └── processing.py # Data processing function
│
├── models/ # PyTorch model implementations and pre-trained models
│ ├── saved/ # Model with trained weights
│ │ └── nfl_model.pth
│ ├── methods.py # Training and evaluation functions
│ ├── nfl_attention.py # Attention model implementation
│ └── nfl_seq2seq.py # Seq2seq model implementation
│
├── src/ # Source code functions
│ ├── animation.py # Functions to generate HTML animations
│ ├── processing.py # Functions to prepare input for predictions
│ └── utils.py # Functions to retrieve information from files
│
├── results/ # Visualisation results
│ ├── pred_game_X_play_Y.html # Interactive trajectory animations
│ └── training_loss.png
│
├── main.ipynb # Main script (Jupyter Notebook)
├── requirements.txt # Python dependencies
└── README.md # Project documentation
The model outputs interactive HTML animations comparing the Ground Truth (Real Trajectory) vs AI Prediction.
Note: The Big Data Bowl 2026 consists of two competitions. This project addresses the Prediction competition; the other is the Analytics competition.
Project created by Antony Manuel and Florian Lemiere, as part of the IMDA course.
IMT Nord Europe — 2025–2026
