# Lang et al. (in review): Which indicators matter? Using performance indicators to predict in-game success-related events in association football
- 📖 About
- 📁 Repository Structure
- ⚙️ Setup
- 🚀 Running Experiments
- 📊 Data
- 📈 Results and Evaluation
- 📚 Citation
- 🛡️ License
- 📬 Contact
## 📖 About

This repository accompanies the article "Which indicators matter? Using performance indicators to predict in-game success-related events in association football", currently under review at the International Journal of Computer Science in Sport (IJCSS).
The study investigates how well 28 commonly used performance indicators (PIs) predict short-term success- or scoring-related events (SREs) such as goals, shots, and box entries in professional soccer. These predictions are based on how a team performs in a defined time span leading up to the event.
Using data from 102 Bundesliga matches and thousands of machine learning model configurations, we evaluated which PIs or PI combinations best reflect a team's current performance and can anticipate upcoming events. We found that indicators derived from frequent in-game actions, such as *Dangerousity*, *Successful Passes into the Attacking Third*, and *Outplayed Opponents*, are more effective than those based on rare events like *Goals* or *Corner Kicks*. Additionally, comparing team differences in PIs often improves predictive performance.
To our knowledge, this is the first study to predict in-play events beyond the immediate next event, opening new possibilities for real-time match analysis. Based on our findings, we also propose a novel match momentum metric, grounded in empirical prediction data, which can support tactical decisions and enhance in-play betting strategies.
This repository includes the code, model configurations, and result outputs used in the study. It is intended for researchers, analysts, and practitioners interested in applied machine learning for sports analytics, event prediction, and real-time performance evaluation.
## 📁 Repository Structure

The repository is organized as follows:

- `configs/`: Configuration files for running experiments, including time window settings and model parameters.
- `data/`: An original match data file used in the study. (Note: check license and usage terms before redistribution.)
- `models/`: Implementations of the various machine learning models used in the experiments.
- `output/`: Output files generated during the analysis. Specifically, it includes:
  - the scaler file (`MinMaxScaler.pkl`) used for feature normalization,
  - a Jupyter Notebook (`apply_models.ipynb`) demonstrating how to load and apply the top 3 trained models to a match,
  - a folder `models/` containing the three trained logistic regression models used in our experiments.

  This structure supports reproducibility by providing all necessary components to run predictions on new data.
- `utils/`: Utility scripts for data handling and sampling.
- `environment.yml`: Conda environment file listing all required dependencies.
- `run.py`: Main script for initializing and running experiments based on the provided configurations.
## ⚙️ Setup

We use Conda for environment and dependency management. To set up the project environment, follow these steps:

1. **Create the environment.** Run the following command in the repository root to install all required libraries:

   ```bash
   conda env create -f environment.yml
   ```

2. **Activate the environment.** Once created, activate the environment using:

   ```bash
   conda activate shortterm_event_pred
   ```

3. **Update the environment.** If you have already created the environment and the `environment.yml` file has changed, you can update it with:

   ```bash
   conda env update --file environment.yml --prune
   ```

   This ensures all dependencies are up to date and any removed packages are pruned accordingly.
## 🚀 Running Experiments

You can run experiments using the `run.py` script along with a configuration file that defines the experimental setup:

```bash
python run.py CONFIG
```

- Replace `CONFIG` with the name of a configuration file located in the `configs/` folder (e.g., `training_config.yaml`).
- The configuration file specifies parameters such as input and prediction windows, target events, and model settings (see the sketch below).

The `configs/` folder includes a sample configuration you can use directly or modify for your own experiments. This design makes it easy to reproduce the original experiments or explore new setups with minimal adjustments.
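For orientation, the sketch below shows what such a configuration *could* look like. All keys and values here are hypothetical; consult the sample file in `configs/` (e.g., `training_config.yaml`) for the actual schema.

```yaml
# Hypothetical sketch of a configs/ file -- the real keys in
# training_config.yaml may differ. It only illustrates the kinds of
# parameters the README mentions (windows, target events, model settings).
input_window: 300        # seconds of play used to compute the PI features
prediction_window: 90    # horizon (seconds) in which an SRE is predicted
target_event: shot       # e.g. goal, shot, box_entry, corner_kick
model:
  type: logistic_regression
  params:
    C: 1.0
```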
To facilitate application and validation, the repository includes the top three pretrained models from our analysis in the `output/models/` folder.
A Jupyter Notebook (`output/apply_models.ipynb`) is provided to guide users through loading these models and applying them to new match data without retraining.
This enables quick testing and exploration of model predictions on unseen data.
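For a rough idea of what the notebook covers, here is a minimal sketch of loading the shipped artifacts with plain `pickle` and applying them to the sample match. The model filename and the reliance on `feature_names_in_` are our assumptions, not guarantees; treat `output/apply_models.ipynb` as the authoritative walkthrough.

```python
# Minimal sketch, assuming the .pkl files are standard pickled
# scikit-learn objects; check output/models/ for the actual filenames.
import pickle

import pandas as pd

with open("output/MinMaxScaler.pkl", "rb") as f:
    scaler = pickle.load(f)

with open("output/models/model_1.pkl", "rb") as f:  # hypothetical filename
    model = pickle.load(f)

match = pd.read_csv("data/Match_01.csv")

# If the scaler was fitted on a DataFrame (scikit-learn >= 1.0), it records
# the expected feature columns; otherwise select the PI columns manually.
features = match[list(scaler.feature_names_in_)]

# Probability of the positive class, i.e. an upcoming SRE, per time step.
probabilities = model.predict_proba(scaler.transform(features))[:, 1]
print(probabilities[:5])
```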
## 📊 Data

This repository includes data from one sample match (`data/Match_01.csv`), which is provided for demonstration and testing purposes only. Due to licensing restrictions, we are unable to share the full dataset used in the study.

To facilitate testing and development, we have also included a folder named `data/synthetic_data_files`, which contains 102 synthetic datasets generated to resemble the structure and characteristics of the original data. A Jupyter notebook (`data/generate_sythetic_data.ipynb`) documents and demonstrates how these synthetic datasets were created.
- We are not the legal owners of the complete dataset.
- The full set of event and tracking data from 102 Bundesliga matches used in the published study is not publicly available.
- However, as stated in the article, data access for academic or research purposes may be granted upon reasonable request to the corresponding author.
The included sample file allows users to explore the structure, preprocessing, and modeling workflow described in the paper.
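As a starting point, a few lines of pandas are enough to inspect the sample file; this sketch makes no assumptions about the column names.

```python
# Quick look at the structure of the included sample match.
import pandas as pd

match = pd.read_csv("data/Match_01.csv")
print(match.shape)    # number of rows and columns
print(match.dtypes)   # column names and types
print(match.head())   # first few records
```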
## 📈 Results and Evaluation

This repository does not include the full results of the experiments, as they are provided in the published article and in a supplementary dataset available via Figshare. In brief:

- *Dangerousity* was the most effective PI for predicting goals and shots.
- *Entries into the Attacking Third* performed best for corner kicks, third entries, and box entries.
- PIs reflecting frequent in-game actions (e.g., final-third possession, Dangerousity, opponents outplayed) outperformed those based on rare events (e.g., goals, corners).
- Combining certain PIs (e.g., Opponents Outplayed and Tacklings Won) increased predictive accuracy, especially for goals.
- A match momentum metric based on real-time prediction differences showed potential for tactical support and live betting applications.

All detailed results are available in the supplementary material on Figshare: https://figshare.com/projects/Lang_et_al_2025_Which_indicators_matter_Using_performance_indicators_to_predict_in-game_success-related_events_in_association_football/223491. Please refer to the article for a full methodological description and in-depth analysis.
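To make the momentum idea concrete, the sketch below shows one way such a signal *could* be derived from per-team event probabilities. This is only our illustrative reading of "real-time prediction differences", not the metric defined in the paper; the window length and smoothing choice are arbitrary.

```python
# Illustrative only -- NOT the paper's metric. Smooths the gap between
# the two teams' predicted SRE probabilities over a rolling window.
import numpy as np

def momentum(p_team_a: np.ndarray, p_team_b: np.ndarray, window: int = 5) -> np.ndarray:
    """Rolling mean of the difference in predicted event probabilities."""
    diff = p_team_a - p_team_b  # > 0: the signal favours team A
    kernel = np.ones(window) / window
    return np.convolve(diff, kernel, mode="same")

# Dummy probabilities for ten prediction steps
p_a = np.array([0.20, 0.30, 0.40, 0.50, 0.60, 0.50, 0.40, 0.30, 0.30, 0.20])
p_b = np.array([0.30, 0.30, 0.20, 0.20, 0.10, 0.20, 0.30, 0.40, 0.50, 0.50])
print(momentum(p_a, p_b).round(3))
```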
## 📚 Citation

Lang, S., Wimmer, T., Erben, A., & Link, D. (in review). Which indicators matter? Using performance indicators to predict in-game success-related events in association football. *International Journal of Computer Science in Sport (IJCSS)*.
## 🛡️ License

This repository is shared under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license, in accordance with the journal's Open Access policy.

You are free to:

- **Share** — copy and redistribute the material in any medium or format.

As long as you follow these terms:

- **Attribution** — you must give appropriate credit, provide a link to the license, and indicate if changes were made.
- **NonCommercial** — you may not use the material for commercial purposes.
- **NoDerivatives** — if you remix, transform, or build upon the material, you may not distribute the modified material.

License details: https://creativecommons.org/licenses/by-nc-nd/4.0/
## 📬 Contact

For questions regarding the repository, data usage, or the study itself, please contact the corresponding author:
Steffen Lang
TUM School of Medicine and Health Sciences
Technical University of Munich (TUM)
✉️ [email protected]