Real-time Data Stream Imputation for Enhancing Fault Tolerance on the Edge
Focusing on Environmental Data
This repository contains the complete source code, datasets, and evaluation results for our research on real-time data stream imputation in edge-based environmental monitoring systems. The work supports the paper titled "Real-time Data Stream Imputation for Enhancing Fault Tolerance on the Edge Focusing on Environmental Data", which explores lightweight forecasting models for maintaining data continuity during sensor outages.
Inside the repository, you’ll find:
- The Python implementation
- The simulation framework
- Real-world environmental datasets from a weather station in Athens, Greece
- All scripts and configuration files used for experiments, plots, and result analysis
- Paper data (spreadsheets and figures)
Ensuring continuous, high-quality data in environmental monitoring systems is essential for applications such as climate modeling, urban planning, and disaster response. However, real-time data streams from edge-based IoT sensors are frequently affected by transmission errors, sensor faults, and network disruptions, leading to missing or incomplete observations. This paper investigates the application of lightweight, real-time imputation methods to enhance fault tolerance in edge computing environments. An imputation engine is developed and evaluated using five forecasting models—Naive, Seasonal Naive, Simple Exponential Smoothing, Holt’s Linear Trend, and Holt-Winters Exponential Smoothing—selected for their computational efficiency and suitability for edge deployment. To assess performance, a simulation framework is introduced that replicates sensor failure scenarios and allows controlled testing on real-world environmental data collected from a weather station in Athens, Greece. Imputation accuracy is evaluated using Mean Absolute Error (MAE), 95th percentile error, and maximum error, with results benchmarked against sensor tolerance thresholds. Findings show that Holt-Winters consistently provides the highest accuracy across diverse environmental variables and forecast horizons, while simpler models offer limited utility in short-term recovery contexts. The study demonstrates the feasibility of real-time imputation on low-power edge devices and provides actionable insights for deploying fault-tolerant environmental monitoring systems in resource-constrained settings.
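As a rough illustration of the three simplest models named above and of the three error metrics, here is a minimal pure-Python sketch. It is not the repository's implementation: the function names, the `alpha` default, and the linear-interpolation percentile are assumptions chosen for clarity.

```python
import math


def naive_forecast(history, horizon):
    """Naive: repeat the last observed value for every missing step."""
    return [float(history[-1])] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Seasonal Naive: repeat the last full season, cycling for long gaps."""
    last_season = [float(v) for v in history[-season_length:]]
    return [last_season[h % season_length] for h in range(horizon)]


def ses_forecast(history, horizon, alpha=0.5):
    """Simple Exponential Smoothing: flat forecast at the final level.

    alpha=0.5 is an arbitrary illustrative value, not a tuned parameter.
    """
    level = float(history[0])
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def percentile(values, p):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def error_summary(actual, imputed):
    """MAE, 95th percentile error, and maximum error of the absolute errors."""
    abs_err = [abs(a - b) for a, b in zip(actual, imputed)]
    return {
        "mae": sum(abs_err) / len(abs_err),
        "p95": percentile(abs_err, 95),
        "max": max(abs_err),
    }
```

To try it, pass the readings before an outage as `history`, forecast as many steps as the outage lasted, and compare against the values later recovered with `error_summary(actual, imputed)`. Each of the three metrics can then be checked against the sensor's tolerance threshold.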
DOI
https://doi.org/10.1016/j.simpat.2025.103178
BibTeX
@article{gkoulis2025exploring,
title={Exploring the performance of real-time data imputation to enhance fault tolerance on the edge: A study on environmental data},
author={Gkoulis, Dimitris and Tsadimas, Anargyros and Kousiouris, George and Bardaki, Cleopatra and Nikolaidou, Mara},
journal={Simulation Modelling Practice and Theory},
pages={103178},
year={2025},
publisher={Elsevier},
doi={10.1016/j.simpat.2025.103178}
}
To install and use the simulator, first create a virtual environment (Python 3.11 required):
python -m venv venv
Activate the virtual environment:
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
To run pre-processing:
python -m rtds_imputation_sim.main pre_process
To run exploratory analysis for a single feature:
python -m rtds_imputation_sim.main explore_single Temperature
To run exploratory analysis for all features:
python -m rtds_imputation_sim.main explore_all
To run the simulation:
python -m rtds_imputation_sim.main run_simulation
To visualize the simulation results:
python -m rtds_imputation_sim.main visualize_simulation
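If you prefer to run the stages back to back, a small driver script along these lines may help. It is a sketch, not part of the repository: the helper names are hypothetical, and the step order (pre-processing, exploration, simulation, visualization) is an assumption based on the order of the commands above. It expects to be run from the repository root with the virtual environment active.

```python
import subprocess
import sys

MODULE = "rtds_imputation_sim.main"

# Assumed execution order; adjust if a stage is not needed.
STEPS = ["pre_process", "explore_all", "run_simulation", "visualize_simulation"]


def build_commands(python=sys.executable, steps=STEPS):
    """Return one argv list per stage, e.g. [python, '-m', MODULE, 'pre_process']."""
    return [[python, "-m", MODULE, step] for step in steps]


def run_pipeline(commands=None):
    """Run each stage in order and stop at the first failure."""
    for cmd in commands or build_commands():
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` executes all four stages; passing a shorter list from `build_commands(steps=[...])` runs only the selected ones.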
