This project demonstrates a robust end-to-end pipeline for detecting and tracking road signs, even in extremely low-FPS, low-quality video footage.
Unlike conventional methods that rely solely on per-frame detection, this solution combines a YOLOv11-based detector with a finely tuned Norfair tracking algorithm to achieve temporal consistency, ID stability, and smooth tracking — even when objects briefly disappear or move abruptly between frames.
As shown in the GIF above, the tracking ID stays consistent even though the video sample plays at only 2 FPS.
- Overview
- Dataset & Preprocessing
- Model Training
- Detection on Unseen Data
- Tracking Algorithm
- Results
- How to Run
- Future Work
- Key takeaway
The project focuses on:
- Developing a custom tracking pipeline that works even on 1–2 FPS video footage.
- Enhancing detection stability through relabeling and data cleaning.
- Demonstrating how to pair deep learning with lightweight geometric tracking logic for resource-constrained environments.
- Total images: 6000
- Usable after cleaning: 1474
- Classes:
  - advisory speed mph
  - directional arrows
  - do not enter
  - stop
  - wrong way
- Many labels were incomplete or missing; re-annotated using labelImg.
- Added two new classes, stop and wrong way, for completeness.
- Ensured that each image had consistent and accurate annotations.
- Applied rotation, brightness, and scale augmentations to simulate real-world variations.
- Final dataset of 1474 images / 5 classes used for training and evaluation.
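The rotation, brightness, and scale augmentations mentioned above were most likely applied with an augmentation library (or YOLO's built-in pipeline); as a minimal illustration of the idea, here is a NumPy-only sketch (function names and factors are illustrative, not from the repo):

```python
import numpy as np

def augment_brightness(img, factor):
    """Scale pixel intensities, clipping to the valid 0-255 range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def augment_rotate90(img, k=1):
    """Rotate the image by k * 90 degrees (arbitrary angles would need cv2/scipy)."""
    return np.rot90(img, k)

def augment_scale(img, factor):
    """Nearest-neighbour rescale by repeating/skipping rows and columns."""
    h, w = img.shape[:2]
    rows = (np.arange(int(h * factor)) / factor).astype(int)
    cols = (np.arange(int(w * factor)) / factor).astype(int)
    return img[rows][:, cols]

# Example on a dummy 4x4 grayscale "image"
img = np.full((4, 4), 100, dtype=np.uint8)
bright = augment_brightness(img, 1.5)   # all pixels become 150
small = augment_scale(img, 0.5)         # 2x2 output
```

Note that in a real detection dataset the bounding-box labels must be transformed consistently with the image, which is why a library such as Albumentations is usually preferred over hand-rolled transforms.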
Two YOLOv11 models were trained and compared for performance:
| Model | mAP@0.5 | Training Speed | Notes |
|---|---|---|---|
| YOLOv11n (Nano) | 0.57 | ⚡ Fast | Lightweight but limited accuracy on new classes |
| YOLOv11m (Medium) | 0.71 | 🧠 Moderate | Significantly better generalization & stability |
The YOLOv11m model achieved a 14-point higher mAP (0.71 vs 0.57) and was selected for inference and tracking tasks.
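Training runs like these typically read an Ultralytics-style dataset YAML; a hypothetical config for the five classes above might look like this (paths and filename are placeholders, not taken from the repo):

```yaml
# Hypothetical data.yaml — paths are illustrative
path: dataset
train: images/train
val: images/val
names:
  0: advisory speed mph
  1: directional arrows
  2: do not enter
  3: stop
  4: wrong way
```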
- Low frame rate: ~1–2 FPS
- Perspective: From a moving vehicle
- Challenges: Motion blur, sudden scale changes, abrupt frame jumps
- Despite the low FPS, the YOLOv11m model effectively detected road signs, including partially occluded or distant ones.
- Occasional missed detections occurred due to frame skips, but overall detection was stable and accurate.
The core highlight of this project lies in the custom tracking setup built using the Norfair library — chosen over default YOLO trackers for its flexibility and resilience to missing detections.
- Designed for non-continuous frame streams (like low-FPS CCTV footage).
- Allows custom distance functions and persistence tuning for object identity stability.
- Distance function: `mean_manhattan`
- Key parameters:

```
distance_threshold = tuned_value
hit_counter_max = higher_value_for_stability
```

- A higher `hit_counter_max` ensured that object IDs persisted even when detections were temporarily missed.
- The distance threshold was carefully adjusted to prevent ID switches on abrupt motion.
```
for each frame in video:
    detections = yolov11m.predict(frame)
    tracker.update(detections)
    for each active track:
        draw bounding box + track ID
```

- Stable IDs maintained for most objects throughout the video.
- Successfully handled objects disappearing and reappearing.
- Minor ID resets occurred only on severe frame skips, expected at ~1 FPS input.
- Overall, the YOLOv11m + Norfair combination proved robust and reliable under challenging conditions.
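To make the two tuned parameters concrete, here is a simplified pure-Python stand-in for the Norfair-style logic: greedy nearest-centroid matching under a Manhattan `distance_threshold`, with a `hit_counter_max` lifetime that keeps an ID alive through missed detections. (Norfair itself does considerably more; this only sketches the mechanism, and all values are illustrative.)

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

class ToyTracker:
    def __init__(self, distance_threshold=50, hit_counter_max=5):
        self.distance_threshold = distance_threshold
        self.hit_counter_max = hit_counter_max
        self.tracks = {}   # track_id -> {"pos": (x, y), "hits": remaining lifetime}
        self.next_id = 0

    def update(self, detections):
        unmatched = list(range(len(detections)))
        for tid, track in self.tracks.items():
            # Find the closest still-unmatched detection within the threshold.
            best_i, best_d = None, self.distance_threshold
            for i in unmatched:
                d = manhattan(track["pos"], detections[i])
                if d <= best_d:
                    best_i, best_d = i, d
            if best_i is not None:
                track["pos"] = detections[best_i]
                track["hits"] = self.hit_counter_max   # refresh lifetime on a match
                unmatched.remove(best_i)
            else:
                track["hits"] -= 1                     # missed this frame
        # Drop expired tracks, start new tracks for unmatched detections.
        self.tracks = {t: tr for t, tr in self.tracks.items() if tr["hits"] > 0}
        for i in unmatched:
            self.tracks[self.next_id] = {"pos": detections[i],
                                         "hits": self.hit_counter_max}
            self.next_id += 1
        return {tid: tr["pos"] for tid, tr in self.tracks.items()}

tracker = ToyTracker(distance_threshold=50, hit_counter_max=3)
tracker.update([(100, 100)])          # frame 1: new track 0 created
tracker.update([])                    # frame 2: detection missed, ID kept alive
ids = tracker.update([(110, 105)])    # frame 3: re-associated with track 0
```

This is exactly why a higher `hit_counter_max` matters at 1–2 FPS: in frame 2 the detection vanishes, yet track 0 survives and is re-identified in frame 3 instead of spawning a new ID.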
| Metric | Result |
|---|---|
| Detection mAP@0.5 | 0.71 (YOLOv11m) |
| Tracking Stability | High |
| FPS Tested | 1.0 – 2.0 |
| ID Switch Rate | Low (<10%) |
Qualitative Outcome:
- Smooth, stable tracking visualization with consistent track IDs.
- Effective in motion-blur and lighting variation scenarios.
Example Console Output:

```
[Frame 12] STOP_SIGN_3 → position updated
[Frame 13] STOP_SIGN_3 → re-identified after 2 skipped frames
[Frame 14] DIRECTIONAL_ARROW_1 → continuous tracking
```
1. Clone the repository

   ```
   git clone https://github.com/Rohan-Thoma/Object-tracking-in-worst-video-environments.git
   ```

2. Install dependencies

   ```
   pip install -r requirements.txt
   ```

3. Run the notebook

   ```
   jupyter notebook Sign_board.ipynb
   ```

4. (Optional) Replace the demo video with your own under `/input/`.
- Integrate Kalman Filters for motion prediction and smoother trajectories.
- Implement confidence-weighted IoU matching to refine association logic.
- Extend the system to multi-camera sign tracking for traffic analysis.
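As a sketch of the first future-work item, a constant-velocity Kalman filter could predict where a sign should appear in the next (possibly skipped) frame. The class below is a minimal one-coordinate illustration under assumed noise values, not part of the current pipeline:

```python
import numpy as np

class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate (illustrative values)."""
    def __init__(self, x0, dt=1.0, q=1.0, r=10.0):
        self.x = np.array([x0, 0.0])                 # state: [position, velocity]
        self.P = np.eye(2) * 100.0                   # state covariance (uncertain start)
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
        self.H = np.array([[1.0, 0.0]])              # we only observe position
        self.Q = np.eye(2) * q                       # process noise
        self.R = np.array([[r]])                     # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]

    def update(self, z):
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

kf = Kalman1D(x0=0.0)
for z in [10.0, 20.0, 30.0]:    # a sign drifting +10 px per frame
    kf.predict()
    kf.update(np.array([z]))
pred = kf.predict()             # predicted next position, close to 40
```

Feeding such predicted positions into the tracker's distance function would let abrupt inter-frame motion at 1–2 FPS stay under the distance threshold, reducing ID switches.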
Even with poor video quality and a low frame rate, an intelligent combination of detection and geometric tracking can yield stable, production-ready results, proving that smart engineering often outperforms brute-force deep learning.
