This project is a clean, commented PyTorch implementation of RetinaNet, introduced in the paper Focal Loss for Dense Object Detection.
While many RetinaNet implementations exist, this one is built from the ground up with a specific goal: to be a clear educational resource. It follows the original paper as closely as possible, stripping away production-level optimizations and boilerplate. This allows you to focus on the core concepts of RetinaNet without getting lost in the weeds.
It's perfect for anyone looking to understand how RetinaNet really works under the hood.
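The paper's central idea, the focal loss, fits in a few lines. As a taste of what the code walks you through, here is a minimal scalar sketch of the α-balanced focal loss in plain Python (the actual implementation operates on tensors of anchor predictions; this only illustrates the formula):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Alpha-balanced focal loss for one anchor's binary prediction.

    p is the predicted foreground probability, y the label (1 or 0).
    With gamma=0 this reduces to alpha-weighted cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The `(1 - p_t) ** gamma` factor is what down-weights easy examples: a well-classified anchor (say `p_t = 0.9`) contributes orders of magnitude less loss than a hard one, which is how the paper tames the extreme foreground/background imbalance of dense detection.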
- Paper-Focused: The implementation sticks closely to the concepts described in the original paper.
- Deeply Commented: Code blocks are linked back to the specific sections of the paper they implement. This makes it easy to cross-reference and understand the "why" behind the code.
- Multiple Backbones: Supports ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152 backbones right out of the box.
- Simplicity by Design: Intentionally omits features like custom `BatchNorm` layers in the heads to keep the focus on the fundamental architecture.
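For a flavour of how closely the code can track the paper: RetinaNet places 9 anchors per spatial location on each pyramid level, using 3 octave scales and 3 aspect ratios (Section 4 of the paper, with base sizes 32 on P3 doubling up to 512 on P7). A small framework-free sketch of that anchor layout, independent of this repo's actual helper names:

```python
def level_anchors(base_size):
    """Anchor (width, height) pairs for one FPN level.

    3 octave scales (2^0, 2^(1/3), 2^(2/3)) x 3 aspect ratios
    (here expressed as width/height: 1:2, 1:1, 2:1) = 9 anchors.
    """
    anchors = []
    for octave in (0, 1 / 3, 2 / 3):
        scale = base_size * 2 ** octave
        for ratio in (0.5, 1.0, 2.0):
            # Vary the aspect ratio while keeping the area ~ scale^2.
            w = scale * ratio ** 0.5
            h = scale / ratio ** 0.5
            anchors.append((w, h))
    return anchors
```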
Let's get this model trained! We'll use the fun Raccoon dataset to prove that our implementation, despite its simplicity, can learn to detect objects.
- Clone the repository:

  ```bash
  git clone https://github.com/Armaggheddon/retinanet_demystified.git
  cd retinanet_pytorch
  ```

- Download the dataset:

  ```bash
  git clone https://github.com/datitran/raccoon_dataset
  ```

- Set up your environment (Recommended):

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
  pip install -r requirements.txt
  ```

- Start Training: Feel free to peek into `train.py` and tweak the `HYPERPARAMETERS`!

  ```bash
  python train.py
  ```

  After training, your model weights will be saved as `retinanet_raccoon_rnXX.pth`.

- Run Inference: Modify the `IMAGE_PATH` in `load_trained.py` to point to a test image.

  ```bash
  python load_trained.py
  ```

  Check out the `output.png` file to see your model in action!
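If you are curious what inference involves beyond a forward pass: a dense detector emits thousands of overlapping candidate boxes, which are pruned with non-maximum suppression before drawing. Here is a minimal, framework-free sketch of the idea (this repo or torchvision may use a faster batched variant; this is purely illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```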
Absolutely! The goal here isn't to set new state-of-the-art records, but to demonstrate that the core architecture works and learns. The plots below show a ResNet18 backbone trained for 20 epochs on the small Raccoon dataset.
You'll notice clear signs of overfitting, which is expected given the dataset's size. But more importantly, you'll see the loss decreasing and the model successfully identifying objects. It's alive!
Average training and evaluation loss per epoch. The model is learning!
Training total loss, classification loss, and box regression loss.
Evaluation total loss, classification loss, and box regression loss.
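About the box regression curve: as in the paper, the regression branch is trained with the smooth L1 loss from Fast R-CNN (referenced below). Per coordinate offset it is just:

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 loss for one box-offset residual x.

    Quadratic near zero (gentler than L1 on small errors), linear for
    |x| >= beta (gentler than L2 on outliers); beta=1.0 matches Fast R-CNN.
    """
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta
```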
The papers referenced throughout the code are:
- Focal Loss for Dense Object Detection (RetinaNet) - Lin et al., 2017
- Fast R-CNN - Girshick, 2015
- Feature Pyramid Networks for Object Detection - Lin et al., 2017
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks - Ren et al., 2015
- Deep Residual Learning for Image Recognition - He et al., 2015
Tip
When no paper reference is given, the comment refers to the main RetinaNet paper.
Feel free to open an issue or submit a pull request. Contributions are welcome! Just remember, the goal is to keep things simple and educational.
