Binary logo detection system using ResNet18 and FastAPI. Detects Coca-Cola, Disney, Starbucks, and McDonald's logos in images.

scomri/brand-logo-detection

Logo Detection Project

A complete machine learning pipeline for binary logo presence detection. This project detects whether an image contains logos from four target brands: Coca-Cola, Disney, Starbucks, or McDonald's.

Overview

This project implements an end-to-end solution for logo detection, from data preparation to model deployment:

  • Data Preparation: Downloads and preprocesses the OpenLogo dataset from Roboflow
  • Model Training: Trains a ResNet18-based binary classifier using PyTorch
  • Model Evaluation: Reports accuracy, precision, recall, F1 score, and a confusion matrix
  • API Deployment: FastAPI web application with REST API and web UI for inference

The model returns 1 if any of the target brand logos are detected, and 0 otherwise.

Project Structure

.
├── app/                    # FastAPI application
│   └── main.py            # FastAPI app with endpoints
├── checkpoints/           # Saved model checkpoints
├── data/                  # Dataset directories
│   ├── dataset_binary/    # Processed binary dataset
│   └── openlogo_raw/      # Raw OpenLogo dataset
├── dataset/               # Dataset module
│   └── logo_dataset.py   # PyTorch Dataset and DataModule
├── model/                 # Model module
│   └── resnet18_classifier.py  # ResNet18 binary classifier
├── training/              # Training and evaluation
│   ├── trainer.py        # Training loop
│   ├── evaluator.py      # Model evaluation
│   └── metrics.py        # Metric computation
├── config.yaml           # Configuration file
├── data_prep.py          # Data preparation script
├── train.py              # Training/evaluation entrypoint
├── run_api.py            # API server startup script
└── requirements.txt      # Python dependencies

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Step-by-Step Installation

  1. Clone or navigate to the project directory

  2. Create a virtual environment (recommended):

python -m venv venv

  3. Activate the virtual environment:

    • On Windows: venv\Scripts\activate
    • On Linux/Mac: source venv/bin/activate

  4. Install dependencies:

pip install -r requirements.txt

Dependencies

Key dependencies include:

  • torch and torchvision - PyTorch for deep learning
  • fastapi and uvicorn - Web framework and ASGI server
  • roboflow - Dataset download
  • pandas - Data manipulation
  • pyyaml - Configuration file parsing
  • scikit-learn - Additional utilities

See requirements.txt for the complete list.

Quick Start

Complete Workflow

# 1. Prepare the dataset
python data_prep.py

# 2. Train the model
python train.py --mode train --config config.yaml

# 3. Evaluate the model
python train.py --mode eval --config config.yaml --checkpoint checkpoints/best_checkpoint.pt

# 4. Start the API server
python run_api.py

Then open http://localhost:8000 in your browser to use the web UI.

Module Documentation

Data Preparation

File: data_prep.py

Downloads and preprocesses the OpenLogo dataset from Roboflow, filtering for the four target brands and creating a binary dataset structure.

Usage

python data_prep.py

Or programmatically:

from data_prep import OpenLogoPreprocessor, OpenLogoConfig

config = OpenLogoConfig()
preprocessor = OpenLogoPreprocessor(config)

# Download dataset
dataset_root = preprocessor.download_dataset()

# Build binary dataset
preprocessor.build_binary_dataset(dataset_root)

Output Structure

The preprocessing creates the following structure:

data/dataset_binary/
├── manifest.csv          # CSV mapping filepaths to labels
├── pos/                  # Positive samples (logos present)
│   ├── cocacola/
│   ├── disney/
│   ├── starbucks/
│   └── mcdonalds/
└── neg/                  # Negative samples (no target logos)

Configuration

The OpenLogoConfig class allows customization of:

  • Brand buckets: Target brands to detect (default: cocacola, disney, starbucks, mcdonalds)
  • Class mappings: Mapping from OpenLogo class names to canonical brand names
  • Data paths: Raw and binary dataset root directories
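
As a rough illustration of how class mappings and brand buckets could combine into a binary label, here is a minimal sketch. The `CLASS_TO_BRAND` table, the raw class names in it, and both helper functions are hypothetical; the actual fields of OpenLogoConfig may differ.

```python
from typing import Dict, Iterable, Optional

# Hypothetical mapping from raw OpenLogo class names to canonical brand buckets.
# The real class names in the dataset may be spelled differently.
CLASS_TO_BRAND: Dict[str, str] = {
    "cocacola": "cocacola",
    "coca-cola": "cocacola",
    "disney": "disney",
    "starbucks": "starbucks",
    "mcdonalds": "mcdonalds",
}


def canonical_brand(class_name: str) -> Optional[str]:
    """Return the canonical brand for a raw class name, or None if it is not a target."""
    return CLASS_TO_BRAND.get(class_name.strip().lower())


def binary_label(class_names: Iterable[str]) -> int:
    """Return 1 if any annotation in the image maps to a target brand, else 0."""
    return int(any(canonical_brand(name) is not None for name in class_names))
```

Under this sketch, an image annotated with both a non-target class and a Starbucks logo is labeled 1, and an image with no target annotations is labeled 0.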

Dataset Module

File: dataset/logo_dataset.py

PyTorch Dataset and DataModule for loading images and managing train/validation/test splits.

Features

  • Automatic split detection: Uses filename prefixes (train_, valid_, test_) to separate splits
  • Data augmentation: Random horizontal flips for training
  • Image normalization: ImageNet statistics for pretrained models
  • Binary label conversion: Maps brand labels to binary (1 = logo present, 0 = no logo)
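
The prefix-based split detection can be sketched as follows. This is an assumption about how the filename check might work, not the code in logo_dataset.py; `detect_split` and `SPLIT_PREFIXES` are illustrative names.

```python
from pathlib import Path
from typing import Optional

# Prefixes named in the feature list above.
SPLIT_PREFIXES = ("train_", "valid_", "test_")


def detect_split(filepath: str) -> Optional[str]:
    """Return 'train', 'valid', or 'test' based on the filename prefix, else None."""
    name = Path(filepath).name  # ignore directories such as pos/disney/
    for prefix in SPLIT_PREFIXES:
        if name.startswith(prefix):
            return prefix.rstrip("_")
    return None
```

Files without a recognized prefix return None here; how the project treats such files is not stated, so a caller would need to decide whether to skip them or raise an error.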

Usage

from dataset.logo_dataset import LogoDataModule, LogoDatasetConfig
from pathlib import Path

config = LogoDatasetConfig(
    data_root=Path("data/dataset_binary"),
    batch_size=32,
    image_size=224,
    num_workers=0  # Set to 0 on Windows
)

data_module = LogoDataModule(config)

# Get data loaders
train_loader = data_module.get_train_loader()
val_loader = data_module.get_val_loader()
test_loader = data_module.get_test_loader()

# Get dataset statistics
stats = data_module.get_dataset_stats()

Configuration

  • data_root: Path to dataset root directory
  • batch_size: Batch size for DataLoaders
  • num_workers: Number of data loading workers (0 recommended for Windows)
  • image_size: Input image size (default: 224 for ResNet18)
  • seed: Random seed for reproducibility

Model Module

File: model/resnet18_classifier.py

ResNet18-based binary classifier with pretrained ImageNet weights.

Architecture

  • Backbone: Pretrained ResNet18 (ImageNet weights)
  • Feature extractor: All layers except the final fully connected layer
  • Classifier head: Single linear layer (512 → 1) for binary classification
  • Output: Logits for binary classification (sigmoid applied during inference)
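
To make the last point concrete, the sketch below turns a single logit into a probability and a 0/1 prediction using only the standard library. The function names and the choice of `>=` at the threshold are assumptions; the project's inference code may compare differently.

```python
import math
from typing import Tuple


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)  # safe: logit is negative, so no overflow
    return exp_logit / (1.0 + exp_logit)


def predict(logit: float, threshold: float = 0.5) -> Tuple[int, float]:
    """Map a raw model logit to (prediction, probability)."""
    probability = sigmoid(logit)
    return int(probability >= threshold), probability
```

A logit of 0.0 corresponds to a probability of exactly 0.5, so with the default threshold the decision boundary sits at logit zero.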

Features

  • Frozen backbone option: Can freeze ResNet18 parameters for faster training
  • Pretrained weights: Uses ImageNet-pretrained ResNet18 by default
  • Flexible configuration: Configurable via ResNet18BinaryClassifierConfig

Usage

from model.resnet18_classifier import (
    ResNet18BinaryClassifier,
    ResNet18BinaryClassifierConfig
)

config = ResNet18BinaryClassifierConfig(
    freeze_backbone=True,  # Freeze ResNet18 parameters
    pretrained=True,       # Use ImageNet weights
    num_classes=1           # Binary classification
)

model = ResNet18BinaryClassifier(config)

Training Module

File: training/trainer.py

Training loop with metrics tracking, checkpoint management, and validation.

Features

  • Metrics tracking: Accuracy, precision, recall, F1 score for train and validation
  • Checkpoint management: Saves the best model based on validation loss
  • Training history: Tracks all metrics across epochs
  • Optimizer support: Adam or SGD optimizers
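
The "best model based on validation loss" rule can be sketched as a small tracker. This is a hypothetical helper for illustration; the Trainer's internal bookkeeping is not shown in this README and may be organized differently.

```python
from typing import Optional


class BestCheckpointTracker:
    """Tracks the lowest validation loss seen so far (illustrative, not the project's class)."""

    def __init__(self) -> None:
        self.best_loss: float = float("inf")
        self.best_epoch: Optional[int] = None

    def update(self, epoch: int, val_loss: float) -> bool:
        """Return True when this epoch improves on the best loss and should be saved."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            return True
        return False


tracker = BestCheckpointTracker()
val_losses = [0.9, 0.7, 0.8, 0.6]
should_save = [tracker.update(epoch, loss) for epoch, loss in enumerate(val_losses)]
```

With these example losses, `should_save` is `[True, True, False, True]`: the third epoch is skipped because its loss is worse than the best so far.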

Usage

from training.trainer import Trainer, TrainerConfig
from pathlib import Path

trainer_config = TrainerConfig(
    num_epochs=10,
    learning_rate=0.001,
    optimizer="adam",
    checkpoint_dir=Path("checkpoints"),
    device="cuda",  # or "cpu"
    save_best_only=True,
    verbose=True
)

trainer = Trainer(model, data_module, trainer_config)
history = trainer.train()

Metrics

The trainer computes and tracks:

  • Loss: BCE with logits loss
  • Accuracy: Overall classification accuracy
  • Precision: TP / (TP + FP)
  • Recall: TP / (TP + FN)
  • F1 Score: 2 × (precision × recall) / (precision + recall)
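
The formulas above can be checked with a short function that works from confusion-matrix counts. This is a sketch, not the contents of metrics.py; returning 0.0 when a denominator is zero is an assumption about how empty cases are handled.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


example = binary_metrics(tp=8, fp=2, tn=9, fn=1)
```

For the example counts, accuracy is 17/20 = 0.85, precision is 8/10 = 0.8, recall is 8/9, and F1 is 16/19.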

Evaluation Module

File: training/evaluator.py

Standalone evaluator for model evaluation on test sets.

Features

  • Comprehensive metrics: Accuracy, precision, recall, F1 score
  • Confusion matrix: Detailed confusion matrix with formatted output
  • Checkpoint loading: Can load and evaluate from saved checkpoints
  • Configurable threshold: Adjustable binary classification threshold
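
To show how the threshold affects the confusion matrix, here is a minimal sketch that counts outcomes from labels and predicted probabilities. `confusion_counts` is an illustrative name, and treating a probability equal to the threshold as positive is an assumption.

```python
from typing import Dict, Sequence


def confusion_counts(
    labels: Sequence[int], probabilities: Sequence[float], threshold: float = 0.5
) -> Dict[str, int]:
    """Count TP, FP, TN, FN after thresholding the probabilities."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, probability in zip(labels, probabilities):
        predicted = int(probability >= threshold)
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


labels = [1, 1, 0, 0, 1]
probabilities = [0.9, 0.4, 0.6, 0.1, 0.5]
at_default = confusion_counts(labels, probabilities)        # threshold 0.5
at_strict = confusion_counts(labels, probabilities, 0.7)    # fewer positives
```

Raising the threshold from 0.5 to 0.7 in this example removes the false positive but turns one true positive into a false negative, which is the usual precision/recall trade-off.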

Usage

from training.evaluator import Evaluator, EvaluatorConfig
from pathlib import Path

evaluator_config = EvaluatorConfig(
    device="cuda",
    threshold=0.5,
    verbose=True
)

evaluator = Evaluator(model, evaluator_config)

# Evaluate from checkpoint
results = evaluator.evaluate_from_checkpoint(
    checkpoint_path=Path("checkpoints/best_checkpoint.pt"),
    model=model,
    data_loader=test_loader
)

print(f"Accuracy: {results['accuracy']:.4f}")
print(f"F1 Score: {results['f1']:.4f}")
print(results['confusion_matrix_str'])

Training Entrypoint

File: train.py

Command-line interface for training and evaluation with YAML configuration.

Usage

Training:

python train.py --mode train --config config.yaml

Evaluation:

python train.py --mode eval --config config.yaml --checkpoint checkpoints/best_checkpoint.pt

Arguments

  • --mode: Either train or eval
  • --config: Path to YAML configuration file (default: config.yaml)
  • --checkpoint: Path to checkpoint file (used in eval mode; defaults to checkpoints/best_checkpoint.pt when omitted)
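
A command-line parser matching these arguments might look like the sketch below. It is an approximation under stated assumptions: whether `--mode` is required, and the exact help strings, are not confirmed by this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three documented arguments (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Train or evaluate the logo classifier")
    # Assumption: --mode is mandatory and limited to the two documented values.
    parser.add_argument("--mode", choices=["train", "eval"], required=True)
    parser.add_argument("--config", default="config.yaml",
                        help="Path to YAML configuration file")
    parser.add_argument("--checkpoint", default="checkpoints/best_checkpoint.pt",
                        help="Checkpoint to load in eval mode")
    return parser


args = build_parser().parse_args(["--mode", "eval"])
```

Parsing `--mode eval` alone falls back to the documented defaults for the other two arguments.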

Features

  • Reproducibility: Sets random seeds and deterministic algorithms
  • Configuration-driven: All settings via YAML file
  • Error handling: Comprehensive error messages and tracebacks

API Module

Files: app/main.py, run_api.py

FastAPI web application providing REST API and web UI for model inference.

Endpoints

  1. GET /: Web UI for image upload and logo detection
  2. GET /health: Health check endpoint
  3. POST /predict: Logo detection API endpoint

Starting the Server

python run_api.py

Or directly with uvicorn:

uvicorn app.main:app --reload

The server will start on http://localhost:8000 by default.

Environment Variables

  • CHECKPOINT_PATH: Path to model checkpoint (default: checkpoints/best_checkpoint.pt)
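
Reading this variable with its documented default could be sketched as follows. The helper name `resolve_checkpoint_path` is hypothetical; app/main.py may read the variable in a different way.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CHECKPOINT = "checkpoints/best_checkpoint.pt"


def resolve_checkpoint_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the checkpoint path from CHECKPOINT_PATH, or the documented default."""
    env = os.environ if env is None else env  # injectable for testing
    return Path(env.get("CHECKPOINT_PATH", DEFAULT_CHECKPOINT))
```

On Linux or macOS the variable can be set for a single run with `CHECKPOINT_PATH=path/to/model.pt python run_api.py`, where the path shown is a placeholder.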

Web UI Features

  • Drag-and-drop image upload
  • Real-time prediction display
  • Confidence scores and probabilities
  • Modern, responsive design

Configuration

File: config.yaml

The project uses a YAML configuration file for all settings.

Configuration Sections

Model Configuration

model:
  freeze_backbone: true   # Freeze ResNet18 backbone
  pretrained: true        # Use ImageNet pretrained weights
  num_classes: 1          # Binary classification

Dataset Configuration

dataset:
  data_root: "data/dataset_binary"
  batch_size: 32
  num_workers: 0          # Set to 0 on Windows
  image_size: 224
  seed: 42

Training Configuration

training:
  num_epochs: 10
  learning_rate: 0.001
  optimizer: "adam"       # "adam" or "sgd"
  checkpoint_dir: "checkpoints"
  device: null            # null = auto-detect CUDA
  save_best_only: true
  verbose: true

Evaluation Configuration

evaluation:
  threshold: 0.5          # Binary classification threshold
  device: null
  verbose: true

Reproducibility Configuration

reproducibility:
  seed: 42
  deterministic: true
  cudnn_deterministic: true
  cudnn_benchmark: false
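
One way these settings could be applied is sketched below. The function name and the mapping from config keys to PyTorch calls are assumptions rather than the code in train.py; NumPy and PyTorch are imported lazily so the sketch also runs where they are not installed.

```python
import random


def set_seed(seed: int = 42, deterministic: bool = True,
             cudnn_deterministic: bool = True, cudnn_benchmark: bool = False) -> None:
    """Seed the available random number generators and apply determinism flags."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
    except ImportError:
        return  # nothing more to configure without PyTorch
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
    torch.backends.cudnn.deterministic = cudnn_deterministic
    torch.backends.cudnn.benchmark = cudnn_benchmark
    # Assumption: `deterministic: true` maps to PyTorch's global determinism switch.
    if deterministic and hasattr(torch, "use_deterministic_algorithms"):
        torch.use_deterministic_algorithms(True)
```

Calling `set_seed(42)` twice and drawing from `random` after each call yields the same value, which is a quick sanity check that seeding took effect.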

Examples

Complete Training Example

from pathlib import Path
import yaml
from dataset.logo_dataset import LogoDataModule, LogoDatasetConfig
from model.resnet18_classifier import (
    ResNet18BinaryClassifier,
    ResNet18BinaryClassifierConfig
)
from training.trainer import Trainer, TrainerConfig

# Load configuration
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)

# Create model
model_config = ResNet18BinaryClassifierConfig(**config["model"])
model = ResNet18BinaryClassifier(model_config)

# Create data module
dataset_config = LogoDatasetConfig(**config["dataset"])
data_module = LogoDataModule(dataset_config)

# Create trainer
trainer_config = TrainerConfig(**config["training"])
trainer = Trainer(model, data_module, trainer_config)

# Train
history = trainer.train()

Evaluation Example

from pathlib import Path
from training.evaluator import Evaluator, EvaluatorConfig

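# Assumes `model` and `data_module` from the training example above
test_loader = data_module.get_test_loader()

# Create evaluator
evaluator_config = EvaluatorConfig(threshold=0.5, verbose=True)
evaluator = Evaluator(model, evaluator_config)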

# Evaluate from checkpoint
results = evaluator.evaluate_from_checkpoint(
    checkpoint_path=Path("checkpoints/best_checkpoint.pt"),
    model=model,
    data_loader=test_loader
)

API Usage Example

Using curl:

curl -X POST "http://localhost:8000/predict" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@path/to/image.jpg"

Response:

{
  "prediction": 1,
  "probability": 0.9234
}

Using Python requests:

import requests

url = "http://localhost:8000/predict"
with open("image.jpg", "rb") as f:
    files = {"file": f}
    response = requests.post(url, files=files)
    result = response.json()
    print(f"Prediction: {result['prediction']}")
    print(f"Probability: {result['probability']}")

API Documentation

GET /

Web UI for image upload and logo detection.

Response: HTML page with interactive UI

GET /health

Health check endpoint.

Response:

{
  "status": "healthy"
}

POST /predict

Predicts whether the uploaded image contains any of the target brand logos.

Request:

  • Method: POST
  • Content-Type: multipart/form-data
  • Body: Form data with a file field containing the image file

Supported formats: JPEG, PNG

Response:

{
  "prediction": 1,
  "probability": 0.9234
}
  • prediction: Binary prediction (0 or 1)
  • probability: Probability score (0.0 to 1.0)

Error Responses:

  • 400 Bad Request: Invalid image format

    {
      "detail": "Invalid image format: image/gif. Supported formats: JPEG, PNG"
    }
  • 500 Internal Server Error: Model loading or inference error

    {
      "detail": "Checkpoint file not found: checkpoints/best_checkpoint.pt"
    }
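
The 400 response above suggests a content-type check before inference. A minimal sketch of such a check follows; `SUPPORTED_TYPES` and `validate_content_type` are hypothetical names, and the real endpoint may validate uploads differently (for example by decoding the image).

```python
from typing import Dict, Optional

# MIME types assumed to correspond to the documented formats.
SUPPORTED_TYPES: Dict[str, str] = {"image/jpeg": "JPEG", "image/png": "PNG"}


def validate_content_type(content_type: str) -> Optional[str]:
    """Return None for a supported upload, or the error detail for a 400 response."""
    if content_type in SUPPORTED_TYPES:
        return None
    supported = ", ".join(SUPPORTED_TYPES.values())
    return f"Invalid image format: {content_type}. Supported formats: {supported}"
```

In a FastAPI handler, a non-None result would typically be raised as `HTTPException(status_code=400, detail=...)`.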

Troubleshooting

Common Issues

1. DataLoader Workers Error on Windows

Problem: RuntimeError: An attempt has been made to start a new process...

Solution: Set num_workers: 0 in config.yaml:

dataset:
  num_workers: 0

2. Checkpoint Not Found

Problem: FileNotFoundError: Checkpoint not found

Solution: Ensure you've trained the model first:

python train.py --mode train --config config.yaml

3. CUDA Out of Memory

Problem: RuntimeError: CUDA out of memory

Solution: Reduce batch size in config.yaml:

dataset:
  batch_size: 16  # Reduce from 32

4. Roboflow Download Fails

Problem: Error downloading dataset from Roboflow

Solution:

  • Ensure you have an internet connection
  • Check Roboflow API credentials if required
  • Verify the dataset is publicly accessible

5. Import Errors

Problem: ModuleNotFoundError when running scripts

Solution: Ensure you're in the project root directory and dependencies are installed:

pip install -r requirements.txt

6. API Server Won't Start

Problem: Port already in use or other uvicorn errors

Solution:

  • Check if another instance is running
  • Change the port: uvicorn app.main:app --port 8001
  • Check firewall settings
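
To check whether a port is already taken before starting the server, a small standard-library probe can help. This is a generic sketch, not part of the project; `port_is_free` is an illustrative name.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True
```

For example, `port_is_free(8000)` returning False suggests another process is bound to the default port, in which case starting uvicorn with `--port 8001` is one way around it.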

Windows-Specific Considerations

  • num_workers: Always set to 0 in DataLoader configurations
  • Path separators: The code handles Windows paths automatically
  • CUDA: Ensure the CUDA toolkit is installed if using GPU

Project Status

Current Features

  • ✅ Complete data preparation pipeline
  • ✅ ResNet18 binary classifier with pretrained weights
  • ✅ Training loop with comprehensive metrics
  • ✅ Model evaluation with confusion matrices
  • ✅ FastAPI web application with UI
  • ✅ REST API for inference
  • ✅ Configuration-driven training
  • ✅ Reproducibility support

Known Limitations

  • Binary classification only (logo present/absent)
  • Supports 4 specific brands (Coca-Cola, Disney, Starbucks, McDonald's)
  • Requires Roboflow dataset download
  • Model checkpoint must exist before the API can run

Future Enhancements

  • Multi-class classification (identify specific brand)
  • Support for additional brands
  • Model fine-tuning capabilities
  • Docker containerization
  • Cloud deployment guides

License

This project is part of a technical evaluation task.

Contact

For questions or issues, please refer to the project documentation or create an issue in the repository.
