A complete machine learning pipeline for binary logo presence detection. This project detects whether an image contains logos from four target brands: Coca-Cola, Disney, Starbucks, or McDonald's.
This project implements an end-to-end solution for logo detection, from data preparation to model deployment:
- Data Preparation: Downloads and preprocesses the OpenLogo dataset from Roboflow
- Model Training: Trains a ResNet18-based binary classifier using PyTorch
- Model Evaluation: Comprehensive evaluation with metrics and confusion matrices
- API Deployment: FastAPI web application with REST API and web UI for inference
The model returns 1 if any of the target brand logos are detected, and 0 otherwise.
## Contents

- Project Structure
- Installation
- Quick Start
- Module Documentation
- Configuration
- Examples
- API Documentation
- Troubleshooting
## Project Structure

```
.
├── app/                       # FastAPI application
│   └── main.py                # FastAPI app with endpoints
├── checkpoints/               # Saved model checkpoints
├── data/                      # Dataset directories
│   ├── dataset_binary/        # Processed binary dataset
│   └── openlogo_raw/          # Raw OpenLogo dataset
├── dataset/                   # Dataset module
│   └── logo_dataset.py        # PyTorch Dataset and DataModule
├── model/                     # Model module
│   └── resnet18_classifier.py # ResNet18 binary classifier
├── training/                  # Training and evaluation
│   ├── trainer.py             # Training loop
│   ├── evaluator.py           # Model evaluation
│   └── metrics.py             # Metric computation
├── config.yaml                # Configuration file
├── data_prep.py               # Data preparation script
├── train.py                   # Training/evaluation entrypoint
├── run_api.py                 # API server startup script
└── requirements.txt           # Python dependencies
```
## Installation

Prerequisites:

- Python 3.8 or higher
- pip package manager

1. Clone or navigate to the project directory.

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   ```

3. Activate the virtual environment:

   - On Windows:

     ```bash
     venv\Scripts\activate
     ```

   - On Linux/Mac:

     ```bash
     source venv/bin/activate
     ```

4. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

Key dependencies include:

- `torch` and `torchvision` - PyTorch for deep learning
- `fastapi` and `uvicorn` - Web framework and ASGI server
- `roboflow` - Dataset download
- `pandas` - Data manipulation
- `pyyaml` - Configuration file parsing
- `scikit-learn` - Additional utilities

See `requirements.txt` for the complete list.
## Quick Start

```bash
# 1. Prepare the dataset
python data_prep.py

# 2. Train the model
python train.py --mode train --config config.yaml

# 3. Evaluate the model
python train.py --mode eval --config config.yaml --checkpoint checkpoints/best_checkpoint.pt

# 4. Start the API server
python run_api.py
```

Then open http://localhost:8000 in your browser to use the web UI.
## Module Documentation

### Data Preparation (`data_prep.py`)

Downloads and preprocesses the OpenLogo dataset from Roboflow, filtering for the four target brands and creating a binary dataset structure.

Run it directly:

```bash
python data_prep.py
```

Or programmatically:
```python
from data_prep import OpenLogoPreprocessor, OpenLogoConfig

config = OpenLogoConfig()
preprocessor = OpenLogoPreprocessor(config)

# Download dataset
dataset_root = preprocessor.download_dataset()

# Build binary dataset
preprocessor.build_binary_dataset(dataset_root)
```

The preprocessing creates the following structure:
```
data/dataset_binary/
├── manifest.csv        # CSV mapping filepaths to labels
├── pos/                # Positive samples (logos present)
│   ├── cocacola/
│   ├── disney/
│   ├── starbucks/
│   └── mcdonalds/
└── neg/                # Negative samples (no target logos)
```
The `OpenLogoConfig` class allows customization of (see the sketch after this list):
- Brand buckets: Target brands to detect (default: cocacola, disney, starbucks, mcdonalds)
- Class mappings: Mapping from OpenLogo class names to canonical brand names
- Data paths: Raw and binary dataset root directories
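For programmatic customization, a hedged sketch (the field names below are assumptions made for illustration; check the actual dataclass in `data_prep.py`):

```python
from pathlib import Path

from data_prep import OpenLogoConfig, OpenLogoPreprocessor

# NOTE: field names are hypothetical -- verify against OpenLogoConfig
config = OpenLogoConfig(
    brand_buckets=("cocacola", "disney", "starbucks", "mcdonalds"),
    raw_root=Path("data/openlogo_raw"),
    binary_root=Path("data/dataset_binary"),
)
preprocessor = OpenLogoPreprocessor(config)
```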
### Dataset (`dataset/logo_dataset.py`)

PyTorch Dataset and DataModule for loading images and managing train/validation/test splits.
- Automatic split detection: Uses filename prefixes (`train_`, `valid_`, `test_`) to separate splits
- Data augmentation: Random horizontal flips for training
- Image normalization: ImageNet statistics for pretrained models (see the sketch after this list)
- Binary label conversion: Maps brand labels to binary (1 = logo present, 0 = no logo)
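The ImageNet normalization above is standard; a minimal torchvision sketch of an equivalent eval-time transform (the module's actual pipeline may differ):

```python
from torchvision import transforms

# Standard ImageNet mean/std used when feeding a pretrained backbone
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```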
Usage:

```python
from dataset.logo_dataset import LogoDataModule, LogoDatasetConfig
from pathlib import Path

config = LogoDatasetConfig(
    data_root=Path("data/dataset_binary"),
    batch_size=32,
    image_size=224,
    num_workers=0  # Set to 0 on Windows
)

data_module = LogoDataModule(config)

# Get data loaders
train_loader = data_module.get_train_loader()
val_loader = data_module.get_val_loader()
test_loader = data_module.get_test_loader()

# Get dataset statistics
stats = data_module.get_dataset_stats()
```

Configuration options:

- `data_root`: Path to the dataset root directory
- `batch_size`: Batch size for the DataLoaders
- `num_workers`: Number of data loading workers (0 recommended on Windows)
- `image_size`: Input image size (default: 224 for ResNet18)
- `seed`: Random seed for reproducibility
### Model (`model/resnet18_classifier.py`)

ResNet18-based binary classifier with pretrained ImageNet weights.
- Backbone: Pretrained ResNet18 (ImageNet weights)
- Feature extractor: All layers except the final fully connected layer
- Classifier head: Single linear layer (512 → 1) for binary classification (see the sketch after this list)
- Output: Logits for binary classification (sigmoid applied during inference)
- Frozen backbone option: Can freeze ResNet18 parameters for faster training
- Pretrained weights: Uses ImageNet-pretrained ResNet18 by default
- Flexible configuration: Configurable via `ResNet18BinaryClassifierConfig`
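For orientation, a minimal torchvision sketch of this architecture (not the project's exact implementation):

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained ResNet18 whose 512-unit fc layer is replaced by a 1-logit head
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
net.fc = nn.Linear(net.fc.in_features, 1)  # 512 -> 1

net.eval()
with torch.no_grad():
    logit = net(torch.randn(1, 3, 224, 224))  # raw logit
    prob = torch.sigmoid(logit)               # probability, applied at inference
print(prob.item())
```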
Usage:

```python
from model.resnet18_classifier import (
    ResNet18BinaryClassifier,
    ResNet18BinaryClassifierConfig
)

config = ResNet18BinaryClassifierConfig(
    freeze_backbone=True,  # Freeze ResNet18 parameters
    pretrained=True,       # Use ImageNet weights
    num_classes=1          # Binary classification
)

model = ResNet18BinaryClassifier(config)
```

### Trainer (`training/trainer.py`)
Training loop with metrics tracking, checkpoint management, and validation.
- Metrics tracking: Accuracy, precision, recall, F1 score for train and validation
- Checkpoint management: Saves the best model based on validation loss
- Training history: Tracks all metrics across epochs
- Optimizer support: Adam or SGD optimizers
Usage:

```python
from training.trainer import Trainer, TrainerConfig
from pathlib import Path

trainer_config = TrainerConfig(
    num_epochs=10,
    learning_rate=0.001,
    optimizer="adam",
    checkpoint_dir=Path("checkpoints"),
    device="cuda",  # or "cpu"
    save_best_only=True,
    verbose=True
)

trainer = Trainer(model, data_module, trainer_config)
history = trainer.train()
```

The trainer computes and tracks:
- Loss: BCE with logits loss
- Accuracy: Overall classification accuracy
- Precision: TP / (TP + FP)
- Recall: TP / (TP + FN)
- F1 Score: 2 × (precision × recall) / (precision + recall)
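These follow directly from confusion-matrix counts; a small illustration of the formulas above (not the project's `training/metrics.py`):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(binary_metrics(tp=40, fp=5, fn=10, tn=45))  # precision ~0.889, recall 0.8
```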
### Evaluator (`training/evaluator.py`)

Standalone evaluator for model evaluation on test sets.
- Comprehensive metrics: Accuracy, precision, recall, F1 score
- Confusion matrix: Detailed confusion matrix with formatted output
- Checkpoint loading: Can load and evaluate from saved checkpoints
- Configurable threshold: Adjustable binary classification threshold
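The threshold is applied to the sigmoid of the model's logits; conceptually (a sketch, not the evaluator's exact code):

```python
import torch

threshold = 0.5
logits = torch.tensor([1.2, -0.3])   # raw model outputs
probs = torch.sigmoid(logits)        # probabilities in (0, 1)
preds = (probs >= threshold).long()  # tensor([1, 0])
```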
Usage:

```python
from training.evaluator import Evaluator, EvaluatorConfig
from pathlib import Path

evaluator_config = EvaluatorConfig(
    device="cuda",
    threshold=0.5,
    verbose=True
)

evaluator = Evaluator(model, evaluator_config)

# Evaluate from checkpoint
results = evaluator.evaluate_from_checkpoint(
    checkpoint_path=Path("checkpoints/best_checkpoint.pt"),
    model=model,
    data_loader=test_loader
)

print(f"Accuracy: {results['accuracy']:.4f}")
print(f"F1 Score: {results['f1']:.4f}")
print(results['confusion_matrix_str'])
```

### Training CLI (`train.py`)
Command-line interface for training and evaluation with YAML configuration.
Training:
```bash
python train.py --mode train --config config.yaml
```

Evaluation:

```bash
python train.py --mode eval --config config.yaml --checkpoint checkpoints/best_checkpoint.pt
```

Arguments:

- `--mode`: Either `train` or `eval`
- `--config`: Path to the YAML configuration file (default: `config.yaml`)
- `--checkpoint`: Path to the checkpoint file (required for `eval` mode; defaults to `checkpoints/best_checkpoint.pt`)
- Reproducibility: Sets random seeds and deterministic algorithms (see the sketch after this list)
- Configuration-driven: All settings via YAML file
- Error handling: Comprehensive error messages and tracebacks
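A sketch of what these reproducibility settings typically map to in PyTorch (assumed for illustration, not copied from `train.py`):

```python
import random

import numpy as np
import torch

def set_reproducibility(seed: int = 42, deterministic: bool = True) -> None:
    """Seed all RNGs and, optionally, force deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        # Mirrors cudnn_deterministic / cudnn_benchmark in config.yaml
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        torch.use_deterministic_algorithms(True)
```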
### API (`app/main.py`, `run_api.py`)

FastAPI web application providing a REST API and a web UI for model inference.

Endpoints:

- `GET /`: Web UI for image upload and logo detection
- `GET /health`: Health check endpoint
- `POST /predict`: Logo detection API endpoint

Start the server:

```bash
python run_api.py
```

Or directly with uvicorn:

```bash
uvicorn app.main:app --reload
```

The server starts on http://localhost:8000 by default.

Environment variables:

- `CHECKPOINT_PATH`: Path to the model checkpoint (default: `checkpoints/best_checkpoint.pt`)
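For example, to point the server at a different checkpoint (the filename here is hypothetical; Linux/Mac shell syntax):

```bash
CHECKPOINT_PATH=checkpoints/my_run.pt python run_api.py
```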
The web UI provides:

- Drag-and-drop image upload
- Real-time prediction display
- Confidence scores and probabilities
- Modern, responsive design
## Configuration

All settings live in a single YAML file, `config.yaml`:
```yaml
model:
  freeze_backbone: true    # Freeze ResNet18 backbone
  pretrained: true         # Use ImageNet pretrained weights
  num_classes: 1           # Binary classification

dataset:
  data_root: "data/dataset_binary"
  batch_size: 32
  num_workers: 0           # Set to 0 on Windows
  image_size: 224
  seed: 42

training:
  num_epochs: 10
  learning_rate: 0.001
  optimizer: "adam"        # "adam" or "sgd"
  checkpoint_dir: "checkpoints"
  device: null             # null = auto-detect CUDA
  save_best_only: true
  verbose: true

evaluation:
  threshold: 0.5           # Binary classification threshold
  device: null
  verbose: true

reproducibility:
  seed: 42
  deterministic: true
  cudnn_deterministic: true
  cudnn_benchmark: false
```

## Examples

### Training

```python
from pathlib import Path
import yaml
from dataset.logo_dataset import LogoDataModule, LogoDatasetConfig
from model.resnet18_classifier import (
ResNet18BinaryClassifier,
ResNet18BinaryClassifierConfig
)
from training.trainer import Trainer, TrainerConfig
# Load configuration
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)
# Create model
model_config = ResNet18BinaryClassifierConfig(**config["model"])
model = ResNet18BinaryClassifier(model_config)
# Create data module
dataset_config = LogoDatasetConfig(**config["dataset"])
data_module = LogoDataModule(dataset_config)
# Create trainer
trainer_config = TrainerConfig(**config["training"])
trainer = Trainer(model, data_module, trainer_config)
# Train
history = trainer.train()
```

### Evaluation

```python
from pathlib import Path
from training.evaluator import Evaluator, EvaluatorConfig

# Assumes `model` and `test_loader` exist, e.g. from the training example above
# Create evaluator
evaluator_config = EvaluatorConfig(threshold=0.5, verbose=True)
evaluator = Evaluator(model, evaluator_config)
# Evaluate from checkpoint
results = evaluator.evaluate_from_checkpoint(
    checkpoint_path=Path("checkpoints/best_checkpoint.pt"),
    model=model,
    data_loader=test_loader
)
```

### API Requests

Using curl:

```bash
curl -X POST "http://localhost:8000/predict" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@path/to/image.jpg"
```

Response:

```json
{
"prediction": 1,
"probability": 0.9234
}
```

Using Python requests:

```python
import requests
url = "http://localhost:8000/predict"
with open("image.jpg", "rb") as f:
    files = {"file": f}
    response = requests.post(url, files=files)
result = response.json()
print(f"Prediction: {result['prediction']}")
print(f"Probability: {result['probability']}")Web UI for image upload and logo detection.
Response: HTML page with interactive UI
### GET /health

Health check endpoint.

Response:

```json
{
  "status": "healthy"
}
```

### POST /predict

Predicts whether the uploaded image contains one of the target brand logos.
Request:

- Method: `POST`
- Content-Type: `multipart/form-data`
- Body: Form data with a `file` field containing the image

Supported formats: JPEG, PNG

Response:

```json
{
  "prediction": 1,
  "probability": 0.9234
}
```

- `prediction`: Binary prediction (0 or 1)
- `probability`: Probability score (0.0 to 1.0)
Error responses:

- `400 Bad Request`: Invalid image format

  ```json
  { "detail": "Invalid image format: image/gif. Supported formats: JPEG, PNG" }
  ```

- `500 Internal Server Error`: Model loading or inference error

  ```json
  { "detail": "Checkpoint file not found: checkpoints/best_checkpoint.pt" }
  ```
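Clients should check the status code before reading the prediction; for example, with `requests` (a sketch):

```python
import requests

with open("image.gif", "rb") as f:
    resp = requests.post("http://localhost:8000/predict", files={"file": f})

if resp.ok:
    print(resp.json()["prediction"], resp.json()["probability"])
else:
    # Error responses carry a "detail" field, as shown above
    print(resp.status_code, resp.json()["detail"])
```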
## Troubleshooting

Problem: `RuntimeError: An attempt has been made to start a new process...`

Solution: Set `num_workers: 0` in `config.yaml`:

```yaml
dataset:
  num_workers: 0
```

Problem: `FileNotFoundError: Checkpoint not found`

Solution: Train the model first:

```bash
python train.py --mode train --config config.yaml
```

Problem: `RuntimeError: CUDA out of memory`

Solution: Reduce the batch size in `config.yaml`:

```yaml
dataset:
  batch_size: 16  # Reduce from 32
```

Problem: Error downloading the dataset from Roboflow

Solution:

- Ensure you have an internet connection
- Check Roboflow API credentials if required
- Verify the dataset is publicly accessible

Problem: `ModuleNotFoundError` when running scripts

Solution: Run from the project root directory with dependencies installed:

```bash
pip install -r requirements.txt
```

Problem: Port already in use or other uvicorn errors

Solution:

- Check whether another instance is already running
- Change the port: `uvicorn app.main:app --port 8001`
- Check firewall settings
Windows notes:

- `num_workers`: Always set to `0` in DataLoader configurations
- Path separators: The code handles Windows paths automatically
- CUDA: Ensure the CUDA toolkit is installed if using a GPU (quick check below)
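A quick way to confirm PyTorch can see the GPU before setting `device: "cuda"`:

```python
import torch

print(torch.cuda.is_available())          # True if CUDA is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the installed GPU
```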
## Features

- ✅ Complete data preparation pipeline
- ✅ ResNet18 binary classifier with pretrained weights
- ✅ Training loop with comprehensive metrics
- ✅ Model evaluation with confusion matrices
- ✅ FastAPI web application with UI
- ✅ REST API for inference
- ✅ Configuration-driven training
- ✅ Reproducibility support
## Limitations

- Binary classification only (logo present/absent)
- Supports 4 specific brands (Coca-Cola, Disney, Starbucks, McDonald's)
- Requires Roboflow dataset download
- Model checkpoint must exist before the API can run
## Future Improvements

- Multi-class classification (identify specific brand)
- Support for additional brands
- Model fine-tuning capabilities
- Docker containerization
- Cloud deployment guides
This project is part of a technical evaluation task.
For questions or issues, please refer to the project documentation or create an issue in the repository.