SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation in Melanoma Diagnosis



SkinGenBench Teaser

📢 Official PyTorch implementation of SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation in Melanoma Diagnosis
N. A. Adarsh Pritam, Jeba Shiney O, Sanyam Jain
[Paper] | [Code]


🌟 Highlights

  • First Systematic Benchmark: Controlled evaluation of preprocessing complexity (basic geometric vs. advanced artifact removal) across GANs (StyleGAN2-ADA) and Diffusion Models (DDPM) for dermoscopic melanoma synthesis.
  • Architecture > Preprocessing: Demonstrates that generative model choice has stronger impact than preprocessing complexity on both image fidelity and diagnostic utility.
  • StyleGAN2-ADA Superiority: Achieves the lowest FID (≈65.5) and KID (≈0.05) with better class anchoring, while diffusion models trade perceptual fidelity for higher sample variance.
  • Significant Clinical Impact: Synthetic augmentation delivers 8–15% absolute melanoma F1-score improvements, with ViT-B/16 reaching F1 ≈ 0.88 and ROC-AUC ≈ 0.98 (≈14% improvement over baselines).
  • Reproducible Framework: Unified assessment combining generative metrics (FID, IS, KID), downstream performance across five architectures (CNNs and transformers), and interpretability analysis via Grad-CAM on 14,116 dermoscopic images.

📊 Dataset

Experimental Design

Overall experimental design showing dual preprocessing pipelines, generative model training, synthetic data augmentation, and downstream classifier evaluation for melanoma diagnosis.

Table: Overview of curated dermatology dataset used in our study. The dataset combines ISIC 2025 (MLK10k) and HAM10000 sources.

| Class | Abbr. | Images | Percentage |
| --- | --- | --- | --- |
| Nevus | NV | 7,424 | 52.60% |
| Basal Cell Carcinoma | BCC | 3,026 | 21.43% |
| Benign Keratosis-like | BKL | 1,637 | 11.60% |
| Melanoma | MEL | 1,563 | 11.03% |
| Squamous Cell Carcinoma | SCC | 466 | 3.34% |
| **Total** | | **14,116** | **100%** |

Dataset Sources: ISIC 2025 (MLK10k) and HAM10000.


🧠 Overview

Methodology Overview
Figure: General framework of SkinGenBench showing the two preprocessing pipelines (Basic and Advanced), generative model training (StyleGAN2-ADA and DDPM), and evaluation through image quality metrics and downstream classification tasks.

Image Subset Nomenclature:

| Source | Basic Preprocessing (BS) | Advanced Preprocessing (AD) |
| --- | --- | --- |
| Ground Truth | BS_GT | AD_GT |
| StyleGAN2-ADA | BS_GN | AD_GN |
| DDPM | BS_DF | AD_DF |
| Ground-Truth Aug. | BS_GTA | AD_GTA |
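As an illustration, the subset nomenclature above can be captured in a small lookup table. The dictionary and helper below are hypothetical conveniences for readers, not code from the repository:

```python
# Hypothetical mapping of SkinGenBench subset codes to their meaning
# (illustrative only; not part of the repository's actual code).
SUBSETS = {
    "BS_GT":  ("basic",    "ground truth"),
    "BS_GN":  ("basic",    "StyleGAN2-ADA synthetic"),
    "BS_DF":  ("basic",    "DDPM synthetic"),
    "BS_GTA": ("basic",    "ground-truth augmented"),
    "AD_GT":  ("advanced", "ground truth"),
    "AD_GN":  ("advanced", "StyleGAN2-ADA synthetic"),
    "AD_DF":  ("advanced", "DDPM synthetic"),
    "AD_GTA": ("advanced", "ground-truth augmented"),
}

def describe(code: str) -> str:
    """Expand a subset code such as 'BS_GN' into a readable label."""
    pipeline, source = SUBSETS[code]
    return f"{source} ({pipeline} preprocessing)"

print(describe("BS_GN"))  # StyleGAN2-ADA synthetic (basic preprocessing)
```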

📄 Publication

SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation in Melanoma Diagnosis
N. A. Adarsh Pritam, Jeba Shiney O, Sanyam Jain
Alliance University, Bangalore & Østfold University College, Norway
[GitHub] | [PDF]




Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/adarsh-crafts/SkinGenBench.git
   cd SkinGenBench
   ```

2. Create a virtual environment and install the dependencies:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

3. Install PyTorch following the instructions on the PyTorch official site.

requirements.txt:

```text
torch>=2.0.0
torchvision>=0.15.0
numpy
opencv-python
matplotlib
scikit-learn
scipy
tqdm
h5py
pandas
Pillow
```

Model Zoo

The pretrained models used to initialize fine-tuning of StyleGAN2-ADA and DDPM are listed in the table below:

| Model | Configuration | Source |
| --- | --- | --- |
| StyleGAN2-ADA | FFHQ 256×256 pretrained | NVIDIA CDN |
| DDPM | CelebA-HQ 256×256 pretrained | Hugging Face |

Training

Train StyleGAN2-ADA, DDPM and the classifiers with the provided configurations in each nested directory.

Training Details:

| Configuration | Minimum | Maximum |
| --- | --- | --- |
| GPU | NVIDIA RTX 4060 8GB × 1 | NVIDIA L4 22GB × 1 |
| RAM | 8 GB | 22 GB |
| Input Resolution | 256×256×3 | 256×256×3 |

Results

t-SNE Visualization
Figure: t-SNE embeddings showing ground truth (GT), StyleGAN2-ADA (GN), and DDPM (DF) distributions for basic (left) and advanced (right) preprocessing pipelines.
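A t-SNE projection like the one in the figure can be sketched with scikit-learn. The random features below merely stand in for the image embeddings actually used in the paper; this is an illustrative sketch, not the repository's plotting code:

```python
# Sketch: 2-D t-SNE of per-image feature vectors for GT/GN/DF subsets.
# Random features stand in for real embeddings (illustrative only).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(150, 64))          # stand-in feature vectors
labels = np.repeat(["GT", "GN", "DF"], 50)     # subset label per row

# Project to 2-D for visualization; perplexity must stay below n_samples.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
print(emb.shape)  # (150, 2)
```

Each row of `emb` can then be scattered and colored by its subset label to reproduce the style of the figure.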

Performance Metrics

Generative Model Quality Metrics (Epoch 1000)

Fréchet Inception Distance (FID) - Lower is better

| Model | Basic Pipeline (BS) | Advanced Pipeline (AD) |
| --- | --- | --- |
| StyleGAN2-ADA (BS_GN/AD_GN) | 79.36 | 65.47 |
| DDPM (BS_DF/AD_DF) | 83.04 | 90.22 |

Kernel Inception Distance (KID) - Lower is better

| Model | Basic Pipeline (BS) | Advanced Pipeline (AD) |
| --- | --- | --- |
| StyleGAN2-ADA (BS_GN/AD_GN) | 0.0664 | 0.0546 |
| DDPM (BS_DF/AD_DF) | 0.0684 | 0.0772 |

Inception Score (IS) - Higher is better

| Model | Basic Pipeline (BS) | Advanced Pipeline (AD) |
| --- | --- | --- |
| StyleGAN2-ADA (BS_GN/AD_GN) | 3.22 | 2.77 |
| DDPM (BS_DF/AD_DF) | 2.50 | 2.45 |
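For reference, FID compares the Gaussian statistics of real and generated feature embeddings: FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A minimal NumPy/SciPy sketch of this formula (assuming feature vectors, e.g. Inception activations, have already been extracted; this is not the repository's implementation):

```python
# Minimal FID sketch from the closed-form Gaussian formula.
# Assumes feature vectors have already been extracted from the images.
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu_r, mu_g = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the covariance product; discard tiny
    # imaginary parts introduced by numerical error.
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2.0 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 8))
print(fid(a, a))  # identical feature sets give FID close to 0
```

In practice the features come from an Inception network's pooling layer, which is why lower FID indicates distributions that the Inception feature space considers closer.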

Global Classification Performance

Best Results: Pipeline A2 (Basic Preprocessing + StyleGAN2-ADA Augmentation)

| Model | Macro-F1 | Balanced Acc | MCC | ROC-AUC | Accuracy | Brier Score ↓ |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-B/16 | 0.8393 | 0.8348 | 0.8515 | 0.9822 | 0.8985 | 0.0302 |
| ResNet-50 | 0.8393 | 0.8433 | 0.8525 | 0.9802 | 0.8989 | 0.0314 |
| VGG-16 | 0.8181 | 0.8167 | 0.8243 | 0.9774 | 0.8797 | 0.0365 |
| EfficientNet-B0 | 0.7977 | 0.7871 | 0.8033 | 0.9698 | 0.8657 | 0.0402 |
| ResNet-18 | 0.7525 | 0.7424 | 0.7594 | 0.9611 | 0.8360 | 0.0479 |
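As a sketch of how these global metrics are computed, the scikit-learn calls below reproduce macro-F1, balanced accuracy, and MCC on a toy 3-class example (toy labels only; the paper's evaluation uses the five-class NV/BCC/BKL/MEL/SCC label set):

```python
# Toy 3-class example of the global metrics reported in the table above.
from sklearn.metrics import (balanced_accuracy_score, f1_score,
                             matthews_corrcoef)

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]

macro_f1 = f1_score(y_true, y_pred, average="macro")   # unweighted mean of per-class F1
bal_acc = balanced_accuracy_score(y_true, y_pred)      # mean per-class recall
mcc = matthews_corrcoef(y_true, y_pred)                # chance-corrected correlation

print(round(macro_f1, 4), round(bal_acc, 4), round(mcc, 4))
```

Macro averaging and MCC are the relevant choices here because the dataset is imbalanced (melanoma is only about 11% of images), so plain accuracy would overstate performance.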

Melanoma (MEL) Classification Performance

Best Results: Pipeline A2 (Basic Preprocessing + StyleGAN2-ADA Augmentation)

| Model | MEL F1 | Sensitivity | Specificity | Precision | ROC-AUC | PR-AUC | DOR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViT-B/16 | 0.8831 | 0.8564 | 0.9798 | 0.9115 | 0.9802 | 0.9511 | 288.94 |
| ResNet-50 | 0.8663 | 0.8401 | 0.9758 | 0.8941 | 0.9787 | 0.9445 | 211.93 |
| VGG-16 | 0.8438 | 0.8108 | 0.9730 | 0.8796 | 0.9729 | 0.9228 | 154.56 |
| EfficientNet-B0 | 0.8126 | 0.7781 | 0.9667 | 0.8503 | 0.9633 | 0.9043 | 101.76 |
| ResNet-18 | 0.7724 | 0.7390 | 0.9576 | 0.8089 | 0.9542 | 0.8774 | 63.88 |

Key Melanoma Detection Improvements (A2 vs A1)

  • MEL F1-score gains: +8–15% across all architectures
  • ViT-B/16: MEL F1 improved from 0.7401 → 0.8831 (+14.3 percentage points)
  • ResNet-50: MEL F1 improved from 0.7362 → 0.8663 (+13.0 percentage points)
  • All models achieved ROC-AUC > 0.96 for melanoma detection
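The diagnostic odds ratio (DOR) column in the melanoma table follows directly from sensitivity and specificity: DOR = [sens/(1 − sens)] × [spec/(1 − spec)]. A minimal sketch using the ViT-B/16 row (the small gap to the reported 288.94 comes from rounding of the published sensitivity and specificity):

```python
# Diagnostic odds ratio from sensitivity and specificity:
# DOR = LR+ / LR- = [sens / (1 - sens)] * [spec / (1 - spec)].
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    return (sensitivity / (1.0 - sensitivity)) * (specificity / (1.0 - specificity))

# ViT-B/16 melanoma row: sensitivity 0.8564, specificity 0.9798.
dor = diagnostic_odds_ratio(0.8564, 0.9798)
print(round(dor, 1))  # close to the table's 288.94, up to rounding
```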

Pipeline Comparison Summary

| Pipeline | Description | Best Use Case |
| --- | --- | --- |
| A2 | Basic preprocessing + StyleGAN2-ADA | Recommended: best overall performance |
| A3 | Basic preprocessing + DDPM | Good diversity, lower fidelity |
| B2 | Advanced preprocessing + StyleGAN2-ADA | Marginal gains over A2 |
| B3 | Advanced preprocessing + DDPM | Lowest performance |
| A4/B4 | Standard augmentation only (no synthetic) | Baseline comparison |

Key Finding: Generative architecture choice (GAN vs Diffusion) has a stronger influence on diagnostic performance than preprocessing complexity (Basic vs Advanced).

Grad-CAM Visualization
Figure: Grad-CAM visualizations comparing ResNet-50 and ViT-B/16 across different preprocessing pipelines and generative models. ResNet-50 produces compact, lesion-aligned saliency maps, while ViT-B/16 shows broader attention patterns. Synthetic samples exhibit more irregular activations, with AD_DF showing the smoothest, most anatomically coherent results.


Citation

If you find this work useful, please cite our paper:

```bibtex
@misc{pritam2025skingenbenchgenerativemodelpreprocessing,
      title={SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation in Melanoma Diagnosis},
      author={N. A. Adarsh Pritam and Jeba Shiney O and Sanyam Jain},
      year={2025},
      eprint={2512.17585},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2512.17585},
}
```
