
๐Ÿท๏ธ Label Anything

Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Project Page · PWC · arXiv · ECAI 2025 · Python 3.8+ · License: MIT


🌟 Overview

Label Anything is a novel method for multi-class few-shot semantic segmentation using visual prompts: it segments new classes precisely from just a few prompted examples. This repository contains the official implementation of our ECAI 2025 paper.

Label Anything Demo

Visual prompting meets few-shot learning with a new, fast, and efficient architecture.

🚀 Quick Start

⚡ One-Line Demo

Experience Label Anything instantly with our streamlined demo:

uvx --from git+https://github.com/pasqualedem/LabelAnything app

💡 Pro Tip: This command uses uv for lightning-fast package management and execution.

🛠️ Manual Installation

For development and customization:

# Clone the repository
git clone https://github.com/pasqualedem/LabelAnything.git
cd LabelAnything

# Create virtual environment with uv
uv sync
source .venv/bin/activate

⚠️ System Requirements: Linux environment with CUDA 12.1 support
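
Before training, you can sanity-check the setup with a short script (a minimal sketch; it assumes PyTorch was installed into the environment by uv sync):

import torch

# The CUDA toolkit version PyTorch was built against should match the
# requirement above (12.1), and at least one GPU should be visible.
print("CUDA available:", torch.cuda.is_available())
print("PyTorch built with CUDA:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))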

📦 Pre-trained Models

Access our collection of state-of-the-art checkpoints:

| 🧠 Encoder | 📏 Embedding Size | 🖼️ Image Size | 📁 Fold | 🔗 Checkpoint |
|------------|-------------------|----------------|---------|----------------|
| SAM        | 512               | 1024           | -       | HF             |
| ViT-MAE    | 256               | 480            | -       | HF             |
| ViT-MAE    | 256               | 480            | 0       | HF             |
| ViT-MAE    | 256               | 480            | 1       | HF             |
| ViT-MAE    | 256               | 480            | 2       | HF             |
| ViT-MAE    | 256               | 480            | 3       | HF             |

🔌 Model Loading

from label_anything.models import LabelAnything

# Load pre-trained model
model = LabelAnything.from_pretrained("pasqualedem/label_anything_sam_1024_coco")
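
The returned checkpoint can then be treated as an ordinary PyTorch module (a minimal sketch continuing the snippet above; it assumes from_pretrained yields a standard torch.nn.Module):

import torch

# Hypothetical post-loading boilerplate: pick a device and disable
# dropout/batch-norm updates before running inference.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()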

🎯 Training Pipeline

📊 Dataset Setup: COCO 2017

Prepare the COCO dataset with our automated setup:

# Navigate to data directory
cd data && mkdir coco && cd coco

# Download COCO 2017 images and 2014 annotations
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

# Extract and organize, then return to the repository root
unzip "*.zip" && rm *.zip
mv val2017/* train2017/ && mv train2017 train_val_2017 && rm -rf val2017
cd ../..
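
Run from the repository root, a quick check that the layout matches what the later steps expect (a sketch based on the commands above; the ~123k figure assumes the standard 118,287 train + 5,000 val images):

from pathlib import Path

coco = Path("data/coco")

# train_val_2017 should now hold both splits merged together,
# and the 2014 annotation JSONs should sit under annotations/.
n_images = sum(1 for _ in (coco / "train_val_2017").glob("*.jpg"))
print(f"{n_images} images (expected ~123k)")
for split in ("train", "val"):
    ann = coco / "annotations" / f"instances_{split}2014.json"
    print(ann, "ok" if ann.exists() else "MISSING")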

🔧 Annotation Preprocessing

The COCO-20i benchmark pairs the 2014 annotation splits with 2017 images, whose filenames differ. Synchronize them:

python main.py rename_coco20i_json --instances_path data/coco/annotations/instances_train2014.json
python main.py rename_coco20i_json --instances_path data/coco/annotations/instances_val2014.json
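
For reference, 2014 annotations point to names like COCO_train2014_000000000009.jpg, while 2017 images are named 000000000009.jpg; the renaming boils down to stripping that prefix (an illustrative sketch, not the repo's actual implementation):

# Illustration only: map a 2014-style file name to its 2017 equivalent.
def coco2014_to_2017(name: str) -> str:
    # "COCO_train2014_000000000009.jpg" -> "000000000009.jpg"
    return name.split("_")[-1]

assert coco2014_to_2017("COCO_train2014_000000000009.jpg") == "000000000009.jpg"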

🧠 Feature Extraction

SAM Encoder Setup

# Download SAM checkpoint, then return to the repository root
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
cd ..

# Extract embeddings (optional but recommended for speed)
mkdir -p data/coco/vit_sam_embeddings/{last_hidden_state,last_block_state}
python main.py generate_embeddings \
  --encoder vit_b \
  --checkpoint checkpoints/sam_vit_b_01ec64.pth \
  --use_sam_checkpoint \
  --directory data/coco/train_val_2017 \
  --batch_size 16 \
  --num_workers 8 \
  --outfolder data/coco/vit_sam_embeddings/last_hidden_state \
  --last_block_dir data/coco/vit_sam_embeddings/last_block_state \
  --custom_preprocess
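
The cached embeddings come from SAM's image encoder. To see what it produces, here is a sketch using the official segment-anything package (an assumption — this repo ships its own encoder implementation; install it with pip install segment-anything):

import torch
from segment_anything import sam_model_registry

# Load the ViT-B checkpoint downloaded above and grab its image encoder.
sam = sam_model_registry["vit_b"](checkpoint="checkpoints/sam_vit_b_01ec64.pth")
image_encoder = sam.image_encoder.eval()

# SAM expects 1024x1024 inputs; the encoder emits a 64x64 feature map
# with 256 channels (dummy input, shown for shape only).
with torch.no_grad():
    features = image_encoder(torch.zeros(1, 3, 1024, 1024))
print(features.shape)  # torch.Size([1, 256, 64, 64])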

ViT-MAE Encoders

# Base ViT-MAE (1024px)
python main.py generate_embeddings \
  --encoder vit_b_mae \
  --directory data/coco/train_val_2017 \
  --batch_size 32 \
  --outfolder data/coco/embeddings_vit_mae_1024/ \
  --model_name facebook/vit-mae-base \
  --image_resolution 1024 \
  --huggingface

# Base ViT-MAE (480px)
python main.py generate_embeddings \
  --encoder vit_b_mae \
  --directory data/coco/train_val_2017 \
  --batch_size 64 \
  --outfolder data/coco/embeddings_vit_mae_480 \
  --model_name facebook/vit-mae-base \
  --image_resolution 480 \
  --huggingface

# Large ViT-MAE (480px)
python main.py generate_embeddings \
  --encoder vit_l_mae \
  --directory data/coco/train_val_2017 \
  --batch_size 64 \
  --outfolder data/coco/embeddings_vit_mae_l_480 \
  --model_name facebook/vit-mae-large \
  --image_resolution 480 \
  --huggingface

# DINOv3
python main.py generate_embeddings \
  --directory data/coco/train_val_2017 \
  --batch_size 64 \
  --outfolder data/coco/embeddings_dinov3_480 \
  --model_name facebook/dinov3-vitb16-pretrain-lvd1689m \
  --image_resolution 480 \
  --huggingface
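
Conceptually, each of these commands runs every image once through a frozen encoder and caches the features to disk, so training never re-runs the heavy backbone. A minimal stand-alone sketch of the idea with 🤗 transformers (not the repo's pipeline; resolution handling and the cache file name are illustrative):

import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTMAEModel

# mask_ratio=0.0 disables MAE's random patch masking; the fixed `noise`
# keeps patch tokens in their original spatial order.
processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-base")
encoder = ViTMAEModel.from_pretrained("facebook/vit-mae-base", mask_ratio=0.0).eval()

image = Image.open("data/coco/train_val_2017/000000000009.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
num_patches = (encoder.config.image_size // encoder.config.patch_size) ** 2
noise = torch.arange(num_patches, dtype=torch.float32).unsqueeze(0)

with torch.no_grad():
    features = encoder(**inputs, noise=noise).last_hidden_state  # (1, tokens, hidden)
torch.save(features, "000000000009.pt")  # illustrative cache file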

๐Ÿ‹๏ธ Training & Evaluation

Single GPU Training

# Train with pre-extracted embeddings
python main.py experiment --parameters="parameters/trainval/coco20i/mae.yaml"

# Train without pre-extracted embeddings
python main.py experiment --parameters="parameters/trainval/coco20i/mae_noembs.yaml"

Multi-GPU Training

# Distributed training across multiple GPUs for faster wall-clock time
accelerate launch --multi_gpu main.py experiment --parameters="parameters/trainval/coco20i/mae.yaml"

📈 Experiment Tracking: All experiments are automatically logged to Weights & Biases in offline mode. Runs are saved under offline/wandb/run-<date>-<run_id>/files/ and can be uploaded later with wandb sync.

๐Ÿ—๏ธ Project Architecture

๐Ÿ“ฆ LabelAnything
    ๐ŸŒŸ Core Components
    โ”œโ”€โ”€ label_anything/          # ๐Ÿ”ง Main codebase
    โ”‚   โ”œโ”€โ”€ **main**.py          # ๐Ÿšช CLI entry point
    โ”‚   โ”œโ”€โ”€ cli.py               # ๐Ÿ’ป Command interface
    โ”‚   โ”œโ”€โ”€ data/                # ๐Ÿ“Š Dataset handling
    โ”‚   โ”œโ”€โ”€ demo/                # ๐ŸŽฎ Interactive demos
    โ”‚   โ”œโ”€โ”€ experiment/          # ๐Ÿงช Training workflows
    โ”‚   โ”œโ”€โ”€ models/              # ๐Ÿค– Neural architectures
    โ”‚   โ”œโ”€โ”€ loss/                # ๐Ÿ“‰ Loss functions
    โ”‚   โ””โ”€โ”€ utils/               # ๐Ÿ› ๏ธ Utilities
    โ””โ”€โ”€ parameters/              # โš™๏ธ Configuration files
        โ”œโ”€โ”€ trainval/            # ๐Ÿ“š Training configs
        โ”œโ”€โ”€ validation/          # ๐Ÿ“– Validation configs
        โ””โ”€โ”€ test/                # ๐Ÿงช Testing configs

    ๐Ÿ“š Resources
    โ”œโ”€โ”€ notebooks/               # ๐Ÿ““ Analysis & demos
    โ”œโ”€โ”€ assets/                  # ๐Ÿ–ผ๏ธ Media files
    โ”œโ”€โ”€ data/                    # ๐Ÿ’พ Dataset storage
    โ””โ”€โ”€ checkpoints/             # ๐Ÿ† Model weights

    ๐Ÿš€ Deployment
    โ”œโ”€โ”€ slurm/                   # โšก HPC job scripts
    โ””โ”€โ”€ app.py                   # ๐ŸŒ Web application

🎨 Key Features

  • 🎯 Few-Shot Learning: Segment new classes from just a handful of prompted examples
  • 🖼️ Visual Prompting: Intuitive interaction through points, boxes, and masks
  • ⚡ Multi-GPU Support: Distributed training via 🤗 Accelerate
  • 🔄 Cross-Validation: Robust 4-fold COCO-20i evaluation protocol
  • 📊 Rich Logging: Comprehensive experiment tracking with Weights & Biases
  • 🤗 HuggingFace Integration: Seamless model sharing and deployment

📄 Citation

If you find Label Anything useful in your research, please cite our work:

@incollection{demarinisLabelAnythingMultiClass2025,
  title = {Label {Anything}: {Multi}-{Class} {Few}-{Shot} {Semantic} {Segmentation} with {Visual} {Prompts}},
  shorttitle = {Label {Anything}},
  url = {https://ebooks.iospress.nl/doi/10.3233/FAIA251289},
  language = {en},
  booktitle = {{ECAI} 2025},
  publisher = {IOS Press},
  author = {De Marinis, Pasquale and Fanelli, Nicola and Scaringi, Raffaele and Colonna, Emanuele and Fiameni, Giuseppe and Vessio, Gennaro and Castellano, Giovanna},
  year = {2025},
  doi = {10.3233/FAIA251289},
  pages = {4016--4023},
}

🤝 Contributing

We welcome contributions! Feel free to:

  • 🐛 Report bugs by opening an issue
  • 💡 Suggest new features or improvements
  • 🔧 Submit pull requests with bug fixes or enhancements
  • 📚 Improve documentation and examples

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ by the CilabUniba Label Anything Team
