DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
Dogyun Park, Sihyeon Kim, Sojin Lee, Hyunwoo J. Kim†.
This repository is an official implementation of the ICLR 2024 paper DDMI (Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations).
We propose a latent diffusion model that generates hierarchically decomposed positional embeddings of implicit neural representations (INRs), enabling high-quality generation across various data domains.
To install requirements, run:
git clone https://github.com/mlvlab/DDMI.git
cd DDMI
conda create -n ddmi python==3.8
conda activate ddmi
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install accelerate omegaconf einops pyspng natsort av ema-pytorch timm ninja gdown scipy
(RECOMMENDED, Linux) Install PyTorch 2.2.0 with CUDA 11.8 for compatibility with xformers, which is recommended for memory-efficient computation. For 3D experiments, also install a torch-scatter version compatible with your PyTorch build.
For 2D image experiments, we use two datasets: AFHQ-V2 (dog and cat categories) and CelebA-HQ. You can change the dataset location by editing data_dir in the config files under configs/, and set test_data_dir to measure r-FID during training. Each dataset should be structured as below:
Data
|-- folder
    |-- image1.png
    |-- image2.png
    |-- ...
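Before training, it can help to confirm that a directory actually follows this layout. The snippet below is a minimal sketch, not part of the repository: `list_images` is a hypothetical helper, and the accepted file extensions are an assumption (the layout above only shows `.png`).

```python
import tempfile
from pathlib import Path


def list_images(data_dir, exts=(".png", ".jpg", ".jpeg")):
    """Return sorted image paths found exactly one level below data_dir."""
    root = Path(data_dir)
    return sorted(
        p for p in root.glob("*/*")
        if p.is_file() and p.suffix.lower() in exts
    )


if __name__ == "__main__":
    # Build a throwaway directory that mimics Data/folder/imageN.png.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "folder"
        folder.mkdir()
        for name in ("image1.png", "image2.png"):
            (folder / name).touch()
        print(f"found {len(list_images(tmp))} images")
```

Pointing `list_images` at the value of data_dir and checking that the count is non-zero is usually enough to catch a wrong path or an extra nesting level.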
For video experiments, we use the SkyTimelapse dataset with the dataloader from PVDM. You can change the dataset location by editing data_dir in the config files under configs/, and set test_data_dir to measure r-FVD during training. The dataset should be structured as below:
Data
|-- train
    |-- video1
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- video2
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- ...
|-- val
    |-- video1
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- ...
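The video layout relies on zero-padded frame names, so a plain lexicographic sort recovers temporal order. The following is a small sketch for indexing such a directory; `index_videos` is a hypothetical helper and is not the PVDM dataloader used by the repository.

```python
import tempfile
from pathlib import Path


def index_videos(data_dir, split="train"):
    """Map each video folder name to its frame paths, sorted by file name."""
    split_dir = Path(data_dir) / split
    videos = {}
    for video_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        # Zero-padded names (frame00000.png, ...) sort in temporal order.
        frames = sorted(video_dir.glob("*.png"))
        if frames:
            videos[video_dir.name] = frames
    return videos


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = Path(tmp) / "train" / "video1"
        clip.mkdir(parents=True)
        for i in range(3):
            (clip / f"frame{i:05d}.png").touch()
        for name, frames in index_videos(tmp).items():
            print(name, len(frames))
```

Calling it once with `split="train"` and once with `split="val"` gives a quick per-video frame count for both subsets.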
For 3D shape experiments, we use the ShapeNet v1 dataset with the dataloader from Occupancy Networks. You can change the dataset location by editing data_dir in the config files under configs/.
For NeRF experiments, we use the srn-cars dataset following pixel-NeRF; alternatively, you may download the dataset from here. You can change the dataset location by editing data_dir in the config files under configs/. The dataset should be structured as below:
Data
|-- cars
    |-- sampled
        |-- car00000.npz
        |-- car00001.npz
        |-- ...
To train on other signal domains, change domain in the config files under configs/ (e.g., image, occupancy, nerf, or video). Currently, a separate network is trained for each signal domain. By default, model checkpoints are stored in ./results. If first-stage D2C-VAE training is unstable (i.e., NaN values appear), try increasing sn_reg_weight_decay or sn_reg_weight_decay_init in the config files to increase the weight of the spectral regularization. To resume training from a previous checkpoint, set resume to True.
D2C-VAE aims to learn a latent space that generates positional embeddings (PEs) bridging discrete data and continuous functions, e.g., point clouds to occupancy functions, or pixel images to continuous RGB images.
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --num_processes=4 main.py --exp d2c-vae --configs configs/d2c-vae/img.yaml
After training D2C-VAE, we train the latent diffusion model (LDM) on the latent space of D2C-VAE. Since the latent variable is represented as a set of 2D planes, we use a 2D convolutional UNet for the LDM across all modalities.
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --num_processes=4 main.py --exp ldm --configs configs/ldm/img.yaml
In our paper, we use several evaluation metrics to assess generation quality: FID for images, MMD and COV for 3D shapes, and FVD for videos.
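For readers unfamiliar with the 3D metrics, the sketch below shows how MMD (minimum matching distance) and COV (coverage) are commonly defined over two sets of point clouds. It assumes the squared-distance Chamfer distance as the underlying measure and uses plain Python for clarity; the repository's eval_3d/compute_metrics_3d.py may use a different distance, normalization, or a batched GPU implementation.

```python
def chamfer(a, b):
    """Symmetric Chamfer distance between two point clouds (lists of xyz tuples)."""
    def sq(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    a_to_b = sum(min(sq(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(sq(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def mmd_cov(generated, reference, dist=chamfer):
    """Return (MMD, COV) of a generated set with respect to a reference set."""
    # d[i][j]: distance between generated cloud i and reference cloud j.
    d = [[dist(g, r) for r in reference] for g in generated]
    n_ref = len(reference)
    # MMD: distance from each reference cloud to its closest generated cloud, averaged.
    mmd = sum(min(row[j] for row in d) for j in range(n_ref)) / n_ref
    # COV: fraction of reference clouds that are the nearest neighbour
    # of at least one generated cloud.
    matched = {min(range(n_ref), key=row.__getitem__) for row in d}
    return mmd, len(matched) / n_ref


if __name__ == "__main__":
    cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    cloud_b = [(10.0, 10.0, 10.0), (11.0, 10.0, 10.0)]
    print(mmd_cov([cloud_a], [cloud_a, cloud_b]))
```

Lower MMD indicates that every reference shape has a close generated counterpart, while higher COV indicates that the generated set spreads over more of the reference shapes.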
You can change the total number of sampling steps (NFE) by changing the sampling_timesteps in the config file.
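As an illustration of what a reduced number of sampling steps means, the sketch below builds the timestep pairs visited by a strided, DDIM-style sampler. This is an assumption about the general technique, not a description of this repository's sampler: `sampling_schedule` and `train_timesteps` are hypothetical names, and only sampling_timesteps comes from the config.

```python
def sampling_schedule(train_timesteps, sampling_timesteps):
    """Return (t, t_next) pairs from the noisiest step down to -1 (clean sample)."""
    if not 1 <= sampling_timesteps <= train_timesteps:
        raise ValueError("sampling_timesteps must be in [1, train_timesteps]")
    # Evenly spaced indices from -1 up to train_timesteps - 1 (integer arithmetic).
    times = [
        -1 + (i * train_timesteps) // sampling_timesteps
        for i in range(sampling_timesteps + 1)
    ]
    times.reverse()
    # Each pair corresponds to one denoising step.
    return list(zip(times[:-1], times[1:]))


if __name__ == "__main__":
    for t, t_next in sampling_schedule(1000, 4):
        print(t, "->", t_next)
```

Under this assumption, each pair costs one network evaluation, so the NFE equals sampling_timesteps; fewer steps sample faster, typically at some cost in quality.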
To evaluate the FID of the trained 2D image model, set mode to eval (instead of train) in the config file, then run:
python main.py --exp ldm --configs configs/ldm/img.yaml
To evaluate the FVD of the trained video model, set mode to eval (instead of train) in the config file, then run:
python main.py --exp ldm --configs configs/ldm/video.yaml
For 3D shape evaluation, you first need to generate occupancy functions and process them into point clouds.
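As a rough picture of the last part of that processing: once a triangle mesh has been extracted from an occupancy function, a point cloud can be drawn uniformly from its surface by picking triangles in proportion to their area and then sampling uniform barycentric coordinates. The sketch below uses only the standard library; `sample_surface` is a hypothetical helper, and eval_3d/meshtopc.py may implement this step differently.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of triangle abc: half the norm of the edge cross product."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_surface(vertices, faces, n_points=2048, seed=0):
    """Sample n_points uniformly from the surface of a triangle mesh."""
    rng = random.Random(seed)
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    # Choose triangles with probability proportional to their area.
    picks = rng.choices(faces, weights=areas, k=n_points)
    points = []
    for face in picks:
        a, b, c = (vertices[i] for i in face)
        # Square-root trick: yields uniform barycentric weights on the triangle.
        r1, r2 = math.sqrt(rng.random()), rng.random()
        w = (1.0 - r1, r1 * (1.0 - r2), r1 * r2)
        points.append(tuple(w[0] * a[i] + w[1] * b[i] + w[2] * c[i] for i in range(3)))
    return points


if __name__ == "__main__":
    # Unit square in the z = 0 plane, split into two triangles.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
    tris = [(0, 1, 2), (0, 2, 3)]
    print(len(sample_surface(verts, tris)))
```

Area weighting matters because meshes extracted from occupancy grids often mix large and tiny triangles; sampling faces uniformly would over-represent regions with many small faces.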
First, set mode to eval (instead of train) in the config file and run the following script. The generated 3D shapes will be saved in the eval folder, located in the directory specified by save_pth in the config.
python main.py --exp ldm --configs configs/ldm/occupancy.yaml
Then, run the following script to sample a point cloud of 2048 points from each mesh.
python eval_3d/meshtopc.py --pth [location of mesh files] --save_pth [save location of point clouds]
Finally, run the following script to measure MMD and COV between the ground-truth point clouds and the generated point clouds.
python eval_3d/compute_metrics_3d.py --gt_pth [location of ground truth point clouds] --save_pth [location of generated point clouds]
You can generate a signal from the pre-trained model in ./results by setting mode to gen (instead of train) in the config file, then running:
python main.py --exp ldm --configs configs/ldm/img.yaml
For arbitrary-resolution 2D image generation with consistent content, you only need to change test_resolution in the config file while keeping the seed fixed.
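The reason a fixed seed gives consistent content across resolutions is that an implicit representation is a function of continuous coordinates, so changing the resolution only changes where that function is sampled. The toy sketch below illustrates this idea; `toy_signal` is a stand-in for a decoded representation, and the [-1, 1] endpoint-inclusive coordinate convention is an assumption rather than the repository's actual convention.

```python
import math


def make_grid(resolution):
    """Coordinates covering [-1, 1] x [-1, 1] for a resolution x resolution image."""
    axis = [-1.0 + 2.0 * i / (resolution - 1) for i in range(resolution)]
    return [[(x, y) for x in axis] for y in axis]


def toy_signal(x, y):
    # Hypothetical stand-in for the generated continuous function.
    return math.sin(3.0 * x) * math.cos(2.0 * y)


def render(resolution):
    """Evaluate the same continuous signal on a grid of the given resolution."""
    return [[toy_signal(x, y) for (x, y) in row] for row in make_grid(resolution)]


if __name__ == "__main__":
    low, high = render(3), render(5)
    # With this convention, every second sample of the 5x5 grid
    # lies on the same coordinate as a sample of the 3x3 grid.
    print(low[1][2], high[2][4])
```

In the real model the role of `toy_signal` is played by the decoder applied to a latent sampled with the fixed seed, so a larger test_resolution queries the same underlying image more densely.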
Checkpoints for the pre-trained models can be downloaded from here. Place the downloaded checkpoint in the ./results folder and set pretrained to True in the config file for evaluation.
This repo is built upon ADM, latent-diffusion, and PVDM.
@inproceedings{park2024ddmi,
title={DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations},
author={Park, Dogyun and Kim, Sihyeon and Lee, Sojin and Kim, Hyunwoo J},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024}
}
