RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations

Chengde Lin · Xijun Lu · Guangxi Chen

Paper: arXiv

🎉🎉🎉 This paper has been accepted by the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC)!

A framework for text-to-image synthesis that makes global context information available across generator layers.
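The core idea is a recurrent affine transformation (RAT): a recurrent state carries the global text embedding across generator layers, and each layer predicts a per-channel scale and shift from that state. The following NumPy sketch illustrates the idea only; the weights, sizes, and update rule here are illustrative assumptions, not the authors' learned model.

```python
import numpy as np

def recurrent_affine(features, text_emb, n_layers=3, hidden=16, seed=0):
    """Sketch of a recurrent affine transformation (RAT).

    A recurrent hidden state, updated from the text embedding at every
    layer, predicts a per-channel scale (gamma) and shift (beta) that
    modulate the features. All weights below are random stand-ins for
    what the real model learns.
    """
    rng = np.random.default_rng(seed)
    c, d = features.shape[0], text_emb.shape[0]
    W_h = 0.1 * rng.normal(size=(hidden, hidden))  # recurrent weights
    W_x = 0.1 * rng.normal(size=(d, hidden))       # text -> hidden
    W_g = 0.1 * rng.normal(size=(hidden, c))       # hidden -> gamma
    W_b = 0.1 * rng.normal(size=(hidden, c))       # hidden -> beta
    h = np.zeros(hidden)
    out = features.copy()
    for _ in range(n_layers):
        # the recurrence keeps global text info available to every layer
        h = np.tanh(h @ W_h + text_emb @ W_x)
        gamma, beta = h @ W_g, h @ W_b
        out = (1.0 + gamma) * out + beta  # conditional affine modulation
    return out

x = np.ones(8)   # toy per-channel feature vector
t = np.ones(4)   # toy text embedding
y = recurrent_affine(x, t)
```

The shuffle attention that the paper inserts between RAT blocks (to counter forgetting in the recurrence) is omitted here for brevity.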

Requirements

At least one 24 GB GPU (e.g., RTX 3090) for training; sampling runs on CPU only.

1. Environment

   ```shell
   conda create -n RATLIP python=3.9
   conda activate RATLIP
   ```

2. Clone this repo

   ```shell
   git clone https://github.com/OxygenLu/RATLIP.git
   ```

3. Install the requirements

   ```shell
   cd RATLIP
   pip install -r requirements.txt
   ```

4. Install CLIP

   ```shell
   cd ../
   git clone https://github.com/openai/CLIP.git
   cd CLIP
   python setup.py install
   ```

Usage

Train

```shell
cd RATLIP/code
bash scripts/train.sh ./cfg/bird.yml
```

Test

```shell
bash scripts/test.sh ./cfg/bird.yml
```

Resume

To resume training from a checkpoint, set `state_epoch` and the path of the corresponding weights in the config, then rerun the training script.

TensorBoard

Training results are stored as TensorBoard event files under `./logs`:

```shell
tensorboard --logdir your_path --port 8166
```

Sampling

Use `sample.ipynb` to sample images from text descriptions.

Result

Visualization

Experiments

Compare RATLIP and state-of-the-art models on FID values (the smaller, the better).

| Model   | CUB   | CelebA-tiny |
|---------|-------|-------------|
| AttnGAN | 23.98 | 125.98      |
| LAFITE  | 14.58 | -           |
| DF-GAN  | 14.81 | 137.60      |
| GALIP   | 10.00 | 94.45       |
| Ours    | 13.28 | 81.48       |
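FID is the Fréchet distance between Gaussian fits of Inception features for real and generated images. The sketch below shows the underlying formula on toy Gaussians; it is illustrative only, not the evaluation code used to produce the table above.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    FID applies this to the mean/covariance of Inception features of
    real vs. generated images (lower is better).
    """
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# identical distributions -> distance 0
mu, sigma = np.zeros(3), np.eye(3)
print(round(float(frechet_distance(mu, sigma, mu, sigma)), 6))  # 0.0
```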

Compare RATLIP and state-of-the-art models on CLIP score values (the bigger, the better).

| Model   | CUB   | Oxford | CelebA-tiny |
|---------|-------|--------|-------------|
| AttnGAN | -     | 21.15  | -           |
| LAFITE  | 31.25 | -      | -           |
| DF-GAN  | 29.20 | 26.67  | 24.41       |
| GALIP   | 31.60 | 31.77  | 27.95       |
| Ours    | 32.03 | 31.94  | 28.91       |
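CLIP score measures text-image alignment as the cosine similarity between CLIP embeddings of the generated image and its prompt. A minimal sketch of the metric's core, using the common CLIPScore scaling (w = 2.5); exact scaling conventions vary, and this is not the paper's evaluation script.

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """Scaled cosine similarity between CLIP image and text embeddings,
    clipped at 0 (higher means better text-image alignment)."""
    i = image_emb / np.linalg.norm(image_emb)
    t = text_emb / np.linalg.norm(text_emb)
    return w * max(float(i @ t), 0.0)

v = np.array([1.0, 0.0])
print(clip_score(v, v))                      # 2.5: perfectly aligned embeddings
print(clip_score(v, np.array([0.0, 1.0])))   # 0.0: orthogonal embeddings
```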

Citation

```bibtex
@INPROCEEDINGS{11169738,
  author={Lin, Chengde and Lu, Xijun and Chen, Guangxi},
  booktitle={2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC)}, 
  title={RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations}, 
  year={2024},
  volume={},
  number={},
  pages={2346-2352},
  abstract={Synthesizing high-quality photorealistic images with textual descriptions as a condition is very challenging. Generative Adversarial Networks (GANs), a classical model for this task, frequently suffer from low consistency between image and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GAN to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first model CAT and a recurrent neural network (RAT) to ensure that different layers can access global information. We then introduce shuffle attention between RAT to mitigate the characteristic of information forgetting in recurrent neural networks. Moreover, both our generator and discriminator utilize the powerful pre-trained model, CLIP, which has been extensively employed for establishing associations between text and images through the learning of multi-modal representations in latent space. The discriminator utilizes CLIP's ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments have been conducted on the CUB, Oxford, and CelebA-tiny datasets to demonstrate the superior performance of the proposed model over current state-of-the-art models. The code is available at https://github.com/OxygenLu/RATLIP.},
  keywords={Recurrent neural networks;Codes;Text to image;Generative adversarial networks;Generators;Cybernetics;Batch normalization;Photorealistic images},
  doi={10.1109/SMC54092.2024.11169738},
  ISSN={},
  month={Oct},}
```

Acknowledgement

  • This code is adapted from GALIP and RAT-GAN.
  • We thank Ming Tao, Bing-Kun Bao and Senmao Ye for their elegant and efficient code base.
