RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations

Chengde Lin · Xijun Lu · Guangxi Chen

Paper: arXiv

🎉🎉🎉 This paper has been accepted by the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC)!

A framework for text-to-image synthesis that makes global context information available across generator layers.
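The core idea is a recurrent affine transformation (RAT): a recurrent state carries the global text embedding across generator layers, and each layer predicts a per-channel scale and shift from that state. The following NumPy sketch illustrates the idea only; the weights, sizes, and update rule here are illustrative assumptions, not the authors' learned model.

```python
import numpy as np

def recurrent_affine(features, text_emb, n_layers=3, hidden=16, seed=0):
    """Sketch of a recurrent affine transformation (RAT).

    A recurrent hidden state, updated from the text embedding at every
    layer, predicts a per-channel scale (gamma) and shift (beta) that
    modulate the features. All weights below are random stand-ins for
    what the real model learns.
    """
    rng = np.random.default_rng(seed)
    c, d = features.shape[0], text_emb.shape[0]
    W_h = 0.1 * rng.normal(size=(hidden, hidden))  # recurrent weights
    W_x = 0.1 * rng.normal(size=(d, hidden))       # text -> hidden
    W_g = 0.1 * rng.normal(size=(hidden, c))       # hidden -> gamma
    W_b = 0.1 * rng.normal(size=(hidden, c))       # hidden -> beta
    h = np.zeros(hidden)
    out = features.copy()
    for _ in range(n_layers):
        # the recurrence keeps global text info available to every layer
        h = np.tanh(h @ W_h + text_emb @ W_x)
        gamma, beta = h @ W_g, h @ W_b
        out = (1.0 + gamma) * out + beta  # conditional affine modulation
    return out

x = np.ones(8)   # toy per-channel feature vector
t = np.ones(4)   # toy text embedding
y = recurrent_affine(x, t)
```

The shuffle attention that the paper inserts between RAT blocks (to counter forgetting in the recurrence) is omitted here for brevity.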

Requirements

At least one 24 GB GPU (e.g., RTX 3090) for training; sampling runs on CPU only.

1. Environment

   ```shell
   conda create -n RATLIP python=3.9
   conda activate RATLIP
   ```

2. Clone this repo

   ```shell
   git clone https://github.com/OxygenLu/RATLIP.git
   ```

3. Install the requirements

   ```shell
   cd RATLIP
   pip install -r requirements.txt
   ```

4. Install CLIP

   ```shell
   cd ../
   git clone https://github.com/openai/CLIP.git
   cd CLIP
   python setup.py install
   ```

Usage

Train

```shell
cd RATLIP/code
bash scripts/train.sh ./cfg/bird.yml
```

Test

```shell
bash scripts/test.sh ./cfg/bird.yml
```

Resume

To resume training from a checkpoint, set `state_epoch` and the path of the corresponding weights in the config, then rerun the training script.

TensorBoard

Training results are stored as TensorBoard event files under `./logs`:

```shell
tensorboard --logdir your_path --port 8166
```

Sampling

Use `sample.ipynb` to sample images from text descriptions.

Result

Visualization

Experiments

Compare RATLIP and state-of-the-art models on FID values (the smaller, the better).

| Model   | CUB   | CelebA-tiny |
|---------|-------|-------------|
| AttnGAN | 23.98 | 125.98      |
| LAFITE  | 14.58 | -           |
| DF-GAN  | 14.81 | 137.60      |
| GALIP   | 10.00 | 94.45       |
| Ours    | 13.28 | 81.48       |
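FID is the Fréchet distance between Gaussian fits of Inception features for real and generated images. The sketch below shows the underlying formula on toy Gaussians; it is illustrative only, not the evaluation code used to produce the table above.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    FID applies this to the mean/covariance of Inception features of
    real vs. generated images (lower is better).
    """
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# identical distributions -> distance 0
mu, sigma = np.zeros(3), np.eye(3)
print(round(float(frechet_distance(mu, sigma, mu, sigma)), 6))  # 0.0
```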

Compare RATLIP and state-of-the-art models on CLIP score values (the bigger, the better).

| Model   | CUB   | Oxford | CelebA-tiny |
|---------|-------|--------|-------------|
| AttnGAN | -     | 21.15  | -           |
| LAFITE  | 31.25 | -      | -           |
| DF-GAN  | 29.20 | 26.67  | 24.41       |
| GALIP   | 31.60 | 31.77  | 27.95       |
| Ours    | 32.03 | 31.94  | 28.91       |
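CLIP score measures text-image alignment as the cosine similarity between CLIP embeddings of the generated image and its prompt. A minimal sketch of the metric's core, using the common CLIPScore scaling (w = 2.5); exact scaling conventions vary, and this is not the paper's evaluation script.

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """Scaled cosine similarity between CLIP image and text embeddings,
    clipped at 0 (higher means better text-image alignment)."""
    i = image_emb / np.linalg.norm(image_emb)
    t = text_emb / np.linalg.norm(text_emb)
    return w * max(float(i @ t), 0.0)

v = np.array([1.0, 0.0])
print(clip_score(v, v))                      # 2.5: perfectly aligned embeddings
print(clip_score(v, np.array([0.0, 1.0])))   # 0.0: orthogonal embeddings
```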

Citation

```bibtex
@INPROCEEDINGS{11169738,
  author={Lin, Chengde and Lu, Xijun and Chen, Guangxi},
  booktitle={2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC)}, 
  title={RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations}, 
  year={2024},
  volume={},
  number={},
  pages={2346-2352},
  abstract={Synthesizing high-quality photorealistic images with textual descriptions as a condition is very challenging. Generative Adversarial Networks (GANs), a classical model for this task, frequently suffer from low consistency between image and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GAN to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first model CAT and a recurrent neural network (RAT) to ensure that different layers can access global information. We then introduce shuffle attention between RAT to mitigate the characteristic of information forgetting in recurrent neural networks. Moreover, both our generator and discriminator utilize the powerful pre-trained model, CLIP, which has been extensively employed for establishing associations between text and images through the learning of multi-modal representations in latent space. The discriminator utilizes CLIP's ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments have been conducted on the CUB, Oxford, and CelebA-tiny datasets to demonstrate the superior performance of the proposed model over current state-of-the-art models. The code is available at https://github.com/OxygenLu/RATLIP.},
  keywords={Recurrent neural networks;Codes;Text to image;Generative adversarial networks;Generators;Cybernetics;Batch normalization;Photorealistic images},
  doi={10.1109/SMC54092.2024.11169738},
  ISSN={},
  month={Oct},}
```

Acknowledgement

  • This code is adapted from GALIP and RAT-GAN.
  • We thank Ming Tao, Bing-Kun Bao and Senmao Ye for their elegant and efficient code base.
