Text to realistic image generation with attentional concatenation generative adversarial networks.
Li, Linyan; Sun, Yu; Hu, Fuyuan; Zhou, Tao; Xi, Xuefeng; Ren, Jinchang
Authors
Linyan Li
Yu Sun
Fuyuan Hu
Tao Zhou
Xuefeng Xi
Professor Jinchang Ren j.ren@rgu.ac.uk
Professor of Computing Science
Abstract
In this paper, we propose an Attentional Concatenation Generative Adversarial Network (ACGAN) that aims to generate 1024 × 1024 high-resolution images. First, we propose a multilevel cascade structure for text-to-image synthesis. During training, we gradually add new layers and, at the same time, use the results and word vectors from the previous layer as inputs to the next layer to generate high-resolution images with photo-realistic details. Second, the deep attentional multimodal similarity model is introduced into the network, and we match word vectors with images in a common semantic space to compute a fine-grained matching loss for training the generator. In this way, the model can attend to fine-grained, word-level semantic information. Finally, a measure of diversity is added to the discriminator, which enables the generator to obtain more diverse gradient directions and improves the diversity of generated samples. The experimental results show that the inception scores of the proposed model on the CUB and Oxford-102 datasets reach 4.48 and 4.16, improvements of 2.75% and 6.42%, respectively, over Attentional Generative Adversarial Networks (AttenGAN). The ACGAN model performs better at text-to-image generation, and the resulting images are closer to real images.
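The reported gains can be sanity-checked with a little arithmetic. The sketch below assumes the percentages are relative improvements over the baseline score (the abstract does not state this explicitly) and back-computes the baseline inception scores they would imply. The function name `implied_baseline` is hypothetical and used only for illustration; the baseline values it prints are derived estimates, not figures quoted from the paper.

```python
def implied_baseline(score: float, improvement_pct: float) -> float:
    """Baseline score implied by a reported relative improvement.

    Assumption: improvement_pct = 100 * (score - baseline) / baseline.
    """
    return score / (1.0 + improvement_pct / 100.0)


# Inception scores and percentage gains as stated in the abstract.
reported = {"CUB": (4.48, 2.75), "Oxford-102": (4.16, 6.42)}

# Back-computed baselines under the relative-improvement assumption.
baselines = {name: implied_baseline(s, p) for name, (s, p) in reported.items()}

for name, value in baselines.items():
    print(f"{name}: implied baseline inception score ~ {value:.2f}")
```

If the percentages were instead meant as absolute differences in score, the implied baselines would differ, so these numbers are only a consistency check on the abstract's wording.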
Citation
LI, L., SUN, Y., HU, F., ZHOU, T., XI, X. and REN, J. 2020. Text to realistic image generation with attentional concatenation generative adversarial networks. Discrete dynamics in nature and society [online], 2020, article ID 6452536. Available from: https://doi.org/10.1155/2020/6452536
Journal Article Type | Article |
---|---|
Acceptance Date | Oct 6, 2020 |
Online Publication Date | Oct 28, 2020 |
Publication Date | Dec 31, 2020 |
Deposit Date | Jul 1, 2024 |
Publicly Available Date | Jul 1, 2024 |
Journal | Discrete dynamics in nature and society |
Print ISSN | 1026-0226 |
Electronic ISSN | 1607-887X |
Publisher | Hindawi |
Peer Reviewed | Peer Reviewed |
Volume | 2020 |
Article Number | 6452536 |
DOI | https://doi.org/10.1155/2020/6452536 |
Keywords | Generative adversarial networks; Image generation; Image processing; Text-to-image synthesis; Machine learning; Semantic computing |
Public URL | https://rgu-repository.worktribe.com/output/2058754 |
Files
LI 2020 Text to realistic image generation
(6.5 Mb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/