Abstract
To address the problems of inefficient training, low resolution, and lack of realism in the images produced by existing text-to-image models, this paper proposes a deep-fusion generative adversarial network model based on conditioning augmentation and an attention mechanism. The model consists of a text processing network and a generative adversarial network. The text processing network uses a bidirectional long short-term memory network to encode the text, and a conditioning augmentation module is employed to expand the feature data corresponding to the text words, enriching the semantic features of the text. In the generative adversarial network, text features are fused with visual features, and the output features are adjusted along the channel and spatial dimensions by an attention mechanism, so that the generator focuses on the important features of the text description and suppresses unnecessary ones, finally yielding the generated image. A discriminator then distinguishes generated images from real images, and an adversarial loss function is designed to optimize the network model. The model is trained and tested on two datasets, MSCOCO and CUB birds 200. Experimental results show that it has clear advantages over other models.
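The pipeline described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the conditioning augmentation samples a condition vector from a Gaussian parameterized by the text embedding (the reparameterization trick, as in StackGAN-style models), the "fusion" here is a simple broadcast addition, and the channel/spatial attention follows a CBAM-style squeeze-and-gate scheme. All projections are random placeholders standing in for learned layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conditioning_augmentation(text_emb, latent_dim):
    """Sample a condition vector from a Gaussian whose mean and
    log-variance are derived from the text embedding.
    W_mu / W_lv are random placeholders for learned linear layers."""
    W_mu = rng.standard_normal((latent_dim, text_emb.size)) * 0.1
    W_lv = rng.standard_normal((latent_dim, text_emb.size)) * 0.1
    mu, logvar = W_mu @ text_emb, W_lv @ text_emb
    eps = rng.standard_normal(latent_dim)
    return mu + np.exp(0.5 * logvar) * eps  # reparameterization trick

def channel_spatial_attention(feat):
    """Refine a (C, H, W) feature map first along the channel axis,
    then along the spatial axes (CBAM-style assumption), so that
    informative features are emphasised and others suppressed."""
    # channel attention: squeeze spatial dims, gate each channel
    chan_gate = sigmoid(feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2)))
    feat = feat * chan_gate[:, None, None]
    # spatial attention: squeeze channels, gate each location
    spat_gate = sigmoid(feat.mean(axis=0) + feat.max(axis=0))
    return feat * spat_gate[None, :, :]

# Toy forward pass: text condition fused with visual features, then refined.
text_emb = rng.standard_normal(256)            # stand-in for a BiLSTM sentence embedding
cond = conditioning_augmentation(text_emb, 64)  # (64,)
visual = rng.standard_normal((64, 16, 16))      # stand-in generator feature map
fused = visual + cond[:, None, None]            # simple broadcast fusion
refined = channel_spatial_attention(fused)
print(refined.shape)  # (64, 16, 16)
```

The attention stage leaves the feature map's shape unchanged, so it can be dropped between any two generator blocks; only the per-channel and per-location gates modulate the fused text-visual features.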
Authors
ZHANG Jia; ZHANG Lihong (College of Physical and Electronic Engineering, Shanxi University, Taiyuan 030006, China)
Source
Journal of Test and Measurement Technology (《测试技术学报》)
2023, No. 2, pp. 112-119 (8 pages)
Funding
Shanxi Province Postgraduate Innovation Project (2021Y154)
Shanxi Province Higher Education Teaching Reform and Innovation Project (J2021086).
Keywords
text-to-image
deep fusion
conditioning augmentation
attention mechanism
multi-modal feature fusion