Abstract
Image caption generation is a popular research direction in computer vision, and the encoder-decoder structure is commonly used for this task. To address the problems of unsatisfactory extraction of image detail features and poor caption quality, the Accelerated-KAZE (AKAZE) algorithm is introduced into the convolutional neural network to strengthen the model's feature extraction ability. Combined with an attention mechanism that focuses the model on key regions of the image, both the feature extraction ability and the quality of the generated captions are improved. The proposed model was trained and tested on existing public datasets, the generated descriptions were scored with BLEU-4 and the Metric for Evaluation of Translation with Explicit Ordering (METEOR), and comparative experiments were conducted. The experimental results show that, compared with existing methods, the proposed method achieves better caption generation, higher evaluation scores, and better robustness.
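The abstract reports BLEU-4 scores for the generated captions. As an illustration of what that metric measures (this is a minimal single-reference sketch, not the paper's evaluation code, which would typically use a standard toolkit), BLEU-4 is the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, reference):
    """Minimal single-reference BLEU-4: geometric mean of clipped
    1- to 4-gram precisions times a brevity penalty.
    Illustrative only; standard implementations add smoothing and
    support multiple references."""
    precisions = []
    for n in range(1, 5):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clipped counts: each candidate n-gram is credited at most
        # as many times as it appears in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

For example, a candidate identical to the reference scores 1.0, while a candidate sharing no 4-gram with the reference scores 0.0 under this unsmoothed variant.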
Authors
DING Cong; XU Chong (School of Information Engineering, Xizang Minzu University, Xianyang, Shaanxi 712082, China)
Source
Information & Computer (《信息与电脑》), 2022, No. 22, pp. 62-66, 73 (6 pages)
Funding
Natural Science Foundation of Tibet Autonomous Region, "Research on Fire Detection Methods Based on Convolutional Neural Networks" (Grant No. XZ202001ZR0048G)
Undergraduate Innovation and Entrepreneurship Training Program of Xizang Minzu University, "Research on Automatic Image Annotation Methods Based on Deep Learning" (Grant No. S202210695079).