
Cross-Modal Entity Resolution for Image and Text Integrating Global and Fine-Grained Joint Attention Mechanism

Abstract: To solve the problem that existing cross-modal entity resolution methods tend to ignore the high-level semantic correlations between cross-modal data, we propose a novel cross-modal entity resolution method for image and text that integrates a global and fine-grained joint attention mechanism. First, we map the cross-modal data into a common embedding space using a feature extraction network. Then, we integrate a global joint attention mechanism with a fine-grained joint attention mechanism, enabling the model to learn both the global semantic characteristics and the local fine-grained semantic characteristics of the cross-modal data; this fully exploits the cross-modal semantic correlation and boosts the performance of cross-modal entity resolution. Experiments on the Flickr-30K and MS-COCO datasets show that the overall R@sum performance exceeds that of 5 state-of-the-art methods by 4.30% and 4.54%, respectively, demonstrating the superiority of the proposed method.
Authors: 曾志贤 ZENG Zhician; 曹建军 CAO Jianjun; 翁年凤 WENG Nianfeng; 袁震 YUAN Zhen; 余旭 YU Xu (The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China)
Published in: Journal of Shanghai Jiaotong University (Science) (上海交通大学学报(英文版)), EI-indexed, 2023, Issue 6, pp. 728-737 (10 pages)
Funding: the Special Research Fund for the China Postdoctoral Science Foundation (No. 2015M582832); the Major National Science and Technology Program (No. 2015ZX01040201); the National Natural Science Foundation of China (No. 61371196).
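The abstract describes two scoring branches over a shared embedding space: a global branch that matches whole-image and whole-sentence embeddings, and a fine-grained branch in which local units (image regions, words) attend to each other. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of that two-branch idea, with all function names, the softmax temperature, and the mixing weight `alpha` being assumptions for illustration.

```python
import numpy as np

def _cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def _softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_score(img_global, txt_global):
    """Global branch: match the image-level and sentence-level
    embeddings that the feature extraction network projected into
    the common space."""
    return _cosine(img_global, txt_global)

def fine_grained_score(regions, words, temperature=4.0):
    """Fine-grained branch: each word attends over all image regions,
    then is compared with its attended region context; per-word
    scores are averaged into one local alignment score."""
    sim = words @ regions.T                      # (W, R) word-region similarities
    attn = _softmax(temperature * sim, axis=1)   # word -> region attention weights
    context = attn @ regions                     # (W, d) attended region context
    per_word = [_cosine(w, c) for w, c in zip(words, context)]
    return float(np.mean(per_word))

def joint_score(img_global, txt_global, regions, words, alpha=0.5):
    """Fuse the global and fine-grained branches with a mixing weight."""
    return (alpha * global_score(img_global, txt_global)
            + (1 - alpha) * fine_grained_score(regions, words))
```

In this sketch a well-matched image-text pair should receive a higher joint score than a mismatched one, since both the global embeddings and the word-region alignments agree; a trained model would learn the projections so that this holds for real data.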