Abstract
To address the problem that existing cross-modal entity resolution methods tend to ignore the high-level semantic correlations between cross-modal data, we propose a novel image-text cross-modal entity resolution method that integrates a global joint attention mechanism with a fine-grained joint attention mechanism. First, we map the cross-modal data into a common embedding space using a feature extraction network. Then, we combine the global joint attention mechanism and the fine-grained joint attention mechanism, enabling the model to learn both the global semantic features and the local fine-grained semantic features of the cross-modal data, thereby fully exploiting cross-modal semantic correlation and boosting the performance of cross-modal entity resolution. Experiments on the Flickr-30K and MS-COCO datasets show that our method improves the overall R@sum by 4.30% and 4.54%, respectively, over five state-of-the-art methods, demonstrating the superiority of the proposed approach.
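To make the described architecture concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of how an image-text pair could be scored by fusing a global similarity between pooled embeddings with a fine-grained region-to-word cross attention score; the feature dimensions, projection layers, and equal fusion weights are assumptions for illustration only.

```python
# Hypothetical sketch of global + fine-grained joint attention scoring.
# Dimensions (2048-d region features, 300-d word features, 512-d common
# space) and the equal fusion weights are assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointAttentionSimilarity(nn.Module):
    """Scores an image-text pair by fusing global and fine-grained attention."""

    def __init__(self, dim=512):
        super().__init__()
        # Projections into a common embedding space.
        self.img_proj = nn.Linear(2048, dim)   # e.g. CNN region features
        self.txt_proj = nn.Linear(300, dim)    # e.g. word embeddings

    def forward(self, img_regions, txt_words):
        # img_regions: (R, 2048) region features; txt_words: (T, 300) word features
        v = F.normalize(self.img_proj(img_regions), dim=-1)   # (R, dim)
        w = F.normalize(self.txt_proj(txt_words), dim=-1)     # (T, dim)

        # Global joint attention: cosine similarity of mean-pooled embeddings.
        global_sim = F.cosine_similarity(v.mean(0, keepdim=True),
                                         w.mean(0, keepdim=True)).squeeze()

        # Fine-grained joint attention: each word attends over image regions,
        # then the word-level similarities are averaged.
        attn = torch.softmax(w @ v.t(), dim=-1)               # (T, R)
        attended = attn @ v                                   # (T, dim)
        local_sim = F.cosine_similarity(attended, w, dim=-1).mean()

        # Fuse the two levels of semantic correlation (equal weights assumed).
        return 0.5 * global_sim + 0.5 * local_sim


if __name__ == "__main__":
    model = JointAttentionSimilarity()
    score = model(torch.randn(36, 2048), torch.randn(12, 300))
    print(score.item())
```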
Authors
曾志贤
曹建军
翁年凤
袁震
余旭
ZENG Zhixian; CAO Jianjun; WENG Nianfeng; YUAN Zhen; YU Xu (The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China)
Funding
the Special Research Fund of the China Postdoctoral Science Foundation (No. 2015M582832)
the Major National Science and Technology Program (No. 2015ZX01040201)
the National Natural Science Foundation of China (No. 61371196)