KnowER: Knowledge enhancement for efficient text-video retrieval
Authors: Hongwei Kou, Yingyun Yang, Yan Hua. Intelligent and Converged Networks (EI-indexed), 2023, No. 2, pp. 93-105 (13 pages)
The widespread adoption of mobile Internet and the Internet of things (IoT) has led to a significant increase in the amount of video data. While video data are increasingly important, language and text remain the primary methods of interaction in everyday communication, so text-based cross-modal retrieval has become a crucial demand in many applications. Most previous text-video retrieval works utilize the implicit knowledge of pre-trained models such as contrastive language-image pre-training (CLIP) to boost retrieval performance. However, implicit knowledge only records the co-occurrence relationships existing in the data, and it cannot help the model understand specific words or scenes. Another type of out-of-domain knowledge, explicit knowledge, which usually takes the form of a knowledge graph, can play an auxiliary role in understanding the content of different modalities. Therefore, we study the application of an external knowledge base in a text-video retrieval model for the first time, and propose KnowER, a model based on knowledge enhancement for efficient text-video retrieval. The knowledge-enhanced model achieves state-of-the-art performance on three widely used text-video retrieval datasets, i.e., MSRVTT, DiDeMo, and MSVD.
Keywords: text-video retrieval; knowledge graph; contrastive language-image pre-training (CLIP)