Abstract
In recent years, multi-modal learning has become one of the research hotspots in machine learning and data mining, and it has been applied successfully in many real-world scenarios, such as cross-media search, multi-language processing, and click-through rate estimation with auxiliary information. Traditional multi-modal learning methods usually exploit the consistency or complementarity among modalities to design loss functions or regularization terms for joint training, thereby improving both single-modal and ensemble performance. In open environments, however, factors such as missing data and noise make multi-modal data imbalanced. Specifically, the information in a single modality may be insufficient or missing, which gives rise to two major challenges: "inconsistent modal feature representations" and "inconsistent modal alignment relationships". Applying traditional multi-modal methods directly to imbalanced multi-modal data can even degrade single-modal and ensemble performance. To address these problems, reliable multi-modal learning has been proposed and widely studied. This paper systematically summarizes and analyzes the progress made by domestic and international researchers on reliable multi-modal learning, and discusses the challenges that future research may face.
Authors
杨杨
詹德川
姜远
熊辉
YANG Yang; ZHAN De-Chuan; JIANG Yuan; XIONG Hui (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Rutgers Business School, Newark, NJ 07012, USA)
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2021, Issue 4, pp. 1067-1081 (15 pages)
Journal of Software
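Funding
National Natural Science Foundation of China (61673201, 62006118, 61773198, 61632004)
Natural Science Foundation of Jiangsu Province (BK20200460)
CCF-Baidu Open Fund (CCF-BAIDU OF2020011)
Baidu TIC Project Fund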
Keywords
imbalanced multi-modal data
inconsistent modal feature representations
inconsistent modal alignment relationships
reliable multi-modal learning