Abstract
Relevance feedback reformulates the initial retrieval query according to relevance judgments made by the user or the system, and it has been shown to improve retrieval effectiveness. For academic literature, citation relationships reflect relatedness in document content and can therefore supply valuable supplementary information for relevance feedback. This paper proposes an improved relevance feedback algorithm based on citation context, co-citation, and bibliographic coupling. A citation context is the text surrounding a reference marker that refers to another scientific work. The algorithm rests on three ideas: it uses the citation contexts of academic documents to expand the bag-of-words model for text representation; in the relevance judgment stage, it uses the co-citation strength and bibliographic coupling strength between relevant documents and other documents in the citation network to expand the set of relevant documents; and it applies clustering-based relevance feedback to extract query expansion terms. Experiments show that the algorithm improves relevance feedback performance. In addition, correlation analysis shows that co-citation strength and bibliographic coupling strength are significantly correlated with document content similarity.
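The two citation measures named in the abstract have standard definitions: the co-citation strength of two documents is the number of later documents that cite both, and the bibliographic coupling strength is the number of references the two share. The sketch below computes both on a toy citation graph and uses them to grow a relevant set. The function names, the toy data, the rule of summing the two strengths, and the `threshold` parameter are all illustrative assumptions; the abstract does not say how the paper combines or thresholds these measures.

```python
# Toy citation graph (hypothetical data): document id -> set of documents it cites.
CITES = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "c"},
    "p3": {"c"},
    "a": {"x", "y"},
    "b": {"y", "z"},
    "c": {"w"},
}


def co_citation_strength(cites, a, b):
    """Count the documents whose reference lists contain both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def coupling_strength(cites, a, b):
    """Count the references shared by a and b (bibliographic coupling)."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def expand_relevant(cites, relevant, candidates, threshold=1):
    """Add each candidate that is linked strongly enough to any relevant document.

    Assumption: link strength is the plain sum of co-citation and coupling
    strength; this combination rule is illustrative, not taken from the paper.
    """
    expanded = set(relevant)
    for cand in candidates:
        if cand in expanded:
            continue
        for rel in relevant:
            strength = (co_citation_strength(cites, cand, rel)
                        + coupling_strength(cites, cand, rel))
            if strength >= threshold:
                expanded.add(cand)
                break
    return expanded


if __name__ == "__main__":
    print(co_citation_strength(CITES, "a", "b"))
    print(coupling_strength(CITES, "a", "b"))
    print(sorted(expand_relevant(CITES, {"a"}, ["b", "c"], threshold=2)))
```

With a threshold of 2, document `b` joins the relevant set through its links to `a`, while `c`, which is co-cited with `a` only once and shares no references with it, stays out.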
Source
《情报学报》
CSSCI
PKU Core Journals (北大核心)
2012, Issue 10, pp. 1052-1061 (10 pages)
Journal of the China Society for Scientific and Technical Information
Funding
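Supported by the National Social Science Fund of China project "Research on Relevance Integration in Chinese Academic Information Retrieval Systems" (Grant No. 10CTQ027), the Ministry of Education Humanities and Social Sciences Research Planning Fund project "User-Oriented Relevance Criteria and Their Application" (Grant No. 07JA870006), and a cooperative research project of the Institute of Scientific and Technical Information of China.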
Keywords
relevance feedback
citation context
co-citation
bibliographic coupling
clustering