
SHORT-TEXT SEMANTIC RELEVANCE COMPUTATION METHOD BASED ON WIKIPEDIA
Cited by: 15
Abstract: Semantic relevance computation is an active research topic in natural language processing. Existing methods that substitute text similarity computation for text relevance computation have shortcomings. This paper proposes measuring the semantic relevance between short texts from two aspects, morphological similarity and component relevance, and presents two short-text relevance algorithms that use Wikipedia as an external knowledge base: the maximum word correlation (MWC) algorithm and the dynamic chunking (DC) algorithm. The algorithms were evaluated on a test set of web short texts. Experimental results show that, compared with typical similarity computation algorithms, they improve accuracy by more than 20%.
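Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 1, pp. 82-85, 92 (5 pages)
Funding: Ministry of Education Humanities and Social Sciences Research Youth Fund Project (12YJCZH201); Hangzhou Science and Technology Development Plan, Major Science and Technology Innovation Special Project (20122511A18)
Keywords: short texts; Wikipedia; relevance computation; natural language processing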
