Abstract
Semantic relevance computation is a research focus in the field of natural language processing. Existing approaches, which substitute text similarity computation for text relevance computation, have shortcomings. This paper proposes to measure the semantic relevance between short texts comprehensively from two aspects, morphological similarity and component relevance, and presents two short-text relevance computation algorithms that use Wikipedia as an external knowledge base: the maximum word correlation (MWC) algorithm and the dynamic chunking (DC) algorithm. The algorithms were tested and evaluated on a test set of short web texts. Experimental results show that, compared with typical similarity computation algorithms, the proposed algorithms improve accuracy by more than 20%.
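The abstract names the maximum word correlation (MWC) idea but does not give its formula. As an illustration only, here is a minimal sketch of a maximum-word-correlation style score: for each word of one text, take its best relatedness to any word of the other text, then average and symmetrize. The word-level relatedness used here (Jaccard overlap of hypothetical Wikipedia concept sets in `concept_map`) is a stand-in assumption, not the paper's actual Wikipedia-based measure.

```python
def word_relatedness(concepts_a, concepts_b):
    """Jaccard overlap of the Wikipedia concept sets linked to two words
    (a placeholder for a real Wikipedia-based relatedness measure)."""
    if not concepts_a or not concepts_b:
        return 0.0
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)

def directed_score(words_a, words_b, concept_map):
    """Average, over the words of A, of each word's best match in B."""
    if not words_a or not words_b:
        return 0.0
    total = 0.0
    for w in words_a:
        total += max(
            word_relatedness(concept_map.get(w, set()),
                             concept_map.get(v, set()))
            for v in words_b
        )
    return total / len(words_a)

def max_word_correlation(words_a, words_b, concept_map):
    """Symmetrized short-text relevance score in [0, 1]."""
    return 0.5 * (directed_score(words_a, words_b, concept_map)
                  + directed_score(words_b, words_a, concept_map))
```

For example, two word lists whose words map to identical concept sets score 1.0, while lists with disjoint concept sets score 0.0; the symmetrization keeps the score independent of argument order.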
Source
《计算机应用与软件》 (Computer Applications and Software)
CSCD
2015, No. 1, pp. 82-85 and 92 (5 pages in total)
Funding
Youth Fund Project of Humanities and Social Sciences Research, Ministry of Education (12YJCZH201)
Major Science and Technology Innovation Special Project of the Hangzhou Science and Technology Development Plan (20122511A18)