
Question Similarity Algorithm Based on Common Chunks and N-Gram Model

Cited by: 7
Abstract: Question similarity is a core problem in question answering (QA) systems and directly affects their accuracy. Because the common chunks similarity algorithm (CCS) is poorly suited to Chinese text, this paper proposes an improved question similarity algorithm (CNS). The method combines the N-gram model with common chunks to compute the similarity of question vectors: each question is decomposed into a unigram model and a bigram model, and the common chunks shared by the two questions are then analyzed with their sequential order taken into account. Experimental results show that the new algorithm outperforms commonly used question similarity algorithms both in average similarity over the Top-N data sets and in accuracy under different similarity thresholds.
Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), CAS, 2017, No. 10, pp. 175-179, 197 (6 pages)
Funding: Ministry of Education Humanities and Social Sciences Youth Project (16YJC860010); Chongqing Social Science Planning Doctoral Project (2015BS059)
Keywords: question similarity; N-gram model; unigram model; common chunks
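
The abstract describes the idea only in outline: split each question into unigrams and bigrams, then compare shared chunks while respecting their order. The Python sketch below illustrates that general idea and is not the paper's CNS formula, which this record does not give. Everything in it is an assumption made for illustration: characters stand in for segmented words, the Dice coefficient measures n-gram overlap, a longest-common-subsequence ratio stands in for the order-aware common-chunk term, and the function names and the weights `w_uni`, `w_bi`, and `w_order` are hypothetical.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Return the character n-grams of text (characters stand in for words)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice(a: list[str], b: list[str]) -> float:
    """Dice overlap between two multisets of n-grams."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / (len(a) + len(b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (an order-sensitive overlap)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def question_similarity(q1: str, q2: str,
                        w_uni: float = 0.4,
                        w_bi: float = 0.4,
                        w_order: float = 0.2) -> float:
    """Weighted mix of unigram, bigram, and order-aware overlap (assumed weights)."""
    if not q1 or not q2:
        return 0.0
    uni = dice(ngrams(q1, 1), ngrams(q2, 1))
    bi = dice(ngrams(q1, 2), ngrams(q2, 2))
    order = 2.0 * lcs_length(q1, q2) / (len(q1) + len(q2))
    return w_uni * uni + w_bi * bi + w_order * order


if __name__ == "__main__":
    print(question_similarity("如何学习编程", "怎样学习编程"))
```

In this sketch, two questions that share the same characters in a different order score lower than identical questions, because the bigram and subsequence terms both penalize reordering. A fuller treatment of Chinese text would segment the questions into words before building the n-grams.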
