Abstract
Most existing WordNet-based word similarity methods do not fully account for the semantic information and positional relationships of words, which lowers the accuracy of the computed similarity. To address this, the paper proposes a new method that computes WordNet word similarity with the Word2Vec word-vector model. Instead of a traditional text corpus, it builds a WordNet dataset in a new form and processes that dataset with an information position arrangement method. Training the Word2Vec model on this WordNet dataset yields vector representations of the words. Word similarity is then computed on the public R&G-65, M&C-30 and MED38 evaluation sets, and Pearson correlation coefficients are compared from multiple angles. The results show that the Pearson correlation between the similarity values computed by the proposed method and the human-judged values improves significantly.
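The evaluation step described in the abstract (score word pairs with vector similarity, then correlate those scores with human judgments) can be illustrated with a minimal sketch. This is not the authors' implementation: the vectors, the placeholder words `w1` to `w4`, and the human scores below are invented for illustration and are not taken from R&G-65, M&C-30, MED38, or any trained Word2Vec model. Cosine similarity is assumed here as the vector similarity measure, which the abstract does not state explicitly.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical word vectors; in practice these would come from a trained model.
toy_vectors = {
    "w1": [0.9, 0.1, 0.0],
    "w2": [0.8, 0.2, 0.1],
    "w3": [0.1, 0.9, 0.2],
    "w4": [0.0, 0.2, 0.9],
}

# Hypothetical evaluation set: (word_a, word_b, invented human score).
toy_pairs = [
    ("w1", "w2", 3.8),
    ("w1", "w3", 1.2),
    ("w3", "w4", 1.5),
    ("w1", "w4", 0.3),
]

# Model score for each pair, in the same order as the human scores.
model_scores = [
    cosine_similarity(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_pairs
]
human_scores = [score for _, _, score in toy_pairs]

correlation = pearson(model_scores, human_scores)
print(f"Pearson correlation on the toy pairs: {correlation:.3f}")
```

A higher coefficient means the model's ranking and spacing of pair similarities track the human judgments more closely, which is the sense in which the abstract reports an improvement.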
Authors
陈丹华
王艳娜
周子力
赵晓函
李天宇
王凯莉
CHEN Danhua; WANG Yanna; ZHOU Zili; ZHAO Xiaohan; LI Tianyu; WANG Kaili (School of Cyber Science and Engineering, Qufu Normal University, Qufu, Shandong 273100, China; School of Physical Engineering, Qufu Normal University, Qufu, Shandong 273100, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2022, Issue 3, pp. 222-229 (8 pages)
Computer Engineering and Applications
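Funding
Natural Science Foundation of Shandong Province (ZR2017MD019)
Industry-University Cooperative Education Project, Department of Higher Education, Ministry of Education (201701020098)
Qufu Normal University Interdisciplinary Research Project (QFNUSKC291809120)
CERNET Next Generation Internet Technology Innovation Project (NGII20190516).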