Abstract
Existing methods for selecting paradigm words suffer from randomness and subjectivity. This paper proposes a selection method based on word clustering. A group of initial seed words is chosen from the target domain ontology and expanded. The expanded words are then grouped by hierarchical clustering, according to the similarity between each pair of words, to obtain the second generation of seed words. These second-generation seed words are expanded and clustered in turn, and the procedure is iterated until the optimal cluster seed words are obtained, which are taken as the final paradigm words. Experimental results indicate that the paradigm words extracted by this method achieve relatively high accuracy when classifying the sentiment orientation of words.
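The expand, cluster, and iterate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the integer polarity table `SCORES`, the similarity function `toy_sim`, the single-link merge rule, the choice of a cluster medoid as the next-generation seed, and the "seeds stop changing" stopping test are all assumptions made for the example. The abstract does not specify the similarity measure, the linkage, or the optimality criterion.

```python
from itertools import combinations

# Hypothetical toy data: an integer "polarity" per word, used only to
# define a similarity. A real system would use a lexical similarity measure.
SCORES = {
    "excellent": 100, "great": 92, "good": 86, "nice": 80, "fine": 78,
    "awful": 0, "terrible": 8, "bad": 14, "poor": 20, "weak": 22,
}

def toy_sim(a, b):
    """Assumed similarity on a 0-100 scale (100 means identical polarity)."""
    return 100 - abs(SCORES[a] - SCORES[b])

def expand(seeds, vocab, sim, threshold):
    """Seeds plus every vocabulary word similar enough to at least one seed."""
    hits = {w for w in vocab if any(sim(w, s) >= threshold for s in seeds)}
    return sorted(hits | set(seeds))

def cluster(words, sim, threshold):
    """Single-link agglomerative clustering, stopped at a similarity threshold."""
    clusters = [[w] for w in words]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            if any(sim(x, y) >= threshold
                   for x in clusters[a] for y in clusters[b]):
                clusters[a] += clusters[b]   # merge cluster b into cluster a
                del clusters[b]
                merged = True
                break                        # restart the scan after a merge
    return clusters

def medoid(words, sim):
    """Word with the highest total similarity to its own cluster."""
    return max(sorted(words), key=lambda w: sum(sim(w, o) for o in words))

def select_paradigm_words(seeds, vocab, sim, threshold, max_iter=10):
    """Iterate expansion and clustering until the seed set stops changing."""
    seeds = sorted(set(seeds))
    for _ in range(max_iter):
        expanded = expand(seeds, vocab, sim, threshold)
        next_seeds = sorted(medoid(c, sim)
                            for c in cluster(expanded, sim, threshold))
        if next_seeds == seeds:              # assumed convergence test
            break
        seeds = next_seeds
    return seeds

result = select_paradigm_words(["excellent", "awful"], list(SCORES), toy_sim, 85)
print(result)  # ['bad', 'good']
```

On this toy vocabulary the extreme initial seeds drift toward the centre of each polarity cluster over two generations and then stabilise, which mirrors the idea of replacing arbitrarily chosen seeds with representative ones.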
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2011, No. 1, pp. 114-116 (3 pages)
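Funding
Specialized Research Fund for the Doctoral Program of Higher Education (20090111110016)
Hefei University of Technology Scientific Research Development Fund (2010HGXJ0009)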
Keywords
paradigm word
word sentiment orientation
word similarity
word clustering
domain ontology