Abstract
Keywords play an important role in many areas of natural language processing. In Chinese natural language processing, polysemy (one word with several meanings) and synonymy (several words with one meaning) have long been a major difficulty for researchers. Traditional statistics-based methods such as KEA simply count word frequencies and ignore the relationships between words. This paper proposes SKEA, a synonym-based Chinese keyword extraction method that uses a first-order hidden Markov model for word sense disambiguation. By mapping texts from the sparse word space into a semantic space, the method reduces their dimensionality. The paper also improves KEA's position weight formula and proposes a new feature for keyword selection. Finally, SKEA and KEA are compared in multiple controlled experiments, and the results show that SKEA is the better Chinese keyword extraction method.
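The abstract states only that disambiguation uses a first-order hidden Markov model. The following is a minimal Viterbi sketch under assumed choices: hidden states are sense (synonym-set) IDs, observations are words, each word has a hypothetical list of candidate senses, and all probabilities are hand-set toy values rather than anything estimated in the paper.

```python
import math

def viterbi(words, candidates, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable sense sequence for `words` under a first-order HMM.

    Hidden states are sense IDs; observations are words. `candidates[w]` lists
    the senses word w may take. Probabilities missing from start_p, trans_p or
    emit_p fall back to `floor` (a crude smoothing assumption).
    """
    def logp(table, key):
        return math.log(table.get(key, floor))

    # Each layer maps sense -> (best log-probability so far, previous sense).
    first = words[0]
    layers = [{s: (logp(start_p, s) + logp(emit_p, (s, first)), None)
               for s in candidates[first]}]
    for w in words[1:]:
        prev_layer = layers[-1]
        layer = {}
        for s in candidates[w]:
            # First-order assumption: only the previous sense matters.
            prev, score = max(
                ((p, prev_layer[p][0] + logp(trans_p, (p, s))) for p in prev_layer),
                key=lambda pair: pair[1],
            )
            layer[s] = (score + logp(emit_p, (s, w)), prev)
        layers.append(layer)

    # Backtrack from the highest-scoring final sense.
    sense = max(layers[-1], key=lambda s: layers[-1][s][0])
    path = [sense]
    for layer in reversed(layers[1:]):
        sense = layer[sense][1]
        path.append(sense)
    return path[::-1]

# Toy, hand-set probabilities (hypothetical, not from the paper):
# "苹果" may mean the fruit or the company.
candidates = {"苹果": ["FRUIT", "COMPANY"], "发布": ["RELEASE"], "手机": ["PHONE"]}
start_p = {"FRUIT": 0.5, "COMPANY": 0.5}
trans_p = {("COMPANY", "RELEASE"): 0.6, ("FRUIT", "RELEASE"): 0.05,
           ("RELEASE", "PHONE"): 0.5}
emit_p = {("FRUIT", "苹果"): 0.3, ("COMPANY", "苹果"): 0.3,
          ("RELEASE", "发布"): 0.9, ("PHONE", "手机"): 0.9}

senses = viterbi(["苹果", "发布", "手机"], candidates, start_p, trans_p, emit_p)
print(senses)  # ['COMPANY', 'RELEASE', 'PHONE']
```

Both senses of "苹果" start with equal scores. The transition to "发布" (release) is what selects COMPANY, which is exactly the contextual effect a first-order model can capture. Log probabilities are used so that long sentences do not underflow.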
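To illustrate why mapping words to a semantic space reduces dimensionality, the sketch below collapses synonyms into shared concept IDs and then computes the two features of the original KEA baseline: TF×IDF and first occurrence (words before the first appearance divided by document length). The synonym dictionary is hypothetical, the +1 smoothing in the IDF is an assumption, and SKEA's improved position weight and new feature are not specified in the abstract, so they are not shown.

```python
import math
from collections import Counter

# Hypothetical synonym dictionary: word -> synonym-set (concept) ID.
SYNSETS = {"计算机": "COMPUTER", "电脑": "COMPUTER", "网络": "NETWORK"}

def to_concepts(tokens, synsets):
    """Map each word to its concept ID; words not in the dictionary stay as-is."""
    return [synsets.get(t, t) for t in tokens]

def kea_features(doc, corpus):
    """KEA-style features for every distinct term in `doc`.

    doc: list of terms; corpus: list of term lists (the global collection).
    Returns term -> (tf_idf, first_occurrence), where first_occurrence is the
    number of terms before the first appearance divided by the document length.
    """
    n_docs = len(corpus)
    feats = {}
    for term, freq in Counter(doc).items():
        df = sum(1 for d in corpus if term in d)
        # +1 smoothing (an assumption) so unseen terms do not give log2(0).
        idf = -math.log2((df + 1) / (n_docs + 1))
        feats[term] = (freq / len(doc) * idf, doc.index(term) / len(doc))
    return feats

corpus = [to_concepts(d, SYNSETS) for d in (["电脑", "游戏"], ["天气"], ["网络", "安全"])]
doc = to_concepts(["网络", "电脑", "技术", "计算机"], SYNSETS)
feats = kea_features(doc, corpus)
print(feats["COMPUTER"])  # (0.5, 0.25)
```

Without the mapping, "电脑" and "计算机" would be two separate dimensions, each with frequency 1. After the mapping they become one concept with frequency 2, which raises its TF×IDF score. In a full pipeline the mapping for ambiguous words would come from a disambiguation step rather than a fixed one-to-one dictionary.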
Source
Journal of Jiangnan University (Natural Science Edition) (《江南大学学报(自然科学版)》)
CAS
2013, No. 5, pp. 620-625 (6 pages)