Abstract
To improve the accuracy of Chinese keyword extraction, this paper proposes a keyword extraction algorithm based on a competitive learning network. The article is first segmented into individual words or phrases, each of which is treated as a single neuron. These neurons are fed into the input layer of the competitive learning network, and through mutual competition among the neurons on the competitive layer, one or more active neurons are obtained. The keywords of the article are then derived by merging the weights and applying cluster analysis. Experimental results show that the average hit rate of the keywords extracted by this algorithm is higher than that of the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm and the traditional Term Frequency (TF) algorithm, and that the algorithm has good robustness.
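The abstract does not specify how each word is encoded as an input, the exact competition rule, or how the weights are merged. The following Python sketch only illustrates the general idea of winner-take-all competitive learning applied to already-segmented words. The three per-word features in `word_features`, the neuron count `k`, the decaying learning rate, and the "merged weight" defined as the sum of a neuron's weight components are all hypothetical choices, not the paper's method. Chinese word segmentation is assumed to have been done beforehand.

```python
import random
from collections import Counter


def word_features(tokens, title_tokens=()):
    """Map each distinct word to a hypothetical 3-dimensional input vector:
    [normalized frequency, earliness of first occurrence, appears in title]."""
    counts = Counter(tokens)
    max_count = max(counts.values())
    n = len(tokens)
    first_pos = {}
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)
    title = set(title_tokens)
    return {
        w: [counts[w] / max_count,
            1.0 - first_pos[w] / n,
            1.0 if w in title else 0.0]
        for w in counts
    }


def winner(weights, x):
    """Index of the competitive-layer neuron whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((wj - xi) ** 2 for wj, xi in zip(weights[j], x)))


def competitive_learning(vectors, k=3, epochs=50, lr=0.3, seed=0):
    """Winner-take-all training: for each input, only the winning neuron's
    weights move toward it, w_j <- w_j + rate * (x - w_j)."""
    rng = random.Random(seed)
    weights = [list(v) for v in rng.sample(vectors, k)]  # initialize from k inputs
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            j = winner(weights, x)
            weights[j] = [wj + rate * (xi - wj) for wj, xi in zip(weights[j], x)]
    return weights


def extract_keywords(tokens, title_tokens=(), k=3, top_n=5):
    """Hypothetical pipeline: features -> competitive learning -> keyword cluster."""
    feats = word_features(tokens, title_tokens)
    words = list(feats)
    k = min(k, len(words))
    weights = competitive_learning([feats[w] for w in words], k=k)
    assignment = {w: winner(weights, feats[w]) for w in words}
    # Hypothetical "merged weight": the sum of a neuron's weight components.
    merged = [sum(w) for w in weights]
    # The active neuron is the neuron (among those that won at least one word)
    # with the largest merged weight; its words form the keyword cluster.
    active = max(set(assignment.values()), key=lambda j: merged[j])
    cluster = [w for w in words if assignment[w] == active]
    cluster.sort(key=lambda w: sum(feats[w]), reverse=True)
    return cluster[:top_n]
```

Calling `extract_keywords(tokens, title_tokens)` on a segmented article returns the words won by the most important neuron. Initializing the weights from sampled inputs reduces the chance of neurons that never win, and restricting the choice of active neuron to neurons that won at least one word guarantees a non-empty cluster.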
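The two baselines and the evaluation metric named in the abstract can be sketched as follows. This assumes the common TF-IDF weighting tf(t, d) * log(N / df(t)) and defines the hit rate of one document as the fraction of extracted keywords that also appear in a reference keyword list (for instance, the author-assigned keywords), averaged over documents. The paper's exact variants are not stated in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_keywords(tokens, top_n=5):
    """TF baseline: rank words by raw frequency (ties broken alphabetically)."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]


def tfidf_keywords(tokens, corpus, top_n=5):
    """TF-IDF baseline: score(t) = tf(t, d) * log(N / df(t)).
    `corpus` is a list of token lists and is assumed to contain `tokens`."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each word at most once per document
    counts = Counter(tokens)
    total = len(tokens)
    scores = {w: (c / total) * math.log(n_docs / (df[w] or 1))
              for w, c in counts.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]


def hit_rate(extracted, reference):
    """Fraction of extracted keywords found in the reference keyword list."""
    if not extracted:
        return 0.0
    ref = set(reference)
    return sum(w in ref for w in extracted) / len(extracted)


def average_hit_rate(pairs):
    """Mean hit rate over (extracted, reference) pairs, one pair per document."""
    return sum(hit_rate(e, r) for e, r in pairs) / len(pairs)
```

For example, with `corpus = [["a", "b", "a", "c"], ["b", "d"], ["b", "e", "e"]]`, `tfidf_keywords(corpus[0], corpus, top_n=2)` returns `["a", "c"]`: `b` occurs in every document, so its IDF is log(1) = 0, while plain TF would rank it level with `c`.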
Source
Computer Engineering
Indexed in: CAS, CSCD
2013, No. 2, pp. 207-210, 215 (5 pages)
Keywords
keyword extraction
average hit rate
competitive learning network
neuron
input layer
competitive layer