Abstract
In microblogs, new words follow diverse word-formation rules that are complex and constantly changing. New word discovery methods based on internal cohesion and boundary freedom often yield new words with low internal cohesion. To address this problem, this paper improves a new word synthesis algorithm that combines multi-word mutual information with left and right branch entropy. Multi-word mutual information raises the internal cohesion of candidate new words, which in turn increases new word recognition accuracy. Experimental results show that the improved method effectively boosts the performance of microblog new word recognition.
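The abstract does not state the exact formulas, so the following Python sketch assumes definitions that are common in the new word discovery literature: the multi-word (multi-character) mutual information of a candidate string is the minimum pointwise mutual information over every way of splitting it into two parts, and its branch entropy is the smaller of the Shannon entropies of its left and right neighboring characters. The function names (`ngram_counts`, `multi_char_pmi`, `branch_entropy`, `discover`) and all thresholds are hypothetical. The sketch only scores and filters candidate n-grams; it does not reproduce the paper's synthesis step, which the abstract does not describe.

```python
import math
from collections import Counter


def ngram_counts(text: str, max_len: int) -> Counter:
    """Count every substring of length 1..max_len in the text."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def multi_char_pmi(word: str, counts: Counter, total: int) -> float:
    """Assumed definition: min over split points k of
    log(p(word) / (p(word[:k]) * p(word[k:]))).

    Probabilities are approximated as count / total, with total = number of
    characters (a simplification; each n-gram length has slightly fewer positions).
    """
    p_word = counts[word] / total
    scores = []
    for k in range(1, len(word)):
        p_left = counts[word[:k]] / total
        p_right = counts[word[k:]] / total
        scores.append(math.log(p_word / (p_left * p_right)))
    return min(scores)


def entropy(counter: Counter) -> float:
    """Shannon entropy (natural log) of a frequency distribution."""
    n = sum(counter.values())
    return -sum(c / n * math.log(c / n) for c in counter.values())


def branch_entropy(word: str, text: str) -> float:
    """min(left entropy, right entropy) of the characters adjacent to `word`."""
    left, right = Counter(), Counter()
    start = text.find(word)
    while start != -1:  # scan all (possibly overlapping) occurrences
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(word)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(word, start + 1)
    if not left or not right:
        return 0.0  # no neighbors on one side: treat boundary freedom as zero
    return min(entropy(left), entropy(right))


def discover(text, max_len=4, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Keep n-grams (length >= 2) whose cohesion and boundary freedom both
    pass the hypothetical thresholds; sort by frequency, highest first."""
    counts = ngram_counts(text, max_len)
    total = len(text)
    results = []
    for word, c in counts.items():
        if len(word) < 2 or c < min_count:
            continue
        pmi = multi_char_pmi(word, counts, total)
        be = branch_entropy(word, text)
        if pmi >= min_pmi and be >= min_entropy:
            results.append((word, c, round(pmi, 2), round(be, 2)))
    return sorted(results, key=lambda r: -r[1])
```

For example, `discover("甲微博乙丙微博丁戊微博己")` returns a single candidate, `"微博"`, with count 3: it occurs far more often than chance predicts from its characters (high mutual information) and appears between three different left and three different right neighbors (high branch entropy). Using the minimum over split points means a candidate is only as cohesive as its weakest internal boundary.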
Author
王欣
WANG Xin (College of Computer and Information Science, Chongqing Normal University, Chongqing 401331)
Keywords
Multi-Word Mutual Information
Relative Branch Entropy
New Word Synthesis Algorithm