Abstract
This paper proposes a language model that integrates inter-word distance information into the traditional N-gram statistical language model; the result is referred to as the distance-weighted statistical language model. The model takes into account the relationships between non-adjacent words in a sentence. Based on the principle that words closer together are more closely related, it introduces distance information through a distance weighting function in order to improve the model's predictive ability. The model is also applied to a Chinese whole-sentence pinyin input method (IME) system. Experiments show that, compared with the traditional N-gram statistical language model, the proposed model reduces the Chinese character error rate.
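The idea of weighting non-adjacent word pairs by distance can be illustrated with a minimal sketch. The abstract does not give the actual weighting function or how the paper combines the weighted terms, so everything below is an assumption made for illustration: the weight 1/d, the window size, the normalisation, and the names `distance_weight`, `train` and `score` are hypothetical and are not the authors' formulation.

```python
from collections import defaultdict


def distance_weight(d):
    """Hypothetical weighting function: closer words count more (assumed 1/d)."""
    return 1.0 / d


def train(sentences, window=2):
    """Accumulate distance-weighted co-occurrence mass for word pairs.

    For each word, every preceding word within `window` positions
    contributes distance_weight(d) to the (context, word) pair.
    """
    pair_weight = defaultdict(float)     # (context word, word) -> weighted count
    context_weight = defaultdict(float)  # context word -> total weighted count
    for words in sentences:
        for i, word in enumerate(words):
            for d in range(1, window + 1):
                j = i - d
                if j < 0:
                    break
                w = distance_weight(d)
                pair_weight[(words[j], word)] += w
                context_weight[words[j]] += w
    return pair_weight, context_weight


def score(word, history, pair_weight, context_weight, window=2):
    """Distance-weighted average of pairwise relative frequencies.

    This is an assumed combination rule, not the paper's formula.
    """
    numerator = 0.0
    denominator = 0.0
    for d in range(1, min(window, len(history)) + 1):
        context = history[-d]
        w = distance_weight(d)
        total = context_weight.get(context, 0.0)
        if total > 0.0:
            numerator += w * pair_weight.get((context, word), 0.0) / total
        denominator += w
    return numerator / denominator if denominator else 0.0


# Toy corpus, for illustration only.
corpus = [["i", "love", "beijing"], ["i", "love", "shanghai"]]
pairs, contexts = train(corpus, window=2)
for candidate in ("beijing", "shanghai", "love"):
    print(candidate, score(candidate, ["i", "love"], pairs, contexts, window=2))
```

In this sketch the non-adjacent word "i" still contributes to the score of the word following "love", but with half the weight of the adjacent word, which is the "closer words are more closely related" principle in its simplest form.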
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2001, No. 6, pp. 47-52 (6 pages)
Journal of Chinese Information Processing