Abstract
To further improve the accuracy of base noun phrase (BaseNP) identification, an English BaseNP identification algorithm is proposed that combines the maximum entropy approach and the Brill approach, exploiting the characteristics of each. The algorithm builds on high-accuracy part-of-speech (POS) tagging. In both the training phase and the testing phase, the maximum entropy approach is first used to identify BaseNPs, and its already highly accurate output then serves as the initial annotation for the Brill approach. Experimental results show that the combined algorithm reaches 94% precision and recall, effectively combining the strengths of the two approaches, and is comparable to the best reported English BaseNP identification results obtained on the same training and test corpora.
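The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `initial_tagger` is a hypothetical heuristic standing in for the maximum entropy model, the B/I/O chunk tags are an assumed encoding, and the single transformation rule is hand-written here, whereas the Brill approach would learn its rules from the training corpus.

```python
from typing import Callable, List, Tuple

# A Brill-style transformation rule: rewrite chunk tag `src` as `dst`
# at position i whenever trigger(pos_tags, chunk_tags, i) is true.
Trigger = Callable[[List[str], List[str], int], bool]
Rule = Tuple[str, str, Trigger]


def initial_tagger(pos_tags: List[str]) -> List[str]:
    """Stage 1 stand-in (hypothetical). The paper uses a maximum entropy
    model here; this heuristic just marks determiners, adjectives and
    nouns as part of a BaseNP, using assumed B/I/O chunk tags."""
    np_pos = {"DT", "JJ", "NN", "NNS", "NNP"}
    chunks: List[str] = []
    for i, pos in enumerate(pos_tags):
        if pos not in np_pos:
            chunks.append("O")   # outside any BaseNP
        elif i > 0 and chunks[-1] != "O":
            chunks.append("I")   # continues the current BaseNP
        else:
            chunks.append("B")   # begins a new BaseNP
    return chunks


def apply_rules(pos_tags: List[str], chunks: List[str],
                rules: List[Rule]) -> List[str]:
    """Stage 2: apply an ordered list of transformation rules to the
    initial annotation, one rule at a time over the whole sentence."""
    out = list(chunks)
    for src, dst, trigger in rules:
        for i in range(len(out)):
            if out[i] == src and trigger(pos_tags, out, i):
                out[i] = dst
    return out


# Hand-written illustrative rule: a determiner inside a chunk starts a new BaseNP.
rules: List[Rule] = [("I", "B", lambda pos, chunks, i: pos[i] == "DT")]

pos = ["VBD", "DT", "NN", "DT", "NN"]        # e.g. "gave the boy the book"
initial = initial_tagger(pos)                # ['O', 'B', 'I', 'I', 'I']
final = apply_rules(pos, initial, rules)     # ['O', 'B', 'I', 'B', 'I']
```

The point of the design is that the rule-based stage only has to correct the residual errors of an already accurate first stage, rather than start from a crude baseline annotation.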
Source
Transactions of Beijing Institute of Technology (《北京理工大学学报》), 2006, No. 6, pp. 500-503 (4 pages)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
Funding
National ministry-level pre-research project (504-4)