Abstract
To address the problem of identifying Kazakh base phrases, this paper proposes an automatic identification method based on conditional random fields (CRFs). A greedy algorithm for automatic feature template selection, combined with the characteristics of Kazakh base phrases, picks suitable features from the many available contextual features. In each round, the locally optimal template item is chosen from the candidate feature templates and added to the final feature template, which further improves recognition precision. Experimental results show that the method achieves a precision of 89.01% and a recall of 84.07%.
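The greedy template selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `greedy_select_templates`, the `evaluate` callback (which in practice would train a CRF with the given template and return a validation score), the `min_gain` stopping rule, and the feature names in the toy example are all assumptions, since the abstract does not specify the scoring metric or the stopping criterion.

```python
from typing import Callable, Iterable, List, Tuple


def greedy_select_templates(
    candidates: Iterable[str],
    evaluate: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward selection of feature template items.

    `evaluate` is a hypothetical callback that scores a template
    (for example, by training a CRF and measuring held-out F1).
    """
    remaining = list(candidates)
    selected: List[str] = []
    best_score = float("-inf")

    while remaining:
        # Score every remaining item when added to the current template.
        scored = [(evaluate(selected + [item]), item) for item in remaining]
        round_score, round_item = max(scored, key=lambda pair: pair[0])

        # Assumed stopping rule: stop when the best item no longer helps.
        if round_score - best_score <= min_gain:
            break

        # Keep the locally optimal item for this round.
        selected.append(round_item)
        remaining.remove(round_item)
        best_score = round_score

    return selected, best_score


# Toy stand-in for CRF training: each hypothetical feature adds a fixed gain.
TOY_GAINS = {"word[0]": 0.60, "pos[0]": 0.20, "pos[-1]": 0.05, "word[+2]": -0.03}


def toy_evaluate(template: List[str]) -> float:
    return sum(TOY_GAINS[item] for item in template)


selected, score = greedy_select_templates(TOY_GAINS, toy_evaluate)
print(selected, round(score, 2))
```

With a real CRF toolkit, each call to `evaluate` is a full training run, so the number of candidate items directly bounds the cost: at most n(n+1)/2 evaluations for n candidates.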
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
PKU Core Journal (北大核心)
2014, No. 10, pp. 3602-3607 (6 pages)
Funding
National Natural Science Foundation of China (61063025, 61363062)
Keywords
base phrase identification
conditional random fields
automatic selection of feature template
Kazakh
greedy strategy