Abstract
This paper introduces the basic concepts of named entity recognition (NER) and analyzes its two basic approaches, rule-based and statistical. Taking the maximum entropy model as the theoretical basis, it then conducts an empirical study on recognizing Chinese dish names. Six feature templates are designed according to the characteristics of Chinese named entities. Experimental results show that adding tagging features to the simple feature template effectively improves recognition performance. The features that help improve recognition are, in order: tagging features, POS combination features, forward POS dependency features, and word form features.
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2011, No. 5, pp. 77-82 (6 pages)
New Technology of Library and Information Service
Funding
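One of the research outputs of the National Natural Science Foundation of China project "Research on Opinion Mining Based on Ontology Learning in the Web 2.0 Environment" (Grant No. 70903047)
and the Shanghai Key Discipline Construction Project "System Analysis and Integration" (Project No. S30501)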
Keywords
Named entity recognition
Maximum entropy model
User reviews
Text mining