一种面向微博文本的命名实体识别方法 (cited by: 7)

An approach to named entity recognition towards micro-blog
Abstract: Named entity recognition (NER) is a fundamental technology in natural language processing (NLP). In recent years, social network platforms such as microblogs have developed rapidly, and their distinctive form poses new challenges to traditional NER techniques. This paper proposes an improved method based on the conditional random field (CRF) model. Because microblog texts are short and semantically ambiguous, external data sources are introduced to extract topic features and word representation (word vector) features for training the model. Because microblog data is large in scale and manual standardization is costly, an active learning algorithm based on least confidence is adopted to strengthen the training effect at a lower labor cost. Experiments on a Sina Weibo data set show that the method improves the F-score by 4.54% compared with the traditional CRF method.
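The least-confidence selection step mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract gives no details of their features, CRF toolkit, or batch size, so the function names (`best_path_probability`, `select_least_confident`), the score matrices, and the toy numbers are all assumptions made here. For a linear-chain model, the confidence of a sentence is taken as the probability of its best label sequence, P(y* | x), and the sentences with the largest 1 - P(y* | x) are sent for manual labeling.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def best_path_probability(emissions: Matrix, transitions: Matrix) -> float:
    """Return P(y* | x) for a linear-chain model given log-potentials.

    emissions[t][j]   : score of label j at position t (hypothetical input)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(emissions[0])
    alpha = list(emissions[0])  # forward scores (log-sum over all paths)
    delta = list(emissions[0])  # Viterbi scores (max over paths)
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + _logsumexp([alpha[i] + transitions[i][j] for i in range(n_labels)])
            for j in range(n_labels)
        ]
        delta = [
            emissions[t][j]
            + max(delta[i] + transitions[i][j] for i in range(n_labels))
            for j in range(n_labels)
        ]
    # Best path score minus the log partition function, back in probability space.
    return math.exp(max(delta) - _logsumexp(alpha))


def select_least_confident(pool: Sequence[Matrix], transitions: Matrix, k: int) -> List[int]:
    """Return indices of the k unlabeled sentences with the highest 1 - P(y* | x)."""
    scored = [
        (1.0 - best_path_probability(emissions, transitions), idx)
        for idx, emissions in enumerate(pool)
    ]
    # Most uncertain first; ties broken by the original pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Toy pool with two labels: sentence 0 is maximally ambiguous, sentence 1 is not.
no_transition_preference = [[0.0, 0.0], [0.0, 0.0]]
toy_pool = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[5.0, 0.0], [5.0, 0.0]],
]
to_label = select_least_confident(toy_pool, no_transition_preference, k=1)
```

In an active learning loop, the selected sentences would be labeled by hand, added to the training set, and the model retrained before the next selection round.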
Authors: 李刚 (Li Gang), 黄永峰 (Huang Yongfeng)
Source: 《电子技术应用》 (Application of Electronic Technique), 2018, No. 1, pp. 118-120, 124 (4 pages)
Funding: National Natural Science Foundation of China (U1536207)
Keywords: named entity recognition; micro-blog; conditional random field; word representation; active learning