Named Entity Recognition in Chinese Social Media Based on a Unified Model
Cited by: 2
Abstract With the development of the internet, named entity recognition (NER) in Chinese social media has become an important task. Previous methods rely on in-domain supervised learning, which is limited by the scarcity of annotated data. However, ample corpora exist in formal, general domains, and massive amounts of unannotated social media text are available; both can be used to improve NER. This paper proposes a unified model that learns from out-of-domain corpora and from in-domain unannotated text. The unified model has two components, one for cross-domain learning and one for semi-supervised learning. The cross-domain learning component learns out-of-domain information based on domain similarity. The semi-supervised learning component learns from in-domain unannotated text through self-training. Both learning functions outperform existing methods for NER in Chinese social media. In experiments, the unified model achieves better results and greatly reduces the workload of manual corpus annotation.
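Source Computer & Digital Engineering (《计算机与数字工程》), 2017, No. 12, pp. 2402-2406, 2433 (6 pages)
Funding National High Technology Research and Development Program of China (863 Program) (No. 2015AA015603); National Natural Science Foundation of China (No. 61602114)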
Keywords named entity recognition; social media; cross-domain learning; domain similarity; semi-supervised learning; self-training
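The abstract names self-training as the semi-supervised component but gives no implementation details. The following is only a minimal sketch of the general self-training loop, not the paper's model: the function names (`self_train`, `fit_centroids`), the 0.9 confidence threshold, the one-dimensional toy features, and the labels "O" and "ENT" are all assumptions chosen for illustration. A real NER system would replace the toy classifier with a sequence tagger.

```python
import math

def fit_centroids(labeled):
    """Toy stand-in classifier (an assumption, not the paper's tagger):
    nearest centroid over 1-D features, with a softmax-style confidence."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(x):
        # Distance from x to each class centroid.
        dists = {y: abs(x - c) for y, c in centroids.items()}
        best = min(dists, key=dists.get)
        total = sum(math.exp(-d) for d in dists.values())
        return best, math.exp(-dists[best]) / total

    return predict

def self_train(fit, labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Generic self-training: repeatedly train on the labeled set, label the
    unlabeled pool, and move only confident predictions into the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        predict = fit(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict(x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new was learned in this round
        labeled.extend(confident)
        pool = rest
    return fit(labeled), labeled, pool

# Hypothetical data: two annotated examples and three unannotated ones.
seed = [(0.0, "O"), (10.0, "ENT")]
unannotated = [1.0, 9.0, 5.0]
model, grown, remaining = self_train(fit_centroids, seed, unannotated)
print(len(grown), remaining)
```

The threshold controls the usual trade-off in self-training: a higher value adds fewer pseudo-labeled examples but lowers the risk of reinforcing the model's own mistakes.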