Abstract
Acquiring domain knowledge through expert introspection alone can no longer meet requirements. To combine the strengths of the introspective and statistical methods, this paper proposes a design framework for domain corpora with four levels: raw corpus, lexical, sentence, and discourse. The corpus at each level can be analyzed independently with NLP techniques and also serves the next higher level. Under this framework, a large-scale corpus for the petrochemical domain has been built. It provides basic resources and material for knowledge acquisition and analysis in the petrochemical industry, and supports the research and application of knowledge engineering projects in the petrochemical field.
Source
《成组技术与生产现代化》
2015, No. 1, pp. 29-33, 50 (6 pages in total)
Group Technology & Production Modernization
Funding
Supported by the National Key Technology R&D Program of China (2012BAH34F04)