Abstract
To clarify the relationships among air pollutants, pollution sources, influencing factors, evaluation indicators, and hazards, and to analyze air pollution transmission paths, a clear and comprehensive domain ontology of air pollution was established. First, drawing on machine learning and natural language processing techniques, an attention-based sequence labeling method for the joint extraction of entities and relations was proposed: an attention mechanism was added to a bidirectional long short-term memory (LSTM) network model, and entities and relations were labeled jointly to perform entity relation extraction. Second, this method was combined with a term frequency-inverse document frequency (TF-IDF) core concept mining method to extract knowledge, and the resulting concepts, attributes, relations, and instances were organized so that the air pollution ontology model could be constructed semi-automatically. Finally, on the basis of the ontology and its instances, conditional reasoning and visual reasoning were carried out with the SPARQL Query module of Protégé and the HermiT reasoner, respectively. The results show that the air pollution domain ontology built with the attention-based joint extraction method contains 68 core entities and 360 instances. Compared with existing ontologies in this domain, it performs well in comprehensiveness, validity, accuracy, and reusability, and the propagation paths of pollutant ions such as Ca²⁺ and K⁺ were inferred. Therefore, the attention-based sequence labeling method for joint entity relation extraction can effectively support the semi-automatic construction of an air pollution domain ontology and infer clear air pollution propagation paths.
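The TF-IDF core concept mining step mentioned in the abstract can be illustrated with a minimal sketch. This is the textbook TF-IDF formulation, not necessarily the variant used in the paper; the function names `tf_idf` and `top_terms` and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumes the textbook definition: tf = count / document length,
    idf = log(N / document frequency). The paper may use another variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(docs, k=3):
    """Rank terms by their highest TF-IDF score in any document."""
    best = {}
    for doc_scores in tf_idf(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Hypothetical toy corpus of pre-tokenized sentences (not from the paper).
docs = [
    ["pm25", "source", "coal", "combustion"],
    ["pm25", "source", "vehicle", "exhaust"],
    ["pm25", "harm", "respiratory", "disease"],
]
```

In this toy corpus a term that appears in every document, such as `pm25`, scores zero, while terms confined to a single document score highest. That is the property that makes TF-IDF a plausible filter for candidate core concepts before they are organized into an ontology.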
Authors
LIU Bo (刘博)
ZHANG Jiahui (张佳慧)
LI Jianqiang (李建强)
LI Yong (李永)
LANG Jianlei (郎建垒)
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Key Laboratory of Beijing on Regional Air Pollution Control, Beijing University of Technology, Beijing 100124, China
Source
Journal of Beijing University of Technology (《北京工业大学学报》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2021, No. 3, pp. 246-259 (14 pages)
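Funding
National Natural Science Foundation of China (61702021)
Beijing Natural Science Foundation (4174082, 4182040)
Science and Technology Program of the Beijing Municipal Education Commission (SQKM201710005021)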
Keywords
ontology
air pollution
natural language processing
attention mechanism
entity relationship extraction
semantic reasoning