Abstract
The 21st century is an era of rapid information growth. The web is a vast repository of data, and extracting information from web pages has become a research focus, which has led to ontology-based methods for web page data extraction. Current information extraction techniques still have two main problems: (1) they require considerable manual intervention, since many of them depend on sample training, which places a burden on users; and (2) their adaptability is poor. Further progress depends on solving these problems. This paper analyzes ontology-based web page data extraction technology.
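Funding
Natural Science Foundation of Hunan Province (Grant No. 2017JJ2135)
Scientific Research Project of the Hunan Provincial Department of Education (Grant Nos. 18A481, 19C1070)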
Keywords
ontology-based
web page data
extraction technology
analysis