Abstract
As a carrier of network data, the deep web holds a large volume of data that can serve as a high-quality data set for scientific research and data mining applications. However, non-cooperative structured deep web data is heterogeneous, so traditional mining methods generally suffer from low mining precision, long mining time, and high memory usage. To address this, an automatic mining method for overlapping data features in the deep web, based on stratified sampling, is proposed. Overlapping data in the non-cooperative structured deep web is first analyzed, and a Bayesian network method is used to determine the corresponding labels and build a dynamic global model. Under this model, tuple-level stratified sampling is applied to the overlapping data features, which achieves their automatic mining. Experimental results show that the proposed method achieves higher mining precision, shorter mining time, and lower memory usage.
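The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of generic proportional stratified sampling over tuples, not the paper's actual algorithm. It assumes each tuple is a dictionary and that the stratum is the value of one label attribute; the function name `stratified_sample` and the parameters `stratum_key`, `fraction`, and `seed` are hypothetical names introduced for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(tuples, stratum_key, fraction, seed=0):
    """Draw a proportional stratified sample from a list of dict tuples.

    Illustrative assumption: the stratum of a tuple is the value stored
    under `stratum_key`. This is not taken from the paper.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Group the tuples into strata by their label value.
    strata = defaultdict(list)
    for t in tuples:
        strata[t[stratum_key]].append(t)

    # Sample each stratum in proportion to its size, taking at least
    # one tuple so that small strata are still represented.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample
```

Sampling each stratum separately keeps rare label values in the sample, which plain random sampling over the whole tuple set could miss.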
Author
YANG Lu-han (杨蕗菡), School of Management, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2019, No. 11, pp. 251-254 (4 pages)
Keywords
Non-cooperative structure
Deep web overlapping data
Mining
Global model