Abstract
Using data mining to study the distribution patterns and underlying mechanisms of hidden hazards in drilling operations is an important and pressing problem. Drilling hazard data are redundant and complex, and mining them with the classical approach loses frequent itemsets and generates them inefficiently. To address this, an Apriori algorithm based on a support matrix is proposed. First, a Boolean matrix is introduced to represent the transaction database, which avoids repeated scans of the database. Second, a support matrix is constructed by multiplying transaction matrices, so that support values are read from the matrix and the support calculation is simplified. Finally, the join strategy of the algorithm is optimized, which simplifies the generation of frequent itemsets, and the matrix is reduced continuously during the computation. Experiments on UCI datasets show that the improved Apriori algorithm effectively improves execution efficiency. The algorithm is then applied to association mining of historical drilling hazard data; the mining results give safety managers a scientific basis for decision-making and support effective identification and risk control of hidden hazards in drilling operations, so the approach has practical value and is suitable for wider application.
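The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details of the optimized join strategy or of the matrix reduction rules, so the sketch assumes a standard prefix-based join and a simple row-pruning rule. The function names (`boolean_matrix`, `support_matrix`, `frequent_itemsets`) and the toy transactions are hypothetical.

```python
from itertools import combinations


def boolean_matrix(transactions):
    """Encode transactions as a 0/1 matrix: one row per transaction, one column per item."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def support_matrix(matrix):
    """Compute S = M^T * M, where S[i][j] counts transactions containing items i and j."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} for all itemsets with support >= min_count."""
    items, matrix = boolean_matrix(transactions)  # the database is read only once
    s = support_matrix(matrix)

    # Diagonal entries of S are the supports of the 1-itemsets.
    keep = [i for i in range(len(items)) if s[i][i] >= min_count]
    result = {frozenset([items[i]]): s[i][i] for i in keep}

    # Off-diagonal entries of S are the supports of the 2-itemsets.
    level = []
    for i, j in combinations(keep, 2):
        if s[i][j] >= min_count:
            result[frozenset([items[i], items[j]])] = s[i][j]
            level.append((i, j))

    rows, k = matrix, 3
    while level:
        # Assumed reduction rule: a row with fewer than k frequent items
        # cannot contain any k-itemset, so it is dropped.
        rows = [r for r in rows if sum(r[i] for i in keep) >= k]
        prev, candidates = set(level), set()
        # Assumed join: merge two (k-1)-itemsets that share their first k-2 items,
        # then keep the candidate only if all its (k-1)-subsets are frequent.
        for a in level:
            for b in level:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    if all(sub in prev for sub in combinations(cand, k - 1)):
                        candidates.add(cand)
        level = []
        for cand in sorted(candidates):
            # Support of a candidate is the number of rows whose columns are all 1.
            count = sum(1 for r in rows if all(r[i] for i in cand))
            if count >= min_count:
                result[frozenset(items[i] for i in cand)] = count
                level.append(cand)
        k += 1
    return result


# Toy data for illustration only (not drilling hazard records).
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = frequent_itemsets(toy, min_count=3)
```

In this sketch the supports of all 1-itemsets and 2-itemsets come directly from the support matrix, so only itemsets of size three or more require further counting, and that counting runs on the progressively reduced matrix rather than on the original database.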
Authors
王兵
黄丹
李文璟
WANG Bing; HUANG Dan; LI Wenjing (School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China)
Source
《西南石油大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2022, Issue 2, pp. 113-122 (10 pages)
Journal of Southwest Petroleum University (Science & Technology Edition)
Funding
National Science and Technology Major Project (2016ZX05020-006).