
Top-k high average utility sequential pattern mining algorithm under one-off condition

Abstract: Traditional Sequential Pattern Mining (SPM) does not consider pattern repetition and ignores how item utility (unit price or profit) and pattern length affect user interest. To address this, a Top-k One-off high average Utility sequential Pattern mining (TOUP) algorithm was proposed. TOUP consists of two core steps: average utility calculation and candidate pattern generation. First, a CSP (Calculation Support of Pattern) algorithm, based on the occurrence positions of each item and an item repetition relation array, was proposed to calculate pattern support, which enables fast calculation of the average utility of patterns. Second, candidate patterns were generated by itemset extension and sequence extension, and a maximum average utility upper bound was proposed to prune candidate patterns effectively. Experimental results on five real datasets and one synthetic dataset show that, compared with the TOUP-dfs and HAOP-ms algorithms, TOUP reduces the number of candidate patterns by 38.5% to 99.8% and 0.9% to 77.6%, respectively, and reduces running time by 33.6% to 97.1% and 57.9% to 97.2%, respectively. TOUP therefore outperforms both baselines and mines patterns of interest to users more efficiently.
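The abstract states that computing support quickly is what makes average utility cheap to evaluate. The following is a minimal Python sketch of that relationship under simplifying assumptions that are not taken from the paper: each sequence element is a single item (so itemset extension is not modeled), any gap is allowed between pattern items, each item's utility is a fixed unit price, and a pattern's average utility is its one-off support multiplied by the mean utility of its items. The greedy counter is a stand-in, not the authors' CSP algorithm, and the brute-force enumeration omits the paper's maximum average utility upper bound, so it prunes nothing. The names `one_off_support`, `average_utility` and `top_k_patterns` are hypothetical.

```python
import heapq
from itertools import product


def one_off_support(pattern: str, seq: str) -> int:
    """Greedy count of position-disjoint (one-off) occurrences of `pattern`
    as a subsequence of `seq`. Each position of `seq` is used by at most one
    occurrence, so the result is a valid (possibly non-maximal) support.
    Hypothetical stand-in; NOT the paper's CSP algorithm."""
    m = len(pattern)
    partial = [0] * (m + 1)  # partial[j]: occurrences that have matched pattern[:j]
    for ch in seq:
        # Prefer extending the partial occurrence closest to completion.
        for j in range(m, 0, -1):
            if pattern[j - 1] == ch and (j == 1 or partial[j - 1] > 0):
                if j > 1:
                    partial[j - 1] -= 1
                partial[j] += 1
                break
    return partial[m]


def average_utility(pattern: str, db: list[str], utility: dict[str, float]) -> float:
    # Assumption: utility is a fixed per-item unit price, so every occurrence
    # has the same total utility and average utility = support * mean item utility.
    support = sum(one_off_support(pattern, seq) for seq in db)
    return support * sum(utility[x] for x in pattern) / len(pattern)


def top_k_patterns(db: list[str], utility: dict[str, float], k: int, max_len: int):
    # Exhaustive sequence extension up to max_len; no upper-bound pruning.
    alphabet = sorted(utility)
    scored = []
    for length in range(1, max_len + 1):
        for items in product(alphabet, repeat=length):
            p = "".join(items)
            au = average_utility(p, db, utility)
            if au > 0:
                scored.append((p, au))
    return heapq.nlargest(k, scored, key=lambda t: t[1])


if __name__ == "__main__":
    db = ["abcab", "acbb"]
    utility = {"a": 2, "b": 3, "c": 1}
    print(top_k_patterns(db, utility, k=2, max_len=3))  # [('b', 12.0), ('ab', 7.5)]
```

Enumerating every pattern up to `max_len` costs on the order of |alphabet|^max_len support computations; this is the cost that an upper-bound pruning such as TOUP's is intended to reduce.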
Authors: YANG Keshuai; WU Youxi; GENG Meng; LIU Jingyu; LI Yan (School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; School of Economics and Management, Hebei University of Technology, Tianjin 300401, China)
Source: Journal of Computer Applications, 2024, No. 2, pp. 477-484 (8 pages). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (61976240).
Keywords: data mining; sequential pattern mining; high average utility; one-off condition; top-k