Research on a temporal-constraint browsing pattern mining algorithm for Web logs (cited by: 3)

An algorithm for temporal constraint browsing pattern mining in Weblogs

Abstract: To mine useful user browsing patterns effectively from massive Web logs, sequential and temporal constraints are added to a fast association rule mining algorithm, yielding FPMBTC, a browsing pattern mining algorithm based on temporal constraints. The algorithm simplifies the generation of candidate patterns during mining. It scans the database once to obtain the set of contiguous subsequences of every transaction, computes support using set intersection and difference operations, and at the same time progressively corrects the session transaction times to obtain the valid time of each browsing pattern. Because website structure and Web logs change continually, an incremental update mining algorithm is also given. Experimental results show that, compared with related Apriori-like algorithms, FPMBTC needs less running time and scales better, and the mined patterns are time-sensitive, which makes the algorithm suitable for mining Web logs that change continually and have temporal characteristics. The study offers a useful reference for learning and researching Web mining technology, and has theoretical and practical value for building real Web mining systems.
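The steps named in the abstract (one scan, contiguous subsequences, set-based support, valid time) can be illustrated with a minimal sketch. This is not the authors' FPMBTC implementation, whose details are not given here. The session layout `(session_id, start_time, end_time, pages)`, the function names, and the choice to take a pattern's valid time as the interval spanning all supporting sessions are all assumptions made for illustration. Support is simply the size of the set of session IDs containing the pattern; the paper's intersection and difference operations and its incremental update are not reproduced.

```python
from collections import defaultdict


def contiguous_subsequences(pages):
    """Yield every contiguous subsequence of a page list as a tuple."""
    n = len(pages)
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            yield tuple(pages[start:start + length])


def mine_patterns(sessions, min_support):
    """Hypothetical helper: one pass over the sessions.

    sessions: iterable of (session_id, start_time, end_time, pages).
    Returns {pattern: (support, (valid_from, valid_to))} for patterns
    whose support (number of sessions containing them) >= min_support.
    """
    tidsets = defaultdict(set)   # pattern -> IDs of sessions containing it
    intervals = {}               # pattern -> (earliest start, latest end)
    for sid, start, end, pages in sessions:
        # set() so a pattern repeated inside one session is counted once
        for pattern in set(contiguous_subsequences(pages)):
            tidsets[pattern].add(sid)
            lo, hi = intervals.get(pattern, (start, end))
            # Assumed rule: widen the interval with each supporting session
            intervals[pattern] = (min(lo, start), max(hi, end))
    return {
        pattern: (len(ids), intervals[pattern])
        for pattern, ids in tidsets.items()
        if len(ids) >= min_support
    }


# Toy data: three sessions with made-up timestamps
sessions = [
    (1, 0, 10, ["A", "B", "C"]),
    (2, 5, 20, ["A", "B"]),
    (3, 30, 40, ["B", "C"]),
]
patterns = mine_patterns(sessions, min_support=2)
for pattern, (support, valid) in sorted(patterns.items()):
    print(pattern, support, valid)
```

Enumerating all contiguous subsequences costs quadratic work per session, so a practical version would probably cap the pattern length. The sketch leaves that out to stay short.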

Authors: Ning Hui (宁慧), Li Hongyu (李红宇), Wu Peilian (吴培莲)

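Affiliations: College of Computer Science and Technology, Harbin Engineering University; Acheng College, Harbin Normal University; School of Materials Science and Engineering, Harbin Institute of Technology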

Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2008, No. 9, pp. 1474-1480 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (60603092); Harbin Normal University Scientific Research Fund (KM2007-17)

Keywords: Weblog mining; frequent access patterns; valid time

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
