Abstract
To mine useful user browsing patterns from massive Web logs efficiently, this paper adds sequential and temporal constraints to a fast association rule mining algorithm and presents FPMBTC, a browsing pattern mining algorithm based on temporal constraints. The algorithm simplifies candidate pattern generation: it scans the database once to obtain the set of contiguous subsequences of every transaction, computes support through set intersection and difference operations, and progressively revises session transaction times to obtain the valid time of each browsing pattern. Because website structure and Web logs change continuously, an incremental update mining algorithm is also given. Experimental results show that, compared with related Apriori-like algorithms, FPMBTC runs faster and scales better, and the patterns it mines are time-sensitive, which makes it suitable for mining Web logs that change continuously and have temporal characteristics. The work offers a useful reference for learning and researching Web mining technology and has theoretical and practical value for building real Web mining systems.
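The single-scan idea in the abstract can be illustrated with a small sketch. This is not the FPMBTC algorithm itself, whose details the abstract does not give. It assumes each session is a tuple `(session_id, start, end, pages)`, treats a pattern's support as the number of sessions containing it as a contiguous run, and takes the valid time to be the span from the earliest start to the latest end of those sessions. The names `contiguous_subsequences` and `mine_patterns`, the data layout, and the valid-time rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def contiguous_subsequences(pages):
    """Return every contiguous subsequence of a page list as a set of tuples."""
    n = len(pages)
    return {tuple(pages[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def mine_patterns(sessions, min_support):
    """Single pass over the sessions (assumed layout: id, start, end, pages).

    Returns {pattern: (support, (valid_from, valid_to))} for every pattern
    whose support is at least min_support.
    """
    session_ids = defaultdict(set)  # pattern -> ids of sessions containing it
    valid_time = {}                 # pattern -> (earliest start, latest end)
    for sid, start, end, pages in sessions:
        for pattern in contiguous_subsequences(pages):
            session_ids[pattern].add(sid)
            # Assumed rule: widen the interval as each new session is seen.
            lo, hi = valid_time.get(pattern, (start, end))
            valid_time[pattern] = (min(lo, start), max(hi, end))
    return {
        pattern: (len(ids), valid_time[pattern])
        for pattern, ids in session_ids.items()
        if len(ids) >= min_support
    }


# Hypothetical toy log: three sessions with integer timestamps.
sessions = [
    (1, 10, 20, ["A", "B", "C"]),
    (2, 15, 30, ["B", "C", "D"]),
    (3, 40, 50, ["A", "B"]),
]
patterns = mine_patterns(sessions, min_support=2)
```

Keeping a set of session ids per pattern means support is just the size of that set, and sets for different patterns can be combined with intersection or difference without rescanning the log.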
Source
《哈尔滨工业大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2008, No. 9, pp. 1474-1480 (7 pages)
Journal of Harbin Institute of Technology
Funding
National Natural Science Foundation of China (60603092)
Harbin Normal University Research Fund (KM2007-17)