Abstract
Chain business data are distributed, heterogeneous, and regionally varied, which makes it difficult for traditional decision tree algorithms to meet the requirements of cross-regional data mining. This paper analyzes how regional economic factors, such as the level of economic development and population size, affect the sales behavior of chain business outlets, and proposes introducing the region factor into a distributed data mining model for chain businesses. Building on definitions of the region factor, decision tree features, and the decision tree feature difference rate, it proposes the RDT algorithm, which consists of a regional branching module, a feature difference module, and a branch-store subtree construction module. An empirical analysis of the stores of a chain commercial group in seven cities of Zhejiang Province (Hangzhou, Shaoxing, Ningbo, Wenzhou, Taizhou, Lishui, and Jinhua) verifies the effectiveness of the algorithm and shows that it can be applied to data mining in chain business enterprises.
Source
《系统工程理论与实践》
EI
CSSCI
CSCD
PKU Core Journal
2011, No. 6, pp. 1126-1133 (8 pages)
Systems Engineering-Theory & Practice
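Funding
National Natural Science Foundation of China (71001088, 71071141)
Humanities and Social Sciences Research Project of the Ministry of Education (09YJC630205)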
Keywords
chain business
distributed
decision tree
region-factor