Abstract
Map/Reduce performs data mining by using key/value pairs to process the many-to-many correspondence between a data set and a pattern set. For some of the more complex pattern types, combinatorial explosion makes the pattern set so large that Map/Reduce cannot always process this correspondence directly. To address this, a method based on pattern space partitioning is proposed that converts the problem of processing the correspondence between the data set and the pattern set into the problem of processing the correspondence between the data set and a set of sub-pattern sets. In addition, the scheduling mechanism of the Map/Reduce cluster and the way it organizes and processes key/value pairs are improved to strengthen the ability of Map/Reduce to execute pattern mining tasks. On a Map/Reduce cluster, algorithms that use this approach to mine some of the more complex pattern types achieve higher parallelism than Map/Reduce versions of traditional algorithms.
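The abstract does not specify the pattern type or the partitioning rule, so the following is only a minimal single-process sketch of the general idea under stated assumptions: the patterns are frequent itemsets, items are integers, and an itemset is assigned to a sub-pattern set by its smallest item modulo `num_partitions`. The function names (`map_phase`, `shuffle`, `reduce_phase`, `mine`) are hypothetical and are not taken from the paper. The point it illustrates is that the key is a sub-pattern-set id rather than an individual pattern, so the number of distinct keys stays small even when the pattern set itself is very large.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transactions, num_partitions):
    """Emit (sub-pattern-set id, transaction) instead of (pattern, 1)."""
    for t in transactions:
        items = tuple(sorted(set(t)))
        # Assumed rule: a pattern belongs to partition (smallest item % num_partitions),
        # so a transaction is only sent to partitions that its items can reach.
        for pid in {item % num_partitions for item in items}:
            yield pid, items


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(pid, transactions, num_partitions, max_size, min_support):
    """Count only the patterns that fall inside this sub-pattern set."""
    counts = defaultdict(int)
    for items in transactions:
        for size in range(1, max_size + 1):
            for pattern in combinations(items, size):
                # items is sorted, so pattern[0] is the pattern's smallest item.
                if pattern[0] % num_partitions == pid:
                    counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def mine(transactions, num_partitions=2, max_size=2, min_support=2):
    """Run the map, shuffle, and reduce steps sequentially in one process."""
    result = {}
    groups = shuffle(map_phase(transactions, num_partitions))
    for pid, group in groups.items():
        result.update(
            reduce_phase(pid, group, num_partitions, max_size, min_support)
        )
    return result


if __name__ == "__main__":
    data = [[1, 2, 3], [1, 2], [2, 3], [1, 3]]
    print(mine(data))
```

Because every itemset has exactly one smallest item, each pattern is counted in exactly one partition, and the reduce calls are independent of one another, which is what would let them run in parallel on a real cluster.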
Source
《微电子学与计算机》
CSCD
Peking University Core Journal (北大核心)
2011, Issue 8, pp. 140-142 (3 pages)
Microelectronics & Computer