Abstract
To shorten the running time of big data in virtual environments, and because data at run time exhibit certain regularities and a distinctive class structure, big data in virtual environments need intelligent parallel clustering. Current big data clustering methods repeatedly adjust the classification of big data samples with K-means clustering and reach a parallel clustering result only after many rounds of computation and adjustment. Whenever new big data flow in, K-means must be rerun on all current data, so the computation is complex and clustering efficiency is low. To address this, a MapReduce-based intelligent parallel clustering method for big data in virtual environments is proposed. First, a small-scale data set is sampled from the big data in the virtual environment and the centroids of the big data clusters are determined. The sampled small-scale data are clustered with the Single method to obtain the means of the big data attributes, and the minimum-distance classification rule is used to move these attribute means quickly toward the true centers of the data clusters. Following the Davies-Bouldin index, a cluster dispersion parameter is assumed, and the maximum clustering similarity is selected among its values. Finally, the maximum clustering similarity is used to obtain the Davies-Bouldin index. On that basis, classes whose centroid distance and cluster dispersion meet specified thresholds are merged into one class, and the computation is iterated to obtain the best cluster center positions, which completes the intelligent parallel clustering of big data in the virtual environment. Simulation results show that the proposed method improves the flexibility and general applicability of intelligent parallel clustering of big data and reduces clustering time. It is also suited to the field of educational technology, where it can make educational technology network data both more rational and more standardized.
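The abstract names three building blocks: a minimum-distance classification rule, a map/reduce split of the work, and the Davies-Bouldin index. The following is a minimal sketch of how they might fit together, not the authors' implementation. All function names are hypothetical, in-memory lists stand in for a real MapReduce job, and the index follows the standard Davies-Bouldin definition (for each cluster, the worst ratio of summed within-cluster scatter to centroid distance, averaged over clusters).

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Minimum-distance rule: index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def map_assign(points, centroids):
    """Map step (illustrative): emit (cluster index, point) pairs."""
    return [(nearest(p, centroids), p) for p in points]


def reduce_means(pairs):
    """Reduce step (illustrative): group points by cluster and average them."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    means = {i: tuple(sum(c) / len(ps) for c in zip(*ps))
             for i, ps in groups.items()}
    return means, groups


def davies_bouldin(means, groups):
    """Standard Davies-Bouldin index; lower values mean better separation.

    Assumes at least two clusters with distinct means.
    """
    # Scatter S_i: mean distance from each member to its cluster mean.
    scatter = {i: sum(math.dist(p, means[i]) for p in groups[i]) / len(groups[i])
               for i in means}
    # For each cluster, take the worst (S_i + S_j) / d_ij over other clusters.
    worst = [max((scatter[i] + scatter[j]) / math.dist(means[i], means[j])
                 for j in means if j != i)
             for i in means]
    return sum(worst) / len(worst)


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 0), (10, 2)]
    initial = [(1, 1), (9, 1)]  # rough centroids, e.g. from a small sample
    means, groups = reduce_means(map_assign(points, initial))
    print(means, davies_bouldin(means, groups))
```

In a setting like the one the abstract describes, the initial centroids would come from clustering a small sample, and the map and reduce steps would be repeated until the centroids stop moving. A merge step could then compare centroid distance and scatter against thresholds; the thresholds themselves are not given in the abstract, so they are left out here.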
Source
Computer Measurement & Control (《计算机测量与控制》)
2017, No. 6, pp. 257-260 (4 pages)
Funding
Shaanxi Provincial Education Science "13th Five-Year Plan" Project (SGH16H169)
Baoji University of Arts and Sciences Key Project (ZK16073)
Keywords
virtual environment
big data
intelligent parallel
clustering method