Abstract
The CARDBK clustering algorithm differs from the batch K-means algorithm in that each point is not assigned to a single cluster but instead influences the centroids of multiple clusters. How strongly a point influences a given cluster's centroid depends on the distances between that point and the centroids of the other clusters that lie closer to it. The algorithm is evaluated on five performance measures of the clustering result: entropy, purity, F1 score, Rand index, and normalized mutual information (NMI). Compared with several other algorithms across several data sets, it produces better clustering results. Compared with several other algorithms on a single data set under many different initialization conditions, it produces better and more stable clustering results. When clustering data sets of different sizes, the algorithm scales linearly and runs relatively fast.
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2015, Issue 3, pp. 201-205 (5 pages)
Computer Science
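Funding
National Natural Science Foundation of China (61379019, 71102149)
China Postdoctoral Science Foundation (2013M540704)
Sichuan Province Academic and Technical Leaders Training Fund
Sichuan Province Postdoctoral Research Fund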