Abstract
In unsupervised learning, k-means clustering is widely used because it is simple and fast. The EM algorithm is a statistical learning method for problems with missing data. Although k-means and EM come from different fields, they share a common underlying idea. This paper analyzes the EM principle implicit in k-means. It shows that the two alternating steps of k-means, updating the sample memberships and updating the cluster centers (prototypes), are equivalent to the E step and the M step of the EM algorithm, respectively. Finally, it uses R's vectorized matrix operations to show how to implement k-means clustering efficiently in R.
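The correspondence described above can be illustrated with a minimal sketch of k-means written as "hard" EM. It is in Python with NumPy rather than the R used in the paper. The E step assigns each sample to its nearest center and stores the result as a one-hot membership matrix `R`. The M step recomputes each center as the mean of its members, `R^T X` divided by the cluster sizes. Distances for all sample/center pairs are computed at once with broadcasting, which is analogous in spirit to the matrix formulation the paper uses in R. Several choices here are illustrative assumptions and not the paper's code: the function name `kmeans_em`, initialization from the first `k` samples, keeping the old center for an empty cluster, and stopping when memberships stop changing.

```python
import numpy as np

def kmeans_em(X, k, max_iter=100):
    """Hypothetical sketch: k-means as hard-assignment EM.

    X: (n, d) array of samples; k: number of clusters.
    Returns (centers, labels).
    """
    X = np.asarray(X, dtype=float)
    # Assumed initialization: the first k samples serve as initial centers.
    centers = X[:k].copy()
    labels = None
    for _ in range(max_iter):
        # E step: squared Euclidean distances from every sample to every
        # center via broadcasting, shape (n, k); pick the nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Hard membership matrix: R[i, j] = 1 if sample i is in cluster j.
        R = np.eye(k)[new_labels]
        # M step: each center becomes the mean of its members (R^T X / sizes).
        counts = R.sum(axis=0)
        nonempty = counts > 0  # assumed rule: empty clusters keep old center
        centers[nonempty] = (R.T @ X)[nonempty] / counts[nonempty][:, None]
        # Stop once the memberships no longer change between iterations.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return centers, labels
```

For example, with `X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])`, calling `kmeans_em(X, 2)` returns labels `[0, 0, 1, 1]` and centers `[[0.0, 0.5], [10.0, 10.5]]`. Replacing the `argmin` one-hot memberships with soft posterior probabilities would turn the E step into the usual Gaussian-mixture EM. The hard version above is its 0/1 limit.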
Source
Science & Technology Vision (《科技视界》)
2015, No. 17, pp. 143-144 (2 pages)
Keywords
k-means
EM algorithm
cluster analysis