Abstract
Incomplete datasets are common in real-world applications. Missing values cause loss of information and make analysis harder, so handling missing data has become an active research topic in classification. Because the EM algorithm picks its initial cluster centers at random, its clustering results can be unstable. This paper uses the classification result of the Naive Bayes algorithm as the initial range for the EM algorithm, then refines the estimates by iterating the E-step and M-step, and fills in the missing data with the resulting maximized values. Experimental results show that the proposed algorithm improves clustering stability and gives better imputation results.
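The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes numeric features modeled as independent Gaussians per class, missing entries marked as None, and class labels available for the complete rows. The names gaussian_logpdf, fit_params, responsibilities and nb_em_impute are hypothetical. Parameters estimated from the labeled complete rows (a Gaussian Naive Bayes fit) serve as the EM starting point in place of random centers. Each E-step then computes class posteriors from the observed features and fills missing entries with the posterior-weighted class means, and each M-step re-estimates the parameters from the filled data.

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit_params(rows, weights, n_clusters):
    """M-step: weighted prior, per-feature mean and variance for each cluster."""
    n_features = len(rows[0])
    total = sum(sum(w) for w in weights)
    params = []
    for k in range(n_clusters):
        wk = [w[k] for w in weights]
        mass = sum(wk) + 1e-9  # guard against an empty cluster
        means, variances = [], []
        for j in range(n_features):
            m = sum(w * r[j] for w, r in zip(wk, rows)) / mass
            v = sum(w * (r[j] - m) ** 2 for w, r in zip(wk, rows)) / mass
            means.append(m)
            variances.append(max(v, 1e-3))  # variance floor (assumed value)
        params.append((mass / total, means, variances))
    return params

def responsibilities(row, params):
    """E-step for one row: cluster posterior from the observed features only."""
    logs = []
    for prior, means, variances in params:
        lp = math.log(prior)
        for x, m, v in zip(row, means, variances):
            if x is not None:
                lp += gaussian_logpdf(x, m, v)
        logs.append(lp)
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    s = sum(exps)
    return [e / s for e in exps]

def nb_em_impute(rows, labels, n_iter=20):
    """Impute None entries; labels[i] is the class index of each complete row."""
    complete = [i for i, r in enumerate(rows) if None not in r]
    n_clusters = max(labels[i] for i in complete) + 1
    # Naive Bayes fit on the complete, labeled rows gives the EM starting point.
    one_hot = [[1.0 if labels[i] == k else 0.0 for k in range(n_clusters)]
               for i in complete]
    params = fit_params([rows[i] for i in complete], one_hot, n_clusters)
    filled = rows
    for _ in range(n_iter):
        # E-step: posteriors, then fill missing entries with expected values.
        resp = [responsibilities(r, params) for r in rows]
        filled = [
            [x if x is not None
             else sum(p * params[k][1][j] for k, p in enumerate(post))
             for j, x in enumerate(r)]
            for r, post in zip(rows, resp)
        ]
        # M-step: re-estimate priors, means and variances from the filled data.
        params = fit_params(filled, resp, n_clusters)
    return filled

rows = [[1.0, 1.2], [0.8, 0.9], [1.1, 1.0],
        [10.0, 10.2], [9.8, 9.9], [10.1, 10.0],
        [1.0, None], [None, 10.1]]
labels = [0, 0, 0, 1, 1, 1, None, None]
filled = nb_em_impute(rows, labels)
```

Filling with expected values and then re-estimating variances from the filled data understates the variance of the imputed features. A fuller treatment would carry the conditional variance through the M-step, which this sketch leaves out for brevity.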
Source
Microcomputer & Its Applications (《微型机与应用》)
2011, No. 16, pp. 75-77, 81 (4 pages in total)
Keywords
missing data imputation
EM algorithm
Naive Bayes algorithm