Abstract
Missing data are usually handled with statistical methods that fill in the missing values during data preprocessing, an approach whose efficiency and accuracy are limited. To address this, an embedded imputation method based on fuzzy C-means (FCM) clustering, called FCM-based sparse data imputation (FCMSI), is proposed. The algorithm first initializes the sparse data with the average ratio method (ARM). It then clusters the data with an FCM variant improved by a local distance strategy. The missing entries are treated as variables: after each clustering iteration, their values are replaced within each cluster following the idea of collaborative filtering (CF), and this repeats until the results converge. Comparative experiments on UCI standard data sets, measured with three different evaluation indexes, show that FCMSI performs significantly better than traditional imputation methods.
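The loop described in the abstract (initialize, cluster, replace missing entries inside the clustering iteration, repeat until convergence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' FCMSI implementation: the abstract does not define ARM, the local distance strategy, or the CF replacement rule, so the sketch substitutes column-mean initialization for ARM, a partial distance computed over observed features only for the local distance strategy, and a membership-weighted average of cluster centers for the CF step. The function name `fcm_impute` and all of its parameters are hypothetical.

```python
import numpy as np

def fcm_impute(X, n_clusters=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Clustering-embedded imputation sketch; NaN marks a missing entry.

    Assumes every column has at least one observed value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    observed = ~missing
    n, d = X.shape

    # Step 1 (stand-in for ARM): start from column means.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    # Farthest-point initialization so the centers start spread out.
    idx = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        gap = ((filled[:, None, :] - filled[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(gap.min(axis=1).argmax()))
    centers = filled[idx].copy()

    for _ in range(max_iter):
        # Step 2 (stand-in for the local distance strategy): squared distance
        # over observed features only, rescaled by d / (number observed).
        diff = filled[:, None, :] - centers[None, :, :]          # (n, c, d)
        n_obs = np.maximum(observed.sum(axis=1), 1)
        dist2 = ((diff ** 2) * observed[:, None, :]).sum(axis=2)
        dist2 = np.maximum(dist2 * (d / n_obs)[:, None], 1e-12)  # (n, c)

        # Standard FCM membership and center updates with fuzzifier m.
        inv = dist2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        W = U ** m
        new_centers = (W.T @ filled) / W.sum(axis=0)[:, None]

        # Step 3 (stand-in for the CF step): re-estimate each missing entry
        # as the membership-weighted average of the cluster centers.
        estimate = (W @ new_centers) / W.sum(axis=1, keepdims=True)
        new_filled = np.where(missing, estimate, X)

        shift = (np.abs(new_filled - filled).max()
                 + np.abs(new_centers - centers).max())
        filled, centers = new_filled, new_centers
        if shift < tol:  # stop once imputed values and centers settle
            break

    return filled, U, centers
```

Calling `fcm_impute(X, n_clusters=2)` on an array with `np.nan` in the missing positions returns the completed array, the membership matrix, and the cluster centers. Observed entries are never modified; only the missing ones are updated between iterations, which is what distinguishes this embedded scheme from a one-off fill during preprocessing.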
Authors
ZHANG Kaihui (张楷卉)
LI Peng (李鹏)
Affiliations: College of Entrepreneurship Education, Heilongjiang University, Harbin 150080, China; School of Software and Microelectronics, Harbin University of Science and Technology, Harbin 150080, China
Source
Journal of Natural Science of Heilongjiang University (《黑龙江大学自然科学学报》)
CAS
2019, No. 6, pp. 750-756 (7 pages)
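Funding
National Natural Science Foundation of China (61103149)
Fundamental Research Funds for the Provincial Universities of Heilongjiang Province (LGYC2018JQ003)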
Keywords
missing data filling
sparse data
fuzzy C-means clustering
collaborative filtering