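Abstract
By optimizing the K-means algorithm into the OKMI estimation algorithm and applying fuzzy pre-clustering on small samples, interference from noisy data is eliminated and data instances are clustered along a more accurate direction. The resulting predicted values are used to assist in profiling and improving the accuracy and completeness of data quality. Experiments on actual data sources from the Operation Monitoring (Control) Center verify the effectiveness of the OKMI estimation algorithm.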
Data quality in the utility industry is poor, and the industry lacks data quality management capability. Based on the data life cycle, a closed-loop data quality control framework is proposed for the SGCC Operation Monitoring Center. The framework covers the definition, profiling, measurement, and enhancement of data quality, and achieves all-round management of data quality. This paper also focuses on a fuzzy clustering algorithm for missing value imputation that is immune to noisy data. The OKMI (Optimized K-Means Imputation) method groups data instances into more accurate clusters, via information entropy, after resampling-based pre-clustering and an outlier test, so that missing values can be estimated appropriately. Experimental results from the SGCC Operation Monitoring Center demonstrate that the proposed OKMI achieves higher precision on both quantitative and nominal attribute missing value completion than other classic methods, under all missingness mechanisms, at varying missing rates, and in the presence of abnormal values.
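The general idea of cluster-based imputation that the abstract builds on can be sketched as follows. This is a minimal illustration of plain K-means imputation for numeric attributes, not the paper's OKMI method: the resampling pre-clustering, outlier test, fuzzy membership, and information-entropy steps are omitted because the abstract does not specify them. The function name `kmeans_impute`, its parameters, and the farthest-first initialization are assumptions made for this example.

```python
from statistics import mean


def kmeans_impute(rows, k=2, iters=20):
    """Fill None entries with the matching attribute of the nearest centroid.

    Hypothetical helper for illustration only; not the OKMI algorithm.
    """
    # Centroids are learned from complete rows only.
    complete = [list(r) for r in rows if None not in r]

    def dist(row, centroid):
        # Squared Euclidean distance over the observed attributes of `row`.
        return sum((x - c) ** 2 for x, c in zip(row, centroid) if x is not None)

    # Assumed initialization: farthest-first, which is deterministic.
    centroids = [complete[0]]
    while len(centroids) < k:
        centroids.append(
            max(complete, key=lambda r: min(dist(r, c) for c in centroids))
        )

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in complete:
            nearest = min(range(k), key=lambda j: dist(r, centroids[j]))
            clusters[nearest].append(r)
        # Recompute each centroid as the column-wise mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            [mean(col) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated

    filled = []
    for r in rows:
        if None in r:
            j = min(range(k), key=lambda j: dist(r, centroids[j]))
            r = [centroids[j][i] if x is None else x for i, x in enumerate(r)]
        filled.append(list(r))
    return filled


if __name__ == "__main__":
    sample = [
        [1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
        [10.0, 10.0], [10.2, 9.8], [9.8, 10.2],
        [1.1, None], [None, 9.9],
    ]
    for row in kmeans_impute(sample, k=2):
        print(row)
```

Each incomplete row is matched to a cluster using only its observed attributes, and the missing attribute is then taken from that cluster's centroid. A noise-immune variant along the lines the abstract describes would additionally screen out outliers before the centroids are computed.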
Source
《华东电力》
Peking University Core Journal (北大核心)
2013, No. 3, pp. 546-549 (4 pages)
East China Electric Power